Picture of stu

Java.next #4: Immutability

This is Part Four of a series of articles on Java.next. In Part Four, I will begin to explore how the Java.next languages (JRuby, Groovy, Clojure, and Scala) deal with concurrency. Concurrency is a big topic, so I will subdivide it, narrowing my focus in this part to how the Java.next languages support immutability.

Why concurrency?

Over the last decade, most programmers have written code with little concern for concurrency. Often this has caused problems later, when programs needed to be used in a more concurrent setting. This is only going to get worse: in the future, everything is concurrent.

Why don't programmers do a better job with concurrency? Because it is hard. Read the excellent Java Concurrency in Practice (JCIP), and you will come away impressed with all the clever work that has gone into making Java a good environment for writing concurrent applications. But may also come away thinking "Who is smart enough to get this kind of code right in production?"

Not many people can, and the experts agree that we need an entirely different paradigm for writing concurrent applications. It may even be the case that supporting concurrency is the number one priority for Java.next.

Why immutability?

One of the most difficult things about writing concurrent programs is deciding how to protect mutable shared state. Java provides a locking primitive, synchronized, but locking primitives are difficult to use. You have to worry about

  • data corruption (if you lock too little)
  • deadlock (if you lock too much)
  • performance degradation (even if you get it right)

To make matters worse, it is generally impractical to write automated tests that explore the various possible interleavings of events.

Immutability solves all these problems of locking, by removing the need for ever locking at all. Immutable state can always be shared safely. Nobody can ever see a corrupt value by definition, as values never change.

All of the Java.next languages support immutability to some degree, as I will show below.

Immutability in JRuby

JCIP uses a ThreeStooges class as an extremely simple example for creating immutable objects in Java. Continuing the theme of that example, here is JRuby's immutable stooges:

require 'set'

class ThreeStooges
  def initialize(*names)
    @stooges = Set.new(names)
    @stooges.freeze
    self.freeze
  end
  def stooge?(name)
    @stooges.member?(name)
  end
end

In Ruby, you can set immutability on a per-instance basis, as opposed to a per-class basis. A particular object can become immutable by calling freeze. If the internal components of that object are not primitive, they will need to freeze as well. So, in the constructor above, I freeze the two things at risk of changing: stooges and self.

While immutability is possible in Ruby, be warned. Heavy use of freeze to enable safe concurrency is far from idiomatic.

Immutability in Groovy

In Groovy, the ThreeStooges class looks and works like Java, but prettier:

class ThreeStooges { 
  private final Set<String> stooges;
  ThreeStooges(String... args) {
    stooges = args as HashSet<String>;
  }
  boolean isStooge(String name) {
    return stooges.contains(name); 
  }
}

The important details here are that the internal stooges collection is

  • final (cannot be changed after the object is created)
  • private (cannot be accessed outside the object itself)

I could also wrap stooges with unmodifiableSet, but since the ThreeStooges class already wraps stooges without exposing a modifying method, the second wrap is redundant.

Immutability in Scala

In JRuby and Groovy, immutability is possible. In Scala, immutability is preferred. The default collection classes are immutable, so ThreeStooges is just

class ThreeStooges(names: String*) {
  private val stooges = Set() ++ names
  def isStooge(name: String) = stooges(name)
}

A few observations here:

  • In Scala, fields must be declared val (immutable) or var (changeable). Scala style encourages you to use val where possible.
  • Scala's Set is immutable.

I find that Scala's inline constructor syntax and braces-free method definition syntax make the Scala definition easier to read than the JRuby or Groovy versions.

Immutability in Clojure

In Clojure, immutability is the default. In fact, everything is immutable, with only two exceptions:

  • calling down into Java APIs (the Clojure version of assembly language)
  • a few data structures which are specifically designed to support Software Transactional Memory (more on this in a later post)

So, the three-stooges can just be a set:

(defn three-stooges [& names] (set names))
(defn is-stooge? [name stooges] (stooges name))

A few observations:

  • [& names] is Clojure's varargs syntax.
  • A set can be placed in function position (first item in a list). So (stooges name) can be read as "find name in the set stooges"

Putting immutables to work

To see how immutables can simplify an implementation, consider the Factorizer servlet example from JCIP. The objectives of this example are to

  • calculate the prime factorization of a number
  • cache the most recent factorization
  • use the cached answer where possible, and track the frequency of cache hits
  • track the total number of factorizations

Here is a Java version of the factorizer using synchronized blocks, and stripped down to bare essentials for clarity:

import java.math.BigInteger;

public class CachedFactorizer {
    private BigInteger ;
    private BigInteger[] lastFactors;
    private long hits;
    private long cacheHits;

    public void factors(BigInteger i) {
        BigInteger[] factors = null;
        synchronized (this) {
            ++hits;
            if (i.equals(lastNumber)) {
                ++cacheHits;
                factors = lastFactors.clone();
            }
        }
        if (factors == null) {
            factors = factor(i);
            synchronized (this) {
                lastNumber = i;
                lastFactors = factors.clone();
            }
        }
        return factors;
    }
}

In order for the CachedFactorizer to be correct and efficient, you need to think carefully about where the synchronized blocks should go. In the example above, there are two synchronized blocks. The first block checks the cache, and updates the counters. It protects:

  • read and write of hits
  • read and write of cacheHits
  • read of lastFactors

The second block updates the cache. It protects:

  • write of lastNumber
  • write of lastFactors

To make sure you understand the approach, ask yourself the following questions:

  • Could the first synchronized block be split into multiple, more granular locks? Would this likely be faster or slower?
  • What about the second block?
  • What operations are unprotected by locks? Is this safe?
  • Would a simple, single-lock approach be correct? Under what circumstances would it perform well?

That's a lot to think about. Now, let's consider the same example, but using immutable data structures.

A Clojure cached-factor

Here is one Clojure approach to cached-factor:

(def #^AtomicLong hits (AtomicLong. 0))
(def #^AtomicReference cache (AtomicReference. {:n nil :factors nil}))
(def #^AtomicLong cache-hits (AtomicLong. 0))

(defn cached-factor [n]
 (.incrementAndGet hits)
 (let [cached (.get cache)]
   (if (= n (cached :n))
     (do (.incrementAndGet cache-hits)
       (cached :factors))
     (let [factors (factor n)]
       (.set cache {:n n :factors factors})
       factors))))

There are several things to notice here:

  • The hits, cache, and cache-hits take advantage of Java 5's atomic wrapper classes. (The #^ClassName syntax adds type information.)
  • There are no synchronized blocks anywhere.
  • Even though there are no synchronized blocks, you still have to think about the semantics of concurrency. The incrementAndGet method is used to update the two hit counters, and the cache is pulled into a local variable to avoid inconsistent reads.

The real key to this approach, however, is storing a Clojure map in an AtomicReference. Because Clojure data structures are immutable, they can benefit from an AtomicReference in a way that mutable classes cannot.

The immutable approach looks only a little simpler in this small example. But the benefit of using immutable data increases when you compose objects together. If you wanted to compose the synchronized version with another object, you would have have to dig back into the internals of both objects, study their locks, and pick a new lock strategy to cover the two objects together. Composing operations with immutable objects is much easier.

A Scala CachedFactor

Since Scala's default collections are immutable, a Scala approach can closely parallel the Clojure code above:

class CachedFactorizer {
  case class Cache(val n: Int, val factors: List[Int])
  val hits = new AtomicLong()
  val cache = new AtomicReference[Cache](Cache(2, calculateFactors(2)))
  val cacheHits = new AtomicLong()

  def factor(n: Int) = {
    hits.incrementAndGet
    val cached = cache.get
    if (cached.n == n) {
      cacheHits.incrementAndGet
      cached.factors
    } else {
      val factors = calculateFactors(n)
      cache.set(Cache(n, factors))
      factors
    }
  }

While this is similar to the Clojure approach, the difference is instructive. While the Clojure version stores the cache as a simple Map, the Scala version introduces a strongly-typed Cache class for the cache of values and their factors. The differences here are intended to idiomatic. Either approach could work in either language.

What about the JRuby and Groovy examples?

Could you write the example above in Groovy, JRuby, or even Java? Yes, but it would be non-idiomatic, even ugly. I am not going to show the JRuby and Groovy versions here, because those languages do not offer any concurrency-specific advantages over Java. Scala and Clojure, on the other hand, don't just make immutable objects possible. They make them easy and idiomatic.

Languages are designed to support certain priorities, inevitably at the expense of others. By making immutability a preferred option (Scala) or the standard (Clojure), these languages are encouraging a different paradigm for concurrent applications.

Languages are not about what they make possible, but about what they make beautiful. Clojure and Scala aim to make concurrent programs beautiful, and their preference for immutability is but the tip of the iceberg. In the next installment of this series, I will explore how Clojure and Scala enable concurrent programs through actors, agents, and software transactional memory.

Notes

  • This article is based on the talk Java.next #4: Concurrency . Check the schedule for a talk near you.
  • Thanks to Greg Vaughn and Glenn Vanderburg for their feedback on a draft of this article.
  • Thanks to Rich Hickey for suggesting several Clojure versions of the factorizer example.
  • Feedback on how to improve these examples is most welcome!
Comments
  1. David R. MacIverSeptember 11, 2008 @ 07:58 AM

    I promised I would smack the next person who touted Scala for concurrency( http://article.gmane.org/gmane.comp.lang.scala/13300/ ), so consider yourself smacked. :-)

    Scala may prefer immutability, but unfortunately a lot of its immutable structures contain instances of surprise synchronization. Trying to use them in a concurrent setting will lead to unexpected and irritating surprises.

  2. AnonSeptember 11, 2008 @ 03:33 PM

    Why don’t you introduce Clojure’s software transactional memory instead of Atomic*?

    Your cached-factors example could be rewritten like:

    (def hits (ref 0)) (def cache (ref {:n nil :factors nil})) (def cache-hits (ref 0))

    (defn cached-factor [n] (dosync (ref-set hits (inc @hits)) (if (= n (:n @cache)) (do (ref-set cache-hits (inc @cache-hits)) (:factors @cache)) (let [factors (factor n)] (alter cache assoc :n n :factors factors) factors))))

  3. Daniel SpiewakSeptember 11, 2008 @ 05:02 PM

    @David

    Not all of Scala’s immutable data structures are implemented improperly. Your IntMap (for one) has no mutable state whatsoever. My port of Rich Hickey’s PersistentVector also has this property.

    As a general rule though, I have to agree that Scala’s poor implementation of complex immutable data structures has led to an unreliability in concurrent scenarios.

    @Stuart

    I’m a little surprised that you didn’t discuss Scala’s actor framework. Continuation passing is a mind-bogglingly powerful way to abstract over concurrent algorithms.

    Immutability is key to good concurrency, but it is just not sufficient to make things all that much easier. To really turn the tide, thicker abstractions are needed. Clojure has this with its transactional model and isolated mutable refs, and Scala’s actor library is quite impressive, but JRuby and Groovy seem to be a bit behind in this field (at least to my knowledge).

  4. jsSeptember 11, 2008 @ 05:55 PM

    Why wouldn’t the java code be nearly identical to the scala code if you used the atomicLongs and AtomicReference in the java code?

  5. Brian GoetzSeptember 11, 2008 @ 06:26 PM

    I was going to leave a comment, but the comment window on this page (under IE7) is two lines by twenty characters…are you trying to tell us something?

  6. Sean CierSeptember 11, 2008 @ 06:57 PM

    Making immutability more natural and explicit is a very worthwhile aim, no argument. Still, fine-grained immutable state is nothing new—lipstick on a pig, if you want to use the sound bite of the moment. I’ll be interested to hear what you have to say about STM (which is definitely the approach I’m most excited about, but is also the longest out and may not even be practical) and shared-nothing messaging (for which all the hype is about Erlang, but Scala’s actors seem to be the Erlang of the JVM world). Language support for data-parallel operations is also a hot topic, especially as GPUs and GPU-like vectorized processors become inreasingly mainstream and important in general-purpose programming.

    BTW, your Java CachedFactorizer example has two major typos (not counting “factor()” vs. “factors()”—assuming that’s straight from JCIP, what on God’s green earth were they thinking?)

    And you really, really need to fix your Comment text box ;-)

  7. Sean CierSeptember 11, 2008 @ 07:07 PM

    Oh, to clarify—wasn’t referring to immutability as a pig, I was referring to the state of high-level concurrency support in current popular languages. Although that state does seem to be pretty nearly immutable, based on the last 30+ years of progress… yep, gotta love threads and locks.

    (Also meant to mention, with regard to data-parallel operations—perhaps Fortress will have a place in Java.next too, eh?)

  8. Muness AlrubaieSeptember 11, 2008 @ 07:53 PM

    Comment box better now? ;) Sorry about that everyone.

  9. Stuart HallowaySeptember 11, 2008 @ 08:14 PM

    @David: Are the problems you describe tactical (poorly implemented individual structures) or strategic (language flaws).

    @Anon: I did in fact write the transactional version. See the ” organizing multiple dosyncs and ref-set vs. commute” thread in the Clojure google group.

    @Daniel: Actors, agents, and STM coming in the next installment(s). Clearly I am failing to create big enough “Coming Soon” markers in these posts. :-)

    @js: Yes, if you created an immutable wrapper class. But it is far from idiomatic to write immutable Java. Idiomatic usage often matters more than what is strictly possible.

    @Brian: Sorry. We have shared state (stylesheets) across multiple web apps, including this blog. Changes got out of sync somehow. Hope you will appreciate the irony.

    @Sean: I hope that Clojure’s STM turns out to be practical now.

    @Muness: Thanks.

  10. Daniel SpiewakSeptember 12, 2008 @ 01:34 AM

    I hesitate to speak for David (especially since I haven’t closely examined Scala’s standard collection implementations as closely) but I think I can answer your question…

    Scala’s immutable collections are generally implemeted using mutable versions under the surface and then locking to ensure thread-safety. This certainly isn’t the case for every collection (the RedBlack tree implementations are correct), but it does hold for important ones like HashMap. This design is definately a product of poor structure implementation rather than language design. David has created a number of purely-immutable data structures for inclusion in v2.7.2; and I have also experimented somewhat in this field, porting Clojure’s vector to Scala. It is certainly possible to write complex persistent (thus, efficient) immutable data structures in Scala, it’s just very hard. :-) Unfortunately, this is pretty much true for any language.

  11. Howard LovattSeptember 12, 2008 @ 03:06 AM

    My pet project is a pattern enforcing compiler (PEC):

    https://pec.dev.java.net/nonav/compile/index.html

    One of the pre-supplied patterns is Immutable. There is also a discussion about converting from mutable to immutable and vice versa:

    http://pec.dev.java.net/nonav/compile/javadoc/pec/compile/immutable/ImmutableValueConversions.html

  12. Paul KingSeptember 12, 2008 @ 11:57 AM

    There is no doubting that Clojure has some nice features for Concurrency. Just on the comments about Groovy. Groovy does give you the .asImmutable() operator which lets you do a similar solution to the Clojure map-based one fairly easily:

    def hits = new AtomicLong(0) def cache = new AtomicReference<map>([:].asImmutable()) def cacheHits = new AtomicLong(0)

    def factor = { n -> hits.incrementAndGet() def cached = cache.get() if (cached[n]) cacheHits.incrementAndGet() else { cached = [(n):factorize(n)] cache.set(cached.asImmutable()) } cached[n] }

    And it would be relatively easy to write an AST Macro that would let you use @Immutable as an annotation that would give you a very nice solution similar to the Scala one. It would look something like this:

    @Immutable class Cache { BigInteger n BigInteger[] factors }

    def hits = new AtomicLong(0) def cache = new AtomicReference<cache>(new Cache(0, [])) def cacheHits = new AtomicLong(0)

    def factor = { n -> hits.incrementAndGet() def cached = cache.get() if (cached.n == n) cacheHits.incrementAndGet() else { cached = new Cache(n, factorize(n)) cache.set(cached) } cached.factors }

  13. James IrySeptember 12, 2008 @ 04:40 PM

    Stuart,

    You ask if Scala’s issues with immutable structures are tactical or strategic. That’s a deeper question than you might realize, with lots of angles. From a language perspective you can say it’s a tactical issue. Scala makes defining immutable structures almost as easy (or hard) as in say Haskell (just more verbose).

    But there’s a strategic issue as well, one that transcends languages. For many data structures there is a limit to performance of immutable versions. Rich Hickey has a terrific implementation of an immutable vector based on Patricia tries. It’s pretty damn fast to fetch an element from it. Daniel Spiewak’s is also pretty damn fast for fetching. But they’re both still a heck of a lot slower than just wrapping a Java array in an immutable interface and fetching from there. If you do that, immutable persistent updates can be made to the latest version by keeping a record of changed fields in older versions and directly modifying the current array inside. Haskell’s DiffArray is based on that. The result is something that is externally immutable but defined internally based on mutation and thus not thread-safe without synchronization.

    I agree with David MacIver that Scala’s standard immutable structures should be internally immutable. For most use cases that’s the right way to go and lack of thread safety or silent synchronization on an immutable structure is surprising without big red warning lights. But I think there’s room for another portion of the library with internally mutated structures that give both some of the performance of imperative structures and many of the reasoning benefits of immutable structures modulo thread safety.

  14. KorepetycjeSeptember 13, 2008 @ 09:39 PM

    Thanks for sharing! I have to try jruby in my next project…

  15. Tom DaviesSeptember 14, 2008 @ 03:50 AM

    I’d like to see CAL/Open Quark (http://openquark.org) on your list of Java.next languages.

    Yes, it has effectively been orphaned by Business Objects, as far as I can tell, but it is a very good implementation of a lazy, pure and statically typed functional language for the JVM. Think ‘Haskell minus some syntactic sugar’.

    And on Safari your comments box is resizable.

  16. Daniel SpiewakSeptember 15, 2008 @ 04:25 AM

    With regards to performance, I’ve run some benchmarks against Rich’s persistent vector implementation (ported from Java to Scala) running against Scala’s ArrayBuffer, which is a fair approximation of Array performance: http://www.codecommit.com/blog/misc/vector_performance.txt (uploaded to avoid comment re-formatting).

  17. Paul KingSeptember 20, 2008 @ 05:27 AM

    There is now a built-in @Immutable annotation in Groovy (should be availabel in 1.6-beta2) which provides a nice compact way to create a class equivalent to a Scala class with only val entries.

  18. antonSeptember 25, 2008 @ 08:16 PM

    thank you for the series!

    i had some relevant posts around the same time on my blog, regarding immutability and concurrency (and actors!): http://blog.splitbody.com/2008/9/12/concurrency-part1 and http://blog.splitbody.com/2008/9/19/concurrency-part-2-actors

  19. vsevolodOctober 01, 2008 @ 09:15 AM

    a little correction: in the java factorizer example you left out lastNumber in private BigInteger ;