Java.next #4: Immutability

This is Part Four of a series of articles on Java.next. In Part Four, I will begin to explore how the Java.next languages (JRuby, Groovy, Clojure, and Scala) deal with concurrency. Concurrency is a big topic, so I will subdivide it, narrowing my focus in this part to how the Java.next languages support immutability.

Why concurrency?

Over the last decade, most programmers have written code with little concern for concurrency. Often this has caused problems later, when programs needed to be used in a more concurrent setting. This is only going to get worse: in the future, everything is concurrent.

Why don't programmers do a better job with concurrency? Because it is hard. Read the excellent Java Concurrency in Practice (JCIP), and you will come away impressed with all the clever work that has gone into making Java a good environment for writing concurrent applications. But may also come away thinking "Who is smart enough to get this kind of code right in production?"

Not many people can, and the experts agree that we need an entirely different paradigm for writing concurrent applications. It may even be the case that supporting concurrency is the number one priority for Java.next.

Why immutability?

One of the most difficult things about writing concurrent programs is deciding how to protect mutable shared state. Java provides a locking primitive, synchronized, but locking primitives are difficult to use. You have to worry about

data corruption (if you lock too little)
deadlock (if you lock too much)
performance degradation (even if you get it right)

To make matters worse, it is generally impractical to write automated tests that explore the various possible interleavings of events.

Immutability solves all these problems of locking, by removing the need for ever locking at all. Immutable state can always be shared safely. Nobody can ever see a corrupt value by definition, as values never change.

All of the Java.next languages support immutability to some degree, as I will show below.

Immutability in JRuby

JCIP uses a ThreeStooges class as an extremely simple example for creating immutable objects in Java. Continuing the theme of that example, here is JRuby's immutable stooges:

  require 'set'

  class ThreeStooges
    def initialize(*names)
      @stooges = Set.new(names)
      @stooges.freeze
      self.freeze
    end
    def stooge?(name)
      @stooges.member?(name)
    end
  end

In Ruby, you can set immutability on a per-instance basis, as opposed to a per-class basis. A particular object can become immutable by calling freeze. If the internal components of that object are not primitive, they will need to freeze as well. So, in the constructor above, I freeze the two things at risk of changing: stooges and self.

While immutability is possible in Ruby, be warned. Heavy use of freeze to enable safe concurrency is far from idiomatic.

Immutability in Groovy

In Groovy, the ThreeStooges class looks and works like Java, but prettier:

  class ThreeStooges { 
    private final Set<String> stooges;
    ThreeStooges(String... args) {
      stooges = args as HashSet<String>;
    }
    boolean isStooge(String name) {
      return stooges.contains(name); 
    }
  }

The important details here are that the internal stooges collection is

final (cannot be changed after the object is created)
private (cannot be accessed outside the object itself)

I could also wrap stooges with unmodifiableSet, but since the ThreeStooges class already wraps stooges without exposing a modifying method, the second wrap is redundant.

Immutability in Scala

In JRuby and Groovy, immutability is possible. In Scala, immutability is preferred. The default collection classes are immutable, so ThreeStooges is just

  class ThreeStooges(names: String*) {
    private val stooges = Set() ++ names
    def isStooge(name: String) = stooges(name)
  }

A few observations here:

In Scala, fields must be declared val (immutable) or var (changeable). Scala style encourages you to use val where possible.
Scala's Set is immutable.

I find that Scala's inline constructor syntax and braces-free method definition syntax make the Scala definition easier to read than the JRuby or Groovy versions.

Immutability in Clojure

In Clojure, immutability is the default. In fact, everything is immutable, with only two exceptions:

calling down into Java APIs (the Clojure version of assembly language)
a few data structures which are specifically designed to support Software Transactional Memory (more on this in a later post)

So, the three-stooges can just be a set:

  (defn three-stooges [& names] (set names))
  (defn is-stooge? [name stooges] (stooges name))

A few observations:

[& names] is Clojure's varargs syntax.
A set can be placed in function position (first item in a list). So (stooges name) can be read as "find name in the set stooges"

Putting immutables to work

To see how immutables can simplify an implementation, consider the Factorizer servlet example from JCIP. The objectives of this example are to

calculate the prime factorization of a number
cache the most recent factorization
use the cached answer where possible, and track the frequency of cache hits
track the total number of factorizations

Here is a Java version of the factorizer using synchronized blocks, and stripped down to bare essentials for clarity:

  import java.math.BigInteger;

  public class CachedFactorizer {
      private BigInteger ;
      private BigInteger[] lastFactors;
      private long hits;
      private long cacheHits;

      public void factors(BigInteger i) {
          BigInteger[] factors = null;
          synchronized (this) {
              ++hits;
              if (i.equals(lastNumber)) {
                  ++cacheHits;
                  factors = lastFactors.clone();
              }
          }
          if (factors == null) {
              factors = factor(i);
              synchronized (this) {
                  lastNumber = i;
                  lastFactors = factors.clone();
              }
          }
          return factors;
      }
  }

In order for the CachedFactorizer to be correct and efficient, you need to think carefully about where the synchronized blocks should go. In the example above, there are two synchronized blocks. The first block checks the cache, and updates the counters. It protects:

read and write of hits
read and write of cacheHits
read of lastFactors

The second block updates the cache. It protects:

write of lastNumber
write of lastFactors

To make sure you understand the approach, ask yourself the following questions:

Could the first synchronized block be split into multiple, more granular locks? Would this likely be faster or slower?
What about the second block?
What operations are unprotected by locks? Is this safe?
Would a simple, single-lock approach be correct? Under what circumstances would it perform well?

That's a lot to think about. Now, let's consider the same example, but using immutable data structures.

A Clojure cached-factor

Here is one Clojure approach to cached-factor:

  (def #^AtomicLong hits (AtomicLong. 0))
  (def #^AtomicReference cache (AtomicReference. {:n nil :factors nil}))
  (def #^AtomicLong cache-hits (AtomicLong. 0))

  (defn cached-factor [n]
   (.incrementAndGet hits)
   (let [cached (.get cache)]
     (if (= n (cached :n))
       (do (.incrementAndGet cache-hits)
         (cached :factors))
       (let [factors (factor n)]
         (.set cache {:n n :factors factors})
         factors))))

There are several things to notice here:

The hits, cache, and cache-hits take advantage of Java 5's atomic wrapper classes. (The #^ClassName syntax adds type information.)
There are no synchronized blocks anywhere.
Even though there are no synchronized blocks, you still have to think about the semantics of concurrency. The incrementAndGet method is used to update the two hit counters, and the cache is pulled into a local variable to avoid inconsistent reads.

The real key to this approach, however, is storing a Clojure map in an AtomicReference. Because Clojure data structures are immutable, they can benefit from an AtomicReference in a way that mutable classes cannot.

The immutable approach looks only a little simpler in this small example. But the benefit of using immutable data increases when you compose objects together. If you wanted to compose the synchronized version with another object, you would have have to dig back into the internals of both objects, study their locks, and pick a new lock strategy to cover the two objects together. Composing operations with immutable objects is much easier.

A Scala CachedFactor

Since Scala's default collections are immutable, a Scala approach can closely parallel the Clojure code above:

  class CachedFactorizer {
    case class Cache(val n: Int, val factors: List[Int])
    val hits = new AtomicLong()
    val cache = new AtomicReference[Cache](Cache(2, calculateFactors(2)))
    val cacheHits = new AtomicLong()

    def factor(n: Int) = {
      hits.incrementAndGet
      val cached = cache.get
      if (cached.n == n) {
        cacheHits.incrementAndGet
        cached.factors
      } else {
        val factors = calculateFactors(n)
        cache.set(Cache(n, factors))
        factors
      }
    }

While this is similar to the Clojure approach, the difference is instructive. While the Clojure version stores the cache as a simple Map, the Scala version introduces a strongly-typed Cache class for the cache of values and their factors. The differences here are intended to idiomatic. Either approach could work in either language.

What about the JRuby and Groovy examples?

Could you write the example above in Groovy, JRuby, or even Java? Yes, but it would be non-idiomatic, even ugly. I am not going to show the JRuby and Groovy versions here, because those languages do not offer any concurrency-specific advantages over Java. Scala and Clojure, on the other hand, don't just make immutable objects possible. They make them easy and idiomatic.

Languages are designed to support certain priorities, inevitably at the expense of others. By making immutability a preferred option (Scala) or the standard (Clojure), these languages are encouraging a different paradigm for concurrent applications.

Languages are not about what they make possible, but about what they make beautiful. Clojure and Scala aim to make concurrent programs beautiful, and their preference for immutability is but the tip of the iceberg. In the next installment of this series, I will explore how Clojure and Scala enable concurrent programs through actors, agents, and software transactional memory.

Notes

This article is based on the talk Java.next #4: Concurrency . Check the schedule for a talk near you.
Thanks to Greg Vaughn and Glenn Vanderburg for their feedback on a draft of this article.
Thanks to Rich Hickey for suggesting several Clojure versions of the factorizer example.
Feedback on how to improve these examples is most welcome!