Picture of stu

PCL -> Clojure, Chapter 17

  • Posted By Stuart Halloway on September 25, 2008
  • Tags

This article is part of a series describing a port of the samples from Practical Common Lisp (PCL) to Clojure. You will probably want to read the intro first.

This article covers Chapter 17, Object Reorientation: Classes.

Creating structs

Common Lisp defines classed with defclass. In Clojure, I can define structs with defstruct:

(defstruct bank-account :customer-name :balance)

The bank-account struct has two basis keys: customer-name and balance. I can specify values for these keys, in the order they were declared, using struct:

user=> (struct bank-account "John Doe" 1000)
{:customer-name "John Doe", :balance 1000}

With struct, all the basis keys are optional:

user=> (struct bank-account)
{:customer-name nil, :balance nil}

If you prefer named parameters, you can use struct-map instead of struct:

user=> (struct-map bank-account :balance 10)
{:customer-name nil, :balance 10}

Very important: structs are still maps. I can specify additional keys that are not part of the basis:

user=> (struct-map bank-account  :balance 10
                                   :customer-name "Jane Doe"
                                   :status :gold)
{:customer-name "Jane Doe", :balance 10, :status :gold}

Accessing structs

The examples below assume an example-account:

(def example-account (struct bank-account "Example Customer" 1000))

Pedants call get to access a structure value:

user=> (get example-account :customer-name)
"Example Customer"

But that's way too much effort. Structures are functions of their keys:

user=> (example-account :customer-name)
"Example Customer"

If the struct keys are symbols, I can go the other way. Symbols are functions of structs:

user=> (:customer-name example-account)
"Example Customer"

Other than symbols, what else can be a structure key? Ah, sweet immutability. Since Clojure data structures are immutable, any of them can function as keys.

I can use assoc and dissoc to get a new map with a key added or removed:

user=> (assoc example-account :status :elite)
{:customer-name "Example Customer", :balance 1000, :status :elite}

user=> (dissoc {:a 1 :b 2} :a)
{:b 2}

But I can't dissoc from example-account because you can never remove a basis key:

user=> (dissoc example-account :customer-name)
java.lang.Exception: Can't remove struct key

Defaults and validation

Since structs are also maps, default values are easy: just merge them. The example below doesn't even use a struct. (Often duck typing is good enough.)

(def account-defaults {:balance 0})
(defn create-account [options]
  (merge account-defaults options))

If I want to validate fields, I can just write a validation function. Here is a validation that simply requires non-false values:

(defn validate-account [account]
  (or (every? account [:customer-name :balance])
      (throw (IllegalArgumentException. "Not a valid account"))))

Of course, if I wanted to create tons of different structs with similar validations, I could build some helpers. Macros + metadata would be one way to go.

Wrapping up

Clojure's structs fill some of the same roles as Common Lisp's classes. The exmaples above show how to create and access structs, and how to add default values and validation.

That said, Clojure's structs are not classes. They do not offer inheritance, polymorphism, etc. In Clojure, those kinds of jobs are handled by the incredibly flexible defmulti (see the previous article for details, especially the references at the end).

Notes

Picture of stu

PCL -> Clojure, Chapter 16

  • Posted By Stuart Halloway on September 25, 2008
  • Tags

This article is part of a series describing a port of the samples from Practical Common Lisp (PCL) to Clojure. You will probably want to read the intro first.

This article covers Chapter 16, Object Reorientation: Generic Functions.

defmulti

In Common Lisp, a generic function defines an abstract operation and a parameter list. In Clojure, a multimethod takes a similar role:

(defmulti draw :shape)

The multimethod's name is multi, and :shape is a dispatch function used to select the actual concrete implementation. (Remember that keywords like :shape are also lookup functions.) Now, I can create one or more methods:

(defmethod draw :square [shape] "TBD: draw a sqaure")
(defmethod draw :circle [shape] "TBD: draw a circle")

The first method will draw things with a :shape of :square, and the second method will draw things with a :shape of :circle:

user=> (draw {:shape :square, :length 10})
"TBD: draw a square"
user=> (draw {:shape :circle, :radius 8})
"TBD: draw a circle"  

The draw multimethod is emulating single inheritance, if you think of an object's :shape value as its type. But the multimethod mechanism is more general.

A more complete example

Let's say that I need to implement account withdrawals. Different kinds of accounts will have different rules:

  • Bank accounts are simple accounts. Withdrawals will work if there is enough money available.
  • Checking accounts attach an overdraft account which can be used to cover large withdrawals.

The multimethod for withdraw could look like this:

(defmulti withdraw :account-type)

The bank account implementation will do a simple withdraw.

(defmethod withdraw :bank [account amount]
  (raw-withdraw account amount))

PCL uses Common Lisp's method combination to share implementation code between the different account types. Clojure's dispatch is much more general, so a general method combination mechanism is not appropriate. I am taking a different approach, pulling the shared code into a helper function raw-withdraw:

(defn raw-withdraw [account amount]
  (when (< (:balance account) amount)
    (throw (IllegalArgumentException. "Account overdrawn")))
  (assoc account :balance (- (:balance account) amount)))

The withdrawal differs from the original PCL implementation in one other way. The original code mutated the account. Since mutation is a no-no, I am instead returning a new account object, associng in the changed balance. In the example below, I am using a let just to show that the original account is unchanged.

(let [original-state {:account-type :bank :balance 100}
      updated-state (withdraw original-state 50)]
  (println original-state updated-state)) 

{:balance 100, :account-type :bank} {:balance 50, :account-type :bank}

The checking account is a little more complex. First, I have to shuttle money in from the overdraft account (if necessary), then raw-withdraw as before:

(defmethod withdraw :checking [account amount]
  (let [over-account (account :overdraft-account)
    over-amount (- amount (:balance account))
    withdrawal-account 
    (if (> overdraft 0)
      (merge account
         {:overdraft-account (withdraw over-account over-amount)
          :balance amount})
      account)]
    (raw-withdraw withdrawal-account amount)))

Again, all the objects are immutable. The merge function returns a new account object (possibly with an overdraft), and the raw-withdraw returns another object:

(let [overdraft {:account-type :checking, :balance 1000}
      original-state {:account-type :checking
              :balance 100
              :overdraft-account overdraft}
      updated-state (withdraw original-state 500)]
  (println original-state)
  (println updated-state))

{:overdraft-account {:balance 1000, :account-type :checking}, 
 :balance 100, 
 :account-type :checking}
{:overdraft-account {:balance 600, :account-type :checking}, 
 :balance 0, 
 :account-type :checking}

Dispatching on more than one parameter

In languages like Java, methods are polymorphic on their first (implicit) parameter. Because multimethods dispatch on arbitrary functions, they can be polymorphic on all of their parameters.

For example, a music library might implement a beat method that is polymorphic on both the drum and the stick:

(defmulti beat (fn [d s] [(:drum d)(:stick s)]))
(defmethod beat [:snare-drum :brush] [drum stick] "snare drum and brush")
(defmethod beat [:snare-drum :soft-mallet] [drum stick] "snare drum and soft mallet")

The first beat method matches only snare drum + brush, etc.:

user=> (beat {:drum :snare-drum} {:stick :brush})
"snare drum and brush"
user=> (beat {:drum :snare-drum} {:stick :soft-mallet})
"snare drum and soft mallet"

If no methods match the dispatch value, Clojure throws an exception:

user=> (beat {:drum :bongo} {:stick :none})
java.lang.IllegalArgumentException: No method for dispatch value
... stack trace elided ...

Or, you can define a :default that will match if no other dispatch value matches:

(defmethod beat :default [drum stick] "default value, if you want one")

user=> (beat {:drum :bongo} {:stick :none})
  "default value, if you want one"

Wrapping up

The PCL chapter demonstrates dispatch based on one or more arguments to a function, and those examples are duplicated above. There are many other things you might do with defmulti, but since they are not covered in PCL I will declare them out of scope here, and point you to some other reading:

  • Clojure objects have metadata, so you could dispatch based on metadata values instead of data values. See mac's post on the mailing list for an example.
  • Dispatch can be based on the state of an object, rather than on some kind of type tag. This lets you treat a rectangle with equal width and height as a square, even if it was created as a rectangle. See my article on dispatch in the Java.next series for an example.
  • Clojure's defmulti allows you to create multiple taxonomies dynamically, and trivially dispatch based on isa relationships in a taxonomy. See Rich's mailing list post introducing this feature.

Notes

Picture of stu

Java.next Overview

As we reach the middle of our second decade of Java experience, the community has learned a lot about software development. Many of our best ideas on how to use a Java Virtual Machine (JVM) are now being baked into more advanced languages for the JVM. These languages tend to provide two significant advantages:

  • They reduce the amount of ceremony in your code, allowing you to focus on the essence of the problem you are solving.
  • They enable some degree of functional programming style. Think of it as a dash of verb-oriented programming to spice up your noun-oriented programming.

I have picked four "Java.next" languages to demonstrate these concepts: Clojure, Groovy, JRuby, and Scala. I have written a series of articles and conference talks describing how these languages can make teams more productive.

This page is the top-level table of contents for Java.next, and I will update the links below as new articles and talks become available.

Articles on Java.next

Conference talks on Java.next

Seeing a talk

If you are interested in hearing me speak on Java.next, check the event schedule, or contact Relevance (info@thinkrelevance.com) to schedule an event near you.

Picture of stu

PCL -> Clojure, Chapter 9

  • Posted By Stuart Halloway on September 24, 2008
  • Tags

This article is part of a series describing a port of the samples from Practical Common Lisp (PCL) to Clojure. You will probably want to read the intro first.

This article covers Chapter 9, Practical: Building a Unit Test Framework.

Tests and reports

To build a minimal testing library, I need nothing more than tests and results. To keep reporting as simple as possible, I will start with console output. The report-result function tests a result, and prints pass or FAIL, plus a form with supporting detail:

(defn report-result [result form]
  (println (format "%s: %s" (if result "pass" "FAIL") (pr-str form))))

Now any function can be a test. The detail message can often be the same form that caused the error, so I will pass the same form twice: once for evaluation, and again (quoted!) for use in the detail message:

(defn test-+ []
  (report-result (= (+ 1 2) 3) '(= (+ 1 2) 3))
  (report-result (= (+ 1 2 3) 6) '(= (+ 1 2 3) 6))
  (report-result (= (+ -1 -3) -4) '(= (+ -1 -3) -4)))

The console output for test-+ looks like this:

user=> (test-+)
pass: (= (+ 1 2) 3)
FAIL: (= (+ 1 2 3) 7)
pass: (= (+ -1 -3) -4)

Inferring the detail message

The fact that I want to pass the same form twice, but with different evaluation semantics, just screams macro. Sure enough, I can clean up the code with a macro:

(defmacro check [form]
  `(report-result ~form '~form))

The macro expands the form twice, once for evaluation and once quoted for the detail message. Now I can replace calls to report-result with simpler calls to check:

(defn test-* []
  (check (= (* 1 2) 3))
  (check (= (* 1 2 3) 6))
  (check (= (* -1 -3) -4)))

Hmm. The calls to check are cleaner than the calls to report-result in the earlier example, but the check itself still looks repetitive. Solution: a better check macro that can handle multiple forms:

(defmacro check [& forms]
  `(do
     ~@(map (fn [f] `(report-result ~f '~f))  forms)))

The quoting and unquoting is a little more complex--play around with macroexpand-1 to see how it works.

With the better check in place, test functions are quite simple:

(defn test-rem []
  (check (= (rem 10 3) 1)
     (= (rem 6 2) 0)
     (= (rem 7 4) 3)))

Aggregating results

So far I have tests and console output. Next, I need some way to aggregate a set of checks into a single, top-level "checks passed" or "checks failed".

I would like to simply and together all the individual checks, but that does not quite work. As in many languages, Clojure's and short-circuits and stops evaluating when it encounters a logical false. That's no good here: Even if one test fails, I still want all the tests to run.

Since it is a question of optional evaluation, a macro is appropriate. The combine-results macro works like and, but it always evaluates all the forms:

(defmacro combine-results [& forms]
  `(every? identity (list ~@forms)))

Now check can use combine-results instead of do.

(defmacro check [& forms]
  `(combine-results
    ~@(map (fn [f] `(report-result ~f '~f)) forms)))

All existing functionality still works, and now I can see a useful return value from a test.

user=> (test-*)
pass: (= (* 2 4) 8)
pass: (= (* 3 3) 9)
true

Capturing test names

Tests ought to have names. In fact, tests ought to support multiple names. You can imagine a test detail report saying:

Check math->addition->associative passed: ...

Where associative is the name of a check, addition is the name of a function, and math is the name of another function that called addition.

First, I need a variable to store a sequence of names:

(def *test-name* [])

Printing the variable as part of a result is easy:

(defn report-result [result form]
  (println (format "%s: %s %s" 
           (if result "pass" "fail") 
           (pr-str *test-name*) 
           (pr-str form)))
  result)

Now for the hard part: populating the collection of names. For this, I will introduce a deftest macro:

(defmacro deftest [name & forms]
  `(defn ~name []
     (binding [*test-name* (conj *test-name* (str '~name))]
       ~@forms)))

The macro expansion perfomed by deftest is nothing new: deftest turns around and defns a new function named name. The interesting part is the call to binding, which rebinds *test-name* to a new collection built from the old *test-name* plus the name of the current test.

The new binding of *test-name* is visible anywhere inside the dynamic scope of the binding form. The dynamic scope includes any function calls made inside the binding, and their function calls, and so on ad infinitum ... or until another binding performs the same trick again. This gives exactly the semantics we want:

  • The dynamic scope allows callers to influence callees without having to pass test-name an an argument all over the place. Nested functions "remember" a stack of their caller's names through *test-name*.
  • The unwinding of the dynamic scope protects readers of *test-name* outside a binding. Code after the binding will never see the values *test-name* takes during the binding.
  • Dynamic bindings are thread-local (and therefore thread-safe).

With deftest in place, I can defined a hierarchy of nested tests:

(deftest test-*
  (check (= (* 2 4) 8)
     (= (* 3 3) 9)))

(deftest test-math
  ; TODO: test rest of math
  (test-*))

(deftest test-all-of-nature
  ; TODO: test rest of nature
  (test-math))

Calling test-all-of-nature will demonstrate multiple levels of nested name in a test report:

user=> (test-all-of-nature)
pass: ["test-all-of-nature" "test-math" "test-*"] (= (* 2 4) 8)
pass: ["test-all-of-nature" "test-math" "test-*"] (= (* 3 3) 9)
true                         

From here, better formatting of the console message is just mopping up.

Wrapping up

When I first read Practical Common Lisp, this was my favorite chapter. The testing library evolves quickly and naturally to a substantial feature set. (In case you didn't keep count, the entire "framework" is less than twenty lines of code.)

Try implementing the unit-testing example in your language of choice. Don't just implement the finished design. Work through each of the iterations described above:

  1. tests and results
  2. inferring the detail message
  3. aggregating results
  4. capturing test names

I would love to hear about your results, and I will link to them here.

Notes

Picture of stu

PCL -> Clojure, Chapter 8

  • Posted By Stuart Halloway on September 23, 2008
  • Tags

This article is part of a series describing a port of the samples from Practical Common Lisp (PCL) to Clojure. You will probably want to read the intro first.

This article covers Chapter 8, Macros: Defining Your Own.

Rolling your own

Lisp macros gain their power by controlling argument evaluation. In a normal Lisp function all arguments are evaluated when calling a function. Consider this call to function foo:

(foo a b)

Arguments a and b are evaluated, and then passed to function foo. If foo were a macro, however, all bets would be off. Then foo's arguments might be evaluated in bizarre orders, or not at all.

This may seem a little crazy until you consider a simple if:

(if monday (wake-up) (sleep))

if cannot possibly be a normal Lisp function. If it were, you would always evaluate both wake-up and sleep, regardless of the value of monday.

As the if example suggests, control flow is an obvious use case for macros. PCL demonstrates custom macros by defining a new control flow macro named do-primes.

Preparing for do-primes

In order to implement do-primes, I will need a primeness test. For clarity, I will divide this into two functions. First, a simple helper to detect factors.

(defn divides? [candidate-divisor dividend]
  (zero? (rem dividend candidate-divisor)))

Now I can tell when one number divides another:

user=> (divides? 7 42)
true
user=> (divides? 11 42)
false

A prime is simply a number with no divisors greater than one. I am a busy guy, so I won't check all the natural numbers, only those from two up to the square root of the number being tested. Here is a simple primeness test:

; yes, I know there are faster ways.  
(defn prime? [num]
  (when (> num 1)
    (every? (fn [x] (not (divides? x num)))
        (range 2 (inc (int (Math/sqrt num)))))))

Sequences of primes

My eventual objective is to call do-primes like this:

(do-primes i 100 200 
  (print (format "%d " i)))

where i is the loop variable and runs the primes from 100 to 200. Because Clojure has nice support for infinite sequences, I find it easier to begin by thinking in terms of the pure math. So, here is a function that returns the sequence of primes starting from a number:

(defn primes-from [number]
  (filter prime (iterate inc number)))

(iterate inc number) returns an infinite sequence starting with number and then incrementing by one for each subsequent element. The filter then whittles this down to numbers that are prime.

This sequence is infinite, so don't try to view it from the console. Take your primes a few at the time:

user=> (take 5 (primes-from 1000))
(1009 1013 1019 1021 1031)

Now I need a simple helper that begins with primes-from, but cuts off the sequence at a chosen end:

(defn primes-in-range [start end]
  (for [x (primes-from start) :while (<= x end)] x))

The for is a list comprehension. It takes all the (primes-from start), but only while those numbers are still less than or equal to end.

do-primes

Now I am finally ready to write the macro do-primes:

(defmacro do-primes [var start end & body]
  `(doseq ~var (primes-in-range ~start ~end) ~@body))

Macros work in two steps: expansion followed by normal Lisp evaluation. The expansion phase is like a template substitution, but with the full power of Lisp at your disposal.

In the definition of do-primes above, the syntax-quote (`) identifies the static part of the template:

  • For symbols, syntax-quote resolves the name to a fully qualified symbol (with some exceptions we don't need to worry about in this example).
  • For lists, syntax-quote will recursively syntax-quote the contained forms.

The unquote (~) and splicing-unquote (~@) provide the dynamic part of the template by exempting their forms from syntax quoting rules.

Your reaction at this point should be "That's a lot of ugly punctuation." Fear not, macroexpand-1 will ease the pain. macroexpand-1 will show you how Clojure expands the macro, without executing the expanded result. This gives you a chance to experiment with the rules for quoting and unquoting. Here is an example:

user=> (macroexpand-1 '(do-primes i 1 10 (print i)))
(clojure/doseq i (pcl.chap_08/primes-in-range 1 10) (print i))

Looking back at the definition of do-primes, here is what happened:

  • doseq expanded to the fully-qualified clojure/doseq. (I haven't covered namespaces yet, but the clojure namespace contains most of the Clojure core.)
  • i, 1, and 10 are direct expansions from the macro call.
  • primes-in-range is one of the helper functions I wrote ealier. In the sample repository, I have placed this in the pcl/chap_08 namespace, hence the expansion.
  • body contains a list of things I want to do with my primes, specifically ((print i)). That is almost what I need, except a few too many parens. The "splice" part of splicing unquote gets rid of the extra parens, splicing the list into the template. This is exactly what I need to match the doseq signature.

Now I can do-primes:

user=> (do-primes i 100 150 
  (print (format "%d " i)))
101 103 107 109 113 127 131 137 139 149

Wrapping up

The easiest way to write a macro is to work backwards. Write the form that you want the macro to expand into, and then test interactively with macroexpand-1 until you have a macro that expends correctly.

Macros are hard, and I have skipped some of the building blocks here. Check out the chapter in PCL.

Notes

The sample code is available at http://github.com/stuarthalloway/practical-cl-clojure.

Picture of stu

PCL -> Clojure, Chapter 11

  • Posted By Stuart Halloway on September 22, 2008
  • Tags

This article is part of a series describing a port of the samples from Practical Common Lisp (PCL) to Clojure. You will probably want to read the intro first.

This article covers Chapter 11, Collections.

Sequence basics

PCL describes a group of basic collection functions: count, find, position, remove, and substitute. Clojure supports count for a variety of list-like types:

user=> (count (quote (1 2 3)))
3
user=> (count [1 2 3])
3
user=> (count #{1 2 3})
3
user=> (count "characters")
10                          

These types, and any others than implement a basic first/rest protocol, are called sequences in Clojure. A sequence is logically a list, but may be implemented using other data structures.

In addition to generic sequence functions, some sequences have specific functions unique to their underlying data structure. Clojure defines find for maps to return the matching key/value pair:

user=> (find {:lname "Doe", :fname "John"} :fname)                   
[:fname "John"]

Or, you could just place the map itself in function position, and get back the matching value for a key:

user=> ({:lname "Doe", :fname "John"} :fname)
"John"

The Clojure core does not define find for other collection types. But the implementation is a one-liner using some. For example, to ask if a collection contains the number 2:

user=> (some #(= % 2) [1 2 3])
true

Clojure-contrib wraps the some idiom into a function named includes?.

The rest of the "basic" functions have similar stories: The Clojure core tends to support them directly where they are efficient (constant time) operations. Where they would take longer (e.g. linear time), the operations can be written as one-liners atop higher-order functions.

Higher-order functions

CL includes higher order versions of the basic functions described above. These higher-order versions take an additional parameter, which is a function that acts as a filter. Here are some examples.

First, a collection of days for the examples to work against:

; for re-split
(use 'clojure.contrib.str-utils)
(def days (re-split #" " "Sun Mon Tues Wed Thurs Fri Sat"))

Now I can find the weekdays that start with "S":

user=> (filter #(.startsWith % "S") days)
("Sun" "Sat")

Or simply count the days that start with "S":

user=> (count (filter #(.startsWith % "S") days)) 
2

In an immutable world, remove is the opposite of find. I can get a collection with all "S" days removed by reversing the previous filter with complement:

user=> (filter (complement #(.startsWith % "S") days))
("Sun" "Sat")

To replace all "S" days with "Weekend!" I can use map:

user=> (map #(if (.startsWith % "S") "Weekend!" %) days)
("Weekend!" "Mon" "Tues" "Wed" "Thurs" "Fri" "Weekend!")

Sorting

Sorting is easy:

user=> (sort days)
("Fri" "Mon" "Sat" "Sun" "Thurs" "Tues" "Wed")

Sorting by criteria is also easy:

user=> (sort-by #(.length %) days)
("Sun" "Mon" "Wed" "Fri" "Sat" "Tues" "Thurs")

Combining sequences

The concat function concatenates sequences.

user=> (concat [1 2 3] [4 5 6])
(1 2 3 4 5 6)

Note that the resulting sequence is lazy. So, concat can return without walking each input sequence. In other words, the (take 5 ...) below does not have to wait (forever!) for all the powers of 2 to be generated:

user=> (take 5 (concat (quote (1/4 1/2)) powers-of-2))
(1/4 1/2 1 2 4)

What if one of the sequences passed to concat blows up instead of returning a sequence?

user=> (take 2 (concat '(1 2 3) (throw (Error. "Not a sequence"))))
java.lang.Error: Not a sequence

Here concat fails because its second argument is not a sequence. As it happens, I have an even lazier option than concat. The lazy-cat function does not even look at each argument until it is forced to do so:

user=> (take 2 (lazy-cat '(1 2 3) (throw (Error. "Not a sequence"))))
(1 2)

Lazy sequences have many uses, but take some getting used to. One mistake to avoid is trying to inspect a lazy infinite sequence from the REPL. The REPL tries to print the entire sequence, which will take forever (literally). Hence the (take 2 ...) wrappers above.

Subsequences

It is often interesting to take subsequences from the beginning, middle, or end of a collection. Clojure supports this in a general way with take and drop. You have already seen take, which returns the first part of a collection:

user=> (take 2 days)
("Sun" "Mon")                                      

For the end of a collection, I can use drop:

user=> (drop 2 days)
("Tues" "Wed" "Thurs" "Fri" "Sat")

For the middle of a collection, I can use take and drop together:

user=> (take 5 (drop 1 days))
("Mon" "Tues" "Wed" "Thurs" "Fri")

The take-nth function takes only every nth item of a collection. To demonstrate take-nth, I will begin by defining a lazy collection of the natural-numbers:

(def natural-numbers (iterate inc 1))

The call to iterate produces a collection that starts with 1 and generates subsequent members by calling inc. You can verify that these are the natural numbers by taking a few of them.

user=> (take 10 natural-numbers)
(1 2 3 4 5 6 7 8 9 10)

Now I can write an intuitive definition for the even and odd numbers in terms of the natural numbers:

(def odd-numbers (take-nth 2 natural-numbers))
(def even-numbers (take-nth 2 (drop 1 natural-numbers)))

Predicates

Clojure provides a number of functions that test boolean predicates, including every?, not-any?, and not-every?, and some. Here are a few examples, using the days collection defined above.

Does every day start with "S"?

user=>(every? #(.startsWith % "S") days)
false

Is there some day that starts with "M"?

user=>(some #(.startsWith % "M") days)
true

Map and reduce

map take a function and one or more sequences. It returns a new sequence which is the result of applying the function to the item(s) in each sequence. So, to take the product of numbers from two sequences:

user=> (map * '(1 2 3 4 5) '(10 9 8 7 6))
(10 18 24 28 30)

If I want to control the type of collection returned, I can use into:

user=> (into [] (map * '(1 2 3 4 5) '(10 9 8 7 6)))
[10 18 24 28 30]

reduce walks down a collection, applying function f of two arguments to the first two arguments, then applying f to the result of the first call and the next element. This is very useful for operations that process a sequence and return a single value. For example, I can sum a sequence:

user=> (reduce + [1 2 3 4 5])
15

Or find the max value of a sequence:

user=> (reduce max [1 2 3 4 5])
5

Maps

Maps (hash tables in CL) can be iterated just like any other sequence type, bearing in mind that the function you pass in should expect a key/value pair. Given the following map of names to scores:

user=> (def scores {:john 18, :jane 21, :jim 14})                   7
#'user/scores

I can find all the people who scored above 15:

user=> (filter (fn [[k,v]] (> v 15)) scores)
([:jane 21] [:john 18])

Notice how the destructuring bind ([[k,v]]) makes it easy to bind k and v separately, without introducing a temporary variable pair that I don't really need.

Wrapping up

Lisp excels at processing lists. Clojure offers similar capabilities, but generalized to sequences, which can be lists, vectors, maps, sets, or other list-like collections.

Clojure's support for lazy collections allows a different style for collection processing that I will continue to explore in later articles in this series.

Notes

Revision history

  • 2008/09/22: initial version
  • 2008/10/15: removed var-quoting per Douglas's comment. Thanks!
Picture of stu

PCL -> Clojure, Chapter 10

  • Posted By Stuart Halloway on September 19, 2008
  • Tags

This article is part of a series describing a port of the samples from Practical Common Lisp (PCL) to Clojure. You will probably want to read the intro first.

This article covers Chapter 10, Numbers, Characters, and Strings.

Lisp with Java guts

Because Clojure is Java under the covers, you can always use Java's support for numbers, characters, and strings. The Java interop syntax is clean and simple, so it is idiomatic to call Java directly, rather than write wrappers to make code look more Lisp-like. A few examples follow:

user=> (Math/pow 3 3)
27.0

user=> (.compareTo "a" "b")
-1

user=> (Character/toLowerCase \A)
\a

Numbers are numbers

In Clojure, as in most Lisps, numbers are numbers. They don't do irritating things like overflow:

user=> (* 1000000 1000000 1000000)
1000000000000000000

Also, integer division is exact:

user=> (/ 10 3)
10/3

Under the covers, Clojure's numeric representation switches Java types as necessary to do the right thing.

user=> (class (* 1000 1000))
java.lang.Integer
user=> (class (* 1000000 1000000))
java.lang.Long
user=> (class (* 1000000 1000000 1000000))
java.math.BigInteger

Study the API docs

Clojure does a good job of balancing the purity of math (Lisp) and the practical reality of efficient representation (Java's primitives). But you still have to know your way around. Some things are wrapped in Lisp, and some things aren't. For example, numbers support the mathematical comparison operators, but Strings use Java's compareTo:

For example:

user=> (< 1 2)
true
user=> (< "a" "b")
java.lang.ClassCastException: java.lang.String
user=> (.compareTo "a" "b")
-1
user=> (.compareTo 1 2)
-1

If you aren't sure whether there is a Lispy wrapper for some functionality, you can check the API docs, or just try it in the REPL.

Wrapping up

For numbers, characters, and strings, Clojure provides some of the trappings a Lisp programmer would expect, e.g. exact integer division. But under the covers, it's all Java. If you don't find what you need in the Clojure API, drop to Java using the interop syntax.

Notes

Picture of stu

PCL -> Clojure, Chapter 7

  • Posted By Stuart Halloway on September 18, 2008
  • Tags

This article is part of a series describing a port of the samples from Practical Common Lisp (PCL) to Clojure. You will probably want to read the intro first.

This article covers Chapter 7, Macros: Standard Control Constructs.

Rolling your own

Common Lisp control constructs are generally part of the standard library, not the core language. Ditto for Clojure. If you don't find a control construct you want, you can always roll it yourself. For example, Clojure doesn't have an unless, so here goes:


(defmacro unless [condition & body]
  `(when (not ~condition)
     ~@body))

defmacro differs from Common Lisp in two important ways.

  • The argument list is a vector [...], not a list (...). (This is true for functions as well, I just hadn't mentioned it yet). Clojure gives vectors, sets, and maps equal billing with lists by giving them their own literal syntax.
  • Clojure uses different reader macros for unquote and unquote-splicing. Where CL uses , and ,@, Clojure uses ~, and ~@.

The avoidance of commas in read macros is a well-considered decision. Commas are whitespace in Clojure. This often results in an interface that is simultaneously human-friendly and list-y. The following two expressions are equivalent:


{:fn "John" :ln "Doe"}
{:fn "John", :ln "Doe"}

The latter form makes the map more readable, and more similar to other languages.

doseq

Common Lisp provides dolist for iterating a list. Clojure works in terms of sequences, which are collections that can be traversed in a list-like way. The Clojure analog to dolist is doseq. It can work with lists:


user=> (doseq x '(1 2 3) (println x))
1
2
3

doseq also works with maps. Note the destructuring bind since I care only about the values:


user=> (doseq [_ v] {:fn "John" :ln "Doe"} (println v))
John
Doe

In fact, doseq works with any kind of sequence (hence the name). (iterate inc 1) produces an infinite collection incrementing up from 1. (take 5 ...) pulls a finite set of 5 elements from a collection.


user=> (doseq x (take 5 (iterate inc 1)) (println x)))
1
2
3
4
5

Don't try to doseq an infinite collection, and don't say I didn't warn you.

dotimes

Common Lisp provides dotimes for iteration with counting. Here is the Clojure version of PCL's multiplication table example:


user=>(dotimes x 10
        (dotimes y 10
          (print (format "%3d " (* (inc x) (inc y)))))
        (println))
  1   2   3   4   5   6   7   8   9  10 
  2   4   6   8  10  12  14  16  18  20 
  3   6   9  12  15  18  21  24  27  30 
  4   8  12  16  20  24  28  32  36  40 
  5  10  15  20  25  30  35  40  45  50 
  6  12  18  24  30  36  42  48  54  60 
  7  14  21  28  35  42  49  56  63  70 
  8  16  24  32  40  48  56  64  72  80 
  9  18  27  36  45  54  63  72  81  90 
 10  20  30  40  50  60  70  80  90 100 

There is one subtle difference worth mentioning. When you write macros in a Lisp, you have some control over how many parentheses get involved. Common Lisp tends to introduce parentheses around the "setup" part of the macro, e.g.


(dotimes (x 10) ...)

Clojure avoids these parentheses, e.g.


(dotimes x 10 ...)

I find Clojure's "avoid grouping parens" approach easier both to read and remember. YMMV.

CL do and loop

Common Lisp provides some more general control constructs: do and loop. Clojure's functions of the same name serve very different purposes. Clojure's do is equivalent to CL's progn, and Clojure's loop works with recur.

You could write Clojure macros to emulate CL's do and loop, but you probably won't want to. Instead, you can use list comprehensions or lazy sequences, which I will introduce later in this series.

Wrapping up

Like CL, Clojure defines control structures using macros. Also like CL, Clojure has control structures that are functional, plus some that are evaluated for their side effects. Clojure's control structures tend to use fewer parentheses.

Clojure does not duplicate CL's general purpose imperative control structures. Instead, you can often use list comprehensions and lazy sequences.

Notes

The sample code is available at http://github.com/stuarthalloway/practical-cl-clojure.

Picture of stu

PCL -> Clojure, Chapter 5

  • Posted By Stuart Halloway on September 16, 2008
  • Tags

This article is part of a series describing a port of the samples from Practical Common Lisp (PCL) to Clojure. You will probably want to read the intro first.

This article covers Chapter 5, Functions.

Passing arguments to functions

One of the cool things about Common Lisp is the variety of ways to pass arguments to functions. You can pass arguments

  • by position or by name
  • as a fixed or variable list
  • with preset defaults for some arguments

Clojure provides many of the same things, but the syntax is different. Here is a function that takes two required arguments, and two optional arguments.

(defn foo [a b & [c d]]
  (list a b c d))

Here are some examples of calling foo:

user=> (foo 1 2)
(1 2 nil nil)

user=> (foo 1 2 3 4)
(1 2 3 4)

You can also pass values by name, using a map literal. The function bar expects named arguments a and b. The :or clause specifies a default for b:

(defn bar [{:keys [a b] :or {b 10}}] (list a b))

Examples of calling bar:

user=> (bar {:a 1})
(1 10)

user=> (bar {:a 1 :b 2})
(1 2)

Another nicety is defining an optional parameter's default value in terms of another parameter. Here is a Clojure approach that defaults height to width when creating a rectangle:

(defn make-rectangle 
  ([width] (make-rectangle width width))
  ([width height] {:width width :height height}))

Usage:

user=> (make-rectangle 20)
{:width 20, :height 20}

user=> (make-rectangle 10 20)
{:width 10, :height 20}

Corner cases

Common Lisp also supports discovering whether a user specified a parameter. This comes in handy if you want to detect that the user re-specified the default for some parameter. I didn't find a built-in way to do this in Clojure. Here is one approach:

(defn which-args-supplied [{:keys [a b c] :as all :or {c 4}}]
  (let [c-supplied (contains? all :c)]
    (list a b c c-supplied)))

The :as :all collects the entire arguments list into all. Then, I use contains? to detect whether the user specified a value for c.

Usage:

user=> (which-args-supplied {:a 1})
(1 nil 4 false)

user=> (which-args-supplied {:a 1 :c 4})
(1 nil 4 true)

If you wanted to make this feel more like Common Lisp, you could write a macro.

Common Lisp also supports a return-from macro to "return" from the middle of a function. This encourages an imperative style of programming, which Clojure discourages.

However, you can solve the same problems in a different way. Here is the return-from example, rewritten in a functional style so that no return-from is needed:

(defn pair-with-product-greater-than [n]
  (take 1 (for [i (range 10) j (range 10) :when (> (* i j) n)] [i j])))

Usage:

user=> (pair-with-product-greater-than 30)
([4 8])

user=> (pair-with-product-greater-than 50)
([6 9])

Dynamic invocation

To demonstrate funcall and apply, PCL uses a function to plot a histogram. Clojure provides an equivalent apply. The PCL version of plot also uses CL's loop. Instead, I will use doseq and dotimes for looping:

(defn plot [f min max step]
  (doseq i (range min max step)
    (dotimes _ (apply f [i]) (print "*"))
    (println)))

The _ is idiomatic for "I don't plan to use this value". In this case I want to do something n times, but the individual iterations do not need to know their ordinal.

With plot in place, we can plot any function of one argument, over any range. (The usefulness of this is sharply limited by the horizontal resolution of your output device). Some examples:

user=> (plot #(Math/pow % 2) 1 5 1)
*
****
*********
****************

user=> (plot identity 2 10 2)
**
****
******
********

Wrapping up

No big surprises here. Function invocation is flexible and powerful in Clojure.

Notes

The sample code is available at http://github.com/stuarthalloway/practical-cl-clojure.

Picture of stu

PCL -> Clojure, Chapter 6

  • Posted By Stuart Halloway on September 16, 2008
  • Tags

This article is part of a series describing a port of the samples from Practical Common Lisp (PCL) to Clojure. You will probably want to read the intro first.

This article covers Chapter 6, Variables.

Closures (and STM!)

Functions can close over their lexical environment, in both Common Lisp and Clojure. The PCL example for this is the canonical simple counter. You might think you could do something like this in Clojure:

; closes over count, but WON'T WORK
(def counter (let [count 0] #(inc count)))

This poor counter can only count once:

user=> (counter)
1

user=> (counter)
1

inc returns the increment of counter, but it doesn't change counter. You might look for a setf equivalent in Clojure, but you won't find one. Clojure data structures are immutable.

Of course, we don't really need mutable data here. What we need is a mutable reference to data, that can change to point to a different value later. Clojure provides this via Software Transactional Memory (STM).

Using STM, I can create a ref with ref, and alter a ref with alter. The alter must be inside a transaction, a.k.a. a dosync block.

With all that in mind, here's counter:

(def counter (let [count (ref 0)] #(dosync (alter count inc))))

This counter actually works. (It is also thread safe).

user=> (counter)
1

user=> (counter)
2

Nothing stops multiple functions from closing on the same variables. Here is a function that returns an incrementer, decrementer, and accessor, all sharing the same counter:

(defn counters []
  (let [count (ref 0)]
    (list #(dosync (alter count inc))
          #(dosync (alter count dec))
          #(deref count))))

Note that deref does not require a dosync wrapper.

Creating variables

Common Lisp provides defvar and defparameter to create variables. Between them, these forms provide ways to specify an initial value, documentation string, and init-once semantics.

Clojure's support for these ideas lives partially in clojure.contrib.def. Having used def, I can use defvar to specify an initial value and a documentation string:

(defvar gap-tolerance 0.0001
  "Tolerance to be allowed in widget gaps")

You can then access the value or the docstring:

user=> gap-tolerance
1.0E-4
user=> (doc gap-tolerance)
-------------------------
pcl.chap_06/gap-tolerance
  Tolerance to be allowed in widget gaps

For one-time initialization, you can use init-once:

(def
 #^{:doc "Count of widgets made so far"}
 widget-count)
(init-once widget-count 0)

In addition to init-once, the form above also demonstrates an alternate docstring syntax. The #^{...} invokes the metadata reader macro. Here the metadata is a docstring, but the mechanism is general. Clojure also uses metadata for visibility (e.g. private), and for adding type information.

Dynamic Binding

In Common Lisp, you can rebind global (dynamic) variables with let. In Clojure, there is a separate binding macro for this purpose. The following example demonstrates the difference between bind and let in Clojure:

(defn demo-bindings []
  (let [a "let a" b "let b"]
    (print-a-and-b "let"))
  (binding [a "bound a" b "bound b"]
    (print-a-and-b "binding")))

(defn print-a-and-b [from]
  (println (format "From %s: [a=%s] [b=%s]" from a b)))

This prints:

user=> (demo-bindings)
From let: [a=global a] [b=global b]
From binding: [a=bound a] [b=bound b]
  • The let creates new lexical bindings for a and b, which shadow the global variables. Inside the let's call to print-a-and-b these local bindings are not in scope.
  • The binding creates new dynamic bindings for a and b. These are thread local, but visible for the entire dynamic scope of the binding, including the call to print-a-and-b.

In Clojure, it is idiomatic to use names like *foo* for variables intended for dynamic rebinding.

Wrapping up

Basic creation and binding of variables in Clojure should make sense to a Common Lisp programmer.

Notes

The sample code is available at http://github.com/stuarthalloway/practical-cl-clojure.

Picture of stu

PCL -> Clojure, Chapter 3

  • Posted By Stuart Halloway on September 16, 2008
  • Tags

This article is part of a series describing a port of the samples from Practical Common Lisp (PCL) to Clojure. You will probably want to read the intro first.

This article covers Chapter 3, Practical: A Simple Database.

Defining a database

The code examples begin with a simple function for creating records. Here is a Clojure approach:

(defstruct cd :title :artist :rating :ripped)

A cd has four defined slots: title, artist, rating, and ripped. (Note that this is pretty different from the approach taken in PCL. Peter is introducing language features one a time, and hasn't covered structs at this point in the book.)

Next, I need a way to keep an in-memory database of cd information. In PCL, this is done by changing a mutable global. Since Clojure uses immutable data structures, I will take a different approach, and pass the db object (a collection of some kind) to every method.

(defn add-records [db & cd] (into db cd))

add-records lets us add an arbitrary number of cds to an existing db collection. The use of into allows us to defer the choice of collection type.

I also want an init-db function to populate initial values for the collection. Again I will avoid mutable data and have init-db return the database.

(defn init-db []
  (add-records #{} 
          (struct cd "Roses" "Kathy Mattea" 7 true)
          (struct cd "Fly" "Dixie Chicks" 8 true)
          (struct cd "Home" "Dixie Chicks" 9 true)))

The #{} is a set literal.

Console output

Next, dump-db can print a summary of the database to the console. Again, since there is no mutable data, I will pass the database to dump-db

(defn dump-db [db]
  (doseq cd db
    (doseq [key value] cd 
      (print (format "%10s: %s\n" (name key) value)))
    (println)))

The calls to doseq are loops: the outer doseq loops over all cds in the db, and the inner one loops over each key/value pair in a cd record. The call to format wraps a Java call to String.format. The output from dump-db looks like this, when called on (init-db):

 title: Roses
artist: Kathy Mattea
rating: 7
ripped: true

 title: Fly
artist: Dixie Chicks
rating: 8
ripped: true

 title: Home
artist: Dixie Chicks
rating: 9
ripped: true

Console input

Reading in a new record is a little more tricky. I need a generic input function, plus number validation (for the rating field) and yes/no validation (for the ripped field). First, I will write a prompt-read function that displays a prompt and then waits for input:

(defn prompt-read [prompt]
  (print (format "%s: " prompt))
  (flush)
  (read-line))

Now I can write prompt-for-cd, which will prompt for each of the four fields, and then combine them into a cd:

(defn prompt-for-cd []
  (struct 
   cd 
   (prompt-read "Title")
   (prompt-read "Artist")
   (parse-integer (prompt-read "Rating"))
   (y-or-n-p "Ripped [y/n]")))

The prompt-for-cd function cannot compile, because it depends on two other functions that I haven't written yet. First, parse-integer:

(defn parse-integer [str]
  (try (Integer/parseInt str) 
       (catch NumberFormatException nfe 0)))

This function demonstrates two bits of Java interop: the Integer/parseInt syntactic sugar for invoking static methods, and the try/catch special form. The resulting function will coerce any non-numeric input to zero.

Common Lisp has a built-in y-or-n-p for yes/no input, buy Clojure doesn't. So I will write one:

(defn y-or-n-p [prompt]
  (= "y"
     (loop []
       (or 
        (re-matches #"[yn]" (.toLowerCase (prompt-read prompt)))
        (recur)))))

The call to recur will repeat the loop until the user enters "y" or "n". Since re-matches returns the match, there is no need to capture the input in a local variable.

File load and save

It would be nice to save and reload the database from a file. Since the database is just Clojure data, I can use Clojure's built-in support for serializing objects: print and read. Combine these with the beautifully-named spit and slurp, and file I/O is done:

(use 'clojure.contrib.duck-streams)
(defn save-db [db filename]
  (spit 
   filename 
   (with-out-str (print db))))

(defn load-db [filename] 
  (with-in-str (slurp filename)
    (read)))

For spit you will need the clojure-contrib library on your classpath.

Querying the database

Now, the fun begins. I would like to have a mini-DSL for querying the database. For example, I might say:

(filter (artist-selector "Dixie Chicks") (init-db))

This should return all items in the database where the artist is "Dixie Chicks". For this to work, artist-selector needs to be a higher-order function, i.e. a function that returns another function:

(defn artist-selector [artist]
  #(= (:artist %) artist))

There are two interesting bits of syntactic sugar here.

  • The #(...) creates an anonymous lambda, where % represents the first argument. For example, #(inc %) is the same as (fn [n] (inc n)).
  • The call (:artist %) uses the keyword :artist in function position to look up the corresponding value in %.

A more general query

artist-selector is cool, but it could be much more general. How about a where function that creates a test for any number of criteria? A general form would look like:

(filter (where {:artist "Dixie Chicks"}) (init-db))

But now I can add multiple criteria. How about all the Dixie Chicks albums that I rated 8?

(filter (where {:artist "Dixie Chicks" :rating 8}) (init-db))

Here's where:

(defn where [criteria]
  (fn [m]
    (loop [criteria criteria] 
      (let [[k,v] (first criteria)]
        (or (not k)
            (and (= (k m) v) (recur (rest criteria))))))))

This is a little more complex than artist-selector:

  • The loop processes the criteria, one key/value pair at the time.
  • The let uses a destructuring bind to pull the key and value into k and v.
  • The and joins together multiple criteria.
  • The recur advances to the next criterion.
  • The (or (not k) ... allows the recursion to terminate when no criteria remain.

Notice that the variable name is now m instead of cd. While writing a CD database, I have accidentally produced a completely general purpose where function. Oops. :-)

I wrote the where to explicitly demonstrate implementing sequence operations with recur. You will often find that Clojure has already defined the sequence operation you need. In this case, there where applies its criteria to every object in the sequence. Sure enough, there is an every? that will simplify things:

(defn simpler-where [criteria]
  (fn [m]
    (every? (fn [[k v]] (= (k m) v)) criteria)))

Updating the database

A nice feature of functional design is composability. We have used where to perform lookup queries with filter, but the same code can be used for update operations. For example, maybe I want to set the rating for all Dixie Chicks albums to 10:

(update (some-db) (where {:artist "Dixie Chicks"}) {:rating 10})

update is simply:

(defn update [db criteria updates]
  (into (empty db) 
   (map (fn [m] 
      (if (criteria m) (merge m updates) m))
    db)))
  • Each cd in the db gets mapped into either itself, or itself+updates, depending on whether the criteria match.
  • The (into (empty db) ... gives back a collection of the same type as db. This will work for out set, but also means that the update function is generalized for other sequence types.

Wrapping up

Look back at where we have been. That's a lot of functionality for several dozen lines of code! The music database now supports

  • console-based input and output
  • file-based persistence
  • generic exact-match queries
  • arbitrary updates to matches

Clojure's unique features helped.

  • The map literal {:key value} made creating the query DSL simple.
  • Java interop was easy and unobtrusive.
  • Immutable programming wasn't so hard, after all. No state mutated in the code above, and yet nothing seemed too alien, did it?

Notes

The sample code is available at http://github.com/stuarthalloway/practical-cl-clojure.

Revision history

  • 2008/09/16: initial version
  • same day! : removed clojure.contrib.string format (format now in Clojure core)
  • 2008/09/17: batch of changes suggested by Rich Hickey. Most of these changes can be summarized as "It doesn't have to be a list!" See the git commit for details.
  • 2008/10/03: added link out to clojure-contrib, based on feedback from Hans Hübner.
  • 2008/10/28: fixed parameter reversal in update example, based on feedback from Chanwoo Yoo.
Picture of stu

PCL -> Clojure

My current leisure-time project is porting the examples from Peter Seibel's excellent Practical Common Lisp (PCL) to Clojure.

I think Clojure is interesting for three reasons:

  1. Clojure is Lisp, but minus historical baggage.
  2. Clojure gives full access to the JVM and Java libraries.
  3. Clojure groks concurrency and state.

My ground rules are simple:

  • I am not going to port everything, just the code samples that interest me as I re-read Practical Common Lisp.
  • Where Peter introduced Common Lisp features in a planned progression, I plan to use whatever Clojure feature come to mind. So I may jump straight into more "advanced" topics, even in the intro chapters.

Please do not assume that this port is a good introduction to Common Lisp! I am cherry-picking examples that are interesting to me from a Clojure perspective. If you want to learn Common Lisp, read PCL. In fact, you should probably read the relevant chapters in PCL first, no matter what.

The Series

Talks

I am available to give conference talks on Clojure. Check the schedule for an event near you, or contact Relevance (info@thinkrelevance.com) to schedule an event.

Notes

Picture of stu

Java.next series translated to Russian

  • Posted By Stuart Halloway on September 16, 2008

Andrey Lipatkin is translating my Java.next series into Russian.

Of course I had to check the translation. Not knowing Russian myself, I pasted parts of Andrey's translation into Google translate. Here's my ideas on checked exceptions, after translation into Russian by a human and back to English by Google:

"Checked exceptions were unsuccessful experiment. Java code because they only swell, nothing is not contributing to a better handling errors. Worse, checked exceptions are present headache for support on the border between layers of abstractions. Introduction of new types of exceptions should not lead to re!"

I couldn't have said it better myself. That's exactly how I feel when reading code littered with catch blocks.

Picture of stu

Java.next #4: Immutability

This is Part Four of a series of articles on Java.next. In Part Four, I will begin to explore how the Java.next languages (JRuby, Groovy, Clojure, and Scala) deal with concurrency. Concurrency is a big topic, so I will subdivide it, narrowing my focus in this part to how the Java.next languages support immutability.

Why concurrency?

Over the last decade, most programmers have written code with little concern for concurrency. Often this has caused problems later, when programs needed to be used in a more concurrent setting. This is only going to get worse: in the future, everything is concurrent.

Why don't programmers do a better job with concurrency? Because it is hard. Read the excellent Java Concurrency in Practice (JCIP), and you will come away impressed with all the clever work that has gone into making Java a good environment for writing concurrent applications. But may also come away thinking "Who is smart enough to get this kind of code right in production?"

Not many people can, and the experts agree that we need an entirely different paradigm for writing concurrent applications. It may even be the case that supporting concurrency is the number one priority for Java.next.

Why immutability?

One of the most difficult things about writing concurrent programs is deciding how to protect mutable shared state. Java provides a locking primitive, synchronized, but locking primitives are difficult to use. You have to worry about

  • data corruption (if you lock too little)
  • deadlock (if you lock too much)
  • performance degradation (even if you get it right)

To make matters worse, it is generally impractical to write automated tests that explore the various possible interleavings of events.

Immutability solves all these problems of locking, by removing the need for ever locking at all. Immutable state can always be shared safely. Nobody can ever see a corrupt value by definition, as values never change.

All of the Java.next languages support immutability to some degree, as I will show below.

Immutability in JRuby

JCIP uses a ThreeStooges class as an extremely simple example for creating immutable objects in Java. Continuing the theme of that example, here is JRuby's immutable stooges:

require 'set'

class ThreeStooges
  def initialize(*names)
    @stooges = Set.new(names)
    @stooges.freeze
    self.freeze
  end
  def stooge?(name)
    @stooges.member?(name)
  end
end

In Ruby, you can set immutability on a per-instance basis, as opposed to a per-class basis. A particular object can become immutable by calling freeze. If the internal components of that object are not primitive, they will need to freeze as well. So, in the constructor above, I freeze the two things at risk of changing: stooges and self.

While immutability is possible in Ruby, be warned. Heavy use of freeze to enable safe concurrency is far from idiomatic.

Immutability in Groovy

In Groovy, the ThreeStooges class looks and works like Java, but prettier:

class ThreeStooges { 
  private final Set<String> stooges;
  ThreeStooges(String... args) {
    stooges = args as HashSet<String>;
  }
  boolean isStooge(String name) {
    return stooges.contains(name); 
  }
}

The important details here are that the internal stooges collection is

  • final (cannot be changed after the object is created)
  • private (cannot be accessed outside the object itself)

I could also wrap stooges with unmodifiableSet, but since the ThreeStooges class already wraps stooges without exposing a modifying method, the second wrap is redundant.

Immutability in Scala

In JRuby and Groovy, immutability is possible. In Scala, immutability is preferred. The default collection classes are immutable, so ThreeStooges is just

class ThreeStooges(names: String*) {
  private val stooges = Set() ++ names
  def isStooge(name: String) = stooges(name)
}

A few observations here:

  • In Scala, fields must be declared val (immutable) or var (changeable). Scala style encourages you to use val where possible.
  • Scala's Set is immutable.

I find that Scala's inline constructor syntax and braces-free method definition syntax make the Scala definition easier to read than the JRuby or Groovy versions.

Immutability in Clojure

In Clojure, immutability is the default. In fact, everything is immutable, with only two exceptions:

  • calling down into Java APIs (the Clojure version of assembly language)
  • a few data structures which are specifically designed to support Software Transactional Memory (more on this in a later post)

So, the three-stooges can just be a set:

(defn three-stooges [& names] (set names))
(defn is-stooge? [name stooges] (stooges name))

A few observations:

  • [& names] is Clojure's varargs syntax.
  • A set can be placed in function position (first item in a list). So (stooges name) can be read as "find name in the set stooges"

Putting immutables to work

To see how immutables can simplify an implementation, consider the Factorizer servlet example from JCIP. The objectives of this example are to

  • calculate the prime factorization of a number
  • cache the most recent factorization
  • use the cached answer where possible, and track the frequency of cache hits
  • track the total number of factorizations

Here is a Java version of the factorizer using synchronized blocks, and stripped down to bare essentials for clarity:

import java.math.BigInteger;

public class CachedFactorizer {
    private BigInteger ;
    private BigInteger[] lastFactors;
    private long hits;
    private long cacheHits;

    public void factors(BigInteger i) {
        BigInteger[] factors = null;
        synchronized (this) {
            ++hits;
            if (i.equals(lastNumber)) {
                ++cacheHits;
                factors = lastFactors.clone();
            }
        }
        if (factors == null) {
            factors = factor(i);
            synchronized (this) {
                lastNumber = i;
                lastFactors = factors.clone();
            }
        }
        return factors;
    }
}

In order for the CachedFactorizer to be correct and efficient, you need to think carefully about where the synchronized blocks should go. In the example above, there are two synchronized blocks. The first block checks the cache, and updates the counters. It protects:

  • read and write of hits
  • read and write of cacheHits
  • read of lastFactors

The second block updates the cache. It protects:

  • write of lastNumber
  • write of lastFactors

To make sure you understand the approach, ask yourself the following questions:

  • Could the first synchronized block be split into multiple, more granular locks? Would this likely be faster or slower?
  • What about the second block?
  • What operations are unprotected by locks? Is this safe?
  • Would a simple, single-lock approach be correct? Under what circumstances would it perform well?

That's a lot to think about. Now, let's consider the same example, but using immutable data structures.

A Clojure cached-factor

Here is one Clojure approach to cached-factor:

(def #^AtomicLong hits (AtomicLong. 0))
(def #^AtomicReference cache (AtomicReference. {:n nil :factors nil}))
(def #^AtomicLong cache-hits (AtomicLong. 0))

(defn cached-factor [n]
 (.incrementAndGet hits)
 (let [cached (.get cache)]
   (if (= n (cached :n))
     (do (.incrementAndGet cache-hits)
       (cached :factors))
     (let [factors (factor n)]
       (.set cache {:n n :factors factors})
       factors))))

There are several things to notice here:

  • The hits, cache, and cache-hits take advantage of Java 5's atomic wrapper classes. (The #^ClassName syntax adds type information.)
  • There are no synchronized blocks anywhere.
  • Even though there are no synchronized blocks, you still have to think about the semantics of concurrency. The incrementAndGet method is used to update the two hit counters, and the cache is pulled into a local variable to avoid inconsistent reads.

The real key to this approach, however, is storing a Clojure map in an AtomicReference. Because Clojure data structures are immutable, they can benefit from an AtomicReference in a way that mutable classes cannot.

The immutable approach looks only a little simpler in this small example. But the benefit of using immutable data increases when you compose objects together. If you wanted to compose the synchronized version with another object, you would have have to dig back into the internals of both objects, study their locks, and pick a new lock strategy to cover the two objects together. Composing operations with immutable objects is much easier.

A Scala CachedFactor

Since Scala's default collections are immutable, a Scala approach can closely parallel the Clojure code above:

class CachedFactorizer {
  case class Cache(val n: Int, val factors: List[Int])
  val hits = new AtomicLong()
  val cache = new AtomicReference[Cache](Cache(2, calculateFactors(2)))
  val cacheHits = new AtomicLong()

  def factor(n: Int) = {
    hits.incrementAndGet
    val cached = cache.get
    if (cached.n == n) {
      cacheHits.incrementAndGet
      cached.factors
    } else {
      val factors = calculateFactors(n)
      cache.set(Cache(n, factors))
      factors
    }
  }

While this is similar to the Clojure approach, the difference is instructive. While the Clojure version stores the cache as a simple Map, the Scala version introduces a strongly-typed Cache class for the cache of values and their factors. The differences here are intended to idiomatic. Either approach could work in either language.

What about the JRuby and Groovy examples?

Could you write the example above in Groovy, JRuby, or even Java? Yes, but it would be non-idiomatic, even ugly. I am not going to show the JRuby and Groovy versions here, because those languages do not offer any concurrency-specific advantages over Java. Scala and Clojure, on the other hand, don't just make immutable objects possible. They make them easy and idiomatic.

Languages are designed to support certain priorities, inevitably at the expense of others. By making immutability a preferred option (Scala) or the standard (Clojure), these languages are encouraging a different paradigm for concurrent applications.

Languages are not about what they make possible, but about what they make beautiful. Clojure and Scala aim to make concurrent programs beautiful, and their preference for immutability is but the tip of the iceberg. In the next installment of this series, I will explore how Clojure and Scala enable concurrent programs through actors, agents, and software transactional memory.

Notes

  • This article is based on the talk Java.next #4: Concurrency . Check the schedule for a talk near you.
  • Thanks to Greg Vaughn and Glenn Vanderburg for their feedback on a draft of this article.
  • Thanks to Rich Hickey for suggesting several Clojure versions of the factorizer example.
  • Feedback on how to improve these examples is most welcome!
Picture of jgehtland

Small Things, Loosely Joined, Written Fast

  • Posted By Justin Gehtland on September 03, 2008

UPDATE: Re-posted for the RailsConf Europe attendees. Thanks for the warm welcome; great crowd. Hope the end wasn’t TOO abrupt. ;-)

Thanks to everyone who came out to my talk yesterday at RailsConf. That has to have been my favorite speaking engagement of the last year. Great crowd at a great show.

I got several requests for the slides and source code. The slides themselves are kind of useless; I don’t believe in writing slides that make people read. Slides are there for either entertainment or reminder purposes only. However, you can download them here if you might find them helpful.

On the other hand, source code is a much more useful take-away from a technical talk. You can download the code here. However, that source code is going to be useless without some background.

Things you will need to install

There are a variety of things you will need in order to get the sample to run. Most are gems, but one is a Java library. None of it is too hard.

RubyCas Server

First, head over to the QuickStart guide, which will walk you through downloading and installing the gem. In particular, go and look in the config file and modify it to match your specific configuration. For the purposes of my demo, I used the webrick server settings, and put it on port 80 with no SSL. However, if you want to test with SSL on, webrick is still the easiest solution since it supports SSL out of the box; if you go Mongrel, you’ll have to front it with Apache, and for local testing, that’s overkill.

You will then need to set up two databases; one for rubycas-server itself, and one for user accounts for the authenticator. Here’s what my settings look like:

database:
  adapter: mysql
  database: casserver
  username: root
  password: 
  host: localhost

authenticator:
  class: CASServer::Authenticators::SQL
  database:
    adapter: mysql
    database: campus_users
    username: root
    password: 
    server: localhost
  user_table: users
  username_column: username
  password_column: password

Don’t forget to actually create those databases. ;-)

The projects require rubycas-client, which you can install as a gem, but is embedded in the source code as a plugin.

UPDATE: Don’t forget to actually start rubycas-server.

> sudo rubycas-server

ActiveMQ

For the Rideshare part of the application, I used ActiveMQ as the back end messaging system. I used the 5.1.0 release for the demo; it runs on the JDK that is installed on the Mac without problem, no need to fiddle with your JDK install. Other platforms, just read the release notes and make sure you have the right bits.

ActiveMQ is configured to work on standard local ports out of the box, and the Stomp connector is enabled by default, so all you should have to do to get it up and running for the sample app is navigate to the root directory of the exploded tarball and execute:

> bin/activemq

That’s it. You will, however, have to install the Stomp gem for the sample to work.

> sudo gem install stomp --include-dependencies

To get it all running

  1. Install RubyCas server and its databases. Populate the rubycas central user data with some accounts that have the same logins as the cas_user values in the fixtures from the application. For example, in the users.yml fixtures, there are records with cas_user=jgehtland. Make sure the RubyCas server authentication table has a user with login=jgehtland.
  2. Install ActiveMQ and get it running.
  3. Unzip the source code.
  4. Create the databases for the two apps and load their fixtures into the development environment.
  5. Run the Enrollr application on port 3000.
  6. Run the DormPickR application on port 3001.
  7. Point another browser window at localhost:3001/sender/send_ride_notice?city=Atlanta to pump messages into ActiveMQ. You can change the city and resend to create multiple pending messages.

That’s pretty much it. Drop a comment if that isn’t working for you (it is absolutely possible I forgot a required step or gem; if so, I’ll fix the instructions).

I’m also planning on writing up the talk as a series of posts starting later today that address each of the major areas in much more detail, so check back when you can.

Thanks again if you came to the talk!

Picture of jgehtland

Berlin: First Impressions

  • Posted By Justin Gehtland on September 02, 2008

I landed in Berlin this morning for RailsConf Europe. On Thursday, I’ll be giving a slightly-rewritten version of my Small Things, Loosely Joined and Written Fast talk where I focus on using tools like single sign-on, multi-database AR and better testing to achieve more robust applications. Having never been to Germany before, I was happy to finally arrive. The most obvious difference between Berlin and my other European travels so far is that, even though everybody here speaks English (just like everywhere else I’ve traveled) they don’t subtitle all the signs and written material with English. Which is quite refreshing, but makes it difficult for the jetlagged and underprepared Americans fresh off the plane to navigate. Luckily for me, the bus terminals have great maps.

Tonight I’ll be exploring the city, and tomorrow I’ll be taking in some talks and getting ready for my own on Thursday. I’m especially interested in Frederick Cheung and Paul Butcher’s talk “Intellectual Scalability”, which, as described, sounds like a pretty close match to mine. Luckily, their’s is the day before mine, and I’m pretty good at changing my talks at the last minute. Should be fun!