The Case for Clojure

  • Posted By Stuart Halloway on October 19, 2009

The case for Clojure is richly detailed and well-documented. But sometimes you just want the elevator pitch. OK, but I hope your building has four elevators:

  • "Concurrency is not the problem! State is the problem. Clojure's sweet spot is any application that has state."
  • "Don't burn your legacy code! Clojure is a better Java than Java."
  • "Imperative programming and gratuitous complexity go hand in hand. Write functional Clojure and get shit done."
  • "Design Patterns are a disease, and Clojure is the cure."

Any one of these ideas would justify giving Clojure a serious look. The four together are a perfect storm. Over the next few months I will be touring the United States and Europe to expand on each of these themes:

  • "Clojure's sweet spot is any application that has state." At Oredev, I will be speaking on Clojure and Clojure concurrency. I will demonstrate how Clojure's functional style simplifies code, and how mere mortals can use Clojure's elegant concurrency support.
  • "Clojure is a better Java than Java." I will be speaking on Clojure and other Java.next languages at NFJS in Atlanta and Reston.
  • "Design Patterns are a disease, and Clojure is the cure." I will sing a siren song of Clojure to my Ruby brethren at RubyConf.
  • "Write functional Clojure and get shit done." At QCon, I will give an experience report on Clojure in commercial development. Our results so far: green across the board. Both Relevance and our clients have been pleased with Clojure.

Want to learn about all four of these ideas, and more? Early in 2010, I will be co-teaching the Pragmatic Studio: Clojure with some fellow named Rich Hickey.

If you can't make it to any of the events listed above, at least you can follow along with the slides and code. All of my Clojure slides are Creative-Commons licensed, and you can follow their development on Github. Likewise, all the sample code is released under open-source licenses, including

And remember: think globally, but act only through the unified update model.

Brian's Functional Brain, Take 1.5

  • Posted By Stuart Halloway on October 07, 2009

Last week, Lau wrote two excellent sample apps (and blog posts) demonstrating Brian's Brain in Clojure. Continuing with the first version of that example, I am going to demonstrate

  • using different data structures
  • visual unit tests
  • JMX integration (gratuitous!)
  • an approach to Clojure source code organization

Make sure you read through Lau's original post first, and understand the code there.

Data Structures

In a functional language like Clojure, it is easy to experiment with using different structures to represent the same data. Rather than being hidden in a rat's nest of mutable object relationships, your data is right in front of you in simple persistent data structures. In Lau's implementation, the board is represented by a list of lists, like so:

(([:on 0 0] [:off 0 1] [:on 0 2]) 
 ([:on 1 0] [:on 1 1] [:off 1 2]))

Each cell in the list knows its state (:on, :off, or :dying), and its x and y coordinates on the board. The board data structure is used for two purposes:

  • the step function applies the rules of the automaton, returning the board's next state
  • the render function draws the board on a Swing panel

These two functions have slightly different needs: the step function cares only about the state of adjacent cells, and can ignore the coordinates, while the render function needs both.

How hard would it be to convert the data to a form that stores only the state? Not hard at all:

(defn without-coords [board]
  (for [row board]
    (for [[state] row] state)))

You could write without-coords as a one-liner using map, but I prefer how the nested fors visually call out the fact that you are manipulating two-dimensional data.
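For comparison, here is a rough sketch of what the map-based one-liner might look like. The name without-coords-map and the sample-board value are made up for this illustration; both versions should return the same rows.

```clojure
;; Nested-for version, as in the post.
(defn without-coords [board]
  (for [row board]
    (for [[state] row] state)))

;; Hypothetical one-liner: keep only the first element (the state) of each cell.
(defn without-coords-map [board]
  (map (partial map first) board))

;; Sample data in the list-of-lists format shown earlier.
(def sample-board
  '(([:on 0 0] [:off 0 1] [:on 0 2])
    ([:on 1 0] [:on 1 1] [:off 1 2])))

(= (without-coords sample-board)
   (without-coords-map sample-board))
;; => true
```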

Without the coords, the board is easier to read:

((:on :off :on)
 (:on :on :off))

If you choose to store the board this way, you will need to get the coordinates back for rendering. That's easy too, again using nested fors to demonstrate that you are transforming two-dimensional data:

(defn with-coords [board]
  (for [[row-idx row] (indexed board)]
    (for [[col-idx val] (indexed row)]
         [val row-idx col-idx])))

So, with a couple of tiny functions, you can easily convert between two different representations of the data. Why not use both formats, picking the right one for each function's needs?
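To check that the two conversions really are inverses, here is a small round-trip sketch. It assumes a Clojure version with map-indexed in clojure.core (1.2 or later) and uses it in place of clojure.contrib's indexed, so this with-coords is a variant of the one above rather than the original.

```clojure
(defn without-coords [board]
  (for [row board]
    (for [[state] row] state)))

;; Variant of with-coords built on map-indexed instead of contrib's indexed.
(defn with-coords [board]
  (map-indexed
    (fn [row-idx row]
      (map-indexed (fn [col-idx state] [state row-idx col-idx]) row))
    board))

(def bare '((:on :off :on) (:on :on :off)))

(with-coords bare)
;; => (([:on 0 0] [:off 0 1] [:on 0 2]) ([:on 1 0] [:on 1 1] [:off 1 2]))

;; Adding coordinates and stripping them again gives back the original board.
(= bare (without-coords (with-coords bare)))
;; => true
```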

When performance is critical, there is another advantage to using different data formats: caching. Consider the step function, which uses the rules function to determine the next value of each cell:

(defn rules
  [above [_ cell _ :as row] below]
  (cond
   (= :on    cell)                              :dying
   (= :dying cell)                              :off  
   (= 2 (active-neighbors above row below))     :on   
   :else                                        :off  ))

The "without coordinates" format used by step and rules passes only exactly the data needed. As a result, the universe of legal inputs to rules is small enough to fit in a small cache in memory. And in-memory caching is trivial in Clojure: simply call memoize on a function. (It turns out that for this particular example, the calculation is simple enough that memoize won't buy you anything. Lau's second post demonstrates more useful optimizations: transients and double-buffering. But in some problems a cacheable function result is a performance lifesaver.)
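As a minimal sketch of what memoizing rules could look like: the active-neighbors helper below is an assumption (it is not shown in this post) that simply counts :on cells around the center, and the calls counter exists only to make the cache hit visible.

```clojure
;; Assumed helper, not from the post: count the :on cells among the neighbors.
(defn active-neighbors [above [left _ right] below]
  (count (filter #{:on} (concat above [left right] below))))

(defn rules
  [above [_ cell _ :as row] below]
  (cond
    (= :on    cell)                          :dying
    (= :dying cell)                          :off
    (= 2 (active-neighbors above row below)) :on
    :else                                    :off))

;; Count how often the underlying function actually runs.
(def calls (atom 0))
(defn counted-rules [& args]
  (swap! calls inc)
  (apply rules args))

(def fast-rules (memoize counted-rules))

(fast-rules [:on :off :off] [:off :off :off] [:off :off :on])
;; => :on
(fast-rules [:on :off :off] [:off :off :off] [:off :off :on])
;; => :on  (answered from the cache)
@calls
;; => 1
```

Because the arguments are small immutable values, they work directly as cache keys with no extra effort.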

If you use comprehensions such as Clojure's for to convert inputs to exactly the data a function needs, your functions will be simpler to read and write. This "caller makes right" approach is not always appropriate. When it is appropriate, it is far less tedious to implement than the related adapter pattern from OO programming.

Since multiple data formats are so easy, you can use yet another format for testing.

Testing

Brian's Brain is a simulation in two dimensions, so it would be nice to write tests with a literal, visual, 2-d representation. In other words:

; this sucks
(is (= :on (rules (cell-with-two-active-neighbors))))

; this rocks
O..
...  => O     
..O

In the literal form above the O is an :on cell, and the . is an :off cell.

Creating this representation is easy. The board->str function converts a board to a compact string form:

(defn board->str
  "Convert from board form to string form:

   O.O         [[ :on     :off  :on    ]
   |.|     ==   [ :dying  :off  :dying ]
   O.O          [ :on     :off  :on    ]]
"
  [board]
  (str-join "\n" (map (partial str-join "") (board->chars board))))

The board->chars helper is equally simple:

(def state->char {:on \O, :dying \|, :off \.})
(defn board->chars
  [board]
  (map (partial map state->char) board))

With the new stringified board format, you can trivially write tests like this:

(deftest test-rules
  (are [result boardstr] (= result (apply rules (str->board boardstr)))
       :dying  "...
                .O.
                ..."

       :off    "O.O
                ...
                O.O"

       :on     "|||
                O.O
                |||"))

The are macro makes it simple to run the same tests over multiple inputs, and with liberal use of whitespace the tests line up visually. It isn't perfect, but I think it is good enough.
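The tests lean on a str->board function that is not shown here. One possible sketch, assuming clojure.string and clojure.set are available, inverts the state->char map and trims the indentation that keeps the test literals lined up. Treat it as an illustration, not the version in the sample code.

```clojure
(require '[clojure.set :as set]
         '[clojure.string :as string])

(def state->char {:on \O, :dying \|, :off \.})

;; Reverse lookup: character -> state.
(def char->state (set/map-invert state->char))

(defn str->board
  "Parse the string form back into rows of states, ignoring the
   leading whitespace used to align multi-line literals."
  [s]
  (for [line (string/split-lines s)]
    (map char->state (string/trim line))))

(str->board "O.O
             |.|
             O.O")
;; => ((:on :off :on) (:dying :off :dying) (:on :off :on))
```

Since the result is a sequence of three rows, (apply rules (str->board boardstr)) hands rules its above, row, and below arguments in order.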

One last note: the string format used in tests is basically ASCII art, so you can have a console based GUI almost for free:

(defn launch-console []
  (doseq [board (iterate step (new-board))]
    (println (board->str board))))

JMX Integration

Ok, JMX integration is gratuitous for an example like this. But clojure.contrib.jmx is so easy to use I couldn't resist. You can store the total number of iterations performed in a thread-safe Clojure atom:

(def status (atom {:iterations 0}))

Then, just expose the atom as a JMX mbean.

(defn register-status-mbean []
  (jmx/register-mbean (Bean. status) "lau.brians-brain:name=Automaton"))

Yes, it is that easy. Create any Clojure reference type, point it at a map, and register a bean. You can now access the iteration counter from a JMX client such as the jconsole application that ships with the JDK.

To make the mbean report real data, wrap the automaton's iterations in an update-stage helper function that both does the work, and updates the counter.

(defn update-stage
  "Update the automaton (and associated metrics)."
  [stage]
  (swap! stage step)
  (swap! status update-in [:iterations] inc))

If you haven't seen update-in (and its cousins get-in and assoc-in) before, go and study them for a moment now. They make working with non-trivial data structures a joy.
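A quick sketch of the three functions on a nested map. The :board entry is invented for this example; the status map in the app only holds :iterations.

```clojure
(def status {:iterations 0, :board {:rows 2, :cols 3}})

;; update-in applies a function to the value at a path.
(update-in status [:iterations] inc)
;; => {:iterations 1, :board {:rows 2, :cols 3}}

;; get-in reads the value at a path.
(get-in status [:board :cols])
;; => 3

;; assoc-in replaces the value at a path.
(assoc-in status [:board :cols] 80)
;; => {:iterations 0, :board {:rows 2, :cols 80}}

;; All three return new values; the original map is untouched.
status
;; => {:iterations 0, :board {:rows 2, :cols 3}}
```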

You might disagree with my choice of atoms. With a pair of references, you could keep the iteration count exactly coordinated with the simulator. Or, with a reference plus an agent, you could push the work of updating the iteration count out of the main loop. Whatever you choose, Clojure makes it easy to both (a) implement state and (b) keep the statefulness separate from the bulk of your code.
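For the curious, the "pair of references" alternative might look roughly like this. It is a sketch under assumptions: the names stage and iterations are placeholders, and the step function is passed in as an argument only so the example runs on its own.

```clojure
(def stage (ref '((:off :on) (:on :off))))
(def iterations (ref 0))

(defn update-stage
  "Advance the board and bump the counter in a single transaction,
   so no reader can ever observe one change without the other."
  [step]
  (dosync
    (alter stage step)
    (alter iterations inc)))

;; identity stands in for the real step function here.
(update-stage identity)
@iterations
;; => 1
```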

Source Code Organization

Lau's original code weighed in at a trim 67 lines. Now that the app supports three different data formats, a console UI, and JMX integration, it is up to around 150 lines. How should we organize such a monster of an app? Two obvious choices are:

  • put everything in one file and one namespace
  • split out namespaces by functional area, e.g. automaton, swing gui, and console gui

I don't love either approach. The single file approach is confusing for the reader, because there are multiple different things going on. The multiple namespace approach is a pain for callers, because they get weighed down under a bunch of namespaces to do a single thing.

A third option is immigrate. With immigrate you can organize your code into multiple namespaces for the benefit of readers, and then immigrate them all into a blanket namespace for casual users of the API. But immigrate may be too cute for its own good.

Instead, I chose to use one namespace for the convenience of callers, and multiple files to provide sub-namespace organization for readers of code. I mimicked the structure Tom Faulhaber used in clojure-contrib's pprint library: a top level file that calls load on several files in a subdirectory of the same name (minus the .clj extension):

lau/brians_brain.clj  
lau/brians_brain/automaton.clj
lau/brians_brain/board.clj
lau/brians_brain/console_gui.clj
lau/brians_brain/swing_gui.clj

I also used this layout for clojure-contrib's JMX library.

Parting Shots

Over the course of Lau's two examples and this one, you have seen:

  • an initial working application in under 100 lines of code
  • transforming data structures for performance optimization
  • transforming data structures for readability
  • a second (console) gui
  • optimizing the Swing gui with double buffering
  • optimizing with transients
  • visual tests
  • easy addition of monitoring with JMX

And here are some things you haven't seen:

  • classes
  • interfaces
  • uncontrolled mutation
  • broken concurrency

Would it be possible to write a threadsafe Brian's Brain using mutable OO? Of course. Is there a benefit to doing so? I would love to hear your thoughts on the subject, especially in the form of code.

Rifle-Oriented Programming with Clojure

  • Posted By Stuart Halloway on August 12, 2009

Any comparison of hot JVM languages is likely to note that “Clojure is not object-oriented.” This is true, but it may lead you to the wrong conclusions. It’s a little like saying that a rifle is not arrow-oriented. In this article, you will see some of the ways that Clojure addresses the key concerns of OO: encapsulation, polymorphism, and inheritance.

This is a whirlwind tour, and we won't have time to cover the full details of all the Clojure code you will see. When we are done, I hope you will decide to explore for yourself. You can download and start using Clojure by following the instructions on the getting started page.

Just Enough Clojure Syntax

Clojure has vectors, which are accessed by integer indexes:

[1 2 3 4]
-> [1 2 3 4]

(get [:a :b :c :d :e] 2)
-> :c

In the preceding example, the initial [1 2 3 4] is input that you enter at the Read-Eval-Print Loop (REPL). The -> indicates the response from the REPL.

Clojure has maps, which are key/value collections:

{:fname "Stu", :lname "Halloway"}
-> {:fname "Stu", :lname "Halloway"}

Sets contain a set of values, and their literal form is preceded with a hash. Here is the set of English vowels, using backslash to introduce a character literal:

#{\a \e \i \o \u}
-> #{\a \e \i \o \u}

Lists are singly-linked lists, and are enclosed with parentheses. Lists are special: Not only are they data, they also act as the syntax for invoking functions. The list below invokes the plus (+) function:

(+ 1 2 3 4 5)
-> 15

Collections themselves act as functions. They take an argument which is the key/index to look up:

([:a :b :c :d :e] 2)
-> :c

({:name "Stu" :ext 101} :name)
-> "Stu"

Enough syntax, let's get started.

Encapsulation

Encapsulation is the hiding of implementation details so that clients of your code do not accidentally become dependent on them. In object-oriented languages, this is usually done at the class level. A class has public methods, private implementation details, and various other scopes in between.

Clojure accomplishes the purposes of encapsulation in three ways: closures, namespaces, and immutability.

Closures

A closure closes over (remembers) the environment at the time it was created. For example, the function make-counter below closes over the initial value passed via init-val:

(defn make-counter [init-val] 
  (let [c (atom init-val)] #(swap! c inc)))

Let’s break this down:

  • defn defines a new function, named make-counter, that takes a single argument init-val.
  • The let binds the name c to a new atom.
  • The atom creates a threadsafe, deadlock-proof mutable reference to a value.
  • The octothorpe (#) prefix introduces an anonymous function.
  • The call to swap! updates the value referenced by c by calling inc on it.
  • The value of the let is the value of its last expression. This let returns a function that increments a counter, which is then the return value of make-counter.

The atom c is private to the function returned by make-counter. The only public thing you can do is increment it by one:

(def c (make-counter 0))
-> #'user/c

(c)
-> 1

(c)
-> 2

(c)
-> 3

The counter example returned a single function, but nothing stops you from returning multiple functions. These multiple functions can then share private state. The new version of make-counter below returns two functions: one to increment the counter, and one to reset it.

(defn make-counter [init-val] 
  (let [c (atom init-val)] 
    {:next #(swap! c inc)
     :reset #(reset! c init-val)}))

This new make-counter returns a map whose :next value increments the counter, and whose :reset value resets it:

(def c (make-counter 10))
-> #'user/c

((c :next))
-> 11

((c :next))
-> 12

((c :reset))
-> 10

Why the double parentheses above? Two function calls: The inner function call looks up the appropriate function, and the outer one calls it.

Closing over data is far more general than the simplistic model offered by private, protected, public, friend, et al. in OO languages. By combining multiple lets and multiple return values from a function, you can create arbitrary encapsulation strategies.
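As one illustration of such a strategy, here is a sketch of an account whose balance and deposit history are both private, with a map of functions as the only way in. The make-account name and its keys are invented for this example.

```clojure
(defn make-account [opening-balance]
  (let [balance (atom opening-balance)   ; private mutable state
        history (atom [])]               ; private audit trail
    {:deposit (fn [amount]
                (swap! history conj amount)
                (swap! balance + amount))
     :balance (fn [] @balance)           ; read-only view
     :history (fn [] @history)}))        ; read-only view

(def acct (make-account 100))

((acct :deposit) 50)
;; => 150
((acct :balance))
;; => 150
((acct :history))
;; => [50]
```

Callers can deposit and inspect, but nothing outside make-account can reach the atoms themselves.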

Similar encapsulation possibilities are available in any language that supports closures. Douglas Crockford describes a similar idiom in JavaScript.

Namespaces

A Clojure namespace groups a set of related data and functions. Inside a namespace, a Clojure var can refer to a function or to data, and can be public or private.

For example, Chris Houser’s error-kit library implements a condition/restart system for Clojure.

(with-handler
  (vec (map int-half [2 4 5 8]))
    (handle *number-error* [n]
      (continue-with 0))) 

In the code above, with-handler, handle, and continue-with are public vars of the clojure.contrib.error-kit namespace. The int-half is a demo function that blows up on odd inputs. When a *number-error* occurs, the handler causes execution to continue with the value 0. (Note how this is more flexible than try/catch exception handling, which cannot recover back into the middle of some operation.)

Internally, error-kit keeps track of available handlers and continues using these private vars:

(defvar- *handler-stack* () 
  "Stack of bound handler symbols")
(defvar- *continues* {} 
  "Map of currently available continue forms")

The trailing minus sign on the end of defvar- marks the vars as private. These vars are implementation details, and are invisible to code outside the clojure.contrib.error-kit namespace.

Immutability

In OO languages, another purpose of encapsulation is to prevent object A from modifying or corrupting the private data used by object B.

In Clojure, this problem does not exist. Data structures are immutable. They cannot possibly be corrupted, or changed in any way, period. You can write query functions that return “private” state, without any fear of data corruption.
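A small sketch of what that means in practice. The private-state map and the query function are made up for this example.

```clojure
(def private-state {:balance 100, :owners ["Stu"]})

;; A query function that hands out the "private" data directly.
(defn query [] private-state)

;; A client "modifies" what it received...
(def tampered (update-in (query) [:owners] conj "Mallory"))

tampered
;; => {:balance 100, :owners ["Stu" "Mallory"]}

;; ...but only got a new value. The original is exactly as it was.
(query)
;; => {:balance 100, :owners ["Stu"]}
```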

Polymorphism

For our purposes here, polymorphism is the ability to choose a different method implementation based on the type of the object the method is called on. So for example:

Flyer a = new Airplane();
Flyer b = new Bird();
a.fly();
b.fly();

a.fly() and b.fly() do different things because they are called on different concrete types.

Clojure provides a generalization of polymorphism called multimethods. A multimethod definition begins with defmulti, and then has a name, plus a dispatch function that is used to select the actual implementation. To mimic polymorphism, simply dispatch on the class of the argument:

(defmulti fly class)

Individual methods of a multimethod begin with defmethod, then the multimethod name, then the object that must match the dispatch function. Finally, you get the argument list in a vector, followed by the implementation of the method. For example:

(defmethod fly Bird [b] (flap-wings b))
(defmethod fly Airplane [a] (turn-propeller a))

Unlike polymorphism, multimethods do not limit you to dispatching on class. You can dispatch based on any arbitrary function of the method arguments. So for example, a bank account might have a :type entry that is used to determine the interest rate:

(defmulti interest :type)
(defmethod interest :checking [a] 0)
(defmethod interest :savings [a] 0.05M)

The :type attribute is a convention, but nothing prevents you from dispatching on a different attribute, or even dispatching on more than one at the same time! For example, the service-charge multimethod below dispatches on two different facets of the same object: the object’s account-level (::Basic or ::Premium) and its :tag (::Checking or ::Savings):

(defmulti service-charge 
  (fn [acct] [(account-level acct) (:tag acct)]))
(defmethod service-charge [::Basic ::Checking]   [_] 25)
(defmethod service-charge [::Basic ::Savings]    [_] 10)
(defmethod service-charge [::Premium ::Checking] [_] 0)
(defmethod service-charge [::Premium ::Savings]  [_] 0)

The _ is a legal name, and is used idiomatically to indicate that an argument will be ignored. (There is no need to even look at the argument, since all the work has been done in choosing which method to dispatch to!) This example also demonstrates two other concepts:

  • The double-colon prefix resolves a keyword in a namespace. This prevents name collisions among keywords, just as object-oriented languages use namespaces to prevent name collisions between type names.
  • account-level is a function (not shown here), not a simple key lookup. It returns ::Premium or ::Basic based on the account type and the current balance. Thus an account can dynamically change its account level as its balance changes.
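To see the dispatch in action, here is a runnable sketch. The account-level below is a simplified stand-in that looks only at the balance, and the 5000 threshold is an arbitrary assumption; the real function is not shown in this article.

```clojure
;; Simplified stand-in: premium status depends only on the balance.
(defn account-level [acct]
  (if (>= (:balance acct) 5000) ::Premium ::Basic))

(defmulti service-charge
  (fn [acct] [(account-level acct) (:tag acct)]))
(defmethod service-charge [::Basic ::Checking]   [_] 25)
(defmethod service-charge [::Basic ::Savings]    [_] 10)
(defmethod service-charge [::Premium ::Checking] [_] 0)
(defmethod service-charge [::Premium ::Savings]  [_] 0)

(def acct {:tag ::Checking, :balance 100})

(service-charge acct)
;; => 25

;; Same account, larger balance: it now dispatches as ::Premium.
(service-charge (assoc acct :balance 10000))
;; => 0
```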

As you can see, multimethods are far more general than polymorphism. Instead of being limited to type-based dispatch, multimethods can dispatch on any arbitrary function of an argument list. This allows programming models that more closely resemble reality: after all, what real-world entities are limited to a single type hierarchy, and forbidden to change types over time?

Inheritance

In OO languages, inheritance allows you to create a derived type that reuses the behavior of a base type. For example:

class Person {
  String fullName() { /* impl details */ }
}
class Employee extends Person {
  AddressBookItem companyDirectoryEntry() { /* impl details */ }
}

This kind of reuse is so natural in Clojure that it doesn’t even have a name. For example, here is a function that returns the full name of a person, based on first and last names:

(defn full-name [p]
  (str (:first-name p) " " (:last-name p)))

Employees are like people, but have other properties and behaviors, such as a telephone extension. The company-directory-entry returns a vector of an employee's full name and telephone extension, like this:

(defn company-directory-entry [p]
  [(full-name p) (:extension p)])

Notice that company-directory-entry “reuses” the person-ness of its argument p by calling full-name on it. There is no special inheritance ceremony required to set this up; you just call functions when you need them.

You can pass either a person or an employee to full-name. For company-directory-entry, though, you must have an employee. Or, more accurately, you must have something that resembles an employee, to the extent of having a :first-name, :last-name, and :extension. This is an example of duck typing: if it walks like a duck and quacks like a duck, we assume it is a duck, without asking it to present its IDuck papers.
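Here is how that plays out with plain maps. The sample people are made-up data for illustration.

```clojure
(defn full-name [p]
  (str (:first-name p) " " (:last-name p)))

(defn company-directory-entry [p]
  [(full-name p) (:extension p)])

(def person   {:first-name "Ada", :last-name "Lovelace"})
(def employee {:first-name "Stu", :last-name "Halloway", :extension 101})

;; full-name works on anything with a first and last name.
(full-name person)
;; => "Ada Lovelace"
(full-name employee)
;; => "Stu Halloway"

(company-directory-entry employee)
;; => ["Stu Halloway" 101]

;; A person without an extension is not rejected; the missing key is just nil.
(company-directory-entry person)
;; => ["Ada Lovelace" nil]
```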

Many Functions, Few Types

The example above demonstrates another negative consequence of idiomatic OO style: the over-specification of data types. The return value of companyDirectoryEntry is given its own unique type, AddressBookItem. Each new data type like AddressBookItem requires its own life-support system: constructors, accessors, equals, hashCode, and so on.

In Clojure, an address book item would simply be a vector or a map. No new types, and no life support system required. Moreover, an address book item can be manipulated with any of the large arsenal of functions in Clojure's sequence library.

To see the problem with overspecifying types, consider this method from the Apache Commons:

// From Apache Commons Lang, http://commons.apache.org/lang/
public static int indexOfAny(String str, char[] searchChars) {
    if (isEmpty(str) || ArrayUtils.isEmpty(searchChars)) {
        return -1;
    }
    for (int i = 0; i < str.length(); i++) {
        char ch = str.charAt(i);
        for (int j = 0; j < searchChars.length; j++) {
            if (searchChars[j] == ch) {
                return i;
            }
        }
    }
    return -1;
}

The purpose of indexOfAny is to find the index of the first occurrence of one of the searchChars that appears in str. Note the unnecessary specificity of types: it works only with strings and character arrays.

Here's the Clojure version, using the sequence library's map, iterate, and for forms:

(defn indexed [coll] (map vector (iterate inc 0) coll))
(defn index-filter [pred coll]
  (when pred 
    (for [[idx elt] (indexed coll) :when (pred elt)] idx)))

Here is an example calling index-filter:

(index-filter #{\a \e \i \o \u} "Lts f cnsnts nd n vwel")
-> (20)

The expression above finds the index of the first vowel in the string "Lts f cnsnts nd n vwel", that is, 20. But index-filter is more general than the Commons version in several ways:

1. index-filter returns all the matches, not just one.

(index-filter #{\a \e \i \o \u} "The quick brown fox")
-> (2 5 6 12 17)

2. index-filter works with any sequence, not just a string of characters. For example, the call below works against a range of integers:

(index-filter #{2 3 5 7} (range 6))
-> (2 3 5)

3. index-filter works with any predicate, not just a test against a character array. In the example below, the predicate is an anonymous function that tests for strings longer than three characters:

(index-filter #(> (.length %) 3) ["The" "quick" "brown" "fox"])
-> (1 2)

That is a lot of extra power, especially given that the function is shorter, easier to write, and easier to read (given some Clojure experience, of course) than the Commons version.
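If you do want the Commons behavior of returning just the first match, a thin wrapper is enough. The index-of-any name below is my own, and unlike the Java version this sketch returns nil rather than -1 when nothing matches.

```clojure
(defn indexed [coll] (map vector (iterate inc 0) coll))

(defn index-filter [pred coll]
  (when pred
    (for [[idx elt] (indexed coll) :when (pred elt)] idx)))

;; Hypothetical wrapper: the first matching index, or nil if there is none.
(defn index-of-any [pred coll]
  (first (index-filter pred coll)))

(index-of-any #{\a \e \i \o \u} "Lts f cnsnts nd n vwel")
;; => 20
(index-of-any #{\q} "abc")
;; => nil

;; index-filter is lazy, so it even copes with an infinite input.
(take 3 (index-filter even? (iterate inc 0)))
;; => (0 2 4)
```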

Conclusion

Clojure solves the same problems that OO solves, but it solves them in different ways. Instead of encapsulation, polymorphism, and inheritance, you have closures, namespaces, pure functions, immutable data, and multimethods. Idiomatic OO gives you a bloated type system with duplicated code hidden away behind encapsulation boundaries and little hope for thread safety. Clojure offers a radical alternative: a lean type system, a rich function library, and language-level concurrency support that is usable by mere mortals.

There is a lot more to Clojure than we have covered here: lazy and infinite sequences, destructuring, macros, software transactional memory, agents, seamless Java interop, and more. But those are topics for another day.

[This article was originally published in the May 2009 issue of NFJS, the Magazine. I will be speaking about Clojure at several upcoming NFJS events, come join the fun.]

Clojure in the Field

  • Posted By Stuart Halloway on July 03, 2009

Clojure is getting a lot of positive buzz, but what is it like building and shipping a real application? In this talk, we will cover the good, the bad, and the ugly of commercial Clojure development, including:

  • BDD meets FP: how we adapted spec and test practices for functional code
  • To wrap or not to wrap: working with Java libraries
  • The learning curve: training a team of non-Clojurists on Clojure
  • What you will wish for: third-party libraries you should know about, plus some that don't exist yet
  • Good names or die: living without object context
  • Shipping it: how to manage staging and deployment of Clojure code

Experience with Clojure is useful but not a prerequisite; we will introduce the key concepts as we go.

Relevance staff deliver technical talks at events around the world. Contact us to schedule "Clojure in the Field" for your organization, or view our curriculum if you are interested in a complete course.

Rifle-Oriented Programming with Clojure

  • Posted By Stuart Halloway on May 27, 2009

If you come to Clojure from an object-oriented background, you may not know where to start. It is sort of like looking at a rifle for the first time and asking "But where do I put the arrows?"

Clojure solves the traditional problems of OO (and then some!) but it does it in different ways. To learn how to translate your arrows (encapsulation, polymorphism, and inheritance) into bullets, check out my new article in the May issue of NFJS, the Magazine.

Also: I'll be spending the summer on the NFJS circuit, talking about Clojure, Git, and other good things. Come see us.

Programming Clojure Beta 9 Is Out

  • Posted By Stuart Halloway on April 04, 2009

Programming Clojure Beta 9 is now available. We are almost done, and most of the changes in this Beta are small.

What's new:

  • the notation conventions have changed to make console output and REPL results more distinct
  • a new subsection covering functions on vectors
  • a new subsection on sort functions
  • an example using map with more than one collection
  • an example using the :while option to a sequence comprehension
  • examples now use the new letfn form where appropriate
  • replicate is gone -- use repeat instead
  • an index!

I have also made the book more strict in discussions of laziness. Clojure sequences are lazy, but function evaluation is eager. In the Clojure community we often ignore this distinction, saying things like "iterate is lazy" when we really should say "iterate returns a lazy sequence." The book now uses the latter formulation.
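A tiny sketch of the distinction: calling iterate is an ordinary eager function call that returns right away, and the elements of the sequence it returns are computed only on demand. The calls counter and noisy-inc are invented for this example.

```clojure
(def calls (atom 0))

;; inc, plus a side effect so we can see when it actually runs.
(defn noisy-inc [x]
  (swap! calls inc)
  (inc x))

;; The call to iterate is evaluated eagerly, but it does no work on the elements.
(def numbers (iterate noisy-inc 0))

@calls
;; => 0

;; Work happens only when elements of the lazy sequence are consumed.
(take 5 numbers)
;; => (0 1 2 3 4)
```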

To make sure you have the latest, greatest version of the sample code from the book, go and grab the github repo.

Thanks again to everyone who has been offering feedback. Keep the feedback coming!

Clojure Interview on InfoQ

  • Posted By Stuart Halloway on January 31, 2009

Werner Schuster over at InfoQ has interviewed me about Clojure. Check it out.

Programming Clojure Beta 6 is Out

  • Posted By Stuart Halloway on January 30, 2009

Programming Clojure Beta 6 is now available. What’s new:

  • A new Chapter, “Functional Programming,” explains recursion, TCO, laziness, memoization, and trampolining in Clojure.
  • A new Section, “Creating Java Classes in Clojure,” shows how to create and compile Java classes from Clojure.
  • The snake example has been moved into the concurrency chapter. The snake demonstrates how to divide your model layer into two parts: a functional model and a mutable model.
  • The STM example in the introduction now uses alter instead of commute, which allowed a race condition. Given the problem domain of the example, the race condition was acceptable. However, explaining this in the introductory chapter would have been distracting.
  • The lancet runonce function now uses locking instead of an agent. Yes, you can do plain old-fashioned locking in Clojure! Agents are unsuitable for the kind of coordination lancet requires, because you cannot await an agent while inside another agent.
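To give a flavor of locking in Clojure, here is a minimal run-once sketch. It is not lancet's actual implementation, just an illustration of the locking macro guarding a check-then-act sequence; the state map layout is my own.

```clojure
(defn runonce
  "Return a function that calls f at most once and caches the result."
  [f]
  (let [lock  (Object.)
        state (atom {:ran? false, :result nil})]
    (fn [& args]
      ;; locking makes the check and the update a single critical section.
      (locking lock
        (when-not (:ran? @state)
          (reset! state {:ran? true, :result (apply f args)}))
        (:result @state)))))

(def calls (atom 0))
(def init-once (runonce (fn [] (swap! calls inc) :initialized)))

(init-once)
;; => :initialized
(init-once)
;; => :initialized
@calls
;; => 1
```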

While I am talking about lancet: it now has its own repository. I have added integration with clojure.contrib.shell-out, so lancet can now call either Ant tasks or other applications. It is still far from being a replacement for Ant or Rake, but maybe your contributions can change that!

The book is now prose-complete, so if you have been waiting for the whole story, this is it. A number of readers have made suggestions for additional topics. If you have suggestions, please add them as comments to this post. For reasons of space and time, most new topics will appear in a series of articles on this blog, not in the book itself.

Clojure and clojure-contrib continue to evolve. To make sure you have the latest, greatest version of the sample code from the book, go and grab the github repo.

Thanks again to everyone who has been offering feedback. I have cleared almost 400 entries from the errata/suggestion page.  Keep the feedback coming!

Programming Clojure Beta 5 is Out

  • Posted By Stuart Halloway on January 13, 2009

Programming Clojure Beta 5 is now available. What's new:

A new chapter, Clojure in the Wild, with information on

I have rewritten Section 1.1 "Clojure Coding Quick Start." All dependent libraries (Clojure, clojure-contrib, etc.) are now prebuilt and included with the sample code for the book. This should make it much easier for people to get started.

I have rewritten Section 2.6 "Metadata" to clearly explain the difference between user metadata and compiler metadata.

Clojure and clojure-contrib continue to evolve. To make sure you have the latest, greatest version of the sample code from the book, go and grab the github repo.

Thanks to everyone who has been offering feedback. I have cleared over 300 entries from the errata/suggestion page.  Keep the feedback coming!

On Lisp -> Clojure, Chapter 9

  • Posted By Stuart Halloway on December 17, 2008

This article is part of a series describing a port of the samples from On Lisp (OL) to Clojure. You will probably want to read the intro first.

This article covers Chapter 9, Variable Capture.

Macro Argument Capture

Macros and "normal" code are written in the same language, and can share access to names, data, and code. This is the source of their power, but can also cause subtle bugs. What happens if a macro caller and a macro implementer both try to use the same name? The macro can "capture" the name, leading to unintended consequences.

OL begins with an example of argument capture, a broken definition of for. Here is a similar macro in Clojure:

(defmacro bad-for [[idx start stop] & body]
  `(loop [~idx ~start limit ~stop]
     (if (< ~idx ~stop)
       (do
         ~@body
         (recur (inc ~idx) limit)))))

The problem is the name limit introduced inside the macro. If you call bad-for after binding the name limit, strange things will happen. What if you try:

(let [limit 5] 
  (bad-for [i 1 10] 
    (if (> i limit) (print i))))

Presumably the intent here is to print some numbers greater than five. But in many Lisps, this would print nothing, because the bad-for macro invisibly binds limit to ten.

Clojure catches this problem early, and fails with a descriptive error:

(let [limit 5] (bad-for [i 1 10] (if (> i limit) (println i))))
-> java.lang.Exception: Can't let qualified name: ol.chap-09/limit

Clojure makes it difficult to accidentally capture limit, by resolving symbols into a namespace. The limit inside the macro resolves to ol.chap-09/limit, and there is no name collision.
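To see the resolution for yourself, here is a small sketch that compares a syntax-quoted symbol with a plainly quoted one. The names resolved and plain are mine, made up for this illustration, and the namespace you see printed depends on where you evaluate the code.

```clojure
;; Syntax-quote (`) qualifies a plain symbol with the current namespace.
;; An ordinary quote (') leaves the symbol untouched.
(def resolved `limit)   ; something like user/limit at a fresh REPL
(def plain 'limit)      ; just limit

(println resolved "vs" plain)

;; Only the syntax-quoted symbol carries a namespace part.
(println (namespace resolved) (namespace plain))
```

A loop or let binding cannot target a namespace-qualified name, which is why bad-for fails at compile time instead of silently capturing limit.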

Of course, you do not want your macros to use a shared global name either! What you really want is for macros to use their own guaranteed-unique names. Clojure provides this via auto-gensyms. Simply append # to limit in the bad-for example above, and you get good-for:

(defmacro good-for [[idx start stop] & body]
  `(loop [~idx ~start limit# ~stop]
     (if (< ~idx limit#)
       (do
         ~@body
         (recur (inc ~idx) limit#)))))

Now the macro will use a unique generated name like limit__395, and callers can use good-for as expected:

(let [limit 5] (good-for [i 1 10] (if (> i limit) (println i))))
6
7
8
9
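If you want to look at an auto-gensym directly, you can syntax-quote one outside of any macro. This is only a sketch: pair and unrelated are names I made up, and the numeric suffix in the generated symbol will differ from run to run.

```clojure
;; Within one syntax-quoted form, every limit# becomes the same generated symbol.
(def pair `(limit# limit#))
(println pair)

;; A separate syntax-quoted form gets its own, different symbol.
(def unrelated `limit#)
(println unrelated)

;; The two uses inside pair agree with each other, but not with unrelated.
(println (= (first pair) (second pair)))
(println (= (first pair) unrelated))
```

The generated symbols have no namespace part, so unlike the limit in bad-for they are legal binding targets in loop and let.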

Symbol Capture

Another form of unintended capture is symbol capture, where a symbol in the macro unintentionally refers to a local binding in the environment. OL demonstrates the problem with this example:

First, w is a global collection of warnings that have occurred when using a library. In Clojure:

(def w (ref []))

This is different from the OL implementation because in Clojure data structures are immutable, and mutable things must be wrapped in a reference type that has explicit concurrency semantics. In the code above the ref wraps the immutable [].

Second, the gripe macro adds a warning to w, and returns nil. gripe is intended to be used when bailing out of a function called with bad arguments. In Clojure:

(defmacro gripe [warning]
  `(do
     (dosync (alter w conj ~warning))
     nil))

Again, this is fairly different from OL because you must be explicit about mutable state. To update w you must use a transaction (dosync) and a specific kind of update function (such as alter).

Third, there is a library function sample-ratio that performs some kind of calculation, the details of which are irrelevant to the example. sample-ratio also uses gripe to warn and bail out for certain bad inputs. In Clojure:

(defn sample-ratio [v w]
  (let [vn (count v) wn (count w)]
    (if (or (< vn 2) (< wn 2))
      (gripe "sample < 2")
      (/ vn wn))))

This is practically identical to the OL version, since there is no mutable state to (directly) deal with.

Since we are talking about symbol capture, you can probably guess the problem: What happens when the global w for warnings collides with the local w argument in sample-ratio?

In Common Lisp, this sort of capture would cause the error message to be added to the wrong collection: the local samples w instead of the global warnings w.

In Clojure, this just works. The global w resolves into a namespace, and does not collide with the local one.
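Here is a quick check, restating the three definitions from this section so that the snippet runs on its own. The sample inputs are my own, chosen so that the first argument is too short and triggers gripe.

```clojure
;; Global collection of warnings: an immutable vector wrapped in a ref.
(def w (ref []))

;; The w inside the syntax-quote resolves to the global var, not to a local.
(defmacro gripe [warning]
  `(do
     (dosync (alter w conj ~warning))
     nil))

;; The parameter w shadows the global inside the function body,
;; but the code that gripe expands to still refers to the global.
(defn sample-ratio [v w]
  (let [vn (count v) wn (count w)]
    (if (or (< vn 2) (< wn 2))
      (gripe "sample < 2")
      (/ vn wn))))

(println (sample-ratio [1] [1 2 3])) ; bad input, so gripe returns nil
(println @w)                         ; the warning landed in the global w
```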

More Complex Macros

Clojure's namespaces and auto-gensyms take care of many common problems in macros, but what if you really want capture? You can capture symbols by unquoting them with the unquote character (~, a tilde) and then requoting them with a non-resolving quote character (', a single quote). For example, here is a bad version of gripe that goes out of its way to do the wrong thing and capture w:

(defmacro bad-gripe [warning]
  `(do
     (dosync (alter ~'w conj ~warning))
     nil))

I am not going to show more complex macros that really need this feature. My point here is to show that Clojure doesn't make macros safer by compromising their power. You can still do nasty things, you just have to be more deliberate about it.
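To see what the ~' combination does to the generated code, you can evaluate the same template outside of a macro. A small sketch (the name captured is made up for this example):

```clojure
;; ~'w unquotes a plainly quoted symbol, so w is spliced in without
;; namespace resolution. alter and conj are resolved as usual.
(def captured `(alter ~'w conj "oops"))

(println captured)

;; The second element is the bare symbol w, free to refer to whatever
;; local binding happens to be in scope where the macro is used.
(println (second captured) (namespace (second captured)))
```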

Interestingly, Clojure protects you from bad-gripe, even after you go to the trouble of introducing inappropriate symbol capture. Here is a bad-sample-ratio that uses the buggy bad-gripe:

(defn bad-sample-ratio [v w]
  (let [vn (count v) wn (count w)]
    (if (or (< vn 2) (< wn 2))
      (bad-gripe "sample < 2")
      (/ vn wn))))

If you try to call bad-sample-ratio with bad inputs, bad-gripe will not be able to modify the wrong collection:

 (bad-sample-ratio [] [])
-> java.lang.ClassCastException: clojure.lang.PersistentVector cannot\
   be cast to clojure.lang.Ref

Now you see how having immutability as the default can protect you from bugs. The global w is an explicitly mutable reference. But the local w is an implicitly immutable vector. When bad-gripe tries to update the wrong collection, it is thwarted by the fact that the collection is immutable.

Wrapping up

Clojure makes simple macros easier and safer to write. The combination of namespace resolution and auto-gensyms prevents many irritating bugs.

Clojure still has the power to write more complex macros when you need it. With the right combination of unquoting and quoting, you can undo the safety net and write any kind of macro you want.

One final note: Because they are ported straight from Common Lisp, many of the examples here are not idiomatic Clojure. In Clojure most uses of imperative loops such as good-for would be replaced by a more functional style. A good example of this is Clojure's own for, which performs sequence comprehension.
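As a rough comparison, here is how the good-for example from earlier in this article might look with Clojure's for. This is a sketch rather than a drop-in replacement: it returns a lazy sequence of values instead of printing them, and over-limit is a name I made up.

```clojure
(def limit 5)

;; for is a sequence comprehension: it describes the values to produce,
;; with :when acting as a filter. No loop counter, no mutation.
(def over-limit
  (for [i (range 1 10)
        :when (> i limit)]
    i))

;; Printing is a side effect, so it is kept separate from the comprehension.
(doseq [i over-limit]
  (println i))
```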

Notes

Other Resources

If you find this series helpful, you might also like:

Revision history

  • 2008/12/17: initial version

On Lisp -> Clojure

  • Posted By Stuart Halloway on December 12, 2008
  • Tags

I am porting the examples from the macro chapters of Paul Graham's On Lisp (OL) to Clojure.

My ground rules are simple:

  • I am not going to port everything, just the code samples that interest me as I re-read On Lisp.
  • Where Paul introduced macro features in a planned progression, I plan to use whatever Clojure features come to mind. So I may jump straight into more "advanced" topics.

Please do not assume that this port is a good introduction to Lisp! I am cherry-picking examples that are interesting to me from a Clojure perspective. If you want to learn Lisp, read OL. In fact, you should probably read the relevant chapters in OL first, no matter what.

The Series

Note: Fogus is also porting On Lisp to Clojure.

Other Stuff

If you find this series helpful, you might also like:

Talks

I am available to give conference talks on Clojure. Check the schedule for an event near you, or contact Relevance (info@thinkrelevance.com) to schedule an event.

Notes


On Lisp -> Clojure, Chapter 7

  • Posted By Stuart Halloway on December 12, 2008
  • Tags

This article is part of a series describing a port of the samples from On Lisp (OL) to Clojure. You will probably want to read the intro first.

This article covers Chapter 7, Macros.

A Few Simple Macros

OL begins with a simple nil! macro that sets something to nil. nil! is implemented as a macro in Common Lisp (CL) because it needs to generate a special form. Clojure puts much more careful boundaries around mutable state, so most Clojure data structures are not set-able at all. The few things that can be set are reference types, each with an explicit API and concurrency semantics.

Because setters go through an explicit API instead of a special form, the Clojure nil! does not need to be a macro at all. Here is a nil! for Clojure atoms:

(defn nil! [at]
  (swap! at (fn [_] nil)))

The swap! function is specific to atoms. Usage for nil! looks like:

(def a (atom 10))
(nil! a)
@a
-> nil  
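Since the new value does not depend on the old one, the same effect can be had with reset!, which sets an atom's value directly. A minimal sketch, assuming your Clojure version provides reset! (nil-via-reset! and b are names made up for this example):

```clojure
;; reset! replaces the atom's value outright and returns the new value,
;; so no function of the old value is needed.
(defn nil-via-reset! [at]
  (reset! at nil))

(def b (atom 10))
(nil-via-reset! b)
(println @b)
```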

The next interesting macro in OL is nif, which demonstrates the use of backquoting. One way to implement Clojure nif is:

(use '[clojure.contrib.fcase :only (case)])
(defmacro nif [expr pos zer neg]
  `(case (Integer/signum ~expr) 
     -1 ~neg
     0 ~zer
     1 ~pos))

There are a few interesting differences from CL here:

  • Clojure unquoting uses ~ and ~@ instead of CL's , and ,@. This allows Clojure to treat commas as whitespace.
  • Clojure does not have a built-in signum, but it has access to all of Java, including Integer/signum.
  • Clojure's case is not part of core, and is provided by Clojure Contrib.
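If clojure.contrib.fcase is not available in your setup, nif can also be sketched with nothing but core functions. This is an alternative of my own, not the version from OL: nif-cond is a made-up name, and it swaps the signum-plus-case dispatch for pos? and zero?.

```clojure
;; n# is an auto-gensym, so the expression is evaluated only once
;; and the temporary name cannot clash with the caller's names.
(defmacro nif-cond [expr pos zer neg]
  `(let [n# ~expr]
     (cond
       (pos? n#)  ~pos
       (zero? n#) ~zer
       :else      ~neg)))

(println (map #(nif-cond % :p :z :n) [0 2.5 -8]))
```

Like the original nif, only the selected branch is evaluated, which is the reason for making it a macro rather than a function.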

Defining Simple Macros

OL demonstrates the "fill in the blanks" approach to writing macros:

  • Write the desired expansion.
  • Write the desired macro invocation form.
  • Use backquoting to create a template based on the desired expansion.
  • Use unquoting to substitute forms from the macro invocation into the template.

As examples, OL uses our-when and our-while. The Clojure equivalents are:

(defmacro our-when [test & body]
  `(if ~test
     (do
       ~@body)))
(defmacro our-while [test & body]
  `(loop []
     (when ~test
       ~@body
       (recur))))

There is one interesting new thing here. Clojure's loop/recur is an explicit way to denote a self-tail-call so that Clojure can implement it with a non-stack-consuming iteration. (Clojure cannot optimize tail calls in a generic way due to limitations of the JVM.)

It is also worth noting that while loops are uncommon in Clojure. They rely on side effects that change the result of test, and most Clojure functions avoid side effects.
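To connect the two macros back to the "fill in the blanks" steps, the following sketch restates them, inspects the expansion of our-when with macroexpand-1, and drives our-while with an atom. The counter atom is my own addition for the demonstration.

```clojure
(defmacro our-when [test & body]
  `(if ~test
     (do
       ~@body)))

(defmacro our-while [test & body]
  `(loop []
     (when ~test
       ~@body
       (recur))))

;; The invocation form goes in, the filled-in template comes out.
(println (macroexpand-1 '(our-when (pos? 1) :a :b)))

;; our-while only terminates if the body changes the result of the test,
;; so the example needs a piece of mutable state.
(def counter (atom 0))
(our-while (< @counter 3)
  (swap! counter inc))
(println @counter)
```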

Destructuring in Macros

Both Clojure and CL support destructuring in macro definitions. The OL example of this is a when-bind macro. Here is a literal translation in Clojure:

(defmacro when-bind [bindings & body]
  (let [[form tst] bindings]
    `(let [~form ~tst]
       (when ~form
         ~@body))))

The [form tst] is a destructuring bind. The first element of bindings binds to form, and the second element to tst. Usage looks like this:

 (when-bind [a (+ 1 2)] (println "a is" a))
a is 3

Do not use the when-bind as defined above. Clojure provides a better version called when-let:

; from Clojure core
(defmacro when-let
  [bindings & body]
  (if (vector? bindings)
    (let [[form tst] bindings]
      `(let [temp# ~tst]
         (when temp#
           (let [~form temp#]
             ~@body))))
    (throw (IllegalArgumentException.
             "when-let now requires a vector for its binding"))))

when-let adds two features not present in when-bind:

  • when-let requires that the binding form be a vector. This leads to the "arguments in square brackets" style that distinguishes Clojure from many Lisps.
  • when-let introduces a temporary binding temp# using Clojure's auto-gensym feature.

The temporary binding of temp# keeps the binding form from being expanded directly into the when, because some binding forms are not legal for evaluation. The following output shows the difference:

(when-bind [[a & b] [1 2 3]] (println "b is" b))
-> java.lang.Exception: Unable to resolve symbol: & in this context
(when-let [[a & b] [1 2 3]] (println "b is" b))
-> b is (2 3)

If it is not clear to you why when-bind doesn't work, try calling macroexpand-1 on both the forms above.
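For the when-bind half of that exercise, the sketch below restates the macro and stores its expansion (the name expansion is made up here). Because macroexpand-1 only rewrites the form without compiling the result, it succeeds even though the expanded code would not compile.

```clojure
(defmacro when-bind [bindings & body]
  (let [[form tst] bindings]
    `(let [~form ~tst]
       (when ~form
         ~@body))))

(def expansion
  (macroexpand-1 '(when-bind [[a & b] [1 2 3]] (println "b is" b))))

;; The destructuring form [a & b] shows up twice: once as a binding
;; target, which is fine, and once as the test of when, where it is
;; evaluated as an ordinary vector and & cannot be resolved.
(println expansion)
```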

Wrapping up

The concepts in OL Chapter 7 translate fairly directly from Common Lisp into Clojure. The bigger differences are choices of idiom. Many of the examples in Common Lisp presume mutable state. In the typical Clojure program these forms would be in the minority.

Notes

Revision history

  • 2008/12/12: initial version

Living Lazy, Without Variables

  • Posted By Stuart Halloway on December 01, 2008
  • Tags

Programmers coming to functional languages for the first time cannot imagine life without variables. I address this head-on in the Clojure book. In Section 2.7 (free download here), I port an imperative method from the Apache Commons Lang to Clojure. First the Java version:

// From Apache Commons Lang, http://commons.apache.org/lang/
public static int indexOfAny(String str, char[] searchChars) {
  if (isEmpty(str) || ArrayUtils.isEmpty(searchChars)) {
      return -1;
  }
  for (int i = 0; i < str.length(); i++) {
      char ch = str.charAt(i);
      for (int j = 0; j < searchChars.length; j++) {
        if (searchChars[j] == ch) {
            return i;
        } 
      }
  }
  return -1;
}

And now the Clojure code. I have shown the supporting function indexed as well:

(defn indexed [s] (map vector (iterate inc 0) s))
(defn index-of-any [s chars]
  (some (fn [[idx char]] (if (get chars char) idx)) 
          (indexed s)))

There are many things I like about the Clojure version, but I want to focus on something I didn't mention already in the book. A reader thought the Clojure version did too much work:

...the [Java] version can be seen as *more efficient* when a match is found because scanning stops right there, whereas "indexed" constructs the whole list of pairs, regardless of whether or not a match WILL be found....

The reader's assumption is reasonable, but incorrect. Clojure's sequence library functions are generally lazy. So the call to indexed is really just a promise to generate indexes if they are actually needed.

To see this, create a logging-seq that writes to stdout every time it actually yields an element:

(defn logging-seq [s]
  (if s
    (do (println "Iterating over " (first s))
        (lazy-cons (first s) (logging-seq (rest s))))))

Now, you can add logging-seq to indexed so that each element of indexed is of the form [index, element, logged-element].

(defn indexed [s] (map vector (iterate inc 0) s (logging-seq s)))

Test the modified indexed function at the Clojure REPL:

user=> (indexed "foo")
Iterating over  f
(Iterating over  o
[0 \f \f] Iterating over  o
[1 \o \o] [2 \o \o])

As you can see, the indexed sequence is only produced as needed. (At the REPL it is needed to print the return value.)

Finally, you can test index-of-any and see that Clojure only produces enough of the sequence to get an answer. For a match on the first character, it only goes to the first character:

(index-of-any "foo" #{\f})
Iterating over  f
0

If there is no match, index-of-any has to traverse the entire string:

(index-of-any "foo" #{\z})
Iterating over  f
Iterating over  o
Iterating over  o
nil
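The listings above use lazy-cons, which later Clojure releases replaced with lazy-seq. If you are following along on a newer version, here is a sketch of the same experiment under that assumption. The rewritten logging-seq is my adaptation, not the code from the book, and the exact interleaving of log lines with REPL output may differ from what is shown above.

```clojure
;; lazy-seq defers its body until the sequence is first consumed,
;; so each element is logged only when something asks for it.
(defn logging-seq [s]
  (lazy-seq
    (when-let [s (seq s)]
      (println "Iterating over" (first s))
      (cons (first s) (logging-seq (rest s))))))

;; Each element has the form [index element logged-element].
(defn indexed [s]
  (map vector (iterate inc 0) s (logging-seq s)))

;; The destructuring [idx char] picks up only the first two parts.
(defn index-of-any [s chars]
  (some (fn [[idx char]] (if (get chars char) idx))
        (indexed s)))

(println (index-of-any "foo" #{\f})) ; stops after the first character
(println (index-of-any "foo" #{\z})) ; has to walk the whole string
```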

So give up on those variables, and live lazy!


Clojure Wins Again

  • Posted By Stuart Halloway on November 21, 2008
  • Tags

Steve Yegge's most recent post takes a right angle turn about a third of the way through, and begins a comparison of Emacs Lisp and JavaScript.

And the winner is ... Clojure!

OK, Steve didn't say that. What he did do was call out things he liked about JavaScript and Emacs Lisp.

For JavaScript:

  • momentum
  • (namespace) encapsulation
  • delegation (polymorphism?)
  • properties (by Steve's definition)
  • serialize to source

For Emacs Lisp:

  • Macros
  • S-Expressions

I first picked up Clojure looking for many of the same things that Steve wants. I found them. Clojure can do all the things on both lists above. (Serialize to source isn't formal yet, but check the mailing list. And of course, you will have to judge "momentum" for yourself.)

The scary thing is that Clojure wins the language war before you even learn about its signature features. When I started exploring Clojure, I quickly realized it had everything I wanted, which could be summarized as "Lisp that really embraces the Java platform."

Then Clojure changed the definition of what I wanted. Now I also want

If you have half an hour, watch a compelling vision of what software development will look like in 2010.


Clojure Beta Book Available

  • Posted By Stuart Halloway on November 05, 2008
  • Tags

The Clojure Beta book is now available. Here's the Table of Contents. (Chapters with an asterisk are included in this beta.)

  • Preface*
  • Getting Started*
  • Exploring Clojure*
  • Working with Java*
  • Unifying Data with Sequences*
  • Functional Programming
  • Concurrency*
  • Macros
  • Multimethods
  • Third-Party Libraries
  • Case Study

Because this is a Beta book, and Clojure is continuing to evolve, there will be errata. Please let me know any problems you find, and I will address them in the next Beta.

Other Clojure resources