Sunday, September 9, 2012

Beautiful Clojure

I ran across some beautiful code today.

I have the book Beautiful Code, though I've only read the chapter by Douglas Crockford on writing a simplified JavaScript parser using Top Down Operator Precedence (say that three times fast). And I skimmed the MapReduce chapter once. It's on my TODO list. Along with a lot of other things. So much awesome CS to do, so little ...

So that's not the beautiful code I'm referring to.

I spent the last few days going back to do some problems at 4Clojure. I finished about half of them a number of months ago and now it's time to make some progress on them again.

So I picked an "easy" one (so they claim). And got stumped. It was Problem 28: Flatten a Sequence. In the end, after only coming up with solutions involving mutable state (for shame), I decided to learn from the masters and look at the source code to clojure.core/flatten.

/* ---[ Beautiful Clojure, version 1 ]--- */

If you haven't seen it before, it's quite interesting:

It treats a nested collection, such as [1 2 [3 [4 5] 6]], like a tree, where the collection and its subcollections are branches (non-leaf nodes) and the values are the leaf-nodes.

It depends on tree-seq to do most of the work. I won't reproduce it here but take a look to understand it better.

After grokking the idea of treating the data structure like a tree, my next question was "why does flatten have to filter the results?". Here's why:

test=> (def ds [1 2 [3 4 [5 "a"]]])
test=> (tree-seq sequential? seq ds)
([1 2 [3 4 [5 "a"]]] 1 2 [3 4 [5 "a"]] 3 4 [5 "a"] 5 "a")

I see, tree-seq returns both branches (with their children) and leaf-nodes. To get just the leaf-nodes, I have to filter out the branches.

That is a fascinating idea, but I have to admit it wasn't what I expected the code to flatten to look like.

Then I solved the problem on 4clojure and was eager to see how others had solved it. I found two right away that immediately struck me as beautiful code.

/* ---[ Beautiful Clojure, version 2 ]--- */

The first, from nikkomega, uses mapcat, which works like map but calls (apply concat) on the result. I am in awe of its elegance:

Use map to iterate over each element in the data structure. If it is a "value", like 3, then return '(3). If it is a collection (more precisely implements the Sequential Clojure interface), then recursively dive into that collection.

In the end the outer map returns a sequence: '('(1) '(2) '(3) '(4) '(5) '("a")) (using the collection I listed above). Applying concat gives us the flattened sequence: '(1 2 3 4 5 "a").

Like clojure.core/flatten, this one also returns a LazySeq, but we get that for free - both map and concat return lazy-seqs.

/* ---[ Beautiful Clojure, version 3 ]--- */

The second solution that caught my eye is from unlogic. In the end it is pretty much the same algorithm, but uses reduce to iterate over all the elements and concats the results as it goes:

/* ---[ Beautiful Groovy? ]--- */

So I wondered, could I do this in Groovy, a language that provides some nice functional programming features?

Here is my best shot:

It prints:

[1, 2, 3, 4, 5, a]
[1, 2, 3, 4, 5, a]

(Disclaimer: I'm still learning Groovy so if you see a better way, I'd be happy to hear it.)

inject is (unfortunately) the name for reduce in Groovy (and other languages, like Ruby). In Groovy, there is no mapcat (that I know of) so I wrote it using inject and +, the latter being the equivalent of Clojure's concat when applied to collections.

You'll notice that I am using mutable state here: the shovel operator, <<, pushes onto and modifies in place the data structure I'm building up and the concat in flat3 has to be assigned back to x. I'd like to see a version that doesn't use immutable state, but perhaps even here it is justified? No less than Rich Hickey stated in his seminal "Are We There, Yet?" presentation that he's "not a fundamentalist, but rather a pragmatist": there are times when working with local variables in a contained setting that mutating state is a pragmatic and safe thing to do. Does that apply here?

Comments welcome.

No comments:

Post a Comment