Saturday, January 12, 2013

Go Concurrency Constructs in Clojure, part 3: why go-lightly?

There's a legendary example called the concurrent prime sieve, which is kind of an amazing thing. It was the first truly beautiful concurrent program I think I ever saw.

--Rob Pike, Google IO 2012 conference talk

/* ---[ Why go-lightly? ]--- */

In part 1 and part 2 of this blog series, I introduced the basics of Go channels, goroutines and the Go select statement. I then walked through initial implementations of these ideas in the go-lightly library and showed how to use lamina channels and facilities to do Go-style CSP concurrent programming.

If the lamina library, which is 2+ years old now (thus reasonably mature and stable) and under active development, can be used for this, why am I proposing a new library? Well, I might have built one anyway just to get familiar with CSP-style programming and improve my Clojure skills, but ultimately I do think there is good justification for a new library focused specifically on this style.

The lamina library is fundamentally focused on asynchronous event-driven programming. Since dealing with callbacks can get messy and is hard to structure in a functional way, the core construct and central metaphor of Zach Tellman's approach to async programming is a channel that is used for putting and pulling events. Since a key focus of async event-driven programming is to avoid blocking, there are very few blocking operations in the lamina library. One place where blocking is provided is on the read side: you can choose to wait when pulling a value out of a channel. This is the part we've seen in use to emulate Go channels.

However, the primary use case for lamina channels is an event queue, which means you want it to be unbounded and non-blocking, especially for events being put onto the queue. Thus, lamina uses a Java ConcurrentLinkedQueue underneath.

Go channels, however, come in two flavors: bounded, blocking queues of size 0 (every put has to have a corresponding take) and bounded, asynchronous queues of a size you specify in the make function. The lamina channel really maps to neither, though in some scenarios it can be used for async queues or blocking queues where you need to block on read (but not write).
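The two flavors can be sketched in Go itself. This is a minimal illustration rather than anything from go-lightly or lamina; `sendWithoutBlocking` is a hypothetical helper introduced here that uses a `select` with a `default` case to probe whether a send would have to wait.

```go
package main

import "fmt"

// sendWithoutBlocking is a hypothetical helper for this sketch: it reports
// whether v could be placed on ch immediately, without waiting for a receiver.
func sendWithoutBlocking(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan int)  // size 0: every send needs a waiting receiver
	buffered := make(chan int, 2) // size 2: sends succeed until the buffer is full

	fmt.Println(sendWithoutBlocking(unbuffered, 1)) // false: no receiver is waiting
	fmt.Println(sendWithoutBlocking(buffered, 1))   // true
	fmt.Println(sendWithoutBlocking(buffered, 2))   // true
	fmt.Println(sendWithoutBlocking(buffered, 3))   // false: buffer already holds 2 values
	fmt.Println(len(buffered), cap(buffered))       // 2 2
}
```

A plain `ch <- v` (without the `select`) would simply block in the two cases that return false here, which is the behavior an unbounded event queue does not give you on the write side.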

As I discussed in the first blog entry, Java's java.util.concurrent package already provides these Go channel types and even more variations on them. The bounded, blocking queue maps to a SynchronousQueue or a TransferQueue (if you only use the transfer and take methods). The bounded, asynchronous queue maps to a LinkedBlockingQueue created with a fixed capacity.

Thus, go-lightly proposes to wrap these Java concurrent queues, specifically to facilitate Go-style CSP concurrency semantics.

/* ---[ Why bounded blocking queues? ]--- */

So what is a use case where I really need a bounded blocking queue?

First, a note on terminology: from here on out I will use the term channel or Go channel to refer to a blocking queue of size 0, and buffered channel to refer to a queue of a size you choose, where a put does not block until the buffer is full. This is the Go terminology.

Rephrasing the question: when would I need a channel and not a buffered channel? With a buffered channel you "fire-and-forget" a message and let some other thread pluck it off the buffered channel when it's ready.

A channel, on the other hand, is a synchronization mechanism between threads/routines, similar to a CountDownLatch, a CyclicBarrier or the join of a fork-join model, except that you not only synchronize threads but also pass messages between them. It is a synchronizing communication tool.
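To make the "synchronize and communicate" point concrete, here is a small Go sketch, assuming nothing beyond the standard library. The `pingPong` function and its log format are invented for illustration. Two goroutines take turns over a pair of unbuffered channels, and each hand-off both transfers a value and forces the two sides to meet.

```go
package main

import "fmt"

// pingPong is a hypothetical example function: the caller and a second
// goroutine alternate strictly, because every send on an unbuffered channel
// waits for the matching receive.
func pingPong(rounds int) []string {
	ping := make(chan int) // unbuffered: sender and receiver rendezvous
	pong := make(chan int)
	var log []string

	go func() {
		for n := range ping {
			// Safe without a lock: the channel hand-offs order every append.
			log = append(log, fmt.Sprintf("pong %d", n))
			pong <- n
		}
	}()

	for i := 1; i <= rounds; i++ {
		log = append(log, fmt.Sprintf("ping %d", i))
		ping <- i // blocks until the other goroutine takes the value
		<-pong    // blocks until the other goroutine answers
	}
	close(ping) // lets the second goroutine's range loop finish
	return log
}

func main() {
	fmt.Println(pingPong(2)) // [ping 1 pong 1 ping 2 pong 2]
}
```

With a latch or barrier you would get the same lock-step ordering but would still need a separate, safely shared place to put the value being exchanged. Here the channel carries it.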

/* ---[ Beautiful concurrency ]--- */

The golang site provides an example of a concurrent prime sieve algorithm that, as implemented, requires blocking channels. If you were to use a lamina channel or a buffered channel, you'd potentially have some threads running way ahead of the others, unnecessarily consuming memory and wasting CPU cycles.

This is the "first truly beautiful concurrent program" Pike referred to in his Google IO 2012 talk.

Let's look at the Go implementation from the Golang website first:

The Generate and Filter functions absolutely need to synchronize - when they push data onto a channel, they need to wait until the consumer (a chained filter function or main) is ready to pull it off.
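As a sketch of that synchronization, the following Go program follows the shape of the well-known prime sieve example: a generator goroutine feeds a chain of filter goroutines, one per prime found, all connected by unbuffered channels. The `firstPrimes` wrapper is a hypothetical addition made here so the result can be collected and inspected. Treat the whole listing as an approximation of the published example rather than a verbatim copy.

```go
package main

import "fmt"

// Generate sends the sequence 2, 3, 4, ... to ch. Each send blocks until
// the next stage in the chain is ready to receive.
func Generate(ch chan<- int) {
	for i := 2; ; i++ {
		ch <- i
	}
}

// Filter copies values from in to out, dropping those divisible by prime.
func Filter(in <-chan int, out chan<- int, prime int) {
	for {
		i := <-in
		if i%prime != 0 {
			out <- i
		}
	}
}

// firstPrimes is a hypothetical helper (not part of the published example)
// that daisy-chains filters and returns the first n primes. The goroutines
// it starts stay blocked on their channels afterwards, which is acceptable
// for a short-lived demo program.
func firstPrimes(n int) []int {
	primes := make([]int, 0, n)
	ch := make(chan int) // unbuffered: no stage can run ahead of its consumer
	go Generate(ch)
	for i := 0; i < n; i++ {
		prime := <-ch
		primes = append(primes, prime)
		next := make(chan int)
		go Filter(ch, next, prime)
		ch = next
	}
	return primes
}

func main() {
	fmt.Println(firstPrimes(10)) // [2 3 5 7 11 13 17 19 23 29]
}
```

Because every channel has size 0, the generator only produces a new candidate when the first filter asks for one, and each filter only passes a value on when the next stage is waiting. Swapping in `make(chan int, 100)` would still give correct primes, but every stage could then race up to 100 values ahead of what is actually needed.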

Here is a Clojure version using go-lightly:

Happily, the implementations are pretty much the same line-for-line.

/* ---[ Next ]--- */

In the next blog entry, I will contrast the Go CSP concurrency model to the Clojure concurrency model and add some functions to allow channels to interoperate with the Clojure seq abstraction.

/* ---[ Resources ]--- */

Both of the prime sieve examples are available in the GitHub go-lightly repo:

Blog entries in this series:
