Sunday, January 13, 2013

Go Concurrency Constructs in Clojure, part 4: idioms and tradeoffs

The Go approach to concurrent software can be characterized as: Don't communicate by sharing memory, share memory by communicating.... You use the channel to pass the data back and forth between the Go routines and make your concurrent program operate that way.

--Rob Pike, Google IO 2012 conference talk
Programmers know the benefits of everything and the tradeoffs of nothing.

--Rich Hickey, Strangeloop 2011 talk "Simple Made Easy"

/* ---[ Functional Clojure Idioms ]--- */

In the preface to Eloquent Ruby, Russ Olsen relates a story that after teaching a Ruby class one of his students complained that his Ruby programs tended to end up looking like his Java programs. I remember that same experience when I first learned Ruby in the early 2000s. In fact, I set up conventions for myself in Ruby to try to "force" it to be more like Java. I hadn't fully grokked that changing languages does not mean just learning the syntax and libraries. It means adopting the idioms, the approaches and even the constraints that the designer put into the language and that have arisen in the language community. It often means changing the way you think. That is certainly true of Clojure, perhaps more than any language I've ever learned. (Side note: I haven't learned Haskell yet!)

Go is intended primarily to be systems-programming language, with a strong focus on writing concurrent server programs. While it does include some more "modern" functional features, such as closures and first class functions, it is not a functional programming language.

These blog posts and the go-lightly library are my attempt to think about how to adopt a Go-like CSP concurrency programming style into Clojure. But we should also think about when to adopt this style of programming. I don't have an answer yet and I'm writing this library to explore this area.

The Go-channel model is a message passing model, which you could view as a poor-man's Actor model, something Rich Hickey considered for the Clojure language and decided to leave out. He outlines those reasons in the page (see the "Message Passing and Actors" section in particular).

On the blog lead-in above, I quote Pike saying that Go's model is to share memory by communicating, rather than the other way around. Hickey argues that the message-passing model is a complex model. Remember that "complex", in the Clojure community, is an objective measure of how entwined things are. Sharing memory by communicating is more complex as you have to coordinate entities, particularly if you have blocking waits. If one depends directly on the other, you have an entangled system. Coordinating multiple entities can be difficult and with blocking operations can lead to deadlocks. If you use timeouts to overcome potential deadlocks, then you have to add special logic to your code to deal with it. Sharing memory with immutable values or STM-protected values is often a simpler, less complected, model.

Go (synchronous) channels are for synchronizing threads or routines. When you need to synchronize in other languages you have constructs like CountDownLatches, CyclicBarriers, waiting on a future, a join in a fork-join model or, the lowest level, mutexes and semaphores. The synchronous channel provides an easier model that also allows message passing. But remember that easy is not necessarily simple [footnote 1] and consider the tradeoffs.

/* ---[ Channels as Queues ]--- */

Go buffered channels, on the other hand, are not synchronous communication tools. They are queues for asynchronous workflows. By decoupling producer(s) and consumer(s), they are less complected. Hickey, in his Simple Made Easy talk, has a table listing paired concepts where one is more complex and the other simple. On his chart, Actors are juxtaposed with queues: Actors are complex, queues are simple. And in the 2012 Clojure conj keynote, Hickey stated that queues have been underemphasized in the Clojure community so far.

Thus, as far as channels go, the asynchronous buffered ones are more idiomatic in Clojure than synchronous channels. In fact, an async concurrent queue is used in the Clojure Programming book's webcrawler example. For contrast, I implemented this webcrawler example using go-lightly.

On the other hand, from what I've seen, synchronous channels are very idiomatic in Go, and perhaps even preferred over buffered channels. That is the impression I've gotten from watching Pike's talks and reading a few threads on the golang mailing list. For example, see this thread where one participant says:

Go channels can be asynchronous, but most of the time that's not
what you want. When communicating between goroutines running on the
same machine a synchronous send/recv improves program flow. Synchronous
channels have a lot of advantages by making program flow predictable
and easier to think about.

I'll leave it there as an open question deserving careful thought.

/* ---[ Making channels more idiomatic ]--- */

As I've been doing various example programs with go-lightly, I've noticed that the code structure can be more imperative than functional, in part because channels are not composable data structures. You can't pass a channel to map, reduce or filter, since it does not implement the ISeq interface.

To remedy this, I've added four functions to go-lightly to allow you to treat it like a seq when retrieving from it.

The first two functions convert the current values on a channel to a seq or a vector without removing them from the channel. The latter two functions remove or drain the channel either immediately (non-lazy) or as you read from it in a lazy fashion.

(channel->seq chan)
"Takes a snapshot of all values on a channel without removing the values from the channel. Returns a (non-lazy) seq of the values.

(channel->vec chan)
"Takes a snapshot of all values on a channel without removing the values from the channel. Returns a vector of the values."

(drain chan)
"Removes all the values on a channel and returns them as a non-lazy seq."

(lazy-drain chan)
"Lazily removes values from a channel. Returns a Cons lazy-seq until it reaches the end of the channel (as determined by getting a nil value when asking for the next value on the channel)."

All the sequences will end once a nil value is pulled off the queue, which represents the end of the queue. Since the lazy-drain function is lazy if something else added to the queue before the end of the queue is reached, it will read that additional value, where the non-lazy drain method will not.

A REPL session will illustrate how these work.

First let's look at channel->seq using a buffered channel:

  ;; define a channel with capacity of 7
  user=> (def ch (go/channel 7))
  user=> (dotimes [i 6] (.put ch i))
  user=> ch
  #<LinkedBlockingQueue [0, 1, 2, 3, 4, 5]>

  ;; grab the values into a non-lazy seq
  user=> (def cseq (go/channel->seq ch))
  user=> cseq
  (0 1 2 3 4 5)
  user=> (type cseq)

  ;; the values have not been removed from the channel
  user=> ch
  #<LinkedBlockingQueue [0, 1, 2, 3, 4, 5]>

  ;; if a value is removed from the channel the seq is not affected
  user=> (.take ch)
  user=> ch
  #<LinkedBlockingQueue [1, 2, 3, 4, 5]>
  user=> cseq
  (0 1 2 3 4 5)

channel->vec behaves the same way except it returns a vector, not a seq.

Next let's look at the drain functions using a buffered channel:

  user=> (def ch (go/channel 7))
  user=> (dotimes [i 6] (.put ch i))
  user=> ch
  #<LinkedBlockingQueue [0, 1, 2, 3, 4, 5]>

  ;; calling drain returns a seq of all the values on the
  ;; channel and removes them
  user=> (def dseq (go/drain ch))
  user=> (type dseq)
  user=> dseq
  (0 1 2 3 4 5)
  user=> ch
  #<LinkedBlockingQueue []>

  ;; add more elements to the queue; the seq is not affected
  user=> (dotimes [i 6] (.put ch (+ 100 i)))
  user=> ch
  #<LinkedBlockingQueue [100, 101, 102, 103, 104, 105]>
  user=> dseq
  (0 1 2 3 4 5)

  ;; now let's lazily drain the queue into a lazy-seq Cons
  user=> (def zseq (go/lazy-drain ch))
  user=> (type zseq)

  ;; realize the first three elements - takes only those
  ;; off the channel
  user=> (take 3 zseq)
  (100 101 102)
  user=> ch
  #<LinkedBlockingQueue [103, 104, 105]>

  ;; take more than are on the channel - get only what's available
  user=> (take 100 zseq)
  (100 101 102 103 104 105)
  ;; the channel is now empty
  user=> ch
  #<LinkedBlockingQueue []>

  ;; what if we try to take/read them again? They are still
  ;; in the lazy-seq since it caches the results
  user=> (take 100 zseq)
  (100 101 102 103 104 105)

  ;; we can use higher order functions - composability!
  user=> (map str (filter odd? zseq))
  ("101" "103" "105")

These functions also work with synchronous channel, but are not as useful. In particular lazy-seq faces a race condition with producers that try to transfer multiple consecutive values as shown below:

  ;; create a synchronous channel
  user=> (def c (go/channel))

  ;; queue up 6 values to be put onto the queue but
  ;; only one can go on at a time waiting for a consumer
  user=> (go/go (dotimes [i 6] (.transfer c i)))
  #<core$future_call$reify__6110@5d47522a: :pending>
  user=> c

  ;; channel->vec and channel->seq will grab one value since
  ;; a producer is waiting for a consumer
  #<LinkedTransferQueue [0]>
  user=> (go/channel->vec c)
  user=> c
  #<LinkedTransferQueue [0]>

  ;; drain also takes only the first value and also removes it
  ;; from the channel, allowing the next val to be put on the channel
  user=> (go/drain c)
  user=> c
  #<LinkedTransferQueue [1]>

  ;; lazy-drain looks like it works!
  user=> (take 2 (go/lazy-drain c))
  (1 2)
  user=> c
  #<LinkedTransferQueue [3]>

  ;; but it has a race condition with the producer, so may
  ;; not get everything we "queued" up to transfer
  user=> (go/lazy-drain c)
  (3 4)
  user=> c
  #<LinkedTransferQueue [5]>

/* ---[ Next ]--- */

I've now created a go-lightly wiki with fairly extensive documentation and I've implemented a number of example applications using go-lightly.

A couple of things you may want to look into if you find this topic interesting:

  • I've added formal abstractions for the Channel types in go-lightly. Channel, BufferedChannel and TimeoutChannel all implement the GoChannel protocol.
  • As mentioned above, I have done a go-lightly centric implementation of a simple web crawler app based on the example at the end of Ch. 4 from the O'Reilly Clojure Programming book. This will provide a good contrast between the two concurrency approaches.
  • I have added the ability to preferentially read from one or more channels in a select or selectf.
  • I implemented Pike's Chinese Whispers example in go-lightly to see how many "Go routines" could be spawned in Clojure compared to Go. This is certainly an area where the JVM is less powerful than Go.

/* ---[ Resources and Notes ]--- */

[1] If you've spent much time in the Clojure community, you know I'm referring to the distinction that Hickey drew between the concepts of easy, a subjective concept, and simple, an objective one in his Simple Made Easy presentation. If you haven't watched it, well, listen to Bodil. (Jump back)

Blog entries in this series:

The Clojure go-lightly library on GitHub:

No comments:

Post a Comment