An Exercise in Profiling a Go Program<p>Recently, while working on my current project <a href="https://github.com/quux00/ogonori">ogonori</a>, a Go client for the <a href="http://orientdb.com/orientdb/">OrientDB database</a>, I found that I had a defect in the code that encodes and decodes integer values in the way that the OrientDB binary network protocol requires (namely <a href="https://developers.google.com/protocol-buffers/docs/encoding#types">zigzag encoding</a>, followed by encoding that output as a variable length integer).</p>
<p>After fixing the issue, first with the encoder and then with the decoder, I decided that I should do an exhaustive test of all 64 bit integers: start with MinInt64 (<code class="so">-9223372036854775808</code>), zigzag-encode it, varint encode it, then varint decode it and zigzag-decode it and you should get back the number you started with. Increment by 1 and try it again, until you reach MaxInt64 (<code class="so">9223372036854775807</code>).</p>
<p>(Note: I only have to use the Min/Max range of signed integers, since OrientDB is a Java database and only allows signed ints.)</p>
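The four-step round trip can be sketched in a few lines of Go. These helpers are my own minimal versions for illustration, not the actual functions from the ogonori <code>varint</code> package:

```go
package main

import "fmt"

// zigzagEncode maps signed ints to unsigned so that small-magnitude
// negatives become small varints: -1 -> 1, 1 -> 2, -2 -> 3, ...
func zigzagEncode(v int64) uint64 {
	return uint64((v << 1) ^ (v >> 63)) // arithmetic shift propagates the sign bit
}

func zigzagDecode(u uint64) int64 {
	return int64(u>>1) ^ -int64(u&1)
}

// varintEncode emits 7 bits per byte, low group first, with the high
// bit set on every byte except the last (protobuf-style wire format).
func varintEncode(u uint64) []byte {
	var bs []byte
	for u >= 0x80 {
		bs = append(bs, byte(u)|0x80)
		u >>= 7
	}
	return append(bs, byte(u))
}

// varintDecode assumes well-formed input produced by varintEncode.
func varintDecode(bs []byte) uint64 {
	var u uint64
	var shift uint
	for _, b := range bs {
		u |= uint64(b&0x7f) << shift
		shift += 7
	}
	return u
}

func main() {
	// spot-check the round trip at the extremes and around zero
	for _, v := range []int64{-9223372036854775808, -1, 0, 1, 9223372036854775807} {
		got := zigzagDecode(varintDecode(varintEncode(zigzagEncode(v))))
		if got != v {
			panic(fmt.Sprintf("round trip failed for %d: got %d", v, got))
		}
	}
	fmt.Println("round trip ok")
}
```

The exhaustive test simply runs this round trip for every value in a range and reports any mismatch.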
<p>I ran a small range of the possible 64-bit integer space and found that doing this exhaustive test was going to take a very long time. Since I have 8 CPUs on my system, I decided to first parallelize the test into 8 separate goroutines, each taking 1/8 of the total range:</p>
<script src="https://gist.github.com/quux00/6e62f44f19663ca29848.js"></script>
<p>With this code the program spawns 10 goroutines. Eight of them do the encoding/decoding test; any integer that fails the encode/decode round trip is written to a "failure channel" of type <code class="so">chan string</code>, which the main goroutine monitors.</p>
<p>A <code class="so">sync.WaitGroup</code> (a counting semaphore) is created and shared among the goroutines. When each "range tester" finishes, it calls <code class="so">Done()</code> on the WaitGroup to decrement the semaphore. The final (nameless) goroutine waits until all "range tester" goroutines have finished and then closes the single shared failure channel.</p>
<p>Closing the failure channel causes the loop over that channel in the main goroutine to exit, and the whole program finishes.</p>
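The fan-out shape described above can be sketched as follows (the type and channel names mirror the prose here; the actual gist may differ):

```go
package main

import (
	"fmt"
	"sync"
)

type testrange struct{ start, end int64 }

// roundTripOK stands in for the real zigzag+varint encode/decode check.
func roundTripOK(v int64) bool {
	return true // placeholder: the real test does the 4-step round trip
}

// checkRanges spawns one "range tester" goroutine per range, plus a
// closer goroutine, and returns the number of failures observed.
func checkRanges(ranges []testrange) int {
	failchan := make(chan string)
	var wg sync.WaitGroup

	for _, rng := range ranges {
		wg.Add(1)
		go func(rng testrange) { // one tester per slice of the space
			defer wg.Done()
			for v := rng.start; v < rng.end; v++ {
				if !roundTripOK(v) {
					failchan <- fmt.Sprintf("round trip failed for %d", v)
				}
			}
		}(rng)
	}

	// The closer goroutine waits on the WaitGroup, then closes the
	// failure channel so the range loop below terminates.
	go func() {
		wg.Wait()
		close(failchan)
	}()

	nfail := 0
	for msg := range failchan {
		fmt.Println("FAIL:", msg)
		nfail++
	}
	return nfail
}

func main() {
	ranges := []testrange{{0, 250}, {250, 500}, {500, 750}, {750, 1000}}
	fmt.Println("failures:", checkRanges(ranges))
}
```

The unbuffered failure channel is fine here because failures should be rare; in the happy path the testers never block on it.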
<p><br /></p>
<h4><code>/* ---[ Performance Baseline ]--- */</code></h4>
<p>I fired this up with the following smaller testrange:</p>
<div class="precode"><pre><code> ranges := []testrange{
{100000001, 150000001},
{200000001, 250000001},
{300000001, 350000000},
{400000001, 450000000},
{500000001, 550000000},
{600000001, 650000000},
{700000001, 750000000},
{800000001, 850000000},
}
</code></pre></div>
<p>and ran <code class="so">top</code>. To my surprise I was only using ~400% CPU, rather than ~800% (the max my system supports):</p>
<div class="precode"><pre><code>$ top -d1
PID USER PR ... S %CPU %MEM TIME+ COMMAND
1736 midpete+ 20 ... S 420.9 0.0 1:31.33 ogonori
</code></pre></div>
<p>I then looked at the per-thread CPU usage with the <code class="so">-H</code> option to <code class="so">top</code> and saw that my 8 range-tester goroutines were each using only about 50% CPU, and that a ninth thread was also consistently using 40 to 50% CPU. My guess was that this was a GC thread.</p>
<div class="precode"><pre><code>$ top -d1 -H
PID USER PR ... S %CPU %MEM TIME+ COMMAND
1740 midpete+ 20 ... S 50.1 0.0 0:21.47 ogonori
1744 midpete+ 20 ... R 50.1 0.0 0:21.52 ogonori
1742 midpete+ 20 ... S 49.2 0.0 0:21.38 ogonori
1736 midpete+ 20 ... S 47.2 0.0 0:21.53 ogonori
1738 midpete+ 20 ... S 46.2 0.0 0:22.11 ogonori
1745 midpete+ 20 ... R 46.2 0.0 0:20.37 ogonori
1741 midpete+ 20 ... S 45.2 0.0 0:21.41 ogonori
1743 midpete+ 20 ... R 42.3 0.0 0:21.26 ogonori
1739 midpete+ 20 ... S 40.3 0.0 0:21.35 ogonori
1737 midpete+ 20 ... S 3.9 0.0 0:02.07 ogonori
</code></pre></div>
<p>So I have an algorithm that should be trivially parallelizable with no shared memory and no contention (in theory), but it was only using half the CPU available to it. Hmmm...</p>
<p>Next I ran the test on my system several times to get a baseline performance metric:</p>
<div class="precode"><pre><code>$ time ./ogonori -z  # the -z switch tells ogonori to run only this
                     # benchmark rather than the usual OrientDB tests
Run1: real 3m44.602s
Run2: real 3m42.818s
Run3: real 3m28.917s
Avg ± StdErr: 218.8 ± 5 sec
</code></pre></div>
<p>Then I remembered I had not disabled CPU frequency scaling on my Linux system (the governor was set to <code class="so">ondemand</code>), so I ran the following script and repeated the benchmarks:</p>
<div class="precode"><pre><code>#!/bin/bash
for i in /sys/devices/system/cpu/cpu[0-7]
do
    echo performance > $i/cpufreq/scaling_governor
done
</code></pre></div>
<div class="precode"><pre><code>$ time ./ogonori -z
Run1: real 2m12.605s
Run2: real 2m12.382s
Run3: real 2m13.172s
Run4: real 2m18.992s
Run5: real 2m17.538s
Run6: real 2m14.437s
Avg ± StdErr: 134.9 ± 1 sec
</code></pre></div>
<p>Wow, OK. So that alone gave me about a 60% improvement in throughput. Off to a good start.</p>
<p><br /></p>
<h4><code>/* ---[ Profiling the Code ]--- */</code></h4>
<p>If you've never read <a href="https://blog.golang.org/profiling-go-programs">Russ Cox's 2011 blog post on profiling a Go program</a>, put it on your list - it is a treat to read.</p>
<p>Using what I learned there, I profiled the zigzagExhaustiveTest code to see how and where to improve it.</p>
<div class="precode"><pre><code>$ ./ogonori -z -cpuprofile=varint0.prof
</code></pre></div>
<p>I then opened the .prof file with golang's pprof tool and looked at the top 10 most heavily used functions:</p>
<div class="precode"><pre><code>$ rlwrap go tool pprof ogonori varint0.prof
# Using rlwrap gives you bash-like behavior and history
(pprof) top 10
171.48s of 255.92s total (67.01%)
Dropped 171 nodes (cum <= 1.28s)
Showing top 10 nodes out of 36 (cum >= 8.78s)
flat flat% sum% cum cum%
45.98s 17.97% 17.97% 45.98s 17.97% scanblock
25.63s 10.01% 27.98% 33.58s 13.12% runtime.mallocgc
19.20s 7.50% 35.48% 111.35s 43.51% g/q/o/o/b/varint.ReadVarIntToUint
14.94s 5.84% 41.32% 15.62s 6.10% bytes.(*Buffer).grow
12.44s 4.86% 46.18% 12.44s 4.86% runtime.MSpan_Sweep
11.87s 4.64% 50.82% 15.93s 6.22% bytes.(*Buffer).Read
11.33s 4.43% 55.25% 21.56s 8.42% bytes.(*Buffer).WriteByte
11.18s 4.37% 59.62% 11.18s 4.37% runtime.futex
10.13s 3.96% 63.57% 19.16s 7.49% bytes.(*Buffer).Write
8.78s 3.43% 67.01% 8.78s 3.43% runtime.memmove
(pprof) top10 -cum
110.32s of 255.92s total (43.11%)
Dropped 171 nodes (cum <= 1.28s)
Showing top 10 nodes out of 36 (cum >= 25.50s)
flat flat% sum% cum cum%
0 0% 0% 147.62s 57.68% runtime.goexit
2.94s 1.15% 1.15% 147.49s 57.63% main.func·018
19.20s 7.50% 8.65% 111.35s 43.51% g/q/o/o/b/varint.ReadVarIntToUint
0 0% 8.65% 77.81s 30.40% GC
45.98s 17.97% 26.62% 45.98s 17.97% scanblock
4.90s 1.91% 28.53% 38.48s 15.04% runtime.newobject
25.63s 10.01% 38.55% 33.58s 13.12% runtime.mallocgc
6.65s 2.60% 41.15% 31.39s 12.27% g/q/o/o/b/varint.VarintEncode
0 0% 41.15% 30.48s 11.91% System
5.02s 1.96% 43.11% 25.50s 9.96% encoding/binary.Read
</code></pre></div>
<p>We can see that a significant percentage of time (>30%) is being spent in GC, so the program is generating a lot of garbage somewhere - on top of the cost of allocating new heap data, which the <code class="so">runtime.mallocgc</code> figure tells me is at least 13% of the program's run time.</p>
<p>Remember that there are four steps to my algorithm:</p>
<ol>
<li>zigzag encode (<code class="so">varint.ZigzagEncodeUInt64</code>)</li>
<li>varint encode (<code class="so">varint.VarintEncode</code>)</li>
<li>varint decode (<code class="so">varint.ReadVarIntToUint</code>)</li>
<li>zigzag decode (<code class="so">varint.ZigzagDecodeInt64</code>)</li>
</ol>
<p>The zigzag encode/decode steps are simple bit manipulations, so they are fast. Typing <code class="so">web</code> at the pprof prompt launches an SVG graph of where time was spent. The zigzag functions don't even show up - they were dropped off as being too small (not shown here).</p>
<p>So I needed to focus on steps 2 and 3 which take (cumulatively) 43.5% and 12.3%, respectively.</p>
<p>Since <code class="so">varint.ReadVarIntToUint</code> is the biggest offender let's look at it in detail in the pprof tool:</p>
<script src="https://gist.github.com/quux00/16131fa51ba26a748d0b.js"></script>
<p>I've marked the biggest time sinks with an arrow on the left side. Generally one should start with the biggest bottleneck, so let's rank these by cumulative time (2nd col):</p>
<div class="precode"><pre><code>-> 32.41s 111: err = binary.Read(&buf, binary.LittleEndian, &u)
-> 16.83s 73: n, err = r.Read(ba[:])
-> 15.93s 106: buf.WriteByte(y | z)
-> 14.82s 88: var buf bytes.Buffer
-> 8.53s 110: padTo8Bytes(&buf)
</code></pre></div>
<p>It is striking how expensive creating a bytes.Buffer is, but the biggest line item is <code class="so">binary.Read</code>, so let's deal with that first.</p>
<script src="https://gist.github.com/quux00/b16238a795878248e06b.js"></script>
<p>Because I'm only ever passing in uint64's, the only real functionality I'm using in this function is:</p>
<div class="precode"><pre><code>*data = order.Uint64(bs)
</code></pre></div>
<p><br /></p>
<h4><code>/* ---[ Optimization #1 ]--- */</code></h4>
<p>But it's even worse. If you look back at <code class="so">varint.ReadVarIntToUint</code> you'll see that I'm creating a <code class="so">bytes.Buffer</code> and copying bytes into it only so that I can pass that Buffer (as an <code class="so">io.Reader</code>) into the <code class="so">binary.Read</code> function:</p>
<div class="precode"><pre><code>err = binary.Read(buf, binary.LittleEndian, &u)
</code></pre></div>
<p>which then immediately copies all those bytes back out of the buffer:</p>
<div class="precode"><pre><code>if _, err := io.ReadFull(r, bs); err != nil {
return err
}
</code></pre></div>
<p>So this is nothing but wasteful data copying and the heap allocations for it.</p>
<p><code class="so">binary.Read</code> also does a type switch, in which a good percentage of time is spent:</p>
<div class="precode"><pre><code> 2.01s 4.49s 151: switch data := data.(type) {
</code></pre></div>
<p>and, as stated, the only useful method ever called in it is:</p>
<div class="precode"><pre><code> --> 460ms 2.76s 167: *data = order.Uint64(bs)
</code></pre></div>
<p>So I should try just calling <code class="so">binary.LittleEndian.Uint64(bs)</code> directly.</p>
<p>Here's the revised <code class="so">varint.ReadVarIntToUint</code> function (with everything inlined for easier reading and profiling analysis):</p>
<script src="https://gist.github.com/quux00/0d75432b898e92200449.js"></script>
<p>This change also removes the <code class="so">padTo8Bytes</code> method that wrote one byte at a time to the <code class="so">bytes.Buffer</code> and took >3% of program time itself.</p>
<p>Now let's rerun the benchmarks:</p>
<div class="precode"><pre><code>Run 1: real 0m27.182s
Run 2: real 0m27.053s
Run 3: real 0m28.200s
Run 4: real 0m25.762s
Run 5: real 0m26.031s
Run 6: real 0m26.813s
Avg ± StdErr: 26.8 ± 0.4 sec
</code></pre></div>
<p><em>Outstanding!</em> Throughput increased 5x (134.9/26.8). And using <code class="so">top</code>, I see that the goroutines are consuming nearly all available CPU:</p>
<div class="precode"><pre><code>$ top -d1
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12983 midpete+ 20 0 352496 5768 2736 R 763.7 0.0 1:35.64 ogonori
$ top -d1 -H
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13231 midpete+ 20 0 286960 5772 2744 R 97.5 0.0 0:22.51 ogonori
13225 midpete+ 20 0 286960 5772 2744 R 91.7 0.0 0:22.47 ogonori
13227 midpete+ 20 0 286960 5772 2744 R 90.7 0.0 0:23.09 ogonori
13232 midpete+ 20 0 286960 5772 2744 S 90.7 0.0 0:22.26 ogonori
13235 midpete+ 20 0 286960 5772 2744 R 90.7 0.0 0:09.72 ogonori
13230 midpete+ 20 0 286960 5772 2744 R 88.7 0.0 0:22.14 ogonori
13233 midpete+ 20 0 286960 5772 2744 R 73.1 0.0 0:22.70 ogonori
13228 midpete+ 20 0 286960 5772 2744 R 71.2 0.0 0:22.39 ogonori
13229 midpete+ 20 0 286960 5772 2744 R 70.2 0.0 0:23.09 ogonori
</code></pre></div>
<p>I also profiled this run with pprof, so let's compare the cumulative top10 before and after:</p>
<p><em>Before</em> (reprinted from above):</p>
<div class="precode"><pre><code>(pprof) top10 -cum
110.32s of 255.92s total (43.11%)
Dropped 171 nodes (cum <= 1.28s)
Showing top 10 nodes out of 36 (cum >= 25.50s)
flat flat% sum% cum cum%
0 0% 0% 147.62s 57.68% runtime.goexit
2.94s 1.15% 1.15% 147.49s 57.63% main.func·018
19.20s 7.50% 8.65% 111.35s 43.51% g/q/o/o/b/varint.ReadVarIntToUint
0 0% 8.65% 77.81s 30.40% GC
45.98s 17.97% 26.62% 45.98s 17.97% scanblock
4.90s 1.91% 28.53% 38.48s 15.04% runtime.newobject
25.63s 10.01% 38.55% 33.58s 13.12% runtime.mallocgc
6.65s 2.60% 41.15% 31.39s 12.27% g/q/o/o/b/varint.VarintEncode
0 0% 41.15% 30.48s 11.91% System
5.02s 1.96% 43.11% 25.50s 9.96% encoding/binary.Read
</code></pre></div>
<p><em>After:</em></p>
<div class="precode"><pre><code>(pprof) top15 -cum
63680ms of 65970ms total (96.53%)
Dropped 33 nodes (cum <= 329.85ms)
Showing top 15 nodes out of 18 (cum >= 930ms)
flat flat% sum% cum cum%
2280ms 3.46% 3.46% 64470ms 97.73% main.func·018
0 0% 3.46% 64470ms 97.73% runtime.goexit
17760ms 26.92% 30.38% 34190ms 51.83% g/q/o/o/b/varint.ReadVarIntToUint
5890ms 8.93% 39.31% 26370ms 39.97% g/q/o/o/b/varint.VarintEncode
8550ms 12.96% 52.27% 16360ms 24.80% bytes.(*Buffer).Write
9080ms 13.76% 66.03% 11500ms 17.43% bytes.(*Buffer).Read
1460ms 2.21% 68.24% 7550ms 11.44% runtime.newobject
4370ms 6.62% 74.87% 6090ms 9.23% runtime.mallocgc
5650ms 8.56% 83.43% 5650ms 8.56% runtime.memmove
4580ms 6.94% 90.37% 4580ms 6.94% bytes.(*Buffer).grow
680ms 1.03% 91.41% 1630ms 2.47% bytes.(*Buffer).Reset
1500ms 2.27% 93.68% 1500ms 2.27% encoding/binary.littleEndian.Uint64
0 0% 93.68% 1030ms 1.56% GC
950ms 1.44% 95.12% 950ms 1.44% bytes.(*Buffer).Truncate
930ms 1.41% 96.53% 930ms 1.41% runtime.gomcache
</code></pre></div>
<p>More good news. In the previous version, GC was taking 30% of the total CPU time. Now more than 90% of the time is spent in the two main workhorse methods, <code class="so">varint.ReadVarIntToUint</code> and <code class="so">varint.VarintEncode</code>, and GC time has been reduced to 1.5%!</p>
<p>I suspect the goroutines in the earlier version of the code used only 40-50% of a CPU because GC was the contention point. Garbage collection in golang is a stop-the-world affair, so all other goroutines are paused until it finishes. With GC reduced to 1.5% of runtime, the range-testing goroutines can spend far more of their time running - approaching 100%.</p>
<p><br /></p>
<h4><code>/* ---[ Optimization #2 ]--- */</code></h4>
<p>Are there further improvements we can make? Since the program now spends 40% of its time in <code class="so">varint.VarintEncode</code>, let's look at that function in detail:</p>
<script src="https://gist.github.com/quux00/a9c746a7bb9a54f5a052.js"></script>
<p>Almost 75% of the time in this function is spent writing to the io.Writer (a <code class="so">bytes.Buffer</code>). We write one byte at a time to it. Perhaps it would be better to write it all to a byte slice first and then issue one <code class="so">w.Write</code>.</p>
<p>The new code is then:</p>
<script src="https://gist.github.com/quux00/6c466c07b2c334d91352.js"></script>
<p>And the next round of benchmarks are:</p>
<div class="precode"><pre><code>real 0m38.899s
real 0m45.135s
real 0m38.047s
real 0m42.377s
real 0m32.894s
real 0m37.962s
real 0m38.926s
real 0m37.870s
Avg ± StdErr: 39.0 ± 1.2
</code></pre></div>
<p>Hmm, not good. This second revision made performance regress by about 30%. To be sure, I reverted the change and re-ran the benchmarks with only optimization #1 in place: they returned to the ~27s/run timeframe I saw before. So it is confirmed: the second change made things worse.</p>
<p>And the analysis of <code class="so">top</code> agreed: the goroutines were no longer using 90%+ CPU:</p>
<div class="precode"><pre><code>$ top -d1
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22149 midpete+ 20 0 286960 5776 2744 R 593.9 0.0 1:06.66 ogonori
$ top -d1 -H
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22205 midpete+ 20 0 229620 7812 2744 R 74.6 0.0 0:10.68 ogonori
22201 midpete+ 20 0 229620 7812 2744 R 73.6 0.0 0:10.14 ogonori
22202 midpete+ 20 0 229620 7812 2744 S 71.6 0.0 0:10.77 ogonori
22207 midpete+ 20 0 229620 7812 2744 S 70.6 0.0 0:10.97 ogonori
22206 midpete+ 20 0 229620 7812 2744 R 68.7 0.0 0:11.14 ogonori
22204 midpete+ 20 0 229620 7812 2744 R 65.7 0.0 0:10.07 ogonori
22199 midpete+ 20 0 229620 7812 2744 R 56.9 0.0 0:10.98 ogonori
22203 midpete+ 20 0 229620 7812 2744 R 53.0 0.0 0:11.19 ogonori
22197 midpete+ 20 0 229620 7812 2744 R 43.2 0.0 0:11.17 ogonori
22200 midpete+ 20 0 229620 7812 2744 S 17.7 0.0 0:09.95 ogonori
22198 midpete+ 20 0 229620 7812 2744 S 3.9 0.0 0:00.77 ogonori
</code></pre></div>
<p>Let's look at the pprof data for revision #2:</p>
<div class="precode"><pre><code>(pprof) top10 -cum
59.31s of 86.44s total (68.61%)
Dropped 92 nodes (cum <= 0.43s)
Showing top 10 nodes out of 24 (cum >= 5.36s)
flat flat% sum% cum cum%
1.95s 2.26% 2.26% 74.02s 85.63% main.func·018
0 0% 2.26% 74.02s 85.63% runtime.goexit
20.71s 23.96% 26.21% 44.80s 51.83% g/q/o/o/b/varint.ReadVarIntToUint
5.57s 6.44% 32.66% 24.39s 28.22% g/q/o/o/b/varint.VarintEncode
15s 17.35% 50.01% 18.71s 21.65% bytes.(*Buffer).Read
5.64s 6.52% 56.54% 13.46s 15.57% runtime.makeslice
0 0% 56.54% 8.39s 9.71% GC
5.86s 6.78% 63.32% 8.23s 9.52% runtime.mallocgc
2.48s 2.87% 66.18% 7.82s 9.05% runtime.newarray
2.10s 2.43% 68.61% 5.36s 6.20% bytes.(*Buffer).Write
</code></pre></div>
<p>Now GC is back up to nearly 10% of the total running time. So let's look at the profile of the <code class="so">VarintEncode</code> function we changed:</p>
<div class="precode"><pre><code>(pprof) list VarintEncode
Total: 1.44mins
5.57s 24.39s (flat, cum) 28.22% of Total
. . 40://
290ms 290ms 41:func VarintEncode(w io.Writer, v uint64) error {
550ms 14.01s 42: bs := make([]byte, 0, 10)
170ms 170ms 43: for (v & 0xffffffffffffff80) != 0 {
2.04s 2.04s 44: bs = append(bs, byte((v&0x7f)|0x80))
320ms 320ms 45: v >>= 7
. . 46: }
680ms 680ms 47: bs = append(bs, byte(v&0x7f))
. . 48:
1.20s 6.56s 49: n, err := w.Write(bs)
120ms 120ms 50: if err != nil {
. . 51: return oerror.NewTrace(err)
. . 52: }
. . 53: if n != len(bs) {
. . 54: return fmt.Errorf("Incorrect number of bytes written. Expected %d. Actual %d", len(bs), n)
. . 55: }
200ms 200ms 56: return nil
. . 57:}
</code></pre></div>
<p>We can see that 58% of the time in this method is spent allocating new memory (the <code class="so">[]byte</code> slice on line 42), thereby causing GC to take longer. Here's why: if you look at the implementation of <code class="so">bytes.Buffer</code>, you'll see that it has a small fixed <code class="so">bootstrap</code> array for small buffers and another fixed byte array (<code class="so">runeBytes</code>) for rune writes; both allow it to avoid heap allocation for small operations.</p>
<script src="https://gist.github.com/quux00/f51738442374be854031.js"></script>
<p>Since my test code is reusing the same <code class="so">bytes.Buffer</code> for each iteration, no new allocations were occurring during each call to <code class="so">varint.VarintEncode</code>. But with this second revision I'm creating a new byte slice of capacity 10 in each round. So this change should be reverted.</p>
<p><br /></p>
<h4><code>/* ---[ Lessons Learned ]--- */</code></h4>
<p>When you have an algorithm that you think should be CPU bound and your threads are not using ~100% CPU, then you have contention somewhere. In many scenarios that will be IO wait. But if you have no IO in that portion of your app, then you either have hidden thread contention (mutexes) and/or you may have a lot of garbage collection happening, which pauses all your worker threads/goroutines while GC is happening. Use the pprof tool to determine where time is being spent.</p>
<p>For performance-sensitive algorithms, you will want the main path to be as garbage-free as possible.</p>
<p>Once you know where the time is going, you should generally go after the largest bottleneck first. There's always a primary bottleneck somewhere. Removing the bottleneck in one place causes it to move to another. In my case, I wanted that bottleneck to just be CPU speed (or as is often the case, the time to get data from main memory or a CPU cache into a register).</p>
<p>A big lesson learned here is to be wary of convenience methods in Go's standard library. Many are provided for convenience, not performance. The <code class="so">binary.Read(buf, binary.LittleEndian, &u)</code> call in my case is one such example. The third parameter to <code class="so">binary.Read</code> is of type <code class="so">interface{}</code>, so a type switch has to be done to detect the type. If your code is only ever passing in one type (<code class="so">uint64</code> in my case), then go read the stdlib code and figure out if there is a more direct method to call. That change contributed to a 5x throughput improvement in my case!</p>
<p>Next, be careful of too much data copying. While <code class="so">io.Reader</code> and <code class="so">io.Writer</code> are nice interfaces, if you are working with byte slices and want to pass them to a stdlib method that requires one of these interfaces, you will often copy the data into a <code class="so">bytes.Buffer</code> and pass that in. If the function you call then copies those bytes back out to yet another byte slice, garbage is being generated and time is being wasted. So be aware of what is happening inside the methods you call.</p>
<p>Finally, always measure carefully before and after any attempted optimizations. Intuition about where bottlenecks are and what will speed things up are often wrong. The only thing of value is to measure objectively. To end I'll quote "Commander" Pike:</p>
<blockquote>
<p><em>Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is. --<a href="http://users.ece.utexas.edu/~adnan/pike.html">Rob Pike's 5 Rules of Programming</a></em></p>
</blockquote>
<p><br /></p>
<h4><code>/* ---[ Misc Appendix Notes ]--- */</code></h4>
<p>Even with these improvements, exhaustively testing the full int64 space would take too long to run, so I've settled for sampling from the space instead.</p>
<p>All benchmark comparisons were statistically significant (p<0.01) by Student's t-test, as analyzed with this tool: <a href="http://studentsttest.com">http://studentsttest.com</a>. The means and standard errors were also calculated there.</p>
<p>I notice that even with my best optimization (#1), there is still a ninth thread using >70% CPU. I used <code class="so">kill -QUIT</code> on the program to get a stack dump of all the goroutines. I get 10 goroutines: the 8 doing the <code class="so">fnRangeTester</code> work, one waiting on the <code class="so">WaitGroup</code>, and the main goroutine, which is waiting on the <code class="so">range failchan</code> line. So I'm not sure what that ninth thread is doing churning up 50-70% CPU. Anyone know how to tell?</p>
<hr />
<p><strong>[Update - 08-July-2015]</strong></p>
<p>In the comments, Carlos Torres asked for the pprof line-by-line output of the <code class="so">ReadVarIntToUint</code> function after the first optimization. I did two profiling runs and compared the pprof outputs and they were both nearly identical. Here is one of them:</p>
<div class="precode"><pre><code>(pprof) list ReadVarIntToUint
Total: 1.13mins
ROUTINE ======================== g/q/o/o/b/varint.ReadVarIntToUint
18.20s 35.45s (flat, cum) 52.08% of Total
. . 25://
480ms 480ms 26:func ReadVarIntToUint(r io.Reader) (uint64, error) {
. . 27: var (
270ms 270ms 28: varbs []byte
120ms 3.84s 29: ba [1]byte
. . 30: u uint64
. . 31: n int
180ms 180ms 32: err error
. . 33: )
. . 34:
260ms 260ms 35: varbs = make([]byte, 0, 10)
. . 36:
. . 37: /* ---[ read in all varint bytes ]--- */
. . 38: for {
3.84s 15.72s 39: n, err = r.Read(ba[:])
530ms 530ms 40: if err != nil {
. . 41: return 0, oerror.NewTrace(err)
. . 42: }
10ms 10ms 43: if n != 1 {
. . 44: return 0, oerror.IncorrectNetworkRead{Expected: 1, Actual: n}
. . 45: }
3.21s 3.21s 46: varbs = append(varbs, ba[0])
980ms 980ms 47: if IsFinalVarIntByte(ba[0]) {
570ms 570ms 48: varbs = append(varbs, byte(0x0))
. . 49: break
. . 50: }
. . 51: }
. . 52:
. . 53: /* ---[ decode ]--- */
. . 54:
. . 55: var right, left uint
. . 56:
620ms 620ms 57: finalbs := make([]byte, 8)
. . 58:
. . 59: idx := 0
1.08s 1.08s 60: for i := 0; i < len(varbs)-1; i++ {
360ms 360ms 61: right = uint(i) % 8
20ms 20ms 62: left = 7 - right
230ms 230ms 63: if i == 7 {
. . 64: continue
. . 65: }
840ms 840ms 66: vbcurr := varbs[i]
900ms 900ms 67: vbnext := varbs[i+1]
. . 68:
120ms 120ms 69: x := vbcurr & byte(0x7f)
670ms 670ms 70: y := x >> right
670ms 670ms 71: z := vbnext << left
650ms 650ms 72: finalbs[idx] = y | z
780ms 780ms 73: idx++
. . 74: }
. . 75:
540ms 2.19s 76: u = binary.LittleEndian.Uint64(finalbs)
270ms 270ms 77: return u, nil
. . 78:}
</code></pre></div>
<p>If you compare it to the pprof before the optimization, the top half looks about the same, but the bottom half is dramatically different. For example, more than 30s was spent in <code class="so">binary.Read(&buf, binary.LittleEndian, &u)</code> in the original version. The replacement code, <code class="so">binary.LittleEndian.Uint64(finalbs)</code>, only takes up about 2 seconds of processing time.</p>
<p>The only remaining spot I see for further optimization is the 15s (out of 35s) spent in <code class="so">r.Read(ba[:])</code>. The problem, however, is that with a varint you don't know in advance how many bytes long it is, so you have to read and examine them one at a time. There is probably a way to optimize this, but I haven't attempted it yet.</p>
Merkle Tree<p>I hit upon the need this week to do checkpointing in a data processing system that has the requirement that no data event can ever be lost and no events can be processed and streamed out of order. I wanted a way to auto-detect this in production in real time.</p>
<p>There are a couple of ways to do this, but since our data events already have a signature attached to them (a SHA1 hash), I decided that a useful way to do the checkpoint is basically keep a hash of hashes. One could do this with a <a href="https://en.wikipedia.org/wiki/Hash_list">hash list</a>, where a chain of hashes for each data element is kept and when a checkpoint occurs the hash of all those hashes <strong>in order</strong> is taken.</p>
<p>A disadvantage of this model is if the downstream system detects a hash mismatch (either due to a lost message or messages that are out-of-order) it would then have to iterate the full list to detect where the problem is.</p>
<p>An elegant alternative is a hash tree, aka a <strong>Merkle Tree</strong> named after its inventor Ralph Merkle.</p>
<p><br /></p>
<h4><code>/* ---[ Merkle Trees ]--- */</code></h4>
<p>Merkle trees are typically implemented as binary trees where each non-leaf node is a hash of the two nodes below it. The leaves can either be the data itself or a hash/signature of the data.</p>
<p>If any difference in the root hash is detected between two systems, a binary search can then be done through the tree to determine which particular subtree has the problem. Typically only <code class="so">log(N)</code> nodes need to be inspected, rather than all <code class="so">N</code>, to find the problem area.</p>
<p>Merkle trees are particularly effective in distributed systems where two separate systems can compare the data on each node via a Merkle tree and quickly determine which data sets (subtrees) are lacking on one or the other system. Then only the subset of missing data needs to be sent. Cassandra, based on Amazon's Dynamo, for example, uses Merkle trees as an <a href="https://wiki.apache.org/cassandra/AntiEntropy">anti-entropy measure</a> to detect inconsistencies between replicas.</p>
<p>The <a href="http://adc.sourceforge.net/draft-jchapweske-thex-02.html">Tree Hash EXchange format</a> (THEX) is used in some peer-to-peer systems for file integrity verification. In that system the internal (non-leaf) nodes are allowed to have a different hashing algorithm than the leaf nodes. In the diagram below <code class="so">IH=InternalHashFn</code> and <code class="so">LH=LeafHashFn</code>.</p>
<p><img src="https://docs.google.com/drawings/d/1HQDgOwt3LYc6YlgUk_fZ_gLoYbgfMmy_Sq6EwR4tobg/pub?w=532&h=263"></p>
<p>The THEX system also defines a serialization format and a way of dealing with incomplete trees. THEX ensures that all leaves are at the same depth from the root node by "promoting" nodes: when a parent has only one child, it does not take a hash of the child's hash; instead it simply "inherits" it. If that is confusing, think of the Merkle tree as being built from the bottom up: all the leaves are present, and hashes of hashes are built until a single root remains.</p>
<p><img src="https://docs.google.com/drawings/d/1b-E2iWmhK3p5PaINOeNuvwNq6DLXXdk9nCujEdJm8Vw/pub?w=666&h=366"></p>
<p><small><b>Notation: The first token is a node label, followed by a conceptual value for the hash/signature of the node. Note that E, H and J nodes all have the same signature, since they only have one child node.</b></small></p>
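To make the bottom-up construction and node promotion concrete, here is a minimal sketch in Go (my own illustration, not the repo's Java implementation; function names are mine):

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// internalHash combines two child signatures, salting with 0x01 as
// THEX prescribes for internal nodes.
func internalHash(left, right []byte) []byte {
	h := sha1.New()
	h.Write([]byte{0x01})
	h.Write(left)
	h.Write(right)
	return h.Sum(nil)
}

// buildRoot folds one level of signatures into parents until a single
// root remains; a lone node at the end of a level is promoted as-is.
func buildRoot(level [][]byte) []byte {
	for len(level) > 1 {
		var next [][]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i]) // promotion: inherit the child
			} else {
				next = append(next, internalHash(level[i], level[i+1]))
			}
		}
		level = next
	}
	return level[0]
}

func main() {
	// leaf signatures would normally be hashes already; short strings
	// keep the example readable
	leaves := [][]byte{[]byte("sig-a"), []byte("sig-b"), []byte("sig-c")}
	fmt.Printf("root: %x\n", buildRoot(leaves))
}
```

With three leaves, the lone third leaf is promoted one level, matching the E/H/J behavior in the diagram above.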
<p><br /></p>
<h4><code>/* ---[ Merkle Tree as Checkpoint Data ]--- */</code></h4>
<p>I recently published my implementation of a Merkle tree: <a href="https://github.com/quux00/merkle-tree">https://github.com/quux00/merkle-tree</a></p>
<p>Before I describe the implementation, it will help to see the use case I'm targeting.</p>
<p><img src="https://docs.google.com/drawings/d/1fTZBLqXlwg9eJQmfCNtiAWJ_hrtglmWGkwMFWaWW-yY/pub?w=651&h=137"></p>
<p>The scenario above is a data processing pipeline where messages flow in one direction. All the messages that come out of A go into B and are processed and transformed to some new value-added structure and sent on to C. In between are queues to decouple the systems.</p>
<p>Throughput needs to be as high as possible and every message that comes out of A must be processed by B and sent to C in the same order. No data events can be lost or reordered. System A puts a signature (a SHA1 hash) into the metadata of the event and that metadata is present on the message event that C receives.</p>
<p>To ensure that all messages are received and in the correct order, a checkpoint is periodically created by A, summarizing all the messages sent since the last checkpoint. That checkpoint message is put onto the Queue between A and B; B passes it downstream without alteration so that C can read it. Between checkpoints, system C keeps a running list of all the events it has received so that it can compute the signatures necessary to validate what it has received against the checkpoint message that periodically comes in from A.</p>
<p><br /></p>
<h4><code>/* ---[ My Implementation of a Merkle Tree ]--- */</code></h4>
<p>The THEX Merkle Tree design was the inspiration for my implementation, but for my use case I made some simplifying assumptions. For one, I start with the leaves already having a signature. Since THEX is designed for file integrity comparisons, it assumes that you have segmented a file into fixed size chunks. That is not the use case I'm targeting.</p>
<p>The THEX algorithm "salts" the hash functions in order to ensure that there will be no collisions between the leaf hashes and the internal node hashes. It concatenates the byte <code class="so">0x01</code> to the internal hash and the byte <code class="so">0x00</code> to the leaf hash:</p>
<div class="precode"><pre><code>internal hash function = IH(X) = H(0x01, X)
leaf hash function = LH(X) = H(0x00, X)
</code></pre></div>
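<p>To make the salting concrete, here is a small sketch in Java (my own illustration, not code from the THEX reference or from my repo) showing that the salted leaf and internal hashes of the same bytes can never collide:</p>

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ThexSalting {

    static MessageDigest sha1() {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is always present in the JDK
        }
    }

    // IH(X) = H(0x01, X) -- internal-node hash
    static byte[] internalHash(byte[] x) {
        MessageDigest md = sha1();
        md.update((byte) 0x01);
        return md.digest(x);
    }

    // LH(X) = H(0x00, X) -- leaf hash
    static byte[] leafHash(byte[] x) {
        MessageDigest md = sha1();
        md.update((byte) 0x00);
        return md.digest(x);
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes();
        // the salt byte guarantees the two digests differ
        // even though the payload is identical
        System.out.println(Arrays.equals(internalHash(data), leafHash(data))); // false
    }
}
```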
<p>It is useful to be able to distinguish leaf from internal nodes (especially when deserializing), so I morphed this idea into one where each Node has a type byte -- <code class="so">0x01</code> identifies an internal node and <code class="so">0x00</code> identifies a leaf node. This way I can leave the incoming leaf hashes intact for easier comparison by the downstream consumer.</p>
<p>So my <code class="so">MerkleTree.Node</code> class is:</p>
<div class="precode"><pre><code>static class Node {
public byte type; // INTERNAL_SIG_TYPE or LEAF_SIG_TYPE
public byte[] sig; // signature of the node
public Node left;
public Node right;
}
</code></pre></div>
<p><br /></p>
<h4><code>/* ---[ Hash/Digest Algorithm ]--- */</code></h4>
<p>Since the leaf nodes are being passed in, my MerkleTree does not know (or need to know) what hashing algorithm was used on the leaves. It only concerns itself with the digest algorithm for the internal nodes.</p>
<p>The choice of hashing or digest algorithm is important, depending on whether you want to maximize performance or security. If you are using a Merkle tree to ensure integrity of data between peers that should not trust one another, then security is paramount and a cryptographically secure hash, such as SHA-256, <a href="https://en.wikipedia.org/wiki/Tiger_%28cryptography%29">Tiger</a>, or SHA-3, should be used.</p>
<p>For my use case, I was not concerned with detecting malicious tampering. I only need to detect data loss or reordering, and have as little impact on overall throughput as possible. For that I can use a CRC rather than a full hashing algorithm.</p>
<p>Earlier I ran some benchmarks comparing the speed of Java implementations of SHA-1, Guava's Murmur hash, CRC32 and <a href="https://en.wikipedia.org/wiki/Adler-32">Adler32</a>. Adler32 (<a href="https://docs.oracle.com/javase/7/docs/api/java/util/zip/Adler32.html">java.util.zip.Adler32</a>) was the fastest of the bunch. The typical use case for the Adler CRC is to detect data transmission errors. It trades off reliability for speed, so it is the weakest choice, but I deemed it sufficient to detect the sort of error I was concerned with.</p>
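<p>Because <code class="so">Adler32</code> and <code class="so">CRC32</code> both implement the <code class="so">java.util.zip.Checksum</code> interface, swapping one for the other is a one-line change. A quick sketch (my own illustration) of computing a checksum over a payload:</p>

```java
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumDemo {
    // Both Adler32 and CRC32 implement Checksum, so the digest
    // strategy can be swapped without touching the calling code.
    static long checksum(Checksum ck, byte[] data) {
        ck.update(data, 0, data.length);
        return ck.getValue(); // 32-bit checksum returned as a long
    }

    public static void main(String[] args) {
        byte[] payload = "event-payload-bytes".getBytes();
        System.out.println("adler32 = " + checksum(new Adler32(), payload));
        System.out.println("crc32   = " + checksum(new CRC32(), payload));
    }
}
```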
<p>So in my implementation the Adler32 checksum is hard-coded into the codebase. If you want to change that, you can either make the internal digest algorithm injectable or configurable, or simply copy the code and change it to use the algorithm you want.</p>
<p>The rest of the code is written to be agnostic of the hashing algorithm - all it deals with are the bytes of the signature.</p>
<p><br /></p>
<h4><code>/* ---[ Serialization / Deserialization ]--- */</code></h4>
<p>My implementation has efficient binary serialization built into the MerkleTree and an accompanying <code class="so">MerkleDeserializer</code> class that handles the deserialization.</p>
<p>I chose not to use the Java Serialization framework. Instead the <code class="so">serialize</code> method just returns an array of bytes and deserialize accepts that byte array.</p>
<p>The serialization format is:</p>
<div class="precode"><pre><code>(magicheader:int)(numnodes:int)
[(nodetype:byte)(siglength:int)(signature:[]byte)]
</code></pre></div>
<p>where <code class="so">(foo:type)</code> indicates the name (foo) and the type/size of the serialized element. I use a <a href="https://en.wikipedia.org/wiki/Magic_number_%28programming%29">magic header</a> of <code class="so">0xcdaace99</code> to allow the deserializer to be certain it has received a valid byte array.</p>
<p>The next number indicates the number of nodes in the tree. Then follows an "array" of <code class="so">numnodes</code> size where the elements are the node type (<code class="so">0x01</code> for internal, <code class="so">0x00</code> for leaf), the length of the signature and then the signature as an array of bytes <code class="so">siglength</code> long.</p>
<p>By including the <code class="so">siglength</code> field, I can allow leaf node signatures to be "promoted" to the parent internal node when there is an odd number of leaf nodes. This also allows the internal nodes to use signatures of different lengths.</p>
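<p>The real serialization code lives in the repo; purely as an illustration of the format above, it can be sketched with a <code class="so">ByteBuffer</code> (class and method names here are hypothetical):</p>

```java
import java.nio.ByteBuffer;
import java.util.List;

public class MerkleSerializerSketch {
    static final int MAGIC_HEADER = 0xcdaace99;

    // Writes (magicheader:int)(numnodes:int) followed by one
    // (nodetype:byte)(siglength:int)(signature:[]byte) record per node.
    static byte[] serialize(List<byte[]> sigs, List<Byte> types) {
        int size = 8; // magic header + numnodes
        for (byte[] s : sigs) {
            size += 1 + 4 + s.length; // type + siglength + sig bytes
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(MAGIC_HEADER);
        buf.putInt(sigs.size());
        for (int i = 0; i < sigs.size(); i++) {
            buf.put(types.get(i));          // 0x01 internal, 0x00 leaf
            buf.putInt(sigs.get(i).length); // siglength lets sig sizes vary
            buf.put(sigs.get(i));
        }
        return buf.array();
    }

    // The deserializer can reject input that lacks the magic header.
    static boolean hasValidHeader(byte[] arr) {
        return arr.length >= 4 && ByteBuffer.wrap(arr).getInt() == MAGIC_HEADER;
    }
}
```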
<p><br /></p>
<h4><code>/* ---[ Usage ]--- */</code></h4>
<p>For the use case described above, you can imagine that system A does the following:</p>
<div class="precode"><pre><code>List<String> eventSigs = new ArrayList<>();
while (true) {
Event event = receiveEvent();
String hash = computeHash(event);
// ... process and transmit the message to the downstream Queue
sendToDownstreamQueue(hash, event);
eventSigs.add(has);
if (isTimeForCheckpoint()) {
MerkleTree mtree = new MerkleTree(eventSigs);
eventSigs.clear();
byte[] serializedTree = mtree.serialize();
sendToDownstreamQueue(serializedTree);
}
}
</code></pre></div>
<p>And system C would then do something like:</p>
<div class="precode"><pre><code>List<String> eventSigs = new ArrayList<>();
while (true) {
Event event = receiveEvent();
if (isCheckpointMessage(event)) {
MerkleTree mytree = new MerkleTree(eventSigs);
eventSigs.clear();
byte[] treeBytes = event.getDataAsBytes();
MerkleTree expectedTree = MerkleDeserializer.deserialize(treeBytes);
byte[] myRootSig = mytree.getRoot().sig;
byte[] expectedRootSig = expectedTree.getRoot().sig;
if (!signaturesAreEqual(myRootSig, expectedRootSig)) {
evaluateTreeDifferences(mytree, expectedTree);
// ... send alert
}
} else {
String hash = event.getOriginalSignature();
eventSigs.add(hash);
// .. do something with event
}
}
</code></pre></div>
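<p>The <code class="so">signaturesAreEqual</code> and <code class="so">evaluateTreeDifferences</code> helpers above are left undefined; a minimal sketch of what they might look like (my own illustration, assuming the <code class="so">Node</code> shape shown earlier):</p>

```java
import java.util.Arrays;

public class TreeCompare {
    static class Node { // mirrors the post's MerkleTree.Node
        byte type;
        byte[] sig;
        Node left, right;
    }

    // Byte-for-byte comparison; Arrays.equals also handles
    // signatures of different lengths.
    static boolean signaturesAreEqual(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    // Descend only into subtrees whose signatures differ -- the point
    // of a Merkle tree is that matching subtrees never need walking.
    static void evaluateTreeDifferences(Node mine, Node expected, int depth) {
        if (mine == null || expected == null) return;
        if (signaturesAreEqual(mine.sig, expected.sig)) return;
        System.out.println("mismatch at depth " + depth);
        evaluateTreeDifferences(mine.left, expected.left, depth + 1);
        evaluateTreeDifferences(mine.right, expected.right, depth + 1);
    }

    public static void main(String[] args) {
        Node leafA = new Node();  leafA.sig = new byte[]{1};
        Node leafB = new Node();  leafB.sig = new byte[]{9};
        Node mine = new Node();   mine.sig = new byte[]{1, 1};  mine.left = leafA;
        Node theirs = new Node(); theirs.sig = new byte[]{9, 9}; theirs.left = leafB;
        evaluateTreeDifferences(mine, theirs, 0); // mismatch at depth 0, then depth 1
    }
}
```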
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com10tag:blogger.com,1999:blog-2977002084155733470.post-9498501364958112052014-12-25T20:10:00.000-05:002015-03-08T09:04:12.258-04:00Hooking up Kafka to Storm with the KafkaSpout in Storm 0.9.3<p>I have recently published a code example for using the Kafka Spout that is now part of Storm 0.9.3. After wasting a day fighting with incompatible logging frameworks between Kafka and Storm, I found <a href="https://github.com/miguno/kafka-storm-starter">an example project</a> in Scala that had the necessary exclusions to get the uber-jar to run in Storm cluster mode. I've converted that example to Java and Maven, rather than Scala and sbt, and considerably simplified the examples so you can focus on just one thing: getting Kafka hooked up to Storm using the KafkaSpout.</p>
<p>The code repo is here: <a href="https://github.com/quux00/kstorm">https://github.com/quux00/kstorm</a></p>
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com9tag:blogger.com,1999:blog-2977002084155733470.post-11129003392715382062014-10-19T11:59:00.000-04:002014-10-19T15:15:50.438-04:00Updated microbenchmarks for Java String tokenization<p>I've been focusing a lot on Java performance lately and ran across a <a href="https://programmers.stackexchange.com/questions/221997/quickest-way-to-split-a-delimited-string-in-java">Programmers StackExchange post</a> asking the most efficient way to tokenize a string in Java. It struck a nerve with me because the most common way to do this is to use <code class="so">String#split</code>. But the <code class="so">split</code> method takes as its tokenization delimiter a string that gets compiled into a regular expression. Which seems nice when you need it, but likely to be rather inefficient if all you need is to split on a character, such as a space. Then I looked up the old <code class="so">StringTokenizer</code> class which I used to use years and years ago. And its Javadoc has this interesting tidbit:</p>
<blockquote>
<p>StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead. </p>
</blockquote>
<p>Strangely, however, <a href="http://programmers.stackexchange.com/a/222000/55056">the accepted answer</a> for the StackExchange question links to <a href="http://demeranville.com/battle-of-the-tokenizers-delimited-text-parser-performance/">a blog post</a> that shows benchmarks claiming that StringTokenizer is faster than <code class="so">String#split</code>. So why shouldn't I use StringTokenizer if I don't need a regex to split a string into fields?</p>
<p><br /></p>
<h4><code>/* ---[ Someone is wrong on the internet ]--- */</code></h4>
<p><img src="https://sslimgs.xkcd.com/comics/duty_calls.png" alt="Someone is wrong on the internet" title="" /></p>
<p>Well, that blog post needs to be updated for two reasons:</p>
<ol>
<li>it used Java 6, so we need to see whether things are different in Java 7 and 8</li>
<li>microbenchmarks are hard to do correctly and the author used questionable means (for example - there is not much warm up time and his timings are awfully short)</li>
</ol>
<p>The difficulty of doing accurate microbenchmarks has been greatly assuaged by the creation of the <a href="http://openjdk.java.net/projects/code-tools/jmh/">JMH benchmark tool</a>. If you haven't adopted it yet, put it at the top of your technical TODO list. The documentation is excellent and it comes with many useful samples.</p>
<p><br /></p>
<h4><code>/* ---[ Splitting a Java string into tokens ]--- */</code></h4>
<p>Of the techniques used in the original post, I am interested in four candidates:</p>
<ol>
<li>tokenize the string yourself with <code class="so">String#indexOf</code></li>
<li>use <code class="so">String#split</code></li>
<li>use the almost deprecated <code class="so">StringTokenizer</code></li>
<li>use Guava's <a href="http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/base/Splitter.html">Splitter</a></li>
</ol>
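<p>For reference, candidate 1 — the hand-rolled <code class="so">indexOf</code> tokenizer — looks roughly like this (a sketch of the technique, not the benchmark author's exact code):</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexOfSplit {
    // Tokenize on a single char without compiling a regex --
    // the technique String#split is being benchmarked against.
    static List<String> split(String s, char delim) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = s.indexOf(delim, start)) != -1) {
            tokens.add(s.substring(start, idx));
            start = idx + 1;
        }
        tokens.add(s.substring(start)); // trailing token after the last delimiter
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("aa bb cc", ' ')); // [aa, bb, cc]
    }
}
```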
<p>Here are the reported outcomes of the earlier measurements for these 4 options:</p>
<div class="precode"><pre><code>IndexOf: 50ms
StringTokenizer: 89ms
GuavaSplit: 109ms
Split: 366ms
</code></pre></div>
<p>I reran the author's exact code on my system using Java 7 and got the following:</p>
<div class="precode"><pre><code>IndexOf: 52ms
StringTokenizer: 104ms
GuavaSplit: 119ms
Split: 145ms
</code></pre></div>
<p>So using his benchmark technique, I got the same ordering, though the ratios have changed a bit. In Java 7, split is only about 3 times slower than indexOf, rather than 7 times slower.</p>
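<p>A plausible explanation for split's improvement: since Java 7, <code class="so">String#split</code> takes a fast path that avoids the regex machinery when the pattern is a single non-metacharacter (an OpenJDK implementation detail, not a documented guarantee). For single-space-delimited input the two approaches also agree on output:</p>

```java
import java.util.Arrays;
import java.util.StringTokenizer;

public class SplitVsTokenizer {
    public static void main(String[] args) {
        String line = "0000 12 34 5 666 77 88888 99";

        // single-char, non-metachar pattern: eligible for the JDK fast path
        String[] viaSplit = line.split(" ");

        StringTokenizer st = new StringTokenizer(line, " ");
        String[] viaTokenizer = new String[st.countTokens()];
        for (int i = 0; st.hasMoreTokens(); i++) {
            viaTokenizer[i] = st.nextToken();
        }

        // identical token arrays when delimiters never repeat
        System.out.println(Arrays.equals(viaSplit, viaTokenizer)); // true
    }
}
```

Note that the two are not interchangeable in general: <code class="so">StringTokenizer</code> collapses consecutive delimiters, while <code class="so">split</code> emits empty strings between them.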
<p><br /></p>
<h4><code>/* ---[ What does JMH say? ]--- */</code></h4>
<p>I mildly rewrote these benchmarks using JMH. I also made longer strings to tokenize, rather than the two-token string used by the blog post author, as that seemed a more realistic data set. And I used three strings, instead of one:</p>
<div class="precode"><pre><code>String[] lines = new String[]{
"0000 12 34 5 666 77 88888 99",
"aa bb cc dd ee ff gg hh ii jj kk ll mm nn oo pp qq rr ss tt uu vv ww xx yy zz",
"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod"
};
</code></pre></div>
<p>The JMH-based benchmark code is listed below. The results are rather interesting. In the original benchmarks raw time is measured, so lower is better. With JMH, I measured throughput (the default metric) in operations per millisecond, so <strong>higher is better</strong>:</p>
<p>With Java 7 (1.7.0_60-b19) the results were:</p>
<div class="precode"><pre><code>m.JavaStringSplits.indexOf thrpt 10 840.147 ± 28.553 ops/ms
m.JavaStringSplits.javaSplit thrpt 10 694.870 ± 3.858 ops/ms
m.JavaStringSplits.stringTokenizer thrpt 10 473.812 ± 8.422 ops/ms
m.JavaStringSplits.guavaSplit thrpt 10 392.759 ± 8.707 ops/ms
</code></pre></div>
<p>And with Java 8 (1.8.0_20-ea):</p>
<div class="precode"><pre><code>m.JavaStringSplits.indexOf thrpt 10 827.136 ± 30.831 ops/ms
m.JavaStringSplits.javaSplit thrpt 10 700.809 ± 7.215 ops/ms
m.JavaStringSplits.stringTokenizer thrpt 10 480.788 ± 16.793 ops/ms
m.JavaStringSplits.guavaSplit thrpt 10 386.398 ± 5.323 ops/ms
</code></pre></div>
<p>With the presumably more robust and accurate JMH model of measuring microbenchmarks, <code class="so">String#split</code> moved up the ranks and handily outperforms both <code class="so">StringTokenizer</code> and Guava <code class="so">Splitter</code>.</p>
<p>My hand-coded splitter using <code class="so">indexOf</code> was about 18% "faster" (higher throughput) than <code class="so">Java#split</code>, so we do see some overhead in <code class="so">Java#split</code>, but for most scenarios I conclude that the Javadoc for StringTokenizer is correct: use <code class="so">String#split</code> for most scenarios unless you know that 1) you don't need regular expressions to tokenize and 2) string splitting is your bottleneck.</p>
<p>A follow-up set of tests that could be done would be on very large strings. In his <a href="http://mechanical-sympathy.blogspot.com/2011/07/let-caller-choose.html">Let the Caller Choose</a> Mechanical Sympathy post from a few years ago, Martin Thompson advocated devising an API where you pass in the datastructure to be populated during the tokenization, rather than relying on one to be created in the underlying tokenization method and passed back to you. He states (but does not demonstrate):</p>
<blockquote>
<p>In the Java world this idiom becomes critical to performance when large arrays are involved. </p>
</blockquote>
<p><br /></p>
<h4><code>/* ---[ Find the flaw ]--- */</code></h4>
<p>One of my favorite interview questions is to put a piece of code in front of someone and say "find the flaws". And Aleksey Shipilëv, the author of JMH, states:</p>
<blockquote>
<p>Your [JMH] benchmarks should be peer-reviewed.</p>
</blockquote>
<p>Mine have not been, so I throw it out to the internet to find the flaws in my quick analysis.</p>
<p><br /></p>
<h4><code>/* ---[ The code and setup ]--- */</code></h4>
<br />
<script src="https://gist.github.com/quux00/8882422fcf40c7adbf1b.js"></script>
<script src="https://gist.github.com/quux00/def9a448bb42493fbedf.js"></script>
<p>Benchmarks were performed on:</p>
<p>Ubuntu (Xubuntu) 14.04 LTS with 3.13.0-37 Linux kernel with 8 Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz cores.</p>
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com4tag:blogger.com,1999:blog-2977002084155733470.post-50730759447351552422014-03-17T21:41:00.001-04:002014-03-17T21:52:21.082-04:00Rust namespaces by example<p><br /></p>
<h4><code>/* ---[ Cheat sheet for Rust namespaces ]--- */</code></h4>
<p>Rust namespaces can be a little mind-bending. This brief blog post is meant to provide instruction by example on setting up an application or library of Rust code that uses multiple files in a hierarchical namespace.</p>
<p>Though the current <a href="https://github.com/mozilla/rust/wiki/Docs-project-structure">docs-project-structure guide</a> on the Rust wiki is pretty sparse, you should start there. Then read the section on <a href="http://static.rust-lang.org/doc/master/tutorial.html#crates-and-the-module-system">Crates and Modules in the tutorial</a>.</p>
<p>I used those references plus an examination of how files and namespaces in the libstd code tree are structured to come up with this example.</p>
<p>In this simple example, I want to have an application where things are namespaced to the top-level module <code class="so">abc</code>. I want to have a couple of files (namespaces) under <code class="so">abc</code>, some additional directories (namespaces) below <code class="so">abc</code>, such as <code class="so">abc::xyz</code>, and have modules in the <code class="so">abc::xyz</code> namespace. Furthermore, I want all those namespaces to be able to refer to each other - both down the chain and up the chain.</p>
<p>Here is a simple example that illustrates how to do it. I am using Rust-0.10pre (built 16-Mar-2014).</p>
<p>First, I have a project I called "<a href="http://www.space.com/19338-manatee-nebula-celestial-wonder.html">space-manatee</a>", under which I have a <code class="so">src</code> directory and then my code hierarchy starts:</p>
<div class="precode"><pre><code>quux00:~/rustlang/space-manatee/src$ tree
.
├── abc
│ ├── mod.rs
│ ├── thing.rs
│ ├── wibble.rs
│ └── xyz
│ ├── mod.rs
│ └── waldo.rs
└── main.rs
2 directories, 6 files
</code></pre></div>
<p>To provide a namespace <code class="so">foo</code> in Rust, you can either create a file called <code class="so">foo.rs</code> or a dir/file combo of <code class="so">foo/mod.rs</code>. The content of my <code class="so">abc/mod.rs</code> is:</p>
<div class="precode"><pre><code>quux00:~/rustlang/space-manatee/src$ cat abc/mod.rs
pub mod thing;
pub mod wibble;
pub mod xyz;
</code></pre></div>
<p>All this module does is export other modules in the same directory. It could have additional code in it - functions and data structures, but I elected not to do that.</p>
<p><br /><code class="so">xyz</code> is a directory, and since I created the <code class="so">xyz/mod.rs</code> dir/file combo, it is a namespace that can be used and exported.</p>
<p>Let's look into the other files:</p>
<div class="precode"><pre><code>quux00:~/rustlang/space-manatee/src$ cat abc/thing.rs
extern crate time;
use time::Tm;
pub struct Thing1 {
name: ~str,
when: time::Tm,
}
pub fn new_thing1(name: ~str) -> ~Thing1 {
~Thing1{name: name, when: time::now()}
}
</code></pre></div>
<p><code class="so">thing.rs</code> pulls in the rustlang <code class="so">time</code> crate and then defines a struct and constructor for it. It doesn't reference any other space-manatee code.</p>
<div class="precode"><pre><code>quux00:~/rustlang/space-manatee/src$ cat abc/wibble.rs
use abc::thing;
use abc::thing::Thing1;
pub struct Wibble {
mything: ~Thing1
}
pub fn make_wibble() -> ~Wibble {
~Wibble{mything: thing::new_thing1(~"cthulu")}
}
</code></pre></div>
<p><code class="so">wibble.rs</code>, however, does reference other space-manatee projects, so it uses the fully qualified namespace from the top of the hierarchy, but it does not have to explicitly "import" anything. It can find the <code class="so">thing</code> namespace without a <code class="so">mod</code> declaration because <code class="so">thing.rs</code> is in the same directory.</p>
<p><br />OK, let's look into the <code class="so">xyz</code> directory now.</p>
<div class="precode"><pre><code>quux00:~/rustlang/space-manatee/src$ cat abc/xyz/mod.rs
pub mod waldo;
</code></pre></div>
<p>That just exports the waldo namespace in the same directory. What's in waldo?</p>
<div class="precode"><pre><code>quux00:~/rustlang/space-manatee/src$ cat abc/xyz/waldo.rs
use abc::wibble::Wibble;
pub struct Waldo {
magic_number: int,
w: ~Wibble
}
</code></pre></div>
<p>The Waldo struct references the Wibble struct that is higher than it in the hierarchy. Notice there is no "import" via a <code class="so">mod</code> statement - apparently going up the hierarchy requires no import.</p>
<p>So that's the supporting cast. Let's see how the main.rs program uses them:</p>
<div class="precode"><pre><code>quux00:~/rustlang/space-manatee/src$ cat main.rs
extern crate time;
use abc::{thing,wibble};
use abc::thing::Thing1;
use abc::wibble::Wibble;
use abc::xyz::waldo::Waldo;
pub mod abc;
fn main() {
let tg: ~Thing1 = thing::new_thing1(~"fred");
println!("{}", tg.name);
let wb: ~Wibble = wibble::make_wibble();
println!("{}", wb.mything.name);
let wdo = Waldo{magic_number: 42,
w: wibble::make_wibble()};
println!("{:?}", wdo);
}
</code></pre></div>
<p>The only <code class="so">mod</code> "import" <code class="so">main.rs</code> had to do is of the <code class="so">abc</code> namespace - which is in the same directory as main.rs. In fact, that is all you can import. If you try <code class="so">mod abc::thing</code> the compiler will tell you that you aren't doing it right.</p>
<p>By importing <code class="so">abc</code>, you are importing <code class="so">abc/mod.rs</code>. Go back up and look at what <code class="so">abc/mod.rs</code> does - it imports other modules, which in turn import other modules, so they all end up being imported into <code class="so">main.rs</code> as addressable entities.</p>
<p><br />Once all those import references are set up, nothing special has to be done to compile and run it:</p>
<div class="precode"><pre><code>quux00:~/rustlang/space-manatee/src$ rustc main.rs
quux00:~/rustlang/space-manatee/src$ ./main
fred
cthulu
abc::xyz::waldo::Waldo{
magic_number: 42,
w: ~abc::wibble::Wibble{
mything: ~abc::thing::Thing1{
name: ~"cthulu",
when: time::Tm{tm_sec: 21i32,
tm_min: 26i32, tm_hour: 21i32, tm_mday: 17i32, tm_mon: 2i32,
tm_year: 114i32, tm_wday: 1i32, tm_yday: 75i32, tm_isdst: 1i32,
tm_gmtoff: -14400i32, tm_zone: ~"EDT", tm_nsec: 663891679i32
}
}
}
}
</code></pre></div>
<p>(I hand formatted the Waldo output for easier reading.)</p>
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com6tag:blogger.com,1999:blog-2977002084155733470.post-43062040103353361392014-03-16T10:24:00.001-04:002014-03-16T10:24:03.021-04:00Select over multiple Rust Channels<p><br /></p>
<h4><code>/* ---[ Channels in Rust ]--- */</code></h4>
<p>The channel paradigm for CSP-based concurrency has received a lot of attention lately since it is the <a href="http://blog.golang.org/pipelines">foundational concurrency paradigm in Go</a> and Clojure has embraced it with core.async. It turns out that <a href="http://www.rust-lang.org/">Rust, the new language from Mozilla</a>, also fully embraces channel-based message passing concurrency.</p>
<p>Both Go and Clojure's core.async have a <code class="so">select</code> operation that allows your code to wait on multiple channels and respond to the first one that is ready. This is based, at least conceptually, on the Unix <a href="http://linux.die.net/man/2/select">select system call</a> that monitors multiple file descriptors.</p>
<p>Rust also has a select operation. And it has a <a href="http://static.rust-lang.org/doc/master/std/macros/macro.select.html">select! macro</a> to make using it easier. Here's an example:</p>
<div class="precode"><pre><code>use std::io::Timer;
fn use_select_macro() {
let (ch, pt): (Sender<~str>, Receiver<~str>) = channel();
let mut timer = Timer::new().unwrap();
let timeout = timer.oneshot(1000);
select! (
s = pt.recv() => println!("{:s}", s),
() = timeout.recv() => println!("timed out!")
);
}
</code></pre></div>
<p>Channels and Ports are now called Senders and Receivers in Rust. As with select in Go, if the Receiver called <code class="so">pt</code> has a message come in before the 1 second timer goes off, its code block will execute. Otherwise, the timer's Receiver will be read from and its code block executed, printing "timed out".</p>
<p>Note that the <code class="so">select!</code> macro uses parens, like a function call, not curly braces like a code block.</p>
<p>The <code class="so">select!</code> macro is currently labelled experimental, since it has some limitations. One I hit this week is that it will fail (as in, not compile) if you embed the Receiver in a struct:</p>
<div class="precode"><pre><code>
fn does_not_compile() {
let (ch, pt): (Sender<~str>, Receiver<~str>) = channel();
let a = A{c: ch, p: pt};
let mut timer = Timer::new().unwrap();
let timeout = timer.oneshot(1000);
select! (
s = a.p.recv() => println!("{:s}", s),
() = timeout.recv() => println!("time out!")
);
}
</code></pre></div>
<p>This fails with <code class="so">error: no rules expected the token '.'</code>. I've filed an issue for it here: <a href="https://github.com/mozilla/rust/issues/12902#issuecomment-37714663">https://github.com/mozilla/rust/issues/12902#issuecomment-37714663</a></p>
<p><br /></p>
<h4><code>/* ---[ Using the Rust Channel Select API ]--- */</code></h4>
<p>The workaround is to use the Select API directly. Here's how you do it:</p>
<div class="precode"><pre><code>
use std::comm::Select;
use std::io::Timer;
fn select_from_struct() {
let (ch, pt): (Sender<~str>, Receiver<~str>) = channel();
let mut timer = Timer::new().unwrap();
let timeout = timer.oneshot(1000);
let a = A{c: ch, p: pt};
let sel = Select::new();
let mut pt = sel.handle(&a.p);
let mut timeout = sel.handle(&timeout);
unsafe { pt.add(); timeout.add(); }
let ret = sel.wait();
if ret == pt.id() {
let s = pt.recv();
println!("ss: {:?}", s);
} else if ret == timeout.id() {
let () = timeout.recv();
println!("TIMEOUT!!");
}
}
</code></pre></div>
<p>It's a little more code, but fairly straightforward to follow. You wrap each Receiver in a select Handle and then add the handles to the Receiver set via the <code class="so">add</code> method (which must be wrapped in an <code class="so">unsafe</code> block). Each handle gets an id so you can discover which one returned first.</p>
<p>Finally you wait. When the wait returns, you check the returned id and execute the appropriate block of code, which starts by calling <code class="so">recv</code> on the Receiver to get the incoming value (if any).</p>
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com2tag:blogger.com,1999:blog-2977002084155733470.post-25287091481601494102014-01-26T12:28:00.001-05:002014-10-19T10:58:46.195-04:00Debugging Rust with gdb<p><strong>[Update]:</strong> <em>This blog post has been updated to Rust 0.11 as of mid-April 2014. The previous version was for Rust 0.9.</em></p>
<p>This blog entry outlines the current state of my understanding of how to use the gdb debugger for Rust programs. As of this writing, I'm using:</p>
<div class="precode"><pre><code>$ rustc -v
rustc 0.11-pre-nightly (e332287 2014-04-16 00:56:30 -0700)
host: x86_64-unknown-linux-gnu
$ gdb --version
GNU gdb (GDB) 7.6.1-ubuntu
</code></pre></div>
<p>A caveat before we begin: I'm not an expert in gdb and I'm (now less of) a newbie at Rust. I'm cataloging my findings for both my own future reference and to help anyone who needs to see an example in action. I'd welcome feedback on tips for better ways to do this.</p>
<p><br /></p>
<h4><code>/* ---[ GDB ]--- */</code></h4>
<p>I won't give a gdb tutorial. There are lots of good ones on the web, such as these:</p>
<ul>
<li><a href="http://betterexplained.com/articles/debugging-with-gdb/"><a href='http://betterexplained.com/articles/debugging-with-gdb/'>http://betterexplained.com/articles/debugging-with-gdb/</a></a></li>
<li><a href="http://www.unknownroad.com/rtfm/gdbtut/gdbtoc.html"><a href='http://www.unknownroad.com/rtfm/gdbtut/gdbtoc.html'>http://www.unknownroad.com/rtfm/gdbtut/gdbtoc.html</a></a></li>
<li><a href="http://beej.us/guide/bggdb/"><a href='http://beej.us/guide/bggdb'>http://beej.us/guide/bggdb</a></a></li>
</ul>
<p><br /></p>
<h4><code>/* ---[ The setup ]--- */</code></h4>
<p>The example I'll use is modified from an example from <a href="https://github.com/cmr">cmr</a> in this Rust issue: <a href="https://github.com/mozilla/rust/issues/10350">https://github.com/mozilla/rust/issues/10350</a>. I've chosen this one because it shows jumping into both a regular (named) function and an anonymous closure.</p>
<p>The codebase consists of two files, both in the same directory:</p>
<p>quux.rs:</p>
<div class="precode"><pre><code>pub fn quux00(x: || -> int) -> int {
println!("DEBUG 123");
x()
}
</code></pre></div>
<p>and bar.rs</p>
<div class="precode"><pre><code>extern crate quux;
fn main() {
let mut y = 2;
{
let x = || {
7 + y
};
let retval = quux::quux00(x);
println!("retval: {:?}", retval);
}
y = 5;
println!("y : {:?}", y);
}
</code></pre></div>
<p>To compile them with debugging info included use the <code class="so">-g</code> switch (this changed since version 0.9):</p>
<div class="precode"><pre><code>$ rustc -g --crate-type lib quux.rs
$ rustc -g -L . bar.rs
</code></pre></div>
<p>Start the <code class="so">bar</code> program in gdb and let's set some breakpoints:</p>
<div class="precode"><pre><code>$ gdb bar
(gdb) break bar::main
Breakpoint 1 at 0x4042c0: file bar.rs, line 3.
(gdb) rbreak quux00
Breakpoint 2 at 0x43e760: file quux.rs, line 1.
int quux::quux00(int ())(int ());
(gdb) info breakpoints
Num Type Disp Enb Address What
1 breakpoint keep y 0x00000000004042c0 in bar::main at bar.rs:3
2 breakpoint keep y 0x000000000043e760 in quux::quux00 at quux.rs:1
</code></pre></div>
<p>For the first breakpoint, I know the full path, so I just use <code class="so">break</code>.</p>
<p>Sometimes correct full paths in Rust can be tricky, especially for functions that are parameterized, so it can be easier to just use <code class="so">rbreak</code>, which takes a regular expression. All functions that match the regex will have a breakpoint set.</p>
<p>The second breakpoint does have the word "quux00" in it, but it's been mangled. There's a way to change that in Rust, but let's move on for the moment.</p>
<p><br /></p>
<h4><code>/* ---[ Aside: rbreak ]--- */</code></h4>
<p>In my playing around so far, I've been unable to use <code class="so">break</code> to set a breakpoint for a function not in the file with the <code class="so">main</code> method, so <code class="so">rbreak</code> has been very useful.</p>
<p>The <code class="so">rbreak</code> command is pretty powerful. According to <a href="https://sourceware.org/gdb/onlinedocs/gdb/Set-Breaks.html">the documentation</a>, if you want to set a breakpoint for all functions, <code class="so">rbreak .</code> will do the trick. You don't want to do this for a rust application, because there are hundreds of functions that get compiled in.</p>
<p>Instead, you'll want to limit the scope of the regex search by limiting the search to a single file with:</p>
<div class="precode"><pre><code>(gdb) rbreak bar.rs:.
</code></pre></div>
<p>But again I've only gotten this to work for the "main" file. If I type <code class="so">rbreak quux.rs:.</code> it doesn't know what I'm talking about. Something for future research.</p>
<p><br /></p>
<h4><code>/* ---[ Let's debug ]--- */</code></h4>
<p>So now we've got two breakpoints set, as indicated by the output from <code class="so">info breakpoints</code>.</p>
<p>Let's start the debugging session:</p>
<div class="precode"><pre><code>(gdb) run
Starting program: /home/midpeter444/lang/rust/sandbox/debug/./bar
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
... deleted a bunch of info about threads starting up ...
Breakpoint 1, bar::main () at bar.rs:3
3 fn main() {
(gdb) n
4 let mut y = 2;
(gdb) n
9 let retval = quux::quux00(x);
(gdb) list
4 let mut y = 2;
5 {
6 let x = || {
7 7 + y
8 };
9 let retval = quux::quux00(x);
10 println!("retval: {:?}", retval);
11 }
12 y = 5;
13 println!("y : {:?}", y);
(gdb) p y
$1 = 2
(gdb) p x
$2 = {int ()} 0x7fffffffd4f8
</code></pre></div>
<p>Interesting that it seemed to skip lines 5-8 when I single stepped with <code class="so">n</code>. But the x value was captured as an unnamed function, as you can see on the last line of the readout.</p>
<p>Now we are on line 9 (confirmed with the <code class="so">frame</code> command below), so let's continue - our breakpoint on the quux00 function should now be tripped:</p>
<div class="precode"><pre><code>(gdb) frame
#0 bar::main () at bar.rs:9
9 let retval = quux::quux00(x);
(gdb) c
Continuing.
Breakpoint 2, quux::quux00 (x={int ()} 0x7fffffffd500) at quux.rs:1
1 pub fn quux00(x: || -> int) -> int {
</code></pre></div>
<p>Yes, it was tripped. Let's look around and single-step through it:</p>
<div class="precode"><pre><code>(gdb) frame
#0 quux::quux00 (x={int ()} 0x7fffffffd500) at quux.rs:1
1 pub fn quux00(x: || -> int) -> int {
(gdb) list
1 pub fn quux00(x: || -> int) -> int {
2 println!("DEBUG 123");
3 x()
4 }
(gdb) s
2 println!("DEBUG 123");
(gdb) p x
$4 = {int ()} 0x7fffffffd390
</code></pre></div>
<p>OK, we're inside the <code class="so">quux00</code> method. We stepped to the first instruction (the <code class="so">println</code>) and inspected the <code class="so">x</code> param, which is our anonymous Rust closure. Let's continue by stepping into the closure and see if that works:</p>
<div class="precode"><pre><code>(gdb) n
DEBUG 123
3 x()
(gdb) s
fn1356 () at bar.rs:6
6 let x = || {
(gdb) n
7 7 + y
(gdb) p y
$5 = 2
(gdb) n
bar::main () at bar.rs:10
10 println!("retval: {:?}", retval);
</code></pre></div>
<p>Excellent. That worked and we even had line numbers. Now we are back in the outer <code class="so">main</code> fn. BTW, note that the "anonymous" closure has a name: <code class="so">fn1356</code> - remember that name, we'll come back to it later.</p>
<p>It's an easy walk to the finish line from here:</p>
<div class="precode"><pre><code>(gdb) list
5 {
6 let x = || {
7 7 + y
8 };
9 let retval = quux::quux00(x);
10 println!("retval: {:?}", retval);
11 }
12 y = 5;
13 println!("y : {:?}", y);
14 }
(gdb) p retval
$3 = 9
(gdb) n
2
(gdb) p y
$4 = 2
(gdb) c
Continuing.
retval: 9
y : 5
[Inferior 1 (process 7007) exited normally]
</code></pre></div>
<p><br /></p>
<h4><code>/* ---[ Round 2: Set breakpoints on all methods in main file ]--- */</code></h4>
<p>Let's start over and set breakpoints on all the functions in the bar.rs file:</p>
<div class="precode"><pre><code>$ gdb bar
(gdb) rbreak bar.rs:.
Breakpoint 1 at 0x4042c0: file bar.rs, line 3.
static void bar::main();
Breakpoint 2 at 0x404520: file bar.rs, line 6.
static int fn1356()();
</code></pre></div>
<p>Aah, there's the name again: <code class="so">fn1356</code>. So that's another way to set a breakpoint on your closures. If we redo the session, we'll see that the breakpoint now gets tripped in the closure as it's being executed (from within the quux00 method):</p>
<div class="precode"><pre><code>(gdb) r
Starting program: /home/midpeter444/lang/rust/sandbox/debug/bar
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, bar::main () at bar.rs:3
warning: Source file is more recent than executable.
3 fn main() {
(gdb) c
Continuing.
DEBUG 123
Breakpoint 2, fn1356 () at bar.rs:6
6 let x = || {
(gdb) frame
#0 fn1356 () at bar.rs:6
6 let x = || {
(gdb) n
7 7 + y
(gdb) p y
$1 = 2
</code></pre></div>
<p><br /></p>
<h4><code>/* ---[ Round 3: Demangle function names ]--- */</code></h4>
<p>In the Rust 0.9 version of this post I showed how Rust mangled the function names, but that seems to have gone away. You can still explicitly tell the compiler not to mangle a function name like so:</p>
<div class="precode"><pre><code>#[no_mangle]
pub fn quux00(x: || -> int) -> int {
    println!("DEBUG 123");
x()
}
</code></pre></div>
<p>I haven't tried anything really complicated with Rust in gdb yet, but hopefully these examples serve to get you going.</p>
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com273tag:blogger.com,1999:blog-2977002084155733470.post-67085240131400042592014-01-26T12:16:00.000-05:002014-01-26T12:34:26.954-05:00Telephonic Whispers in Rust, revisited<p><br /></p>
<h4><code>/* ---[ Hitting the big time ]--- */</code></h4>
<p>My <a href="http://thornydev.blogspot.com/2014/01/chinese-whispers-in-rust.html">previous blog entry</a> was a bit of a trifle: it showed an implementation of a toy CSP (Communicating Sequential Processes) application in Rust, comparing it to an example implementation in Go by Rob Pike. I discovered, post-hoc, that it hit <a href="http://www.reddit.com/r/rust/comments/1vnrp8/chinese_whispers_in_rust_and_go/">the Rust subreddit</a> and <a href="https://news.ycombinator.com/item?id=7089879">Hacker News</a>. There was some good discussion there and a few complaints.</p>
<p><br /></p>
<h4><code>/* ---[ Clarifications ]--- */</code></h4>
<p>First, the complaints. Of course this example was tuned for how Go works: the example was created by Rob Pike. The point of his example, and my blog post, was that goroutines (Rust tasks) are lightweight (green threads) that are multiplexed onto native OS threads. The example is meant to show that you can spawn up tens of thousands to hundreds of thousands of goroutines (Rust tasks) pretty rapidly. That's not something you can do in Java, for example: <a href="https://github.com/quux00/go-lightly/tree/master/go-lightly-examples/clj-examples#whispers">I've tried on my Linux system</a> and it cried uncle somewhere between 20,000 and 30,000 threads and hosed my system.</p>
<p>The <a href="http://static.rust-lang.org/doc/0.9/guide-tasks.html">Rust guide on Tasks</a> says:</p>
<blockquote>
<p>Rust can create hundreds of thousands of concurrent tasks<br />on a typical 32-bit system</p>
</blockquote>
<p>So remembering the Pike example, I thought I'd try it with Rust. My <em>primary</em> intent was just to show that I figured out how to implement it in Rust and that it worked. When you are learning Rust, getting something to <em>compile</em> is an achievement worth celebrating. The Rust compiler is wonderfully strict - enforcing all its rules in order to provide deterministic, safe memory management with static analysis rather than requiring a managed runtime. (But it can provide a managed runtime with garbage collection if you want that model, which is one of the things that makes Rust so intriguing.)</p>
<p>The simple timings in my earlier blog post were included only because the numbers were interestingly different. The reasons for that difference were the most interesting part of the community discussion, and I certainly learned a few things I didn't know about Rust, so, despite the complaints, I'm glad I included it. There wasn't much point in doing formal, extensive benchmarks, as some people suggested, because, as other posters pointed out, the example is trifling and not something you would typically do.</p>
<p>Second, I called the thing "Chinese Whispers" only because that's what the esteemed Rob Pike called it and I was clearly doing a port of his work in the presentation I cited. The name makes no sense to me, but since there were charges of racism and suggestions of calling it "Broken Telephone", I'll now call it "Telephonic Whispers", since I'm not exactly sure what is "broken" about chaining a series of channels together and passing a message from one end to the other. Seems connected to me.</p>
<p>Third, I updated the blog with a few improvements. I shouldn't have described channels as "first class" in Rust, since that label seems to be reserved for things that are part of the core of a language. Rust channels are not built into the language in the way they are in Go. My point was that while these are in the std library and not in the core language, they are a central aspect of Rust's approach to concurrency. They are just as available out of the box as Go's channels are, unlike in a language like Java where you have to construct them yourself from other concurrency constructs, such as a <a href="http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/SynchronousQueue.html">SynchronousQueue</a> or a <a href="http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/TransferQueue.html">TransferQueue</a>.</p>
<p><br /></p>
<h4><code>/* ---[ The interesting follow up: segmented stacks ]--- */</code></h4>
<p><a href="http://www.reddit.com/r/rust/comments/1vnrp8/chinese_whispers_in_rust_and_go/ceuatxh">Patrick Walton profiled the Rust code</a> and found that most of the time was spent allocating stacks. Go currently (I'm using version 1.1.2) uses something called "segmented stacks" rather than larger contiguous stacks. In the segmented stack model, you create small stacks for a thread/task/routine and when it needs more space you create another segment and keep track of them in some data structure, such as a linked list.</p>
<p>Since most of the time in the "Telephonic Whispers" example is spent creating a chain of 100,001 channels, it makes sense that systems that build larger stacks for each routine/task will take longer.</p>
<p>An advantage of a segmented stack model is that it allows you to quickly spawn up tasks and if they never need a lot of stack space, less memory is wasted. If the task needs more later, then it gets more on demand.</p>
<p>Rust used to have segmented stacks, but they were removed late in 2013, as announced here by Brian Anderson: <a href="https://mail.mozilla.org/pipermail/rust-dev/2013-November/006314.html">https://mail.mozilla.org/pipermail/rust-dev/2013-November/006314.html</a>. The Rust team found that the apparent advantages of compact memory caused performance problems and had compatibility issues with integrating with foreign code (typically C) and with LLVM libc optimizations.</p>
<p>And now, as pointed out in the "Chinese Whispers" community discussions, it looks like <a href="https://docs.google.com/document/d/1wAaf1rYoM4S4gtnPh0zOlGzWtrZFQ5suE8qr2sD8uWQ/pub">Go will be moving away from segmented stacks</a> too. The primary issue they cite is what Brian Anderson above calls "stack thrashing": when you have heavy looping and that code is continually hitting a stack boundary, causing stack allocations to happen very frequently, thus significantly affecting performance.</p>
<p><br /></p>
<h4><code>/* ---[ Valid consideration: Number of native threads ]--- */</code></h4>
<p>Another important point made in the "whispers" community discussion was about the underlying threading defaults: in Go version 1, only a single native OS thread is created even if you spawn multiple goroutines. To change that you have to manually set it via the <code class="so">runtime.GOMAXPROCS(int)</code> function. So in the Pike example, only one thread is being used.</p>
<p>Rust v0.9 uses multiple threads by default and has to do synchronization between them. So that might account for some of the slowdown between the Go and Rust versions. But Walton's profiling showed that most of the time went to stack allocation, so this consideration is probably secondary.</p>
<p>I did a small experiment on this front by setting <code class="so">GOMAXPROCS</code> to 8 (the number of CPUs on my system):</p>
<script src="https://gist.github.com/quux00/8635984.js"></script>
<p>Then I did a small simple benchmark where I looped 50 times running the Go "whispers" code either with the default GOMAXPROCS setting or GOMAXPROCS=NumCPU (8). I ran each three times and saw consistent results. Here are two representative runs:</p>
<p>GOMAXPROCS = <strong>default</strong>:</p>
<div class="precode"><pre><code>$ time ./whispers
100001
real 0m5.569s
user 0m5.403s
sys 0m0.168s
</code></pre></div>
<p>GOMAXPROCS = <strong>NumCPU (8)</strong></p>
<div class="precode"><pre><code>$ time ./whispers
100001
real 0m10.801s
user 0m20.413s
sys 0m6.007s
</code></pre></div>
<p>Using more CPUs causes the Go version to run about twice as slow. This supports the notion that part of the timing difference between Go and Rust on this example is due to the default number of threads that get spun up.</p>
<p><br /></p>
<h4><code>/* ---[ Final thoughts ]--- */</code></h4>
<p>To conclude, I want to state something that is obvious by now and was also discussed in the community discussion: if you go to the effort of spawning up lots of tasks in Rust or goroutines in Go, design your application in such a way that they do some amount of significant work. Spawning more tasks or routines can have overhead that you may not need to pay. As <code class="so">strncat</code> <a href="http://www.reddit.com/r/rust/comments/1vnrp8/chinese_whispers_in_rust_and_go/ceuczim">says in the Reddit discussion</a>, Rust does not have true <a href="https://en.wikipedia.org/wiki/Coroutine">coroutines</a>. That doesn't mean you can't use tasks that way, but you'll need to be judicious about how you use them (and how many).</p>
<p>One example I'd like to implement in Rust is another port of some Rob Pike Go code: lexical scanning and parsing using goroutines: <a href="https://www.youtube.com/watch?v=HxaD_trXwRE">https://www.youtube.com/watch?v=HxaD_trXwRE</a></p>
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com6tag:blogger.com,1999:blog-2977002084155733470.post-54008161001046645212014-01-19T19:34:00.000-05:002014-01-26T12:34:13.428-05:00Telephonic Whispers in Rust<p><br /></p>
<h4><code>/* ---[ CSP: an idea whose time has come ]--- */</code></h4>
<p>Of late, the Go language has popularized the CSP (Communicating Sequential Processes) style of programming. Rob Pike gives a great talk on Go's implementation of CSP with Go channels and goroutines in this presentation: <a href="http://www.youtube.com/watch?v=f6kdp27TYZs">Google I/O 2012 - Go Concurrency Patterns</a>.</p>
<p>After watching that presentation, I myself got inspired and wrote a <a href="https://github.com/quux00/go-lightly">Go-inspired CSP library in Clojure</a>. Later, Rich Hickey and team came and built an even better version, <a href="https://github.com/clojure/core.async">core.async</a>, that has gotten a lot of acclaim.</p>
<p>Recently, I've been learning the Rust language and was intrigued and happy to see that the CSP model is also embraced front-and-center by the Rust team. As in Go, the core primitive for message passing is a channel, and Rust has lightweight routines (green threads), called tasks.</p>
<p>One of the first things I did when implementing a Go-inspired CSP library in Clojure was to port Pike's Go examples from this Google I/O talk to Clojure. Here I do the same in Rust for the "Chinese Whispers" example.</p>
<p><br /></p>
<h4><code>/* ---[ Telephonic Whispers ]--- */</code></h4>
<p>Pike shows this example in his 2012 Heroku presentation to show off the idea that you can spawn a lot of lightweight goroutines and have them run efficiently. Since Rust tasks are lightweight, I wanted to test whether Rust would behave similarly.</p>
<p>In the Chinese Whispers example, a daisy-chain of goroutines is formed, and they communicate unidirectionally along a series of channels. Goroutine A signals to B, who signals to C, and so on. The message passed along the way is an integer that is incremented on each handoff.</p>
<script src="https://gist.github.com/quux00/8449996.js"></script>
<p>With the Go example, you can run a daisy-chain of 100,000 goroutines very efficiently:</p>
<div class="precode"><pre><code>$ time ./chinese-whispers
100001
real 0m0.322s
user 0m0.215s
sys 0m0.105s
</code></pre></div>
<p>Here's my code to do the same in Rust (v 0.9):</p>
<script src="https://gist.github.com/quux00/8450017.js"></script>
<p>A key difference is that in the Rust model a single channel actually comes in two parts: the port end, from which you receive, and a channel end, onto which you send data. The Go model has a single entity, the channel, which can be used for both sending and receiving, though you can create write-only or read-only channel handles.</p>
<p>This separation makes the Rust code a little more intricate and verbose. I also need additional variables, such as the temp variables on lines 16 and 17, since Rust pointers can only be owned by one entity at a time and sending them off into a task (via spawn) means that you've permanently transferred ownership of that reference.</p>
<p>Here is the running speed of the Rust version on my system:</p>
<div class="precode"><pre><code>$ time bin/chinese-whispers
100001
real 0m4.234s
user 0m16.942s
sys 0m3.331s
</code></pre></div>
<p>So Rust can do it just fine, but it is not as efficient as Go. The Go version is about 13 times faster, based on these timed runs on my system.</p>
<br><hr><br>
<b>[Update]:</b> This blog post was discussed on Hacker News and Reddit soon after I published it. I've written a follow-up post addressing some of the complaints and summarizing some of the really interesting points in those community discussions: <a href="http://thornydev.blogspot.com/2014/01/telephonic-whispers-in-rust-revisited.html">http://thornydev.blogspot.com/2014/01/telephonic-whispers-in-rust-revisited.html</a>
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com5tag:blogger.com,1999:blog-2977002084155733470.post-91162587680672539372014-01-01T15:36:00.000-05:002015-07-01T21:06:56.392-04:00Debugging Go (golang) programs with gdb<p><br /></p>
<h4><code>/* ---[ Update: July 2015 ]--- */</code></h4>
<p>
At this point, I think the Go community has given up trying to make gdb work with Go programs. It is pretty painful, so I don't recommend it. Recently, I've tried both <a href="https://github.com/derekparker/delve">delve</a> and <a href="https://github.com/mailgun/godebug">godebug</a>, two contributions from the Go community. I had better luck with godebug. In fact, it performed perfectly for a recent issue I was having and was a joy to work with.
</p>
<hr>
<br>
<h4><code>/* ---[ Debugging Go ]--- */</code></h4>
<p>At the time of this writing, the only debugger I know of for the Go language is the FSF's gdb debugger. gdb can be used to debug programs written in Go and compiled with the gccgo or 6g compilers.</p>
<p>At present, I'm using Go version 1.1.2 (on Xubuntu Linux). Do not upgrade to version 1.2 if you want to be able to use gdb. The 1.2 release introduced changes that break single-stepping through code in gdb: <a href="http://code.google.com/p/go/issues/detail?id=6776">http://code.google.com/p/go/issues/detail?id=6776</a>.</p>
<p>As a side note, I find this situation pretty disappointing. It suggests that the Go developers are not including gdb compatibility tests in their testing of Go. That really isn't acceptable, in my opinion, if gdb is the only debugger tool available. Happily, the last entry in the comments from one of the Go maintainers/developers is "If possible, fix this for 1.2.1."</p>
<p><br /></p>
<h4><code>/* ---[ GDB ]--- */</code></h4>
<p>I won't give a gdb tutorial. There are lots of good ones on the web, such as these:</p>
<ul>
<li><a href="http://betterexplained.com/articles/debugging-with-gdb/">http://betterexplained.com/articles/debugging-with-gdb/</a></li>
<li><a href="http://www.unknownroad.com/rtfm/gdbtut/gdbtoc.html">http://www.unknownroad.com/rtfm/gdbtut/gdbtoc.html</a></li>
<li><a href="http://beej.us/guide/bggdb/">http://beej.us/guide/bggdb/</a></li>
</ul>
<p><br /></p>
<h4><code>/* ---[ Using GDB with Go ]--- */</code></h4>
<p>I've written a unit test, using Go's testing library, for the code I'm debugging. Here's the code under test:</p>
<div class="precode"><pre><code>//
// Read in the username and password properties from the CONFIG_FILE file
// Returns error if CONFIG_FILE cannot be found or opened
//
func readDatabaseProperties() (uname, passw string, err error) {
propsStr, err := ioutil.ReadFile(CONFIG_FILE)
if err != nil {
return
}
for _, line := range strings.Split(string(propsStr), "\n") {
prop := strings.Split(line, "=")
if len(prop) == 2 {
switch prop[0] {
case "username":
uname = prop[1]
case "password":
passw = prop[1]
}
}
}
return
}
</code></pre></div>
<p>And here is the unit test for it:</p>
<div class="precode"><pre><code>func TestReadDatabaseProps(t *testing.T) {
uname, passw, err := readDatabaseProperties()
if err != nil {
t.Errorf(fmt.Sprintf("%v", err))
}
if len(uname) == 0 {
t.Errorf("uname is empty")
}
if len(passw) == 0 {
t.Errorf("passw is empty")
}
}
</code></pre></div>
<p>The CONFIG_FILE on my system has:</p>
<div class="precode"><pre><code>username = midpeter444
password = jiffylube
</code></pre></div>
<p>The unit test is currently failing:</p>
<div class="precode"><pre><code>$ go test -test.run="TestReadDatabaseProps"
--- FAIL: TestReadDatabaseProps (0.00 seconds)
indexer_test.go:150: uname is empty
indexer_test.go:153: passw is empty
FAIL
exit status 1
FAIL fslocate 0.023s
</code></pre></div>
<p>So let's check it out in gdb. First, compile the go code with the following flags:</p>
<div class="precode"><pre><code>$ go test -c -gcflags '-N -l'
</code></pre></div>
<p>The <code class="so">-c</code> flag causes the go compiler to generate an executable in the current directory called <code class="so">xxx.test</code>, where xxx is the name of the directory your code is in. In my case, it generated <code class="so">fslocate.test</code>.</p>
<p>Next ensure you have the <code class="so">GOROOT</code> environment variable <a href="http://golang.org/doc/install">properly set</a> and start gdb.</p>
<div class="precode"><pre><code>$ gdb fslocate.test -d $GOROOT
</code></pre></div>
<p>After doing this, if you see an error about <code class="so">/usr/local/go/src/pkg/runtime/runtime-gdb.py</code>, see my <a href="#sideNote1">side note #1</a> at the bottom of this posting.</p>
<p>The way we just fired up gdb will give you the command line prompt only. If you want to use it that way, I recommend using the <code class="so">frame</code> command in order to keep track of where you are in the code as you single-step through it.</p>
<p>However, I like using the <code class="so">-tui</code> option to gdb which will split the screen and give you a visual display of the code as you step through it.</p>
<div class="precode"><pre><code>$ gdb -tui fslocate.test -d $GOROOT
</code></pre></div>
<p>You get a screen like this. Don't worry about the assembly code - that will switch to your Go code once you get underway.</p>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyvuN_NkWs7Lch6m-tp3urG5soLSIAYQf89nKZ3nWM2R0BPCJcYutNNz4Y3JHeAYtxosloQQmeRdGpt-qq4M4vXIw2YOJz2hXZchAyNFBw7HSU76Nnf1vels3QuBVqTceict9V1bdc-l8/s1600/gdbtui.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyvuN_NkWs7Lch6m-tp3urG5soLSIAYQf89nKZ3nWM2R0BPCJcYutNNz4Y3JHeAYtxosloQQmeRdGpt-qq4M4vXIw2YOJz2hXZchAyNFBw7HSU76Nnf1vels3QuBVqTceict9V1bdc-l8/s1600/gdbtui.png" /></a></div>
<p>To proceed, I'm going to set a breakpoint at the start of the test. To do this you'll need to specify the path to the function you want to break on. The code is actually in package "main", but it is in the "fslocate" directory, so I set the breakpoint like this:</p>
<div class="precode"><pre><code>(gdb) b "fslocate.TestReadDatabaseProps"
Breakpoint 1 at 0x43c730: file /home/midpeter444/lang/go/projects/src/
fslocate/indexer_test.go, line 144.
</code></pre></div>
<p>If you get a message like:</p>
<div class="precode"><pre><code>Function "fslocate.TestReadDatabaseProps" not defined.
Make breakpoint pending on future shared library load? (y or n)
</code></pre></div>
<p>You didn't get the path right. See <a href="#sideNote2">side note #2</a> for some help.</p>
<p>Now that we have the breakpoint set, run the program to the breakpoint by typing <code class="so">run</code> or <code class="so">r</code>:</p>
<div class="precode"><pre><code>(gdb) r
</code></pre></div>
<p>Now you'll see that the TUI window jumps to the current line of code and highlights it:</p>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiPnanwNxPGpDj3Irqi5CQXlxPWS1tLcBpMaHOS3tJNeX2ZNCmi_1AH1hr3LzT129XY8GeksrveVWhBksiwlTFxfX-FNHZenwvHXJvlHpjePw5-T_WHn7PBnOa8V0P2PQ0OBoQi1U5wZsM/s1600/gdbtui2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiPnanwNxPGpDj3Irqi5CQXlxPWS1tLcBpMaHOS3tJNeX2ZNCmi_1AH1hr3LzT129XY8GeksrveVWhBksiwlTFxfX-FNHZenwvHXJvlHpjePw5-T_WHn7PBnOa8V0P2PQ0OBoQi1U5wZsM/s1600/gdbtui2.png" /></a></div>
<p>Next advance line by line with <code class="so">n</code> (or <code class="so">next</code>) to step over to the next line where the function under test is called. Once there I step into it (<code class="so">s</code> or <code class="so">step</code>) and step over the next few lines until I'm here:</p>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQE_hS-6zW8oz97UbbwYsQUdv4G6KTRNlAtbxgJxVYpvu-MRzk70Tm8aFIP6WqOSif_sq_7bIWsqGu_nh59agE7jmn9m0ED8WPkFCy7EYIR75zcz6HHWrzTur3RGU9h9u8iNTVcf7nRec/s1600/gdbtui3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQE_hS-6zW8oz97UbbwYsQUdv4G6KTRNlAtbxgJxVYpvu-MRzk70Tm8aFIP6WqOSif_sq_7bIWsqGu_nh59agE7jmn9m0ED8WPkFCy7EYIR75zcz6HHWrzTur3RGU9h9u8iNTVcf7nRec/s1600/gdbtui3.png" /></a></div>
<p>Now let's inspect some of our variables with <code class="so">print</code> or <code class="so">p</code>:</p>
<div class="precode"><pre><code>(gdb) p propsStr
$2 = {array = 0xc2000b3240 "username = midpeter444\npassword = jiffylube\n",
len = 44, cap = 556}
(gdb) p err
$3 = {tab = 0x0, data = 0x0}
(gdb) p line
$4 = 0xc20009b7b0 "username = midpeter444"
</code></pre></div>
<p>The function <code class="so">ioutil.ReadFile</code> returns a byte slice and an error. Inspecting them shows the string value of the byte slice, its length and capacity, and that the error is nil (as indicated by its data value being <code class="so">0x0</code>).</p>
<p>So the file read worked. Then we read in the first line as a string and it looks good. Then I called:</p>
<div class="precode"><pre><code>prop := strings.Split(line, "=")
</code></pre></div>
<p>to split the line into a string slice. Inspecting this slice shows us the contents of the slice "struct", but not the contents of the underlying slice array:</p>
<div class="precode"><pre><code>(gdb) p prop
$5 = {array = 0xc2000b1260, len = 2, cap = 2}
</code></pre></div>
<p>To peek into the array in gdb, we can use standard array indexing:</p>
<div class="precode"><pre><code>(gdb) p prop.array[0]
$20 = 0xc20009b7b0 "username "
(gdb) p prop.array[1]
$21 = 0xc20009b7ba " midpeter444"
</code></pre></div>
<p>or we can use the dereferencing operator <code class="so">*</code> and <code class="so">@N</code> to look at N contiguous elements of the array. The inspection so far told us that there are two entries in the array, so this is how to see all of the array entries in one command:</p>
<div class="precode"><pre><code>(gdb) p *prop.array@2
$22 = {0xc20009b7b0 "username ", 0xc20009b7ba " midpeter444"}
</code></pre></div>
<p>And with that I can see the defect in my code: I need to trim the string before comparing to the cases in my switch statement.</p>
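<p>The fix is to trim the whitespace around both the key and the value. Here's a sketch of the corrected parsing logic, pulled out into a standalone function (the helper name is mine):</p>

```go
package main

import (
	"fmt"
	"strings"
)

// parseProps extracts username and password from "key = value" lines,
// trimming the whitespace that broke the switch comparison above.
func parseProps(props string) (uname, passw string) {
	for _, line := range strings.Split(props, "\n") {
		prop := strings.Split(line, "=")
		if len(prop) == 2 {
			switch strings.TrimSpace(prop[0]) {
			case "username":
				uname = strings.TrimSpace(prop[1])
			case "password":
				passw = strings.TrimSpace(prop[1])
			}
		}
	}
	return
}

func main() {
	u, p := parseProps("username = midpeter444\npassword = jiffylube\n")
	fmt.Println(u, p) // midpeter444 jiffylube
}
```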
<p><br /></p>
<h4><code>/* ---[ Looking into Go structs ]--- */</code></h4>
<p>You can also use dot notation to look into Go structs, using dereferencing where you have pointers rather than values.</p>
<p>For example here's a snippet of code from my fslocate program:</p>
<div class="precode"><pre><code>fse := fsentry.E{"/var/log/hive/foo", "f", false}
// query that new entry
dbChan <- dbTask{QUERY, fse, replyChan}
reply = <-replyChan
</code></pre></div>
<p><code class="so">fse</code> is a struct of type <code class="so">fsentry.E</code>:</p>
<div class="precode"><pre><code>type E struct {
Path string // full path for file or dir
Typ string // DIR_TYPE or FILE_TYPE
IsTopLevel bool // true = specified in the user's config/index file
}
</code></pre></div>
<p><code class="so">reply</code> is a struct of type <code class="so">dbReply</code></p>
<div class="precode"><pre><code>type dbReply struct {
fsentries []fsentry.E
err error
}
</code></pre></div>
<p><code class="so">dbChan</code> is a go channel (that takes dbTasks, another struct).</p>
<p>For that snippet of code I can inspect the contents in gdb:</p>
<div class="precode"><pre><code>(gdb) p fse
$1 = {Path = 0x637690 "/var/log/hive/foo", Typ = 0x61fd20 "f",
IsTopLevel = false}
(gdb) p *dbChan
$3 = {qcount = 1, dataqsiz = 10, elemsize = 56, closed = 0 '\000',
elemalign = 8 '\b', elemalg = 0x5646e0, sendx = 1, recvx = 0,
recvq = {first = 0x0, last = 0x0}, sendq = {first = 0x0,
last = 0x0}, runtime.lock = {key = 0}}
</code></pre></div>
<p>I had to use <code class="so">*</code> on dbChan since channels in Go are pointers (references).</p>
<p>The <code class="so">reply</code> struct is a little more tricky:</p>
<div class="precode"><pre><code>(gdb) p reply
$9 = {fsentries = {array = 0xc20007e640, len = 1, cap = 4},
err = {tab = 0x0, data = 0x0}}
</code></pre></div>
<p>Here we see that it has two entries: <code class="so">fsentries</code> and <code class="so">err</code>. The error is null. <code class="so">fsentries</code> is an array of length one, which using the techniques we walked through earlier can be inspected:</p>
<div class="precode"><pre><code>(gdb) p reply.fsentries.array[0]
$11 = {Path = 0xc2000cbfe0 "/var/log/hive/foo", Typ = 0xc2000005d8 "f",
IsTopLevel = false}
</code></pre></div>
<p><br /></p>
<h4><code>/* ---[ Inspecting "slices" of Go arrays ]--- */</code></h4>
<p>Suppose you have a large Go slice and you want to see elements 20 through 24 only. How would you do that? You can use the indexing operator and the <code class="so">@N</code> operator together. Here's a slice of length three, where I look at the last two elements in the last command:</p>
<div class="precode"><pre><code>(gdb) p configDirs
$6 = {array = 0xc2000f8210, len = 3, cap = 3}
(gdb) p *configDirs.array@3
$7 = {0x618dc0 "/d/e", 0x627170 "/new/not/in/db", 0x626fd0 "/a/b/c/foo/bar"}
(gdb) p configDirs.array[1]@2
$8 = {0x627170 "/new/not/in/db", 0x626fd0 "/a/b/c/foo/bar"}
</code></pre></div>
<p>Hopefully, reading through a gdb tutorial and this post are enough to get you through your Go debugging sessions.</p>
<p><br /></p>
<h3>Side note 0: Key bindings with -tui</h3>
<p>One tricky/annoying aspect of running with the TUI is that the arrow keys navigate around the TUI screen rather than on your command line. So to go back and forward in history (usually the up and down arrows), use the bash bindings <code class="so">Ctrl-p</code> and <code class="so">Ctrl-n</code> respectively. To go left and right on the command line, use the bash/emacs bindings <code class="so">Ctrl-b</code> and <code class="so">Ctrl-f</code> respectively.</p>
<p><a name="sideNote1"></a><br /><br /></p>
<h3>Side note 1: python error when starting gdb</h3>
<p>On Ubuntu, I get an error after gdb starts up stating:</p>
<div class="precode"><pre><code>File "/usr/local/go/src/pkg/runtime/runtime-gdb.py", line 358
print s, ptr['goid'], "%8s" % sts[long((ptr['status']))], blk.function
</code></pre></div>
<p>I've been able to ignore this and still use gdb with Go code just fine.</p>
<p>According to <a href="https://groups.google.com/forum/#!topic/golang-nuts/VcmSvq2mSm0">this thread</a>, this is an issue with the Python version used to build gdb and isn't an issue with the Go distribution. It may be specific to Ubuntu-flavored Linux distros. <a href="https://groups.google.com/forum/#!msg/golang-nuts/VcmSvq2mSm0/LfTp3CUhel8J">This posting</a> in that thread says you can fix it with:</p>
<div class="precode"><pre><code>sudo 2to3 -w /usr/local/go/src/pkg/runtime/runtime-gdb.py
</code></pre></div>
<p>But when I did that I now get errors when I try to look inside a string, struct or array:</p>
<div class="precode"><pre><code>(gdb) p propsStr
$1 = []uint8Python Exception <class 'TypeError'> 'gdb.Value' object cannot
be interpreted as an integer:
</code></pre></div>
<p>So I recommend that you back up the <code class="so">/usr/local/go/src/pkg/runtime/runtime-gdb.py</code> file before trying it in case you get the same error I did.</p>
<p><a name="sideNote2"></a><br /><br /></p>
<h3>Side note 2: Function "fslocate.TestReadDatabaseProps" not defined.</h3>
<p>If you get this type of error message when you set a breakpoint on a function:</p>
<div class="precode"><pre><code>Function "fslocate.TestReadDatabaseProps" not defined.
Make breakpoint pending on future shared library load? (y or n)
</code></pre></div>
<p>Then you either got the path wrong or used the wrong notation for it, so type "n" and try again. First check your spelling.</p>
<p>If that doesn't fix it, you might have a nested path. For example in the fslocate project, I have a stringset subdirectory:</p>
<div class="precode"><pre><code>midpeter444 ~/lang/go/projects/src/fslocate
$ tree stringset/
stringset/
├── stringset.go
└── stringset_test.go
</code></pre></div>
<p>In my case the <code class="so">fslocate</code> directory is in my <code class="so">$GOPATH/src</code> directory and the test file is in the <code class="so">fslocate/stringset</code> directory. If I were to go into that directory and compile the test for gdb, the breakpoint path would be:</p>
<div class="precode"><pre><code>(gdb) b "fslocate/stringset.TestReadDatabaseProps"
</code></pre></div>
<p>Notice that you use slashes as package separators and a dot to indicate a function within a package.</p>
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com15tag:blogger.com,1999:blog-2977002084155733470.post-88318634203670019462013-10-11T20:22:00.000-04:002013-10-11T20:30:23.780-04:00Launching a Cascading job from Apache Oozie<p>The <a href="http://www.cascading.org/">Cascading framework</a> has its own workflow management system embedded in it, so when I tried to find information online about how to launch a Cascading job from within the <a href="https://oozie.apache.org/">Apache Oozie</a> workflow scheduler tool, I found a <a href="http://daytonward.livejournal.com/531602.html">dearth of information</a>.</p>
<p>In fact, when I asked on the oozie-users mailing list how to do it, the only response I got back was to write an Oozie extension to run Cascading jobs. That may be the right solution long term (don't know enough yet), but I did find a way to get it working with what Oozie provides today.</p>
<p><br /></p>
<h4><code>/*---[ Failed attempts ]---*/</code></h4>
<p>I tried unsuccessfully to use the map-reduce action and the shell action. The former won't work because it wants you to specify the Mapper and Reducer classes explicitly, which doesn't make sense for a Cascading job - you launch your main Cascading class and it auto-generates a bunch of mappers and reducers. And while you can use the <code class="so">oozie.launcher.action.main.class</code> property to specify your main Cascading class, there seems to be no way to pass arguments to it.</p>
<p>I'm not sure why I couldn't get the shell action to work. I made the exec property <code class="so">/usr/bin/hadoop</code> in order to run it as <code class="so">hadoop jar myjar.jar com.mycompany.MyClass arg1 arg2 argN</code>, but several attempts to make that work failed. There probably is a way to make it work, however.</p>
<p><br /></p>
<h4><code>/*---[ Solution: use the java action ]---*/</code></h4>
<p>In order to launch Cascading jobs, we build an uber-jar (which maven annoyingly calls a <a href="https://maven.apache.org/plugins/maven-shade-plugin/">shaded jar</a>) that has our specific Cascading code and supporting objects, as well as the Cascading library all bundled in it. But that's not enough as all that depends on the myriad Hadoop jars. We then use the <code class="so">hadoop jar</code> invocation as I indicated above because it puts all the Hadoop jars in the classpath.</p>
<p>I didn't think using the Oozie java action would work unless I built a <em>massive</em> uber jar with all the Hadoop dependencies, which would then have to be farmed around the Hadoop cluster each time you run it -- a great waste.</p>
<p>But I was happily surprised to notice that Oozie sets up the classpath for java (and map-reduce) tasks with all the Hadoop jars present.</p>
<p>So, here's the <code class="so">workflow.xml</code> file that works:</p>
<div class="xmlblock">
<pre>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">workflow-app</span> <span class="nxml-namespace-attribute-xmlns">xmlns</span>=<span class="nxml-namespace-attribute-value-delimiter">'</span><span class="nxml-namespace-attribute-value">uri:oozie:workflow:0.2</span><span class="nxml-namespace-attribute-value-delimiter">'</span> <span class="nxml-attribute-local-name">name</span>=<span class="nxml-attribute-value-delimiter">'</span><span class="nxml-attribute-value">cascading-wf</span><span class="nxml-attribute-value-delimiter">'</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">start</span> <span class="nxml-attribute-local-name">to</span>=<span class="nxml-attribute-value-delimiter">'</span><span class="nxml-attribute-value">stage1</span><span class="nxml-attribute-value-delimiter">'</span> <span class="nxml-tag-slash">/</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">action</span> <span class="nxml-attribute-local-name">name</span>=<span class="nxml-attribute-value-delimiter">'</span><span class="nxml-attribute-value">stage1</span><span class="nxml-attribute-value-delimiter">'</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">java</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">job-tracker</span><span class="nxml-tag-delimiter">></span><span class="nxml-text">${jobTracker}</span><span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">job-tracker</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">name-node</span><span class="nxml-tag-delimiter">></span><span class="nxml-text">${nameNode}</span><span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">name-node</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">configuration</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">property</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">name</span><span class="nxml-tag-delimiter">></span><span class="nxml-text">mapred.job.queue.name</span><span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">name</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">value</span><span class="nxml-tag-delimiter">></span><span class="nxml-text">${queueName}</span><span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">value</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">property</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">configuration</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">main-class</span><span class="nxml-tag-delimiter">></span><span class="nxml-text">com.mycompany.MyCascade</span><span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">main-class</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">java-opts</span><span class="nxml-tag-delimiter">></span><span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">java-opts</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">arg</span><span class="nxml-tag-delimiter">></span><span class="nxml-text">/user/myuser/dir1/dir2</span><span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">arg</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">arg</span><span class="nxml-tag-delimiter">></span><span class="nxml-text">my-arg-2</span><span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">arg</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">arg</span><span class="nxml-tag-delimiter">></span><span class="nxml-text">my-arg-3</span><span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">arg</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">file</span><span class="nxml-tag-delimiter">></span><span class="nxml-text">lib/${EXEC}#${EXEC}</span><span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">file</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">capture-output</span> <span class="nxml-tag-slash">/</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">java</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">ok</span> <span class="nxml-attribute-local-name">to</span>=<span class="nxml-attribute-value-delimiter">"</span><span class="nxml-attribute-value">end</span><span class="nxml-attribute-value-delimiter">"</span> <span class="nxml-tag-slash">/</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">error</span> <span class="nxml-attribute-local-name">to</span>=<span class="nxml-attribute-value-delimiter">"</span><span class="nxml-attribute-value">fail</span><span class="nxml-attribute-value-delimiter">"</span> <span class="nxml-tag-slash">/</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">action</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">kill</span> <span class="nxml-attribute-local-name">name</span>=<span class="nxml-attribute-value-delimiter">"</span><span class="nxml-attribute-value">fail</span><span class="nxml-attribute-value-delimiter">"</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">message</span><span class="nxml-tag-delimiter">></span><span class="nxml-text">FAIL: Oh, the huge manatee!</span><span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">message</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">kill</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">end</span> <span class="nxml-attribute-local-name">name</span>=<span class="nxml-attribute-value-delimiter">"</span><span class="nxml-attribute-value">end</span><span class="nxml-attribute-value-delimiter">"</span><span class="nxml-tag-slash">/</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">workflow-app</span><span class="nxml-tag-delimiter">></span>
</pre>
</div>
<p>The parameterized variables, such as <code class="so">${EXEC}</code>, are defined in a job.properties in the same directory as the <code class="so">workflow.xml</code> file. The shaded jar is in a lib subdirectory as indicated.</p>
<div class="xmlblock">
<pre>
<span class="variable-name"> </span>
<span class="variable-name">nameNode</span>=hdfs://10.230.138.159:8020
<span class="variable-name">jobTracker</span>=http://10.230.138.159:50300
<span class="variable-name"> </span>
<span class="variable-name">queueName</span>=default
<span class="variable-name"> </span>
<span class="variable-name">oozie.wf.application.path</span>=${nameNode}/user/${user.name}/examples/apps/cascading
<span class="variable-name">EXEC</span>=mybig-shaded-0.0.1-SNAPSHOT.jar
</pre>
</div>
<p>Let me know if you find another way to launch a Cascading job from Oozie or find any problems with this solution.</p>
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com4tag:blogger.com,1999:blog-2977002084155733470.post-51587420531274476282013-10-08T22:41:00.001-04:002013-10-11T19:57:24.176-04:00Beautiful Code Ported to Go<p>This week I've been learning Go - the programming language, not the game. I had studied its concurrency primitives for <a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure.html">my Clojure library</a> that would bring the CSP model to Clojure (this was before Rich Hickey and crew created <a href="https://github.com/clojure/core.async">core.async</a>), but until a few days ago I hadn't formally studied the whole of Go with the intention of being proficient in it.</p>
<p>Go has pointers, but it does not have pointer arithmetic. Instead, it has slices - variable-length views onto arrays on which you can use Python-like "slice" notation. I wanted a chance to try that out and found it recently when reading Chapter 1 of <a href="http://www.amazon.com/Beautiful-Code-Leading-Programmers-Practice/dp/0596510047">Beautiful Code</a>, which is about a limited regular expression matcher that Rob Pike (co-creator of Go) wrote for the <a href="http://www.amazon.com/Practice-Programming-Addison-Wesley-Professional-Computing/dp/020161586X">Practice of Programming</a> book he co-wrote with Brian Kernighan (who is also the author of Ch. 1 of <em>Beautiful Code</em>).</p>
<p>Pike's code is a limited (pedagogical) regex library that allows the following notation:</p>
<pre>
|------------+------------------------------------------------------------|
| Character | Meaning |
|------------+------------------------------------------------------------|
| c | Matches any literal character c |
| . (period) | Matches any single character |
| ^ | Matches the beginning of the input string. |
| $ | Matches the end of the input string |
| * | Matches zero or more occurrences of the previous character |
|------------+------------------------------------------------------------|
</pre>
<p><br /></p>
<h4><code>/* ---[ The C version ]--- */</code></h4>
<p>Here's Pike's code in C:</p>
<div class="codeblock">
<pre>
<span class="preprocessor"> #include</span> <span class="string"><stdio.h></span>
<span class="type">int</span> <span class="function-name">matchstar</span>(<span class="type">int</span> <span class="string">c</span>, <span class="type">char</span> *<span class="variable-name">regexp</span>, <span class="type">char</span> *<span class="variable-name">text</span>);
<span class="comment-delimiter">/* </span><span class="comment">matchhere: search for regexp at beginning of text </span><span class="comment-delimiter">*/</span>
<span class="type">int</span> <span class="function-name">matchhere</span>(<span class="type">char</span> *<span class="variable-name">regexp</span>, <span class="type">char</span> *<span class="variable-name">text</span>) {
<span class="keyword">if</span> (regexp[0] == <span class="string">'\0'</span>)
<span class="keyword">return</span> 1;
<span class="keyword">if</span> (regexp[1] == <span class="string">'*'</span>)
<span class="keyword">return</span> matchstar(regexp[0], regexp+2, text);
<span class="keyword">if</span> (regexp[0] == <span class="string">'$'</span> && regexp[1] == <span class="string">'\0'</span>)
<span class="keyword">return</span> *text == <span class="string">'\0'</span>;
<span class="keyword">if</span> (*text!=<span class="string">'\0'</span> && (regexp[0]==<span class="string">'.'</span> || regexp[0]==*text))
<span class="keyword">return</span> matchhere(regexp+1, text+1);
<span class="keyword">return</span> 0;
}
<span class="comment-delimiter">/* </span><span class="comment">matchstar: search for c*regexp at beginning of text </span><span class="comment-delimiter">*/</span>
<span class="type">int</span> <span class="function-name">matchstar</span>(<span class="type">int</span> <span class="variable-name">c</span>, <span class="type">char</span> *<span class="variable-name">regexp</span>, <span class="type">char</span> *<span class="variable-name">text</span>) {
<span class="keyword">do</span> {
<span class="comment-delimiter">/* </span><span class="comment">a * matches zero or more instances </span><span class="comment-delimiter">*/</span>
<span class="keyword">if</span> (matchhere(regexp, text))
<span class="keyword">return</span> 1;
} <span class="keyword">while</span> (*text != <span class="string">'\0'</span> && (*text++ == c || c == <span class="string">'.'</span>));
<span class="keyword">return</span> 0;
}
<span class="comment-delimiter">/* </span><span class="comment">match: search for regexp anywhere in text </span><span class="comment-delimiter">*/</span>
<span class="type">int</span> <span class="function-name">match</span>(<span class="type">char</span> *<span class="variable-name">regexp</span>, <span class="type">char</span> *<span class="variable-name">text</span>) {
<span class="keyword">if</span> (regexp[0] == <span class="string">'^'</span>)
<span class="keyword">return</span> matchhere(regexp+1, text);
<span class="keyword">do</span> {
<span class="comment-delimiter">/* </span><span class="comment">must look even if string is empty </span><span class="comment-delimiter">*/</span>
<span class="keyword">if</span> (matchhere(regexp, text))
<span class="keyword">return</span> 1;
} <span class="keyword">while</span> (*text++ != <span class="string">'\0'</span>);
<span class="keyword">return</span> 0;
}
</pre>
</div>
<p>I'll let you read Ch. 1 of <em>Beautiful Code</em> for an analysis, but two things are noteworthy for my purposes:</p>
<ol>
<li>Pike uses pointer arithmetic throughout the code</li>
<li>He uses the unusual do-while loop twice in only 30 or so lines of code</li>
</ol>
<p>So I thought I'd port it to Pike's new language Go.</p>
<p><br /></p>
<h4><code>/* ---[ My Go version ]--- */</code></h4>
<div class="codeblock">
<pre>
<span class="keyword">package</span> pikeregex
<span class="comment-delimiter">// </span><span class="comment">search for c*regex at beginning of text
</span> <span class="keyword">func</span> <span class="function-name">matchstar</span>(c rune, regex []<span class="type">rune</span>, text []<span class="type">rune</span>) bool {
<span class="keyword">for</span> {
<span class="keyword">if</span> <span class="function-name">matchhere</span>(regex, text) {
<span class="keyword">return</span> <span class="constant">true</span>
}
<span class="keyword">if</span> ! (<span class="builtin">len</span>(text) > 0 && (text[0] == c || c == <span class="string">'.'</span>)) {
<span class="keyword">return</span> <span class="constant">false</span>
}
text = text[1:]
}
}
<span class="comment-delimiter">// </span><span class="comment">search for regex at beginning of text
</span> <span class="keyword">func</span> <span class="function-name">matchhere</span>(regex []<span class="type">rune</span>, text []<span class="type">rune</span>) bool {
<span class="keyword">if</span> <span class="builtin">len</span>(regex) == 0 {
<span class="keyword">return</span> <span class="constant">true</span>
}
<span class="keyword">if</span> <span class="builtin">len</span>(regex) > 1 && regex[1] == <span class="string">'*'</span> {
<span class="keyword">return</span> <span class="function-name">matchstar</span>(regex[0], regex[2:], text)
}
<span class="keyword">if</span> regex[0] == <span class="string">'$'</span> && <span class="builtin">len</span>(regex) == 1 {
<span class="keyword">return</span> <span class="builtin">len</span>(text) == 0
}
<span class="keyword">if</span> <span class="builtin">len</span>(text) > 0 && (regex[0] == <span class="string">'.'</span> || regex[0] == text[0]) {
<span class="keyword">return</span> <span class="function-name">matchhere</span>(regex[1:], text[1:])
}
<span class="keyword">return</span> <span class="constant">false</span>
}
<span class="comment-delimiter">// </span><span class="comment">search for regex anywhere in the text
</span> <span class="keyword">func</span> <span class="function-name">Match</span>(regex string, text string) bool {
runerx := <span class="function-name">compile</span>(regex)
runetxt := []<span class="function-name">rune</span>(text)
<span class="keyword">if</span> <span class="builtin">len</span>(runerx) > 0 && runerx[0] == <span class="string">'^'</span> {
<span class="keyword">return</span> <span class="function-name">matchhere</span>(runerx[1:], runetxt)
}
<span class="keyword">for</span> {
<span class="keyword">if</span> <span class="function-name">matchhere</span>(runerx, runetxt) {
<span class="keyword">return</span> <span class="constant">true</span>
}
<span class="keyword">if</span> <span class="builtin">len</span>(runetxt) == 0 {
<span class="keyword">return</span> <span class="constant">false</span>
}
runetxt = runetxt[1:]
}
}
<span class="comment-delimiter">// </span><span class="comment">one enhancement: allow + (1 or more) notation
</span> <span class="keyword">func</span> <span class="function-name">compile</span>(regex string) (regslc []<span class="type">rune</span>) {
regslc = <span class="builtin">make</span>([]rune, 0, <span class="builtin">len</span>(regex) + 10)
<span class="keyword">for</span> _, r := <span class="keyword">range</span> regex {
<span class="keyword">if</span> r == <span class="string">'+'</span> {
regslc = <span class="builtin">append</span>(regslc, regslc[<span class="builtin">len</span>(regslc) - 1], <span class="string">'*'</span>)
} <span class="keyword">else</span> {
regslc = <span class="builtin">append</span>(regslc, r)
}
}
<span class="keyword">return</span> regslc
}
</pre>
</div>
<p>This is as straight a port as I could make it. And I think it translates well to Go. I've capitalized the <code class="so">Match</code> method, as that is the public one to be exported to other libraries.</p>
<p>Instead of pointer arithmetic I used slice notation, as in this recursive call to <code class="so">matchhere</code>:</p>
<div class="codeblock">
<pre>
<span class="comment-delimiter"> // </span><span class="comment">C version
</span> <span class="keyword">return</span> <span class="function-name">matchhere</span>(regexp+1, text+1);
<span class="comment-delimiter"> // </span><span class="comment">Go version
</span> <span class="keyword">return</span> <span class="function-name">matchhere</span>(regex[1:], text[1:])
</pre>
</div>
<p>Also, in the C code you check whether you are at the end of the text string by looking for the NUL char: <code class="so">*text == '\0'</code>. In Go, you can use the builtin <code class="so">len</code> function: <code class="so">len(text) == 0</code>. That condition becomes true if you keep recursively slicing <code class="so">text[1:]</code> until you get to an empty string, or rather in my code, an empty slice of runes.</p>
<p><br /></p>
<h4><code>/* ---[ Runeology ]--- */</code></h4>
<p>Runes are the Go 'char' type. A rune is an integer value identifying a Unicode code point. When you iterate over a string, you get runes, whose UTF-8 encodings vary in length (number of bytes). </p>
<p>You have to be careful with strings in Go: <code class="so">text[2]</code> returns the third byte, <strong>not</strong> the third rune in the string. If you want the third rune, you might try to use the <code class="so">utf8.DecodeRuneInString(text[2:])</code> function. But this would only work with ASCII, as you are slicing at the third byte and asking the utf8 library to parse the first rune from that point. But if the first rune in the string is two bytes long, you'll be getting the second rune in the string, not the third. If it's three bytes long, you're really in trouble.</p>
<p>The safest way is to do what I did in the code: convert the string to a slice of runes (<code class="so">[]rune</code>) immediately and then work with that. Now when you index <code class="so">runeslice[2]</code> you know you are getting the third rune.</p>
<p><br /></p>
<h4><code>/* ---[ No do-while ]--- */</code></h4>
<p>Go doesn't have a do-while loop. It doesn't even have a <code class="so">while</code> statement: just <code class="so">for</code>. But, as <a href="https://twitter.com/rob_pike/status/387791668021166080">Rob Pike reminded me</a> in a critique of the first version of this blog entry, a do-while can be adequately mimicked with an infinite for loop:</p>
<div class="codeblock">
<pre>
<span class="keyword">func</span> <span class="function-name">matchstar</span>(c rune, regex []<span class="type">rune</span>, text []<span class="type">rune</span>) bool {
<span class="keyword">for</span> {
<span class="keyword">if</span> <span class="function-name">matchhere</span>(regex, text) {
<span class="keyword">return</span> <span class="constant">true</span>
}
<span class="keyword">if</span> ! (<span class="builtin">len</span>(text) > 0 && (text[0] == c || c == <span class="string">'.'</span>)) {
<span class="keyword">return</span> <span class="constant">false</span>
}
text = text[1:]
}
}
</pre>
</div>
<p>The stated intent of Go was to be as minimal as possible. Pike, in <a href="http://thechangelog.com/100/">a recent podcast interview</a>, said that the core team that created Go (which includes <a href="https://en.wikipedia.org/wiki/Ken_Thompson">Ken Thompson</a>) all had to agree that a feature was essential for it to be included. Many candidate features were dropped, including the do-while loop. Of note, <a href="http://golang.org/ref/spec#Goto_statements">goto</a> was not dropped, which I find quite interesting. <code class="so">goto</code> is only mentioned once (almost in passing) in the <a href="http://golang.org/doc/effective_go.html">Effective Go</a> guide, so I'm interested in what the essential use case for it was.</p>
<p><br /></p>
<h4><code>/* ---[ One addition ]--- */</code></h4>
<p>Finally, in the <em>Beautiful Code</em> chapter, Kernighan suggests a number of enhancements the reader can make. I've only done one - supporting the <code class="so">+</code> (1 or more) operator by mildly precompiling the regex, turning <code class="so">x+</code> into <code class="so">xx*</code>, which lets me use Pike's original (ported) code untouched.</p>
<p>The above code is available on GitHub: <a href='https://github.com/midpeter444/pikeregex'>https://github.com/midpeter444/pikeregex</a></p>
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com1tag:blogger.com,1999:blog-2977002084155733470.post-49784806267557705602013-09-25T23:01:00.000-04:002013-09-25T23:09:17.834-04:00How to compile Groovy scripts and run them on systems with no Groovy installed<p><br /></p>
<h4><code>/*---[ Problem ]---*/</code></h4>
<p>This week I was faced with the need to write a Groovy script that would run on a Hadoop node at work, but we don't yet have groovy installed on the Hadoop nodes (and I don't have privileges to do that). Since Groovy is our defined scripting language, I had two options in the meantime:</p>
<ol>
<li>Download the groovy zip package, just unzip it in my user directory on the Hadoop node and run my thing.</li>
<li>Compile the groovy script to bytecode and build an uber-jar with groovy in it and then run it like a Java program (with <code class="so">java -cp myjar.jar blah blah blah</code>).</li>
</ol>
<p><br /></p>
<h4><code>/*---[ Solution ]---*/</code></h4>
<p>Since the second sounded like more of a challenge and would teach me a few things I hadn't done yet, I picked that. It worked out - here's my cheat sheet for future reference.</p>
<p><br /></p>
<h5><code>/*---[ Quoth the Maven ]---*/</code></h5>
<p>Create a new maven project:</p>
<div class="precode"><pre><code>mvn archetype:generate -DarchetypeArtifactId=maven-archetype-quickstart \
-DinteractiveMode=false -DgroupId=net.thornydev -DartifactId=script
</code></pre></div>
<p><br /></p>
<h5><code>/*---[ Two plugins ]---*/</code></h5>
<p>To compile groovy to bytecode use the <a href="http://groovy.codehaus.org/Groovy-Eclipse+compiler+plugin+for+Maven">groovy-eclipse-compiler plugin</a>. Yes, I know that sounds weird, but you don't need to fire up Eclipse. You don't even need to have Eclipse installed. </p>
<p>To build the uberjar containing your compiled script and all of groovy, use the <a href="https://maven.apache.org/plugins/maven-shade-plugin/">maven-shade-plugin</a>. Like most things about maven, I find the name "shade-plugin" irritating, but it gets the job done.</p>
<p>Finally include groovy-all.jar as a dependency.</p>
<p>Here's my pom:</p>
<pre class="brush: xml">
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>net.thornydev</groupId>
<artifactId>script</artifactId>
<packaging>jar</packaging>
<version>1.0</version>
<name>script</name>
<url>http://maven.apache.org</url>
<build>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<configuration>
<compilerId>groovy-eclipse-compiler</compilerId>
</configuration>
<dependencies>
<dependency>
<groupId>org.codehaus.groovy</groupId>
<artifactId>groovy-eclipse-compiler</artifactId>
<version>2.7.0-01</version>
</dependency>
</dependencies>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.codehaus.groovy</groupId>
<artifactId>groovy-all</artifactId>
<version>2.0.7</version>
</dependency>
</dependencies>
</project>
</pre>
<p><br /></p>
<h5><code>/*---[ Treat groovy like a first class citizen ]---*/</code></h5>
<p>Put your groovy file(s) in the <code class="so">src/main/java</code> directory, <strong>not</strong> <code class="so">src/main/groovy</code> like that other Groovy compiler tool wants. </p>
<p>The directory structure:</p>
<div class="precode"><pre><code>$ tree
.
├── pom.xml
└── src
├── main
│ └── java
│ └── net
│ └── thornydev
│ └── GroovyApp.groovy
</code></pre></div>
<p>The Groovy code:</p>
<pre class="brush: java">
package net.thornydev;
class Script {
def main(args) {
println "Hello ${args[0]}. I'm groovy."
}
}
new Script().main(args)
</pre>
<p><br /></p>
<h5><code>/*---[ Package, push, run ]---*/</code></h5>
<p>Next: <code class="so">mvn clean package</code></p>
<p>Then scp the <code class="so">script-1.0.jar</code> in the target dir to your desired system and run it:</p>
<div class="precode"><pre><code>$ java -cp script-1.0.jar net.thornydev.GroovyApp thornydev
Hello thornydev. I'm groovy.
</code></pre></div>
<p>QED.</p>
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com3tag:blogger.com,1999:blog-2977002084155733470.post-75416638088539494522013-08-17T11:19:00.000-04:002013-08-17T14:38:17.508-04:00VMWare Player Crashes in Ubuntu After Kernel Upgrade<p><br /></p>
<h4><code>/*---[ Annoyance ]---*/</code></h4>
<p>With Xubuntu 13.04, every time I get a kernel upgrade, which seems to happen at least once a month, my VMWare Player no longer works. I'm sure this is not specific to Xubuntu - probably any Ubuntu-13.04 based distro will have this problem.</p>
<p>The first time it happened, I spent a while trying to get the kernel modules to recompile, then gave up and decided to uninstall and reinstall. Even that was a mess: the <code class="so">vmware-uninstaller</code> doesn't work and tells you to use the installer instead. I then downloaded a really old version of VMWare Player that VMWare for some reason still hosts and that came up as a top choice on Google. When I installed that, it complained about all sorts of kernel issues, including that it wouldn't install on a system with KVM enabled; I disabled KVM and it still wouldn't work. Then I figured out I had a really old version (version 2 instead of 5!).</p>
<p><br /></p>
<h4><code>/*---[ Solution ]---*/</code></h4>
<p>The solution is quite simple, so I'm documenting it here to save others headache.</p>
<p>If you have a kernel upgrade and VMWare player won't start, this is what I do:</p>
<ol>
<li>Uninstall: <code class="so">sudo vmware-installer -u vmware-player</code></li>
<li>Download the latest VMPlayer for Linux from here: <a href='https://www.vmware.com/products/player/'>https://www.vmware.com/products/player/</a></li>
<li>Reinstall: <code class="so">sudo ./VMware-Player-5.0.2-1031769.x86_64.bundle</code></li>
</ol>
<p>Only takes a few minutes and all is back to normal.</p>
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com1tag:blogger.com,1999:blog-2977002084155733470.post-52091246014091482552013-07-05T14:56:00.000-04:002013-09-16T17:05:00.668-04:00Querying JSON records via Hive<p><br /></p>
<h4><code>/* ---[ Opacity: A brief rant ]--- */</code></h4>
<p>Despite the popularity of Hadoop and its ecosystem, I've found that much of it is frustratingly underdocumented, or at best opaquely documented. A case in point is the O'Reilly Programming Hive book, whose authors say they wrote it because so much of Hive is poorly documented and exists only in the heads of its developer community.</p>
<p>But even the Programming Hive book lacks good information on how to effectively use Hive with JSON records, so I'm cataloging my findings here.</p>
<p><br /></p>
<h4><code>/* ---[ JSON and Hive: What I've found ]--- */</code></h4>
<p>I've only been playing with Hive about two weeks now, but here's what I found with respect to using complex JSON documents with Hive.</p>
<p>Hive has two built-in functions, <code class="so">get_json_object</code> and <code class="so">json_tuple</code>, for dealing with JSON. There are also a couple of JSON SerDe's (Serializer/Deserializers) for Hive. I like this one the best: <a href='https://github.com/rcongiu/Hive-JSON-Serde'>https://github.com/rcongiu/Hive-JSON-Serde</a></p>
<p>I will document using these three options here.</p>
<p>Let's start with a simple JSON document and then move to a complex document with nested subdocuments and arrays of subdocuments.</p>
<p>Here's the first document:</p>
<pre class="brush: javascript">
{
  "Foo": "ABC",
  "Bar": "20090101100000",
  "Quux": {
    "QuuxId": 1234,
    "QuuxName": "Sam"
  }
}
</pre>
<p>We are going to store this as a Text document, so it is best to have the whole JSON entry on a single line in the text file you point the Hive table to.</p>
<p>Here it is on one line for easy copy and pasting:</p>
<pre class="brush: javascript">
{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}
</pre>
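<p>If your source documents are pretty-printed, a small script can collapse each one before loading. Here is a minimal sketch in Python; it is just a convenience helper, not part of the Hive toolchain:</p>

```python
import json

def collapse(pretty_json):
    # Re-serialize with no whitespace so the whole record
    # fits on one line, as the Hive text table expects.
    return json.dumps(json.loads(pretty_json), separators=(",", ":"))

doc = """
{
  "Foo": "ABC",
  "Bar": "20090101100000",
  "Quux": { "QuuxId": 1234, "QuuxName": "Sam" }
}
"""
print(collapse(doc))
```

<p>Run it over each record and append the output lines to the file you LOAD into the table.</p>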
<p>Let's create a Hive table to reference this. I've put the above document in a file called simple.json:</p>
<pre class="brush: sql">
CREATE TABLE json_table ( json string );
LOAD DATA LOCAL INPATH '/tmp/simple.json' INTO TABLE json_table;
</pre>
<p>Since there are no delimiters, we leave off the ROW FORMAT section of the table DDL.</p>
<p><br /></p>
<h4>Built in function #1: get_json_object</h4>
<p>The <code class="so">get_json_object</code> takes two arguments: tablename.fieldname and the JSON field to parse, where '$' represents the root of the document.</p>
<pre class="brush: sql">
select get_json_object(json_table.json, '$') from json_table;
</pre>
<p>Returns the full JSON document.</p>
<p>So do this to query all the fields:</p>
<pre class="brush: sql">
select get_json_object(json_table.json, '$.Foo') as foo,
       get_json_object(json_table.json, '$.Bar') as bar,
       get_json_object(json_table.json, '$.Quux.QuuxId') as qid,
       get_json_object(json_table.json, '$.Quux.QuuxName') as qname
from json_table;
</pre>
<p>You should get the output:</p>
<div class="precode"><pre><code>foo bar qid qname
ABC 20090101100000 1234 Sam
</code></pre></div>
<p>(Note: to get the header fields, enter <code class="so">set hive.cli.print.header=true</code> at the hive prompt or in your <code class="so">$HOME/.hiverc</code> file.)</p>
<p>This works and has a nice JavaScript-like "dotted" notation, but notice that you have to parse the same document once for every field you want to pull out of your JSON document, so it is rather inefficient.</p>
<p>The Hive wiki recommends using <code class="so">json_tuple</code> for this reason.</p>
<p><br /></p>
<h4>Built in function #2: json_tuple</h4>
<p>So let's see what <code class="so">json_tuple</code> looks like. It has the benefit of being able to pass in multiple fields, but it only works to a single level deep. You also need to use Hive's slightly odd <code class="so">LATERAL VIEW</code> notation:</p>
<pre class="brush: sql">
select v.foo, v.bar, v.quux, v.qid
from json_table jt
LATERAL VIEW json_tuple(jt.json, 'Foo', 'Bar', 'Quux', 'Quux.QuuxId') v
as foo, bar, quux, qid;
</pre>
<p>This returns:</p>
<div class="precode"><pre><code>foo bar quux qid
ABC 20090101100000 {"QuuxId":1234,"QuuxName":"Sam"} NULL
</code></pre></div>
<p>It doesn't know how to look inside the Quux subdocument. And this is where <code class="so">json_tuple</code> gets clunky fast - you have to create another lateral view for each subdocument you want to descend into:</p>
<pre class="brush: sql">
select v1.foo, v1.bar, v2.qid, v2.qname
from json_table jt
LATERAL VIEW json_tuple(jt.json, 'Foo', 'Bar', 'Quux') v1
as foo, bar, quux
LATERAL VIEW json_tuple(v1.quux, 'QuuxId', 'QuuxName') v2
as qid, qname;
</pre>
<p>This gives us the output we want:</p>
<div class="precode"><pre><code>foo bar qid qname
ABC 20090101100000 1234 Sam
</code></pre></div>
<p>With a complicated highly nested JSON doc, json_tuple is also quite inefficient and clunky as hell. So let's turn to a custom SerDe to solve this problem.</p>
<p><br /></p>
<h4>The best option: rcongiu's Hive-JSON SerDe</h4>
<p>A SerDe is a better choice than a json function (UDF) for at least two reasons:</p>
<ol>
<li>it only has to parse each JSON record once</li>
<li>you can define the JSON schema in the Hive table schema, making it much easier to issue queries against.</li>
</ol>
<p>I reviewed a couple of SerDe's and by far the best one I've found is <a href="https://github.com/rcongiu/Hive-JSON-Serde">rcongiu's Hive-JSON SerDe</a>.</p>
<p>To get that SerDe, clone the project from GitHub and run <code class="so">mvn package</code>. It creates a <code class="so">json-serde-1.1.6.jar</code> in the target directory. If you have a place you like to put your jars for runtime referencing, move it there.</p>
<p>Then tell Hive about it with:</p>
<div class="precode"><pre><code>ADD JAR /path/to/json-serde-1.1.6.jar;
</code></pre></div>
<p>You can do this either at the hive prompt or put it in your <code class="so">$HOME/.hiverc</code> file.</p>
<p>Now let's define the Hive schema that this SerDe expects and load the simple.json doc:</p>
<pre class="brush: sql">
CREATE TABLE json_serde (
  Foo string,
  Bar string,
  Quux struct<QuuxId:int, QuuxName:string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
LOAD DATA LOCAL INPATH '/tmp/simple.json' INTO TABLE json_serde;
</pre>
<p>With the openx JsonSerDe, you can define subdocuments as maps or structs. I prefer structs, as it allows you to use the convenient dotted-path notation (e.g., Quux.QuuxId) and you can match the case of the fields. With maps, all the keys you pass in have to be lowercase, even if you defined them as upper or mixed case in your JSON.</p>
<p>The query to match the above examples is beautifully simple:</p>
<pre class="brush: sql">
SELECT Foo, Bar, Quux.QuuxId, Quux.QuuxName
FROM json_serde;
</pre>
<p>Result:</p>
<div class="precode"><pre><code>foo bar quuxid quuxname
ABC 20090101100000 1234 Sam
</code></pre></div>
<p><br /><br />And now let's do a more complex JSON document:</p>
<pre class="brush: javascript">
{
  "DocId": "ABC",
  "User": {
    "Id": 1234,
    "Username": "sam1234",
    "Name": "Sam",
    "ShippingAddress": {
      "Address1": "123 Main St.",
      "Address2": null,
      "City": "Durham",
      "State": "NC"
    },
    "Orders": [
      {
        "ItemId": 6789,
        "OrderDate": "11/11/2012"
      },
      {
        "ItemId": 4352,
        "OrderDate": "12/12/2012"
      }
    ]
  }
}
</pre>
<p>Collapsed version:</p>
<pre class="brush: javascript">
{"DocId":"ABC","User":{"Id":1234,"Username":"sam1234","Name":"Sam","ShippingAddress":{"Address1":"123 Main St.","Address2":"","City":"Durham","State":"NC"},"Orders":[{"ItemId":6789,"OrderDate":"11/11/2012"},{"ItemId":4352,"OrderDate":"12/12/2012"}]}}
</pre>
<p>Hive Schema:</p>
<pre class="brush: sql">
CREATE TABLE complex_json (
  DocId string,
  User struct<Id:int,
              Username:string,
              Name:string,
              ShippingAddress:struct<Address1:string,
                                     Address2:string,
                                     City:string,
                                     State:string>,
              Orders:array<struct<ItemId:int,
                                  OrderDate:string>>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
</pre>
<p>Load the data:</p>
<pre class="brush: sql">
LOAD DATA LOCAL INPATH '/tmp/complex.json'
OVERWRITE INTO TABLE complex_json;
</pre>
<p>First let's query something from each document section. Since we know there are two orders in the orders array we can reference them both directly:</p>
<pre class="brush: sql">
SELECT DocId, User.Id, User.ShippingAddress.City as city,
       User.Orders[0].ItemId as order0id,
       User.Orders[1].ItemId as order1id
FROM complex_json;
</pre>
<p>Result:</p>
<div class="precode"><pre><code>docid id city order0id order1id
ABC 1234 Durham 6789 4352
</code></pre></div>
<p>But what if we don't know how many orders there are and we want a list of all a user's order Ids? This will work:</p>
<pre class="brush: sql">
SELECT DocId, User.Id, User.Orders.ItemId
FROM complex_json;
</pre>
<p>Result:</p>
<div class="precode"><pre><code>docid id itemid
ABC 1234 [6789,4352]
</code></pre></div>
<p>Oooh, it returns an array of ItemIds. Pretty cool. One of Hive's nice features.</p>
<p>Finally, does the openx JsonSerDe require me to define the whole schema? Or what if I have two JSON docs (say version 1 and version 2) where they differ in some fields? How constraining is this Hive schema definition?</p>
<p>Let's add two more JSON entries to our JSON document - the first has no orders; the second has a new "PostalCode" field in Shipping Address.</p>
<pre class="brush: javascript">
{
  "DocId": "ABC",
  "User": {
    "Id": 1235,
    "Username": "fred1235",
    "Name": "Fred",
    "ShippingAddress": {
      "Address1": "456 Main St.",
      "Address2": "",
      "City": "Durham",
      "State": "NC"
    }
  }
}
{
  "DocId": "ABC",
  "User": {
    "Id": 1236,
    "Username": "larry1234",
    "Name": "Larry",
    "ShippingAddress": {
      "Address1": "789 Main St.",
      "Address2": "",
      "City": "Durham",
      "State": "NC",
      "PostalCode": "27713"
    },
    "Orders": [
      {
        "ItemId": 1111,
        "OrderDate": "11/11/2012"
      },
      {
        "ItemId": 2222,
        "OrderDate": "12/12/2012"
      }
    ]
  }
}
</pre>
<p>Collapsed version:</p>
<pre class="brush: javascript">
{"DocId":"ABC","User":{"Id":1235,"Username":"fred1235","Name":"Fred","ShippingAddress":{"Address1":"456 Main St.","Address2":"","City":"Durham","State":"NC"}}}
{"DocId":"ABC","User":{"Id":1236,"Username":"larry1234","Name":"Larry","ShippingAddress":{"Address1":"789 Main St.","Address2":"","City":"Durham","State":"NC","PostalCode":"27713"},"Orders":[{"ItemId":1111,"OrderDate":"11/11/2012"},{"ItemId":2222,"OrderDate":"12/12/2012"}]}}
</pre>
<p>Add those records to complex.json and reload the data into the complex_json table.</p>
<p>Now try the query:</p>
<pre class="brush: sql">
SELECT DocId, User.Id, User.Orders.ItemId
FROM complex_json;
</pre>
<p>It works just fine and gives the result:</p>
<div class="precode"><pre><code>docid id itemid
ABC 1234 [6789,4352]
ABC 1235 null
ABC 1236 [1111,2222]
</code></pre></div>
<p>Any field not present will just return null, as Hive normally does even for non-JSON formats.</p>
<p>Note that we cannot query for User.ShippingAddress.PostalCode because we haven't put it on our Hive schema. You would have to revise the schema and then reissue the query.</p>
<p><br /></p>
<h4><code>/* ---[ A tool to automate creation of Hive JSON schemas ]--- */</code></h4>
<p>One feature missing from the openx JSON SerDe is a tool to generate a schema from a JSON document. Creating a schema for a large, complex, highly nested JSON document is quite tedious.</p>
<p>I've created a tool to automate this: <a href="https://github.com/midpeter444/hive-json-schema">https://github.com/midpeter444/hive-json-schema</a>.</p>
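<p>The core of such a tool is a recursive walk that maps each JSON value to a Hive type. Here is a much-simplified sketch of the idea in Python; the real tool handles more cases (nulls, heterogeneous arrays, merging multiple documents):</p>

```python
import json

def hive_type(value):
    """Map a JSON value to a Hive column type (simplified sketch)."""
    if isinstance(value, bool):       # must check before int: True is an int in Python
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # assume a homogeneous array; take the type of the first element
        return "array<%s>" % hive_type(value[0])
    if isinstance(value, dict):
        fields = ", ".join("%s:%s" % (k, hive_type(v)) for k, v in value.items())
        return "struct<%s>" % fields
    return "string"  # fallback for null and anything unrecognized

doc = json.loads('{"Foo":"ABC","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}')
for name, value in doc.items():
    print("%s %s" % (name, hive_type(value)))
```

<p>Running this over the simple.json document prints the same column definitions used in the json_serde table above.</p>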
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com531tag:blogger.com,1999:blog-2977002084155733470.post-24185026271507620412013-03-03T22:36:00.001-05:002013-10-11T19:52:08.700-04:00Signing and Promoting your Clojure libraries on Clojars<p>Phil Hagelberg, the creator and primary maintainer of Leiningen, has been advocating that Clojurians sign their Clojure libraries for the releases repository in Clojars. By itself, this isn't sufficient to provide security to avoid malicious code from causing havoc with public code repositories, but it is a necessary first step. Phil has talked about his ideas on how to get to a more complete model of security in a couple of places:</p>
<ul>
<li>Exhibit A: <a href="https://groups.google.com/forum/?fromgroups=#!topic/clojars-maintainers/2yM7-9te3oo">Posting on Clojars-maintainers Google Group</a></li>
<li>Exhibit B: <a href="http://mostlylazy.com/2012/09/21/episode-8-phil-hagelberg-empowering-userspace-in-heroku-leiningen-and-emacs/">Mostly Lazy podcast Episode 8</a></li>
<li>Exhibit C: <a href="https://www.youtube.com/watch?v=sBSUIKMdQ4w">Clojure Conj 2012 presentation</a></li>
</ul>
<p><br /></p>
<h4><code>/* ---[ Signing your Clojure libraries ]--- */</code></h4>
<p>My first experience deploying a signed jar to Clojars was a little rocky, so I'm providing this how-to report to help others (including future me).</p>
<p>I have only done this on (Xubuntu) Linux, but I imagine it will work fairly similarly on Macs. Not sure about Windows, as I seem to have constant trouble getting Clojure and Windows to play nicely together. I have used the GPG that comes with <a href="http://msysgit.github.com/">msysgit</a>, so that will probably work with these instructions as well, but I haven't tried it.</p>
<p><br /></p>
<h5><code>/* ---[ STEP 1: Generate GPG Keys ]--- */</code></h5>
<p>Clojars security is based on PGP keys, so you need to have a PGP public/private keyset. GnuPG (GPG) is the generally recommended tool for that.</p>
<p>If you already have GPG installed and can't remember if you've already created a keyset, try this first:</p>
<div class="precode"><pre><code>gpg --list-keys
</code></pre></div>
<p>If you see your name and email in the list, then you have. If not, generate them with:</p>
<div class="precode"><pre><code>gpg --gen-key
</code></pre></div>
<p>Accepting the defaults you are prompted with is fine. See <a href="http://www.madboa.com/geek/gpg-quickstart/">this article</a> for details on this step. When completed this will create your public key ring and secret/private key ring:</p>
<div class="precode"><pre><code>$ ls ~/.gnupg
pubring.gpg pubring.gpg~ random_seed secring.gpg trustdb.gpg
</code></pre></div>
<p><br /></p>
<h5><code>/* ---[ STEP 2: Publish your public GPG key to a keyserver ]--- */</code></h5>
<p>By publishing your public key, others can download it and verify that your signed library is in fact signed by you.</p>
<p>To publish your key you will need to get its ID.</p>
<div class="precode"><pre><code>$ gpg --list-keys
/home/midpeter444/.gnupg/pubring.gpg
------------------------------------
pub   2048R/5414B325 2012-11-12
uid                  Michael Peterson <myemail@fubar.com>
</code></pre></div>
<p>The 8 characters after the '/' on the "pub" row of your key is your key's ID. Now publish it:</p>
<div class="precode"><pre><code>$ gpg --send-key 5414B325
</code></pre></div>
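<p>If you ever want to script this step, the key ID can be picked out of the <code class="so">--list-keys</code> output. A small Python sketch, using the sample listing from above (not live gpg output):</p>

```python
import re

listing = """
pub   2048R/5414B325 2012-11-12
uid                  Michael Peterson <myemail@fubar.com>
"""

def key_id(gpg_output):
    """Extract the short key ID: the 8 hex chars after the '/' on the 'pub' line."""
    m = re.search(r"^pub\s+\S+/([0-9A-F]{8})", gpg_output, re.MULTILINE)
    return m.group(1) if m else None

print(key_id(listing))
```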
<p>If you don't specify a key server it will choose the GnuPG keyserver. If you want to target a specific keyserver use the <code class="so">--keyserver</code> option <a href="http://www.madboa.com/geek/gpg-quickstart/">as shown here</a>.</p>
<p><br /></p>
<h5><code>/* ---[ STEP 3: Add your GPG key to your Clojars account ]--- */</code></h5>
<p>When you sign up for Clojars there is a section in your Profile to add two keys: an SSH public key and a PGP public key. The SSH key is for secure transport of the library from your system to the Clojars repo via scp. It is not related to signing your jars.</p>
<p>Your library will be signed with your PGP private key that resides only on your system. That signature indicates that the owner of the private key (the one paired with the public key you just published) signed this code artifact. It allows someone else to know who signed it and whether the code artifact has been changed since it was signed and deployed.</p>
<p>By having your PGP public key on Clojars, you allow Clojars to verify that one of its members signed the artifact. This check happens when you promote your release to the release repo (more on that below).</p>
<p><em>Note:</em> Clojars is not a keyserver, so putting your public key there will not allow others to verify your signatures. That is why in step 2 we published it to a public keyserver.</p>
<p>To add your public key to Clojars you create an "ASCII-armored" version of the binary public key, which you generate with:</p>
<div class="precode"><pre><code>gpg --armor --export <a href='mailto:your@email.here'>your@email.here</a> code
</code></pre></div>
<p>Once you have it, what exactly do you paste into the Clojars text box? The BEGIN and END delimiter lines and everything in between, like so:</p>
<pre>
-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1.4.11 (GNU/Linux)
mQENBFChf/ABCAC/2nK75NwOsg7nkI5NNTCqBMk5DMX0JWu17EZoii/6vH88KlTm
0xeIHwv3leMZbtjqTNzFPfGh5xQo7zH+Y2CBPG8gq9QKv9aB587vuzwCtN/uaP6Z
mjQlafc5HK8gn5PMULWJC0V46g+y39g8bDSEZDInGFFWF7kCOXMcsJnNuoXWbajz
WwV8lcr56EKeenRS3lV4GWd/W+aSjCkaq1SM+9XP3qZYC9lOuaYfkzxfTsf5hpvG
wfTJVOaaPDtfhefgzrK6+znvMC1TsKMKU8bpX7u9WaHn9jD24UE6idzSn84uPuNK
5Jms4r7r6y+kfMSrWK0KUH+Gp0Bs+6kVu6S1ABEBAAG0J01pY2hhZWwgUGV0ZXJz
b24gPG1wZXRlcnNvbjJAZ21haWwuY29tPokBOAQTAQIAIgUCUKF/8AIbAwYLCQgH
AwIGFQgCCQoLBBYCAwECHgECF4AACgkQeBe9+DhXuBiGzAf/aC43wc/TrNSMeWkN
6X92YpPu8SYh1bcDOEm7FvBSWZg/NSf4VBNqP6TXjobIGfSX8hFGrgrkB/ZDMY6N
Ec9UxpnhVC2gOn9TZzOCNfbvN4SAcBWm9vfABEQxIcXsOXEGxLWsW3FSeK2fp5Iv
S19eQ6Z6N2jw/H6xLpd5Zrvw4vAROOVKiYNkQKkqU95hqJQz+9xPOBwDIIL2isQ2
qd0fgDryue7D31XJ/Qrwxa++I70ew4u3TqYboUAL6aTIAxSGmMlbk2CDvVusRUw5
lrN7qxWejq61Qlhx+l9xEEBcq5HflQZpFENn95xT6l6IiLiiEWT4Gju4EwZz+CUO
pe99QrkBDQRQoX/wAQgAq6SDCbXHh6GKFnb1as0zzlngwv6MiA5eaY+83qOgeXov
UVOZBQU8vBmVuF/3Pd5Q7asTOy+40sBYcuCwsMvtXPX0s7A0pdSn5A7DFelVRM5y
oQheASDCtlnp1xpL/8GTr/YuYlQSgC5zqcv23FatrKQ5ljPDV+tbe40T0HQ1491x
g+QPmnS4jofcOGBJ/AcAPAXU17zEiip/JmDOGfpvAf+igRNW5nyGfCkkrHeAaovR
tkqMMtq3YZBBrfgYIuGOZYzIz/lOCDyVb6QP2B/rn6ub5UeB0oYJa98uW7Zmx1vn
ZIPgtbDFRoAj2NV1JEAgmZABcYYQVVpRuvyEC+94FwARAQABiQEfBBgBAgAJBQJQ
oX/wAhsMAAoJEHgXvfg4V7gYfcMH/3hiqNPHlb9FxY4p8gIj6JWdj++CXXjRg4Re
4QWP/JvRH5v4z8DLstcJmezgerHyFqSb7ylo108qONW+x9Q1tNRe+ey9YOeg4581
tdXLMPaGjU0jz5aCKnKQR7LJjOTS4SPPU4dYURDUUkmKgU4tmbQVdkXyT45rCh6b
tB655w1aYSLbA93E3DKkdqoN1gCTlwzsiayLsu1kiWSUopOlPKcwLjyo1OpRC2ph
3T7RuF+whq/NQ8SYSz6GgWh8tSMt/SDpJ5/YOveyH7iAuwcL4pNgGYSjAPklSolp
UZwJPsLOqDSxnlc7RKwX9hsdDL7tybYAX2P7BOGpoNDeN1ZMIEA=
=Kyc/
-----END PGP PUBLIC KEY BLOCK-----
</pre>
<p><br /></p>
<h5><code>/* ---[ STEP 4: Prepare your project and its metadata ]--- */</code></h5>
<p>With Clojars you can publish SNAPSHOTS or releases. The latter can be "promoted" if you meet all the criteria in your project.clj, which are:</p>
<ul>
<li>you cannot have the word SNAPSHOT in your version</li>
<li>you should have your license filled in</li>
<li>you need to have the :scm section filled in
<ul><li>you can either do this manually, as in the example below</li>
<li>or lein in theory can automatically do this for you if you are using GitHub and its remote "ID" is origin (though I've had issues even in that case)</li></ul></li>
</ul>
<p>Here is an example project.clj:</p>
<div class="codeblock">
<pre>
(<span class="keyword">defproject</span> <span class="function-name">thornydev/go-lightly</span> <span class="string">"0.4.0"</span>
<span class="constant">:description</span> <span class="string">"Clojure library to facilitate CSP concurrent programming based on Go concurrency constructs"</span>
<span class="constant">:url</span> <span class="string">"<a href='https://github.com/midpeter444/go-lightly'>https://github.com/midpeter444/go-lightly</a>"</span>
<span class="constant">:license</span> {<span class="constant">:name</span> <span class="string">"Eclipse Public License"</span>
<span class="constant">:url</span> <span class="string">"<a href='http://www.eclipse.org/legal/epl-v10.html'>http://www.eclipse.org/legal/epl-v10.html</a>"</span>}
<span class="constant">:profiles</span> {<span class="constant">:dev</span> {<span class="constant">:dependencies</span> [[criterium <span class="string">"0.3.1"</span>]]}}
<span class="constant">:dependencies</span> [[org.clojure/clojure <span class="string">"1.5.0"</span>]]
<span class="constant">:scm</span> {<span class="constant">:name</span> <span class="string">"git"</span>
<span class="constant">:url</span> <span class="string">"<a href='https://github.com/midpeter444/go-lightly'>https://github.com/midpeter444/go-lightly</a>"</span>})
</pre>
</div>
<p><br /></p>
<h5><code>/* ---[ STEP 5: Commit your code ]--- */</code></h5>
<p>Make sure you have committed all your changes into Git (or Hg, SVN or whatever SCM you are using). Tag the release if you are so inclined and (optional) push it to GitHub or your remote or central hosting server.</p>
<p><br /></p>
<h5><code>/* ---[ STEP 6: Deploy to Clojars ]--- */</code></h5>
<p>From the top of your project directory, enter:</p>
<div class="precode"><pre><code>lein deploy clojars
</code></pre></div>
<p>In my case, my gpg-agent prompted me twice for my GPG passphrase and then the deploy happened.</p>
<p>When you do this, lein will create a pom and a jar and upload those to Clojars. That pom.xml should include SCM information that looks like this:</p>
<div class="codeblock">
<pre>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">scm</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">tag</span><span class="nxml-tag-delimiter">></span><span class="nxml-text">12f653361a88c4df14</span><span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">tag</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-element-local-name">url</span><span class="nxml-tag-delimiter">></span><span class="nxml-text"><a href='https://github.com/midpeter444/go-lightly'>https://github.com/midpeter444/go-lightly</a></span><span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">url</span><span class="nxml-tag-delimiter">></span>
<span class="nxml-tag-delimiter"><</span><span class="nxml-tag-slash">/</span><span class="nxml-element-local-name">scm</span><span class="nxml-tag-delimiter">></span>
</pre>
</div>
<p>The tag there should be the SHA1 of the last commit (in the case of Git). <em>Note:</em> Don't confuse it with a "tag" that you create with "git tag".</p>
<p>If the deploy was successful, your jar should be signed and (possibly) ready for Promotion.</p>
<p><br /></p>
<h5><code>/* ---[ STEP 7: Check whether your jar was signed ]--- */</code></h5>
<p>Create a new lein project and make your deployed library one of its dependencies. Then in that new project run:</p>
<div class="precode"><pre><code>$ lein deps :verify
:signed [criterium "0.3.1"]
:unsigned [enlive "1.0.1"]
:signed [org.clojure/tools.macro "0.1.1"]
:signed [org.clojure/clojure "1.5.0"]
:bad-signature [thornydev/go-lightly "0.4.0"]
</code></pre></div>
<p>You see that some are signed and some are not. Obviously, you want yours to say <code class="so">:signed</code>. If it says <code class="so">:unsigned</code> then you are probably either using Lein 1 or you didn't generate your GPG keys. If it has <code class="so">:bad-signature</code> then something got corrupted on the Clojars server. In my case above, I promoted and tried to redeploy, which uncovered a bug in lein/clojars that caused some files to get overwritten when they shouldn't. This issue should be fixed soon. If you do have that problem, delete your local copy from your ~/.m2 directory and contact someone on the #leiningen IRC channel.</p>
<p><br /></p>
<h5><code>/* ---[ Optional STEP 8: Promote to release status ]--- */</code></h5>
<p>If you are eligible to promote to release status, you will see a "Promote" button on your Clojars page. If you are not, you may be missing SCM information, which is what happened to me recently.</p>
<p>Note that once you promote you can no longer deploy to that version again, so make sure you're ready to make it immutable. After that, you can only add new versions.</p>
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com29tag:blogger.com,1999:blog-2977002084155733470.post-66100901732173222622013-01-13T21:12:00.002-05:002013-03-03T10:09:35.858-05:00Go Concurrency Constructs in Clojure, part 4: idioms and tradeoffs<blockquote class="bordered">
The Go approach to concurrent software can be characterized as: Don't communicate by sharing memory, share memory by communicating.... You use the channel to pass the data back and forth between the Go routines and make your concurrent program operate that way.
<br /><br />
--Rob Pike, Google IO 2012 conference talk
</blockquote>
<blockquote class="bordered">
Programmers know the benefits of everything and the tradeoffs of nothing.
<br /><br />
--Rich Hickey, Strangeloop 2011 talk "Simple Made Easy"
</blockquote>
<style type="text/css">
<!--
div.clojure { font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; color: #000000; background: #ececec; font-size: 10pt; text-decoration: none; }
span.default { font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; color: #000000; background: #ececec; font-size: 10pt; text-decoration: none; }
span.default a { font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; color: #000000; background: #ececec; font-size: 10pt; text-decoration: underline; }
span.string { color: #8b2252; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: none; }
span.string a { color: #8b2252; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: underline; }
span.doc { color: #8b2252; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: none; }
span.doc a { color: #8b2252; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: underline; }
span.variable-name { color: #a0522d; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: none; }
span.variable-name a { color: #a0522d; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: underline; }
span.comment { color: #b22222; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: none; }
span.comment a { color: #b22222; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: underline; }
span.comment-delimiter { color: #b22222; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: none; }
span.comment-delimiter a { color: #b22222; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: underline; }
span.preprocessor { color: #483d8b; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: none; }
span.preprocessor a { color: #483d8b; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: underline; }
span.builtin { color: #483d8b; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: none; }
span.builtin a { color: #483d8b; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: underline; }
span.function-name { color: #0000ff; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: none; }
span.function-name a { color: #0000ff; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: underline; }
span.keyword { color: #a020f0; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: none; }
span.keyword a { color: #a020f0; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: underline; }
span.constant { color: #008080; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: none; }
span.constant a { color: #008080; font-family: "DejaVu Sans Mono", "Lucida Console", Monaco, monospace; font-stretch: normal; font-weight: 700; font-style: normal; background: #ececec; font-size: 10pt; text-decoration: underline; }
-->
</style>
<script type="text/javascript">
<!--
// this function is needed to work around
// a bug in IE related to element attributes
function hasClass(obj)
{
var result = false;
if (obj.getAttributeNode("class") != null)
{
result = obj.getAttributeNode("class").value;
}
return result;
}
function stripe(id)
{
// the flag we'll use to keep track of
// whether the current row is odd or even
var even = false;
// if arguments are provided to specify the colors
// of the even & odd rows, then use the them;
// otherwise use the following defaults:
var evenColor = arguments[1] ? arguments[1] : "#fff";
var oddColor = arguments[2] ? arguments[2] : "#ddd";
// obtain a reference to the desired table
// if no such table exists, abort
var table = document.getElementById(id);
if (! table) { return; }
// by definition, tables can have more than one tbody
// element, so we'll have to get the list of child
// <tbody>s
var tbodies = table.getElementsByTagName("tbody");
// and iterate through them...
for (var h = 0; h < tbodies.length; h++)
{
// find all the <tr> elements...
var trs = tbodies[h].getElementsByTagName("tr");
// ... and iterate through them
for (var i = 0; i < trs.length; i++)
{
// avoid rows that have a class attribute
// or backgroundColor style
if (! hasClass(trs[i]) &&
! trs[i].style.backgroundColor)
{
// get all the cells in this row...
var tds = trs[i].getElementsByTagName("td");
// and iterate through them...
for (var j = 0; j < tds.length; j++)
{
var mytd = tds[j];
// avoid cells that have a class attribute
// or backgroundColor style
if (! hasClass(mytd) &&
! mytd.style.backgroundColor)
{
mytd.style.backgroundColor =
even ? evenColor : oddColor;
}
}
}
// flip from odd to even, or vice-versa
even = ! even;
}
}
}
function toggle_invis( name )
{
var filter =
{ acceptNode:
function( node )
{ var classname = node.id;
if( classname )
{ var classbase = classname.substr( 0, name.length );
if( classbase == name ) { return NodeFilter.FILTER_ACCEPT; } }
return NodeFilter.FILTER_SKIP; } };
var walker = document.createTreeWalker( document.body ,
NodeFilter.SHOW_ELEMENT ,
filter ,
false );
while( walker.nextNode() )
{
var e = walker.currentNode;
if( e.style.display == "none" ) { e.style.display = "inline"; }
else { e.style.display = "none"; }
}
}
-->
</script>
<p><br /></p>
<h4><code>/* ---[ Functional Clojure Idioms ]--- */</code></h4>
<p>In the preface to <a href="http://eloquentruby.com/">Eloquent Ruby</a>, Russ Olsen relates a story that after teaching a Ruby class one of his students complained that his Ruby programs tended to end up looking like his Java programs. I remember that same experience when I first learned Ruby in the early 2000s. In fact, I set up conventions for myself in Ruby to try to "force" it to be more like Java. I hadn't fully grokked that changing languages does not mean just learning the syntax and libraries. It means adopting the idioms, the approaches and even the constraints that the designer put into the language and that have arisen in the language community. It often means changing the way you think. That is certainly true of Clojure, perhaps more than any language I've ever learned. (<em>Side note:</em> I haven't learned Haskell yet!)</p>
<p>Go is intended primarily to be a systems-programming language, with a strong focus on writing concurrent server programs. While it does include some more "modern" functional features, such as closures and first-class functions, it is not a functional programming language.</p>
<p>These blog posts and the go-lightly library are my attempt to think about <strong>how</strong> to adopt a Go-like CSP concurrency programming style into Clojure. But we should also think about <strong>when</strong> to adopt this style of programming. I don't have an answer yet and I'm writing this library to explore this area.</p>
<p>The Go-channel model is a message passing model, which you could view as a poor-man's <a href="https://en.wikipedia.org/wiki/Actor_model">Actor model</a>, something Rich Hickey considered for the Clojure language and decided to leave out. He outlines those reasons in the <a href="http://clojure.org/state">clojure.org/state</a> page (see the "Message Passing and Actors" section in particular).</p>
<p>In the blog lead-in above, I quote Pike saying that Go's model is to share memory by communicating, rather than the other way around. Hickey argues that the message-passing model is a complex one. Remember that "complex", in the Clojure community, is an objective measure of how entwined things are. Sharing memory by communicating is more complex because you have to coordinate entities, particularly if you have blocking waits: if one depends directly on the other, you have an entangled system. Coordinating multiple entities is difficult, and with blocking operations it can lead to deadlocks. If you use timeouts to overcome potential deadlocks, then you have to add special logic to your code to deal with them. Sharing memory with immutable values or STM-protected values is often a simpler, less complected, model.</p>
<p>Go (synchronous) channels are for synchronizing threads or routines. When you need to synchronize in other languages you have constructs like CountDownLatches, CyclicBarriers, waiting on a future, a join in a fork-join model or, at the lowest level, mutexes and semaphores. The synchronous channel provides an <em>easier</em> model that also allows message passing. But remember that easy is not necessarily <a name="fromFoot1">simple</a> [<a href="#foot1">footnote 1</a>] and consider the tradeoffs.</p>
<p><br /></p>
<h4><code>/* ---[ Channels as Queues ]--- */</code></h4>
<p>Go buffered channels, on the other hand, are not synchronous communication tools. They are queues for asynchronous workflows. By decoupling producer(s) and consumer(s), they are less complected. Hickey, in his <a href="http://www.infoq.com/presentations/Simple-Made-Easy">Simple Made Easy</a> talk, has a table listing paired concepts where one is more complex and the other simple. On his chart, Actors are juxtaposed with queues: Actors are complex, queues are simple. And in the <a href="https://www.youtube.com/watch?v=ROor6_NGIWU">2012 Clojure conj keynote</a>, Hickey stated that queues have been underemphasized in the Clojure community so far.</p>
<p>Thus, as far as channels go, the asynchronous buffered ones are more idiomatic in Clojure than synchronous channels. In fact, an async concurrent queue is used in the Clojure Programming book's webcrawler example. For contrast, I implemented this webcrawler <a href="https://github.com/midpeter444/go-lightly/blob/master/go-lightly-examples/clj-examples/src/thornydev/go_lightly/webcrawler/webcrawler.clj">example using go-lightly</a>.</p>
<p>On the other hand, from what I've seen, synchronous channels are very idiomatic in Go, and perhaps even preferred over buffered channels. That is the impression I've gotten from watching Pike's talks and reading a few threads on the golang mailing list. For example, see <a href="https://groups.google.com/forum/?hl=fr&fromgroups=#!topic/golang-nuts/koCM3i-bbMs">this thread</a> where one participant says:</p>
<div class="precode"><pre><code>Go channels can be asynchronous, but most of the time that's not
what you want. When communicating between goroutines running on the
same machine a synchronous send/recv improves program flow. Synchronous
channels have a lot of advantages by making program flow predictable
and easier to think about.
</code></pre></div>
<p>I'll leave it there as an open question deserving careful thought.</p>
<p><br /></p>
<h4><code>/* ---[ Making channels more idiomatic ]--- */</code></h4>
<p>As I've been doing various example programs with go-lightly, I've noticed that the code structure can be more imperative than functional, in part because channels are not composable data structures. You can't pass a channel to map, reduce or filter, since a channel does not implement the ISeq interface.</p>
<p>To remedy this, I've added four functions to go-lightly that let you treat a channel like a seq when retrieving values from it.</p>
<p>The first two functions convert the current values on a channel to a seq or a vector <strong>without</strong> removing them from the channel. The latter two functions remove or <em>drain</em> the channel either immediately (non-lazy) or as you read from it in a lazy fashion.</p>
<hr />
<p><strong>(channel->seq chan)</strong><br />"Takes a snapshot of all values on a channel <em>without</em> removing the values from the channel. Returns a (non-lazy) seq of the values."</p>
<hr />
<p><strong>(channel->vec chan)</strong><br />"Takes a snapshot of all values on a channel <em>without</em> removing the values from the channel. Returns a vector of the values."</p>
<hr />
<p><strong>(drain chan)</strong><br />"Removes all the values on a channel and returns them as a non-lazy seq."</p>
<hr />
<p><strong>(lazy-drain chan)</strong><br />"Lazily removes values from a channel. Returns a Cons lazy-seq until it reaches the end of the channel (as determined by getting a nil value when asking for the next value on the channel)."</p>
<hr />
<p>All the sequences end once a nil value is pulled off the queue, which represents the end of the queue. Since <code class="so">lazy-drain</code> is lazy, if something else adds to the queue before its end is reached, the lazy seq will pick up those additional values, whereas the non-lazy <code class="so">drain</code> function will not.</p>
<p>A REPL session will illustrate how these work.</p>
<p>First let's look at <code class="so">channel->seq</code> using a buffered channel:</p>
<div class="clojure">
<pre>
<span class="comment-delimiter">;; </span><span class="comment">define a channel with capacity of 7
</span> user=> (<span class="keyword">def</span> <span class="function-name">ch</span> (go/channel 7))
#'user/ch
user=> (<span class="builtin">dotimes</span> [i 6] (<span class="preprocessor">.put</span> ch i))
nil
user=> ch
#<<span class="preprocessor">LinkedBlockingQueue</span> [0, 1, 2, 3, 4, 5]>
<span class="comment-delimiter">;; </span><span class="comment">grab the values into a non-lazy seq
</span> user=> (<span class="keyword">def</span> <span class="function-name">cseq</span> (go/channel->seq ch))
#'user/cseq
user=> cseq
(0 1 2 3 4 5)
user=> (<span class="variable-name">type</span> cseq)
<span class="preprocessor">clojure.lang.ArraySeq</span>
<span class="comment-delimiter">;; </span><span class="comment">the values have not been removed from the channel
</span> user=> ch
#<<span class="preprocessor">LinkedBlockingQueue</span> [0, 1, 2, 3, 4, 5]>
<span class="comment-delimiter">;; </span><span class="comment">if a value is removed from the channel the seq is not affected
</span> user=> (<span class="preprocessor">.take</span> ch)
0
user=> ch
#<<span class="preprocessor">LinkedBlockingQueue</span> [1, 2, 3, 4, 5]>
user=> cseq
(0 1 2 3 4 5)
</pre>
</div>
<p><code class="so">channel->vec</code> behaves the same way except it returns a vector, not a seq.</p>
<p>Next let's look at the drain functions using a buffered channel:</p>
<div class="clojure">
<pre>
user=> (<span class="keyword">def</span> <span class="function-name">ch</span> (go/channel 7))
#'user/ch
user=> (<span class="builtin">dotimes</span> [i 6] (<span class="preprocessor">.put</span> ch i))
nil
user=> ch
#<<span class="preprocessor">LinkedBlockingQueue</span> [0, 1, 2, 3, 4, 5]>
<span class="comment-delimiter">;; </span><span class="comment">calling drain returns a seq of all the values on the
</span> <span class="comment-delimiter">;; </span><span class="comment">channel and removes them
</span> user=> (<span class="keyword">def</span> <span class="function-name">dseq</span> (go/drain ch))
#'user/dseq
user=> (<span class="variable-name">type</span> dseq)
<span class="preprocessor">clojure.lang.IteratorSeq</span>
user=> dseq
(0 1 2 3 4 5)
user=> ch
#<<span class="preprocessor">LinkedBlockingQueue</span> []>
<span class="comment-delimiter">;; </span><span class="comment">add more elements to the queue; the seq is not affected
</span> user=> (<span class="builtin">dotimes</span> [i 6] (<span class="preprocessor">.put</span> ch (<span class="variable-name">+</span> 100 i)))
nil
user=> ch
#<<span class="preprocessor">LinkedBlockingQueue</span> [100, 101, 102, 103, 104, 105]>
user=> dseq
(0 1 2 3 4 5)
<span class="comment-delimiter">;; </span><span class="comment">now let's lazily drain the queue into a lazy-seq Cons
</span> user=> (<span class="keyword">def</span> <span class="function-name">zseq</span> (go/lazy-drain ch))
#'user/zseq
user=> (<span class="variable-name">type</span> zseq)
<span class="preprocessor">clojure.lang.Cons</span>
<span class="comment-delimiter">;; </span><span class="comment">realize the first three elements - takes only those
</span> <span class="comment-delimiter">;; </span><span class="comment">off the channel
</span> user=> (<span class="variable-name">take</span> 3 zseq)
(100 101 102)
user=> ch
#<<span class="preprocessor">LinkedBlockingQueue</span> [103, 104, 105]>
<span class="comment-delimiter">;; </span><span class="comment">take more than are on the channel - get only what's available
</span> user=> (<span class="variable-name">take</span> 100 zseq)
(100 101 102 103 104 105)
<span class="comment-delimiter">;; </span><span class="comment">the channel is now empty
</span> user=> ch
#<<span class="preprocessor">LinkedBlockingQueue</span> []>
<span class="comment-delimiter">;; </span><span class="comment">what if we try to take/read them again? They are still
</span> <span class="comment-delimiter">;; </span><span class="comment">in the lazy-seq since it caches the results</span>
user=> (<span class="variable-name">take</span> 100 zseq)
(100 101 102 103 104 105)
<span class="comment-delimiter">;; </span><span class="comment">we can use higher order functions - composability!
</span> user=> (<span class="variable-name">map</span> str (<span class="variable-name">filter</span> odd? zseq))
(<span class="string">"101"</span> <span class="string">"103"</span> <span class="string">"105"</span>)
</pre>
</div>
<p><br /><br />These functions also work with synchronous channels, but are less useful there. In particular, <code class="so">lazy-drain</code> faces a race condition with producers that try to transfer multiple consecutive values, as shown below:</p>
<div class="clojure">
<pre>
<span class="comment-delimiter">;; </span><span class="comment">create a synchronous channel
</span> user=> (<span class="keyword">def</span> <span class="function-name">c</span> (go/channel))
#'user/c
<span class="comment-delimiter">;; </span><span class="comment">queue up 6 values to be put onto the queue but
</span> <span class="comment-delimiter">;; </span><span class="comment">only one can go on at a time waiting for a consumer
</span> user=> (go/go (<span class="builtin">dotimes</span> [i 6] (<span class="preprocessor">.transfer</span> c i)))
#<core$future_call$reify__6110@5d47522a: <span class="constant">:pending></span>
user=> c
<span class="comment-delimiter">;; </span><span class="comment">channel->vec and channel->seq will grab one value since
</span> <span class="comment-delimiter">;; </span><span class="comment">a producer is waiting for a consumer</span><span class="comment">
</span> #<<span class="preprocessor">LinkedTransferQueue</span> [0]>
user=> (go/channel->vec c)
[0]
user=> c
#<<span class="preprocessor">LinkedTransferQueue</span> [0]>
<span class="comment-delimiter">;; </span><span class="comment">drain also takes only the first value and also removes it
</span> <span class="comment-delimiter">;; </span><span class="comment">from the channel, allowing the next val to be put on the channel
</span> user=> (go/drain c)
(0)
user=> c
#<<span class="preprocessor">LinkedTransferQueue</span> [1]>
<span class="comment-delimiter">;; </span><span class="comment">lazy-drain looks like it works!
</span> user=> (<span class="variable-name">take</span> 2 (go/lazy-drain c))
(1 2)
user=> c
#<<span class="preprocessor">LinkedTransferQueue</span> [3]>
<span class="comment-delimiter">;; </span><span class="comment">but it has a race condition with the producer, so may
</span> <span class="comment-delimiter">;; </span><span class="comment">not get everything we "queued" up to transfer
</span> user=> (go/lazy-drain c)
(3 4)
user=> c
#<<span class="preprocessor">LinkedTransferQueue</span> [5]>
</pre>
</div>
<p><br /></p>
<h4><code>/* ---[ Next ]--- */</code></h4>
<p>I've now created a <a href="https://github.com/midpeter444/go-lightly/wiki">go-lightly wiki</a> with fairly extensive documentation and I've implemented a number of <a href="https://github.com/midpeter444/go-lightly/tree/master/go-lightly-examples/clj-examples">example applications using go-lightly</a>.</p>
<p>A couple of things you may want to look into if you find this topic interesting:</p>
<ul>
<li>I've added formal abstractions for the Channel types in go-lightly. Channel, BufferedChannel and TimeoutChannel all implement the <a href="https://github.com/midpeter444/go-lightly/wiki/go-lightly-GoChannel-API">GoChannel protocol</a>.</li>
<li>As mentioned above, I have done a go-lightly centric implementation of a simple web crawler app based on the example at the end of Ch. 4 from the <a href="http://www.clojurebook.com/">O'Reilly Clojure Programming</a> book. This will provide a good contrast between the two concurrency approaches.</li>
<li>I have added the ability to preferentially read from one or more channels in a <code class="so">select</code> or <code class="so">selectf</code>.</li>
<li>I implemented Pike's Chinese Whispers <a href="https://github.com/midpeter444/go-lightly/tree/master/go-lightly-examples/clj-examples#whispers">example in go-lightly</a> to see how many "Go routines" <a href="https://github.com/midpeter444/go-lightly/wiki/Go-routines">could be spawned in Clojure</a> compared to Go. This is certainly an area where the JVM is less powerful than Go.</li>
</ul>
<p><br /></p>
<h4><code>/* ---[ Resources and Notes ]--- */</code></h4>
<p><a name="foot1"></a><br /><strong>[1]</strong> If you've spent much time in the Clojure community, you know I'm referring to the distinction that Hickey drew between the concepts of <em>easy</em>, a subjective concept, and <em>simple</em>, an objective one in his <a href="http://www.infoq.com/presentations/Simple-Made-Easy">Simple Made Easy</a> presentation. If you haven't watched it, well, <a href="https://twitter.com/bodil/status/221322654492274688">listen to Bodil</a>. (<a href="#fromFoot1">Jump back</a>)</p>
<p>Blog entries in this series:</p>
<ul>
<li><a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure.html">Go Concurrency Constructs in Clojure, part 1</a></li>
<li><a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure2.html">Go Concurrency Constructs in Clojure, part 2: select</a></li>
<li><a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure3.html">Go Concurrency Constructs in Clojure, part 3: why go-lightly?</a></li>
</ul>
<p>The Clojure go-lightly library on GitHub: <a href="https://github.com/midpeter444/go-lightly">https://github.com/midpeter444/go-lightly</a></p>
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com2tag:blogger.com,1999:blog-2977002084155733470.post-23879700476792366662013-01-12T13:49:00.000-05:002013-03-03T09:44:45.284-05:00Go Concurrency Constructs in Clojure, part 3: why go-lightly?<blockquote class="bordered">
There's a legendary example called the concurrent prime sieve, which is kind of an amazing thing. It was the first truly beautiful concurrent program I think I ever saw.
<br /><br />
--Rob Pike, Google IO 2012 conference talk
</blockquote>
<p><br /></p>
<h4><code>/* ---[ Why go-lightly? ]--- */</code></h4>
<p>In <a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure.html">part 1</a> and <a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure2.html">part 2</a> of this blog series, I introduced the basics of Go channels, Go routines and the Go select statement. I then walked through initial implementations of these ideas in the <a href="https://github.com/midpeter444/go-lightly">go-lightly library</a> and how to use the <a href="https://github.com/ztellman/lamina">lamina</a> channel and facilities to do Go-style CSP concurrent programming.</p>
<p>If the lamina library, which is now 2+ years old (thus reasonably mature and stable) and under active development, can be used for this, why am I proposing a new library? Well, I might have built one anyway just to get familiar with CSP-style programming and improve my Clojure skills, but ultimately I do think there is a good justification for considering a new library focused on just this use case.</p>
<p>The lamina library is fundamentally focused on asynchronous event-driven programming. Since dealing with callbacks can get messy and is hard to structure in a functional way, the core construct and central metaphor of Zach Tellman's approach to async programming is a channel that is used for putting and pulling events. Because a key goal of async event-driven programming is to avoid blocking, there are very few blocking operations in the lamina library. One that is provided: you can choose to block while pulling a value out of a channel. This is the part we've seen used to emulate Go channels.</p>
<p>However, the primary use case for lamina channels is an event queue, which means you want it to be unbounded and non-blocking, especially for events being put onto the queue. Thus, lamina uses a Java ConcurrentLinkedQueue underneath.</p>
<p>Go channels, however, come in two flavors: <strong>bounded, blocking queues</strong> of size 0 (every put has to have a corresponding take) and <strong>bounded, asynchronous queues</strong> of a size you specify in the <code class="so">make</code> function. The lamina channel really maps to neither, though in some scenarios it can be used for async queues or blocking queues where you need to block on read (but not write).</p>
<p>As I discussed in the first blog entry, Java's util.concurrent library already provides these Go channel types and even more variations on them. The <strong>bounded, blocking queue</strong> maps to a <code class="so">SynchronousQueue</code> or a <code class="so">TransferQueue</code> (if you only use the <code class="so">transfer</code> and <code class="so">take</code> methods). The <strong>bounded, asynchronous queue</strong> maps to <code class="so">LinkedBlockingQueue</code>.</p>
<p>Thus, go-lightly proposes to wrap these Java concurrent queues, specifically facilitating a Go-style CSP concurrency semantics.</p>
<p><br /></p>
<h4><code>/* ---[ Why bounded blocking queues? ]--- */</code></h4>
<p>So what is a use case where I really need a bounded blocking queue? </p>
<p>First, from here on out I will use the Go terminology: <strong>channel</strong> or <strong>Go channel</strong> refers to a blocking queue of size 0, and <strong>buffered channel</strong> refers to a non-blocking queue of a specified size.</p>
<p>Rephrasing the question - when would I need a channel and not a buffered channel? With a buffered channel you "fire-and-forget" and let some other thread pluck it off the buffered channel when it's ready.</p>
<p>A channel, on the other hand, is a synchronization mechanism between threads/routines similar to a CountDownLatch, CyclicBarrier or join of a fork-join model, except that you not only synchronize threads, but also pass messages between them, so it is a synchronizing communication tool.</p>
<p><br /></p>
<h4><code>/* ---[ Beautiful concurrency ]--- */</code></h4>
<p>The <a href="http://play.golang.org/p/9U22NfrXeq">golang site</a> provides an example of a concurrent prime sieve algorithm that, as implemented, requires blocking channels. If you were to use a lamina channel or a buffered channel, you'd potentially have some threads running way ahead of the others, unnecessarily consuming memory and wasting CPU cycles.</p>
<p>This is the "first truly beautiful concurrent program" Pike referred to in his Google IO 2012 talk.</p>
<p>Let's look at the Go implementation from the Golang website first:</p>
<script src="https://gist.github.com/4519593.js"></script>
<p>The <code class="so">Generate</code> and <code class="so">Filter</code> functions absolutely need to synchronize - when they push data onto a channel, they need to wait until the consumer (a chained filter function or main) is ready to pull it off.</p>
<p>Here is a Clojure version using go-lightly:</p>
<script src="https://gist.github.com/4470515.js"></script>
<p>Happily, the implementations are pretty much the same line-for-line.</p>
<p><br /></p>
<h4><code>/* ---[ Next ]--- */</code></h4>
<p>In the next blog entry, I will contrast the Go CSP concurrency model to the Clojure concurrency model and add some functions to allow channels to interoperate with the Clojure seq abstraction.</p>
<p><br /></p>
<h4><code>/* ---[ Resources ]--- */</code></h4>
<p>Both of the prime sieve examples are available in the GitHub go-lightly repo:</p>
<ul>
<li><a href="https://github.com/midpeter444/go-lightly/blob/master/go-lightly-examples/go-examples/src/concPrimeSieve/concPrimeSieve.go">Go version</a></li>
<li><a href="https://github.com/midpeter444/go-lightly/blob/master/go-lightly-examples/clj-examples/src/thornydev/go_lightly/primes/conc_prime_sieve.clj">Clojure go-lightly version</a></li>
</ul>
<p>Blog entries in this series:</p>
<ul>
<li><a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure.html">Go Concurrency Constructs in Clojure, part 1</a></li>
<li><a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure2.html">Go Concurrency Constructs in Clojure, part 2: select</a></li>
<li><a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure4.html">Go Concurrency Constructs in Clojure, part 4: idioms and tradeoffs</a></li>
</ul>
Anonymoushttp://www.blogger.com/profile/14830786689829173576noreply@blogger.com1tag:blogger.com,1999:blog-2977002084155733470.post-70836195486872823822013-01-05T19:10:00.000-05:002013-10-11T20:01:15.769-04:00Go Concurrency Constructs in Clojure, part 2: select<blockquote class="bordered">
"The select statement is a key part of why concurrency is built into Go as features of the language, rather than just a library. It's hard to do control structures that depend on libraries."
<br /><br />
-Rob Pike, 2012 Google IO Conference
</blockquote>
<p>In the <a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure.html">first blog entry</a> of this series, I introduced some simple examples of the CSP (Communicating Sequential Processes) model of concurrency that have been built into the Go language. I'm blogging my investigation of how we might leverage this style of concurrent programming in Clojure.</p>
<p>The key benefit of the CSP approach is that you can use normal sequential semantics and control flows that are easy to reason about while building concurrent flows and processes. Channels are used to communicate and synchronize processes to bring control and deterministic behavior to an otherwise non-deterministic concurrent environment. We can do this without locks or other low-level constructs that are hard to reason about. The CSP constructs are built on top of those low-level primitives (or at least compare-and-swap mechanisms), but they are hidden from view from the application developer.</p>
<p><br /></p>
<h4><code>/* ---[ A construct to wait for the next available channel ]--- */</code></h4>
<p>Go comes with a ready-made control structure called <code class="so">select</code>. It provides a shorthand way to specify how to deal with multiple channels, as well as allowing for timeouts and non-blocking behavior (via a "default" clause). It looks like a switch/case statement in C-based languages, but is different in that all paths involving a channel are evaluated: if more than one channel is ready, one is chosen at random, rather than just taking the first one that matches.</p>
<p>Let's look at an example (adapted from <a href="http://www.youtube.com/watch?v=f6kdp27TYZs&feature=youtu.be">Pike's 2012 Google IO talk</a>):</p>
<div class="codeblock">
<pre>
<span class="keyword">select</span> {
<span class="keyword">case</span> v1 := <-c1:
fmt.<span class="function-name">Printf</span>(<span class="string">"received %v from c1\n"</span>, v1)
<span class="keyword">case</span> v2 := <-c2:
fmt.<span class="function-name">Printf</span>(<span class="string">"received %v from c2\n"</span>, v2)
}
</pre>
</div>
<p>This select wraps two channels. It evaluates both channels and there are four possible scenarios:</p>
<ol>
<li>c1 is ready to give a message, but c2 is not. The message from c1 is read into the variable v1 and the code clause for that first case is executed.</li>
<li>c2 is ready to give a message, but c1 is not. v2 then is assigned to the value read from c2 and its code clause is executed.</li>
<li>Both c1 and c2 are ready to give a message. One of them is randomly chosen to execute and the other does not execute. Note this means that you cannot depend on the order your clauses will be executed in.</li>
<li>Neither c1 nor c2 are ready to give a message. The select will block until the first one is ready, at which point it will be read from the channel and execute the corresponding code clause.</li>
</ol>
<p>Select statements can also have a default clause to make them non-blocking:</p>
<div class="codeblock">
<pre>
<span class="keyword">select</span> {
<span class="keyword">case</span> v1 := <-c1:
fmt.<span class="function-name">Printf</span>(<span class="string">"received %v from c1\n"</span>, v1)
<span class="keyword">case</span> v2 := <-c2:
fmt.<span class="function-name">Printf</span>(<span class="string">"received %v from c2\n"</span>, v2)
<span class="keyword">default</span>:
fmt.<span class="function-name">Println</span>(<span class="string">"no channel was ready to communicate"</span>)
}
</pre>
</div>
<p>If neither channel is ready, the select immediately executes the default clause rather than blocking.</p>
<p>Finally, select statements can also have a timeout:</p>
<div class="codeblock">
<pre>
<span class="keyword">for</span> {
<span class="keyword">select</span> {
<span class="keyword">case</span> v1 := <-c1:
fmt.<span class="function-name">Printf</span>(<span class="string">"received %v from c1\n"</span>, v1)
<span class="keyword">case</span> v2 := <-c2:
fmt.<span class="function-name">Printf</span>(<span class="string">"received %v from c2\n"</span>, v2)
<span class="keyword">case</span> <-time.<span class="function-name">After</span>(1 * time.Second):
fmt.<span class="function-name">Println</span>(<span class="string">"You're too slow!"</span>)
}
}
</pre>
</div>
<p>In this example, the select is wrapped in an infinite loop, which will stop the first time any one round takes longer than 1 second to read from either channel. But we can also set a timeout on the loop as a whole:</p>
<div class="codeblock">
<pre>
timeout := time.<span class="function-name">After</span>(1 * time.Second)
<span class="keyword">for</span> {
<span class="keyword">select</span> {
<span class="keyword">case</span> v1 := <-c1:
fmt.<span class="function-name">Printf</span>(<span class="string">"received %v from c1\n"</span>, v1)
<span class="keyword">case</span> v2 := <-c2:
fmt.<span class="function-name">Printf</span>(<span class="string">"received %v from c2\n"</span>, v2)
<span class="keyword">case</span> <-timeout:
fmt.<span class="function-name">Println</span>(<span class="string">"Time's up!"</span>)
<span class="keyword">return</span>
}
}
</pre>
</div>
<p>Now the loop will always cease after 1 second and in that one second it will read as many times as possible from either channel.</p>
<p>Here is an example of using selects with timeouts in a Go program:</p>
<script src="https://gist.github.com/4439889.js"></script>
<p><br /></p>
<h4><code>/* ---[ Implementing select in Clojure ]--- */</code></h4>
<p>Let's evaluate some of the ways we could emulate or implement the behavior of select in Clojure. While Go does have closures, treats functions as first class entities and deemphasizes object-orientation and inheritance, Go is not a functional language. So how should something like <code class="so">select</code> be done in Clojure? What is the essence of what it accomplishes?</p>
<p>Let's first turn to the <a href="http://racket-lang.org/">Racket language</a>, a Lisp descended from Scheme, which has events built into the language. I am not deeply knowledgeable about Racket, but from the research I've done, the <a href="https://cxwangyi.wordpress.com/2012/07/22/why-go-use-racket/">analog of select in Racket is sync</a>. The <a href="http://docs.racket-lang.org/reference/sync.html#(def._((quote._~23~25kernel)._sync))">sync function</a> takes one or more "synchronizable events" and blocks until the first one is ready, returning that result:</p>
<div class="codeblock">
<pre>
(<span class="keyword">let</span> ((msg (sync evt1 evt2 evt3)))
<span class="comment-delimiter">;; </span><span class="comment">do something with the first message result here
</span> )
</pre>
</div>
<p>As with Go's <code class="so">select</code>, Racket's <code class="so">sync</code> will choose to read from one of the events at random if more than one is ready.</p>
<p>Notice that the Racket version does <strong>not</strong> take a code block to execute for each event. In functional programming, it is preferable and more natural to return a value from an operation. Go's select is truly a control structure (in the C language sense of the word) - it does not return a value.</p>
<p>So let's implement Racket's sync in Clojure.</p>
<p>In order to implement select/sync in Clojure using the Java queue classes we used in the previous blog entry, we need to be able to check whether more than one of the queues has a value ready without blocking. That is why I chose the TransferQueue over the SynchronousQueue.</p>
<p>Next we have to decide what to name it. <code class="so">sync</code> is already taken in clojure.core and has a specific and important enough meaning in Clojure that it is best avoided. <code class="so">select</code> is also used -- it is a function in the clojure.set namespace -- but since it is not in clojure.core, I'll go with it in my go-lightly namespace.</p>
<p>My initial implementation is a simple one - it checks all the channels to see if any are ready and, if not, does a short sleep. To do the check, it uses the <code class="so">.peek</code> method of TransferQueue, since it neither blocks nor throws an exception if the queue is empty. </p>
<script src="https://gist.github.com/4449661.js"></script>
<p>You pass <code class="so">select</code> one or more channels and it immediately filters for those that already have a ready value. If there are any it picks one of those ready ones at random, dequeues the value and returns it. Only the one value is dequeued, so the other channels remain untouched.</p>
<p>If none are ready, it will "probe" the channels between short sleeps to get the first value it can find. This is an unsophisticated implementation, but it works for simple uses. (I'll provide a usage example after we add timeouts and "defaults" next.)</p>
<p><br /></p>
<h4><code>/* ---[ Adding "default" and timeouts to Clojure select ]--- */</code></h4>
<p>The <code class="so">default</code> clause in Go's select statement is a short circuit to not block if no channels are ready. Since Clojure's select is not a control structure, the most natural choice is to add another function, which I've called <code class="so">select-nowait</code>.</p>
<p>As before, it takes one or more channels (as a varargs list) and an optional sentinel keyword value. If no channels are ready, <code class="so">select-nowait</code> will return the sentinel keyword (if provided) or nil.</p>
<div class="codeblock">
<pre>
user=> (select-nowait ch1 ch2 ch3 <span class="constant">:bupkis</span>)
<span class="constant">:bupkis</span>
</pre>
</div>
<p>For timeouts, the Go example above shows that they come in two flavors: a timeout per round (timer starts each time you call select) or a timeout for a "conversation" that could involve multiple rounds of selecting the next value.</p>
<p>Let's take these one at a time, as they will have different solutions in my implementation. For a timeout-per-select call, I've created a <code class="so">select-timeout</code> function that takes a timeout (in milliseconds) as the first argument.</p>
<div class="codeblock">
<pre>
<span class="comment-delimiter">;; </span><span class="comment">returns a value from one of the channels if it can
</span> <span class="comment-delimiter">;; </span><span class="comment">be read within 1 sec. Otherwise it times out and
</span> <span class="comment-delimiter">;; </span><span class="comment">returns :go-lightly/timeout
</span> user=> (select-timeout 1000 ch1 ch2 ch3)
<span class="constant">:go-lightly/timeout</span>
</pre>
</div>
<p>For an overall timeout, I provide two options.</p>
<p>First, following the pattern in <a href="https://gist.github.com/3146759#file-clojure-channels-3-select-clj">Alexey Kachayev's example</a> of doing this with the lamina library, we build a channel that will receive a timeout sentinel value once the timer goes off. Use the go-lightly <code class="so">timeout-channel</code> factory fn and then pass that timeout channel to the select function.</p>
<p>In order for the timeout-channel to be effective, you have to be continuously calling <code class="so">select</code> until you hit the timeout. Also, the current implementation of select doesn't preferentially check the timeout channel first and select it over other channels when it is ready, but I'll be fixing that later in the series.</p>
<p>You can also pass a timeout-channel into <code class="so">select-timeout</code> if you want both types of timers running.</p>
<p>Second, I've added a general purpose <code class="so">with-timeout</code> macro to the go-lightly.core library that wraps any arbitrary set of statements in a timeout.</p>
<p>Go <a href="https://github.com/midpeter444/go-lightly/blob/master/go-lightly/src/thornydev/go_lightly.clj">here</a> if you want to see the full implementation of these timeout methods.</p>
<p>All of these options are shown in this Clojure go-lightly example implementation of the Go "boring" select example:</p>
<script src="https://gist.github.com/4448219.js"></script>
<p><em>Note</em>: the channels here are no longer raw LinkedTransferQueues - they are go-lightly GoChannel type entities. See the <a href="https://github.com/midpeter444/go-lightly/wiki">go-lightly wiki</a> for a detailed explanation. </p>
<p><br /></p>
<h4><code>/* ---[ Emulating Go's select in lamina ]--- */</code></h4>
<p>lamina's analog to select is its <code class="so">join</code> operation, which basically routes the output of multiple lamina channels into a single channel:</p>
<div class="codeblock">
<pre>
user=> (<span class="variable-name">use</span> 'lamina.core)
nil
user=> (<span class="keyword">def</span> <span class="function-name">ch1</span> (channel))
#'user/ch1
user=> (<span class="keyword">def</span> <span class="function-name">ch2</span> (channel))
#'user/ch2
user=> (<span class="keyword">def</span> <span class="function-name">ch3</span> (channel))
#'user/ch3
user=> (<span class="type">join</span> ch1 ch3)
true
user=> (<span class="type">join</span> ch2 ch3)
true
user=> [ch1 ch2 ch3]
[<== [ … ] <== [ … ] <== […]]
user=> (enqueue ch1 <span class="constant">:one</span>)
<span class="constant">:lamina/enqueued</span>
user=> (enqueue ch2 <span class="constant">:two</span> <span class="constant">:three</span>)
<span class="constant">:lamina/enqueued</span>
user=> [ch1 ch2 ch3]
[<== [ … ] <== [ … ] <== [<span class="constant">:one</span> <span class="constant">:two</span> <span class="constant">:three</span> …]]
</pre>
</div>
<p>You can then read from the downstream channel:</p>
<div class="codeblock">
<pre>
user=> @(read-channel ch3)
<span class="constant">:one</span>
user=> @(read-channel ch3)
<span class="constant">:two</span>
</pre>
</div>
<p>To create a whole-conversation timeout, you can call the <code class="so">periodically</code> fn that invokes your fn every 'period' milliseconds and returns the value. This was the inspiration for go-lightly's <code class="so">timeout-channel</code>.</p>
<p>To create a per-round timeout, you can use either the <code class="so">read-channel*</code> macro or the <code class="so">channel->lazy-seq</code> function, both of which take a per-read timeout.</p>
<p>This program demonstrates these options (and a few others) using lamina (with some helper functions from go-lightly):</p>
<script src="https://gist.github.com/4463174.js"></script>
<p><br /></p>
<h4><code>/* ---[ Implementing Go's select in Clojure ]--- */</code></h4>
<p>So we can provide Racket's <code class="so">sync</code> functionality in Clojure either by implementing it ourselves or using lamina, but it is not as powerful as Go's select. What if you need to know not only the next value on the channels, but which channel it was read from? In that case, providing a function to execute per channel is a nice model. But to be more or less functional, the select statement still needs to return a value.</p>
<p>Let's hit an important point here: as I quoted at the start of this post, Pike has said that "it's hard to do control structures that depend on libraries". This is true in some languages, but not all - especially not in Lisps. You can build control structures with macros, or sometimes just with functions, and this is one of the key advantages of Lisp languages.</p>
<p>In the go-lightly library, I've implemented this as <code class="so">selectf</code> and it turns out I didn't need a macro.</p>
<p>Here's an example of using go-lightly's <code class="so">selectf</code> from the <a href="https://github.com/midpeter444/go-lightly/blob/master/go-lightly-examples/clj-examples/src/thornydev/go_lightly/sleeping_barber/barber.clj">sleeping-barbers example app</a> in the go-lightly-examples project:</p>
<div class="codeblock">
<pre>
(<span class="keyword">defn</span> <span class="function-name">barber-shop</span> [clients-ch]
(<span class="builtin">let</span> [barber-ch (channel)]
(<span class="builtin">loop</span> [shop-state {<span class="constant">:free-barbers</span> (init-barber-vector)
<span class="constant">:waiting-clients</span> []}]
(<span class="builtin">-></span> (selectf
clients-ch #(client-walked-in % barber-ch shop-state)
barber-ch #(barber-available % barber-ch shop-state))
(<span class="builtin">recur</span>)))))
</pre>
</div>
<p><code class="so">selectf</code> takes pairs of arguments where the first member of the pair is a channel (or the <code class="so">:default</code> keyword) and the second member of the pair is a function that takes one argument - the value read from that channel. (A function paired with <code class="so">:default</code> takes no arguments.)</p>
<p>The return value of <code class="so">selectf</code> is whatever the fn you provide returns. In the example above, I pass this value to the <code class="so">recur</code> form so that I can reset the <code class="so">shop-state</code> local var without having to use an atom to manage state changes.</p>
<p>And here is the implementation of <code class="so">selectf</code>:</p>
<div class="codeblock">
<pre>
(<span class="keyword">defn</span> <span class="function-name">selectf</span>
<span class="doc">"Control structure variable arity fn. Must be an even number of arguments where
the first is either a GoChannel to read from or the keyword :default. The second
arg is a function to call if the channel is read from. Handler fns paired with
channels should accept one argument - the value read from the channel. The
handler function paired with :default takes no args. If no :default clause is
provided, it blocks until a value is read from a channel (which could include
a TimeoutChannel). Returns the value returned by the handler fn."</span>
[& args]
(<span class="builtin">binding</span> [*choose-fn* choose-tuple]
(<span class="builtin">let</span> [chfnmap (<span class="variable-name">apply</span> array-map args)
[keywords chans] (partition-bifurcate
keyword?
(<span class="variable-name">reduce</span> #(<span class="variable-name">conj</span> % %2) [] (<span class="variable-name">keys</span> chfnmap)))
choice (doselect chans nil (<span class="variable-name">first</span> keywords))]
<span class="comment-delimiter">;; </span><span class="comment">invoke the associated fn
</span> (<span class="builtin">if</span> choice
((chfnmap (<span class="variable-name">nth</span> choice 0)) (<span class="variable-name">nth</span> choice 1))
((chfnmap (<span class="variable-name">first</span> keywords)))))))
</pre>
</div>
<p>I won't give a full explanation of this implementation and all its helper functions, but notice this piece:</p>
<div class="codeblock">
<pre>
(<span class="builtin">let</span> [chfnmap (<span class="variable-name">apply</span> array-map args)
...
])
</pre>
</div>
<p>That's all that is required to turn the argument pairs into a control structure. It creates a map of channels to fns and once you have a map in Clojure, programming is straightforward.</p>
<p><br /></p>
<h4><code>/* ---[ Next ]--- */</code></h4>
<p>In the next entry we'll implement some more interesting CSP examples in Go and Clojure and think about the pros and cons of using lamina vs. go-lightly.</p>
<p><br /></p>
<h4><code>/* ---[ Resources ]--- */</code></h4>
<p>All of the code in this blog series, including the Go and lamina example code, is in the <a href="https://github.com/midpeter444/go-lightly">go-lightly project</a> on GitHub.</p>
<p>Lamina library: <a href="https://github.com/ztellman/lamina">https://github.com/ztellman/lamina</a></p>
<p>The Go examples are from Rob Pike's talk <a href="http://www.youtube.com/watch?v=f6kdp27TYZs&feature=youtu.be">Google I/O 2012 - Go Concurrency Patterns</a></p>
<p><a href="https://github.com/kachayev">Alexey Kachayev</a> wrote down the Go code that Pike used in the 2012 Google IO presentation, which otherwise doesn't seem to have been made available. Alexey published them as gists. They won't compile out of the box, so I've been modifying them, but wanted to link to his gists: <a href='https://gist.github.com/3124594'>https://gist.github.com/3124594</a>.</p>
<p>Alexey also then brainstormed on ways to implement these examples in Clojure using the lamina library. Those gists are at: <a href='https://gist.github.com/3146759'>https://gist.github.com/3146759</a></p>
<p><strong>Links to this blog series:</strong></p>
<ul>
<li><a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure.html">Go Concurrency Constructs in Clojure, part 1: introduction</a></li>
<li><a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure3.html">Go Concurrency Constructs in Clojure, part 3: why go-lightly?</a></li>
<li><a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure4.html">Go Concurrency Constructs in Clojure, part 4: idioms and tradeoffs</a></li>
</ul>
<p><strong>Go Concurrency Constructs in Clojure, part 1</strong> (2013-01-01)</p>
<blockquote class="bordered">
If you look at the programming languages of today, you'd probably get this idea that the world is object-oriented, but it's not. It's actually parallel.... There's all these things that are happening simultaneously in the world and yet the computing tools we have are really not very good at expressing that kind of world view. And that seems like a failing.
<br /><br />
--Rob Pike, 2012 Heroku Waza conference
</blockquote>
<p><br /></p>
<h4><code>/* ---[ Go ]--- */</code></h4>
<p>The Go language recently turned <a href="http://blog.golang.org/2012/11/go-turns-three.html">3 years old</a>, so it is about 2 years Clojure's junior. I have only started investigating Go, and one of the things that has captured my attention is its primitives for concurrency. Rob Pike, the leading spokesman for Go and one of its co-creators, has done a number of interesting talks on Go concurrency patterns and how they are built into the language. If you haven't watched them, here are two that I recommend:</p>
<ul>
<li><a href="http://www.youtube.com/watch?v=f6kdp27TYZs&feature=youtu.be">Google I/O 2012 - Go Concurrency Patterns</a></li>
<li><a href="http://vimeo.com/49718712">Concurrency is not Parallelism</a> (<a href="https://rspace.googlecode.com/hg/slide/concur.html#landing-slide">slides here</a>)</li>
</ul>
<p>I'll be referring to the first one through this post.</p>
<p>Pike talks a lot about how Go routines and channels are first class entities in the language, with simple syntax and keywords baked in. Go routines are akin to threads that you kick off to run in "the background". Pike's analogy is to think of them like launching a process on the command line with the ampersand. Staying with the Unix analogy, if you launch a process in the background and then need to communicate with it, what do you do? In Unix/Linux you have a number of options, such as a socket, a pipe or some other form of IPC or to use shared memory.</p>
<p>The Go language creators have chosen the "channel" as a core "inter-routine" communication mechanism. Quoting Pike: "The Go model is not to communicate by sharing memory, but to share memory by communicating".</p>
<p>What is called "inter-routine" here would be called inter-process in Erlang or traditional C/Unix programming, or "inter-thread" communication in languages like Java. But for Go it is more appropriate to say "inter-routine". Pike emphasizes that go routines are lighter weight than threads. Go routines can be multiplexed onto multiple running threads over their lifetime, avoiding thread starvation issues. They have their own stack, but I believe it is managed on the heap (need to research this more).</p>
<p>The roots of the Go routine and Go channel start in Tony Hoare's <a href="https://en.wikipedia.org/wiki/Communicating_sequential_processes">Communicating Sequential Processes</a> paper (now <a href="http://www.usingcsp.com/cspbook.pdf">book</a>). CSP addresses concurrency interaction patterns - how separate processes (in the Unix or Erlang sense), threads or routines communicate and coordinate with each other via <strong>message passing</strong>. We want constructs that reduce the complexity of inter-process/inter-thread communication using primitives that are easy to use and reason about. This means not having to be a deep expert in a system's memory model in order to do concurrent programming. Instead, it hides semaphores, mutexes, barriers and other low level concurrency constructs in higher-level abstractions.</p>
<p><br /></p>
<h4><code>/* ---[ Go-style CSP in Clojure? ]--- */</code></h4>
<p>My primary interest here is in what support for Go-like CSP patterns exist, or can be made to exist, in Clojure. Clojure, after all, promises to bring sanity to concurrent programming by means of efficient immutable data structures and software transaction memory for mutable state.</p>
<p>Go channels come in two forms: synchronous blocking channels that cannot hold multiple entries and non-synchronous buffered channels that can have multiple entries. I'll explain the nuances of <em>synchronous</em> here in a moment.</p>
<p>The <a href="http://golang.org/ref/spec">Go spec</a> says: <em>A channel provides a mechanism for two concurrently executing functions to synchronize execution and communicate by passing a value of a specified element type</em>.</p>
<p>Does Clojure have this? There are two things in Clojure or Java we can potentially use to emulate Go channels:</p>
<ul>
<li>Java concurrency queues. In particular, SynchronousQueue, BlockingQueue and TransferQueue.</li>
<li>Zach Tellman's Clojure <a href="https://github.com/ztellman/lamina">lamina library</a>, whose primary focus is asynchronous event-based programming.</li>
</ul>
<p>In this blog series, I will introduce the <a href="https://github.com/midpeter444/go-lightly">go-lightly Clojure library</a> I built to have Go concurrency constructs. The GitHub repo for it also has many examples, some of which use Java concurrent queues directly, some of which use the lamina library and many of which use the go-lightly library.</p>
<p><strong>Note:</strong> I also ran across the <a href="http://www.cs.kent.ac.uk/projects/ofa/jcsp/">Java CSP (JCSP) project</a>, which I haven't investigated yet, but might be something to build a Clojure library around.</p>
<p><br /></p>
<h4><code>/* ---[ "boring" basics ]--- */</code></h4>
<p>In his initial examples to show how Go channels work, Pike uses a background process that is, as he calls it, "boring" - it just prints its name or the word "boring" at random intervals. After listening for a while, the "main" process gets tired and wants to leave or end the conversation.</p>
<p>Here are the first two examples in Go from his 2012 Google IO talk, modified slightly to work from one main function:</p>
<script src="https://gist.github.com/4428907.js"></script>
<p>Here is the output from each option (single vs. multiple):</p>
<div class="precode"><pre><code>$ ./boring-generators single
You say: "boring! 0"
You say: "boring! 1"
You say: "boring! 2"
You say: "boring! 3"
You say: "boring! 4"
You're boring: I'm leaving.
$ ./boring-generators multiples
Joe 0
Ann 0
Joe 1
Ann 1
Joe 2
Ann 2
Joe 3
Ann 3
Joe 4
Ann 4
Joe 5
Ann 5
Joe 6
Ann 6
Joe 7
Ann 7
Joe 8
Ann 8
Joe 9
Ann 9
You're boring: I'm leaving.
</code></pre></div>
<hr />
<p>See <a href="#appendix1">appendix 1</a> if you would like to compile and run this on your system. I'll frequently list the outputs during this blog series, but it helps to see the latency between print statements to get a better feel for how these examples work.</p>
<p>See <a href="#appendix2">appendix 2</a> for links my GitHub project with this code and to other gists with related code examples.</p>
<hr />
<p>Pike calls this the generator pattern because the invoked "boring" function creates a (synchronous) channel, then creates a closure that references that channel, launches that closure as a go routine and immediately returns the channel to the calling function.</p>
<p>The go routine <strong>sends</strong> messages to the channel with the <code class="so"><-</code> operator. <br />For example, <code class="so">c <- "hello kitty"</code> sends the feline greeting into the channel.</p>
<p>The other function <strong>receives</strong> messages from the channel with the same operator, by putting the channel on the right side of the operator: <code class="so"><- c</code>.</p>
<p>These are blocking operations with a synchronous channel. If the sender sends a message to the channel when there is no receiver waiting, it will block until a receiver comes and grabs its message off the channel. And, conversely, a receiver will block waiting for a sender if there isn't already one pushing to the channel.</p>
<p>In the <code class="so">multipleGenerators</code> version, two go routines are created, each with its own channel. The receiving "main" function now receives from each channel consecutively in a defined order, which is why you always get Joe's output first, then Ann's.</p>
<p>One final thing to note: we don't explicitly do any work to shutdown the go routines. Go routines will automatically stop operation when the <code class="so">main</code> function exits. Thus, go routines are like daemon threads in the JVM (well, except for the "thread" part ...)</p>
<p><br /></p>
<h4><code>/* ---[ "boring" in Clojure ]--- */</code></h4>
<p>Java has two analogs of a Go synchronous channel: the <a href="http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/SynchronousQueue.html">SynchronousQueue</a> and the <a href="http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/TransferQueue.html">TransferQueue</a>.</p>
<p>The <strong>SynchronousQueue</strong> has the same specs I mentioned above: it allows one sender at a time that will block waiting for a receiver, and one receiver at a time that will block waiting for a sender. The "queue", despite its name, has no internal storage. Its <code class="so">size</code> method always returns 0 and its <code class="so">peek</code> method always returns null. If you use just the <code class="so">put</code> method to send and the <code class="so">take</code> method to receive, it behaves like the Go channel in the first example.</p>
<p>The <strong>TransferQueue</strong> is more liberal - you can use it like a SynchronousQueue, or more like a BlockingQueue where you can put multiple messages on the queue and only block if you try to <code class="so">put</code> onto a full queue (if it is bounded) or <code class="so">take</code> from an empty queue. To use it like a SynchronousQueue, use <code class="so">transfer</code> and <code class="so">take</code>. The API is rich enough that you can find other uses as well, such as the non-blocking <code class="so">offer</code> and <code class="so">poll</code> methods to send and receive without blocking.</p>
<p>In the go-lightly code base examples, I use both, but here I'm going to stick with the TransferQueue for reasons that will become clear when we get to the Go <code class="so">select</code> statement in the next blog entry.</p>
<p>OK, so the channel will be a Java TransferQueue. How do we implement a go routine in Clojure?</p>
<p>Clojure's <code class="so">future</code> function is similar to a go routine in that it launches a new "routine" (Java thread) with its own stack and flow of execution. The interface is similar too: in Go you give an invoked function to the go statement. You do the same to the Clojure future function.</p>
<div class="precode"><pre><code>// launch a go routine in Go
go func() { fmt.Println ("hello kitty") }()
;; launch a thread that returns a future in Clojure
(future (println "sayonara"))
</code></pre></div>
<p>But Clojure futures differ from a go routine in at least two important ways:</p>
<ol>
<li><p>A Clojure future launches a new thread (or rather obtains a thread from the Clojure thread agent pool), whereas, as mentioned before, a Go routine is lighter weight than a Java thread. I don't know of any Clojure or Java facilities for creating something equally lightweight, so I will use threads.</p></li>
<li><p>Clojure futures are <strong>not</strong> daemon threads and I don't think there is any way to tell them to be daemon threads. Thus, you cannot give a future an infinite loop and expect your program to close down. If you launch such a future, you would have to either:</p>
<ol><li>Call <code class="so">(future-cancel myfut)</code> on the future, which means you have to retain a reference to the future. You can't just fork-and-forget.</li>
<li>Set a flag, either through shared memory or a message via a channel, that the future will check periodically to see if it should stop. However, if this future is stuck in a blocking call trying to read from or push onto the blocking queue, this approach won't work.</li></ol></li>
</ol>
<p>In addition, you also need to call <code class="so">(shutdown-agents)</code> at the end of your Clojure program to shut down the agent/future Thread pool.</p>
<p>Thus, Clojure's <code class="so">future</code> requires more bookkeeping than Go's <code class="so">go</code>. </p>
<p>But Clojure has macros to help, so I've written two macros and some helper functions to lessen the bookkeeping.</p>
<p>The first macro is simply called <code class="so">go</code> and has accompanying <code class="so">stop</code> and <code class="so">shutdown</code> functions. The second macro, <code class="so">go&</code>, is meant for fork-and-forget use so I give it the Unix ampersand in the name. It has the least amount of ceremony, but cannot be controlled through the Java Executor framework like the future and agents threads can be.</p>
<script src="https://gist.github.com/4430718.js"></script>
<p>And here's my implementation of the boring generator using these macros. To match the Go terminology, I also define a function <code class="so">channel</code> that simply returns a LinkedTransferQueue.</p>
<script src="https://gist.github.com/4430997.js"></script>
<p>While the <code class="so">go&</code> macro is easier to use and more like the actual Go go-routine launcher, it has one downside: it is not REPL-friendly if your go routine doesn't shut down naturally. In this case it will block on the <code class="so">(.transfer ch)</code> call on line 39 and hang around in memory. And each time you invoke the function in the REPL it creates another daemon thread, draining JVM resources and memory over time. </p>
<p>Since Go language development is not REPL-based, they can get away with it. If you create a main function:</p>
<div class="precode"><pre><code>(defn -main [& args]
(thornydev.go-lightly.boring.generator-tq/multiple-generator&))
</code></pre></div>
<p>and run it from the command line:</p>
<div class="precode"><pre><code>lein run
</code></pre></div>
<p>it will behave just like the Go version. You'll get the same output as the Go example above and it will shutdown gracefully instead of hanging.</p>
<p>But to be REPL friendly, I'll typically use <code class="so">go</code> and <code class="so">stop</code> rather than <code class="so">go&</code>.</p>
<p>Another way to handle this in Go would be to close the channel and have the go-routine check whether the channel is closed before continuing.</p>
<p>Unfortunately, this is not possible with the Java concurrency classes mentioned above - they are not Closeable resources.</p>
<p>However, the go-lightly library and the Clojure lamina libraries both provide the notion of a closeable channel, which leads to our next implementation.</p>
<p><br /></p>
<h4><code>/* ---[ A "boring" lamina implementation ]--- */</code></h4>
<p>The <a href="https://github.com/ztellman/lamina">lamina</a> library created by Zach Tellman provides constructs for handling evented asynchronous programming. It defines a channel whose purpose is to be a event-driven data structure that represents a stream of messages or events. A lamina channel has basic send (<code class="so">enqueue</code>) and receive (<code class="so">receive</code> or <code class="so">read-channel</code>) functionality, but also is composable with other channels using classic functional programming constructs, such as map, reduce and filter. It also provides very useful fork/join operators.</p>
<p>For the "boring" example, we only need the ability to enqueue, read, and close the channel and check whether the channel is closed.</p>
<script src="https://gist.github.com/4431580.js"></script>
<p>Since the "boring" go routine stops when the channel is closed, this use of the <code class="so">go&</code> macro is REPL-friendly and works as expected when run standalone from the command line.</p>
<p><br /></p>
<h4><code>/* ---[ More macro goodness ]--- */</code></h4>
<p>Finally, let's add one more macro to clean up that last example. If you've done much Clojure or Lisp programming, you know that a common macro pattern is a <code class="so">with-xxx</code> binding macro that cleans up resources for you.</p>
<p>Clojure in fact has a <code class="so">with-open</code> binding macro that will call "close" on all the things specified in the bindings. So that should work here, right? Well, the devil being in the details, it doesn't. <code class="so">with-open</code> actually calls the Java Closeable interface method <code class="so">.close</code>, not "close". And lamina channels do not implement Java's Closeable interface - they do not have a <code class="so">.close</code>. So I grabbed the source for clojure.core/with-open and made my own (and put it in the go-lightly.util namespace):</p>
<div class="precode"><pre><code>(defmacro with-channel-open
  "bindings => [name init ...]
  Evaluates body in a try expression with names bound to the values
  of the inits, and a finally clause that calls (close name) on each
  name in reverse order."
  [bindings & body]
  (assert (vector? bindings) "a vector for its binding")
  (assert (even? (count bindings)) "an even number of forms in binding vector")
  (cond
    (= (count bindings) 0) `(do ~@body)
    (symbol? (bindings 0)) `(let ~(subvec bindings 0 2)
                              (try
                                (with-channel-open ~(subvec bindings 2) ~@body)
                                (finally
                                  (~'close ~(bindings 0)))))
    :else (throw (IllegalArgumentException.
                  "with-channel-open only allows Symbols in bindings"))))
</code></pre></div>
<p>So with that in place we can revise the "boring" lamina implementation:</p>
<script src="https://gist.github.com/4431922.js"></script>
<p><br /></p>
<h4><code>/* ---[ Looking ahead ]--- */</code></h4>
<p>In the <a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure2.html">next installment</a> we'll dig into the Go <code class="so">select</code> statement, and further evaluate how to use the Java concurrency queues and the lamina libraries as Go channels.</p>
<p><a name="appendix1"></a><br /><br /></p>
<h4><code>/* ---[ Appendix 1: Compile and run Go examples ]--- */</code></h4>
<p>If you don't have Go installed, see the <a href="http://golang.org/doc/install">golang install guide</a>. On Ubuntu, it is as simple as:</p>
<div class="precode"><pre><code>sudo apt-get install golang-go
</code></pre></div>
<p>Next, decide where you want your go projects to live (mine are in <code class="so">$HOME/lang/go/projects</code>). cd to that directory and do:</p>
<div class="precode"><pre><code>$ export GOPATH=$HOME/lang/go/projects # change to yours
$ mkdir src
$ mkdir src/boring-generators # your Go project name here
$ cd src/boring-generators
$ emacs boring-generators.go # or whatever editor you like
$ go build # this invokes the compiler
</code></pre></div>
<p>You will then have a <code class="so">boring-generators</code> executable that you can run:</p>
<div class="precode"><pre><code>$ ./boring-generators
</code></pre></div>
<p>(<em>Note:</em> I didn't find that setting GOPATH was really necessary, but it is in the instructions, so YMMV).</p>
<p><a name="appendix2"></a><br /><br /></p>
<h4><code>/* ---[ Appendix 2: Resources ]--- */</code></h4>
<p><a href="https://github.com/kachayev">Alexey Kachayev</a> wrote down the Go code that Pike used in the 2012 Google I/O presentation, since it doesn't seem to have been made available elsewhere. Alexey published it as gists. The examples won't compile out of the box, so I've been modifying them, but I wanted to link to his originals: <a href='https://gist.github.com/3124594'>https://gist.github.com/3124594</a>.</p>
<p>Alexey also then brainstormed on ways to implement these examples in Clojure using the lamina library. Those gists are at: <a href='https://gist.github.com/3146759'>https://gist.github.com/3146759</a></p>
<p>My code for working through these ideas is in my <a href="https://github.com/midpeter444/go-lightly">go-lightly project</a> on GitHub.</p>
<p>Links to this blog series:</p>
<ul>
<li><a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure2.html">Go Concurrency Constructs in Clojure, part 2: select</a></li>
<li><a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure3.html">Go Concurrency Constructs in Clojure, part 3: why go-lightly?</a></li>
<li><a href="http://thornydev.blogspot.com/2013/01/go-concurrency-constructs-in-clojure4.html">Go Concurrency Constructs in Clojure, part 4: idioms and tradeoffs</a></li>
</ul>
tag:blogger.com,1999:blog-2977002084155733470.post-3754912510030145530 2012-12-26T20:02:00.000-05:00 (updated 2012-12-26T20:36:15.464-05:00) My CS/Programming Top 10 for 2012<p>As many will do and I did <a href="http://thornydev.blogspot.com/2011/12/my-csprogramming-top-10-for-2011.html">last year</a>, I looked through my notes, <a href="https://github.com/midpeter444">projects</a>, <a href="https://twitter.com/midpeter444">tweets</a>, <a href="http://thornydev.blogspot.com/">blog entries</a> and <a href="https://dl.dropbox.com/u/23673690/howto-wiki.html">personal wiki</a> to assemble the highlights of my year in all things computer science and software engineering.</p>
<p>These are the top 10 "things" that added the most to my knowledge, impressed me as excellent tools, and added to the joy of being a software developer. Here's my list in no particular order:</p>
<ol>
<li><a href="#top1">Security Now podcast</a></li>
<li><a href="#top2">Chrome Dev Tools</a></li>
<li><a href="#top3">Emacs 24</a></li>
<li><a href="#top4">Groovy</a></li>
<li><a href="#top5">Clojure</a></li>
<li><a href="#top6">z (rupa/z)</a></li>
<li><a href="#top7">Xubuntu</a></li>
<li><a href="#top8">Coursera</a></li>
<li><a href="#top9">Fiddler2 and Wireshark</a></li>
<li><a href="#top10">TrueCrypt and other encryption tools</a></li>
<li>One that should be on the list: <a href="#datomic">Datomic</a></li>
</ol>
<p><a name="top1"></a><br /><br /></p>
<h4><code>/* ---[ 1. Security Now podcast ]--- */</code></h4>
<p>In March I did <a href="http://thornydev.blogspot.com/2012/03/technical-podcasts-i-listen-to.html">a short blog entry</a> on the podcasts I was listening to or had heard of and wanted to try out. Good technical podcasts are a gold mine of information that you can use to fill the interstices of your day - while commuting, cleaning the kitchen or taking a walk.</p>
<p>The podcast that had the biggest impact on me in 2012 is <a href="http://twit.tv/show/security-now">Security Now</a>, done by Steve Gibson and Leo Laporte, one of the flagship podcasts of the <a href="http://twit.tv/">twit.tv network</a>.</p>
<p>To date, they have done 384 episodes, starting in August 2005 and the vast majority are still relevant and worth listening to. You can download them from <a href="https://www.grc.com/securitynow.htm">Steve's GRC Security Now website</a>.</p>
<p>The focus, of course, is on computer security, mostly for the individual user, not at the corporate level. The podcast also covers networking theory and practice in great detail, since the network is mainly how malware spreads and is a vast attack surface for it. For example, this year Steve did a <a href="http://twit.tv/show/security-now/343">deep dive into SPDY</a>, the <a href="https://en.wikipedia.org/wiki/SPDY">networking protocol</a> developed by Google to speed up the web, reducing page load time by overcoming the TCP slow-start problem.</p>
<p>And there's plenty of focus on cryptography and security. A highlight of the year was Steve's episode on <a href="http://twit.tv/show/security-now/374">Elliptic Curve Crypto</a>, a crypto technology that will likely be used more heavily in days to come.</p>
<p>In addition, you learn a lot about how hard drives work, since Steve wrote <a href="https://www.grc.com/sr/spinrite.htm">SpinRite</a>, a disk maintenance and recovery utility, which I use for maintenance on my spinning disks.</p>
<p>Also, starting with episode #233 ("Let's Design a Computer, part 1"), Steve does an 8+ episode series on the basics of how computers work, including what machine language is, how assembly language works, the role of stacks and registers, hardware interrupts and RISC vs. CISC architectures. You can learn (or be refreshed on) this material in surprising levels of detail for an audio-only medium. Steve is very good at explaining this stuff.</p>
<p>This year, while keeping up with the weekly new broadcasts, I went back and started at episode 1. At this point I've listened to about half of the episodes, so this will continue to be my mainstay into 2013.</p>
<p><a name="top2"></a><br /><br /></p>
<h4><code>/* ---[ 2. Chrome Dev Tools ]--- */</code></h4>
<p>This year I got back into JavaScript programming. I remember the horrible days of debugging by alert statements, which contributed to the general consensus that JavaScript was a toy language and a piece of ill-thought-out crap. Despite its warts, the result of its ridiculously short allowed design period, Brendan Eich created a rather fascinating and powerful programming language. Even though we like to complain about its issues, <a href="http://yuiblog.com/crockford/">I agree with Crockford</a> that given the conditions under which it was developed, we got better than we deserved and Mr. Eich is to thank for that.</p>
<p>So I was pleased to discover the awesome Chrome Dev Tools for browser based JavaScript development. JavaScript debugging can actually be pleasurable. Some resources to get you started if you aren't using it:</p>
<ul>
<li><a href="https://developers.google.com/chrome-developer-tools/">Main Google site for it</a></li>
<li><a href="http://javascriptjabber.com/006-jsj-chrome-dev-tools-with-paul-irish/">JavaScript Jabber podcast</a></li>
<li><a href="https://developers.google.com/chrome-developer-tools/docs/videos">Videos!</a></li>
</ul>
<p><a name="top3"></a><br /><br /></p>
<h4><code>/* ---[ 3. Emacs24 ]--- */</code></h4>
<p>Emacs is alive and well. In fact it is thriving more than ever. I've been a long time user of emacs and I use it for everything except Java (which really needs a full IDE).</p>
<p><a href="https://www.gnu.org/software/emacs/">Emacs 24</a>, released this year, is a great text editor. I use it on both Linux and Windows, the exact <a href="https://github.com/midpeter444/emacs24-setup">same set up</a> on both.</p>
<p>Most notably, Emacs has package repositories now. Three, in fact, that I know of:</p>
<ul>
<li><a href="http://emacswiki.org/emacs/ELPA">ELPA</a>, which is <a href="http://elpa.gnu.org/">maintained/sponsored by GNU</a> and has only the core emacs packages that adhere to the copyleft licensing model of the Free Software Foundation.</li>
<li><a href="http://marmalade-repo.org/">Marmalade</a></li>
<li><a href="http://melpa.milkbox.net/">MELPA</a>, which is where most of the bleeding edge work goes.</li>
</ul>
<p>I use ELPA and MELPA by default, but I sometimes switch over to Marmalade if it has something not on the others. Generally MELPA and Marmalade seem to have the same packages, though MELPA often has the most recent. To make things confusing, MELPA moved to a date-based versioning system, like "20121206.1504", rather than the more traditional major.minor versioning system, such as "0.19".</p>
<p>There is still a big learning curve to emacs and some things are still pretty esoteric (I still have trouble getting themes to work), but when people ask me why I use emacs I say: "if programming is your main career and hobby wouldn't you want to use the most powerful tool available? It's worth the few months of learning to enjoy the benefits for the rest of your life." But isn't emacs "old"??? (as if that's a bad thing) Seriously, when I use emacs I feel like I'm tapping into some of the collective wisdom of our programming culture from the last 30 years.</p>
<p>And no disrespect to vim. I like vim too. Pick one of those two and learn it. Stop using Notepad++ or worse.</p>
<p>A few emacs highlights from my year:</p>
<ul>
<li><p>I love the <a href="https://github.com/clojure/tools.nrepl">nrepl</a> <a href="http://jkpl.lepovirta.org/how-do-i/clojure-nrepl-emacs.html">package</a> for Clojure. Now I can use those <a href="https://github.com/kingtim/nrepl.el#clojure-buffer-commands">fancy keystrokes</a> to auto-evaluate Clojure s-expressions. With the <a href="https://github.com/purcell/ac-nrepl">ac-nrepl</a> package, it has code completion and will show you the argument signature for functions in the minibuffer! Some IDE-like goodness right there.</p></li>
<li><p><a href="http://emacswiki.org/emacs/ParEdit">paredit</a>. When I talk to people about Clojure (or Lisp in general), I sometimes get the story of how horrible it was balancing parentheses at 3 in the morning the day their CS class assignment was due. I am happy to announce to anyone that doesn't know: <strong>that problem is solved</strong>. Its name is paredit. Here is <a href="http://www.slideshare.net/mudphone/paredit-preso">the slide deck</a> I originally learned it from.</p></li>
<li><p>Learn to use emacs macros in two ways:</p>
<ul><li>named macros you'll use a lot and save in your init.el (or macros.el if you want a separate file).</li>
<li>temporary unnamed macros to automate some one-time repetitive task, say, something you need to do 10 times in a file. This <a href="https://www.youtube.com/watch?v=dE2haYu0co8">EmacsRocks video</a> shows a great example of that.</li></ul></li>
</ul>
<p><a name="top4"></a><br /><br /></p>
<h4><code>/* ---[ 4. Groovy ]--- */</code></h4>
<p>When I was first learning Ruby, many years ago, I remember experiencing <a href="http://www.artima.com/intv/rubyP.html">Matz' principle of least surprise</a>. Once you learned the basic gist of Ruby and its blocks and classes, you could often just guess how to do something or what a method would be called and it would work. It was a very satisfying experience.</p>
<p>This year I joined a new company and they have largely adopted Groovy as their primary scripting language. As I jumped in to learn it, I had that deja vu feeling of learning Ruby, this time wrappering the Java language we know and love.</p>
<p>For example, I started to write a Groovy script that would have to recursively traverse a directory structure, and I remembered the pain of doing this in Java with its FilenameFilters and other APIs you had to learn to get anything done. I said to myself "I hope Groovy has made this easier". Holy smokes, they wrapped <code class="so">java.io.File</code> to have an <code class="so">eachFileRecurse</code> method that takes a closure:</p>
<div class="precode"><pre><code>new File('.').eachFileRecurse {
    if (it.name =~ /.*\.txt/) println it
}
</code></pre></div>
<p>There is also an <code class="so">eachDirRecurse</code> and variations where you can pass in a file type filter.</p>
<p>The more I learn about Groovy the more I like it. In fact, the "groovy-JDK" is one of my favorite things: <a href='http://groovy.codehaus.org/groovy-jdk/'>http://groovy.codehaus.org/groovy-jdk/</a>. The Groovy creators and contributors have wrapped a large number of the Java classes, using the Groovy metaclass concept, and given them additional useful methods. Such as:</p>
<ul>
<li><p>String now has an <code class="so">eachLine</code> method and versions of <code class="so">replaceAll</code> and <code class="so">replaceFirst</code> that take a closure, allowing arbitrarily complex logic to be executed to determine the replacement string.</p></li>
<li><p>Map now has an <code class="so">any</code> method that takes a predicate closure to see if at least one entry passes the predicate test. It also now has map and reduce, though the authors unfortunately followed Ruby in calling them <code class="so">collect</code> and <code class="so">inject</code> respectively.</p></li>
<li><p>And thank the gods, they wrappered the horrible <code class="so">java.util.Date</code> class and made it more useful.</p></li>
</ul>
<p>It provides many functional programming constructs, such as closures (the lambdas of Groovy), immutable data structures, higher order functions and very importantly: <strong>regex, list and map literals</strong>, akin to JavaScript or Clojure literals (though the map literal syntax is different in Groovy).</p>
<p>With GStrings you get string interpolation and multi-line strings. And Groovy gives you simpler syntax for accessing getters and setters - you grab them like properties.</p>
<p>In short when you are hacking out large swaths of boilerplate in Java, using tedious syntax to do stuff with Maps, Lists, Regular Expressions and a variety of other things, you constantly think to yourself, "man I wish I could be doing in this in Groovy". Groovy makes programming a pleasure.</p>
<p>I'm still learning it and look forward to using it for years to come.</p>
<p><a name="top5"></a><br /><br /></p>
<h4><code>/* ---[ 5. Clojure ]--- */</code></h4>
<p>And speaking of bringing the joy back to programming, Clojure is a combination of elegance, joy and ... <em>wait a minute, how do I do this in Clojure?</em> I ran across <a href="http://tech.puredanger.com/2012/02/05/clojure-values/comment-page-1/#comment-296111">someone</a> who described himself as a "perennial Clojure beginner". I can identify with that. Since I don't come from a Lisp or functional programming background, the last year learning Clojure has been like learning to ride a bicycle again. Except this bicycle is tricked out and has gears, knobs and restrictions that are different from the other bicycles.</p>
<p>I've started proselytizing co-workers about Clojure. I get the "why Clojure?" question a lot, so here is my version:</p>
<ul>
<li>Combines the best of Lisp, such as macros, and Java/JVM, such as its world class garbage collector (which a language built on immutable data structures needs)</li>
<li>Brilliant design for immutable data structures that is now being adopted by other languages (Scala for one)</li>
<li>Functional programming model, but with practical bent (not Haskell, but more pure than Common Lisp)</li>
<li>STM: software transaction memory -- brilliant solution to shared mutable state</li>
<li>Designed for concurrency (in a couple of different ways)</li>
<li>A fast dynamic language: faster than ruby and python, comparable to Java in many areas and can drop into Java easily when performance is the most important thing</li>
<li>ClojureScript: bring the power of Clojure macro writing, namespaces and better syntax to doing your JavaScript work</li>
<li>Data centric (like lisps), but even better by being abstraction centric</li>
<li>Clean design for solving the “expression problem”: <a href='http://www.ibm.com/developerworks/java/library/j-clojure-protocols/?ca=drs'>http://www.ibm.com/developerworks/java/library/j-clojure-protocols/?ca=drs</a></li>
<li>Separation of concerns – an overall philosophy to tease things apart into simple (non-complected) pieces: <a href='http://www.infoq.com/presentations/Simple-Made-Easy'>http://www.infoq.com/presentations/Simple-Made-Easy</a>
<ul><li>Example: polymorphism is not tied to inheritance</li></ul></li>
<li>Simple and elegant syntax. For example, I find Scala to be powerful but overwhelming and confusing in its approach to syntax and expression</li>
<li>Community:
<ul><li>Small focused libraries (separation of concerns, non-complected)</li>
<li>Datomic => one of the greatest examples of separation of concerns there is</li>
<li>Core.logic => modern logic programming easily integrated into your program</li></ul></li>
<li>Finally, an argument <em>ad hominem</em>: Rich Hickey. You need to watch the series of presentations he’s made over the past 5 years (perhaps one every week <a href="https://twitter.com/bodil/status/221322654492274688">as Bodil suggests</a>). Unquestionably the most impactful thinker in CS I’ve ever encountered. Even if you end up not agreeing with all of his views, you will learn a lot and think about things in a different way, possibly changing the way you think about our craft.</li>
</ul>
<p>Finally, as a coda to this paean to Clojure: The O'Reilly <a href="http://www.clojurebook.com/">Clojure Programming</a> book came out this year. Chas Emerick, Brian Carper and Christophe Grand have written a fantastic book. It is a book you will learn from and come back to for its insights, examples and reference material for many years. Definitely belongs on my top 10 for 2012 list.</p>
<p><a name="top6"></a><br /><br /></p>
<h4><code>/* ---[ 6. rupa/z ]--- */</code></h4>
<p>The z shell script (<em>not zsh</em>) is one of my favorite discoveries of 2012. To give it more press, I gave it its own blog entry, which you can read here: <a href="http://thornydev.blogspot.com/2012/12/an-irritation-no-longer.html">http://thornydev.blogspot.com/2012/12/an-irritation-no-longer.html</a>.</p>
<p>Here's the short summary: z is a 200-line shell script compatible with bash and z-shell that is a clever LRU-type cache of your directory visitations - the cache weighting is based on both frequency and recentness, which the author dubs "frecency". As you navigate around to different directories, it keeps track of where you've been, how often you've been there and how recently.</p>
<p>To navigate somewhere you've been, pass a part of the path to the z command and it will take you to the highest weighted directory in your cache.</p>
<p><a name="top7"></a><br /><br /></p>
<h4><code>/* ---[ 7. Xubuntu ]--- */</code></h4>
<p>I'm a Linux guy. I was on the Ubuntu bandwagon for many years. I played with Linux Mint a little. I've got Fedora and CentOS running in VirtualBox VMs. But when Unity came out on Ubuntu, I struggled to get used to its desktop model. It does not fit how I work. I tried it for a month and was considering what to switch to when I saw a <a href="http://linux.slashdot.org/story/11/08/04/0115232/linus-torvalds-ditches-gnome-3-for-xfce">Slashdot article</a> that Linus Torvalds was adopting XFCE to get away from the strangeness of many modern Linux desktop environments.</p>
<p>So that prompted me to try <a href="http://xubuntu.org/">Xubuntu</a>, based on XFCE and also <a href="http://lubuntu.net/">Lubuntu</a>, based on the LXDE desktop. Lubuntu was a little too minimal for me, but Xubuntu clicked for me right away. I don't like the Dash of Unity and I really really hate the fact that when I try to open a new console shell it brings the current one to the forefront. That is not what I want. I'll use Alt-Tab for that.</p>
<p>Xubuntu behaves as you expect. Click the terminal icon and it opens a new terminal. Xubuntu puts shortcut icons on the bottom, similar to Apple's desktop, but without the annoying enlargement animations. I don't do a lot of customization of my desktop. I just want one that has sane defaults and Xubuntu is that for me.</p>
<p>Ubuntu also stirred up criticism for its integration with Amazon affiliated advertisements, making the Dash a purchasing platform, in the process creating data leaks. Now you don't even have privacy when operating your desktop. The <a href="https://www.eff.org/deeplinks/2012/10/privacy-ubuntu-1210-amazon-ads-and-data-leaks">EFF write-up</a> summarizes this nicely.</p>
<p>You can turn it off, but even among Linux users I suspect the "<a href="http://www.mylinuxrig.com/post/9120015925/linux-and-the-tyranny-of-the-default">tyranny of the default</a>" will mean that most users are leaking data and thus are at the mercy of Canonical, which people are starting to develop some mistrust for.</p>
<p>Well, Xubuntu doesn't have Dash. So you get the goodness of the Ubuntu ecosystem without the privacy violations. Its defaults are sane.</p>
<p>Try it out.</p>
<p><a name="top8"></a><br /><br /></p>
<h4><code>/* ---[ 8. Coursera and Online Education ]--- */</code></h4>
<p>2012 is the year that online education skyrocketed. I've done a few <a href="http://www.codeschool.com/">CodeSchool courses</a> and enjoyed those. But now there's <a href="http://www.udacity.com/">Udacity</a> and <a href="https://www.coursera.org/">Coursera</a> and <a href="http://www.udemy.com/">Udemy</a> and <a href="https://www.edx.org/">edX</a> and probably 10 more I don't know about.</p>
<p>This year I took a Coursera course: <a href="https://www.coursera.org/course/progfun">Functional Programming Principles in Scala</a> taught by Martin Odersky. It was a great experience. The format is excellent - each week there are about 2 to 3 hours of video lectures and a programming assignment that takes anywhere from 5 to 15 hours to complete. The assignments were challenging enough to make the time investment worth it. And I got a nice certificate at the end for having a passing grade.</p>
<p>Uploading assignments was done via a command in Scala's sbt tool; it was easy and seamless. The assignments were graded automatically in about 10 minutes and gave good feedback, allowing you to fix problems and resubmit. The only part of the course I didn't enjoy was using the Scala Eclipse IDE, which is still quite painful compared to Java in Eclipse or Clojure in Emacs.</p>
<p>It's amazing what you can get online for free these days. I've signed up for <a href="https://www.coursera.org/course/algs4partI">two</a> <a href="https://www.coursera.org/course/insidetheinternet">more</a> courses and have my eye on <a href="https://www.coursera.org/course/crypto">a cryptography course</a> there as well.</p>
<p><a name="top9"></a><br /><br /></p>
<h4><code>/* ---[ 9. Fiddler2 and Wireshark ]--- */</code></h4>
<p>I spent a good deal of time this year maintaining and enhancing a large "legacy" web app that uses Ajax calls to communicate with the Java back-end. In many cases, the shortcut to figuring out what is going on is to watch the traffic between the browser and server. <a href="https://fiddler2.com/fiddler2/">Fiddler2</a> is an invaluable tool for that.</p>
<p>I also tried <a href="https://www.wireshark.org/">Wireshark</a>, but the output from Fiddler2 is just as intuitive and easy to follow as can be, since it focuses only on HTTP traffic.</p>
<p>Wireshark is more general purpose. I started learning it this year and want to get better at configuring and customizing it, so I can use it effectively (and efficiently) on Linux, since Fiddler2 is unfortunately a Windows-only product.</p>
<p><a name="top10"></a><br /><br /></p>
<h4><code>/* ---[ 10. TrueCrypt, GPG and other encryption tools ]--- */</code></h4>
<p>If you aren't using encryption for your files, hard drives and passwords, make it your new year's resolution to learn the tools. Ever since <a href="https://en.wikipedia.org/wiki/Phil_Zimmermann">Phil Zimmermann</a> bravely pioneered encryption for the everyman, the suite of tools available to do this has gotten better and better.</p>
<p>I use <a href="http://www.gnupg.org/">GPG</a> to encrypt individual files, <a href="http://www.truecrypt.org/">TrueCrypt</a> to encrypt thumb drives and external drives and <a href="https://www.eff.org/deeplinks/2012/11/privacy-ubuntu-1210-full-disk-encryption">Ubuntu's full disk encryption</a> for my laptops. If you have a laptop and thumb drives, they should be encrypted.</p>
<p>A nice file encryption tool on Windows is <a href="http://www.axantum.com/AxCrypt/Default.html">AxCrypt</a>.</p>
<p>For passwords, I use <a href="https://lastpass.com">LastPass</a>, which I believe does it all correctly and securely in a "trust no one" fashion.</p>
<p>Consider using an encrypted "Trust No One" backup and file syncing service. Dropbox is not encrypted, nor is SkyDrive or Google Drive or many other popular services. Do not upload anything to those systems that you wouldn't mind having broadcast on the internet or at least read by employees of those companies.</p>
<p>Steve Gibson (of the Security Now podcast) did a multi-episode analysis of backup and file syncing services from an encryption and "trust no one" perspective. Start with <a href="https://www.grc.com/securitynow.htm">episode #349</a>. There are a number of good solutions. I use <a href="https://spideroak.com/">SpiderOak</a> on Linux.</p>
<p>If you already know and do this stuff, have a <a href="https://cryptoparty.org/wiki/CryptoParty">CryptoParty</a> in your area. If you live in my area (Raleigh/Durham, North Carolina, USA), join the <a href="http://www.dc919.org/">DC919</a> group.</p>
<p><a name="datomic"></a><br /><br /></p>
<h4><code>/* ---[ Datomic: Mine goes to 11 ]--- */</code></h4>
<p>While I did attend a <a href="http://www.datomic.com/">Datomic</a> training course this year and wrote <a href="http://thornydev.blogspot.com/2012/06/datomic-initial-analysis.html">a fairly long blog post</a> about it, I just haven't made the time to really study it yet. I fully intend to, as I think it is one of the most profound and important things to have come out in 2012. I've queued it up to be on my top 10 list in 2013.</p>
tag:blogger.com,1999:blog-2977002084155733470.post-811643360431880075 2012-12-24T15:43:00.000-05:00 (updated 2012-12-24T15:46:18.549-05:00) An irritation no longer: command line joy<p><br /></p>
<h4><code>/* ---[ pushd and dirs ]--- */</code></h4>
<p>I have long been a command line person. I hate having to use the mouse. One thing that is a little cumbersome about the command line is jumping around into various deeply nested directory structures. I've long been a user of <code class="so">pushd</code>, <code class="so">dirs</code> and <code class="so">popd</code> on Unix/Linux/Cygwin consoles. But if you are alternating between 3 or more directories with some regularity, those commands require some care to use correctly.</p>
<p><br /></p>
<h4><code>/* ---[ Improvement #1: pd ]--- */</code></h4>
<p>An improvement on that is a small bash function that I <a href="http://unix.stackexchange.com/questions/31161/quick-directory-navigation-in-the-terminal">found on stackexchange</a>:</p>
<div class="precode"><pre><code>function pd() {
  if [ "$1" ]; then
    pushd "${1/#[0-9]*/+$1}";
  else
    pushd;
  fi > /dev/null
}
</code></pre></div>
<p>which simplifies using <code class="so">pushd</code>. A basic session of use would be:</p>
<div class="precode"><pre><code>midpeter444:~/lang/clojure/concurrency-fun$ pd .
midpeter444:~/lang/clojure/concurrency-fun$ dirs
0 ~/lang/clojure/concurrency-fun
1 ~/lang/clojure/concurrency-fun
midpeter444:~/lang/clojure/concurrency-fun$ cd ~/.mozilla/
midpeter444:~/.mozilla$ dirs
0 ~/.mozilla
1 ~/lang/clojure/concurrency-fun
midpeter444:~/.mozilla$ pd .
midpeter444:~/.mozilla$ cd /tmp
midpeter444:/tmp$ dirs
0 /tmp
1 ~/.mozilla
2 ~/lang/clojure/concurrency-fun
midpeter444:/tmp$ pd 2
midpeter444:~/lang/clojure/concurrency-fun$
</code></pre></div>
<p>pd can take either a dot, which means "remember this directory", or a number, which refers to a position on the dirs history list. The random-access list metaphor is easier to work with than the <code class="so">pushd</code> stack-based metaphor.</p>
<p><br /></p>
<h4><code>/* ---[ Vast Improvement #2: z ]--- */</code></h4>
<p>But recently I discovered rupa/z or "z" and now I only occasionally use <code class="so">pd</code> anymore. <strong>z is the biggest change and improvement to my command line life in years.</strong> I really love it. It works with cygwin as well for my time on Windows machines in my day job.</p>
<p>If you use the command line much, go get it now: <a href='https://github.com/rupa/z'>https://github.com/rupa/z</a></p>
<p>What is it?</p>
<p>First it is <strong>not</strong> z-shell (I'm a bash user), which is what I thought initially. (It doesn't help that the main GitHub page for it starts with "ZSH USERS BACKWARD COMPATIBILITY WARNING").</p>
<p>What it is, is a 200-line shell script compatible with bash and z-shell that is basically a clever LRU-type cache of your directory visitations - the cache weighting is based on both frequency and recentness, which the author dubs "frecency". As you navigate around to different directories, it keeps track of where you've been, how often you've been there and how recently.</p>
<p>To see your current cache in ascending order of 'frecency', just type <code class="so">z</code>:</p>
<div class="precode"><pre><code>midpeter444:~$ z
0.313808 /home/midpeter444/apps/apache-ant-1.8.4/bin
0.313808 /home/midpeter444/lang/clojure/books/land-of-lisp/wizards-adventure/doc
0.313808 /home/midpeter444/lang/java/projects/mybatis-koans
0.392263 /tmp
0.429067 /home/midpeter444/lang/clojure/source-code
0.627622 /home/midpeter444/lang/lisp
0.784525 /home/midpeter444/.mozilla/firefox
0.83702 /home/midpeter444/lang/clojure/projects/clj-how-tos/clj-sockets
0.86298 /home/midpeter444/media/ebooks
1.62 /home/midpeter444/Dropbox/scripts-and-config-files
2.32335 /home/midpeter444/lang/clojure/sandbox/src/sandbox
5.6486 /home/midpeter444/lang/clojure/books/land-of-lisp/wizards-adventure
8.54205 /home/midpeter444/lang/clojure/books/land-of-lisp/orc-battle
10.7351 /home/midpeter444/lang/clojure/projects/clj-how-tos/clj-sockets
20.9559 /home/midpeter444/lang/clojure/books/land-of-lisp
30.7926 /home/midpeter444/lang/clojure/books/land-of-lisp/webserver
32.099 /home/midpeter444/Downloads
192.24 /home/midpeter444/lang/clojure/concurrency-fun
</code></pre></div>
<p>The number on the left indicates the frecency score. So ambiguous entries will resolve in favor of the one with the higher score.</p>
<p>To navigate somewhere you've been, pass a part of the path to the z command:</p>
<div class="precode"><pre><code>midpeter444:~$ z fun
midpeter444:~/lang/clojure/concurrency-fun$
midpeter444:~$ z lisp
midpeter444:~/lang/clojure/books/land-of-lisp/webserver$
midpeter444:~$ z moz
midpeter444:~/.mozilla$
</code></pre></div>
<p>It also has tab-completion. If I hit tab for the example above where I typed <code class="so">moz</code> it expands to:</p>
<div class="precode"><pre><code>$ z /home/midpeter444/.mozilla
</code></pre></div>
<p>Now the one part of command-line usage I used to find irritating is pure joy.</p>
<h3>Programming Praxis Amazon Interview Question, part 2 (2012-12-15)</h3>
<p>In <a href="http://thornydev.blogspot.com/2012/12/programming-praxis-would-amazon-hire-me.html">part 1 of this blog entry</a> I covered two relatively efficient implementations of an algorithm, posed on the <a href="http://programmingpraxis.com/2012/11/27/amazon-interview-question/">Programming Praxis</a> web site, to keep the top N entries in a stream of points:</p>
<blockquote class="bordered">
Given a million points (x, y), give an O(n) solution to find the 100 points closest to (0, 0).
</blockquote>
<p>I was happy with my ad-hoc solution since it performed twice as fast as using a sorted-set data structure.</p>
<p>The answer chosen by the author of the Programming Praxis site used a <a href="http://www.csanimated.com/animation.php?t=Heap_(data_structure)">Heap data structure</a>, where the max value is kept at the top of the heap. His implementation for this exercise was a mutable heap, so it wasn't a viable candidate for a Clojure implementation. However, he links to other praxis challenges where he implemented heaps (in Scheme), some of which use immutable data structures. I chose to (manually) <a href="https://en.wikipedia.org/wiki/Source-to-source_compiler">transpile</a> his Scheme-based <a href="http://programmingpraxis.com/2009/05/05/priority-queues/">"Leftist heap"</a>:</p>
<script src="https://gist.github.com/4299232.js"></script>
<p>This is only one implementation of a heap and there might be (probably is) one that is more efficient, but I chose this as a representative to see how it would compare to the other solutions.</p>
<p>Here is my implementation of the "top 100" challenge using the Leftist Priority Queue Heap:</p>
<script src="https://gist.github.com/4299281.js"></script>
<p>The Priority Queue does not keep track of its size and I didn't want to modify the data structure to do that. To compensate, I used Clojure's <code class="so">split-at</code> function to split the points vector into two lazy seqs: the first <code class="so">max-size</code> entries, which are inserted directly into the heap, and the rest.</p>
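<p>For reference, <code class="so">split-at</code> returns a pair of lazy seqs: the first n elements and everything after them:</p>

```clojure
;; split-at partitions a seq into (first n elements, remainder),
;; both returned as lazy seqs
(let [[head tail] (split-at 3 [10 20 30 40 50])]
  [(vec head) (vec tail)])
;; => [[10 20 30] [40 50]]
```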
<p>Those remaining points then have to be sifted: if a point's distance from the origin is less than the first element on the heap, then that top entry on the heap needs to be pulled off and the new point is inserted. That is done with the code:</p>
<div class="precode"><pre><code>(q/pq-insert dist-lt? pt (q/pq-rest dist-lt? pq))
</code></pre></div>
<p><code class="so">pq-rest</code> is like Clojure's <code class="so">pop</code> - it gives you the data structure minus its head and we insert the new point into that remaining data structure.</p>
<p>The <code class="so">dist-lt?</code> function is a comparator function required by the Leftist Heap algorithm.</p>
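<p>The gist defines <code class="so">dist-lt?</code>; a plausible minimal version, assuming the same Manhattan-style distance described in part 1 (the gist's actual definition may differ), would be:</p>

```clojure
;; Hypothetical sketch: compare two points by distance from the origin,
;; where distance is |x| + |y|
(defn distance [[x y]]
  (+ (Math/abs (long x)) (Math/abs (long y))))

(defn dist-lt? [pt1 pt2]
  (< (distance pt1) (distance pt2)))

(dist-lt? [1 1] [3 2]) ;; => true
```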
<p>I did this additional exercise because I suspected that a heap would be a more efficient implementation than a fully sorted set.</p>
<p>Here are some representative benchmark results again using the Clojure criterium tool. (This time I truncated some of the output to make it easier to read.)</p>
<div class="precode"><pre><code>user=> (def points (shuffle (for [x (range 500) y (range 200)] [x y])))
#'user/points
user=> (count points)
100000
;; this is the "ad-hoc" solution from part 1
user=> (bench (topN points 100))
Evaluation count : 120 in 60 samples of 2 calls.
Execution time mean : 525.526800 ms
;; this is the sorted-set solution from part 1
user=> (bench (topNSS points 100))
Evaluation count : 60 in 60 samples of 1 calls.
Execution time mean : 1.250241 sec
;; this is the new heap-based implementation
user=> (bench (top-heap points 100))
Evaluation count : 60 in 60 samples of 1 calls.
Execution time mean : 1.063965 sec
</code></pre></div>
<p>I've only shown three results, but I ran them many times in varied order and got remarkably consistent results.</p>
<p>So without going into specific precision, the heap-based implementation is about 20% faster than the sorted-set implementation and my ad-hoc solution is about 50% faster than the heap-based implementation.</p>
<p>Which is about what I expected, though I thought the heap-based solution might be a little faster than this. One problem with the heap implementation I'm using is that its central function (<code class="so">merge</code>) uses true recursion. I didn't see an easy way to make it use loop-recur or a lazy-seq. If anyone sees a clever way to do that, I'd love to hear it.</p>
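<p>One generic way to avoid consuming the call stack is to carry an explicit stack through <code class="so">loop</code>/<code class="so">recur</code>. It doesn't port directly to <code class="so">merge</code>, which rebuilds nodes on the way back up, but the pattern looks like this (a sketch using a hypothetical tree-sum example, not the heap code itself):</p>

```clojure
;; Replace call-stack recursion with an explicit stack held in a loop.
;; Trees are nested vectors of [value left right]; nil is an empty tree.
(defn tree-sum [root]
  (loop [stack [root] acc 0]
    (if (empty? stack)
      acc
      (let [[node & more] stack]
        (if (nil? node)
          (recur (vec more) acc)
          (let [[v l r] node]
            ;; push both children and keep accumulating
            (recur (conj (vec more) l r) (+ acc v))))))))

(tree-sum [1 [2 nil nil] [3 [4 nil nil] nil]]) ;; => 10
```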
<h3>Programming Praxis - Would Amazon hire me? (2012-12-14)</h3>
<p>I was finishing up my work for the day and decided to check <a href="https://news.ycombinator.com/">Hacker News</a>. High on the list that day was <a href="http://programmingpraxis.com/">Programming Praxis</a>, which I'd never seen before, and its featured post was the following <a href="http://programmingpraxis.com/2012/11/27/amazon-interview-question/">Amazon interview question</a>:</p>
<blockquote class="bordered">
Given a million points (x, y), give an O(n) solution to find the 100 points closest to (0, 0).
</blockquote>
<p>I had been planning to go home and start working on Dice of Doom (in Clojure) from Chapter 16 of Conrad Barski's <a href="http://landoflisp.com/">Land of Lisp</a>, but this problem sounded intriguing enough that I would take it up.</p>
<p>After sketching out a few ideas, I concluded that a strict O(n) solution isn't possible, but something near-O(n) would be feasible. Similar to how Clojure's persistent data structures often operate in "<a href="http://www.infoq.com/articles/in-depth-look-clojure-collections#_ftnref">essentially constant</a>" time -- <code class="so">O(log-32 n)</code> being close enough to constant time to be considered basically a constant factor.</p>
<p>I decided to try an ad-hoc solution and then a sorted-set implementation. My guess was that the ad-hoc solution would be faster and that gave me a good excuse to try out Hugo Duncan's <a href="https://github.com/hugoduncan/criterium">criterium</a> benchmark tool to prove it (or prove me wrong).</p>
<p><br /></p>
<h4><code>/* ---[ The ad-hoc solution ]--- */</code></h4>
<p>The approach I decided upon was to use an unsorted hash-map with mixed keys.</p>
<p>One key, <code class="so">:ct</code>, would keep track of the count: how many entries ("points") were in the hashmap. Its max would be the max-size (100 in the challenge statement).</p>
<p>The second key, <code class="so">:hd</code>, short for highest distance, would keep track of the entry (or entries) farthest from the center point.</p>
<p>The rest of the keys are integers representing the distance to the origin. This distance key is mapped to a vector of points. Each point is a tuple (vector) of the x and y coordinate.</p>
<p>I decided to interpret distance from (0,0) not as the square root of the sum of the squares of x and y, but rather as the absolute value of x plus the absolute value of y (the Manhattan distance), though it wouldn't be hard to adapt this to the other distance formula.</p>
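<p>A minimal version of that distance function (the one in the gist below may differ slightly) could be:</p>

```clojure
;; Distance as |x| + |y| (Manhattan distance), per the interpretation above
(defn distance [[x y]]
  (+ (Math/abs (long x)) (Math/abs (long y))))

(distance [4 3])  ;; => 7
(distance [-2 5]) ;; => 7
```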
<p>So the data structure would look like this:</p>
<div class="precode"><pre><code>{:ct 4
:hd 7
0 [[0 0]]
2 [[1 1] [0 2]]
3 [[1 2] [0 3] [3 0]]
6 [[5 1]]
7 [[4 3]]}
</code></pre></div>
<p>In this example, the data structure has 8 points: 1 with distance 0, 2 with distance 2, and so on.</p>
<p>Having shown you this data structure, I probably don't need to describe the algorithm I'm going to use. Rob Pike's <a href="http://www.faqs.org/docs/artu/ch01s06.html#rule5">Rule 5</a> is: </p>
<blockquote class="bordered">
Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
</blockquote>
<p>Or <a href="https://en.wikiquote.org/wiki/Fred_Brooks">Brooks' famous statement</a>: </p>
<blockquote class="bordered">
Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.
</blockquote>
<p>So now the fun is to implement this in Clojure's immutable data structures using nicely composed functions.</p>
<p>We know we have to iterate through all the input points - there's our O(n) cost without doing any thing else - and we want to end up with one data structure. Therefore, we'll use <code class="so">reduce</code>.</p>
<p>Even though the problem statement says the closest 100 points, for testing, I want to parameterize the max-size and what set of points I'll feed it, so the main function will look like this:</p>
<pre>
(def init-map {:hd Long/MIN_VALUE, :ct 0})
(defn topN [points max-size]
  (reduce (mk-sift-fn max-size) init-map points))
</pre>
<p><code class="so">points</code> will be a collection or seq of x-y tuples and <code class="so">max-size</code> will indicate how many points the final data structure should retain. I pass max-size to a "make-sift-function", which is a higher-order function that will return the function that will "sift" through each point and determine whether it goes into the data structure and if so, where. A place for everything and everything in its place.</p>
<pre>
(defn mk-sift-fn [max-size]
  (fn [m pt]
    (let [dist (distance pt)]
      (if (< (:ct m) max-size)
        (add-point m dist pt)
        (if (< dist (:hd m))
          (-> m (remove-point (:hd m)) (add-point dist pt))
          m)))))
</pre>
<p>The flow to the function returned by <code class="so">mk-sift-fn</code> is:</p>
<ul>
<li><p>if you haven't seen max-size entries yet, add it to the map (letting the <code class="so">add-point</code> fn figure out where to stick it)</p></li>
<li><p>if the data structure is at max capacity, and the distance from (0,0) of the point under consideration is less than the current <em>highest</em> distance in the map, then one point mapped to that highest distance has to be removed and the new point added.</p></li>
</ul>
<p>I use the <code class="so">-></code> threading macro to weave through the remove-point and add-point functions, allowing nice composition of the functions with immutable structures.</p>
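<p>The threaded form is just sugar for the nested call, as <code class="so">macroexpand-1</code> shows:</p>

```clojure
;; -> threads m through each form as the first argument
(macroexpand-1 '(-> m (remove-point (:hd m)) (add-point dist pt)))
;; => (add-point (remove-point m (:hd m)) dist pt)
```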
<p>Here is the full implementation with all the helper methods:</p>
<script src="https://gist.github.com/4282769.js"></script>
<p>Here's an example run with a small data set:</p>
<div class="precode"><pre><code>user=> (require '[sandbox.top100 :refer [topN]] :reload)
nil
user=> (def points (shuffle (for [x (range 2 5) y (range 3)] [x y])))
#'user/points
user=> points
[[3 1] [3 0] [4 0] [2 2] [2 1] [3 2] [4 1] [4 2] [2 0]]
user=> (topN points 5)
{2 [[2 0]], 3 [[3 0] [2 1]], 4 [[4 0] [2 2]], :hd 4, :ct 5}
</code></pre></div>
<p><br /></p>
<h4><code>/* ---[ The sorted-set solution ]--- */</code></h4>
<p>Clojure's sorted-sets are binary (persistent) trees. The sort order I use for the sorted set is distance from (0,0) <em>descending</em>. </p>
<p>As before, we'll directly add the first <code class="so">max-size</code> entries we see. After that, we have to remove entries from the set if one with a shorter distance is seen.</p>
<p>Due to our sort order, the point with the greatest distance would be at the top of the tree and is easily removed in constant time using <code class="so">disj</code> when we find a point that is closer to the origin. However, we then have to add that new point to the sorted set and all of these additions average O(log-n) insertion time. I was pretty sure my ad-hoc solution would be more efficient overall because of this extra sort time for all elements that get added.</p>
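<p>With a descending comparator the farthest element sorts first, so evicting it is just <code class="so">disj</code> of <code class="so">(first s)</code>:</p>

```clojure
;; descending sorted set: the largest element is first,
;; and removing it is a cheap tree operation
(let [s (sorted-set-by > 9 4 7)]
  [(first s) (disj s (first s))])
;; => [9 #{7 4}]
```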
<p>To define a sorting comparator in Clojure, you use the <code class="so">sorted-set-by</code> fn which takes a comparator of your choosing.</p>
<p>I stated above that the sort order would be by distance descending, but since this is a sorted <em>set</em>, not a sorted list or vector, that won't quite work: </p>
<div class="precode"><pre><code>user=> (require '[sandbox.top100 :refer [distance]] :reload)
nil
user=> (defn by-dist [pt1 pt2]
#_=> (> (distance pt1) (distance pt2)))
#'user/by-dist
user=> points
[[3 1] [3 0] [4 0] [2 2] [2 1] [3 2] [4 1] [4 2] [2 0]]
user=> (apply sorted-set-by by-dist points)
#{[4 2] [3 2] [3 1] [3 0] [2 0]}
</code></pre></div>
<p>We lost some points in the set. We have <code class="so">[3 1]</code>, but not <code class="so">[4 0]</code>. Since these have the same "value" in the eyes of the comparator, and a set keeps only one of any values it considers equal, the later one is dropped.</p>
<p>So I changed the comparator to take equal distances into account and do a secondary sort on the value of the x coordinate, thus keeping all the points we are fed:</p>
<div class="precode"><pre><code>(defn dist-then-first [pt1 pt2]
(let [dist1 (distance pt1)
dist2 (distance pt2)]
(if (= dist1 dist2)
(> (first pt1) (first pt2))
(> dist1 dist2))))
</code></pre></div>
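<p>A quick check (assuming the Manhattan distance function from earlier) that the new comparator keeps equidistant points:</p>

```clojure
;; With the secondary sort on x, points at equal distance no longer collide
(defn distance [[x y]]
  (+ (Math/abs (long x)) (Math/abs (long y))))

(defn dist-then-first [pt1 pt2]
  (let [dist1 (distance pt1)
        dist2 (distance pt2)]
    (if (= dist1 dist2)
      (> (first pt1) (first pt2))
      (> dist1 dist2))))

;; [3 1], [4 0] and [2 2] all have distance 4, yet all three are kept
(apply sorted-set-by dist-then-first [[3 1] [4 0] [2 2]])
;; => #{[4 0] [3 1] [2 2]}
```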
<p>As before I used reduce to iterate over all the input points. The overall solution is nicely shorter than the ad-hoc one:</p>
<script src="https://gist.github.com/4285360.js"></script>
<p><br /></p>
<h4><code>/* ---[ Performance showdown ]--- */</code></h4>
<p>At the Clojure Conj this year, I discovered Hugo Duncan's <a href="https://github.com/hugoduncan/criterium">criterium</a> benchmark tool, which gives you more robust benchmarks than simply using <code class="so">time</code>.</p>
<p>I used it to compare the solutions above. I redefined the points vector to have 100,000 points. I ran <code class="so">bench</code> twice (in the opposite order). The first time I kept the closest 6 points. The second time I kept the closest 100.</p>
<div class="precode"><pre><code>user=> (def points (shuffle (for [x (range 500) y (range 200)] [x y])))
#'user/points
user=> (count points)
100000
user=> (use 'criterium.core)
nil
user=> (bench (topNSS points 6))
Evaluation count : 60 in 60 samples of 1 calls.
Execution time mean : 1.022929 sec
Execution time std-deviation : 10.070356 ms
Execution time lower quantile : 1.006426 sec ( 2.5%)
Execution time upper quantile : 1.044943 sec (97.5%)
user=> (bench (topN points 6))
Evaluation count : 120 in 60 samples of 2 calls.
Execution time mean : 545.035140 ms
Execution time std-deviation : 4.335731 ms
Execution time lower quantile : 538.861529 ms ( 2.5%)
Execution time upper quantile : 554.198797 ms (97.5%)
user=> (bench (topN points 100))
Evaluation count : 120 in 60 samples of 2 calls.
Execution time mean : 531.174287 ms
Execution time std-deviation : 4.642063 ms
Execution time lower quantile : 522.942875 ms ( 2.5%)
Execution time upper quantile : 541.571260 ms (97.5%)
user=> (bench (topNSS points 100))
Evaluation count : 60 in 60 samples of 1 calls.
Execution time mean : 1.260036 sec
Execution time std-deviation : 15.670337 ms
Execution time lower quantile : 1.240810 sec ( 2.5%)
Execution time upper quantile : 1.292583 sec (97.5%)
</code></pre></div>
<p>Sweet! My ad-hoc solution runs twice as fast for this dataset. In the end, it's a trade-off between the more elegant code and performance, which is often the case.</p>
<hr />
<p><strong>[Updates]</strong> </p>
<ul>
<li>See <a href="http://thornydev.blogspot.com/2012/12/programming-praxis-amazon-interview.html">part 2</a> to compare another solution using a heap data structure.</li>
<li>The code for this exercise is now available on my GitHub account: <a href='https://github.com/midpeter444/clojure-qad/tree/master/top100'>https://github.com/midpeter444/clojure-qad/tree/master/top100</a></li>
</ul>
<h3>Reading user input from STDIN in Clojure (2012-10-01)</h3>
<p>Recently I was working on a simple Clojure program where I needed to read input from STDIN. I hadn't actually done this before, so I searched online, found others had similar questions, and had to cobble an answer together due to two issues:</p>
<ol>
<li><p>In most languages you use a while loop and check for some condition (or input) to become false in order to stop. Clojure does have a while loop construct, but making it work here seems tricky and likely involves mutable state. Paying the overhead of STM (Software Transactional Memory) just to do a simple input loop seems like overkill.</p></li>
<li><p>When you try to read from STDIN using <code class="so">lein run</code>, it doesn't work. The input is ignored.</p></li>
</ol>
<p>So, here's what I have working that is reasonably idiomatic and concise.</p>
<p>In my code example, I want to read a user's input until they type in a sentinel value, which I chose to be <code class="so">:done</code>. Here I just echo the output back:</p>
<script src="https://gist.github.com/3815736.js?file=stdin.clj"></script>
<p>The trick to getting it to work with Leiningen (I'm using lein-2.0.0-preview10) is to use <code class="so">lein trampoline run</code> rather than <code class="so">lein run</code>. The trampoline feature runs your app in a separate JVM process rather than as a child process of the Leiningen JVM. Running it as a child process somehow blocks STDIN input from being captured.</p>
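<p>The gist above boils down to something like this (a sketch; the gist's exact code may differ):</p>

```clojure
;; Loop on read-line until EOF (nil) or the :done sentinel is entered.
;; No mutable state: loop/recur carries the control flow.
(defn echo-loop []
  (println "Enter text:")
  (loop []
    (let [line (read-line)]
      (when-not (or (nil? line) (= ":done" line))
        (println (str "You entered: >>" line "<<"))
        (recur))))
  (println "End"))
```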
<p>Example usage:</p>
<div class="precode"><pre><code>$ lein trampoline run -m example.stdin
Enter text:
First line
You entered: >>First line<<
Second line
You entered: >>Second line<<
Foo
You entered: >>Foo<<
:done
End
</code></pre></div>
<p>In all likelihood, you would want to capture and retain the input lines in a collection. To do that use an accumulator in your loop:</p>
<script src="https://gist.github.com/3815846.js?file=stdin-acc.clj"></script>
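<p>The core of the accumulating version (again a sketch; the gist may differ) is just a <code class="so">loop</code> binding that carries the vector along:</p>

```clojure
;; Accumulate each line into a vector until EOF or the sentinel
(defn read-lines []
  (loop [acc []]
    (let [line (read-line)]
      (if (or (nil? line) (= ":done" line))
        acc
        (recur (conj acc line))))))
```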
<p>Ideas on more idiomatic ways to do this in Clojure are welcome. </p>