Saturday, December 24, 2011

My CS/Programming Top 10 for 2011

As a year-end summary, I took a little time to look back through my personal wiki, where I keep notes on technical topics, in order to make a "Top 10" list of the papers, books, articles, blog entries, podcasts, videocasts and conferences that impacted me and challenged me to expand my knowledge this year.  I made a list and checked it twice.

So in no particular order, here are my top 10 with a short bit of commentary on why you may want to read/watch/investigate these topics if you haven't already.



Simple-Made-Easy, Video Presentation, by Rich Hickey

Link: http://www.infoq.com/presentations/Simple-Made-Easy

Summary and Thoughts: If this presentation hadn't been given by Rich Hickey, I probably would have skipped it. But this year I ran across a saying (which I put as a motto on my tech wiki): When brilliant people create things, study them. I'm not alone in thinking that Rich Hickey falls into that category. 

I ended up watching this presentation three times over the course of a couple of months, in part because I got into a debate with work colleagues over what counts as a simpler vs. a more complex design. This talk helped me define those terms in terms of entanglement or intertwining: the less intertwined, the simpler.

It also challenges some key articles of faith in recent software engineering thinking, such as the idea that following TDD almost blindly will lead to proper design. All things in moderation, said the Greeks (the ancient ones, not the ones in current financial disarray) - and that applies to the often polarized debate between TDD and up-front design. On this I agree with Hickey. 




The Vietnam of Computer Science, blog entry, by Ted Neward

Link: http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

Summary and Thoughts: This blog post is from 2006, so you can tell this is a top 10 drawn from my personal experience in 2011, not from publications of 2011. In addition to providing a refresher course in the history of the Vietnam War, Neward argues that Object-Relational Mapping (ORM) is in general an extremely leaky abstraction whose siren song of an easy life is alluring, but eventually a nasty law of diminishing returns kicks in. To pile on, Rich Hickey, in the presentation listed in item #1 above, says that ORMs have OMG complexity (OMG = oh my goodness), when the goal of software design should be simplicity, not complexity.

Neward uses the analogy of the "Drug Trap": you start off taking low doses of a pharmaceutical your doctor prescribes and it works great for a while. Then it stops working, so you increase the dose, which works great for a while, and then you increase it again, forming a cycle of ratcheting up until unhealthy extremes are reached. For an ORM, the ratcheting is the amount of effort you need to get your complex domain model requirements into and out of your (relational) database. As the leaky abstraction begins to bleed more heavily you have to make some tough choices: bear down and force the O/R model to do what you want with increasingly complex APIs or workarounds (and Neward outlines a number of those), decide to stop using O/R for the hard stuff and go back to good ol' SQL, or find another way altogether, such as integration of relational concepts into the language. In the summary section, he lists six possible options - to which a seventh could now be added: using a non-relational (NoSQL) database.

Given how popular Hibernate is in the Java space and ActiveRecord in the Ruby (Rails) space as the near-default way of working, this is a controversial opinion.  One could argue that five years later ORMs have gotten better, and there are good arguments that in some scenarios they make good sense.

However, where I work we have settled on MyBatis (formerly iBatis), which is not an ORM tool but a SQL mapping framework, as a better model for most database scenarios; it also integrates nicely with Spring. I need to research LINQ (which even Rich Hickey praised in his talk on simplicity) and the Spring Data efforts to get a more rounded view on the whole area before I make up my mind. And in the Ruby world, Sequel is getting a lot of press and is also on my "to-research" list.
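
For contrast with a full ORM, here is roughly what a MyBatis mapper looks like using its annotation support. This is just a minimal sketch - the UserMapper interface, User class, and users table are invented for illustration, not taken from our codebase.

import org.apache.ibatis.annotations.Insert;
import org.apache.ibatis.annotations.Select;

// Hypothetical POJO, kept package-private so the snippet compiles as one file.
class User {
  private Long id;
  private String name;
  public Long getId() { return id; }
  public void setId(Long id) { this.id = id; }
  public String getName() { return name; }
  public void setName(String name) { this.name = name; }
}

// With MyBatis you write the SQL yourself; the framework only maps the
// #{...} parameters in and the result columns out onto the POJO's properties.
public interface UserMapper {
  @Select("SELECT id, name FROM users WHERE id = #{id}")
  User findById(long id);

  @Insert("INSERT INTO users (name) VALUES (#{name})")
  void insert(User user);
}

The point is that there is no hidden object-graph management or lazy-loading machinery between you and the database - you trade some typing for predictability.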



REST to My Wife, blog entry, by Ryan Tomayko


Link:  http://tomayko.com/writings/rest-to-my-wife

Summary and Thoughts: I got deeper into REST and RESTful design this year, and this blog entry started me off early in 2011. A good read to step up a level from the details of REST vs. SOAP style designs and see the bigger picture of what we are grappling with when designing inter-system communication.



VoltDB and NewSQL, presentations and writings by Michael Stonebraker


Links: 
Presentation 1: http://www.slideshare.net/Dataversity/newsql-vs-nosql-for-new-oltp-michael-stonebraker-voltdb
Presentation 2: http://bit.ly/v6ANsq
ACM article, 10 rules for scalable performance in 'simple operation' datastores: http://bit.ly/tAlaVe

Summary and Thoughts: Now that the NoSQL movement is in full swing and has gained wider corporate adoption, some of its tenets - such as the claim that you must drop full ACID consistency and the relational model to build web-scale distributed databases - are being revisited and challenged. This counter-movement has branded itself as "NewSQL" to distinguish itself from "Old SQL" (Oracle, MySQL, PostgreSQL and the like) and from the non-relational, mostly non-ACID NoSQL data stores (Neo4j being a pleasant exception). NewSQL, and in particular VoltDB, targets the high-data-volume, high-throughput OLTP space - directly taking on the traditional relational databases. Michael Stonebraker, one of the early pioneers of relational databases (his work on Ingres led directly to PostgreSQL), is the CTO of VoltDB and a proponent of its new model.

I haven't seen VoltDB get a lot of press, but after I watched the presentations above, I was enamored with this approach, in part because it provides a very clean solution to the problem domain I am working on at my day job.  The key aspects of their approach include:

  1. Individual node performance matters, not just parallelization: get rid of the overhead of traditional databases by making all writes (inserts/updates) single-threaded - no read/write lock contention or B-Tree latch contention to handle.
  2. Use an in-memory database, with very lightweight persistence of the "commands", not the current state of the db.
  3. Push the database logic and processing to the database using VoltDB-style stored procedures, which you write in Java (see the sketch below).
  4. Use shared-nothing scalability: find the keys by which to shard your data and distribute it over as many machines as you need/can, as long as you make the vast majority of your transactions single-sharded - fully replicate the rest of the data. Make sure your database can do this for you transparently.
  5. The SQL and relational models are solid foundations on which to continue to build OLTP-type applications (granted, they do not fit all models/use cases).
By following this model, Stonebraker claims you can potentially achieve 40x faster database operations that are fully ACID-transactional.  Combine this with a processing model using the LMAX Disruptor (next entry below) and you could build an extraordinarily fast OLTP processing and data storage system. This is something I'm extremely interested in prototyping and considering for production use.
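
To make item 3 in the list above concrete: a VoltDB stored procedure is just a Java class that extends VoltProcedure and queues parameterized SQL. The sketch below is hypothetical - the accounts table, its columns, and the DebitAccount procedure are invented, and it is based on my reading of the VoltDB docs, not anything we have in production.

import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;

// Hypothetical single-partition procedure: debit one account.
// VoltDB runs it serially inside the partition that owns acct_id,
// so there are no locks or latches to contend for.
public class DebitAccount extends VoltProcedure {

  public final SQLStmt getBalance =
      new SQLStmt("SELECT balance FROM accounts WHERE acct_id = ?;");

  public final SQLStmt debit =
      new SQLStmt("UPDATE accounts SET balance = balance - ? WHERE acct_id = ?;");

  public VoltTable[] run(long acctId, long amount) throws VoltAbortException {
    voltQueueSQL(getBalance, acctId);
    long balance = voltExecuteSQL()[0].asScalarLong();
    if (balance < amount) {
      throw new VoltAbortException("insufficient funds");   // rolls back the transaction
    }
    voltQueueSQL(debit, amount, acctId);
    return voltExecuteSQL(true);   // true = final SQL batch in this procedure
  }
}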


LMAX Disruptor, software library (and presentations/blogs), by the LMAX team in the UK

Links:
Presentation: http://www.infoq.com/presentations/LMAX
Fowler bliki entry: http://martinfowler.com/articles/lmax.html

Summary and Thoughts: Many of us are trying to find ways to leverage parallel computation. The trend toward Functional Programming promises easier concurrency models. Google formalized and consolidated many threads of work around massively parallel multi-node processing with MapReduce - now available open source via Hadoop and similar tools. We also see implementations of highly concurrent, fault-tolerant systems using Erlang and Akka to spin up thousands of Actors to do highly concurrent processing. Using technologies like these, many companies, like Google, are processing astounding amounts of data on compute grids or handling massive numbers of simultaneous users. 

At a time when the industry is focused on going parallel, the LMAX team comes along and says there is at least one class of problems for which massively parallel systems (and often a more functional style of programming) are actually not the answer.  In fact, there are times when a single-threaded processing model solves a host of problems - but can we make it performant enough to be web-scale?

The team at LMAX faces very high throughput demands that require predictable and very low latency. They tried using a SEDA queue-based system (akin to the Actor model), but found that the queue is actually, perhaps surprisingly, a rather poor data structure for a highly concurrent environment.  As with the Stonebraker/Harizopoulos analysis of where multi-threaded databases spend their time (see the "VoltDB and NewSQL" entry above), a system with very high volumes that uses queues to pass data from one processing thread/stage to another actually spends far too much of its time negotiating locks rather than doing real work.

So they devoted research to understanding how to optimize L1/L2 cache usage and utilize multiple cores efficiently. They combined this with a ring buffer data structure (not something new - it is used in the Linux kernel, for example) to create a Java library they call the Disruptor, and then released it under an open source license.

With their model, they measure processing 6 million transactions per second on a single server, including persisting "commands" or "events" to disk. There are a couple of ways to use their API (one form is a simplified "DSL" version). It manages "barriers" between writers (producers) and readers (consumers) for you: consumers will not run ahead of producers, and you can have consumers process in parallel or be gated on each other to process serially.
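
To give a feel for the API, here is a minimal sketch of the DSL form: one producer publishing long values into the ring buffer and one event handler consuming them. The ValueEvent class is made up for illustration, and the exact constructor signatures have shifted across Disruptor releases, so treat this as a sketch rather than copy-and-paste code.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.lmax.disruptor.EventFactory;
import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;

public class DisruptorSketch {

  // Hypothetical event type: entries are pre-allocated in the ring buffer
  // and overwritten by producers, so there is no per-message allocation.
  static class ValueEvent {
    long value;
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService executor = Executors.newCachedThreadPool();

    EventFactory<ValueEvent> factory = new EventFactory<ValueEvent>() {
      public ValueEvent newInstance() { return new ValueEvent(); }
    };

    // Ring buffer size must be a power of two.
    Disruptor<ValueEvent> disruptor =
        new Disruptor<ValueEvent>(factory, 1024, executor);

    // A consumer; the Disruptor's barriers guarantee it never runs ahead
    // of the producer's published sequence.
    disruptor.handleEventsWith(new EventHandler<ValueEvent>() {
      public void onEvent(ValueEvent event, long sequence, boolean endOfBatch) {
        System.out.println("consumed " + event.value);
      }
    });

    RingBuffer<ValueEvent> ringBuffer = disruptor.start();

    // Publish a few events: claim a slot, fill it in place, publish the sequence.
    for (long i = 0; i < 5; i++) {
      long seq = ringBuffer.next();
      ringBuffer.get(seq).value = i;
      ringBuffer.publish(seq);
    }

    Thread.sleep(100);   // give the handler a moment to drain
    disruptor.shutdown();
    executor.shutdown();
  }
}

If I read the DSL right, handlers can also be chained (e.g. handleEventsWith(...).then(...)) to get the serial gating described above.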

They have done a number of presentations available on the web and Martin Fowler was impressed enough to write a long and detailed bliki entry about it - those are the links above.

This model is hard to describe in a few short sentences, so I recommend watching the presentations and reading Fowler's write up. I think this is really impressive battle-tested stuff to have in your arsenal.


Neo4j koans, software learning tool via unit tests


Link: https://github.com/jimwebber/neo4j-tutorial


Summary and Thoughts: I've already written a blog entry on my experience setting up the koans: http://thornydev.blogspot.com/2011/11/neo4j-koans-how-do-i-begin.html. Neo4j is good stuff and these koans definitely help you learn most aspects of the Neo4j API (using Java).
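
If you haven't seen the Neo4j Java API before, the basic shape of it is small: an embedded database, transactions, nodes, properties, and relationships. Here is a minimal sketch (not taken from the koans - the store path, property names, and relationship type are invented, and class names like EmbeddedGraphDatabase are from the 1.x-era API):

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class Neo4jHello {
  public static void main(String[] args) {
    GraphDatabaseService db = new EmbeddedGraphDatabase("target/hello-db");
    Transaction tx = db.beginTx();
    try {
      // Nodes are property containers; relationships are typed and directed.
      Node doctor = db.createNode();
      doctor.setProperty("character", "Doctor");
      Node master = db.createNode();
      master.setProperty("character", "Master");
      doctor.createRelationshipTo(master,
          DynamicRelationshipType.withName("ENEMY_OF"));
      tx.success();
    } finally {
      tx.finish();   // 1.x-era API
    }
    db.shutdown();
  }
}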



The problem of Garbage Collection in the face of really really big RAM sizes, video presentation, by Gil Tene (Azul)


Link: http://www.infoq.com/presentations/Understanding-Java-Garbage-Collection

Summary and Thoughts: At my first full-time software job, during the dot-com boom, I and a team of three others built an internet ad server in Java.  We ran with a large (at the time) heap of 512MB and tried to reuse objects through our own object pools as much as possible to avoid gc. We tried to get fancy in a few ways, including trying to use JNI to control garbage collection. In the end, despite valiant efforts, we failed, with 10-second gc pauses killing our real-time ad serving every 15 or so minutes. Management wasn't happy; we realized that gc is hard, gave up, and rewrote the ad server in C++.

This presentation brought back all those memories and pointed out that we are in an absurd dilemma now as "memory becomes the new disk". At the time of this presentation, Tene says that 512 GB (that's a "G") of memory is actually the cheapest price point on servers when evaluated on a per-unit-of-memory cost. And most Java developers don't have a prayer of using that much memory in a single application. Garbage collection pauses can be delayed, but they grow basically linearly with heap size. If we faced 10-second "stop-the-world" pauses with a 512MB heap, you might face stop-the-world pauses measured in minutes with 512GB (Tene says roughly 1 second of pause per GB of live memory). Talk about management not being happy...
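
As an aside, an easy way to see these pauses for yourself is the trick Azul's free jHiccup tool is built around: a thread that sleeps for a tiny fixed interval and records how much longer than that it actually slept. The toy version below is my own simplified sketch of that idea (the interval and threshold are arbitrary), not jHiccup itself.

// Toy "hiccup" detector: sleeps 1 ms at a time and reports whenever the
// observed delay is much longer than requested. On a mostly idle JVM,
// large spikes are usually stop-the-world GC pauses (or OS scheduling).
public class PauseWatcher {
  public static void main(String[] args) throws InterruptedException {
    final long intervalMs = 1;
    final long reportThresholdMs = 50;
    while (true) {
      long start = System.nanoTime();
      Thread.sleep(intervalMs);
      long elapsedMs = (System.nanoTime() - start) / 1000000L;
      if (elapsedMs - intervalMs > reportThresholdMs) {
        System.out.println("hiccup: ~" + (elapsedMs - intervalMs) + " ms");
      }
    }
  }
}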

"Most of what people know about garbage collection turns out to be wrong" says the presenter, Tene. Kind of like trying to get advice on software security and cryptography by searching google. Hit-or-miss for sure.

You'll learn a lot about garbage collection in general during the first hour or so of the presentation, and even though the last bit is a bit rushed, there are key insights in the last 10-15 minutes well worth waiting for, rewatching, and researching. This problem and concurrent programming with multiple cores are two of the key challenges facing software development for the next 5+ years (I'd throw in mobile development as another key challenge). The Azul guys seem to be leading the way on the first problem in the JVM space.

I haven't used their garbage collector, but I'm definitely keeping it in mind as we consider putting more and more data and cache in memory.  (And I wonder if this would make VoltDB even better?)

More on the Azul gc: http://www.infoq.com/articles/azul_gc_in_detail




A Clojure Cornucopia:
Joy of Clojure, book by Michael Fogus, et al.

The Clojure koans, software program
The suggestion of Clojure as Guiding Light into the next phase of programming, blog by Uncle Bob Martin


Links:  


[31-Dec-2011 Update: I just discovered the online Clojure koans at: http://www.4clojure.com and highly recommend them as well.]

Summary and Thoughts:
I had to put something about Clojure in here, as there is just so much right about it. I'm still a Clojure newbie who has dabbled on and off with it, and currently The Joy of Clojure is kicking my butt.  Start with the new edition of Programming Clojure coming out early next year or Clojure in Action, out now, do the koans, do The Little Schemer (see the last entry in this blog) and then do The Joy of Clojure. In many ways, one has to re-learn programming while learning Clojure. Learning to think more functionally is the obvious part, but so is learning to think in terms of immutable data structures and how to use loop/recur - Clojure's take on tail recursion - with accumulators. 

Since I'm still swimming in the sea of newness, I don't have anything profound to say about Clojure at this point, but I will invoke argumentum ad verecundiam (argument from authority), quoting well-respected people who see great promise in Clojure:
For those of us using the JVM or .NET, Clojure feels like a minor miracle. It’s an astoundingly high-quality language, sure—in fact, I’m beginning to think it’s the best I’ve ever seen—yet somehow it has still managed to be fashionable. That’s quite a trick. It gives me renewed hope for the overall future of productivity in our industry. We might just dig ourselves out of this hole we’re in and get back to where every project feels like a legacy-free startup, just like it was in the early days of Java. 
 - Steve Yegge, foreword to The Joy of Clojure

How are we going to write that code? What language are we going to use to express the concepts that execute in thousands of machines?
Of course the current answer is “Functional Programming”. OK, maybe — maybe. But I gotta tell ya, the new functional languages out there aren’t looking too good to me. Scala and F# are still closely tied to the hardware. Scala is java with a few cute twists. And F#? Is it really the language that’s going to take us into the next age? My fear is that these languages suffer from the Sapir/Whorf trap. Their mode of expression does not sufficiently change our world view. They are too much like Java and C# and all the other old languages.
The only language I’ve seen so far that comes close to a different mode of expression, and that is capable for use in the enterprise, is Clojure.
- Uncle Bob Martin, blog entry above


Akka Cornucopia:
Akka library, 
Akka SE-Radio podcast, and 
Programming Concurrency on the JVM, book

Links: 

Summary and Thoughts: At the NFJS conference this year, Akka was one of the prominent libraries being touted, and one of the NFJS presenters, Venkat Subramaniam, released a new book, Programming Concurrency on the JVM, which I read and which explores Akka heavily. 

The Akka model is an Actor implementation based on Erlang's model. Jonas Bonér says in the podcast mentioned above that he learned Erlang a while back and really liked its model, but wasn't able to convince enough people to adopt Erlang, so he decided to bring Erlang's Actor model to the JVM via Scala. Scala already had an Actor implementation, but it was not as sophisticated or robust as Erlang's, so he created the Akka library. Akka now has a large following and can be used not only by Scala developers but also by Java developers. I can attest that we are starting to use it in our projects where I work. 

If your problem domain either breaks down into nicely composable SEDA stages or you want a model with fault-tolerant concurrency with no shared mutable state, Akka and the Actor model are worth considering.
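
For anyone who hasn't seen the Java API, an actor boils down to a class with an onReceive method plus an ActorRef you send messages to. The sketch below is written against the newer Akka 2.x Java API (names like ActorSystem, Props.create and noSender differ from the 1.x releases that were current when I wrote this), and the Greeter actor is invented for illustration.

import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.actor.UntypedActor;

// Hypothetical actor: no shared mutable state, just messages in and out.
public class Greeter extends UntypedActor {

  @Override
  public void onReceive(Object message) {
    if (message instanceof String) {
      getSender().tell("Hello, " + message, getSelf());
    } else {
      unhandled(message);
    }
  }

  public static void main(String[] args) {
    ActorSystem system = ActorSystem.create("demo");
    ActorRef greeter = system.actorOf(Props.create(Greeter.class), "greeter");
    greeter.tell("Akka", ActorRef.noSender());   // fire-and-forget send
    // (in a real program you'd wait for work to finish before shutting down)
    system.shutdown();
  }
}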

Interestingly, however, Rich Hickey has decided against incorporating the Actor model into Clojure, preferring agents instead, as he describes here: http://clojure.org/state. I have to confess I haven't fully grokked why - something that may need to be learned through experience. Of course, since Clojure is a JVM language and Akka has a nice Java API, you could use Akka from Clojure via its Java interop features :-)  However, Hickey does say at the end: "Clojure may eventually support the actor model for distributed programming, paying the price only when distribution is required, but I think it is quite cumbersome for same-process programming."

This points up another bonus of using Akka - it has remoting built in and includes integration with Apache Camel. We haven't started using these features yet, but since we already have a Camel integration framework, it is nice to know we can leverage them.



Little Schemer

Link: http://www.amazon.com/Little-Schemer-Daniel-P-Friedman/dp/0262560992/ref=sr_1_1?ie=UTF8&qid=1324704715&sr=8-1

Summary and Thoughts: I was not fortunate enough (some would say unfortunate enough) to learn Lisp/Scheme in college.  It has been on my list for years and when Clojure came along and started getting popular, I decided it was time.

Douglas Crockford says in JavaScript: The Good Parts that all computer scientists should read The Little Schemer to truly understand recursion and how to think recursively.  He, in fact, redid The Little Schemer exercises in JavaScript, which is actually a very expressive language for functional concepts (just with C-style syntax).

So this year, I finally decided to sit down and read it. I combined this reading with learning the modern Lisp-on-the-JVM, Clojure.  So every time The Little Schemer challenged me to think through an algorithm, I wrote it in Clojure.

About halfway through, I started doing the exercises in Scala as well, which I am also in the early stages of learning. (Figuring out how to do functional programming in Scala does not seem as natural as doing it in Clojure, for a newbie at least.)


In any case, I can attest that after "doing" this book, recursive thinking is definitely more natural now; sometimes when I approach a problem I can immediately see how to do it recursively, while the iterative version takes longer to think through - a Necker Cube-like transition. I got The Seasoned Schemer (the sequel) for Christmas this year, so that will be on my to-do list for 2012.

Sunday, December 11, 2011

Happiness requires discipline


Staying singly focused on a task in this digital era is like trying to resist eating while sitting in a bakery as cookies, pies, and cakes emerge fresh and fragrant from the oven.

We live in the era of rampant multitasking, but I've seen studies showing that people aren't as good at it as they think.  Kind of like how people think they can learn to get by with less sleep. But here we fight our biology. It turns out that most of us really do need a full 8 hours of sleep. Not 6, not even 7. Empirical task-based studies show that if you don't get 8 pretty much every night, you are not performing optimally. And as programmers, we need to be optimal to do the hard task of programming.

Self-discipline, self-control and focus are the keys to success in programming and life - including health and happiness.

At the NFJS conference this year, I heard about the Pomodoro technique for the first time - a form of focused intense concentration alternated with a short break.  I've been trying it out. For my environment at work, I find that it is pretty effective.  But for big problems where you really need to keep a lot of state in your head for prolonged periods, a double pomodoro may be needed.  Of course, one could argue that even longer than 50 minutes is optimal. Probably true, especially if you are in flow, but here too, sigh, it turns out we fight our biology.

A lot of studies lately have been showing that sitting for prolonged periods is detrimental to your health.  Sitting longer than an hour is unfortunately not advised (see links below).  So do some walking, stretching and knee bends during those pomodoro breaks.

So a programmer is happy and optimal when self-disciplined to:
  • minimize distractions (my old roommate used to program with the TV on in the background, aargh)
  • spend most of the day in periods of intense concentration mentally stepping through code paths, design considerations, testing approaches and problem solving
  • balance the mental discipline with the physical discipline of exercise, breaks and, *sigh*, 8 hours of sleep a night, even when hacking late into the night would be more fun

That's how I see it, anyway.  (And now I need to get my *rse out of the chair and get some blood flowing.)



Links about prolonged sitting:
Excellent book on our biological requirements for sleep:

Wednesday, November 30, 2011

Neo4j Koans - How do I begin?

Koans have become a smart way for people to learn a new language or technology.  I've seen them in the Ruby space with the Ruby koans: http://rubykoans.com/, but I've never actually done a set before.  They are set up as a series of failing unit tests, and one by one you have to fix each test to gain enlightenment.

I recently became very interested in the Neo4j graph database: http://neo4j.org/, as I'm working on data structures that are hierarchical and tricky to walk in a relational database.

I came across an interesting video presentation on neo4j by Ian Robinson on the skillsmatter website - http://skillsmatter.com/podcast/design-architecture/neo4j. He and colleagues have put together a database of facts from the Doctor Who British TV series. That's when I became aware that he, Jim Webber and some other colleagues created a set of Neo4j koans, available on github: https://github.com/jimwebber/neo4j-tutorial/.

So I decided to tackle my first ever set of koans.


Soon after, I discovered that while Jim Webber and crew put a lot of time into creating the neo4j koan tutorial, they seem to have put little time into explaining how a newbie to koans can actually run them - at least as of November 2011 when I downloaded it. I don't understand why you would put hours into building something like this and not put a few minutes into documenting what to do. Please provide a README with instructions; if you go to all the effort of creating it, why wouldn't you?

[Dec 2011 Update: There is now good starter info on the neo4j koan github site, so many thanks to Jim et al. for improving it!]

So in case it helps anyone else, here is my version of how I fumbled around to do the neo4j koans.


/* ---[ Getting the Koans Installed and Set up ]--- */

These instructions are targeted particularly at a Unix/Linux environment. My koan work was done on Linux (Xubuntu 11.10). I also tested them on Windows, where I have the MinGW32 environment that comes with the git download (and I have Cygwin).

[Dec 2011 Update:  The neo4j koan github site now says it will work with cygwin on Windows - they actually say "sorry" about this, but being a Unix/Linux devotee my advice is that if you don't have cygwin yet, this is a good reason to get it and learn it. Consider it a bonus to learn a better way of doing things.]

I'm also primarily a command-line kind of guy, though I do also use Eclipse for many things. In this case, I went commando - I didn't use Eclipse's JUnit integration or download the neoclipse plug-in (which sounds cool - I need to try it someday).

First, download it from github (URL above). Be aware that the download is around 350 MB, so it may take a while if you have a lower-speed internet connection like mine.

Second, cd to the main directory (neo4j-tutorial) and type ant - this will run ivy and download half the known universe in good ivy/maven fashion (grrr...). After waiting about an hour (or less if you have a better internet connection than mine), you can begin by wondering what to do next. First make sure the build ran to successful completion - happily, the koans as unit tests all passed out of the box for me, so make sure that is all working on your system before beginning.

The authors provide a presentation in the presentation directory (for some reason in ppt and not converted to pdf for more general viewing), which can be helpful, but wasn't enough to really know how to do the koans. I recommend coming back to the presentation periodically and reviewing it for the section you are working on.  Some of its visuals and notes are helpful, but mostly you'll just need to read the codebase and neo4j javadocs to really know how to get things done.

Next, run the tree command to get a look around (you may need to install tree - it's a great Unix command-line tool to see files/dirs in a compact tree structure).

You'll see that the koans are in the src directory (output from the tree cmd):
├── src
│   ├── koan
│   │   └── java
│   │       └── org
│   │           └── neo4j
│   │               └── tutorial
│   │                   ├── Koan01.java
│   │                   ├── Koan02.java
│   │                   ├── Koan03.java
│   │                   ├── Koan04.java
│   │                   ├── Koan05.java
│   │                   ├── Koan06.java
│   │                   ├── Koan07.java
│   │                   ├── Koan08a.java
│   │                   ├── Koan08b.java
│   │                   ├── Koan08c.java
│   │                   ├── Koan09.java
│   │                   ├── Koan10.java
│   │                   ├── Koan11.java

and if you look into them, you'll see comments and snippet sections that say "your code here", but they come pre-filled in with the answers.

However, if you go into the src/main/scripts directory you'll notice a "release.sh" script, which extracts the relevant portion of the koans from the current github dir, copies it to a temp directory and runs the remove_snippets script. I couldn't get it to work after 15 minutes of futzing with it, and the documentation for it is basically useless.

[Dec 2011 Update: I tried it again and it works now for me. Either I did something wrong the first time or Jim Webber tweaked it.]


There is also a remove_snippets.sh script.  You can run that directly - you are supposed to do it from the top-level dir, not the scripts directory. Either way I got an error message, but it does work if you run it from the top-level dir, despite the error message.  Here's what I got:

$ src/main/scripts/remove_snippets.sh
sed: can't read : No such file or directory
$ git st
 M src/koan/java/org/neo4j/tutorial/Koan02.java
 M src/koan/java/org/neo4j/tutorial/Koan03.java
 M src/koan/java/org/neo4j/tutorial/Koan04.java
 M src/koan/java/org/neo4j/tutorial/Koan05.java
 M src/koan/java/org/neo4j/tutorial/Koan06.java
 M src/koan/java/org/neo4j/tutorial/Koan07.java
 M src/koan/java/org/neo4j/tutorial/Koan08a.java
 M src/koan/java/org/neo4j/tutorial/Koan08b.java
 M src/koan/java/org/neo4j/tutorial/Koan08c.java
 M src/koan/java/org/neo4j/tutorial/Koan09.java
 M src/koan/java/org/neo4j/tutorial/Koan10.java
 M src/koan/java/org/neo4j/tutorial/Koan11.java

(git st is aliased to git status -s on my machine)

So it did modify the Koans and remove the parts I'm supposed to fill in.

After doing this, I recommend doing git reset --hard to get back the filled-in koans. Copy them to another directory so you can peek at them while doing the koans, in case you get stuck or want to compare your solution with the official one.

Then run the remove_snippets.sh script again and do a git add and git commit. Now we are ready to start the koans (whew!).


/* ---[ Doing the Koans  ]--- */

The koans are unit tests. After you run the remove_snippets script, all the koan unit tests will fail (except for Koan01, which for some reason has no snippet for you to replace - it is just a reading koan, not a doing koan, I guess).

You need to fix the koans one by one and get each test passing. Unfortunately, I couldn't find a way to run each test separately; you have to run the full battery, plus the annoying-as-hell ivy checks.  <rant>Speaking of which, you won't be able to run these tests while offline, even after you've downloaded everything via ivy.  This is my biggest complaint about the ivy/maven model.  I frequently want to work offline, so I curse setups that require everything to be done via ivy/maven.</rant>

One way to sift through all the noise of the output is to run the unit tests like this:
$ ant | grep Koan11

Then you will just get the output for Koan11 (though it still runs everything unless you edit the Ant build.xml file), like this:

$ ant | grep Koan11
    [junit] Running org.neo4j.tutorial.Koan11
    [junit] TEST org.neo4j.tutorial.Koan11 FAILED
    [junit] Tests FAILED


BUILD FAILED
/home/midpeter444/java/projects/neo4j-koans/neo4j-tutorial/build.xml:68: Build failed due to Koan failures

I also found context lines and simple pattern matching to help, such as:
ant | grep -C 4 Koan0[123]



/* ---[ Only Running the Tests You Want  ]--- */

As I said, there is no target in the ant file (or any helper scripts) to only run one test/koan at a time. And each one takes many seconds, so the whole thing can take well over a minute (depending on machine speed).

To do the easiest thing that would work, I issued the following command (you could do it with sed if you don't have perl installed):

find src/koan/java -name Koan*.java -print | xargs perl -pi -e "s/([@]Test)/\/\/\1/g"

It just comments out all the @Test annotations in the Koan files.  Run this once at the beginning. Then you can remove the comments from each test as you work on it.  The ant build still runs all the koans, but the ones you haven't uncommented take very little time.


/* ---[ Reading the error report output  ]--- */

Don't make the same mistake I did, spending lots of time going through the target/koan/reports/TESTS-TestSuites.xml output. I later found that a nice html report is provided one more directory down.  Open target/koan/reports/output/index.html in your browser and refresh it after each run - this is very nice!

So the cycle is:
1. Edit the Koan test until you are ready to run (uncomment that one's @Test annotation)
2. Run: ant | grep -C 4 Koan0[34]  (modify the numbers as needed)
3. Refresh target/koan/reports/output/index.html in your browser to see the results.

Note also that if you debug by printing to stdout, there is a link on the index.html output to view it - use that as needed.

Finally, I wrote a little helper class that will print out all the properties and values of a Node - this is helpful in debugging. Here is the code:


package org.neo4j.tutorial;

import org.neo4j.graphdb.Node;

/**
 * Debugging helper: pretty-prints all of a Node's properties and values.
 */
public class NodePP {
  public static String pp(Node n) {
    String s = "Node: ";
    for (String k: n.getPropertyKeys()) {
      s += String.format("\n  %s: %s", k, n.getProperty(k));
    }
    return s;
  }
}


Complaints aside, I'm currently finishing Koan08c and I recommend the koans as a good way to learn neo4j and to think in terms of graphs. And so far, I really like the Cypher query language...


[Dec 2011 Update:] I've finished them all now. Since I only broke down and cheated once by looking at the pre-filled-in version (Koan11 was a doozy), it took me a while, but I feel like I have a very good sense of how to use neo4j now.  The koans provide great coverage of the approaches to using the database.


Neo4j is a very promising database and I think it is a serious player in the NoSQL space. The next question for me is how to use it with a domain model of POJOs. I see four options:
  1. Serialize/deserialize POJOs to/from JSON and use the neo4j REST API. This will be less performant, but it is the option if you are using a standalone database server.
  2. Make your POJOs Neo-aware - have them wrap Nodes and Relationships and keep their attributes in neo Node/Relationship properties. This obviously tightly couples your domain to your persistence layer (see the sketch after this list).
  3. Use CQL (the Cypher query language) like you would use SQL when not using an ORM. Cypher is very nice and well thought out. I wonder how hard it would be to construct a MyBatis-like mapper between Cypher and your POJOs.
  4. Use the Spring Data Neo4j annotation bindings for neo4j. This looks promising. I've started looking at it, but I have no strong opinion yet. They do say that it will be less performant than using the neo4j API directly (such as the Traversal API), as there is metaprogramming (Java Reflection API usage) going on.
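
As a tiny sketch of what option 2 could look like (the Character class and the property name are made up, and error/transaction handling is omitted):

import org.neo4j.graphdb.Node;

// Hypothetical "Neo-aware" domain object: its state lives entirely in the
// wrapped Node's properties, which couples the domain model to the
// persistence layer - the trade-off noted in option 2 above.
public class Character {
  private static final String NAME = "character";

  private final Node underlyingNode;

  public Character(Node node) {
    this.underlyingNode = node;
  }

  public String getName() {
    return (String) underlyingNode.getProperty(NAME, null);
  }

  public void setName(String name) {
    underlyingNode.setProperty(NAME, name);   // must run inside a transaction
  }

  public Node getUnderlyingNode() {
    return underlyingNode;
  }
}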