Wednesday, November 30, 2011

Neo4j Koans - How do I begin?

Koans have become a smart way for people to learn a new language or technology.  I've seen them in the Ruby space with the ruby koans: http://rubykoans.com/, but I've never actually done one before.  They are set up as a set of failing unit tests and one by one you have to fix each test to gain enlightenment.

I recently became very interested in the Neo4j graph database: http://neo4j.org/, as I'm working on data structures that are hierarchical and tricky to walk in a relational database.

I came across an interesting video presentation on neo4j by Ian Robinson on the skillsmatter website - http://skillsmatter.com/podcast/design-architecture/neo4j. He and colleagues have put together a database of facts from the Doctor Who British TV series. That's when I became aware that he, Jim Webber and some other colleagues created a set of Neo4j koans available on github: https://github.com/jimwebber/neo4j-tutorial/.

So I decided to tackle my first ever set of koans.


Soon after, I discovered that while Jim Webber and crew put in a lot of time creating the neo4j koan tutorial, they seem to have put in little time on explaining how a newbie to koans can actually run them - at least as of November 2011 when I downloaded it. I don't understand why you would put hours into it and not put a few minutes into documenting what to do. Please provide a README on how to do things like this. If you go to all the effort of creating it, why wouldn't you?

[Dec 2011 Update: There is now good starter info on the neo4j koan github site, so many thanks to Jim et al. for improving it!]

So in case it helps anyone else, here is my version of how I fumbled around to do the neo4j koans.


/* ---[ Getting the Koans Installed and Set up ]--- */

These instructions are targeted particularly for a Unix/Linux environment. My koan work was done on Linux (Xubuntu 11.10). I also tested them on Windows where I have Mingw32 that comes with the git download (and I have cygwin).

[Dec 2011 Update:  The neo4j koan github site now says it will work with cygwin on Windows - they actually say "sorry" about this, but being a Unix/Linux devotee my advice is that if you don't have cygwin yet, this is a good reason to get it and learn it. Consider it a bonus to learn a better way of doing things.]

I'm also primarily a command line kind of guy, though I do also use Eclipse for many things. In this case, I went commando - so I didn't use Eclipse's JUnit integration or download the neoclipse plug in (which sounds cool -- need to try it someday).

First download it from github (URL above). Be aware that the download is around 350 MB, so it may take a while if you have a lower-speed internet connection like me.

Second, cd to the main directory (neo4j-tutorial) and type ant - this will run ivy and download half the known universe in good ivy/maven fashion (grrr...). After waiting about an hour (or less if you have a better internet connection than me), you can begin by wondering what to do next. First make sure the build ran to successful completion - happily the koans as unit tests all passed out of the box for me, so you want to make sure that is all working on your system before beginning.

The authors provide a presentation in the presentation directory (for some reason in ppt and not converted to pdf for more general viewing), which can be helpful, but wasn't enough to really know how to do the koans. I recommend coming back to the presentation periodically and reviewing it for the section you are working on.  Some of its visuals and notes are helpful, but mostly you'll just need to read the codebase and neo4j javadocs to really know how to get things done.

Next run the tree command to get a look around (you may need to download tree - it's a great Unix command line tool to see files/dirs in a compact tree structure).

You'll see that the koans are in the src directory (output from the tree cmd):
├── src
│   ├── koan
│   │   └── java
│   │       └── org
│   │           └── neo4j
│   │               └── tutorial
│   │                   ├── Koan01.java
│   │                   ├── Koan02.java
│   │                   ├── Koan03.java
│   │                   ├── Koan04.java
│   │                   ├── Koan05.java
│   │                   ├── Koan06.java
│   │                   ├── Koan07.java
│   │                   ├── Koan08a.java
│   │                   ├── Koan08b.java
│   │                   ├── Koan08c.java
│   │                   ├── Koan09.java
│   │                   ├── Koan10.java
│   │                   ├── Koan11.java

and if you look into them, you'll see comments and snippet sections that say "your code here", but they come pre-filled in with the answers.

However, if you go into the src/main/scripts directory you'll notice a "release.sh" script, which extracts the relevant portion of the koans from the current github dir, copies it to a temp directory and runs the remove_snippets script. However, I couldn't get it to work after 15 minutes of futzing with it and the documentation for it is basically useless.

[Dec 2011 Update: I tried it again and it works now for me. Either I did something wrong the first time or Jim Webber tweaked it.]


There is also a remove_snippets.sh.  You can run that directly - you are supposed to do it from the top-level dir, not the scripts directory. Either way I got an error message.  But it does work if you run it from the top level dir, despite the error message.  Here's what I got:

$ src/main/scripts/remove_snippets.sh
sed: can't read : No such file or directory
$ git st
 M src/koan/java/org/neo4j/tutorial/Koan02.java
 M src/koan/java/org/neo4j/tutorial/Koan03.java
 M src/koan/java/org/neo4j/tutorial/Koan04.java
 M src/koan/java/org/neo4j/tutorial/Koan05.java
 M src/koan/java/org/neo4j/tutorial/Koan06.java
 M src/koan/java/org/neo4j/tutorial/Koan07.java
 M src/koan/java/org/neo4j/tutorial/Koan08a.java
 M src/koan/java/org/neo4j/tutorial/Koan08b.java
 M src/koan/java/org/neo4j/tutorial/Koan08c.java
 M src/koan/java/org/neo4j/tutorial/Koan09.java
 M src/koan/java/org/neo4j/tutorial/Koan10.java
 M src/koan/java/org/neo4j/tutorial/Koan11.java

(git st is aliased to git status -s on my machine)

So it did modify the Koans and remove the parts I'm supposed to fill in.

After doing this, I recommend doing git reset --hard to get back the filled in koans. Copy them to another directory, so you can peek at them when you are doing the koans in case you get stuck or want to compare your solution with the official one.

Then run the remove_snippets.sh script again and do a git add and git commit. Now we are ready to start the koans (whew!).
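
The backup-then-strip steps above can be collected into one shell function. This is only a minimal sketch under stated assumptions: prepare_koans is a hypothetical helper name, ../koan-solutions is a hypothetical backup location, and the commit message is made up. Be aware that git reset --hard throws away any uncommitted changes in the working tree.

```shell
# Hypothetical helper: save the solved koans, then commit the stripped ones.
# Run it from the top-level neo4j-tutorial directory.
# $1 = backup directory (assumed default: ../koan-solutions)
prepare_koans() {
  backup_dir="${1:-../koan-solutions}"

  # Restore the pre-filled koans (discards uncommitted changes!).
  git reset --hard || return 1

  # Keep a copy of the official solutions to peek at later.
  mkdir -p "$backup_dir" || return 1
  cp -R src/koan/java/. "$backup_dir"/ || return 1

  # The script printed a sed error for the author but still did its job,
  # so its exit status is deliberately not checked here.
  src/main/scripts/remove_snippets.sh

  # Commit the stripped koans as the starting point.
  git add src/koan/java
  git commit -m "Strip koan snippets"
}
```

Typical use would be `prepare_koans ~/koan-solutions` from the top-level directory; afterwards git status should report a clean tree with the stripped koans committed.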


/* ---[ Doing the Koans  ]--- */

The koans are unit tests. After you run the remove_snippets script, all the koan unit tests will fail (except for Koan01, which for some reason has no snippet for you to replace - it is just a reading koan, not a doing koan, I guess).

You need to fix the koans one by one and get each test passing. Unfortunately, I couldn't find a way to run each test separately, so you have to do the full battery, plus the annoying-as-hell ivy checks.  <rant>Speaking of which, you won't be able to run these tests while offline, even after you've downloaded everything via ivy.  This is my biggest complaint about the ivy/maven model.  I frequently want to work offline, so I curse setups that require everything to be done via ivy/maven.</rant>

One way you can sift through all the noise of the output is to run the unit tests like this:
$ ant | grep Koan11

Then you will just get the output of Koan11 (though you have to run everything unless you want to edit the Ant build.xml file), like this:

$ ant | grep Koan11
    [junit] Running org.neo4j.tutorial.Koan11
    [junit] TEST org.neo4j.tutorial.Koan11 FAILED
    [junit] Tests FAILED


BUILD FAILED
/home/midpeter444/java/projects/neo4j-koans/neo4j-tutorial/build.xml:68: Build failed due to Koan failures

I also found context lines and simple pattern matching to help, such as:
ant | grep -C 4 'Koan0[123]'
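
To see at a glance which koans still fail, the junit lines can be boiled down to just the class names. A minimal sketch, assuming the "TEST org.neo4j.tutorial.KoanNN FAILED" line format shown in the output above; failed_koans is a hypothetical helper name, not something that ships with the tutorial.

```shell
# Hypothetical helper: reads ant/junit output on stdin and prints the short
# name of every koan class reported as "TEST ... FAILED", one per line.
failed_koans() {
  sed -n 's/.*TEST org\.neo4j\.tutorial\.\(Koan[0-9][0-9]*[a-z]*\) FAILED.*/\1/p' | sort -u
}

# Usage (from the top-level neo4j-tutorial directory):
#   ant | failed_koans
```

Koans that pass produce no "FAILED" line, so they simply do not appear in the list.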



/* ---[ Only Running the Tests You Want  ]--- */

As I said, there is no target in the ant file (or any helper scripts) to only run one test/koan at a time. And each one takes many seconds, so the whole thing can take well over a minute (depending on machine speed).

To do the easiest thing that would work, I issued the following command (you could do it with sed if you don't have perl installed):

find src/koan/java -name 'Koan*.java' -print | xargs perl -pi -e "s/([@]Test)/\/\/\1/g"

It just comments out all the @Test annotations in the Koan files (the file pattern is quoted so the shell passes it to find unexpanded).  Run this once at the beginning. Then you can remove the comments from each test as you are working on them.  So the ant file still runs all the koans, but they don't take very long if you haven't uncommented them.
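
For anyone without perl, here is a minimal sed-based sketch of the same idea. The function names disable_all_tests and enable_tests are hypothetical. Unlike the Perl one-liner, this version only touches @Test annotations that start a line (after optional indentation), so running it twice does not stack up extra slashes. It writes to a temp file and moves it back rather than relying on sed -i, whose syntax differs between GNU and BSD sed.

```shell
# Hypothetical helper: comment out every @Test annotation in the Koan*.java
# files under the directory given as $1 (e.g. src/koan/java).
disable_all_tests() {
  find "$1" -name 'Koan*.java' -print | while IFS= read -r f; do
    sed 's|^\([[:space:]]*\)@Test|\1//@Test|' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}

# Hypothetical helper: uncomment the @Test annotations in the one koan file
# given as $1, so only that koan's tests do real work on the next ant run.
enable_tests() {
  sed 's|^\([[:space:]]*\)//@Test|\1@Test|' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage:
#   disable_all_tests src/koan/java
#   enable_tests src/koan/java/org/neo4j/tutorial/Koan02.java
```

Note that enable_tests turns on every test in the file at once; to go test by test, just delete the // by hand in your editor.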


/* ---[ Reading the error report output  ]--- */

Don't make the same mistake I did and spend lots of time going through the target/koan/reports/TESTS-TestSuites.xml output. I later found that a nice html report is provided one more directory down.  Open target/koan/reports/output/index.html in your browser and refresh after each test - this is very nice!

So the cycle is:
1. Edit the Koan test until you are ready to run (uncomment that one's @Test annotation)
2. Run: ant | grep -C 4 'Koan0[34]'  (modify the numbers as needed)
3. Refresh target/koan/reports/output/index.html in your browser.

Note also that if you debug by printing to stdout, there is a link on the index.html output to view it - use that as needed.

Finally, I wrote a little helper class that will print out all the properties and values of a Node - this is helpful in debugging. Here is the code:


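package org.neo4j.tutorial;

import org.neo4j.graphdb.Node;

/** Debugging helper: pretty-prints the properties of a Node. */
public class NodePP {
  /** Returns "Node: " followed by one indented "key: value" line per property. */
  public static String pp(Node n) {
    StringBuilder sb = new StringBuilder("Node: ");
    for (String k : n.getPropertyKeys()) {
      sb.append(String.format("\n  %s: %s", k, n.getProperty(k)));
    }
    return sb.toString();
  }
}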


Complaints aside, I'm currently finishing Koan08c and I recommend them as a good way to learn neo4j and think in terms of graphs. And so far, I really like the cypher query language....


[Dec 2011 Update:] I've finished them all now. Since I only broke down and cheated once by looking at the pre-filled in version (Koan11 was a doozy), it took me a while, but I feel like I have a very good sense of how to use neo4j now.  The koans provide great coverage of the approaches to using the database.


Neo4j is a very promising database and I think it is a serious player in the NoSQL space. The next question for me is how to use it with one's domain model in POJOs. I see four options:
  1. Serialize/deserialize POJOs to/from JSON and use the neo4j REST API. This will be less performant, but is the option if you are using a standalone database server.
  2. Make your POJOs Neo-aware - have them wrap Nodes and Relationships and keep their attributes in neo Node/Relationship properties. This obviously tightly couples your domain to your persistence layer.
  3. Use CQL (cypher query language) like you do SQL when not using an ORM. Cypher is very nice and well thought out. I wonder how hard it would be to construct a MyBatis-like mapper between Cypher and your POJOs.
  4. Use the Spring Data Neo4j annotation bindings for neo4j. This looks promising. I've started looking at it, but no strong opinion yet. They do say that it will be less performant than using the neo4j API directly (such as the Traversal API), as there is metaprogramming (Java Reflection API usage) going on.