Sunday, April 8, 2012

Git as RCS: The Joy of Personal Incremental Savepoints


/*---[ Distributed Version Control: The Best of Both Worlds ]---*/


"The enjoyment of one's tools is an essential ingredient of successful work."
— Donald E. Knuth

Not long after I first learned to program and gained enough proficiency with emacs to start to enjoy it, the person who was closest to what I would call a mentor told me that my next step was to learn to use a version control system. Having gotten me started as an emacs user, he suggested that I use RCS: Revision Control System from GNU.

RCS was developed in the 1980s. It is "local-only" - it has no concept of a central server. It also works on single files, not a project of files. It's been a while since I used RCS and I never really deeply mastered it, but I used it as a tool to make sure I had "version backups" of important files that I was working on. I never used it as part of a team.

If I have my history right, CVS, in the open source world, was the next generation version control system after RCS. CVS built a model where you could centrally share version controlled files with members of a team and think in terms of projects, not just individual files. Of course I'm ignoring lots of other version control systems, mostly proprietary. I'm just giving a little history based on my own personal history, as the next thing I learned after RCS was CVS (and then a very brief descent into ClearCase and then SVN and now git).

Nevertheless, with RCS I remember well a sort of thrill at the idea of having this local, personal manager of my history of edits. It could be used as a set of savepoints on the way to completing a program. CVS always felt more heavyweight (which sounds insane compared to ClearCase and some other tools), because it wasn't local. I couldn't put my finger on why back then, but now that I've used git I have achieved the next rung on the path to software guru enlightenment (... is there a special badge for that on coderwall?).

Git has been my first experience with a distributed version control system. Soon after trying it, the primary emotion I experienced was that initial joy of my brief RCS-days. I have this wonderfully powerful intelligent version control system and I can run it completely locally: standalone, off-the-network doing all the things RCS used to do and far far more. All my personal savepoints are back. I can use it as a local lever to help try new things, back out of messes and alter history as appropriate before I push it to some public or shared repo.

This is not possible with the CVS/SVN-type model. There everything you commit is immediately shared. Worse, if your team has a rule that you can't publish to the repo partial code that breaks the build, you can't do any local savepoints. It's analogous to being in a video game where you can only save your progress after you finish some level, rather than incrementally at any point you choose. Perhaps that's an intended extra challenge of some video games, but we need to have our version control system make software development easier, not harder.


/*---[ Lots of small savepoints that git calls commits ]---*/

If you come from the mindset of a non-distributed version control system, like SVN, a commit is a serious thing. You are publicly pushing your changes to a central repo off of which (ideally) some automated Continuous Integration tool pulls your changes and runs unit tests, static code analysis tools, perhaps functional tests and even performance tests. You'd better have your act together when you commit or there will be finger-pointing!

A commit in git is far less ominous. It just means I'm saving my work to my local repo, just like saving my progress in that video game. The git equivalent of an SVN-commit is a push. This is the beauty of git. Purists would disagree, but if you are learning git, it can be helpful to think of it as both RCS and CVS combined together with a much more intelligent tree (DAG) based view of commits.
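
To make the commit-versus-push distinction concrete, here is a minimal sketch. It assumes a throwaway working repo plus a local bare repo standing in for the shared server; the paths, file name and identity are made up for illustration.

```shell
set -eu
work=$(mktemp -d)                          # scratch area; every path below is hypothetical
git init -q --bare "$work/shared.git"      # stands in for the central/shared repo
git init -q "$work/mine"
cd "$work/mine"
git config user.name "Demo"                # made-up identity, local to this repo
git config user.email "demo@example.com"
git remote add origin "$work/shared.git"
branch=$(git symbolic-ref --short HEAD)    # whatever the default branch is called here

echo "body { margin: 0; }" > style.css
git add style.css
git commit -q -m "Savepoint: reset body margin"    # local only; nobody else sees this

# Count the refs the shared repo knows about before and after the push.
shared_before=$(( $(git ls-remote origin | wc -l) ))
git push -q origin "$branch"                       # this is the "SVN-style commit"
shared_after=$(( $(git ls-remote origin | wc -l) ))

echo "refs on shared repo: before=$shared_before after=$shared_after"
```

Until the push runs, the shared repo has no idea the savepoint exists.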


/*---[ Using git rebase, amend and reset to alter local history ]---*/

I struggle with CSS. In fact, one of my all-time favorite blog posts is from Zed Shaw on the joys he experiences with CSS. For fun, I'll quote my favorite passage (with a bit of editing of Zed's flowery language):

My first problem with CSS is simply that it just never does what you tell it to. I say, "make this a column CSS" and it goes, "What? No that should go over here totally on the left and [deleted] you I like apples." I say, "make this fill all of the parent div" and CSS says, "Sharks love tiny needles, and no that will only take up the top part." I say, "Hey, CENTER THIS" and CSS says, "My shoes have centered worms but your heading will stay to the left."

In the battle with CSS, git is your friend. Any time you achieve a small success and get what you want, commit it to your local repo and put in lots of exclamation points of celebration or cursing in the commit message, because later you can alter this history (more on that later). Now with the save point, you can start the next salvo against CSS. At some point, you may be in so far over your head that the only sound tactic is full-scale retreat. git reset is your path back to sanity. Or you can make a branch and try some insane experiment that just might work. Branches are cheap and easy in git.

Another great value of doing lots of little commits is that not only is backing out with a reset not a big deal, but you can do a git diff to see only the very few changes you've recently made and quickly grok the problem. This can help you decide whether to trudge on and change what you have or back out entirely.
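
As a rough sketch of that "what did I just change?" check, the following assumes a scratch repo with one made-up stylesheet. With nothing staged, git diff shows only what has changed since the last savepoint.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repo; names are hypothetical
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'header { width: 600px; }\n' > style.css
git add style.css
git commit -q -m "Intmd commit: header width good"

# A few minutes of fiddling since the last savepoint...
printf 'header { width: 600px; }\nheader { float: left; }\n' > style.css

git --no-pager diff                        # working directory vs. index: only the recent fiddling
changed=$(git diff --numstat | cut -f1)    # first numstat column = lines added
echo "lines added since last savepoint: $changed"
```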

In the heat of battle, you might make 20 commits. That's going to look a little messy and honestly just a bunch of repo noise when you finally push that out. git commit --amend and git rebase stand ready to save the day again.
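
Since git commit --amend does not get its own walkthrough in the tutorial, here is a small sketch of it. It assumes a scratch repo and a made-up file; the point is only that the last savepoint is replaced rather than followed by another one.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repo; names are hypothetical
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "h1 { color: red; }" > style.css
git add style.css
git commit -q -m "Intmd commit: heading color"

# Forgot a tweak: fold it into the commit I just made instead of adding a new one.
echo "h1 { color: darkred; }" > style.css
git add style.css
git commit -q --amend -m "Heading color settled on dark red"

git --no-pager log --oneline               # still a single commit, with the new message
```

Like rebase, amend rewrites the commit, so it belongs on commits that have not been pushed yet.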


/*---[ A short tutorial on using git "RCS-style" ]---*/

If you haven't used git or these features of git, this might sound intriguing, but how do I actually do all that? Below I present a tutorial to illustrate. For this I assume you have a basic understanding of how git works. If you are new to git or want more information there are a plethora of good git books and web sites. Start with these:

To start, I have a very simple git repo with an initial commit of my html and css file:

$ tree
.
|-- css
|   |-- style.css
|-- main.html

$ git log
commit 1fdf8ad40a67140803eea81effaee4a4fabf29f6
Author: Michael <blahblah@gmail.com>
Date:   Sun Apr 1 19:50:23 2012 -0400

    Init. CSS Design in Progress

Now, I open main.html in the browser and style.css in emacs. Fiddle fiddle fiddle. The changes work reasonably well and I commit after only a few minutes.

$ git add .
$ git commit -m "Intmd commit: Changed width of main divs to 600."
$ git log    # just to show you the messages
commit 27fcb7fef6a2ff007f9ba313db9574d7d0036be9
Author: Michael <blahblah@gmail.com>
Date:   Sun Apr 1 19:52:35 2012 -0400

    Intmd commit: Changed width of main divs to 600.

commit 1fdf8ad40a67140803eea81effaee4a4fabf29f6
Author: Michael <blahblah@gmail.com>
Date:   Sun Apr 1 19:50:23 2012 -0400

    Init. CSS Design in Progress

If I know that I'm going to subsume this commit into one larger commit, then the commit message will be changed later. It's always smart to say what and why you did what you did, but I also often put in "intermediate commit" so I can see that I didn't intend on keeping this little savepoint.

So I continue and make three more commits over the next few minutes. Here's the log after that:

$ git log    
commit 4e4666204e2a82c257e1fdcc5bb5ab5abdf4ca56
Author: Michael <blahblah@gmail.com>
Date:   Sun Apr 1 19:54:38 2012 -0400

    Intmd commit: header: has bottom border, padding decent now.

commit ac05681b6baff0ec3442f0880a00e31f98439241
Author: Michael <blahblah@gmail.com>
Date:   Sun Apr 1 19:53:56 2012 -0400

    Intmd commit: Header region: so far width + height good.

    Set background color to white - may want to adjust later.

commit 27fcb7fef6a2ff007f9ba313db9574d7d0036be9
Author: Michael <blahblah@gmail.com>
Date:   Sun Apr 1 19:52:35 2012 -0400

    Intmd commit: Changed width of main divs to 600.

commit 1fdf8ad40a67140803eea81effaee4a4fabf29f6
Author: Michael <blahblah@gmail.com>
Date:   Sun Apr 1 19:50:23 2012 -0400

    Init. CSS Design in Progress

Now that the basic layout is done, I start the hard stuff. One smart move here would be to make a new branch and futz around on that. If I don't like it, just move back to the main ("master") branch and delete the branch that didn't work. But since this isn't a tutorial on git branching, I'll stick to the master branch.
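
For completeness, the branch-and-discard approach I just skipped might look roughly like this. The branch name css-experiment and the file contents are invented for the sketch.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repo; names are hypothetical
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "body { width: 600px; }" > style.css
git add style.css
git commit -q -m "Init"
main_branch=$(git symbolic-ref --short HEAD)   # "master" in this post; may differ on your setup

git checkout -q -b css-experiment              # hypothetical branch name
echo "body { width: 100%; float: left; }" > style.css
git commit -q -am "Intmd commit: wild float experiment"

# It did not work out: go back to the main line and drop the experiment.
git checkout -q "$main_branch"
git branch -D css-experiment                   # -D because the branch was never merged
cat style.css
```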

So on master I get to it. [Work, work, curse, futz, work ~~CSS pain~~]. OK, this isn't working. Let's dump this and get back to where we were. For that we will use git reset. There are actually three flavors of git reset, just to make life interesting.

The three flavors are:

  1. git reset --soft
  2. git reset (same as git reset --mixed)
  3. git reset --hard

Which one do we want? To explain the difference takes a full blog post on its own. You have to understand all the "trees" in git of which there are three (actually there are three locally and at least a fourth one if you have a remote, often centralized, repo).

But here is my simplified explanation. After reading Scott Chacon's blog (the one above) and reading some of the comments to the post, I have set up the following aliases for reset:

uncommit = reset --soft HEAD~
unstage = reset HEAD

git uncommit (my alias for git reset --soft HEAD~) means to move HEAD from pointing at the last commit I did, to its parent (the commit before it). Thus, "undo the last commit to the repo" or "uncommit".

git unstage (my alias for git reset HEAD) means to make the files in the index (staging area) look like what was in the last commit. So if I've done git add to add new changes/files to the index and then I run git unstage, those changes are removed from the staging area and the index now looks like the previous commit (which is what HEAD points to).

Neither of the above commands affects my working directory. To do that I have to issue git reset --hard, which I leave unaliased, since the "hard" should remind me that I am about to overwrite my working directory with the files from HEAD (the last commit). (git reset --hard is the same as git reset --hard HEAD: HEAD is assumed if you don't provide an argument to reset.)

Only git reset --hard is destructive and will cause you to lose work, since it will overwrite your working directory, so use this command carefully.
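
Here is a small sketch that walks one change back through all three flavors in turn. It assumes a scratch repo and defines the two aliases locally to that repo (use git config --global to keep them around); the file contents are made up.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repo; names are hypothetical
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config alias.uncommit 'reset --soft HEAD~'   # repo-local for this sketch
git config alias.unstage  'reset HEAD'

echo "v1" > style.css; git add style.css; git commit -q -m "first"
echo "v2" > style.css; git add style.css; git commit -q -m "second"

git uncommit          # soft: HEAD moves back to "first"; v2 is still staged
git status -s         # style.css is listed as a staged modification
git unstage           # mixed: index now matches HEAD; v2 lives only in the working directory
git status -s         # style.css is listed as an unstaged modification
git reset --hard      # hard: working directory overwritten from HEAD, so v2 is gone
cat style.css
```

Each step peels the change off one more of the three local trees: first the commit history, then the index, then the working directory.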

So, which one do I want here? In my case working with the CSS, since I've been keeping savepoints every few minutes, throwing away the last few minutes of non-productive work is a blessing. I can always go back to the previous savepoint, just like in a video game where if my player dies or loses too many energy points trying to get past some hurdle, I can just hit the reset button and try again.

So we are going to hit the reset button "hard":

$ git status -s
M css/style.css
$ git reset --hard
HEAD is now at 4e46662 Intmd commit: header: has bottom border, padding decent now.
$ git status -s

The second time I run git status -s it shows no output because the working directory has been reset to look like the previous commit.

In this case, because I was only dealing with one file and I had not yet staged it with git add ., I could also have run git checkout -- css/style.css, which checks out the version of style.css held in the index (identical to the last commit here, since nothing was staged). That's the key thing to remember about git checkout when you give it a path and no commit: it pulls from the index, not the commit repo.
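
The index-versus-commit difference only shows up once something is staged, so here is a sketch that stages one version and then scribbles over it. It assumes a scratch repo; the file contents are placeholders.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repo; names are hypothetical
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "committed" > style.css; git add style.css; git commit -q -m "Init"
echo "staged"  > style.css; git add style.css    # the index now holds "staged"
echo "scratch" > style.css                       # the working copy drifts further

git checkout -- style.css                  # path only: restore from the index
from_index=$(cat style.css)
echo "after checkout -- : $from_index"

git checkout HEAD -- style.css             # naming a commit restores from it (and resets the index entry)
from_head=$(cat style.css)
echo "after checkout HEAD -- : $from_head"
```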


/*---[ Rolling up lots of little commits ]---*/

So now, suppose after an hour of toil and labor with CSS, I have something close to what I hoped for. In the process I've done 9 commits, with lots of "Intmd commit" prefixes in the comments. I want to remove those little commits and roll them into one big commit before I push them to the remote repo and make them "public" (or at least "shared").

Here's an abbreviated view of all 9 commits using:

$ git reflog
74f7e9a HEAD@{0}: commit: Like how it looks now. Ship it!
14efb50 HEAD@{1}: commit: Much pain, but progress: header.css along with adjustments to main.html.
02ed6fc HEAD@{2}: commit: Intmd: header is shaping up. More to do.
85e8dcb HEAD@{3}: commit: Split out header styles from style.css into header.css.
3ab3101 HEAD@{4}: commit: Intmd commit: Got buttons styled well now.
4e46662 HEAD@{5}: commit: Intmd commit: header: has bottom border, padding decent now.
ac05681 HEAD@{6}: commit: Intmd commit: Header region: so far width + height good.
27fcb7f HEAD@{7}: commit: Intmd commit: Changed width of main divs to 600.
1fdf8ad HEAD@{8}: commit (initial): Init. CSS Design in Progress

I'd like to roll up the last 8 commits, so I do:

$ git rebase -i HEAD~8

or I can specify a particular SHA-1 commit id. Here I choose the SHA of the initial commit:

$ git rebase -i 1fdf8ad

I asked for an interactive rebase (-i), so it opens an editor with all the commit messages from those 8 commits:

pick 27fcb7f Intmd commit: Changed width of main divs to 600.
pick ac05681 Intmd commit: Header region: so far width + height good.
pick 4e46662 Intmd commit: header: has bottom border, padding decent now.
pick 3ab3101 Intmd commit: Got buttons styled well now.
pick 85e8dcb Split out header styles from style.css into header.css.
pick 02ed6fc Intmd: header is shaping up. More to do.
pick 14efb50 Much pain, but progress: header.css along with adjustments to main.html.
pick 74f7e9a Like how it looks now. Ship it!

# Rebase 1fdf8ad..74f7e9a onto 1fdf8ad
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.

What this is asking is for you to edit the commands in front of each commit. To merge (roll up) an intermediate commit into the previous one, change it to "squash". The key to making this work is to squash all the commits except the first one listed: squash melds a commit into the one before it, and the first commit in the list has nothing before it to meld into. If you squash them all, you will not get what you expect. Here is the rebase editor screen after I squash ("s") all but the first one listed:

pick 27fcb7f Intmd commit: Changed width of main divs to 600.
s ac05681 Intmd commit: Header region: so far width + height good.
s 4e46662 Intmd commit: header: has bottom border, padding decent now.
s 3ab3101 Intmd commit: Got buttons styled well now.
s 85e8dcb Split out header styles from style.css into header.css.
s 02ed6fc Intmd: header is shaping up. More to do.
s 14efb50 Much pain, but progress: header.css along with adjustments to main.html.
s 74f7e9a Like how it looks now. Ship it!

# Rebase 1fdf8ad..74f7e9a onto 1fdf8ad

Now I save. Git does the rebase and then pops up another editor window that says:

# This is a combination of 8 commits.
# The first commit's message is:
Intmd commit: Changed width of main divs to 600.

# This is the 2nd commit message:

Intmd commit: Header region: so far width + height good.

Set background color to white - may want to adjust later.

# This is the 3rd commit message:

Intmd commit: header: has bottom border, padding decent now.

# This is the 4th commit message:

Intmd commit: Got buttons styled well now.

# This is the 5th commit message:

Split out header styles from style.css into header.css.

# This is the 6th commit message:

Intmd: header is shaping up. More to do.

# This is the 7th commit message:

Much pain, but progress: header.css along with adjustments to main.html.

# This is the 8th commit message:

Like how it looks now. Ship it!

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# Not currently on any branch.
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   new file:   css/header.css
#   modified:   css/style.css
#   modified:   main.html

This is your chance to modify the commit message for the one "squashed" single commit you just created. Here's how I edited the commit message:

Like how it looks now. Ship it!

Header region: so far width + height good.
Set background color to white - may want to adjust later.
header: has bottom border, padding decent now.
Got buttons styled well now.
Split out header styles from style.css into header.css.

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# Not currently on any branch.
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   new file:   css/header.css
#   modified:   css/style.css
#   modified:   main.html

After the rebase finishes I get the message:

 3 files changed, 19 insertions(+), 3 deletions(-)
 create mode 100644 css/header.css
Successfully rebased and updated refs/heads/master.

Now when I run git log, you see that the savepoints I made have all been squashed into one master commit:

$ git log
commit f52b1c2ba8543c999ba00e862af829e440f0b027
Author: Michael <blahblah@gmail.com>
Date:   Sun Apr 1 19:52:35 2012 -0400

    Like how it looks now. Ship it!

    Header region: so far width + height good.
    Set background color to white - may want to adjust later.
    header: has bottom border, padding decent now.
    Got buttons styled well now.
    Split out header styles from style.css into header.css.

commit 1fdf8ad40a67140803eea81effaee4a4fabf29f6
Author: Michael <blahblah@gmail.com>
Date:   Sun Apr 1 19:50:23 2012 -0400

    Init. CSS Design in Progress

Now I'm ready to push it to the central or community repo (such as GitHub).
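
If you want to replay the whole squash without typing into an editor, here is a scripted sketch of the same steps. It is an approximation of the session above, not a transcript of it. It assumes a scratch repo with three made-up savepoints, uses the GIT_SEQUENCE_EDITOR environment variable to let sed edit the rebase todo list, and sets GIT_EDITOR=true so the combined commit message is accepted as-is.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repo; names are hypothetical
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "init" > style.css
git add style.css
git commit -q -m "Init. CSS Design in Progress"
base=$(git rev-parse HEAD)                 # the commit to rebase onto

for n in 1 2 3; do
  echo "tweak $n" >> style.css
  git commit -q -am "Intmd commit: tweak $n"
done

# Keep "pick" on the first todo line and turn every later "pick" into "squash",
# then accept the combined commit message unchanged.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/squash/'" \
GIT_EDITOR=true \
git rebase -i "$base"

git --no-pager log --oneline               # the initial commit plus one squashed commit
```

In real use I would still do this by hand, since the whole point of the second editor screen is to write a decent final commit message.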


/*---[ Caveats ]---*/

A caution about rebase - it is very powerful and used for more things than shown here. If you do it wrong, you can create a bit of a mess to clean up (but you can always recover, just don't panic). But there are two things to be very careful of:

  1. Don't delete commits from the rebase message when doing interactive rebase. Note the message in the editor during an interactive rebase that says "# If you remove a line here THAT COMMIT WILL BE LOST."
  2. Do not rebase past where you have already pushed to the central or community repo. If I've done 10 commits, and only the last 2 are local, then don't rebase farther back than the last two commits.
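
One way to check the second point before rebasing is to ask git which commits the upstream branch does not have yet. This sketch assumes a local bare repo playing the shared server, and a branch whose upstream was recorded with push -u; the paths and names are made up.

```shell
set -eu
work=$(mktemp -d)                          # scratch area; every path below is hypothetical
git init -q --bare "$work/shared.git"      # stands in for the central/community repo
git init -q "$work/mine"
cd "$work/mine"
git config user.name "Demo"
git config user.email "demo@example.com"
git remote add origin "$work/shared.git"
branch=$(git symbolic-ref --short HEAD)

echo "a" > style.css; git add style.css; git commit -q -m "Published work"
git push -q -u origin "$branch"            # -u records origin/$branch as the upstream

echo "b" >> style.css; git commit -q -am "Intmd commit: local only 1"
echo "c" >> style.css; git commit -q -am "Intmd commit: local only 2"

# Commits on my branch that the upstream lacks: only these are safe to rewrite.
git --no-pager log --oneline "@{u}..HEAD"
unpushed=$(git rev-list --count "@{u}..HEAD")
echo "unpushed commits: $unpushed"
```

With an upstream configured, git rebase -i "@{u}" limits the todo list to exactly those unpushed commits.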

Lastly, I recommend you read the Rewriting History chapter of Pro Git in order to be clear on what you are doing with rebase.
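
As for the "you can always recover" remark, it leans on the reflog, which keeps a record of where HEAD has pointed. Here is a minimal sketch, assuming a scratch repo and made-up savepoints, that throws committed work away with a hard reset and then gets it back.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway repo; names are hypothetical
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

for n in 1 2 3; do
  echo "savepoint $n" >> style.css
  git add style.css
  git commit -q -m "Intmd commit: savepoint $n"
done

git reset -q --hard HEAD~2                 # oops: two commits just vanished from the log
git --no-pager reflog                      # ...but every position HEAD has held is still listed

# HEAD@{1} is where HEAD pointed just before that reset.
git reset -q --hard "HEAD@{1}"
git --no-pager log --oneline
```

This only rescues work that was committed at some point. Uncommitted edits overwritten by git reset --hard are not in the reflog.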

Once you get comfortable with it, rebase gives you freedom to use git with any level of mini-savepoints you want and rewrite your local history to reduce repo noise. Here is a good example of where a version control system is giving you freedom, rather than restricting it.

[Update: 08-Apr-2012]: On Hacker News today there is a link to an article about the "many faces of git rebase". Worth reading in combination with my post to get a feel for how rebase can be used.