Catching Up to the Conversation: Code Spelunking in Git

Code as a Conversation

Have you ever joined a conversation a little too late... and made the horrifying mistake of assuming you heard correctly what the conversation was about? You bounce up to a group of your friends, with a hilarious quip based on what you heard... then watch in horror as your joke bleeds on the floor. They weren't talking about that at all... nearly the opposite. There's almost no recovery. You have no choice. You just have to change your identity and move to a different state. Just me? Ok. Imagine my pain, and bear with me.

Code isn't static; it moves and breathes with product decisions, tech changes, and language shifts. It is the result of a conversation of commits and decisions that led up to this point.

We can make almost the same mistake in code that we can when talking with our friends... we make assumptions about definitions and their meaning, and assume that we fully understand the conversation. We often dive in without spending a little time listening to the conversation in the code and git history before we begin to speak.

I've spent most of my career staring at legacy code... and I was showing a friend of mine some of the tricks I've picked up over the years to help me deal with the complexities of joining the conversation in other people's code. I figured it'd be helpful to post a list.

git pickaxe (git log -S)

The git pickaxe is a tool I always forget... but it is incredibly useful. In short, the pickaxe (so called because of the resemblance of a -S to a pickaxe) will search through history for every commit whose diff adds or removes an occurrence of a string. This means you can read the CliffsNotes (of all the commits) that affected this string all the way back to the Dawn of Git Init. This can be a great way to learn who might be an expert, who touched it last, what else it touches and where... useful stuff. It's also handy for finding renames, although that requires a little more work.
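Here's a minimal sketch of the pickaxe at work, assuming a throwaway repository. The file name billing.txt and the string legacy_discount are made up for illustration.

```shell
#!/bin/sh
set -e

# Build a scratch repository so the example is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"

# Commit 1 introduces the (hypothetical) string "legacy_discount".
echo "price = legacy_discount(price)" > billing.txt
git add billing.txt
git commit -q -m "Add legacy discount"

# Commit 2 edits the same file but leaves the string alone.
echo "total = price * qty" >> billing.txt
git commit -q -am "Compute total"

# Commit 3 removes the string.
echo "total = price * qty" > billing.txt
git commit -q -am "Drop legacy discount"

# -S reports only the commits where the number of occurrences changed.
pickaxe_hits=$(git log -S"legacy_discount" --format=%s)
echo "$pickaxe_hits"
```

The "Compute total" commit touched the same file, but it never shows up in the results because it didn't add or remove the string. Add -p to the git log call if you want to read the matching diffs as well.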

git follow (git log --follow)

A feature I learned about recently is that git log has a --follow option that will follow file renames and moves. I used to follow history back in time until a commit that was clearly a rename/refactor, and then use git log -- <filename> to chase the original filename further back in time... now I can just use git log --follow <filename>, which is a lot easier. Note that --follow only works when you give it a single file.
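To see the difference, here is a small sketch in a scratch repository. The file names old_name.txt and new_name.txt are placeholders for illustration.

```shell
#!/bin/sh
set -e

# Build a scratch repository so the example is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"

# Create a file, then rename it in a second commit.
printf 'line one\nline two\nline three\n' > old_name.txt
git add old_name.txt
git commit -q -m "Add file under old name"

git mv old_name.txt new_name.txt
git commit -q -m "Rename file"

# Without --follow, the history of new_name.txt stops at the rename.
plain_log=$(git log --format=%s -- new_name.txt)

# With --follow, git keeps going through the rename to the old name.
follow_log=$(git log --follow --format=%s -- new_name.txt)

echo "plain:"
echo "$plain_log"
echo "follow:"
echo "$follow_log"
```

The plain log lists only the rename commit, while the --follow log also includes the commit that created the file under its old name.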

git blame

git blame is my favorite tool for listening in to the conversation that happened on this source code before you. You can use it to walk backwards and forwards in time to understand a little more about what is going on. You may not discover why the code you are looking at is broken, but you may learn how it came to be in this state. At its lowest level, git blame shows you each line in the file along with the abbreviated commit SHA-1, author, and date of the commit that last changed it. At the very least, git blame allows you to recognize who has been part of the conversation most recently. It can be very helpful to know who touched this part of the code last, and how long ago that change occurred.
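As a rough sketch of what that looks like on the command line, here is a scratch repository with two made-up authors, Alice and Bob, and a hypothetical file called notes.txt. The --line-porcelain flag prints the full commit details for every line, which makes it easy to pull out just the author names.

```shell
#!/bin/sh
set -e

# Build a scratch repository so the example is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"

# Alice (a made-up author) writes both lines.
printf 'alpha\nbeta\n' > notes.txt
git add notes.txt
git commit -q --author="Alice <alice@example.com>" -m "Add notes"

# Bob (also made up) later changes only the second line.
printf 'alpha\nbeta, revised\n' > notes.txt
git commit -q -a --author="Bob <bob@example.com>" -m "Revise beta"

# Human-readable view: one row per line with SHA, author, and date.
git blame notes.txt

# Machine-readable view, reduced to the author of each line.
blame_authors=$(git blame --line-porcelain notes.txt | grep '^author ')
echo "$blame_authors"
```

The first line is still attributed to Alice and the second to Bob, even though Bob's commit is the most recent one to touch the file.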

Where git blame really becomes useful for me is when it is given super powers by some other tool for extra-magic code exploration.

GitHub's blame view lets you walk backwards as well... there's a little icon that looks like layers that allows you to "reblame" the file at that point in history. I use this to chase a piece of logic or a definition back in time, to the very first moment it appeared in the code base. This can be very handy for watching the code evolve and change. Sometimes I validate hunches about the validity of the code; sometimes I discover that someone already tried and failed with the approach I'm considering. More often than not, as I chase the logic back through time, I also find that it was named something different at one point in the past, and I realize that this code was attached to some other piece of business logic I had never noticed. All this can change and inform my approach, with just a little bit of scanning and listening to the existing conversation.
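The same "reblame" step can be approximated with plain git by blaming the file as of the parent of the commit you are looking at. This is only a sketch in a scratch repository; calc.txt and the function names are invented for illustration.

```shell
#!/bin/sh
set -e

# Build a scratch repository so the example is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"

# The (hypothetical) helper starts life under one name...
echo "def tax(amount):" > calc.txt
git add calc.txt
git commit -q -m "Add tax helper"

# ...and is renamed in a later commit.
echo "def sales_tax(amount):" > calc.txt
git commit -q -am "Rename tax to sales_tax"

# Blame the current file: the first porcelain field is the full SHA
# of the commit that last changed line 1.
latest=$(git blame --porcelain calc.txt | head -n 1 | cut -d' ' -f1)

# "Reblame": run blame again on the file as it was in that commit's parent.
previous=$(git blame --porcelain "${latest}^" -- calc.txt | head -n 1 | cut -d' ' -f1)

latest_subject=$(git log -1 --format=%s "$latest")
previous_subject=$(git log -1 --format=%s "$previous")
echo "now:    $latest_subject"
echo "before: $previous_subject"
```

Repeating the second blame with each new SHA walks the line further back, one change at a time, until you reach the commit where it first appeared.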

Being a vim user, I use vim-fugitive by Tim Pope for exploring history... I can type :Gblame (:Git blame in newer versions of the plugin) and not only see who last touched my file, but also dig backwards through history by pressing <CR> on the SHA in the left split. It's very handy to be able to do this in my editor, right in my workspace. There are similar plugins for VSCode, Emacs, you name it; it is worth the time to add that plugin and learn how your particular tool works.

If you can't find a suitable one for your editor, a lot of git interfaces (including tig, a favorite of mine!) have really cool views for crawling backwards and forwards through history. Just released, Sublime Merge appears to have such functionality in a very nice interface.

Go To Definition

This is kind of unrelated to the main thought of using git to explore the code base, but I think it's still very helpful. A lot of editors have functionality that lets you search for a keyword under the cursor, and call that "jump to definition." I'm not talking about that. I'm talking about a true, context-sensitive jump to definition. I use Universal Ctags in vim to accomplish this, although if you are using a more IDE-esque editor you can do this in several ways. The key thing I wanted to mention is the workflow of using the shortcut to dive into a definition, read through it, and back out. In vim, you can accomplish this by going to the tag (<C-]>), and then jumping out (<C-O>) to the previous location in the jumplist. This lets you swim easily through the whole logic of a function, regardless of where the flow takes you, kind of like stepping through a debugger. I use it a lot when understanding the context of a file in code review, or just when I'm working through some code.

Joining the Conversation

You may have seen a pile of rocks like this during a walk in a woodsy park. It has grown in popularity thanks to Instagram and Pinterest... but the original concept is to raise a "cairn" as a trail marker. It's a message to those who follow. If your cairn has a trail of rocks leading off in a direction... then you now know the direction the "author" has walked. Find three piles nearby? Someone needs help, or wants you to beware.

Once you start traversing the trails and history in your codebase, you'll learn to appreciate and read the more helpful markers that others have left... and if you are like me you will be inspired to be more thoughtful about the markers we leave behind in our code.

Although the author's personal behavior is in question at the moment, he has left a great example of a commit that explains not what the author did, but why.

He wasn't content to just say what he did; he knows that's what the diff says. He used the description to explain to anyone walking the same trail why he made this choice, so that we have direction on our own paths. That's a helpful commit. It's a good example for us as we contribute to history and add our voice to the conversation, and it makes life a little easier for someone in the future who uses these tools to try and understand the code conversation that has gone before.
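As a small sketch of leaving that kind of marker yourself, git commit accepts -m more than once, and each one becomes its own paragraph. The setting, the numbers, and the reasoning below are entirely made up for illustration; they are not taken from the commit discussed above.

```shell
#!/bin/sh
set -e

# Build a scratch repository so the example is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"

# A hypothetical one-line change.
echo "retries = 5" > settings.txt
git add settings.txt

# The first -m is the subject (the "what"); the second is the body (the "why").
git commit -q \
  -m "Raise retry count from 3 to 5" \
  -m "Three retries were not enough because the (hypothetical) upstream service can take several seconds to recover after a deploy. Five covers that window without hiding real outages."

commit_subject=$(git log -1 --format=%s)
commit_body=$(git log -1 --format=%b)
echo "$commit_subject"
echo "$commit_body"
```

The diff already shows that the number changed; the body is the only place a future reader will find out why.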

Changelog
  • 2022-06-08 11:31:29 -0500
    Rename articles

  • 2020-06-18 14:26:02 -0500
    Move everything to CST

    Don't know why I didn't do that before. It caused _no_ end of
    problems.

  • 2019-11-14 19:33:21 -0600
    Auditing the tags in the site...

    Many removed, cleaned up, or renamed.

    Tags with only one child got yanked.

  • 2018-10-02 13:23:58 -0500
    Adjust tags in posts

  • 2018-10-02 09:18:57 -0500
    Write quick post about pomodoro

  • 2018-09-28 13:51:23 -0500
    Adding code spelunking blog post