Catching Up to the Conversation: Code Spelunking in Git
Evan Travers
Code as a Conversation
Have you ever joined a conversation a little too late... and made the horrifying mistake of assuming you heard correctly what the conversation was about? You bounce up to a group of your friends, with a hilarious quip based on what you heard... then watch in horror as your joke bleeds on the floor. They weren't talking about that at all... nearly the opposite. There's almost no recovery. You have no choice. You just have to change your identity and move to a different state. Just me? Ok. Imagine my pain, and bear with me.
Code isn't static; it moves and breathes with product decisions, tech changes, and language shifts. It is the result of a conversation of commits and decisions that led up to this point.
We can make almost the same mistake in code that we can talking with our friends... we make assumptions about definitions and their meaning, and assume that we fully understand the conversation. We often dive in without spending a little time listening to the conversation in the code and git history before we begin to speak.
I've spent most of my career staring at legacy code... and I was showing a friend of mine some of the tricks I've picked up over the years to help me deal with the complexities of joining the conversation in other people's code. I figured it'd be helpful to post a list.
git pickaxe (git log -S)
The git pickaxe is a tool I always forget... but it is incredibly useful. In
short, the pickaxe (so called because of the resemblance of -S to a pickaxe)
will search through all the diffs in history for commits that added or removed
occurrences of a string. This means you can read the CliffsNotes (of all the
commits) that affected this string all the way back to the Dawn of Git Init.
This can be a great way to learn who might be an expert, who touched this
last, what else it touches and where... useful stuff. It is also handy for
finding renames, although that requires a little more work.
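Here's a minimal sketch of the pickaxe in a throwaway repository. The file name, the legacy_tax_rate string, and the commit messages are all made up for illustration; in a real codebase you would only need the two git log lines at the end.

```shell
# Throwaway repository so the sketch is self-contained (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Commit 1 introduces the string, commit 2 leaves it alone, commit 3 removes it.
echo 'rate = legacy_tax_rate()' > billing.py
git add billing.py && git commit -q -m "Add billing"
echo '# unrelated comment' >> billing.py
git commit -q -am "Add a comment"
echo 'rate = tax_rate()' > billing.py
git commit -q -am "Switch to the new rate helper"

# -S lists only the commits where the number of occurrences of the string changed.
pickaxe_hits=$(git --no-pager log -S'legacy_tax_rate' --oneline | wc -l | tr -d ' ')
echo "commits that added or removed the string: $pickaxe_hits"

# Add -p to read the matching diffs, with the author next to each commit.
git --no-pager log -S'legacy_tax_rate' -p --format='%h %an %s'
```

The middle commit never shows up, because it touched the file without adding or removing the string. That filtering is what makes the pickaxe a quick way to find the people who actually shaped a piece of code.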
git follow (git log --follow)
A feature I learned about recently is that git log has a --follow flag
that will follow file renames and moves. I used to follow history back in time
until a commit that was clearly a rename/refactor, and then use
git log -- <filename> to chase the original filename further back in time.
Now I can just use git log --follow <filename>, which is a lot easier.
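To see the difference, here is a small sketch in a scratch repository. The file names old_name.txt and new_name.txt are placeholders I made up for the example.

```shell
# Scratch repository with one file that gets renamed (names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'line one\nline two\nline three\n' > old_name.txt
git add old_name.txt && git commit -q -m "Add old_name.txt"
git mv old_name.txt new_name.txt
git commit -q -m "Rename old_name.txt to new_name.txt"

# Without --follow, history for the new path stops at the rename commit.
plain_count=$(git --no-pager log --oneline -- new_name.txt | wc -l | tr -d ' ')

# With --follow, git keeps going into the file's earlier name.
follow_count=$(git --no-pager log --oneline --follow -- new_name.txt | wc -l | tr -d ' ')

echo "without --follow: $plain_count commit(s), with --follow: $follow_count commit(s)"
```

One thing to keep in mind: --follow only works when you give git log a single file.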
git blame
git blame is my favorite tool for listening in to the conversation that
happened on this source code before you. You can use it to walk backwards and
forwards in time to understand a little more about what is going on. You may
not discover why the code you are looking at is broken, but you may learn how
it came to be in this state.
At its lowest level, git blame shows you each line in the file alongside the
SHA-1 of the commit that last changed it (plus, by default, the author and
date). At the very least, git blame allows you to recognize who has been part
of the conversation most recently. It can be very helpful to know who touched
this part of the code last, and how long ago that change occurred.
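As a rough sketch of what that looks like on the command line, here is git blame in a scratch repository. The cart.py file and both commit messages are invented for the example.

```shell
# Scratch repository with two commits touching the same file (names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'def total(items):\n    return sum(items)\n' > cart.py
git add cart.py && git commit -q -m "Add cart total"
printf 'def total(items):\n    return sum(i.price for i in items)\n' > cart.py
git commit -q -am "Sum item prices instead of raw items"

# Every line is annotated with the commit that last changed it.
git --no-pager blame cart.py

# -L limits blame to a line range; -s drops author and date so the SHA comes first.
line2_sha=$(git blame -s -L 2,2 cart.py | cut -d' ' -f1)

# Hand that SHA to git log to read who changed the line, when, and with what message.
git --no-pager log -1 --format='%h %an %ad %s' "$line2_sha"
last_subject=$(git log -1 --format=%s "$line2_sha")
```

The blame output only gives you a SHA to start from; the commit message behind it is where the conversation actually lives.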
Where git blame really becomes useful for me is when it is given super powers
by some other tool for extra-magic code exploration.
You can do this in GitHub's blame view as well... there's a little icon that looks like layers that allows you to "reblame" the file at that point in history. I use this to chase a piece of logic or definition back in time, to the very first moment it appeared in the code base. This can be very handy to watch the code evolve and change. Sometimes I validate hunches about the validity of the code, sometimes I discover that someone already tried and failed the approach I'm considering attempting. More often than not, as I chase the logic back through time, I also find that it was named something different at one point in the past, and I realize that this code was attached to some other piece of business logic that I never knew about. All this can change and inform my approach, with just a little bit of scanning and listening to the existing conversation.
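If you don't have a tool that reblames for you, plain git can take the same step by blaming the parent of the commit you just found. This is only a sketch: price.py, the fee names, and the commit messages are hypothetical.

```shell
# Scratch repository where one line is changed three times (names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo 'fee = 5' > price.py
git add price.py && git commit -q -m "Add fee"
echo 'base_fee = 5' > price.py
git commit -q -am "Rename fee to base_fee"
echo 'base_fee = 10' > price.py
git commit -q -am "Raise base_fee to 10"

# Step 1: blame the current file to find the newest commit touching line 1.
newest_sha=$(git blame -s -L 1,1 price.py | cut -d' ' -f1)

# Step 2: "reblame" as of that commit's parent (the ^ suffix) to see what came before.
# tr strips the ^ marker that blame puts in front of boundary commits.
prev_sha=$(git blame -s -L 1,1 "$newest_sha^" -- price.py | cut -d' ' -f1 | tr -d '^')
prev_subject=$(git log -1 --format=%s "$prev_sha")

echo "before '$(git log -1 --format=%s "$newest_sha")' came '$prev_subject'"
```

Repeat step 2 with each SHA you find to keep walking backwards. In a real file the line numbers shift between revisions, so you will usually have to adjust the -L range as you go.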
Being a vim user, I use vim-fugitive by Tim Pope in vim for exploring
history... I can type :Gblame (:Git blame in newer versions of the plugin) and
not only see who last touched my file, but I can even dig backwards through
history by pressing <CR> on the SHA in the left split. It's very handy to be
able to do this in my editor, right in my workspace. There are similar plugins
for VSCode, Emacs, you name it. It is worth the time to add that plugin and
learn how your particular editor does it.
If you can't find a suitable one for your editor, a lot of graphical git interfaces (including tig, a favorite of mine!) have really cool interfaces for crawling backwards and forwards through history. Just released, Sublime Merge appears to have such functionality in a very nice interface.
Go To Definition
This is kind of unrelated to the main thought of using
git to explore the
code base, but I think it's still very helpful. A lot of editors have
functionality that lets you search for a keyword under the cursor, and call
that "jump to definition." I'm not talking about that. I'm talking about a
true, context sensitive, jump to definition. I use universal
ctags in vim to accomplish this,
although if you are using a more IDE-esque editor you can do this in several
ways. The key thing I wanted to mention is the workflow of using the shortcut
to dive into a definition, read through, and back out. In vim, you can
accomplish this by going to tag (<C-]>), and then jumping out (<C-o>) to
the previous location in the jumplist. This lets you swim easily through the
whole logic of a function, regardless of where the flow takes you, kind of
like stepping through a debugger. I use it a lot when understanding the
context of a file in code review or just when I'm working through some code.
Joining the Conversation
You may have seen a pile of rocks like this during a walk in a woodsy park. It has grown in popularity thanks to Instagram and Pinterest... but the original concept is to raise a "cairn" as a trail marker. It's a message to those who follow. If your cairn has a trail of rocks leading off in a direction... then you now know the direction the "author" has walked. Find three piles nearby? Someone needs help, or wants you to beware.
Once you start traversing the trails and history in your codebase, you'll learn to appreciate and read the more helpful markers that others have left... and if you are like me you will be inspired to be more thoughtful about the markers we leave behind in our code.
Although the author's personal behavior is in question at the moment, he has left a great example of a commit that explains not what the author did, but why.
He wasn't content to just say what he did; he knows that's what the diff says. He used the description to explain to anyone walking the same trail why he made this choice, so that we have direction on our own paths. That's a helpful commit. It is a good example for us as we contribute to history and add our voice to the conversation, and it makes life a little easier for someone in the future as they use these tools to try and understand the code conversation that has gone before.
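To make the idea concrete, here is a sketch of leaving that kind of marker yourself. It is not the commit discussed above; the settings.py file, the timeout values, and the wording of the message are all invented for illustration.

```shell
# Scratch repository (file name, values, and messages are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo 'TIMEOUT_SECONDS = 30' > settings.py
git add settings.py && git commit -q -m "Add settings"
echo 'TIMEOUT_SECONDS = 120' > settings.py

# The first -m is the subject (what changed); the second -m becomes the body (why).
git commit -q -a \
  -m "Raise request timeout to 120 seconds" \
  -m "Report exports for large accounts were timing out at 30 seconds. Raising the limit is a stopgap until exports move to a background job."

subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)

# This is what the next person sees when blame or the pickaxe leads them here.
git --no-pager log -1
```

The diff already says the number changed from 30 to 120. The body is the only place a future reader can learn that it was a stopgap.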
2022-06-08 11:31:29 -0500 Rename articles
2020-06-18 14:26:02 -0500 Move everything to CST
Don't know why I didn't do that before. It caused _no_ end of
2019-11-14 19:33:21 -0600 Auditing the tags in the site...
Many removed, cleaned up, or renamed.
Tags with only one child got yanked.
2018-10-02 13:23:58 -0500 Adjust tags in posts
2018-10-02 09:18:57 -0500 Write quick post about pomodoro
2018-09-28 13:51:23 -0500 Adding code spelunking blog post