Migrating a Decade of Plaintext to Zettelkasten


I’m exploring zettelkasten-esque Creativity Systems. I’m trying to “own” my data by building out networked notes in Markdown. I’m a vim fan, but I’m looking at other tools too.

tl;dr: This is going to be a code-heavy post. Use a repeatable script, and lean on good programming practices to control and normalize the data going into your system. Borrow liberally from me if that helps. 🙏

I think I’m done with an eight-month-long struggle to import all my previous plaintext notes into the Simple Markdown Zettelkasten format. It was not easy… but I thought I’d share some of the lessons I learned and the tricks I used.

I’ve been playing around with Zettelkasten style note-taking for a while now… and I’m pretty convinced that having everything in one place is the right path for me for now.

Abbreviated version of my plaintext adventures to this point…

  1. Notepad documents from Windows and Ubuntu computers.
  2. Plain text on an Android Dev-1 phone… some terrible text editor.
  3. nvALT and Simplenote for years.
  4. Daily devotional note on iPhone using Byword that got so long it routinely crashed my phone.
  5. Vimwiki diary and wiki for five years at my last job.
  6. Experimental zettelkasten using the MMD metadata.

There’s some duplication, some craziness… and in the end it’s 2,068 files. Let’s get to work.

Use a script.

As with any large-scale migration, using a script has huge advantages over a series of manual find-and-replace passes. If you write it correctly, it’s repeatable: you get the same results every time, so if you get more data to migrate or change your mind about a format… you change the script and run it again. I changed my mind half a dozen times while I was working on this script, and I was actively using the zettelkasten as I went.

For instance… while I was working on the script, Obsidian released the aliases feature, which lets backlinks work with Zettelkasten notes. I just adjusted my script and kept on rolling.

(Also, running the script with Obsidian open in graph view made for a really cool animation of the graph building itself!)
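That repeatability comes from rebuilding the output from scratch on every run. Here is a minimal sketch of that shape, assuming a flat copy is enough to show the idea: `migrate`, the `*.{txt,md}` glob, and the title-heading transformation are placeholders, not the post's migrate.rb.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical skeleton of a repeatable migration (not the post's migrate.rb).
# The destination directory is wiped and rebuilt on every run, and sources are
# processed in sorted order, so identical inputs always yield identical output.
def migrate(src_dir, dst_dir)
  FileUtils.rm_rf(dst_dir) # careful: deletes everything under dst_dir
  FileUtils.mkdir_p(dst_dir)

  Dir.glob(File.join(src_dir, "**", "*.{txt,md}")).sort.each do |path|
    title = File.basename(path, ".*")
    # Placeholder transformation: prepend the title as a Markdown heading.
    File.write(File.join(dst_dir, "#{title}.md"), "# #{title}\n\n#{File.read(path)}")
  end
end

# Usage: run it twice against a throwaway directory; the second run
# reproduces the first exactly.
Dir.mktmpdir do |dir|
  src = File.join(dir, "src")
  dst = File.join(dir, "dst")
  FileUtils.mkdir_p(src)
  File.write(File.join(src, "Reading list.txt"), "Finish the backlog.")

  migrate(src, dst)
  first = File.read(File.join(dst, "Reading list.md"))
  migrate(src, dst)
  puts first == File.read(File.join(dst, "Reading list.md")) # prints true
end
```

Because the destination is deleted on each run, a layout like this only works if `dst_dir` holds generated output and nothing edited by hand.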

I did flex some computer science muscles and defined a class with a bunch of (inconsistently named) convenience methods and a standardized metadata format.

require_relative 'zettel' # loads zettel.rb from the same directory

zettel = Zettel.new

zettel.add_tag('knowledge')
zettel.set_meta('title', 'Zettelkasten is cool!')
zettel.content = "This note is useful, but short."
zettel.set_meta('id', '20201228369')

# this would render as:
#
# ---
# title: 'Zettelkasten is cool!'
# tags: ['#knowledge']
# id: 20201228369
# ---
#
# This note is useful, but short.

(using zettel.rb)

The code is not pretty… you can rip it off, but don’t look too closely. 🤣
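If you'd rather not dig into zettel.rb, here is a much smaller, hypothetical stand-in (`MiniZettel` is my name, not the post's) that shows the core idea: metadata lives in a hash, and rendering produces YAML front matter followed by the body. It uses Ruby's standard `yaml` library, so the quoting may differ slightly from the rendering shown above.

```ruby
require 'yaml'

# Hypothetical, minimal stand-in for the post's Zettel class.
# Metadata goes in a hash; to_s renders YAML front matter plus the body.
class MiniZettel
  attr_accessor :content

  def initialize
    @meta = {}
    @content = ""
  end

  def set_meta(key, value)
    @meta[key.to_s] = value
  end

  # Stores tags with a leading "#", matching the rendered example in the post.
  def add_tag(tag)
    (@meta['tags'] ||= []) << "##{tag.delete('#')}"
  end

  def id
    @meta['id']
  end

  def to_s
    # Hash#to_yaml already starts with "---\n"; close the block with another "---".
    "#{@meta.to_yaml}---\n\n#{@content}\n"
  end
end

z = MiniZettel.new
z.set_meta('title', 'Zettelkasten is cool!')
z.add_tag('knowledge')
z.set_meta('id', '20201228369')
z.content = "This note is useful, but short."
puts z
```

Letting the YAML library do the serializing means titles containing quotes or colons get escaped correctly instead of breaking the front matter.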

Having a single code-path to make a zettel makes the next step easy…

Adjust your script per source.

I wrote a customized script for each “source” in the history above. Each one handles details specific to that system…

My nvALT notes have titles in their filenames:

title = File.basename(entry, ".*")

(from migrate.rb)

But vimwiki diary entries have <h2>s for some reason, so the title has to come from the first heading:

title = content.scan(/^#+ .*/).first.to_s.gsub(/^#+ /, "")

(from migrate.rb)

This kind of flexibility is super handy.
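One way to keep per-source rules like these side by side is a lookup table keyed by source. This is a sketch under assumptions: `TITLE_EXTRACTORS`, `extract_title`, and the `:nvalt`/`:vimwiki_diary` keys are hypothetical, and the vimwiki rule falls back to the filename when there is no heading, which the post's version (returning an empty string) does not do.

```ruby
# Hypothetical lookup table: one title-extraction rule per source.
# The source keys and the filename fallback are illustrative assumptions.
TITLE_EXTRACTORS = {
  # nvALT: the title is the filename without its extension
  nvalt: ->(path, _content) { File.basename(path, ".*") },
  # vimwiki diary: the text of the first Markdown heading of any level,
  # falling back to the filename when there is no heading
  vimwiki_diary: ->(path, content) { content[/^#+ (.*)/, 1] || File.basename(path, ".*") }
}.freeze

# Raises KeyError for a source without a rule, so new sources can't slip through.
def extract_title(source, path, content)
  TITLE_EXTRACTORS.fetch(source).call(path, content)
end

puts extract_title(:nvalt, "notes/Reading list.txt", "")                        # prints Reading list
puts extract_title(:vimwiki_diary, "diary/2019-04-02.md", "## Standup\n- shipped") # prints Standup
```

`String#[]` with a regex and a capture index returns just the captured text, which replaces the scan-then-gsub pair in one step.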

Guarantee unique IDs.

This is critical… and hard. I used a Ruby Set, as I have in the past, and it worked pretty well: when an ID is already taken, the script bumps it by one and checks again. I also added a bunch of instrumentation to help me troubleshoot.

require 'set'

class Migrator
  DST = '../../wiki/'

  def initialize
    @unique_ids = Set.new
  end

  # [ ... ]

  def ensure_uniqueness(zettel)
    if @unique_ids.include? zettel.id
      # we've got a dup: bump the ID by one and check again
      puts "🔴 Duplicate ID! #{zettel.title}: #{zettel.id}"

      # .to_s keeps the ID a String (assumed, as in set_meta('id', '...')):
      # the Set holds Strings, and Integer#size is a byte count, not a digit count
      zettel.set(:id, (zettel.id.to_i + 1).to_s)
      puts "trying #{zettel.id}..."
      ensure_uniqueness(zettel)
    else
      if zettel.id.size > 12
        puts "#{zettel.id} 🟡 ID is too long! #{zettel.title}"
      elsif zettel.id.size < 12
        puts "#{zettel.id} 🟡 ID is too short! #{zettel.title}"
      end
      @unique_ids << zettel.id
    end
  end
end

(from migrate.rb)
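The recursion above works, but the same rule also fits in a plain loop. This is a sketch, not the post's code: `IdRegistry` and `claim` are hypothetical names, and it assumes IDs are 12-digit strings in the YYYYMMDDHHMM style.

```ruby
require 'set'

# Hypothetical, loop-based version of the uniqueness check.
# Assumes IDs are digit strings such as "202012281530" (YYYYMMDDHHMM).
class IdRegistry
  ID_LENGTH = 12

  def initialize
    @seen = Set.new
  end

  # Returns the first ID at or above `id` that has not been claimed yet,
  # and records it so it can never be handed out again.
  def claim(id)
    id = id.to_s
    id = (id.to_i + 1).to_s while @seen.include?(id)
    warn "#{id} has #{id.size} digits, expected #{ID_LENGTH}" unless id.size == ID_LENGTH
    @seen << id
    id
  end
end

registry = IdRegistry.new
puts registry.claim("202012281530") # prints 202012281530
puts registry.claim("202012281530") # prints 202012281531
puts registry.claim(202012281531)   # prints 202012281532
```

Like the original, bumping by one can produce an ID that isn't a real timestamp (202012281559 becomes 202012281560); that is harmless as long as the ID only serves as a unique key.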

Filter tags in place.

I also used the Zettel class to change and filter data going into the knowledge system graph:

def add_tag(tag)
  return if tag.nil?

  tag = tag
        .delete("#")
        .tr(" ", "_")

  # skip bare numbers (optionally with a trailing dash) and single characters;
  # the anchors keep tags that merely contain a digit
  return if tag.match?(/\A\d+-?\z/) || tag.size <= 1

  # blacklist
  return if [
    'responsible',
    'todos_archived',
    'wizardsetupstep',
    'tags',
    'id',
    'create_tickets',
    # [ ... ]
    'todo',
  ].include?(tag)

  # merge synonyms and singular/plural variants
  tag =
    case tag
    when 'pmux'
      'ux'
    when 'interview'
      'career'
    when 'project'
      'projects'
    when 'book'
      'books'
    # [ ... ]
    else
      tag
    end

  (@meta[:tags] ||= []) << tag
end

(from zettel.rb)

By having a single “point of contact” with the Zettel class, I was able to filter out or modify tags as they went into the final product.
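The same filtering can also be written as data rather than branches, so adding a rule is a one-line change. A sketch under assumptions: `IGNORED_TAGS`, `TAG_SYNONYMS`, and `normalize_tag` are hypothetical names, the lists contain only the entries visible in the post (the real ones are longer), and the number check is anchored (`\A\d+-?\z`), so only bare numbers are dropped while tags like `web3` survive.

```ruby
# Hypothetical data-driven variant of the tag filter.
# Only the entries shown in the post are listed; the real lists are longer.
IGNORED_TAGS = %w[responsible todos_archived wizardsetupstep tags id create_tickets todo].freeze
TAG_SYNONYMS = {
  'pmux' => 'ux',
  'interview' => 'career',
  'project' => 'projects',
  'book' => 'books'
}.freeze

# Returns the normalized tag, or nil if the tag should be dropped.
def normalize_tag(raw)
  return nil if raw.nil?

  tag = raw.delete('#').tr(' ', '_')
  return nil if tag.match?(/\A\d+-?\z/) || tag.size <= 1 # bare numbers, single chars
  return nil if IGNORED_TAGS.include?(tag)

  TAG_SYNONYMS.fetch(tag, tag) # map synonyms, otherwise keep as-is
end

tags = ['#book', 'interview', '2020', 'x', 'todo', 'deep work', 'books']
p tags.filter_map { |t| normalize_tag(t) }.uniq # prints ["books", "career", "deep_work"]
```

The trailing `.uniq` matters once synonyms are merged: `#book` and `books` both become `books`, and without it the note would carry the tag twice.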

Use the internet to supplement metadata.

Probably the cleverest thing I did was to grab data from Google Books to add metadata to my booknotes.

require 'json'
require 'net/http'
require 'uri'

filename = File.basename(booknote)
query  = filename.gsub(/\..{2,3}$/, '')
title  = query.split(" by ").first.strip
author = query.split(" by ").last.strip

# URI.encode was removed in Ruby 3.0; encode the query as a form component instead
url   = URI("https://www.googleapis.com/books/v1/volumes?q=#{URI.encode_www_form_component(query)}")
books = JSON.parse(Net::HTTP.get_response(url).body)

# choose the search result whose title is closest to the one in the filename
book = books["items"].min_by do |b|
  levenshtein_distance(b["volumeInfo"]["title"], title)
end["volumeInfo"]

zettel.set(:title, title)
zettel.set(:subtitle, book['subtitle'])
zettel.set(:author, book['authors'].join(', ')) # close enough to MLA
zettel.set(:publisher, book['publisher'])
zettel.set(:identifier, book["industryIdentifiers"][0]["identifier"])

(from migrate.rb)
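The lookup above assumes every search result has authors and at least one industry identifier; if a volume lacks `authors`, `book['authors'].join` raises NoMethodError on nil. Here is a hedged sketch of a more forgiving extraction step. `book_metadata` is a hypothetical helper, and the sample hash is made up, shaped like a Google Books `volumeInfo` object.

```ruby
# Hypothetical helper, not from migrate.rb: turn one Google Books
# "volumeInfo" hash into frontmatter fields without raising on missing keys.
# Prefers an ISBN_13 identifier and falls back to whatever is listed first.
def book_metadata(volume_info)
  ids  = volume_info.fetch("industryIdentifiers", [])
  isbn = ids.find { |i| i["type"] == "ISBN_13" } || ids.first

  {
    subtitle: volume_info["subtitle"],
    author: Array(volume_info["authors"]).join(", "),
    publisher: volume_info["publisher"],
    identifier: isbn && isbn["identifier"]
  }.reject { |_key, value| value.nil? || value.empty? } # drop missing fields
end

# Made-up sample shaped like a Google Books volumeInfo object.
sample = {
  "title" => "Example Title",
  "authors" => ["A. Writer", "B. Writer"],
  "industryIdentifiers" => [
    { "type" => "ISBN_10", "identifier" => "0000000000" },
    { "type" => "ISBN_13", "identifier" => "9780000000000" }
  ]
}
book_metadata(sample).each { |key, value| puts "#{key}: #{value}" }
```

Each returned pair could then be passed to `zettel.set`, so a sparse search result yields a shorter front matter block instead of a crash halfway through the migration.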

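I’m pretty proud of using the Levenshtein distance to choose the most likely search result: the result whose title needs the fewest single-character edits to match the title in my filename wins. It works well enough for me, and my “database” has a little more information.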
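The post calls `levenshtein_distance` without showing it. One common way to write such a helper, assuming plain character-by-character edit distance, is the dynamic-programming version below, which keeps only the previous row of the table; the post's own helper may differ.

```ruby
# One possible levenshtein_distance helper: classic dynamic-programming
# edit distance (insertions, deletions, substitutions each cost 1),
# keeping a single previous row instead of the full matrix.
def levenshtein_distance(a, b)
  a = a.to_s
  b = b.to_s
  prev = (0..b.size).to_a # distance from "" to each prefix of b

  a.each_char.with_index(1) do |ca, i|
    curr = [i] # distance from a[0, i] to ""
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,          # deletion
               curr[j - 1] + 1,      # insertion
               prev[j - 1] + cost].min # substitution or match
    end
    prev = curr
  end

  prev.last
end

titles = ["Deep Learning", "Deep Work", "Deep Work: Rules for Focused Success"]
puts titles.min_by { |t| levenshtein_distance(t, "Deep Work") } # prints Deep Work
```

Comparison here is case-sensitive; calling `downcase` on both titles first would make the match case-insensitive if that turns out to matter.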

Go do it!

I’m really happy that I’m done, and I’ve still got the script if I ever discover something is out of whack. I can now start using my knowledge system fearlessly… except I have to finish writing my Drafts actions.


Changelog
  • 2020-12-03 04:28:03 +0000

    Add example

    Using a script is an awesome advantage.

  • 2020-12-03 04:20:44 +0000

    New post: ZK Migration