
Migrating a Decade of Plaintext to Zettelkasten

I'm exploring zettelkasten-esque Creativity Systems. I am attempting to "own" my own data by building out the networked notes in markdown. I'm a vim fan, but I'm looking at other tools too.
I now have an "evergreen" page for recommendations and links: Simple Markdown Zettelkasten
This is going to be a code-heavy post. Use a repeatable script, and take advantage of good programming practices to control and normalize the data going into your system. Borrow liberally from me if that helps. 🙏
I think I'm done with an eight-month-long struggle to import all my previous plaintext notes into the Simple Markdown Zettelkasten format. It was not easy… but I thought I'd share some of the lessons that I learned and tricks that I used.
I've been playing around with Zettelkasten style note-taking for a while now… and I'm pretty convinced that having everything in one place is the right path for me for now.
Abbreviated version of my plaintext adventures to this point…
- Notepad documents from Windows and Ubuntu computers.
- Plain text on an Android Dev-1 phone… some terrible text editor.
- nvALT and Simplenote for years.
- Daily devotional note on iPhone using Byword that got so long it routinely crashed my phone.
- Vimwiki diary and wiki for five years at my last job.
- Experimental zettelkasten using the MMD metadata.
There's some duplication, some craziness… and in the end it's 2,068 files. Let's get to work.
Use a script.
Just like any large-scale migration, using a script has huge advantages over a series of manual "find and replace" passes. If you write it correctly, it's repeatable: you'll get the same results each time, which means that if you get more data to migrate, or change your mind about a format... you change the script and run it again. I changed my mind half a dozen times while I was working on this script, and I was actively using the zettelkasten as I went.
For instance... while working on the script, Obsidian released the aliases feature that lets backlinks work with Zettelkasten notes... I just adjusted my script and kept on rolling.
(Also, running the script while having Obsidian open in graph mode made for a really cool building animation that was definitely fun!)
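To make "repeatable" concrete, here's a minimal sketch of one way to get there: wipe the destination and regenerate every note on each run, so the output depends only on the sources and the current rules. The `rebuild` helper, the sample IDs, and the folder layout are hypothetical, not taken from my actual script.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: rebuild the destination from scratch on every run,
# so stale files from an earlier format never linger.
def rebuild(notes, dst)
  FileUtils.rm_rf(dst)
  FileUtils.mkdir_p(dst)
  notes.sort.each do |id, body|
    File.write(File.join(dst, "#{id}.md"), body)
  end
  Dir.children(dst).sort
end

notes = { '202012022220' => "First note\n", '202012022228' => "Second note\n" }
Dir.mktmpdir do |tmp|
  dst = File.join(tmp, 'wiki')
  first = rebuild(notes, dst)
  second = rebuild(notes, dst)
  puts first == second # same input, same output
end
```

Only do the wipe if the destination holds nothing but generated files. If you are also editing notes there by hand, you'd want something gentler.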
I did flex some computer science muscles, and defined a class with a bunch of (inconsistently named) convenient methods and a standardized metadata format.
require_relative 'zettel'
zettel = Zettel.new
zettel.add_tag('knowledge')
zettel.set_meta('title', 'Zettelkasten is cool!')
zettel.content = "This note is useful, but short."
zettel.set_meta('id', '20201228369')
# this would render as:
"""
---
title: 'Zettelkasten is cool!'
tags: ['#knowledge']
id: 20201228369
---
This note is useful, but short.
"""
(using zettel.rb)
The code is not pretty... you can rip it off, but don't look too closely. 🤣
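If you'd rather not dig through my file, here's a minimal sketch of what a class like that could look like. `SketchZettel` and `format_value` are hypothetical names and this is not the real zettel.rb; it stores metadata in a hash and renders it as front matter by hand.

```ruby
# Hypothetical stand-in for the real Zettel class; names are illustrative.
class SketchZettel
  attr_accessor :content

  def initialize
    @meta = {}
    @content = ''
  end

  def set_meta(key, value)
    @meta[key.to_sym] = value
  end

  def add_tag(tag)
    (@meta[:tags] ||= []) << '#' + tag.delete('#')
  end

  # Render front matter (in insertion order) followed by the note body.
  def render
    front = @meta.map { |key, value| "#{key}: #{format_value(value)}" }
    (['---'] + front + ['---', content]).join("\n") + "\n"
  end

  private

  # Naive quoting: values containing a single quote would need escaping.
  def format_value(value)
    case value
    when Array then "[#{value.map { |v| format_value(v) }.join(', ')}]"
    when Integer, /\A\d+\z/ then value.to_s
    else "'#{value}'"
    end
  end
end

zettel = SketchZettel.new
zettel.set_meta('title', 'Zettelkasten is cool!')
zettel.add_tag('knowledge')
zettel.set_meta('id', '20201228369')
zettel.content = 'This note is useful, but short.'
puts zettel.render
```

For anything beyond a sketch, a YAML library is probably a safer way to write the front matter than hand-rolled quoting.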
Having a single code-path to make a zettel makes the next step easy...
Adjust your script per source.
I wrote a customized script per "source" that I listed in my history above. Each one has details specific to that system...
My nvALT notes have titles in their filenames:
title = File.basename(entry, ".*")
(from migrate.rb)
But vimwiki diary entries have <h2>s for some reason:
title = content.scan(/^#+ .*/).first.to_s.gsub(/^#+ /, "")
(from migrate.rb)
This kind of flexibility is super handy.
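One way to keep those per-source differences tidy is a small lookup table of rules, one per source. This is a sketch under my own assumptions: `TITLE_RULES`, `title_for`, and the source names are hypothetical and not how migrate.rb is actually organized.

```ruby
# Hypothetical per-source rules; the source names are illustrative.
TITLE_RULES = {
  # nvALT keeps the title in the filename.
  nvalt: ->(path, _content) { File.basename(path, '.*') },
  # vimwiki diary entries keep it in the first heading line.
  vimwiki: ->(_path, content) { content[/^#+ (.*)/, 1].to_s }
}.freeze

def title_for(source, path, content)
  TITLE_RULES.fetch(source).call(path, content)
end

puts title_for(:nvalt, 'notes/Reading list.txt', 'body')
puts title_for(:vimwiki, 'diary/2020-12-02.wiki', "## Standup notes\n\n- item\n")
```

Using `fetch` means an unknown source raises a `KeyError` right away instead of quietly producing untitled notes.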
Guarantee unique IDs.
This is critical... and hard. I used a Set in Ruby like I have in the past, and it worked pretty well. I added a bunch of instrumentation to help me troubleshoot.
require 'set'

class Migrator
  DST = '../../wiki/'

  def initialize
    @unique_ids = Set.new
  end

  # [ ... ]

  # Bump the ID until it is unused, then record it.
  def ensure_uniqueness(zettel)
    if @unique_ids.include? zettel.id then
      # we've got a dup
      puts "🔴 Duplicate ID! #{zettel.title}: #{zettel.id}"
      zettel.set(:id, zettel.id.to_i + 1)
      puts "trying #{zettel.id}..."
      self.ensure_uniqueness(zettel)
    else
      # warn about IDs that are not exactly 12 characters long
      if zettel.id.size > 12 then
        puts "#{zettel.id} 🟡 ID is too long! #{zettel.title}"
      elsif zettel.id.size < 12 then
        puts "#{zettel.id} 🟡 ID is too short! #{zettel.title}"
      end
      @unique_ids << zettel.id
    end
  end
end
(from migrate.rb)
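Here's a stripped-down sketch of the same idea, written as a loop instead of recursion and keeping IDs as strings the whole way through. `IdRegistry` and `claim` are hypothetical names, and the sketch assumes IDs are purely numeric with no leading zeros.

```ruby
require 'set'

# Hypothetical registry: hands out IDs and bumps any that collide.
class IdRegistry
  def initialize
    @seen = Set.new
  end

  # Returns the ID actually claimed, as a String.
  def claim(id)
    candidate = id.to_s
    # Set#add? returns nil when the value is already present,
    # so keep incrementing until the add succeeds.
    candidate = (candidate.to_i + 1).to_s until @seen.add?(candidate)
    candidate
  end
end

registry = IdRegistry.new
puts registry.claim('202012022220') # first claim keeps the ID
puts registry.claim('202012022220') # a collision moves to the next free ID
```

Keeping the ID a string means a length check like `id.size == 12` still counts digits after a bump.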
Filter tags in place.
I also used the Zettel class to change and filter data going into the knowledge system graph:
def add_tag(tag)
  if tag then # not nil
    tag = tag
      .gsub("#", "")
      .gsub(" ", "_")
    if
      !tag.match(/\A\d+-?\z/) && # not just a number (anchored to the whole tag)
      tag.size > 1 # not a single character
    then
      # blacklist
      unless [
        'responsible',
        'todos_archived',
        'wizardsetupstep',
        'tags',
        'id',
        'create_tickets',
        # [ ... ]
        'todo',
      ].include? tag then
        # rename tags to their canonical form
        tag =
          case tag
          when 'pmux'
            'ux'
          when 'interview'
            'career'
          when 'project'
            'projects'
          when 'book'
            'books'
          # [ ... ]
          else
            tag
          end
        if @meta[:tags] then
          @meta[:tags].push(tag)
        else
          @meta[:tags] = [tag]
        end
      end
    end
  end
end
(from zettel.rb)
By having a single "point of contact" with the Zettel class, I was able to filter out or modify tags as they went into the final product.
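The same filtering can also be written as data plus one small function, which might be easier to extend as the lists grow. This is a sketch: `IGNORED_TAGS`, `TAG_RENAMES`, and `normalize_tag` are hypothetical names, the lists are shortened, and I'm assuming "not just a number" means the whole tag is digits with an optional trailing dash.

```ruby
require 'set'

# Hypothetical, shortened versions of the ignore list and rename table.
IGNORED_TAGS = Set['responsible', 'tags', 'id', 'todo'].freeze
TAG_RENAMES = {
  'pmux' => 'ux',
  'interview' => 'career',
  'project' => 'projects',
  'book' => 'books'
}.freeze

# Returns the cleaned tag, or nil when the tag should be dropped.
def normalize_tag(raw)
  return nil if raw.nil?

  tag = raw.delete('#').tr(' ', '_')
  return nil if tag.size <= 1              # single character
  return nil if tag.match?(/\A\d+-?\z/)    # just a number
  return nil if IGNORED_TAGS.include?(tag) # explicitly ignored

  TAG_RENAMES.fetch(tag, tag)
end

p ['#pmux', 'deep work', '2020', 'todo', nil].map { |t| normalize_tag(t) }
```

Returning `nil` for dropped tags lets the caller decide what to do, for example `tags.map { |t| normalize_tag(t) }.compact.uniq`.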
Use the internet to supplement metadata.
Probably the cleverest thing I did was to grab data from Google Books to add metadata to my booknotes.
require 'json'
require 'net/http'

filename = File.basename(booknote)
query = filename.gsub(/\..{2,3}$/, '')
title = query.split(" by ").first.strip
author = query.split(" by ").last.strip
# URI.encode was removed in Ruby 3.0; encode the query as a form component instead.
url = URI("https://www.googleapis.com/books/v1/volumes?q=#{URI.encode_www_form_component(query)}")
books = JSON.parse(Net::HTTP.get_response(url).body)
# Pick the search result whose title is closest to the one in the filename.
book = books["items"].min_by do |b|
  levenshtein_distance(b["volumeInfo"]["title"], title)
end["volumeInfo"]
zettel.set(:title, title)
zettel.set(:subtitle, book['subtitle'])
zettel.set(:author, book['authors'].join(', ')) # close enough to MLA
zettel.set(:publisher, book['publisher'])
zettel.set(:identifier, book["industryIdentifiers"][0]["identifier"])
(from migrate.rb)
I'm pretty proud of using the Levenshtein distance to choose the most likely search result. It works fine for me, and my "database" has a little more information.
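The snippet calls `levenshtein_distance` without showing it, so here's a minimal sketch of the classic dynamic-programming version, one row at a time. This is an illustration, not necessarily what's in my script, and `closest_title` is a hypothetical helper added to show the "pick the nearest result" step on plain strings.

```ruby
# Edit distance: the number of single-character insertions, deletions,
# and substitutions needed to turn a into b.
def levenshtein_distance(a, b)
  a = a.to_s
  b = b.to_s
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

# Hypothetical helper: the candidate closest to the wanted title,
# or nil when there are no candidates at all.
def closest_title(wanted, candidates)
  Array(candidates).min_by { |c| levenshtein_distance(c.downcase, wanted.downcase) }
end

puts levenshtein_distance('kitten', 'sitting') # → 3
puts closest_title('Deep Work', ['Work Deep', 'Deep Work', 'Deep Work: Rules'])
```

Wrapping the candidates in `Array(...)` means an empty or missing result list comes back as `nil` instead of raising, which is worth checking for before reading metadata off the winner.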
Go do it!
I'm really happy that I'm done, and I've got the script if I ever discover something is out of whack. I can now start using my knowledge system fearlessly... except I have to finish writing my Drafts actions.
Changelog
- 2022-06-08 11:31:29 -0500 Rename articles
- 2022-06-07 15:27:31 -0500 Adding Admonitions
- 2020-12-02 22:28:03 -0600 Add example: Using a script is an awesome advantage.
- 2020-12-02 22:20:44 -0600 New post: ZK Migration