Making a Book Using Ruby

When I realized my then-girlfriend was saving all her old cellphones to preserve our texts and had print-outs of all our emails… I thought I could surprise her and make an archival PDF of all our emails and texts.

And then some scope creep occurred… and I wound up writing the program that provided me the most joy and satisfaction of anything I've ever made.

MVP: A combined PDF.

My first thought was simply to use gmail to print all our emails, and to use ecamm's PhoneView to export a PDF of our texts, and then just use Preview.app to smush the two together.

This worked… but it doesn't read chronologically… first you get all the emails and then you see all the texts. It was all there, but it didn't read like a story.

PhoneView can export conversations in xml format. If I could get the emails in a readable format, I could sort them by their date stamp and be able to read our story in order!

The seed was planted, and scope grew… exponentially.

v2: A giant webpage.

I had to do some finagling… eventually found a roundabout way of downloading the emails. I used OSX's Mail.app to connect to my gmail via IMAP, and therefore downloaded all my email to my computer.

At first I was running this manually… eventually it became part of a makefile:

cd ~/Library/Mail/V3/[email protected]/ && ag -l '[email protected]' | xargs -I{} cp {} ~/src/bitbucket.org/evantravers/sarahsbook/src/story/email

Pretty quickly… I had a really basic ruby script that crawled the SMS-oriented xml file and email files, built some ruby objects, sorted by date, and printed them to an HTML file. I could read an email, and read the texts that were a response to the email. I could read our story!

Except… there were so many other artifacts that were mentioned in the messages… songs added to our shared Spotify list, posts on our instagram accounts… even my journal entries that Sarah hadn't read yet. What if I could connect all these different pieces of story and read our relationship right through time!

So I turned on lolcommits so I'd have a record of my working, and kept hacking away.

Lolcommits are seriously underrated... and eventually made it into the final book.

I wrote a simple class that all my different media classes could inherit from… it just implemented Comparable so that all different objects could display historically.

class Media
  include Comparable

  attr_accessor :date, :from, :to

  # Media objects sort chronologically, whatever their subclass.
  def <=>(other)
    date <=> other.date
  end
end
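
As a rough illustration of why Comparable is enough here (a sketch, not the book's actual code): once <=> is defined on the parent class, Ruby can sort and compare any mix of subclasses. Text and Song below are hypothetical stand-ins for the real media classes.

```ruby
require 'time'

# Same idea as the Media class above: order everything by date.
class Media
  include Comparable

  attr_accessor :date, :from, :to

  def <=>(other)
    date <=> other.date
  end
end

# Text and Song are hypothetical stand-ins for the real media classes.
class Text < Media
  def initialize(date)
    @date = date
  end
end

class Song < Media
  def initialize(date)
    @date = date
  end
end

text = Text.new(Time.parse('2015-03-02 09:00:00 UTC'))
song = Song.new(Time.parse('2015-03-01 18:30:00 UTC'))

# Array#sort uses <=>, so mixed media types fall into chronological order.
history = [text, song].sort
puts history.map { |item| item.class.name }.join(', ')

# Comparable also provides <, >, ==, between?, and friends for free.
puts song < text
```

Because the ordering lives in one place, adding a new kind of media only requires giving it a date.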

The heart of the story-building ruby script became quite simple looking... process all the sources of media, sort by time, and away we go!

def collect_the_story
  sms      = process_sms
  email    = process_emails
  notebook = process_images
  music    = process_music
  insta    = process_instagram

  history  = email + sms + notebook + music + insta

  history.sort!
end

Over time, the classes that inherited from Media fell into a semblance of order. There was some setup, some helper classes, a to_s method for debugging and a to_html method for writing itself into the "story." Here's what the Instapost class looks like, which wound up being one of the simplest.

require 'redcloth'
require 'htmlentities'
require 'gemojione'
require_relative 'media'

class Instapost < Media
  attr_accessor :id, :message

  def initialize(date, id, message, author, likes)
    @date    = date
    @id      = id
    @message = message
    @author  = author
    @likes   = likes
  end

  def filtered_message
    message = Gemojione.replace_unicode_moji_with_images(@message)
      .gsub(/\uFE0F|\u{1F3FF}|\u{1F3FB}|\u{1F3FC}|\u{1F3FE}/, '')
    message = RedCloth.new(message).to_html
    HTMLEntities.new.decode(message)
  end

  def to_html
    %Q{
      <div class="media  instapost">
        <div class="instapost--author"><img class="instapost--profile" src="../src/story/instagram/#{@author}.png" alt="">#{@author}</div>
        <img class="instapost--image" src="../src/story/instagram/#{@id}.png">
        <div class="instapost--likes">#{@likes} likes</div>
        <div class="instapost--caption">#{filtered_message}</div>
      </div>
    }
  end
end

The Hardest Problem: Emails

Instagram, Spotify, lolcommits, and scans from my journal were relatively simple to import and include in the story. They had at least some kind of understandable API that I could query, organize, and import.

Emails… emails were hard. Sarah used multiple email accounts in our conversations. She replied from different phones and different operating systems.

Replies

The first big problem is that most email replies include the full content of every email and attachment before them… and Sarah and I hit reply a lot. I guess we didn't want to let the email conversation die? Whatever.

After much travail and experimentation (made easier thanks to the to_s method and pry), I wound up with this monstrosity.

bodyend = content.index(/On ([A-Z][a-z][a-z]).*wrote:|---------- Forwarded message ---------|Sent from my iPhone|> On.*wrote:|From:.*\]/)

# handle fwds?
next if bodyend == 0

if bodyend.nil?
  bodyend = content.length
else
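  # step back one so the first character of the match is excluded. :P
  bodyend -= 1
end
# rstrip, not rstrip!, because rstrip! returns nil when there is nothing to strip
content = content[0..bodyend].rstrip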

The main trick was covering all the different ways our email clients represented the end of a reply and the beginning of the email to which we were replying. I pretty much kept adding to the regex as I read through our history, covering new case after new case.
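
The same approach can be sketched as a small standalone helper. This is an illustration under assumptions, not the script's real structure: the patterns are copied from the regex above, while strip_quoted_reply and REPLY_MARKERS are hypothetical names, and the sample body is made up.

```ruby
# Patterns that mark where a quoted reply or forward begins.
# Copied from the regex above, split up so each case is readable.
REPLY_MARKERS = Regexp.union(
  /On ([A-Z][a-z][a-z]).*wrote:/,
  /---------- Forwarded message ---------/,
  /Sent from my iPhone/,
  /> On.*wrote:/,
  /From:.*\]/
)

# Hypothetical helper: returns only the newly written part of an email body,
# or nil when the body starts with a marker (for example a bare forward).
def strip_quoted_reply(content)
  marker_at = content.index(REPLY_MARKERS)
  return content.rstrip if marker_at.nil?
  return nil if marker_at.zero?

  # The exclusive range stops just before the marker.
  content[0...marker_at].rstrip
end

body = "See you at seven!\n\nOn Tue, Mar 3, 2015 at 9:14 AM, Evan wrote:\n> Dinner tonight?"
puts strip_quoted_reply(body)
```

Keeping each marker as its own entry makes it easier to add the next weird client format without re-reading one long alternation.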

However, replies weren't done with me yet.

Attachments

Attachments were worse. In terms of reading the story, the reader just cares about the first time an image was sent.

After a while, I came up with this.

# email_collection is all the emails and all their attachments
email_collection.sort!

# unique all the attachments by date
email_collection.each do |email|
  if !email.attachments.empty?
    email_attch[email.subject] ||= Set.new

    known_images = email_attch[email.subject]

    email.attachments.each do |attch|
      if known_images.include?(attch)
        email.attachments = email.attachments - [attch]
      end
      known_images << attch
    end
  end
end

return email_collection

I probably could have used some smarter code, but this made sure that only the first occurrence of an attachment made it to the final HTML. (This cut down several hundred pages worth of space!!!)
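
For the curious, here is one way the "smarter code" might look. It is a sketch under assumptions: Email is a hypothetical Struct standing in for the real email class, dates are plain integers to keep the example short, and threads are approximated by subject line just like the code above.

```ruby
require 'set'

# Hypothetical stand-in for the real email class; only the fields used here.
Email = Struct.new(:date, :subject, :attachments)

# Keeps only the first occurrence of each attachment within a thread,
# where a thread is approximated by the subject line.
# Returns the emails sorted by date.
def dedupe_attachments(emails)
  seen_by_subject = Hash.new { |hash, subject| hash[subject] = Set.new }

  emails.sort_by(&:date).each do |email|
    seen = seen_by_subject[email.subject]
    # Set#add? returns nil when the element is already present,
    # so select keeps an attachment only the first time it shows up.
    email.attachments = email.attachments.select { |attch| seen.add?(attch) }
  end
end

emails = [
  Email.new(2, 'Re: Trip photos', ['beach.jpg', 'sunset.jpg']),
  Email.new(1, 'Re: Trip photos', ['beach.jpg'])
]

dedupe_attachments(emails).each do |email|
  puts "#{email.date}: #{email.attachments.inspect}"
end
```

The earlier email keeps beach.jpg, and the later reply is left with only the attachment that was new.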

Intermission: Architecture

This thing is still a sprawling, very customized script, but there is some method to the madness.

There is basically a three-step process… mostly driven by the makefile.

Get the data

This is a combination of PhoneView's SMS export, my makefile and ruby scripts for getting all the emails out of Mail.app's file store, and some simple ruby API calls to get data from Spotify and Instagram.

The scripts included things like imagemagick calls to clean up the scans of my journals and notebooks that I was photographing with my phone.

As you can see, it became an obsessive quest to capture the "whole story". I used scanbot to grab jpgs of my journals, then cropped manually to the sections about our relationship, and used mogrify -colorspace Gray -level 20%,75% -sharpen 0x1 *.jpg to normalize them.

Clean the data

The simple API calls produced very clean data, but the email and SMS data is filled with duplications and gotchas. A lot of those "fixes" are found at the end of this article.

Build the book

Now that we have data in a clean format, we can pull it in, sort it, and we are good to go!

All through this, I was honing the presentation of the "webpage" using CSS. I had written some BEM style classes to style everything, and it was really fun to emulate things like the iMessage format and the way Gmail displays emails.

The more I worked, the more excited I became. It was a really fun project.

I kept adding little features... date stamps that only appeared when there wasn't a new message for an hour, pink and blue bubbles for messages from me to her, new styling for attachments and emails... I was having a blast.
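
The date stamp rule is simple to express in code. What follows is a minimal sketch, assuming messages are already sorted and each one has a date; Message, ONE_HOUR, and with_date_stamps are hypothetical names, not taken from the real script.

```ruby
ONE_HOUR = 60 * 60 # seconds

# Hypothetical stand-in for a text message; the real classes carry more fields.
Message = Struct.new(:date, :text)

# Pairs each message with a flag saying whether a date stamp should be
# printed before it: true for the first message and after any gap of
# an hour or more.
def with_date_stamps(messages)
  previous = nil
  messages.map do |message|
    show_stamp = previous.nil? || (message.date - previous.date) >= ONE_HOUR
    previous = message
    [message, show_stamp]
  end
end

messages = [
  Message.new(Time.utc(2015, 3, 1, 9, 0), 'Good morning!'),
  Message.new(Time.utc(2015, 3, 1, 9, 5), 'Coffee later?'),
  Message.new(Time.utc(2015, 3, 1, 13, 30), 'Lunch was great.')
]

with_date_stamps(messages).each do |message, show_stamp|
  puts message.date.strftime('%Y-%m-%d %H:%M') if show_stamp
  puts "  #{message.text}"
end
```

Here the second message arrives five minutes after the first, so it gets no stamp; the third comes hours later and gets one.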

v4: Printing a program.

At this point, my goal was still a PDF that we could read on our phones. I could generate some HTML, style it with CSS, and print to PDF. While I was originally just going to use Chrome's Print to PDF, I had found Prince PDF.

While reading Prince's documentation, I found special "CSS" rules for preparing your PDF for professional printing. I started learning about bleed, about the weird ways I would have to re-order the pages in order to be printed on a professional printer…

What if I could hold this book? Time had been flying by, and I was starting to plan my proposal to Sarah… what if I could hand her a copy of our story on the day I ask her to marry me? Suddenly I had new goals, and a new deadline.

One problem… the book was over 900 pages long. I guess we texted a lot. :P

After massaging the font size and making the messages run two columns per page, I was able to bring the page count under 600 pages, but that was still way out of range of most self-publishing options. Blurb and its ilk limit you to 200 pages… there isn't an amateur self-publishing option I could find that would do a huge, cookbook-sized 600 page book.

It's a monster book, here's an iPhone for scale.

So I turned to professional options. After searching and talking to a lot of folks I found BookBaby. I was desperate at this point… and was willing to pay for a run of 25 just to have one.

While talking with their team about what I was trying to do, they were incredibly kind. It turns out a lot of people like to help young people in love. When the first "proof" arrived damaged, they replaced it for free, and they even rushed production and overnighted it to me so that I'd have it in time for The Big Day.

Genevieve… I had chatted with Mike Taylor earlier and he had assured me that it would be shipping by today… I am proposing to my girlfriend the 29th, using this book… I waited on Mike to order it and now I'm a liiiiittle distraught at the moment. Is there any way to get it to my door before the 29th?

Evan


Genevieve (BookBaby)

Let me check with our production team for this. Hang tight and I'll follow up shortly. Thank you, Genevieve


Hi Evan - I just heard back from our production team. It looks like this order was actually able to ship out today. (woohoo!)

Here is the tracking number: 1ZA5R4620102748342

It might take a few hours to refresh (it's usually around 8PM EST that the tracking will update by if it's shipping out of the West Coast facility).

Take care and good luck! Thank you, Genevieve


Praise the Lord!! Thank you so much!

Mike, John, and Genevieve, if you ever read this, thank you.

Holding the hack

This is within seconds of me tearing the book out of its wrapping...

I will never forget the feeling of unwrapping the first proof and holding the results of my labor in my hands. It actually worked. It was beautiful. I had gotten all the weird printing-specific CSS and measured all the bleeds right, and now I was holding a hard-cover copy of my silly little love story, just in time to present it to Sarah.

Just the other night we opened it again and read through it together, just to remind ourselves of what we were like four years ago. It'll hopefully be here to remind us of what God has done for years to come.

Piper's vision for the logo was effortless and perfect.

I owe so much to those who helped me with the book: friends like Tom and Ben who encouraged me and helped me debug some of the intricacies of dealing with email replies, the amazing Piper Weaver who designed an incredible logo for the cover, and the kind folks at BookBaby who overnighted a proof for a fictional run of books so that a tired programmer could propose.

It's a treasure.


Addendum: Hacks

It's been a while since I looked through this code… here are some weird gotchas I noticed while I was researching for this article.

I'm not proud of this. In my defense, I wasn't sleeping much.

I've seen this before

You know that thing when a text decides to send itself twice? Sarah's phone did that constantly for months, and it really made the book ugly. So…

# check for duplicates among the last three messages
thisisaduplicate = sms_collection.last(3).any? do |prev_message|
  !content.empty? && content == prev_message.message && prev_message.from == from
end

if (!content.empty? || !attachment.nil?) && !thisisaduplicate
  sms_collection << Sms.new(date, from, to, content, attachment)
end

You can bring a plus one

PhoneView had a naive solution for handling duplicate SMS attachment filenames… it just incremented the last digit of the file name.

This means I had to handle this in my code.

if file.match(/jpg|jpeg|png|JPG/)
  # stupid test case
  # if the image has the same name, the SMS export
  # incremented the last digit of the file name

  if global_sms_attachments.include? file
    if !file.match(/-\d+?\./)
      f = file.split('.')
      f[0] += "-1."
      file = f.join
    end
    while global_sms_attachments.include? file
      file.gsub!(/(?!-)(\d+?)(?=\.)/) do |match|
        match.to_i + 1
      end
    end
  end

  attachment = file
  global_sms_attachments << file
end
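
In hindsight, the same renaming could be sketched with File.extname and File.basename instead of regex surgery. This is a hypothetical rewrite, not what the script does: next_free_name is an invented helper, it assumes bare filenames with no directory part, and it assumes the export numbers duplicates as name-1, name-2, and so on.

```ruby
require 'set'

# Hypothetical helper: returns the file name unchanged if it is unused,
# otherwise the first "base-N.ext" variant that is not in `taken`.
def next_free_name(file, taken)
  return file unless taken.include?(file)

  ext  = File.extname(file)                          # ".jpg"
  base = File.basename(file, ext).sub(/-\d+\z/, '')  # drop an existing "-N" suffix
  counter = 1
  counter += 1 while taken.include?("#{base}-#{counter}#{ext}")
  "#{base}-#{counter}#{ext}"
end

taken = Set.new
3.times do
  name = next_free_name('IMG_0042.jpg', taken)
  taken << name
  puts name
end
```

Each repeat of the same name gets the next unused suffix, which mirrors the incrementing behaviour described above.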

Cringey.

Cheating on attachments

I had some emails that were missing attachments after import… but if I searched my whole computer for the filename, I'd find it in some random folder buried in the filesystem. I don't rightly know why… I suspect that Mail.app was caching them.

Rather than figure out what was going on... I wrote this. Not very performant.

if attch.body.nil? || attch.body.decoded == ""
  # time to cheat! Find the saved file in the mailbox
  cheated_file_path =
    Find.find('<my user path>/Library/Mail/V3/').select{ |f| /\/#{item.split('.').first}\/.*\/#{filename}/ =~ f }[0]
  # FileUtils.cp(cheated_file_path, image_path)
  puts cheated_file_path
  image = Image.read(cheated_file_path)[0]
  image.auto_orient!
  image.write(image_path)
else
  File.open(image_path , "w+b", 0644 ) { |f| f.write attch.body.decoded }
end

Naughty list…

In my makefile, I used an imagemagick command to re-orient all my images… but had to make a special case for this one image that never would behave.

fix:
	# fix that one bad one
	mv src/story/SMS/Christmas.png src/story/SMS/Christmas.bak
	cd src/story/SMS/ && mogrify -auto-orient *.{JPG,jpg,gif,jpeg,png}
	mv src/story/SMS/Christmas.bak src/story/SMS/Christmas.png

🔖
Changelog
  • 2022-06-08 11:31:29 -0500
    Rename articles

  • 2020-06-18 14:26:02 -0500
    Move everything to CST

    Don't know why I didn't do that before. It caused _no_ end of
    problems.

  • 2019-11-20 20:24:43 -0600
    Change the date

  • 2019-11-20 20:23:19 -0600
    Add images and fix typos

  • 2019-11-19 20:53:41 -0600
    Draft: Book Post