Gutenberg ipsum

Stop using boring, boilerplate Lorem ipsum filler text and build your own random, semi-coherent text from Project Gutenberg books.
lorem ipusm

Monday, July 2, 2012

I’ve decided to try to be more like Ben Crowder, Tod Robbins, and Brett Terpstra and release my code tinkerings into the public more often.

Since upgrading to TextExpander 4 a couple weeks ago, I’ve decided to delve into it a lot more. I recently converted from Typinator, where I only really used a few expansions for typing transliterated Arabic. After belatedly stumbling upon a couple posts by Brett and Dr. Drang, I found that TextExpander can be used for some pretty fun stuff. Like random Lorem Ipsum-like text based on n-gram algorithms!

I really liked Dr. Drang’s concept of using text from Project Gutenberg to build completely random—yet mostly sensible—dummy text, since the standard “Lorem ipsum” looks really repetitive and boring. Isn’t this Brothers Karamazov-esque paragraph a lot better?

Mitya uttered his wild speech. He turned quickly on his way. “Do you suppose gentlemen, that doesn’t care for an escort, for?” I am gone? You won’t be frightened and cry out: ‘I want to join the choir and shout nastiness into both ears,’ while he ran to his bedroom, lay down and I walked along both of us. Dear ones, why do you wait.

Or this Homeric sounding stuff?

And of bounding steeds: Seven captives next a little space Then rush d amid the bright band: great Ithacus, before, First of mortals! for the gods Then swift pursued her urged, and crown’d Then sunk unpitied to the dire alarms; Both breathing slaughter, follow’d and Patroclus loved remains defend. Beneath.

I made a couple minor modifications to Dr. Drang’s original Perl script, clunkily removing paired characters like quotes and parentheses and allowing new corpus files to be specified with the command line, which makes it easier to repeat the code for multiple TextExpander snippets.

Here’s how to get it working:

#!/usr/bin/perl -w
# Modified from Dr. Drang's original script at

use Games::Dissociate;

# Choose the corpus file
if ($#ARGV == -1) {
  $corpus = "totc.txt";
} else {
  $corpus = [0];

# Slurp in the given corpus as a single string.
open(my $fh, "{HOME}/bin/gutenberg_ipsum/words/" . $corpus) or die "Can't open";
{local ; $corpus = <$fh>;}

# Dissociate the corpus, using word pairs, and return 15-50 pairs.
$length = int(15 + rand(35));
$dis = dissociate($corpus, -2, $length);

# Remove quotes and other paired characters, since there might be some that are unmatched
# But this is an incredibly clunky fix. If I had more time/better Perl chops, I'd probably build some algorithm to find unmatched quotes or parentheses and insert them randomly in the text. But that's hard :)
$dis =~ s/[\"\[\]\_\(\)]//gm;

# Capitalize the first word and end it with a period.
$dis =~ s/^(.)/\u/;
$dis =~ s/[.);:?'", -]+$/./;

print $dis;

Once the script is installed and running, It’s trivial to make it work with TextExpander. Make a new Shell Script snippet with this code:

#!/usr/bin/env bash
~/bin/gutenberg_ipsum/ princessofmars.txt

Make as many snippets as you want—one for each of your corpus files. All you need to change in each script is the argument for the file. Replace princessofmars.txt with the name of whatever corpus you want that snippet to use.

In the end you can create a good collection of different automatic random filler text from any corpus you want.

Gutenberg ipsum snippets

If TextExpander 4 allowed its new Fill Ins to work as variables in a shell script, this whole collection could be a lot simpler—it could just be one dropdown menu of all the corpus files in the words folder. But since that’s not possible, this will have to do.

All the credit to this collection of snippets goes to Dr. Drang and Brett Terpstra. You should read their stuff—really.