Game of Life text and image generator

I saw this image the other day on Hacker News and Reddit:

[Image: the “Golly Ticker” pattern]

It’s a Game of Life pattern that prints out “Golly”. Neat, but I wanted my own. After about 5 minutes of playing around with the Golly logo pattern in Golly (a program for experimenting with the Game of Life), I gave up and wrote a program to do it.

The program takes the top and bottom portions of a template pattern (based on the Golly pattern) and positions them, then fills in the gliders between them for the correct number of columns. Then it duplicates the entire pattern for each row. Finally it “draws” some text (using the sample font from here) by deleting the gliders corresponding to empty space.
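
To make the “drawing” step concrete, here’s a minimal sketch of the dot-matrix idea: render the text to an on/off grid, then keep a glider only where a pixel is on. The tiny 3x5 glyphs below are invented for illustration; the real program uses the sample font mentioned above.

# Render text to an on/off grid; "X" marks a kept glider, "." a deleted one.
# These 3x5 glyphs are made up for the example.
FONT = {
  "H" => ["X.X", "X.X", "XXX", "X.X", "X.X"],
  "I" => ["XXX", ".X.", ".X.", ".X.", "XXX"],
  " " => ["...", "...", "...", "...", "..."],
}

def text_to_grid(text)
  (0...5).map do |row|
    text.chars.map { |ch| FONT[ch][row] }.join(".")  # one column of spacing
  end
end

puts text_to_grid("HI")
# X.X.XXX
# X.X..X.
# XXX..X.
# X.X..X.
# X.X.XXX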

This was great, but I had essentially created a dot matrix printer that could draw anything, so it would be a waste not to draw images with it. A few lines of Ruby/RMagick code later, I had a program that did just that. Here’s an example using the Reddit logo (watch the video in HD for the best results).
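
The image case works the same way, except the grid comes from thresholding pixel brightness. Roughly (a sketch assuming RMagick 2+; the width and threshold here are arbitrary choices, not the real program’s defaults):

require 'rubygems'
require 'rmagick'

# Downscale the image so each pixel maps to one glider cell, then
# threshold brightness: dark pixels keep their glider, light ones don't.
def image_to_bitmap(path, width = 60, threshold = 0.5)
  img = Magick::Image.read(path).first
  img = img.resize_to_fit(width)
  (0...img.rows).map do |y|
    (0...img.columns).map do |x|
      p = img.pixel_color(x, y)
      brightness = (p.red + p.green + p.blue) / (3.0 * Magick::QuantumRange)
      brightness < threshold
    end
  end
end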

The code is available on GitHub. It requires Ruby, and RMagick for images. Pass it “-s yourtext” to generate a text-based pattern, or “-i imagepath” for an image pattern.

Download Golly to open up the generated “.rle” files.

A Better BugMeNot Bookmarklet

BugMeNot is a great little service for bypassing the registration process for websites that really shouldn’t require it (ahem, nytimes.com). The bookmarklet brings up BugMeNot for the current website you’re viewing, and gives you login/password pairs which you can then copy and paste.

But wouldn’t it be better if it automagically filled in the username and password for you? I thought so, so I wrote a few lines of code in the form of a bookmarklet and a JSONP web service to do this.

BugMeNot doesn’t provide an API, so I had to do a little screen scraping with Hpricot. They also try to obfuscate the usernames and passwords by shifting each character by an offset calculated from a “key”, prepending 4 characters, and Base64 encoding the result. Luckily their obfuscation was no match for a single line of Ruby:

def bmn_decode(input, offset)
  # decode base64, strip first 4 chars, convert chars to ints, subtract offset, convert ints back to chars
  input.unpack("m*")[0][4..-1].unpack("C*").map{|c| c - offset }.pack("C*")
end
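
For sanity-checking, the inverse is just as short. A sketch, where the “XXXX” prefix stands in for whatever 4 bytes BugMeNot actually prepends:

def bmn_encode(input, offset)
  # add offset to each char, prepend 4 junk chars, base64 encode
  ["XXXX" + input.unpack("C*").map{|c| c + offset }.pack("C*")].pack("m*")
end

bmn_decode(bmn_encode("secretpass", 7), 7)  # => "secretpass"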

The bookmarklet makes the request via an injected <script> tag. When its callback gets called, it finds the most likely input elements for the username and password and fills them in with the result.

The Rails app consists of a single action that makes a request to bugmenot.com for the specified site, extracts and decodes the usernames and passwords, and picks the one with the highest rating. It then returns the result as JSON wrapped in a function callback (i.e. JSONP).
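
The JSONP wrapping itself is just string assembly. A minimal sketch (the callback name and credentials are placeholders):

require "json"

# Serialize the payload as JSON and wrap it in the caller-supplied callback
def jsonp(callback, payload)
  "#{callback}(#{payload.to_json});"
end

jsonp("bmnCallback", :username => "user123", :password => "hunter2")
# => 'bmnCallback({"username":"user123","password":"hunter2"});'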

I’m not going to post the location of the live JSONP web service since BugMeNot limits the number of requests you can make, but the code is available on GitHub.

Scraping USC DPS's incident logs

This post describes how to extract incident summaries and metadata from the USC Department of Public Safety’s daily incident logs, which are used extensively in [TOOBS](http://tlrobinson.net/projects/toobs).

### Background ###

A couple of years ago, when the Google Maps API was first introduced, I wanted to make something useful with it. The USC Department of Public Safety sends out “Crime Alert” emails whenever a robbery, assault, or other violent crime against a student occurs, so I decided to plot each location, along with a summary of the crime, on a map of USC and the surrounding area.

This was fine for a small number of crimes, but unfortunately the Crime Alert emails were completely unstructured and never formatted consistently, so automating the process was out of the question. I ended up creating each entry by hand and hard-coding the data. The result wasn’t pretty, but it’s still available on my USC page here: [http://www-scf.usc.edu/~tlrobins/crimealert/](http://www-scf.usc.edu/~tlrobins/crimealert/)

For UPE’s P.24 programming contest I decided to rewrite the whole thing to make it far more automated and flexible. Since my first attempt, I discovered that DPS publishes every single incident they respond to as [daily PDF logs](http://capsnet.usc.edu/DPS/CrimeSummary.cfm). Obviously I would have preferred XML or some other structured format, but the PDFs will have to do for now.

### Method ###

My language of choice was Ruby since I originally planned on using the project as an excuse to learn Ruby on Rails. Due to some ridiculously strange bugs I gave up on Rails for the project, but not before writing the incident logs parser.

The main script can either take a list of URLs as arguments, or, if no arguments are specified, it will try to download the previous day’s log (good for running it as a cron job). An HTTP request is made to the URL, and if it succeeds the PDF is downloaded into memory. To convert the PDF to text I used [rpdf2txt](http://raa.ruby-lang.org/project/rpdf2txt/).

Once in text form, a variety of regular expressions and string manipulation functions are used to extract each field from the entry. When an entry is complete, it is inserted into a database. Spurious lines are discarded.

### Code ###

The import script is available here: [http://tlrobinson.net/projects/toobs/import.rb](http://tlrobinson.net/projects/toobs/import.rb)

require "net/http"
require "rpdf2txt/parser"
require "date"

require "rubygems"
require_gem "activerecord"

# misc regular expressions constants
datetimeRE = /[A-Z][a-z]{2} [0-9]{2}, [0-9]{4}-[A-Z][a-z]+ at [0-9]{2}:[0-9]{2}/
stampRE = /[0-9]{2}-[0-9]{2}-[0-9]{2}-[0-9]+/

# connect to the database
ActiveRecord::Base.establish_connection(
  :adapter  => "mysql",
  :host     => "host",
  :database => "database",
  :username => "username",
  :password => "password"
)

class Incident < ActiveRecord::Base
  set_table_name "crimes"
end
  
def import_url(url)
  puts "================== Processing: " + url + " =================="
  
  resp = Net::HTTP.get_response(URI.parse(url))
  if resp.is_a? Net::HTTPSuccess
    # parse the pdf, extract the text, split into lines
    parser = Rpdf2txt::Parser.new(resp.body)
    text = parser.extract_text
    lines = text.split("\n")
    
    incidents = Array.new # array containing each incident
    summary = false       # for multiple line summaries
    disp = false          # for cases when the "disp" data is on the line after the "Disp:" header
    
    # try to match each line to a regular expression or other condition
    # then extract the data from the line
    lines.each do |line|
    
      # first line
      if (line =~ stampRE)

        # special case for missing identifier of previous incident
        if (incidents.size > 0 && incidents.last.identifier == nil)
          puts "+++ Last identifier is empty, searching for identifier in summary…"
          tempRE = /DR\#[\d]+/;
          tempId = incidents.last.summary[tempRE];
          if (tempId != nil)
            puts "+++ Found! {" + tempId[3..tempId.length-1] + "}"
            incidents.last.identifier = tempId[3..tempId.length-1];
          end
        end
    
        # create new incident
        incidents << Incident.new
        summary = false
        disp = false
  
        # extract category, subcategory, time, and stamp.
        # the category is ALL CAPS, so the subcategory starts at the first
        # uppercase letter that is followed by a lowercase letter
        cat_subcat_index = line.slice(/[^a-z]*(?=[A-Z][a-z])/).length
        incidents.last.category = line[0..cat_subcat_index-1].strip
        incidents.last.subcategory = line[cat_subcat_index..line.index(datetimeRE)-1].strip
        incidents.last.time = DateTime.parse(line.slice(datetimeRE))
        incidents.last.stamp = line.slice(stampRE)
        
      # identifier
      elsif (line =~ /^[0-9]+$/)
        incidents.last.identifier = line.slice(/^[0-9]+$/).to_i
        
      # location
      elsif (line =~ /Location:/)
        incidents.last.location = line.sub(/Location:/, "").strip
        
      # cc
      elsif (line =~ /cc:/)
        incidents.last.cc = line.sub(/cc:/, "").strip
        summary = false
      
      # disposition
      elsif (disp)
        incidents.last.disp = line.sub(/Disp:/, "").strip
        disp = false
      
      # summary
      elsif (line =~ /Summary:/ || summary)
        if (incidents.last.summary.nil?)
          incidents.last.summary = line.sub(/Summary:/, "").strip
        else
          incidents.last.summary << (" " + line.sub(/Summary:/, "").strip)
        end
    
        if (incidents.last.summary =~ /Disp:/)
          # find the "Disp:" header and data, remove from summary
          disp = incidents.last.summary.slice!(/\s*Disp:.*/)
          incidents.last.disp = disp.sub(/Disp:/, "").strip
          
          disp = (incidents.last.disp == "") # check that we actually got the "disp" data
          summary = false
        else
          summary = true
        end
      
      # no match
      else
        puts "discarding line: {" + line + "}"
      end
    end
    
    # at the end save each incident and print a list
    incidents.each do |incident|
      begin
        puts( ("%8d" % incident.identifier) + " " +
              ("%25s" % ("{" + incident.category    + "}")) + " " +
              ("%45s" % ("{" + incident.subcategory + "}")) + " " +
              ("%60s" % ("{" + incident.location    + "}")));
        incident.save
      rescue Exception => exp
        puts exp
      end
    end
    
  end
end

if (ARGV.length > 0)
  # import each argument
  ARGV.each do |arg|
    import_url(arg)
  end
else
  yesterday = Date.today - 1;
  urlToImport = "http://capsnet.usc.edu/DPS/webpdf/"+
    ("%02d" % yesterday.mon) + ("%02d" % yesterday.mday) + yesterday.year.to_s[2..3] + ".pdf"
  import_url(urlToImport)
end

### Conclusion ###
This system works fairly well, with a few exceptions. While the PDFs are far more consistent than the emails, occasionally one shows up that rpdf2txt can’t parse; so far I haven’t found a solution (perhaps a different PDF-to-text converter would help). Also, entries are sometimes missing an identifier, or it shows up in a different location. Some special rules try to find it, but they aren’t always successful.

Overall it was a success, as demonstrated by the 4000+ incidents currently in the TOOBS database.