<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>tlrobinson.net blog &#187; Ruby</title>
	<atom:link href="http://tlrobinson.net/blog/category/ruby/feed/" rel="self" type="application/rss+xml" />
	<link>http://tlrobinson.net/blog</link>
	<description></description>
	<lastBuildDate>Mon, 06 Apr 2009 08:37:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Game of Life text and image generator generator</title>
		<link>http://tlrobinson.net/blog/2009/02/game-of-life-generator/</link>
		<comments>http://tlrobinson.net/blog/2009/02/game-of-life-generator/#comments</comments>
		<pubDate>Sun, 08 Feb 2009 03:03:08 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Game of Life]]></category>

		<guid isPermaLink="false">http://tlrobinson.net/blog/?p=78</guid>
		<description><![CDATA[I saw this image the other day on Hacker News and Reddit: It&#8217;s a Game of Life pattern that prints out &#8220;Golly&#8221;. Neat, but I wanted my own. After about 5 minutes of playing around with the Golly logo pattern &#8230; <a href="http://tlrobinson.net/blog/2009/02/game-of-life-generator/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I saw this image the other day on <a href="http://news.ycombinator.com">Hacker News</a> and <a href="http://www.reddit.com/">Reddit</a>:</p>
<p><img src="http://golly.sourceforge.net/ticker.gif" alt="Golly Ticker" /></p>
<p>It&#8217;s a <a href="http://en.wikipedia.org/wiki/Conway's_Game_of_Life">Game of Life</a> pattern that prints out &#8220;Golly&#8221;. Neat, but I wanted my own. After about 5 minutes of playing around with the Golly logo pattern <em>in</em> <a href="http://golly.sourceforge.net/">Golly</a> (a program for experimenting with the Game of Life), I gave up and wrote a program to do it.</p>
<p>The program takes the top and bottom portions of a template pattern (based on the Golly pattern) and positions them, then fills in the gliders between them for the correct number of columns. Then it duplicates the entire pattern for each row. Finally it &#8220;draws&#8221; some text (using the sample font from <a href="http://pentacom.jp/soft/ex/font/">here</a>) by deleting the gliders corresponding to empty space.</p>
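<p>The last step, deleting gliders where the text has empty space, can be sketched roughly as follows. This is not the actual life-gen code: the tiny 3x5 <code>FONT</code> table and the <code>glider_mask</code> helper are made up for illustration, and the real program uses the sample font linked above.</p>

```ruby
# Hypothetical 3x5 font: each glyph is five rows, "#" = dot, "." = blank.
FONT = {
  "H" => ["#.#", "#.#", "###", "#.#", "#.#"],
  "I" => ["###", ".#.", ".#.", ".#.", "###"],
}

# Returns five rows of booleans. true means "keep the glider at this
# position", false means "delete it so the printed dot stays blank".
def glider_mask(text)
  rows = Array.new(5) { [] }
  text.each_char do |ch|
    glyph = FONT.fetch(ch) # raises KeyError for characters not in the font
    glyph.each_with_index do |bits, y|
      rows[y].concat(bits.chars.map { |b| b == "#" })
      rows[y].push(false) # one blank column between letters
    end
  end
  rows
end

mask = glider_mask("HI")
mask.each { |row| puts row.map { |keep| keep ? "#" : "." }.join }
```

<p>Each row of the mask corresponds to one row of the duplicated pattern, and each column to one glider slot in that row.</p>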
<p>This was great, but I had essentially created a dot matrix printer that could draw anything, so it would be a waste to not draw images with it. A few lines of Ruby/RMagick code later, I had a program that did just that. Here&#8217;s an example using the Reddit Logo (<a href="http://vimeo.com/3124876">watch it in HD</a> for the best results):</p>
<p><object width="640" height="360"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=3124876&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=1&amp;color=FF7700&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=3124876&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=1&amp;color=FF7700&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="640" height="360"></embed></object></p>
<p>The code is <a href="http://github.com/tlrobinson/life-gen/">available on GitHub</a>. It requires Ruby, plus RMagick if you want to generate patterns from images. Pass it &#8220;-s yourtext&#8221; to generate a text-based pattern, or &#8220;-i imagepath&#8221; for an image pattern.</p>
<p>Download <a href="http://golly.sourceforge.net/">Golly</a> to open up the generated &#8220;.rle&#8221; files.</p>
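<p>For reference, here is a small sketch of how a grid of cells can be written out in the run-length encoded &#8220;.rle&#8221; format that Golly reads. It is an illustration rather than the code from the repository; the <code>to_rle</code> helper and the boolean grid representation are assumptions made for the example.</p>

```ruby
# Encode a grid of booleans (true = live cell) as Golly RLE text.
# In the body, "b" is a dead cell, "o" is a live cell, "$" ends a row
# and "!" ends the pattern. A run count of 1 is left out.
def to_rle(grid)
  width = grid.map { |row| row.length }.max || 0
  body = grid.map do |row|
    # Group consecutive equal cells into [state, run_length] pairs.
    runs = row.chunk_while { |a, b| a == b }.map { |run| [run.first, run.length] }
    # Trailing dead cells on a row may be omitted.
    runs.pop unless runs.empty? || runs.last[0]
    runs.map { |alive, n| "#{n if n > 1}#{alive ? 'o' : 'b'}" }.join
  end.join("$")
  "x = #{width}, y = #{grid.length}, rule = B3/S23\n#{body}!\n"
end

# A single glider as a 3x3 grid.
GLIDER = [
  [false, true,  false],
  [false, false, true ],
  [true,  true,  true ],
]

puts to_rle(GLIDER)
```

<p>Saving that output to a file with a &#8220;.rle&#8221; extension should give something Golly can open.</p>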
]]></content:encoded>
			<wfw:commentRss>http://tlrobinson.net/blog/2009/02/game-of-life-generator/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>A Better BugMeNot Bookmarklet</title>
		<link>http://tlrobinson.net/blog/2008/11/a-better-bugmenot-bookmarklet/</link>
		<comments>http://tlrobinson.net/blog/2008/11/a-better-bugmenot-bookmarklet/#comments</comments>
		<pubDate>Sun, 09 Nov 2008 10:46:21 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Hacks]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Ruby on Rails]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://tlrobinson.net/blog/?p=71</guid>
		<description><![CDATA[BugMeNot is a great little service for bypassing the registration process for websites that really shouldn&#8217;t require it (ahem, nytimes.com). The bookmarklet brings up BugMeNot for the current website you&#8217;re viewing, and gives you login/password pairs which you can then &#8230; <a href="http://tlrobinson.net/blog/2008/11/a-better-bugmenot-bookmarklet/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.bugmenot.com/">BugMeNot</a> is a great little service for bypassing the registration process for websites that really shouldn&#8217;t require it (ahem, nytimes.com). The bookmarklet brings up BugMeNot for the current website you&#8217;re viewing, and gives you login/password pairs which you can then copy and paste.</p>
<p>But wouldn&#8217;t it be better if it automagically filled in the username and password for you? I thought so, so I wrote a few lines of code in the form of a bookmarklet and a JSONP web service to do this.</p>
<p>BugMeNot doesn&#8217;t provide an API so I had to do a little screen scraping with <a href="https://code.whytheluckystiff.net/hpricot/">Hpricot</a>. They also try to obfuscate the usernames and passwords by shifting each character by an offset calculated from a &#8220;key&#8221;, prepending 4 characters, and then Base64 encoding the result. Luckily their obfuscation was no match for a single line of Ruby:</p>
<div style="text-align:left;color:#000000; background-color:#ffffff; border:solid black 1px; padding:0.5em 1em 0.5em 1em; overflow:auto;font-size:small; font-family:monospace; "><span style="color:#881350;">def</span> bmn_decode(input, offset)<br />
&nbsp;&nbsp;<span style="color:#236e25;"># decode base64, strip first 4 chars, convert chars to ints, subtract offset, convert ints back to chars<br />
</span>&nbsp;&nbsp;input.unpack(<span style="color:#760f15;">&quot;m*&quot;</span>)[<span style="color:#0000ff;">0</span>][<span style="color:#0000ff;">4.</span>.<span style="color:#0000ff;">-1</span>].unpack(<span style="color:#760f15;">&quot;C*&quot;</span>).map{|c| c - offset }.pack(<span style="color:#760f15;">&quot;C*&quot;</span>)<br />
<span style="color:#881350;">end</span></div>
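<p>A quick way to check the decoder is to round-trip a string through it. The <code>bmn_encode</code> helper below is hypothetical: it is simply the inverse of the decoder, written for testing, and the &#8220;abcd&#8221; prefix and the offset of 3 are arbitrary values, not anything BugMeNot actually uses.</p>

```ruby
# The decoder from the post: Base64-decode, drop the 4-character prefix,
# then subtract the offset from every byte.
def bmn_decode(input, offset)
  input.unpack("m*")[0][4..-1].unpack("C*").map { |c| c - offset }.pack("C*")
end

# Hypothetical inverse, written only to exercise the decoder:
# add the offset to every byte, prepend 4 filler characters, Base64-encode.
def bmn_encode(plain, offset, prefix = "abcd")
  shifted = plain.unpack("C*").map { |c| c + offset }.pack("C*")
  [prefix + shifted].pack("m0")
end

encoded = bmn_encode("hunter2", 3)
puts encoded
puts bmn_decode(encoded, 3)
```

<p>If the decoder is right, the second line printed is the original string.</p>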
<p>The bookmarklet makes the request via an injected &lt;script&gt; tag. When its callback gets called, it finds the most likely input elements for the username and password and fills them in with the result.</p>
<p>The Rails app consists of a single action that makes a request to bugmenot.com for the specified site, extracts and decodes the usernames and passwords, and picks the one with the highest rating. It then returns the result as JSON wrapped in a function callback (i.e. JSONP).</p>
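<p>Stripped of the Rails plumbing and the scraping, the &#8220;pick the best and wrap it in a callback&#8221; part could look something like this. It is a sketch under assumptions: the <code>best_account</code> and <code>jsonp</code> helpers, the hash keys, and the callback name check are illustrative and not taken from the actual app.</p>

```ruby
require "json"

# Only accept callback names that look like plain (optionally dotted)
# JavaScript identifiers, so the parameter cannot smuggle in other script.
CALLBACK_RE = /\A[A-Za-z_]\w*(\.[A-Za-z_]\w*)*\z/

# Pick the account with the highest rating (hypothetical :rating key).
def best_account(accounts)
  accounts.max_by { |account| account[:rating] }
end

# Wrap a payload as JSONP: callbackName({...json...})
def jsonp(callback, payload)
  raise ArgumentError, "invalid callback name" unless CALLBACK_RE.match?(callback.to_s)
  "#{callback}(#{JSON.generate(payload)})"
end

accounts = [
  { username: "alice", password: "pw1", rating: 40 },
  { username: "bob",   password: "pw2", rating: 85 },
]
best = best_account(accounts)
puts jsonp("fillLogin", { username: best[:username], password: best[:password] })
```

<p>The injected &lt;script&gt; tag then executes that response, which calls the named function with the login as its argument.</p>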
<p>I&#8217;m not going to post the location of the live JSONP web service since BugMeNot limits the number of requests you can make, but the code is available <a href="http://github.com/tlrobinson/tlrobinson/tree/master/bbmn">on GitHub</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://tlrobinson.net/blog/2008/11/a-better-bugmenot-bookmarklet/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scraping USC DPS&#039;s incident logs</title>
		<link>http://tlrobinson.net/blog/2007/03/scraping-usc-dpss-incident-logs/</link>
		<comments>http://tlrobinson.net/blog/2007/03/scraping-usc-dpss-incident-logs/#comments</comments>
		<pubDate>Mon, 12 Mar 2007 22:41:19 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[TOOBS]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://tlrobinson.net/blog/?p=6</guid>
		<description><![CDATA[This post describes how to extract incident summaries and metadata from the USC Department of Public Safety&#8217;s daily incident logs, which are used extensively in TOOBS. A couple of years ago when the Google Maps API was &#8230; <a href="http://tlrobinson.net/blog/2007/03/scraping-usc-dpss-incident-logs/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This post describes how to extract incident summaries and metadata from the USC Department of Public Safety&#8217;s daily incident logs, which are used extensively in <a href="http://tlrobinson.net/projects/toobs">TOOBS</a>.</p>
<h3>Background</h3>
<p>A couple of years ago when the Google Maps API was first introduced I wanted to make something useful using it. The USC Department of Public Safety sends out these &#8220;Crime Alert&#8221; emails whenever a robbery or assault or other violent crime against a student occurs, so I decided to plot each of the locations along with the summary of the crime on a map of USC and the surrounding area.</p>
<p>This was fine for a small number of crimes, but unfortunately the Crime Alert emails were completely unstructured and never formatted consistently, so automating the process was out of the question. I ended up creating each entry by hand and hard-coding the data. The result wasn&#8217;t pretty, but it&#8217;s still available on my USC page here: <a href="http://www-scf.usc.edu/~tlrobins/crimealert/">http://www-scf.usc.edu/~tlrobins/crimealert/</a></p>
<p>For UPE&#8217;s P.24 programming contest I decided to rewrite the whole thing to make it far more automated and flexible. Since my first attempt, I had discovered that DPS publishes every single incident they respond to as <a href="http://capsnet.usc.edu/DPS/CrimeSummary.cfm">daily PDF logs</a>. Obviously I would have preferred XML or some other structured format, but the PDFs will have to do for now.</p>
<h3>Method</h3>
<p>My language of choice was Ruby since I originally planned on using the project as an excuse to learn Ruby on Rails. Due to some ridiculously strange bugs I gave up on Rails for the project, but not before writing the incident logs parser.</p>
<p>The main script can either take a list of URLs as arguments, or if no arguments are specified it will try to download the previous day&#8217;s log (good for running it as a cron job). An HTTP request to the URL is made, and if successful the PDF is downloaded into memory. To convert the PDF to text I used <a href="http://raa.ruby-lang.org/project/rpdf2txt/">rpdf2txt</a>.</p>
<p>Once in text form, a variety of regular expressions and string manipulation functions are used to extract each field from the entry. When an entry is complete, it is inserted into a database. Spurious lines are discarded.</p>
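<p>To make that concrete, here is a trimmed-down sketch of how the first line of an entry gets pulled apart, using the same two regular expressions as the import script. The sample line is invented to match those expressions and is not copied from a real log, <code>parse_first_line</code> is a helper named for this example, and <code>DateTime.strptime</code> with an explicit format is swapped in for <code>DateTime.parse</code> to keep the example predictable.</p>

```ruby
require "date"

DATETIME_RE = /[A-Z][a-z]{2} [0-9]{2}, [0-9]{4}-[A-Z][a-z]+ at [0-9]{2}:[0-9]{2}/
STAMP_RE    = /[0-9]{2}-[0-9]{2}-[0-9]{2}-[0-9]+/

# Split the first line of an entry into category, subcategory, time, stamp.
# Returns nil when the line does not carry a stamp.
def parse_first_line(line)
  return nil unless line =~ STAMP_RE
  # The category is the leading all-caps run; the subcategory starts at the
  # first capital letter that is followed by a lowercase letter.
  split_at = line.slice(/[^a-z]*(?=[A-Z][a-z])/).length
  {
    category:    line[0...split_at].strip,
    subcategory: line[split_at...line.index(DATETIME_RE)].strip,
    time:        DateTime.strptime(line.slice(DATETIME_RE), "%b %d, %Y-%A at %H:%M"),
    stamp:       line.slice(STAMP_RE),
  }
end

# Made-up sample line, shaped to fit the regular expressions above.
sample = "THEFT-PETTY Theft Petty-Plain Mar 10, 2007-Saturday at 14:30 07-03-10-012345"
p parse_first_line(sample)
```

<p>The remaining fields (location, cc, summary, disposition) are simpler, since each one is introduced by its own label on the line.</p>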
<h3>Code</h3>
<p>The import script is available here: <a href="http://tlrobinson.net/projects/toobs/import.rb">http://tlrobinson.net/projects/toobs/import.rb</a></p>
<div style="text-align:left;color:#000000; background-color:#ffffff; border:solid black 1px; padding:0.5em 1em 0.5em 1em; overflow:auto;font-size:small; font-family:monospace; "><span style="color:#881350;">require</span> <span style="color:#760f15;">&quot;net/http&quot;</span><br />
<span style="color:#881350;">require</span> <span style="color:#760f15;">&quot;rpdf2txt/parser&quot;</span><br />
<span style="color:#881350;">require</span> <span style="color:#760f15;">&quot;date&quot;</span></p>
<p><span style="color:#881350;">require</span> <span style="color:#760f15;">&quot;rubygems&quot;</span><br />
require_gem <span style="color:#760f15;">&quot;activerecord&quot;</span></p>
<p><span style="color:#236e25;"># misc regular expressions constants<br />
</span>datetimeRE = <span style="color:#c700c2;">/[A-Z][a-z]{2} [0-9]{2}, [0-9]{4}-[A-Z][a-z]+ at [0-9]{2}:[0-9]{2}/</span><br />
stampRE = <span style="color:#c700c2;">/[0-9]{2}-[0-9]{2}-[0-9]{2}-[0-9]+/</span></p>
<p><span style="color:#236e25;"># connect to the database<br />
</span>ActiveRecord::Base.establish_connection(<br />
&nbsp;&nbsp;<span style="color:#d6771c;">:adapter</span> &nbsp;=&gt; <span style="color:#760f15;">&quot;mysql&quot;</span>,<br />
&nbsp;&nbsp;<span style="color:#d6771c;">:host</span> &nbsp;&nbsp;&nbsp;&nbsp;=&gt; <span style="color:#760f15;">&quot;host&quot;</span>,<br />
&nbsp;&nbsp;<span style="color:#d6771c;">:database</span> =&gt; <span style="color:#760f15;">&quot;database&quot;</span>,<br />
&nbsp;&nbsp;<span style="color:#d6771c;">:username</span> =&gt; <span style="color:#760f15;">&quot;username&quot;</span>,<br />
&nbsp;&nbsp;<span style="color:#d6771c;">:password</span> =&gt; <span style="color:#760f15;">&quot;password&quot;</span> <br />
)</p>
<p><span style="color:#881350;">class</span> Incident &lt; ActiveRecord::Base<br />
&nbsp;&nbsp;set_table_name <span style="color:#760f15;">&quot;crimes&quot;</span><br />
<span style="color:#881350;">end</span><br />
&nbsp;&nbsp;<br />
<span style="color:#881350;">def</span> import_url(url)<br />
&nbsp;&nbsp;<span style="color:#881350;">puts</span> <span style="color:#760f15;">&quot;================== Processing: &quot;</span> + url + <span style="color:#760f15;">&quot; ==================&quot;</span><br />
&nbsp;&nbsp;<br />
&nbsp;&nbsp;resp = Net::HTTP.get_response(URI.parse(url))<br />
&nbsp;&nbsp;<span style="color:#881350;">if</span> resp.is_a? Net::HTTPSuccess<br />
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># parse the pdf, extract the text, split into lines<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;parser = Rpdf2txt::Parser.new(resp.body)<br />
&nbsp;&nbsp;&nbsp;&nbsp;text = parser.extract_text<br />
&nbsp;&nbsp;&nbsp;&nbsp;lines = text.<span style="color:#881350;">split</span>(<span style="color:#760f15;">&quot;\n&quot;</span>)<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;incidents = <span style="color:#881350;">Array</span>.new <span style="color:#236e25;"># array containing each incident<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;summary = <span style="color:#0000cc;">false</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># for multiple line summaries<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;disp = <span style="color:#0000cc;">false</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># for cases when the &quot;disp&quot; data is on the line after the &quot;Disp:&quot; header<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># try to match each line to a regular expression or other condition<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># then extract the data from the line<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;lines.each <span style="color:#881350;">do</span> |line|<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># first line<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">if</span> (line =~ stampRE)</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># special case for missing identifier of previous incident<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">if</span> (incidents.size &gt; <span style="color:#0000ff;">0</span> &amp;&amp; incidents.last.identifier == <span style="color:#0000cc;">nil</span>) <br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">puts</span> <span style="color:#760f15;">&quot;+++ Last identifier is empty, searching for identifier in summary&#8230;&quot;</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;tempRE = <span style="color:#c700c2;">/DR\#[\d]+/</span>;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;tempId = incidents.last.summary[tempRE];<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">if</span> (tempId != <span style="color:#0000cc;">nil</span>) <br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">puts</span> <span style="color:#760f15;">&quot;+++ Found! {&quot;</span> + tempId[<span style="color:#0000ff;">3.</span>.tempId.length-<span style="color:#0000ff;">1</span>] + <span style="color:#760f15;">&quot;}&quot;</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incidents.last.identifier = tempId[<span style="color:#0000ff;">3.</span>.tempId.length-<span style="color:#0000ff;">1</span>];<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">end</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">end</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># create new incident<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incidents &lt;&lt; Incident.new<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;summary = <span style="color:#0000cc;">false</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;disp = <span style="color:#0000cc;">false</span><br />
&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># extract category, subcategory, time, and stamp<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cat_subcat_index = line.slice(<span style="color:#c700c2;">/[^a-z]*(?=[A-Z][a-z])/</span>).length<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incidents.last.category = line[<span style="color:#0000ff;">0.</span>.cat_subcat_index-<span style="color:#0000ff;">1</span>].strip<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incidents.last.subcategory = line[cat_subcat_index..line.index(datetimeRE)<span style="color:#0000ff;">-1</span>].strip<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incidents.last.time = <span style="color:#881350;">DateTime</span>.parse(line.slice(datetimeRE))<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incidents.last.stamp = line.slice(stampRE)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># identifier<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">elsif</span> (line =~ <span style="color:#c700c2;">/^[0-9]+$/</span>)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incidents.last.identifier = line.slice(<span style="color:#c700c2;">/^[0-9]+$/</span>).to_i<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># location<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">elsif</span> (line =~ <span style="color:#c700c2;">/Location:/</span>)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incidents.last.location = line.<span style="color:#881350;">sub</span>(<span style="color:#c700c2;">/Location:/</span>, <span style="color:#760f15;">&quot;&quot;</span>).strip<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># cc<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">elsif</span> (line =~ <span style="color:#c700c2;">/cc:/</span>)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incidents.last.cc = line.<span style="color:#881350;">sub</span>(<span style="color:#c700c2;">/cc:/</span>, <span style="color:#760f15;">&quot;&quot;</span>).strip<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;summary = <span style="color:#0000cc;">false</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># disposition<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">elsif</span> (disp) <br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incidents.last.disp = line.<span style="color:#881350;">sub</span>(<span style="color:#c700c2;">/Disp:/</span>, <span style="color:#760f15;">&quot;&quot;</span>).strip<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;disp = <span style="color:#0000cc;">false</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># summary<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">elsif</span> (line =~ <span style="color:#c700c2;">/Summary:/</span> || summary)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">if</span> (incidents.last.summary.nil?)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incidents.last.summary = line.<span style="color:#881350;">sub</span>(<span style="color:#c700c2;">/Summary:/</span>, <span style="color:#760f15;">&quot;&quot;</span>).strip<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">else</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incidents.last.summary &lt;&lt; (<span style="color:#760f15;">&quot; &quot;</span> + line.<span style="color:#881350;">sub</span>(<span style="color:#c700c2;">/Summary:/</span>, <span style="color:#760f15;">&quot;&quot;</span>).strip)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">end</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">if</span> (incidents.last.summary =~ <span style="color:#c700c2;">/Disp:/</span>)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># find the &quot;Disp:&quot; header and data, remove from summary<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;disp = incidents.last.summary.slice!(<span style="color:#c700c2;">/\s*Disp:.*/</span>)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incidents.last.disp = disp.<span style="color:#881350;">sub</span>(<span style="color:#c700c2;">/Disp:/</span>, <span style="color:#760f15;">&quot;&quot;</span>).strip<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;disp = (incidents.last.disp == <span style="color:#760f15;">&quot;&quot;</span>) <span style="color:#236e25;"># check that we actually got the &quot;disp&quot; data<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;summary = <span style="color:#0000cc;">false</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">else</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;summary = <span style="color:#0000cc;">true</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">end</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># no match<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">else</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">puts</span> <span style="color:#760f15;">&quot;discarding line: {&quot;</span> + line + <span style="color:#760f15;">&quot;}&quot;</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">end</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">end</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#236e25;"># at the end save each incident and print a list<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;incidents.each <span style="color:#881350;">do</span> |incident|<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">begin</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">puts</span>( (<span style="color:#760f15;">&quot;%8d&quot;</span> % incident.identifier) + <span style="color:#760f15;">&quot; &quot;</span> +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(<span style="color:#760f15;">&quot;%25s&quot;</span> % (<span style="color:#760f15;">&quot;{&quot;</span> + incident.category &nbsp;&nbsp;&nbsp;+ <span style="color:#760f15;">&quot;}&quot;</span>)) + <span style="color:#760f15;">&quot; &quot;</span> +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(<span style="color:#760f15;">&quot;%45s&quot;</span> % (<span style="color:#760f15;">&quot;{&quot;</span> + incident.subcategory + <span style="color:#760f15;">&quot;}&quot;</span>)) + <span style="color:#760f15;">&quot; &quot;</span> +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(<span style="color:#760f15;">&quot;%60s&quot;</span> % (<span style="color:#760f15;">&quot;{&quot;</span> + incident.location &nbsp;&nbsp;&nbsp;+ <span style="color:#760f15;">&quot;}&quot;</span>)));<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incident.save<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">rescue</span> <span style="color:#881350;">Exception</span> =&gt; exp<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">puts</span> exp<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">end</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#881350;">end</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;<span style="color:#881350;">end</span><br />
<span style="color:#881350;">end</span></p>
<p><span style="color:#881350;">if</span> (<span style="color:#a1617a;">ARGV</span>.length &gt; <span style="color:#0000ff;">0</span>)<br />
&nbsp;&nbsp;<span style="color:#236e25;"># import each argument<br />
</span>&nbsp;&nbsp;<span style="color:#a1617a;">ARGV</span>.each <span style="color:#881350;">do</span> |arg|<br />
&nbsp;&nbsp;&nbsp;&nbsp;import_url(arg)<br />
&nbsp;&nbsp;<span style="color:#881350;">end</span><br />
<span style="color:#881350;">else</span><br />
&nbsp;&nbsp;yesterday = <span style="color:#881350;">Date</span>.today - <span style="color:#0000ff;">1</span>;<br />
&nbsp;&nbsp;urlToImport = <span style="color:#760f15;">&quot;http://capsnet.usc.edu/DPS/webpdf/&quot;</span>+<br />
&nbsp;&nbsp;&nbsp;&nbsp;(<span style="color:#760f15;">&quot;%02d&quot;</span> % yesterday.mon) + (<span style="color:#760f15;">&quot;%02d&quot;</span> % yesterday.mday) + yesterday.year.to_s[<span style="color:#0000ff;">2..3</span>] + <span style="color:#760f15;">&quot;.pdf&quot;</span><br />
&nbsp;&nbsp;import_url(urlToImport)<br />
<span style="color:#881350;">end</span></div>
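<p>As an aside, the URL for a given day&#8217;s log can also be built with <code>Date#strftime</code> instead of formatting the month, day and year separately. This is an alternative sketch, not what the import script does; <code>log_url</code> is a helper name made up for the example.</p>

```ruby
require "date"

BASE_URL = "http://capsnet.usc.edu/DPS/webpdf/"

# Build the log URL for a date: two-digit month, day and year (MMDDYY).
def log_url(date)
  BASE_URL + date.strftime("%m%d%y") + ".pdf"
end

puts log_url(Date.today - 1) # yesterday's log
```

<p>Both versions produce the same file name for any given date.</p>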
<h3>Conclusion</h3>
<p>This system works fairly well, with a few exceptions. While the PDFs are far more consistent than the emails, occasionally a PDF that can&#8217;t be parsed by rpdf2txt shows up. So far I haven&#8217;t found a solution, though a different PDF-to-text converter might help. Also, sometimes entries are missing an identifier, or it shows up in a different location. Some special rules are used to try to find it, but they&#8217;re not always successful.</p>
<p>Overall it was a success, as demonstrated by the 4000+ incidents currently in the TOOBS database.</p>
]]></content:encoded>
			<wfw:commentRss>http://tlrobinson.net/blog/2007/03/scraping-usc-dpss-incident-logs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

