Using user stylesheets to highlight links to PDFs or other media, rel="nofollow", etc

Browsers allow you to define your own stylesheet that’s applied to every page you visit. For the longest time I’ve wondered why anyone would ever want this feature. I figured it would be useful for people with poor vision or other disabilities and that was about it.

But combine this with some of CSS’s neater selectors and user stylesheets can be put to some interesting uses. Consider the following:

a[rel~="nofollow"] {
text-shadow: rgba(255,0,0,0.25) 1px 1px 1px;
}

This rule uses the partial attribute value selector (“~=”) to give every hyperlink whose rel attribute contains the word “nofollow” a slight red shadow (like the preceding link, if your browser supports text-shadow).
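
For comparison, here’s a quick sketch of the difference between the exact-match form (“=”) and the word-match form (“~=”), using a hypothetical link marked rel="external nofollow":

a[rel="nofollow"] { /* exact match: misses rel="external nofollow" */
text-shadow: rgba(255,0,0,0.25) 1px 1px 1px;
}
a[rel~="nofollow"] { /* word match: also catches rel="external nofollow" */
text-shadow: rgba(255,0,0,0.25) 1px 1px 1px;
}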

Why would you want this? Well, for me, pure curiosity, but SEOs or spammers may find it enlightening.

For example, the first thing I noticed was that on the Hacker News homepage, links to external sites submitted within the last 3 or 4 hours carry rel="nofollow", but older ones do not: clearly a spam deterrent.

There are many other useful and interesting scenarios. Say you want to highlight all PDF or MP3 links (“$=” matches the end of an attribute value):

a[href$=".pdf"], a[href$=".mp3"] {
text-shadow: rgba(0,255,0,0.25) 1px 1px 1px;
}
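
One caveat: “$=” needs the href to literally end with the extension, so a PDF link with a query string tacked on won’t match. The substring form (“*=”) is a looser alternative, at the cost of some false positives:

a[href*=".pdf"], a[href*=".mp3"] {
text-shadow: rgba(0,255,0,0.25) 1px 1px 1px;
}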

Or email links (“^=” matches the beginning of an attribute value):

a[href^="mailto:"] {
text-shadow: rgba(0,0,255,0.25) 1px 1px 1px;
}

Note that I used text shadows, but anything styleable by CSS is fair game.
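
For instance, a background tint or a dotted outline would do just as well as a shadow; a quick sketch:

a[rel~="nofollow"] {
background-color: rgba(255,0,0,0.15);
outline: 1px dotted red;
}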

With April Fools’ Day approaching, one could imagine other creative uses. I’ll leave those as an exercise for the reader.

Using command line tools to detect the most frequent words in a file

Antonio Cangiano wrote a post about “[Using Python to detect the most frequent words in a file](http://antoniocangiano.com/2008/03/18/use-python-to-detect-the-most-frequent-words-in-a-file/)”. It’s a nice summary of how to do it in Python, but (nearly) the same thing can be accomplished by stringing together a few standard command line tools.

I’m no command line ninja, but I’d like to think I have basic command of most of the standard filters. Here’s my solution:

cat test.txt | tr -s '[:space:]' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -n | tail -10

I’ll explain it blow-by-blow:

cat test.txt

If you don’t know what this does, you’ve got a lot to learn. “cat” simply reads the named files and prints them to standard output (the name is short for “concatenate”), for use by the subsequent filters.
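
Strictly speaking, the “cat” isn’t necessary; the same pipeline could just as well start with an input redirect:

tr -s '[:space:]' '\n' < test.txt | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -n | tail -10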

tr -s '[:space:]' '\n'

“tr” is a handy tool that translates characters from the first set into the corresponding characters of the second set. This first instance turns all whitespace characters (spaces, tabs, newlines) into newlines (“\n”) so that each word ends up on its own line (the -s option “squeezes” runs of consecutive newlines into a single one).
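
To see what this step does in isolation (output shown as comments):

echo "The  quick   brown fox" | tr -s '[:space:]' '\n'
# The
# quick
# brown
# fox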

tr '[:upper:]' '[:lower:]'

The second instance translates all uppercase characters into lowercase (note: the two “tr”s are separate for clarity, but they could be combined into a single one).
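
And the second instance in isolation:

echo "The Quick Brown Fox" | tr '[:upper:]' '[:lower:]'
# the quick brown fox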

sort | uniq -c

“sort” and “uniq” do exactly as their names imply, but “uniq” only removes adjacent duplicates, so you often want to sort the input first. The “-c” option for “uniq” prepends each line with the number of occurrences.
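
For example (the exact padding in front of the counts may vary):

printf 'fox\nthe\nfox\n' | sort | uniq -c
#   2 fox
#   1 the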

sort -n

We sort the output of “uniq” again, this time numerically (“-n”), to get the list of words ordered by how often they occur.
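
The “-n” matters because a plain “sort” compares lexicographically, which would put a count of 10 before a count of 2:

printf '10 the\n2 fox\n' | sort
# 10 the
# 2 fox
printf '10 the\n2 fox\n' | sort -n
# 2 fox
# 10 the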

tail -10

Finally, we get the 10 most frequently occurring words by using “tail” to take only the last 10 lines (since “sort -n” puts the list in ascending order).
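
Equivalently, one could sort in descending order and take the first 10 instead:

cat test.txt | tr -s '[:space:]' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn | head -10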

It’s not perfect, especially since punctuation is left attached to the words, but the “tr” commands can be tweaked as needed.
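
One possible tweak along those lines: use tr’s “-c” option to complement the set, so that anything that isn’t a letter (punctuation included) becomes a word separator:

cat test.txt | tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -n | tail -10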