Example - creating a scatter plot
Does the early bird get the worm? Let's look at the relationship between the time a person leaves for work and their income. Income is recorded in columns 297-303, and the time a person leaves for work is recorded in columns 196-198, encoded in ten minute intervals. This pipeline extracts, cleans and formats the data:
greps knock out records for which either field is null, and the perl
script inserts a space between the two columns so
gnuplot can parse the
columns apart. Plotting in gnuplot is simple:
And the resulting plot:
Recall that 0 on the x-axis is midnight, and 20 is 200 minutes after midnight or about 3:20am. Increased density in the beginning of the traditional 1st and 2nd shift periods is apparent. Folks who work regular business hours clearly have higher incomes. It would be interesting to compute the average income in each time bucket, but that makes a pretty hairy command line perl script. Here is it in all its gruesome glory:
You can plot the result for yourself if you're curious.
Example - Creating a bar chart with gnuplot
Let's look at historic immigration rates among Washingtonians. Year of
immigration is recorded in columns 78-81, and 0000 means the person is a native
born citizen. We can apply the usual tricks with
uniq, but it's a bit hard to see the patterns when scrolling back and forth
in text output, it would nicer if we could see a plot.
Gnuplot is a fine graphing tool for this purpose, but it wants the category
label to come first, and the count to come second, so we need to write a perl
script to reverse
uniq's output and stick the result in a file. See
perlrun(1) for details on the
-F options to perl.
Now we can make a bar chart from the contents of the file with gnuplot.
Here's the graph gnuplot creates:
Be a bit careful interpreting this plot, only people who are still alive can be counted, so it naturally goes up and to the right (people who immigrated more recently have a better chance of still being alive). That said, there seems to have been an increase in immigration after the end of World War II, and also a spike after the end if the Vietnam war. I remain at a loss to explain the spike around 1980, consult your local historian.