Wednesday, May 8, 2013

Big Data and Local Elections (Part I)

This sunny spring Northwest night I should be at the swinging corporate offices of Canadian high-tech wunder-startup HootSuite in beautiful Vancouver, B.C. HootSuite is graciously hosting an R Programming meetup for statistical graphing superstar Hadley Wickham. Dr. Wickham specializes in "practical tools for exploring data and models" like ggplot2, which he built especially for the R Programming language. Why is this important? Because we now live in the internet-connected era of very large data sets. And apparently the people who will die with the most toys in the future (or get elected to the most powerful political offices) are the ones who understand big data better than everyone else.

But alas, I am not listening to Dr. Wickham tonight. Stupefying spring pollen counts have confined my lungs to air-conditioned spaces for a while, hence this post. We have grown so used to the rapid change of computing infrastructure and the steady progress in software functionality that it is easy to take for granted the power each of us now has to change social outcomes with our keyboards. So listen up, future and present activists. The information in this post was produced with OpenBSD 5.2, R Programming, Gvim, and Gimp. All of it is FOSS (Free and Open Source Software), all of it is cross-platform, and all of it is something almost anyone can load onto most older hardware and, after some struggle, become proficient in. Did I mention that OpenBSD is Canadian and renowned for security?

The R Programming environment is mathematically rich software that has benefited from long-term open source development. Although it is a bit kludgey to learn at first, it rewards your effort with lightning-quick computation. Reputedly, it doesn't scale as well to large data as SQL-based technology like PostgreSQL. However, it scales well enough for my uses here. The images below are:
  • Whatcom County total population (by five age groups) from the census
  • An extract of voter registrations by birth date from Nov. 2012
  • An overlay of the previous two images
It took me some time to write the R code for these graphs; I only started picking up R Programming in January. But look how illuminating a little programming can be. The first chart gives us some idea of how the 205,000 of us in Whatcom County are grouped by age. You can clearly see that the Baby Boom/Gen X bin (ages 45-64) dominates the over-18 population.
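The original charts were made in R, and that code isn't reproduced here. As a rough illustration of the binning step only, here is a minimal Python sketch that sorts individual ages into five age groups and counts each group. The bin edges are assumptions: the post only confirms a 45-64 bin, and the names `AGE_BINS`, `age_bin`, and `bin_ages` are hypothetical.

```python
from collections import Counter

# Hypothetical bin edges (inclusive). The post confirms only the 45-64 bin;
# the other boundaries are assumptions for illustration.
AGE_BINS = [
    (0, 17, "0-17"),
    (18, 24, "18-24"),
    (25, 44, "25-44"),
    (45, 64, "45-64"),
    (65, 200, "65+"),
]


def age_bin(age):
    """Return the label of the bin whose inclusive range contains `age`."""
    for low, high, label in AGE_BINS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


def bin_ages(ages):
    """Count people per age bin, returned in bin order (empty bins show 0)."""
    counts = Counter(age_bin(a) for a in ages)
    return {label: counts.get(label, 0) for _, _, label in AGE_BINS}


if __name__ == "__main__":
    sample_ages = [5, 19, 30, 47, 52, 63, 70]  # toy data, not census figures
    print(bin_ages(sample_ages))
```

With the toy ages above, three of the seven people land in the 45-64 bin, which is the same kind of "bump" the census chart shows at a much larger scale.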

This second graph consists of yearly registration counts grouped by birth year. In it, we see the "Boomer/Gen X bump," but we also see lots of registrations from younger generations.
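Grouping registrations by birth year is essentially a frequency count. A minimal Python sketch of that step follows, assuming each voter record supplies a birth date as an ISO "YYYY-MM-DD" string; the real voter-file format may differ, and `registrations_by_birth_year` is a hypothetical name, not part of any official tool.

```python
from collections import Counter
from datetime import date


def registrations_by_birth_year(birth_dates):
    """Count registrations per birth year.

    Assumes `birth_dates` is an iterable of ISO "YYYY-MM-DD" strings.
    Returns a dict ordered by ascending birth year.
    """
    years = Counter(date.fromisoformat(d).year for d in birth_dates)
    return dict(sorted(years.items()))


if __name__ == "__main__":
    # Toy records, not real voter data.
    sample = ["1950-03-14", "1950-11-02", "1962-07-30", "1990-01-01", "1990-12-31"]
    print(registrations_by_birth_year(sample))  # {1950: 2, 1962: 1, 1990: 2}
```

Plotting the resulting year-to-count mapping as bars gives a chart of the same shape as the second graph.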

The next graph is not so statistical in focus, but it may be important for visualization. Whatcom County has a very high percentage of eligible voters registered, if the numbers can be believed: about 130K of the roughly 160K residents over the age of eighteen appear to be registered to vote (around 81%). The overlaid images of the two graphs below are not to scale. The top age bin holds about 50K people, while the top yearly registration count is about 2,500 (i.e., a 20:1 ratio). However, the image is still powerful. It seems to tell us that the Boomer/Gen X bumps are registering at about the expected rate compared to the other age bins. The final Gen Y/Gen Z (??) column, however, seems to be registering above the expected rate.
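The overlay in the post was assembled by hand in Gimp. One way to put two series with very different magnitudes on a shared axis is to divide each by its own maximum so both peak at 1.0; that is an alternative technique, not the author's method. This minimal Python sketch also checks the arithmetic quoted above (130K of 160K registered, and a 50K vs. 2,500 peak ratio). The function names are hypothetical, and the bar heights in the example are toy values, not real data.

```python
def scale_to_peak(values):
    """Divide every value by the series maximum so the tallest bar becomes 1.0."""
    peak = max(values)
    return [v / peak for v in values]


def registration_rate(registered, eligible):
    """Fraction of the eligible (18+) population that is registered."""
    return registered / eligible


if __name__ == "__main__":
    # Approximate figures quoted in the post.
    print(registration_rate(130_000, 160_000))  # 0.8125
    print(50_000 / 2_500)  # 20.0, the ratio between the two peaks

    # Toy series (not real data) to show the rescaling.
    print(scale_to_peak([40_000, 20_000, 50_000, 25_000]))  # [0.8, 0.4, 1.0, 0.5]
    print(scale_to_peak([1_000, 2_500, 500]))  # [0.4, 1.0, 0.2]
```

Peak-scaling preserves each chart's shape while discarding its absolute size, which is exactly why an overlay like this can compare the shapes but not the raw counts.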

In my next post, I will examine electoral turnout for these age groups in both the 2008 and 2012 General Elections in Whatcom County. Here is the link for the technical part of this post. :-)