A Citizen's Guide to Elections Data Analysis

Collapsing the Data from Individual Records to Daily Totals

A hat tip for today’s posting goes to Charles Stewart of MIT, whose “Political Science Laboratory” course inspired me to engage my introductory statistics students in data management using real data sources.

Regular readers of this blog may have seen graphics plotting the daily ballot returns from North Carolina.  These graphics are the same kind that the presidential campaigns, and really any campaign in a state with substantial early voting, use for ballot chasing.

The ballot return information is a public record, and theoretically, any citizen, organization, or campaign should have equal access.  Unfortunately, things aren’t so simple.  As Michael McDonald reports:

Election officials may not report early voting statistics. I attempt to collect as much of the information about these ballots as possible. However, I do not hound election officials for these statistics because they are busy doing the important work of preparing for the upcoming election. Sometimes data will be available only at the local level. I cannot continuously scan for local data, so I appreciate tips on where to find data.

I wish every state made these data available for a free electronic download.  If your state does not, I urge you to contact your state legislator and see why not.

But suppose you do have these data: what do you do with them?

It turns out that it’s not very hard to go from individual-level vote reports to turnout information, if you have the right toolbox.  The tool you need is a statistical program capable of reading in data files that have hundreds of thousands of cases.  That’s more than older versions of Excel can even hold, and unwieldy in newer ones.  The most commonly used packages in political science are Stata (used in the example below) and R.  (The big advantage of R is that it is freely available, but I’m not yet conversant with the software.  My hope is that some entrepreneurial reader of this blog will translate the Stata code into R code.)

With the tools in hand, the steps can seem confusing, but if you follow the attached presentation, I think you will find them not too difficult.  In brief:

  1. You start with individual voter records that include the name, age, party, date the absentee ballot was requested, date the absentee ballot was returned, and status of the absentee ballot.  (We want to know whether the ballot was “accepted” or not.)  The data file is freely downloadable at ftp://www.app.sboe.state.nc.us/enrs/absentee11xx06xx2012.zip

    The file looks something like this:

    VOTER CODE   JOHN SMITH  123 MAIN ST  RALEIGH NC …  DEM … 10/1/2012   10/15/2012  BY MAIL  ACCEPT

  2. You need to convert the date variables, which statistical programs read as strings of characters (e.g. 10/10/2012), into “date” variables.
  3. We count up how many absentee ballot requests came from each party.
  4. You need to code each ballot as accepted (voted = 1) or not (voted = 0).
  5. Now things get tricky.  We “collapse” the data so that our smaller data file is organized by date and by party.  The file will end up looking like this:

    DATE          DEMS     REPS    UNA     DEMVOTED   REPVOTED   UNAVOTED
    10/15/2012    10,219   9,221   8,217   123        .          .
    10/15/2012    10,219   9,221   8,217   .          347        .
    10/15/2012    10,219   9,221   8,217   .          .          456

    This made-up file shows that on Oct. 15, 123 Democratic ballots were returned, along with 347 Republican ballots and 456 Unaffiliated ballots.

  6. With this file in hand, we “cumulate” the number of returned ballots, divide by the number in each party, and voila!  We have the percentage of partisan ballots returned by day.
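For readers without Stata, here is a minimal Python sketch of steps 2 through 6 that uses only the standard library.  It is not a translation of the do file referenced in the presentation.  The column names (`party`, `return_date`, `status`), the MM/DD/YYYY date format, the comma-delimited layout with a header row, and the rule that an accepted ballot has a status beginning with "ACCEPT" are all assumptions.  Check them against the actual State Board of Elections file before trusting the output.  The toy records at the bottom are made up.

```python
import csv
from collections import Counter
from datetime import date, datetime
from itertools import accumulate

# Hypothetical column names: check them against the real file's header.
PARTY, RETURN_DATE, STATUS = "party", "return_date", "status"


def load_rows(path, delimiter=","):
    """Read the voter-level file (assumed to have a header row) into dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f, delimiter=delimiter))


def parse_date(text):
    """Step 2: turn a string such as '10/15/2012' into a date (None if blank)."""
    text = text.strip()
    return datetime.strptime(text, "%m/%d/%Y").date() if text else None


def daily_return_rates(rows):
    """Steps 3-6: cumulative percent of each party's requests returned and accepted, by date."""
    requests = Counter()  # step 3: absentee requests per party
    returned = Counter()  # step 5: accepted ballots per (date, party)
    for row in rows:
        party = row[PARTY].strip().upper()
        requests[party] += 1
        # Step 4: voted = 1 if accepted (assumes statuses like "ACCEPT"/"ACCEPTED").
        voted = 1 if row[STATUS].strip().upper().startswith("ACCEPT") else 0
        returned_on = parse_date(row[RETURN_DATE])
        if voted and returned_on is not None:
            returned[(returned_on, party)] += 1  # collapse: one count per date x party

    dates = sorted({d for d, _ in returned})
    rates = {}
    for party, n_requests in requests.items():
        daily = [returned[(d, party)] for d in dates]  # missing keys count as 0
        running = accumulate(daily)  # step 6: cumulate returns over time
        rates[party] = {d: 100 * r / n_requests for d, r in zip(dates, running)}
    return rates


# Toy, made-up records to show the shape of the result.
rows = [
    {"party": "DEM", "return_date": "10/15/2012", "status": "ACCEPT"},
    {"party": "DEM", "return_date": "", "status": ""},
    {"party": "REP", "return_date": "10/15/2012", "status": "ACCEPT"},
    {"party": "REP", "return_date": "10/16/2012", "status": "ACCEPT"},
]
rates = daily_return_rates(rows)
print(rates["REP"][date(2012, 10, 16)])  # 100.0
```

The `Counter` keyed by (date, party) plays the role of Stata's collapse: instead of one row per voter, we keep one count per date and party.  `itertools.accumulate` then does the cumulating, and dividing by each party's request count gives the percentage returned by day.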

Obviously, it’s a bit more complicated than that, but I hope this PowerPoint presentation (PDF format) that I prepared for my class can guide anyone through the process.  The Stata do file referenced in the PowerPoint can be downloaded as well.

And early voting there looks to shatter past records.  

I fielded inquiries about early voting and Hurricane Sandy, as did my colleague Doug Chapin and my friends at the NCSL.

Short summary:  every year throughout the last dozen years, one or two states have looked at emergency provisions for elections, and at least half have done something during that period.  The most recent states, for instance, are DE and SD.

Pundits are already speculating about political advantages and disadvantages.  That’s unfortunate, but it takes some real cluelessness for former Rep. Ginny Brown-Waite (FL) to make this statement:

This year is different than five, 10 or 20 years ago when there was only one day to vote and the availability of much more restrictive absentee ballot rules in place.  Now there are several days of early voting and an easier process to get an absentee ballot.  Voters no longer have to wait till election day and really have no excuse not to get it accomplished.  America will be very disappointed if the President tries to extend voting beyond Nov 6 to increase his chances of re-election.

Hello?  It’s a bit late to request a no-excuse absentee ballot when your power is out and streets are flooded!

There’s a much larger issue, however: the lack of emergency preparedness on the part of government officials.

I can’t help but recall the stillborn report from the Continuity of Government Commission, which tried to fix serious deficiencies in the American system of succession.  In that case, the requirement to hold elections in the event of mass vacancies or incapacitation could leave the legislative branch non-functioning for months.

No one seems to have contemplated what would happen if a natural disaster on the scale of Hurricane Sandy were to hit just one week later than it did this year.

That’s lack of disaster planning on an epic scale.

The Early Voting Transformation

A number of reporters have asked me how early voting may have changed campaigns.  I describe a longer period of voter mobilization.  I describe get-out-the-early-vote rallies, like those Obama is holding in Illinois this week.  And I talk about how Election Day has changed from a day when half or more of the citizenry went to a local school, community building, or government office to cast a ballot into the end of a two- or three-week period of balloting.

But sometimes a picture is worth a thousand words, and I think the graphic below, comparing early voting rates in North Carolina in 2004, 2008, and the first five days of early voting in 2012, says it all.

If you were campaigning in the Tar Heel State in 2004, elections were all about the first Tuesday after the first Monday in November.  A few days before the 2004 election, about 15% of Democrats and Republicans had voted early.  A week out, fewer than 10% of ballots had been cast.  These voters mattered, of course, especially in a close contest, but campaigns kept their resources in check to focus on the 85% of partisans who cast an Election Day ballot.

In 2008, Barack Obama’s candidacy was the trumpet of Joshua that felled the Election Day wall.  Anyone familiar with the 2008 race cannot forget the long lines of Black voters waiting in the fall heat to cast a historic vote for the candidate who would eventually be elected as our first African American president.

But it wasn’t just Obama.  Usage among Republican and Unaffiliated voters also leapt in response to the key legislative change: making absentee voting a “one-stop” process, essentially converting it into early in-person voting.  The result was that 2.6 million out of 4.4 million ballots were cast early and, at least for Democrats, half of those came in 7 or more days before Election Day.

Fast forward to 2012.  Once again, voters are enduring long early voting lines.  Democratic rates in particular are exceeding 2008 rates.  Republicans are lagging, but are still turning out earlier than in 2008.  And any candidate who wants to win the state has to be on the ground already, because if not, the opponent could be 30-40% ahead by Election Day.

A Rising Democratic Tide in North Carolina?

As of this morning, the State Board of Elections in NC has processed 720,694 early in-person ballots.  We finally have enough leverage (and enough days) to compare the turnout rates and trajectory to those of previous elections.

Signs of a rising Democratic tide, at least in this one state, appear to be accurate.  The gap between the 2008 rate and the 2012 rate widened for the first three days of early in-person voting and has held steady since then.  The GOP, by comparison, is not doing much better in 2012 than it did (as a proportion of identifiers) in 2008.

We’ll be updating these graphics every few days as early voting continues.

Data from NC Board of Elections Website

In Kitsap County, WA, heavy stock used to produce the ballot means that two stamps will be needed to return it by mail.

Dirty, not very well-kept secret: the USPS will deliver it anyway, and the county office has pledged to make up the difference.

Reporters FAQ 2: How Absentee Ballots are Processed, Scanned, and Tabulated

I just got off a series of phone calls with reporters asking about absentee ballots and how elections officials handle them.

While the administrative rules and procedures vary by state (as with almost everything in American elections), there are some consistent patterns that reporters need to understand.

  1. Processing:
    Absentee ballots go through a number of steps before they are fed into a counting machine.  The signature on the outer envelope needs to be verified.  This is done either by a computer or by a human, and there are always backups when signatures are deemed questionable.  The ballot is then separated from the outer envelope to maintain the secrecy of the ballot (except in North Carolina where, at least in the past, it was possible to relink the two via a security code).
    A few states are “voter intent” states (California, Oregon, Washington, perhaps others), and in these states, the ballots are then examined and “remade” by ballot review boards.  In other states (e.g. Arkansas) this process does not take place unless an absentee ballot is rejected by the ballot counting machine.
  2. Scanning:
    Ballots are then typically scanned by an optical scan machine, which reads the marks voters made on the ballot (optical mark recognition).  This information is stored on a memory card.
  3. Tabulating:
    Finally, at some point, an elections official hits the “tabulate” button that produces the candidate totals for the absentee ballots that have been scanned into the machine.  There is not, of course, a big Staples-style “total” button; in practice, the machine creates a report that contains a number of pieces of information, such as total ballots counted, total ballots accepted, total votes for each candidate in each race, and, depending on the report, candidate totals by precinct.
    (Here is an example of one such report from Bay County, FL, for the November 2011 election.)

It's important to understand these distinctions, because many journalists don't realize that "scanning" is not the same as "tabulating."
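To make the distinction concrete, here is a toy Python model.  It is a sketch, not a description of any vendor's actual equipment.  Scanning only appends each ballot's recorded selections to a hypothetical `MemoryCard`; no totals exist until a separate, hypothetical `tabulate` step reads every stored record and builds a report.  The class, function, precinct, race, and candidate names are all made up for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class MemoryCard:
    """Hypothetical stand-in for a scanner's memory card."""
    records: list = field(default_factory=list)

    def scan(self, precinct, selections):
        # Scanning: store what one ballot recorded. No totals are computed here.
        self.records.append((precinct, dict(selections)))


def tabulate(card):
    """Tabulating: read every stored record and produce a totals report."""
    by_race = defaultdict(Counter)
    by_precinct = defaultdict(Counter)
    for precinct, selections in card.records:
        for race, candidate in selections.items():
            by_race[race][candidate] += 1
            by_precinct[precinct][(race, candidate)] += 1
    return {
        "ballots_counted": len(card.records),
        "votes_by_race": {race: dict(c) for race, c in by_race.items()},
        "votes_by_precinct": {p: dict(c) for p, c in by_precinct.items()},
    }


card = MemoryCard()
card.scan("P1", {"President": "Candidate A", "Senate": "Candidate X"})
card.scan("P2", {"President": "Candidate B"})  # no Senate mark: an undervote
card.scan("P1", {"President": "Candidate A"})
report = tabulate(card)
print(report["votes_by_race"]["President"])  # {'Candidate A': 2, 'Candidate B': 1}
```

In this model, knowing that three ballots sit on the card tells you nothing about who is ahead; the candidate totals appear only when `tabulate` runs.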