With the rise of the post-partisan Obama on the national political scene, there have been sporadic stories in print and online media, in op-eds, on cable news and YouTube, and in blogs about influential Republicans who have turned into Obama supporters: the so-called Obamacans, the reverse of Reagan Democrats. Of course, not everybody is buying the Obamacan story; some consider it a media creation or part of chaos theory. However, a recent claim by McClatchy Newspapers that their “…. computer analysis, incomplete due to the difficulty matching data from various campaign finance reports, found that hundreds of people who gave at least $200 to Bush’s 2004 campaign have donated to Obama” caught our eye at FortiusOne.

So, if there are indeed Bush donors who have now become Obamacans, the data team wanted to find out where they are, spatially speaking. Below are maps showing the locations of possible Obamacans in New York City and Washington, D.C. Why “possible”? Because what is mapped are the results of a spatial join and an attribute join, the latter being a variation of the spatial join. The accuracy of such joins is limited by the accuracy of the original data (donor addresses) and by the limitations of the geocoding operation; more on this towards the end of this post. So what is mapped are donor address matches, not individual donors.

Attribute Join
The attribute join is based on an identifier, “XY”, constructed by concatenating the X and Y location coordinates of the Bush-Cheney and Obama donors, where the coordinates are obtained by geocoding donor addresses. The attribute join resulted in 250 records across the lower 48 states, mostly concentrated in major cities of the Northeast and the West Coast. The results are shown below for New York City (lower Manhattan) and Washington, D.C., where blue circles represent Obama donors (1,415 in D.C. and 1,825 in New York City) and red circles represent Bush-Cheney donors (294 in D.C. and 419 in New York City). The purple squares colocated with Bush-Cheney red circles are the XY “attribute matches.” There were 32 such locations in D.C. and 85 in New York City.
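
As a rough illustration of the “XY” attribute join described above, here is a minimal Python sketch. The field names (`x`, `y`, `id`), the six-decimal rounding, the underscore separator and the sample coordinates are all assumptions made for illustration; the post does not say how the coordinates were formatted before concatenation.

```python
def xy_key(x, y, digits=6):
    """Concatenate rounded X and Y coordinates into one string key."""
    return f"{x:.{digits}f}_{y:.{digits}f}"


def xy_attribute_matches(bush_records, obama_records):
    """Return the Bush-Cheney records whose XY key also occurs among Obama records."""
    obama_keys = {xy_key(r["x"], r["y"]) for r in obama_records}
    return [r for r in bush_records if xy_key(r["x"], r["y"]) in obama_keys]


# Made-up sample coordinates, not real donor data.
bush = [{"id": "B1", "x": -74.0060, "y": 40.7128},
        {"id": "B2", "x": -77.0365, "y": 38.8977}]
obama = [{"id": "O1", "x": -74.0060, "y": 40.7128},
         {"id": "O2", "x": -73.9857, "y": 40.7484}]

matches = xy_attribute_matches(bush, obama)
print([r["id"] for r in matches])
```

Exact key equality only pairs records that were geocoded to the very same point, which is why a match here is an address match and not a person match.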

New York City: “XY” attribute join of Bush-Cheney donors with Obama donors

Washington D.C.: “XY” attribute join of Bush-Cheney donors with Obama donors

Spatial Join
Another approach was to carry out a “spatial” join between the location of each Bush-Cheney donor and all of the colocated Obama donors, resulting in more than 9,200 Bush-Cheney records colocated with more than 42,000 Obama records in the lower 48 states. The results are shown below for New York City (lower Manhattan) and Washington, D.C., where again blue circles represent Obama donors, red circles represent Bush-Cheney donors, and purple circles of varying size represent the count of Obama donors colocated with each “spatially” joined Bush-Cheney donor. There were more than 1,500 Obama donors colocated with 248 Bush-Cheney donors in D.C., while the comparable figures for New York City are more than 2,030 Obama donors colocated with 303 Bush-Cheney donors.
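
One way to picture the per-donor counts behind the purple circles is the brute-force sketch below. It assumes that “colocated” means “within a small distance tolerance”; the 25-metre tolerance, the haversine distance, the tuple layout and the sample points are illustrative assumptions, not the settings actually used for the maps.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def colocated_counts(bush, obama, tolerance_m=25.0):
    """Count Obama donors within tolerance_m of each Bush-Cheney donor.

    Records are (id, lon, lat) tuples. Donors with no neighbour are omitted.
    """
    counts = {}
    for b_id, b_lon, b_lat in bush:
        n = sum(1 for _, o_lon, o_lat in obama
                if haversine_m(b_lon, b_lat, o_lon, o_lat) <= tolerance_m)
        if n:
            counts[b_id] = n
    return counts


# Made-up points, not real donor data.
bush = [("B1", -77.0365, 38.8977), ("B2", -74.0060, 40.7128)]
obama = [("O1", -77.0365, 38.8977),    # same point as B1
         ("O2", -77.0365, 38.89775),   # a few metres north of B1
         ("O3", -77.0365, 38.9077)]    # roughly a kilometre north of B1

print(colocated_counts(bush, obama))
```

The nested loop is fine for a toy example; for hundreds of thousands of records a spatial index or a GIS join tool would be the practical choice.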

New York City: Bush-Cheney donor locations spatially joined with Obama donors

Washington D.C.: Bush-Cheney donor locations spatially joined with Obama donors

Donor Data
You can find and download the mapped datasets, as well as other supporting datasets, from Finder! by searching for the keyword “Obamacans”. The supporting datasets also include the spatial join of all Bush-Cheney donors for each Obama donor.

A strict one-to-one name/address match between the Bush-Cheney and Obama donors, intended to identify the real Obamacans, resulted in zero matches. The match was based on a unique ID constructed for each donor record by concatenating the upper-cased fields “LAST NAME”, “FIRST NAME”, “STREET ADDRESS”, “CITY”, “STATE” and “ZIPCODE”. See below for an explanation of this disappointing result.
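
To make the strict match concrete, here is a small Python sketch of a concatenated, upper-cased ID. The dictionary keys, the `|` separator and the two sample records are hypothetical; only the list of fields and the upper-casing come from the description above.

```python
# Field order follows the post; the dictionary keys themselves are assumed names.
ID_FIELDS = ("last_name", "first_name", "street_address", "city", "state", "zipcode")


def donor_id(record):
    """Build a strict match key by upper-casing and concatenating the ID fields."""
    return "|".join(str(record[f]).strip().upper() for f in ID_FIELDS)


# Invented records for the same hypothetical person, four years apart.
donor_2004 = {"last_name": "Doe", "first_name": "Jane",
              "street_address": "123 Main St", "city": "New York",
              "state": "NY", "zipcode": "10001"}
donor_2008 = dict(donor_2004, street_address="123 Main Street")

print(donor_id(donor_2004))
print(donor_id(donor_2004) == donor_id(donor_2008))
```

Even a single spelling difference in one field (here “St” versus “Street”) produces a different ID, so a strict match of this kind can easily miss the same person.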

Geocoding, Mapping “join” and the results
Individual Bush and Obama donor lists were compiled from the publicly available donor records in the campaign finance reports filed by all presidential candidates. Federal Election Commission (FEC) rules permit an individual donor to contribute more than once, on the condition that the total of all such contributions does not exceed $2,300 per election cycle (primaries and general). Since both the Bush-Cheney and Obama donor lists include such multiple donation records, the resulting geocoded data inherits the one-donor, many-records problem.

The quality of the geocoding depends on the accuracy of the address information. In addition, if no street address is provided, the donor’s location is geocoded to the centroid of the ZIP code, city or state area mentioned in the record. Hence, we limited our analysis to geocoded records that had a street address and a geocoding score of 90 or more out of a maximum of 100. Next, multiple records per individual were eliminated by loading the data into an Access database, creating a unique ID based on each donor’s name and address, and running SQL queries to find matches between Bush and Obama donors on that unique key. As stated earlier, the output showed zero matches. Possible reasons include: donor names spelled differently at different times; address changes over the four-year period; donor addresses that give only a PO box or only ZIP code/city/state information; and combinations of these, which result in poor geocoding output.
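
The filtering and de-duplication steps can be sketched roughly as follows. This is plain Python standing in for the Access/SQL workflow described above, and the field names (`name`, `street`, `zip`, `score`) and sample rows are assumptions for illustration only.

```python
def clean_geocoded(records, min_score=90):
    """Keep street-level geocodes scoring at least min_score, one record per donor."""
    seen = set()
    kept = []
    for rec in records:
        # Drop low-confidence geocodes and records without a street address
        # (those would have been placed at a ZIP/city/state centroid).
        if rec["score"] < min_score or not rec["street"].strip():
            continue
        # Collapse multiple donations by the same name/address to one record.
        key = (rec["name"].strip().upper(), rec["street"].strip().upper(), rec["zip"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


# Invented rows, not real donor data.
rows = [
    {"name": "Jane Doe", "street": "123 Main St", "zip": "10001", "score": 95},
    {"name": "JANE DOE", "street": "123 main st", "zip": "10001", "score": 98},  # repeat donation
    {"name": "John Roe", "street": "", "zip": "20001", "score": 100},            # no street address
    {"name": "Ann Poe", "street": "9 Elm Ave", "zip": "20002", "score": 80},     # score below 90
]

print(len(clean_geocoded(rows)))
```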


The difference in the number of matches between the “XY” attribute join and the spatial join can be explained as follows. In the former, there may be more than one Obama record for each Bush-Cheney record, but the attribute join pairs a Bush-Cheney record with only the first of the multiple Obama records. In the latter, the join is many-to-many between spatially adjacent Obama and Bush-Cheney records. Multiple records with the same location often occur in cities, where geocoding cannot distinguish a multi-story building from a single-family home, or where many donors live on the same city block(s).
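
A toy example shows how the two join behaviours produce different tallies from the same inputs. As a simplification, both joins are modelled here as lookups on a shared location key; the keys and counts are invented and are not the post's data.

```python
from collections import Counter


def join_tallies(bush_keys, obama_keys):
    """Compare a first-match join with a many-to-many join on location keys.

    Returns (first_match_rows, many_to_many_rows).
    """
    obama_per_key = Counter(obama_keys)
    # First-match: each Bush-Cheney record pairs with at most one Obama record.
    first_match_rows = sum(1 for k in bush_keys if k in obama_per_key)
    # Many-to-many: each Bush-Cheney record pairs with every Obama record at that key.
    many_to_many_rows = sum(obama_per_key[k] for k in bush_keys if k in obama_per_key)
    return first_match_rows, many_to_many_rows


# Invented keys: three Obama donors share location "A" (say, one apartment building).
bush_keys = ["A", "B", "C"]
obama_keys = ["A", "A", "A", "B", "D"]

print(join_tallies(bush_keys, obama_keys))
```

With these inputs the first-match join yields two rows and the many-to-many join yields four, the same kind of gap as between the two sets of results above.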

For the above analysis, nearly 150,000 Bush-Cheney individual donor records were extracted from the campaign finance reports filed with the FEC between October 2003 and September 2004. Similarly, more than 560,000 Obama individual donor records were extracted from the monthly campaign finance reports filed with the FEC between January 2008 and April 2008. After geocoding both sets of records, only those with a geocoding score of 90 or more out of a maximum of 100 were selected, and all records that were geocoded to the centroid of a ZIP code or city/state were deleted. Additionally, multiple records for the same donor were aggregated into a single record, bringing the final tallies for Bush-Cheney and Obama to 15,742 and 38,042 donor records, respectively, for the lower 48 states. The “spatial join” and the “XY” attribute join between Bush-Cheney and Obama yielded a little over 9,270 and 250 records, respectively.

So what the maps show are address matches between Bush-Cheney and Obama donors rather than the donors themselves. As many articles in the mainstream media (MSM) suggest, there may indeed be a few Obamacans, but our analysis of FEC campaign finance reports cannot name names. I would love to hear suggestions on how to improve on these results or, better yet, a methodology to spatially identify and locate the Obamacans.


4 Responses to Dataset of the day: Where are the Obamacans?

  1. Raj,

    What were the exact input files you used from the FEC? As far as I can tell the downloadable FTP files do not contain individual donor street address:

    http://www.fec.gov/finance/disclosure/ftpdet.shtml

    Did you scrape the donor search pages to get the data?

    -Pete

  2. Luke Opperman says:

    These are the files with the full contributor info: ftp://ftp.fec.gov/FEC/electronic/

  3. My mistake, the electronic FEC filings do have street address…

    -Pete

  4. You are my inspiration , I own few blogs and rarely run out from to post .