Todd came into town for the Gov2.0 Summit last week, and in addition to dropping off a terabyte's worth of data from Afghanistan, he talked a bit about what has made the “beer for data” program work at the Taj. Beyond the universal thirst for beer, data sharing success boiled down to three basic principles:

1) Create immediate value for anyone contributing data: when users contribute data, they should get an immediate return on that investment. In the case of the Afghan pilot, that meant seeing your contributed data on a map of high resolution satellite imagery as soon as you uploaded it (see the first sketch after this list). The imagery for Afghanistan was made available by NGA, then tiled and served up by a Fusion Server graciously donated by Google.

2) Make contributors’ data available back to them with improvements: any data that goes in should be available to download back out again. Further, the data should come back better than when it went in. In the Afghan pilot, this meant that if you uploaded your data to the platform as a spreadsheet, you could get it back out as KML, shapefile, Atom, JSON, SpatiaLite, etc. (see the second sketch after this list). (Addendum to principle 2: PDFs are evil, and make parsing and extracting data into a sharable format a complete misery.)

3) Share derivative works back with the data sharing community: urge users who create derivative works from shared data to contribute their data products back to the group. In the case of the Afghan pilot, researchers were taking the detailed data from the field and feeding it into their sophisticated models and simulations. They would then upload the results into GeoIQ, sharing the derivative works back with the data sharing community. This meant that agencies and individuals who shared data got a better product back for contributing, the researchers got better data to feed their models, and a self-perpetuating feedback loop formed that sustains ever-increasing data sharing.
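To make principle 1 concrete, here is a minimal sketch of that immediate-feedback loop. It uses folium purely as a stand-in for the GeoIQ map view, and the tile endpoint is hypothetical; in the actual pilot the imagery came from NGA and was tiled by the donated Fusion Server.

```python
import folium

# Center the map roughly on Jalalabad, where the pilot ran.
m = folium.Map(location=[34.43, 70.45], zoom_start=12, tiles=None)

# Overlay an imagery tile layer; this URL is a hypothetical placeholder.
folium.TileLayer(
    tiles="https://imagery.example.org/tiles/{z}/{x}/{y}.png",
    attr="NGA imagery (served via a Fusion Server in the pilot)",
    name="satellite imagery",
).add_to(m)

# Drop the just-contributed point straight onto the imagery, so the
# contributor sees their data in context the moment it is uploaded.
folium.Marker([34.431, 70.452], popup="Contributed field report").add_to(m)
m.save("contribution_preview.html")
```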
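And for principle 2, a minimal sketch of the format round-trip, assuming a hypothetical spreadsheet export (field_reports.csv with lat/lon columns) and using geopandas rather than whatever conversion pipeline GeoIQ actually ran:

```python
import pandas as pd
import geopandas as gpd

# Hypothetical spreadsheet export: one row per field report, with lat/lon columns.
df = pd.read_csv("field_reports.csv")
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["lon"], df["lat"]),
    crs="EPSG:4326",  # WGS84, the usual CRS for GPS-collected field data
)

# Hand the same data back out in several formats.
gdf.to_file("field_reports.geojson", driver="GeoJSON")  # JSON
gdf.to_file("field_reports.shp")                        # shapefile
gdf.to_file("field_reports.kml", driver="KML")          # needs a GDAL build with KML enabled
# Atom and SpatiaLite exports would hang off the same GeoDataFrame via other GDAL drivers.
```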

While these sound like simple principles, it is amazing how often they are not followed and effective data sharing is blunted. Too often data sharing – especially with government and corporations – is a black hole: data goes in but never comes back out. It is also rare to see the positive feedback loop of researchers sharing their work products back with the data sharing community. Too often researchers get wrapped around the axle over their products being proprietary or sensitive. While that can be the case, there is huge benefit in feeding results back to gauge their veracity and accuracy. I’ve definitely seen way too many models that look great in the lab and completely fall apart in reality because researchers would not feed results back to the field for verification and error bounding. I’m hoping we’ll have more opportunities to implement these principles in future projects, and that we can see the success of Todd’s work in Jalalabad duplicated hundreds of times over.

 

