Todd came into town for the Gov2.0 Summit last week, and in addition to dropping off a terabyte's worth of data from Afghanistan, he talked a bit about what has made the “beer for data” program work at the Taj. Beyond the universal thirst for beer, data sharing success boiled down to three basic principles:
1) Create immediate value for anyone contributing data: when users contribute data, they should get an immediate return on that investment. In the case of the Afghan pilot, that meant seeing your contributed data on a map of high-resolution satellite imagery as soon as you uploaded it. The imagery for Afghanistan was made available by NGA, then tiled and served up by a Fusion Server graciously donated by Google.
2) Make contributors’ data available back to them with improvements: any data that goes in should be available to download back out again. Further, the data should come back better than when it went in. In the Afghan pilot, this meant that if you shared data with the platform in a spreadsheet format, you could get it back out as KML, shapefile, Atom, JSON, SpatiaLite, etc. (Addendum to principle 2 – PDFs are evil, and make parsing and extracting data into a sharable format complete misery.)
3) Share derivative works back with the data sharing community: urge users who create derivative works from shared data to contribute their data products back to the group. In the case of the Afghan pilot, researchers were taking the detailed data from the field and feeding it into their sophisticated models and simulations. Researchers would then upload the results into GeoIQ to share the derivative works back with the data sharing community. This meant that agencies and individuals who shared data got a better product back in return for contributing. The researchers get better data to feed their models, and a self-perpetuating feedback loop is created that sustains increasing data sharing.
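To make principle 2 concrete, here is a minimal sketch of the "spreadsheet in, richer format out" idea: it turns CSV rows into a GeoJSON FeatureCollection. This is only an illustration and not how GeoIQ performed the conversion; the function name `csv_to_geojson`, the `lat`/`lon` column names, and the sample rows are all made up for the example.

```python
import csv
import io
import json


def csv_to_geojson(csv_text, lat_field="lat", lon_field="lon"):
    """Convert CSV text with latitude/longitude columns to a GeoJSON dict.

    The column names are assumptions for this sketch; rows whose
    coordinates are missing or non-numeric are skipped.
    """
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lat = float(row.pop(lat_field))
            lon = float(row.pop(lon_field))
        except (KeyError, ValueError, TypeError):
            continue  # no usable coordinates in this row
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            # every remaining column becomes a feature property
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Invented sample rows, for illustration only
    sample = "name,lat,lon\nClinic A,34.43,70.45\nBad row,not-a-number,70.0\n"
    print(json.dumps(csv_to_geojson(sample), indent=2))
```

The contributor gets back something a web map can consume directly, which is the "better than when it went in" part of the principle. Other output formats such as KML or shapefile would need their own writers.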
While these sound like simple principles, it is amazing how often they are not followed and effective data sharing is blunted. Too often data sharing – especially with government and corporations – is a black hole: data goes in but never comes back out. It is also rare to see the positive feedback loop of researchers sharing their work products back with the data sharing community. Too often researchers get wrapped around the axle about their products being proprietary or sensitive. While this can be the case, there is huge benefit in feeding results back to gauge their veracity and accuracy. I’ve definitely seen way too many models that look great in the lab and completely fall apart in reality because researchers would not feed results back to the field for verification and error bounding. I’m hoping we’ll have more opportunities to implement these principles in future projects, and that we can see the success of Todd’s work in Jalalabad duplicated hundreds of times over.