Curating the Dark Data in the Long Tail of Science
There is a wealth of scientific data that is almost impossible to see. This is science's dark data. Much of this data resides in the long tail of science, in "small" data collection efforts. Instrumentation has made it possible to develop large collections of relatively homogeneous data, be it from space sensors or high-throughput gene sequencers. These monolithic collections are easy to find and search. Dark data, on the other hand, may constitute the larger mass of scientific information. The collections that make up the dark data of science are much smaller but also much more numerous, generated by thousands of scientists, on a much broader range of scientific questions, and in a complex array of formats. Unfortunately, this data is also more prone to being overlooked and lost over time. With new technology, the economics of the internet, and changes in the sociology of science, it is possible to make greater use of this data than was possible in the past. Data curators are the people who develop and use these technologies and procedures to make this data more useful, ensuring a more efficient return on investment in the enterprise of science.
Speaker: P. Bryan Heidorn
Program Manager, National Science Foundation Division of Biological Infrastructure and Associate Professor, University of Illinois
Google Tech Talks
August 28, 2008