Seattle Conference on Scalability: MapReduce Used on Large Data Sets
Google Tech Talks
June 23, 2007
2007 Google Seattle Conference on Scalability:
Using MapReduce on Large Geographic Datasets
Speaker: Barry Brumitt, Google, Inc.
MapReduce is a programming model and library designed to simplify distributed processing of huge datasets on large clusters of computers. It achieves this by providing a general mechanism that largely relieves the programmer of challenging distributed computing problems such as data distribution, process coordination, fault tolerance, and scaling. While working on Google Maps, I've used MapReduce extensively to process and transform datasets that describe the earth's geography. In this talk, I'll introduce MapReduce and demonstrate its broad applicability through example problems ranging from basic data transformation to complex graph processing, all in the context of geographic data.
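To make the programming model concrete, here is a minimal single-process sketch in Python of the map, shuffle (group by key), and reduce phases. It is not Google's MapReduce library, which runs these phases in parallel across a cluster and handles the distribution and fault tolerance described above. The function names, the road-segment record layout, the sample values, and the 1-degree grid-cell keying are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run map, shuffle, and reduce sequentially in one process (illustrative only)."""
    # Map + shuffle: collect every value the mapper emits under its intermediate key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: combine all values for each key, visiting keys in sorted order.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


def cell_mapper(segment):
    # Hypothetical record layout: (name, lat, lng, length_km).
    name, lat, lng, length_km = segment
    # Key each segment by the 1-degree lat/lng cell that contains its point.
    yield (math.floor(lat), math.floor(lng)), length_km


def sum_reducer(cell, lengths):
    # Total road length falling in one grid cell.
    return sum(lengths)


# Made-up sample data, not drawn from any real dataset.
segments = [
    ("A", 47.61, -122.33, 1.5),
    ("B", 47.62, -122.35, 2.0),
    ("C", 45.52, -122.68, 3.0),
]

totals = map_reduce(segments, cell_mapper, sum_reducer)
print(totals)  # prints {(45, -123): 3.0, (47, -123): 3.5}
```

The programmer supplies only `cell_mapper` and `sum_reducer`; everything inside `map_reduce` is the part a real framework would distribute, which is why the same two-function shape can cover tasks from simple transformations to multi-pass graph processing.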