Parallel processing model at Google

Google Fellows Jeff Dean and Sanjay Ghemawat have published a paper in the January issue of Communications of the ACM detailing the programming model Google uses to process more than 20 petabytes of data every day on clusters of commodity machines.

The model, known as MapReduce, lets users express a computation as a map function and a reduce function; the runtime system automatically parallelizes that computation across large clusters, handles machine failures, and schedules work to make efficient use of the network and disks. In effect, the library hides parallelization, fault tolerance, data distribution, and load balancing from the programmer.
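
To give a feel for the programming model, here is a minimal, single-machine sketch of the word-count example described in the paper. The `run_mapreduce` driver and the function names are illustrative stand-ins, not Google's actual C++ library; a real MapReduce runtime would shard the input, run the same map and reduce steps across thousands of machines, and re-execute work lost to failures.

```python
from collections import defaultdict
from itertools import chain

def map_fn(doc_name, doc_text):
    """Map step: emit an intermediate (word, 1) pair for every word."""
    for word in doc_text.split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce step: sum all partial counts emitted for a word."""
    return word, sum(counts)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy local driver simulating the map, shuffle, and reduce phases."""
    # Map phase: apply map_fn to every (key, value) input record.
    intermediate = chain.from_iterable(map_fn(k, v) for k, v in inputs)
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: apply reduce_fn to each key and its grouped values.
    return [reduce_fn(key, values) for key, values in groups.items()]

if __name__ == "__main__":
    docs = [("a.txt", "the quick brown fox"), ("b.txt", "the lazy dog")]
    print(run_mapreduce(docs, map_fn, reduce_fn))
    # [('the', 2), ('quick', 1), ('brown', 1), ('fox', 1), ('lazy', 1), ('dog', 1)]
```

The appeal of the approach is that the user writes only the two small functions at the top; everything below the map and reduce definitions is the runtime's responsibility.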

More on MapReduce is on Google Code.
