Posts tagged mapreduce
Apache Hadoop
Nov 26th
This is an Apache Hadoop presentation I gave at my workplace. It's a beginner's guide for developers.
MapReduce
Aug 4th
The computer was originally designed as a sequential processor. This notion has become ingrained in our minds. We are often bogged down by this constraint while developing algorithms.
Advances in computer hardware have brought more and more parallelism, but our algorithms have not yet started to embrace it. Map Reduce is a design paradigm that forces us to think in parallel. Algorithms developed with this method are well suited to running on parallel computers.
Map Reduce algorithms, as the name suggests, work in two steps. The first step, called “Map”, consists of tasks that can be done in parallel. This step generates intermediate results, which are then passed to the Reduce step. In the Reduce step, these results are collated to generate the final result. The second step is the sequential part.
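Here is a minimal sketch of the two steps in plain Java. It assumes Java 9+ (for `List.of`). The class and method names (`MapReduceSketch`, `mapStep`, `reduceStep`) are hypothetical. `parallelStream()` only stands in for a real distributed framework such as Hadoop. The map function runs on each input independently, and the reduce function then collates the intermediate results one after another.

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration of the two-step shape; not Hadoop's API.
public class MapReduceSketch {

    // Map step: apply `mapper` to every input independently.
    // parallelStream() may spread these calls over several threads,
    // which is safe because no call depends on another.
    static <I, R> List<R> mapStep(List<I> inputs, Function<I, R> mapper) {
        return inputs.parallelStream()
                     .map(mapper)
                     .collect(Collectors.toList());
    }

    // Reduce step: collate the intermediate results sequentially.
    static <R> R reduceStep(List<R> intermediate, R identity, BinaryOperator<R> reducer) {
        R result = identity;
        for (R value : intermediate) {
            result = reducer.apply(result, value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("map", "reduce", "parallel");
        // Map: word -> its length; Reduce: sum the lengths
        List<Integer> lengths = mapStep(words, String::length);
        int total = reduceStep(lengths, 0, Integer::sum);
        System.out.println(total); // 3 + 6 + 8 = 17
    }
}
```

The map phase can use extra threads because each call sees only its own input. The reduce loop is the one place where results are combined.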
Conventional algorithms usually perform the Map steps sequentially inside a loop and keep collating the results as they go. The final result is only available at the end of the loop.
As a simple example, take the case of counting the number of words in a file.
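// Normal algorithm: a single sequential loop that counts and collates as it goes
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public int noOfWordsInFile(String fileName) throws IOException {
    int fileCount = 0;
    try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
        String line;
        while ((line = reader.readLine()) != null) { // while (not eof)
            int lineCount = noOfWordsInLine(line);
            fileCount = lineCount + fileCount;
        }
    }
    return fileCount;
}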
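// Map Reduce algorithm
// map: runs independently for each line, so many lines can be processed in parallel
public int map(String line) {
    int lineCount = noOfWordsInLine(line);
    return lineCount; // "save lineCount": hand the intermediate result on to the Reduce step
}

// reduce: collates one intermediate result into the running total
// (the total is passed in and returned instead of being loaded and saved)
public int reduce(int fileCount, int lineCount) {
    return fileCount + lineCount;
}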
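The following self-contained program runs both versions of the word count side by side. It makes several assumptions, none of which come from the post. The class name `WordCountDemo` and the sample lines are hypothetical, and the input is an in-memory list rather than a file. `noOfWordsInLine` is assumed to split on whitespace. A `java.util.concurrent` thread pool plays the role of the parallel Map step, and a `List` stands in for the "save/load" intermediate storage that a real framework would provide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WordCountDemo {

    // Assumption: a "word" is a run of non-whitespace characters.
    static int noOfWordsInLine(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Normal algorithm: count and collate inside one sequential loop.
    static int countSequential(List<String> lines) {
        int fileCount = 0;
        for (String line : lines) {
            fileCount = noOfWordsInLine(line) + fileCount;
        }
        return fileCount;
    }

    // Map step: one independent task per line, run on a thread pool.
    static List<Integer> map(List<String> lines, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Future<Integer>> futures = new ArrayList<>();
        for (String line : lines) {
            futures.add(pool.submit(() -> noOfWordsInLine(line)));
        }
        List<Integer> lineCounts = new ArrayList<>(); // stand-in for "save lineCount"
        for (Future<Integer> f : futures) {
            lineCounts.add(f.get()); // wait for each intermediate result
        }
        return lineCounts;
    }

    // Reduce step: collate the intermediate line counts sequentially.
    static int reduce(List<Integer> lineCounts) {
        int fileCount = 0;
        for (int lineCount : lineCounts) {
            fileCount = fileCount + lineCount;
        }
        return fileCount;
    }

    static int countMapReduce(List<String> lines)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            return reduce(map(lines, pool));
        } finally {
            pool.shutdown(); // let the JVM exit once work is done
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = List.of(
                "the computer was designed",
                "as a sequential processor",
                "",
                "map reduce forces us to think parallel");
        System.out.println(countSequential(lines)); // 4 + 4 + 0 + 7 = 15
        System.out.println(countMapReduce(lines));  // 15
    }
}
```

Both versions give the same answer. The difference is that the map tasks share nothing, so they can run on any number of threads, or, in a framework like Hadoop, on any number of machines.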
Update: Some folks thought that this was my idea. I have just rephrased what has already been discussed elsewhere.
You can find more details at: Mapreduce