Explain the WordCount implementation using the Hadoop framework.


We will count the words across all the input files. The flow is as below.
Input:
Assume there are two files, each containing one sentence:
Hello World Hello World (In file 1)
Hello World Hello World (In file 2)
Mapper: There is one mapper for each file (more precisely, one per input split; each small file here forms a single split).
For the given sample input, the first map output:
< Hello, 1>
< World, 1>
< Hello, 1>
< World, 1>
The second map output:
< Hello, 1>
< World, 1>
< Hello, 1>
< World, 1>
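
The map step above can be sketched in plain Python. This is only an illustration of the logic, not the Hadoop Mapper API; the function name `map_words` is hypothetical, and it assumes words are separated by whitespace.

```python
def map_words(line):
    """Emit a (word, 1) pair for every whitespace-separated word in the line."""
    return [(word, 1) for word in line.split()]


# One call per file, mirroring one mapper per file in the sample input.
file1_pairs = map_words("Hello World Hello World")
file2_pairs = map_words("Hello World Hello World")

for word, count in file1_pairs:
    print(f"< {word}, {count}>")
```

Each mapper works only on its own file and knows nothing about the other file's words.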
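Combiner/Sorting (this is done separately for each individual map output)
The pairs are sorted by key and the combiner sums the counts for each word locally, so the output looks like this: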
The output of the first map:
< Hello, 2>
< World, 2>
The output of the second map:
< Hello, 2>
< World, 2>
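
A minimal sketch of the combine step for a single map output, again in plain Python rather than the Hadoop API. The function name `combine` is hypothetical, and the input list is the first map output from the sample.

```python
from collections import Counter


def combine(pairs):
    """Sum the counts per word within one mapper's output, sorted by key."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


# Output of the first map from the sample input.
map_output = [("Hello", 1), ("World", 1), ("Hello", 1), ("World", 1)]
combined = combine(map_output)

for word, count in combined:
    print(f"< {word}, {count}>")
```

Combining locally means each map sends two pairs to the reducer instead of four, which reduces the data moved between the map and reduce phases.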
Reducer:
It sums the combined counts for each word across all the maps and generates the output below:
< Hello, 4>
< World, 4>
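
The reduce step can be sketched the same way. This assumes the framework has already grouped the values by key before calling the reducer; the helper names `shuffle` and `reduce_counts` are hypothetical and stand in for what Hadoop does between the map and reduce phases.

```python
from collections import defaultdict


def shuffle(*mapper_outputs):
    """Group the counts from all mapper outputs by word."""
    grouped = defaultdict(list)
    for output in mapper_outputs:
        for word, count in output:
            grouped[word].append(count)
    return grouped


def reduce_counts(word, counts):
    """Sum all the counts received for one word."""
    return (word, sum(counts))


# Combined outputs of the first and second map from the sample input.
first = [("Hello", 2), ("World", 2)]
second = [("Hello", 2), ("World", 2)]

grouped = shuffle(first, second)
result = [reduce_counts(word, counts) for word, counts in sorted(grouped.items())]

for word, total in result:
    print(f"< {word}, {total}>")
```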
Output:
The final output would look like:
Hello 4 times
World 4 times
