A simple MapReduce program can count how many times each word appears in a set of files. For example, given the files:
foo.txt: Sweet, this is the foo file
bar.txt: This is the bar file
We would expect the output to be:
sweet 1
this 2
is 2
the 2
foo 1
bar 1
file 2
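The expected output implies two tokenization rules. First, words are compared case-insensitively, so "Sweet" and "This" are counted as "sweet" and "this". Second, punctuation such as the comma after "Sweet" is discarded. As a reference point, the following single-machine Python sketch computes the same counts with `collections.Counter`. The in-memory `files` dict and the regex-based `tokenize` helper are assumptions made for illustration; treating only runs of ASCII letters as words is one possible rule, not the only one.

```python
import re
from collections import Counter

# Hypothetical in-memory copies of the two example files.
files = {
    "foo.txt": "Sweet, this is the foo file",
    "bar.txt": "This is the bar file",
}

def tokenize(text):
    """Lowercase the text and keep only runs of ASCII letters (an assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())

# Count every word across all files in a single process.
counts = Counter()
for contents in files.values():
    counts.update(tokenize(contents))

for word, count in counts.items():
    print(word, count)
```

The printed counts match the expected output above, although the lines may appear in a different order.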
In MapReduce, the work splits into two functions. The mapper emits a (word, 1) pair for every word it reads. The reducer receives all the values emitted for a single word and adds them up. In pseudocode:
mapper (filename, file-contents):
    for each word in file-contents:
        word = lowercase(word) with punctuation removed
        emit (word, 1)

reducer (word, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit (word, sum)
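To see how the map, shuffle, and reduce phases fit together, here is a minimal single-process Python sketch of the pseudocode above. It does not use Hadoop or any real MapReduce framework. The names `mapper`, `reducer`, `run_mapreduce`, and the in-memory `files` dict are hypothetical. The lowercasing and punctuation stripping in the mapper are assumptions inferred from the expected output. The shuffle step, which groups values by key and which a real framework performs between the two phases, is written out by hand.

```python
import re
from collections import defaultdict

# Hypothetical in-memory input: filename -> file contents.
files = {
    "foo.txt": "Sweet, this is the foo file",
    "bar.txt": "This is the bar file",
}

def mapper(filename, contents):
    """Emit (word, 1) for each word; lowercasing and dropping punctuation are assumptions."""
    for word in re.findall(r"[a-z]+", contents.lower()):
        yield word, 1

def reducer(word, values):
    """Sum all partial counts emitted for one word."""
    total = 0
    for value in values:
        total += value
    yield word, total

def run_mapreduce(inputs):
    # Map phase: apply the mapper to every (filename, contents) pair.
    intermediate = []
    for filename, contents in inputs.items():
        intermediate.extend(mapper(filename, contents))

    # Shuffle phase: group every value emitted under the same key.
    groups = defaultdict(list)
    for word, value in intermediate:
        groups[word].append(value)

    # Reduce phase: call the reducer once per distinct key.
    results = {}
    for word, values in groups.items():
        for key, total in reducer(word, values):
            results[key] = total
    return results

for word, total in run_mapreduce(files).items():
    print(word, total)
```

In a real cluster, the map calls typically run in parallel over separate input splits, and the framework partitions and sorts the intermediate pairs before the reducers run. The sketch keeps everything in one process so that each phase is easy to inspect.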