Teaching MapReduce

This post describes an "embodied learning" technique for teaching MapReduce, used for our module in Data, Schemas and Applications. To try and get across the way that MapReduce works, I had students process index cards for the problem of social networks friend recommendations. The algorithm implemented was that described in a useful post by DB Tsai: DB Tsai's Hadoop M/R to implement “People You Might Know” friendship recommendation. I found some other examples, but most seemed to use two MapReduce steps, whereas this only used one.

I had first introduced the pattern and the main steps and emphasised the importance of key(value) pairs as input and intermediate output.

MapReduce overview

The input data

I put each of these on an index card - minus the id numbers - to give to pairs of students who were acting as mappers. The Key is a person in the network and the value is a list of their current friends. I got my names from the Hollyoaks character page - but the id numbers below correspond to DB Tsai's sample data, so any name can be substituted in

0 Tony (Cindy, Jack, Nancie)

1 Cindy (Tony, Jack, Nancie, Charlie, Sonny)

2 Jack (Tony, Cindy, Charlie)

3 Nancie (Tony, Cindy, Charlie)

4 Charlie (Cindy, Jack, Nancie)

5 Sonny (Cindy, George)

6 George (Sonny)

Map task

Based on the example below, I ask students to put each output line on a new card, with the Key written on the back. These were then placed in piles with key uppermost.

//Pseudocode

//Example: Fred is friends with Barney, Wilma and Thelma (n=3)

// We emit a -1 for each existing friend and then all possible combinations of the other friends

Emit :

Fred (Barney, -1)

Fred (Wilma, -1)

Fred (Thelma, -1) // (Friends that Fred already has)

Barney(Wilma, Fred)

Barney(Thelma, Fred)

Wilma(Barney, Fred)

Wilma(Thelma, Fred)

Thelma(Barney, Fred)

Thelma(Wilma, Fred //(Possible friend for Thelma, one mutual friend “vote”)

--Check (n**2 ouput records)

--Store under KEY

students undertaking the map task

sorted index cards

The Reduce Task

Another pair of students is responsible for collecting up the cards for one of the people from all the mappers (the "shuffle" stage) and then doing the reduce using the following example as a guide

//pseudocode

Sum how many mutual friend votes each candidate recommendation has for the person

If any of them has mutual friend -1, we don’t make the recommendation since they are already friends.

Present in reverse order of the count

e.g.

Fred(Thelma, Barney)

Fred(Thelma, -1)

Fred(Wilma, Barney)

Fred(Wilma, Zak)

//To Fred, we recommend Wilma, with a count of 2 mutual friends. Fred and Thelma are already friends so we don’t recommend her.

The result

This is what we should end up with

Tony (Charlie (3) Sonny (1))

Cindy (George (1))

Jack (Nancie (3), Sonny (1))

Nancie (Jack (3), Sonny (1))

Charlie (Tony (3), Sonny(1))

Sonny (Tony(1), Jack(1), Charlie(1), Nancy(1))

George (Cindy(1))

Lessons Learned

  • You need to be really clear to the mappers on exactly how to record and sort the output (that each combination should be on a new card and that the key should be on the back and uppermost.)
  • There are some good opportunities to point out inefficiencies (ie that some mappers finish before others then are idle until the last mappers are done.)
Tags : workshop, analytics,
blog comments powered by Disqus


Powered by TextPress. Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.