Noc19-cs33
Lec 06: Hadoop MapReduce 2.0 (Part-I)
Introduction to MapReduce: MapReduce is a programming model and an associated implementation for processing and generating large data sets. In version 2.0, the programming model has been separated out, while resource management and scheduling are done by YARN, which is another component of Hadoop 2.0. So, if we look at the Hadoop 2.0 stack, it has three different layers: the first layer is HDFS 2.0, then comes YARN, and on top of it MapReduce version 2.0. YARN now performs the resource management and scheduling functionalities that were part of MapReduce version 1.0. This simplifies MapReduce, which is now only the programming model: it focuses on the programming aspect, while the remaining part, that is, resource management and scheduling, is performed by YARN. With this separation, we can also design many new applications directly on HDFS and YARN, bypassing MapReduce altogether. So, we will see Hadoop stacks, or different distributions, which use either HDFS and YARN, or HDFS, YARN and MapReduce, and on top of these, a lot of applications are available for big data computations.
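To make this concrete, here is a minimal sketch, assuming the standard Hadoop 2.x Java API, of how a MapReduce 2.0 job driver might be written. This is not from the lecture slides: the word-count example, the class name WordCountDriver, and the command-line input/output paths are illustrative assumptions. The point is only that the job itself is handed to YARN, which does the resource management and scheduling, while the mapper and reducer classes (sketched after the map and reduce discussion below) carry the programming model.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative driver: WordCountMapper and WordCountReducer are sketched later.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Submit the job to YARN, which now owns resource management and
        // scheduling (the role that belonged to MapReduce 1.0's JobTracker).
        conf.set("mapreduce.framework.name", "yarn");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);    // see the map sketch below
        job.setReducerClass(WordCountReducer.class);  // see the reduce sketch below
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```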
Now, in this particular framework, that is, MapReduce version 2, which is now purely a programming model, the users specify a map function that processes a key-value pair to generate a set of intermediate key-value pairs. That is to say, the map function accepts its input in the form of a key-value pair, which is quite natural, because any data set can be represented in the form of key-value pairs; a record is nothing but a key-value pair. So, this becomes the input to the map function, and the output is also specified as key-value pairs. What exactly the key and the value are, we will discuss in more detail in the later slides, but let us understand at this point that any set of records, or any big data set, can easily be represented in the form of key-value pairs; that is not a difficult step.
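To illustrate this, here is a minimal sketch of a map function in the standard Hadoop 2.x Java API, assuming a simple word-count example; the example and the class name WordCountMapper are assumptions for illustration, not something the lecture prescribes. Each input record arrives as a key-value pair, here (byte offset of the line, the line of text), and the map function emits intermediate key-value pairs of the form (word, 1).

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input key-value pair:  (byte offset of the line, line of text)
// Intermediate key-value pair emitted: (word, 1)
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The map function receives one record as a key-value pair and
        // emits a set of intermediate key-value pairs for the framework.
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}
```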
Then the second part is called the ‘Reduce Function’. The reduce function merges all the intermediate values associated with the same intermediate key. That means the intermediate values, that is, the output generated by the map function, which is nothing but key-value pairs in the intermediate form of the result, become the input to the reduce function; the reduce function then combines them and gives the final output.
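Continuing the same illustrative word-count example, a minimal sketch of the reduce function in the standard Hadoop 2.x Java API might look as follows. The framework groups all intermediate values that share the same intermediate key before calling reduce, and reduce merges them into the final output key-value pair.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Intermediate input: (word, list of 1s) grouped by the same intermediate key
// Final output key-value pair: (word, total count)
public class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();  // merge all intermediate values for this key
        }
        result.set(sum);
        context.write(key, result);  // emit the final combined output
    }
}
```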