Noc19-cs33
Lec 06: Hadoop MapReduce 2.0 (Part-I)
Introduction to MapReduce: MapReduce is a programming model and an associated implementation for processing and generating large data sets. In version 2.0, the programming model has been separated out, while resource management and scheduling are done by YARN, which is another component of Hadoop 2.0. So, if we look at the Hadoop 2.0 stack, it has three different layers: the first layer is HDFS 2.0, then comes YARN, and on top of it MapReduce version 2.0. YARN now performs the resource management and scheduling functionalities that were part of MapReduce version 1.0. This simplifies MapReduce, which is now only the programming model: it focuses on the programming aspect, while the remaining part, that is, resource management and scheduling, is performed by YARN. With this separation, we can also design many new applications directly on HDFS and YARN, bypassing MapReduce altogether. So, we will see Hadoop stacks, or different distributions, which use either HDFS and YARN, or HDFS, YARN and MapReduce, and on top of these, a lot of applications are available for big data computations.
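To make this concrete, here is a minimal sketch, assuming the standard Hadoop 2.x Java API, of how a MapReduce 2.0 job driver might be written. This is not from the lecture slides: the word-count example, the class name WordCountDriver, and the command-line input/output paths are illustrative assumptions. The point is only that the job itself is handed to YARN, which does the resource management and scheduling, while the mapper and reducer classes (sketched after the map and reduce discussion below) carry the programming model.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative driver: WordCountMapper and WordCountReducer are sketched later.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Submit the job to YARN, which now owns resource management and
        // scheduling (the role that belonged to MapReduce 1.0's JobTracker).
        conf.set("mapreduce.framework.name", "yarn");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);    // see the map sketch below
        job.setReducerClass(WordCountReducer.class);  // see the reduce sketch below
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```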
Now, in this particular framework, that is, MapReduce version 2, which is now purely a programming model, the users specify a map function that processes a key-value pair to generate a set of intermediate key-value pairs. That is to say, the map function accepts its input in the form of a key-value pair, which is quite natural, because any data set can be represented in the form of key-value pairs; a record is nothing but a key-value pair. So, this becomes the input to the map function, and the output is also specified as key-value pairs. What exactly the key and the value are, we will discuss in more detail in the later slides, but let us understand at this point that any set of records, or any big data set, can easily be represented in the form of key-value pairs; that is not a difficult step.
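To illustrate this, here is a minimal sketch of a map function in the standard Hadoop 2.x Java API, assuming a simple word-count example; the example and the class name WordCountMapper are assumptions for illustration, not something the lecture prescribes. Each input record arrives as a key-value pair, here (byte offset of the line, the line of text), and the map function emits intermediate key-value pairs of the form (word, 1).

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input key-value pair:  (byte offset of the line, line of text)
// Intermediate key-value pair emitted: (word, 1)
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The map function receives one record as a key-value pair and
        // emits a set of intermediate key-value pairs for the framework.
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}
```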
Then the second part is called the ‘Reduce Function’. The reduce function merges all the intermediate values associated with the same intermediate key. That means the intermediate values, that is, the output generated by the map function, which is nothing but key-value pairs in the intermediate form of the result, become the input to the reduce function; the reduce function then combines them and gives the final output.
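Continuing the same illustrative word-count example, a minimal sketch of the reduce function in the standard Hadoop 2.x Java API might look as follows. The framework groups all intermediate values that share the same intermediate key before calling reduce, and reduce merges them into the final output key-value pair.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Intermediate input: (word, list of 1s) grouped by the same intermediate key
// Final output key-value pair: (word, total count)
public class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();  // merge all intermediate values for this key
        }
        result.set(sum);
        context.write(key, result);  // emit the final combined output
    }
}
```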