We have already seen MapReduce. Now, let's dig deep into the new data processing model of Hadoop. Hadoop comes with an advanced resource management tool called YARN, packaged as Hadoop-2.x.
What is YARN?
YARN stands for Yet Another Resource Negotiator. YARN is also known as the Hadoop Data Operating System: it enables data processing models beyond traditional MapReduce, such as Storm (real-time streaming), Solr (searching) and interactive systems (Apache Tez).
Hadoop committers decided to split up resource management and job scheduling into separate daemons to overcome some deficiencies in the original MapReduce. In traditional MR terms, this basically means splitting up the JobTracker to provide flexibility and improve performance.
YARN splits the functionality of a JobTracker into two separate daemons:
A global Resource Manager (RM) that consists of a Scheduler and an Applications Manager.
An Application Master (AM) that provides support for a specific application. One AM runs per application (not per node) and controls the execution of that application. (Here an application could mean a single MR job or a Directed Acyclic Graph (DAG) of jobs.)
YARN, also referred to as MapReduce 2, is not an improvement over the traditional MapReduce data processing model. It merely provides a resource management model that executes MapReduce jobs.
How to deploy Hadoop-2.x YARN?
We will look at how to deploy YARN in a single node cluster setup. The following are system requirements:
Java 6 or above
Download the Hadoop tarball:
wget http://your download link here/hadoop-2.2.0.tar.gz
Unpack the tarball:
tar -zxvf hadoop-2.2.0.tar.gz
Create the HDFS directories inside the Hadoop-2.2.0 folder (run these from within that folder):
mkdir -p data/namenode
mkdir -p data/datanode
You can optionally rename hadoop-2.2.0 to hadoop, but I wouldn't suggest that.
Open Hadoop-2.2.0/etc/hadoop/hadoop-env.sh and copy/paste the following:
export JAVA_HOME=<your java path here>
Then point java.library.path at the lib folder of your Hadoop-2.2.0 installation (run pwd from inside lib and paste the result in place of the path below):
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/home/hduser/hadoop-2.2.0/lib"
As in Hadoop 1 (traditional Hadoop), make changes in core-site.xml, mapred-site.xml, yarn-site.xml and hdfs-site.xml:
In core-site.xml, copy/paste:
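For a single-node setup, core-site.xml usually only needs to point the default filesystem at the local NameNode. The sketch below writes such a file from the shell. The CONF_DIR variable and the hdfs://localhost:9000 address are assumptions chosen for illustration, not values taken from this guide, and the command overwrites any existing core-site.xml in that directory.

```shell
# Sketch: write a minimal core-site.xml for a single-node cluster.
# CONF_DIR is a hypothetical variable; run this from the Hadoop-2.2.0 folder
# or set CONF_DIR to your etc/hadoop directory.
CONF_DIR="${CONF_DIR:-etc/hadoop}"
mkdir -p "$CONF_DIR"   # no-op if the directory already exists

# hdfs://localhost:9000 is an assumed NameNode address; change it if needed.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
```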
In hdfs-site.xml, copy/paste:
<value><path to your hadoop>/hadoop/data/namenode</value>
<value><path to your hadoop>/hadoop/data/datanode</value>
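The two values above belong to the NameNode and DataNode storage properties. As a rough sketch, a complete hdfs-site.xml for one node could be generated as follows. HADOOP_DIR and CONF_DIR are hypothetical variables introduced here, HADOOP_DIR is assumed to be the folder that contains the data directories, and the replication factor of 1 is an assumption that suits a single-node cluster. Any existing hdfs-site.xml in CONF_DIR is overwritten.

```shell
# Sketch: write hdfs-site.xml with the namenode/datanode directories.
# HADOOP_DIR (assumed: your Hadoop folder) and CONF_DIR are hypothetical names.
HADOOP_DIR="${HADOOP_DIR:-$PWD}"
CONF_DIR="${CONF_DIR:-etc/hadoop}"
mkdir -p "$CONF_DIR"

# Unquoted EOF so that $HADOOP_DIR is expanded inside the file.
cat > "$CONF_DIR/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>$HADOOP_DIR/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>$HADOOP_DIR/data/datanode</value>
  </property>
</configuration>
EOF
```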
In mapred-site.xml, copy/paste (if the file does not exist, create it, for example with vim mapred-site.xml):
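The key setting in mapred-site.xml is the one that tells MapReduce to run on YARN rather than on the classic framework. A minimal sketch, assuming a hypothetical CONF_DIR variable that points at etc/hadoop, could look like this. It also creates the file if it is missing and overwrites it if it exists.

```shell
# Sketch: create mapred-site.xml so that MapReduce jobs are submitted to YARN.
# CONF_DIR is a hypothetical variable; adjust it to your etc/hadoop directory.
CONF_DIR="${CONF_DIR:-etc/hadoop}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
```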
In yarn-site.xml, copy/paste:
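For MapReduce jobs to run on YARN, the NodeManager needs the shuffle auxiliary service enabled. The sketch below writes a yarn-site.xml with only that setting; treat it as a starting point under that assumption, not as a complete configuration. CONF_DIR is again a hypothetical variable, and an existing yarn-site.xml in that directory is overwritten.

```shell
# Sketch: write a minimal yarn-site.xml that enables the MapReduce shuffle service.
# CONF_DIR is a hypothetical variable; adjust it to your etc/hadoop directory.
CONF_DIR="${CONF_DIR:-etc/hadoop}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
```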
Either update your bashrc file or create yarnconfig.sh and paste the following:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk   # your java path here
export HADOOP_HOME=/home/hduser/hadoop-2.2.0   # your hadoop path here
Then load the settings with source ~/.bashrc, or source yarnconfig.sh if you created that file instead.
Before you run YARN, go to the Hadoop-2.2.0 folder and format the namenode:
bin/hdfs namenode -format
To start Hadoop:
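Hadoop 2.x ships start scripts in its sbin folder: start-dfs.sh for the HDFS daemons and start-yarn.sh for the YARN daemons. A small wrapper along the following lines starts both in order. The function name start_hadoop and its argument are hypothetical helpers introduced here, not part of Hadoop.

```shell
# Sketch: start HDFS first, then YARN, from a given Hadoop folder.
# start_hadoop is a hypothetical helper; its argument defaults to the current folder.
start_hadoop() {
  hadoop_dir="${1:-.}"
  # Check that both scripts exist before starting anything.
  for script in start-dfs.sh start-yarn.sh; do
    if [ ! -x "$hadoop_dir/sbin/$script" ]; then
      echo "not found or not executable: $hadoop_dir/sbin/$script" >&2
      return 1
    fi
  done
  # HDFS daemons (NameNode, DataNode), then YARN daemons (ResourceManager, NodeManager).
  "$hadoop_dir/sbin/start-dfs.sh" && "$hadoop_dir/sbin/start-yarn.sh"
}
```

From inside the Hadoop-2.2.0 folder you would call it as start_hadoop, or pass the folder explicitly, for example start_hadoop /home/hduser/hadoop-2.2.0.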
You can check whether everything went well by running jps. Ideally you should have the following:
Note: The PID might be different but the processes are the same.
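The jps check can also be scripted. The sketch below assumes a single-node setup in which the HDFS and YARN start scripts bring up NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager; that list is an assumption, so adjust it to your setup. check_daemons is a hypothetical helper that reads jps output on standard input and reports any daemon that is missing.

```shell
# Sketch: verify that the expected daemons appear in jps output.
# check_daemons is a hypothetical helper; the daemon list is an assumption
# for a single-node cluster.
check_daemons() {
  jps_output=$(cat)
  missing=0
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # jps prints "<pid> <name>"; compare the second field exactly so that
    # SecondaryNameNode is not mistaken for NameNode.
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "missing: $daemon"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it as jps | check_daemons. It prints nothing and returns 0 when every expected daemon is present.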
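Access the web UIs (with the default ports, the NameNode UI is at http://localhost:50070 and the ResourceManager UI is at http://localhost:8088).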
Try some examples on your new YARN setup, for instance the bundled pi estimator with 4 map tasks and 15 samples per map:
bin/hadoop jar /home/hduser/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 4 15