I would recommend Ubuntu 12.04, and never ever (even if you are paid a big bag of money) upgrade. Whether you prefer Ubuntu, Mint, Debian, Fedora, or CentOS, always go for a Long Term Support release, or LTS as they call it. That's my personal suggestion.
Now, unfortunately, Hadoop is a distributed computing platform, and not everyone has a few hundred machines at home to get hands-on with; it is rare for someone to have more than one desktop or laptop. So it is important to learn how to set up a single node on your own machine. Follow these simple steps:
$ sudo apt-get update
You need Java: the Hadoop framework is almost entirely written in Java.
$ sudo apt-get install openjdk-7-jdk

(OpenJDK 6 works too, as does Oracle's JDK; Hadoop needs Java 6 or newer.)
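To confirm the JDK is installed and on your PATH:

$ java -version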
It would be great if you create a separate account just for Hadoop (or not, if you are super organized).
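A minimal sketch, for instance (the hduser/hadoop names are just a convention, but the log excerpts later in this post assume them):

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
$ su - hduser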
Now we need to configure SSH. SSH is a protocol for secure connections between computers; Hadoop's control scripts use it to start and stop the daemons on each node, which in a single-node setup just means localhost. Ubuntu usually comes with SSH pre-installed, but just in case:
$ sudo apt-get install ssh
$ ssh-keygen -t rsa -P ""

(Note the -t flag; the empty passphrase lets the Hadoop scripts log in without prompting.)
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Once this is done, try the following:

$ ssh localhost

This step should go through without asking for a password.
The next step is to download Hadoop. I'd recommend the Hadoop 1.2.1 stable version.
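For example, from the Apache archive (I'm assuming this mirror path still serves the 1.2.1 tarball; any Apache mirror works):

$ wget https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz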
$ tar -xzvf hadoop-1.2.1.tar.gz

(Just press Tab after the first few characters; no need to type the whole filename in the terminal.)
$ cd hadoop-1.2.1/conf
Update the following files: core-site.xml, mapred-site.xml, hdfs-site.xml, masters, and slaves. Each XML snippet below goes between the <configuration> tags of its file (the hadoop.tmp.dir location and the port numbers are my choices; pick your own if you prefer).

In core-site.xml:

<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>

In mapred-site.xml:

<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.</description>
</property>

In hdfs-site.xml (a replication factor of 1, since we have only one node):

<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.</description>
</property>
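Since core-site.xml above points hadoop.tmp.dir at /app/hadoop/tmp (my choice of directory), create it and give your Hadoop user ownership before starting anything:

$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp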
In the masters and slaves files, delete everything inside and just type 'localhost'.
Now update the Hadoop environment script, conf/hadoop-env.sh, with the path of your Java home.
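For example, with the OpenJDK 7 package installed earlier (the exact path is an assumption; check /usr/lib/jvm for what your system actually has):

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64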
Also create a new run-time configuration file, <your-config-file-name>.sh, and add the paths shown in the sketch after these notes:
1. I have copy-pasted my own configuration here; still, look out for any typos or clerical errors.
2. Some books say you should add these settings to your .bashrc. I strongly disagree with those books. Many of us are not expert Linux users, and it is easy to delete something important in that script, which can break your shell startup (it is usually opened in the vi editor, and if you are not familiar with vi commands you can make a mess).
3. Instead, open a nano editor (nano hadoopconfiguration.sh) and add the settings there. The trade-off is that you have to source this config file in each new session, but I highly recommend doing it that way. Stay away from .bashrc.
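Here is a minimal sketch of such a file, assuming Hadoop was unpacked to /home/hduser/hadoop-1.2.1 (matching the log paths later in this post) and the OpenJDK 7 path from above; adjust both to your machine:

# hadoopconfiguration.sh -- source this at the start of each session
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64   # where your JDK lives
export HADOOP_HOME=/home/hduser/hadoop-1.2.1         # where you unpacked Hadoop
export PATH=$PATH:$HADOOP_HOME/bin                   # so the hadoop command is found

Load it with:

$ source hadoopconfiguration.sh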
Now we are ready to launch Hadoop! Follow the steps in the same order. From inside the hadoop-1.2.1 directory, first format the namenode; this is a one-time step and must be done before the daemons are started for the first time, or the namenode will refuse to come up:

$ bin/hadoop namenode -format

Then start all the daemons:

$ bin/start-all.sh
This will print output like the following:
starting namenode, logging to /home/hduser/hadoop-1.2.1/libexec/../logs/hadoop-hduser-namenode-ubuntu.out
localhost: starting datanode, logging to /home/hduser/hadoop-1.2.1/libexec/../logs/hadoop-hduser-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /home/hduser/hadoop-1.2.1/libexec/../logs/hadoop-hduser-secondarynamenode-ubuntu.out
starting jobtracker, logging to /home/hduser/hadoop-1.2.1/libexec/../logs/hadoop-hduser-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /home/hduser/hadoop-1.2.1/libexec/../logs/hadoop-hduser-tasktracker-ubuntu.out
When you type

$ jps

you should see all five daemons (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker) listed with their process IDs, plus the Jps tool itself.
If all five show up, your Hadoop setup is ready. Congratulations!
Should you wish to terminate your current Hadoop session, type:

$ bin/stop-all.sh
Hope you find this helpful. Comment here if you face any problems installing.