We are going to look at installing Spark on Hadoop. Let's set up Hadoop YARN once again from scratch, this time with screenshots, since I received comments that my earlier installation guide needed more of them. In this post, we will create a new user account on Ubuntu 14.04 and install the stable Hadoop 2.5.x release.
To create a new user:
Enter your admin password to set up your root password.
sudo adduser <new user-name>
Enter the details when prompted.
Now give the new user root access. Open the sudoers file with sudo visudo and add the line:
<new user-name> ALL=(ALL:ALL) ALL
If you want to delete the new user later, run the following from an account with sudo privileges (not the guest account):
sudo deluser <new user-name>
The Oracle JDK is the official JDK. To install Oracle Java 8, add the Oracle Java 8 PPA to your package manager's repositories and then run an update. Install the package only after these steps have completed.
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
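The quickest way to set up JAVA_HOME:
sudo update-alternatives --config java
Copy the Java 8 path up to java-8-oracle, for instance /usr/lib/jvm/java-8-oracle, and add it to /etc/environment:
sudo nano /etc/environment
JAVA_HOME="/usr/lib/jvm/java-8-oracle"
/etc/environment is read as plain KEY="value" lines rather than run as a shell script, so leave out the spaces around the = sign. After reloading it (source /etc/environment, or log out and back in), echo $JAVA_HOME prints the path.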
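If you prefer to derive the value instead of copying it by hand, the small Bash function below strips the trailing bin/java (and jre) from the path of the selected java binary. It is a sketch that assumes the usual /usr/lib/jvm/<name>/jre/bin/java layout of the java-8-oracle package; java_home_from_binary is a hypothetical helper name, not part of any tool.

```bash
# Hypothetical helper: turn a java binary path into a JAVA_HOME directory.
# Assumes the JDK layout <home>/jre/bin/java or <home>/bin/java.
java_home_from_binary() {
  local home="${1%/bin/java}"   # drop the trailing /bin/java
  printf '%s\n' "${home%/jre}"  # drop a trailing /jre, if present
}

# On a real machine, resolve the /usr/bin/java symlink chain first:
#   java_home_from_binary "$(readlink -f "$(command -v java)")"
# and append the result to /etc/environment (needs root):
#   echo "JAVA_HOME=\"$(java_home_from_binary "$(readlink -f "$(command -v java)")")\"" | sudo tee -a /etc/environment
```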
Setting up passwordless ssh:
See my previous posts for an introduction to SSH; here we jump directly into passwordless SSH, with screenshots.
Generate the key pair.
Create the .ssh folder on localhost and permanently add the generated public key to its authorized keys.
That's it. You are done.
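For reference, the key generation and authorization steps can be scripted roughly as follows. This is a sketch assuming the OpenSSH client tools are installed and ~/.ssh is the key folder; it creates an RSA key with an empty passphrase (-N "") and appends the public key to authorized_keys only if it is not already there, so it is safe to re-run.

```bash
# Assumption: OpenSSH's ssh-keygen is installed; keys live in ~/.ssh unless SSH_DIR is set.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"

# Generate an RSA key pair with an empty passphrase, unless one already exists.
if [ ! -f "$SSH_DIR/id_rsa" ]; then
  ssh-keygen -t rsa -N "" -f "$SSH_DIR/id_rsa" -q
fi

# Authorize the public key for logins to this same machine (skip if already listed).
touch "$SSH_DIR/authorized_keys"
if ! grep -qxF "$(cat "$SSH_DIR/id_rsa.pub")" "$SSH_DIR/authorized_keys"; then
  cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
fi
chmod 600 "$SSH_DIR/authorized_keys"

# Afterwards, "ssh localhost" should log in without asking for a password.
```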
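Install Hadoop 2.5 stable version:
The commands below use the hadoop-2.4.1 tarball; if you downloaded a 2.5.x release, substitute its file name.
tar xzvf hadoop-2.4.1.tar.gz
mv hadoop-2.4.1 hadoop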
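Once the tarball is unpacked and renamed to hadoop, it can help to point a HADOOP_HOME variable at it and put its bin and sbin scripts on your PATH. The lines below are meant for ~/.bashrc and assume the hadoop folder sits in your home directory; adjust the path if you extracted it elsewhere.

```bash
# For ~/.bashrc. Assumption: the renamed "hadoop" folder is in your home directory.
export HADOOP_HOME="$HOME/hadoop"
# Put the hadoop/hdfs/yarn commands (bin) and the start/stop scripts (sbin) on the PATH.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```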
Create the HDFS directories inside the hadoop folder:
mkdir -p data/namenode
mkdir -p data/datanode
You should now have data/namenode and data/datanode inside the hadoop folder:
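A quick way to confirm the layout is sketched below. It recreates the two HDFS directories with absolute paths (so it works from any working directory) and defines check_hadoop_layout, a hypothetical helper that reports any expected folder missing under the Hadoop directory. It assumes Hadoop lives in ~/hadoop unless HADOOP_HOME says otherwise.

```bash
# Assumption: Hadoop lives in ~/hadoop (override HADOOP_HOME otherwise).
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop}"

# Same two directories as above, created with absolute paths.
mkdir -p "$HADOOP_HOME/data/namenode" "$HADOOP_HOME/data/datanode"

# Hypothetical helper: print each expected folder missing under $1; return 1 if any are missing.
check_hadoop_layout() {
  local root="$1" missing=0 entry
  for entry in bin sbin etc/hadoop data/namenode data/datanode; do
    if [ ! -d "$root/$entry" ]; then
      echo "missing: $root/$entry"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_hadoop_layout "$HADOOP_HOME" && echo "layout looks fine"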
Edit hadoop-env.sh and update the Java home path (JAVA_HOME), HADOOP_OPTS and HADOOP_COMMON_LIB_NATIVE_DIR. It is in
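A sketch of those hadoop-env.sh changes follows. It assumes the standard Hadoop 2.x layout, where configuration files live under etc/hadoop in the Hadoop folder, a Hadoop folder at ~/hadoop, and the java-8-oracle path from the JAVA_HOME step. Pointing java.library.path at lib/native in HADOOP_OPTS is a common single-node choice rather than a requirement. Because hadoop-env.sh is sourced top to bottom, appending the overrides at the end is enough for them to take effect.

```bash
# Assumptions: Hadoop in ~/hadoop, configs in etc/hadoop, JDK at /usr/lib/jvm/java-8-oracle.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_HOME/etc/hadoop}"
mkdir -p "$HADOOP_CONF_DIR"

# The here-document is unquoted, so $HADOOP_HOME is expanded now into absolute paths,
# while \$HADOOP_OPTS stays literal so existing options are kept when Hadoop sources the file.
cat >> "$HADOOP_CONF_DIR/hadoop-env.sh" <<EOF
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_HOME/lib/native"
export HADOOP_OPTS="\$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
EOF
```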
Edit core-site.xml and add the following:
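As a sketch, a typical single-node core-site.xml only needs fs.defaultFS, the URI that clients use to reach the NameNode. The port 9000 and the ~/hadoop location are assumptions; the snippet writes the file with a shell here-document (replacing the existing file) so it can be pasted straight into a terminal.

```bash
# Assumptions: Hadoop in ~/hadoop, NameNode on localhost port 9000.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_HOME/etc/hadoop}"
mkdir -p "$HADOOP_CONF_DIR"

# fs.defaultFS: default file system URI, i.e. where HDFS clients find the NameNode.
cat > "$HADOOP_CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
```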
Create a file called “mapred-site.xml” and add the following:
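Hadoop 2.x ships only a mapred-site.xml.template, which is why the file has to be created. A minimal sketch that tells MapReduce jobs to run on YARN (mapreduce.framework.name set to yarn), again assuming the ~/hadoop location:

```bash
# Assumption: Hadoop in ~/hadoop.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_HOME/etc/hadoop}"
mkdir -p "$HADOOP_CONF_DIR"

# mapreduce.framework.name=yarn: submit MapReduce jobs to YARN instead of the local runner.
cat > "$HADOOP_CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
```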
Edit hdfs-site.xml and add the following:
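A typical single-node hdfs-site.xml sets the replication factor to 1 (there is only one DataNode) and points NameNode and DataNode storage at the data/namenode and data/datanode folders inside the hadoop folder. This sketch assumes those folders are under ~/hadoop; the here-document is unquoted on purpose, so $HADOOP_HOME is expanded into absolute file:// URIs.

```bash
# Assumption: Hadoop in ~/hadoop with data/namenode and data/datanode inside it.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_HOME/etc/hadoop}"
mkdir -p "$HADOOP_CONF_DIR"

# dfs.replication=1 because a single DataNode cannot hold more than one copy of a block.
cat > "$HADOOP_CONF_DIR/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://$HADOOP_HOME/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://$HADOOP_HOME/data/datanode</value>
  </property>
</configuration>
EOF
```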
Edit yarn-site.xml and add the following:
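For YARN, the usual single-node addition registers the MapReduce shuffle auxiliary service on the NodeManager, which MapReduce jobs need when they run on YARN. A sketch, assuming the ~/hadoop location:

```bash
# Assumption: Hadoop in ~/hadoop.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_HOME/etc/hadoop}"
mkdir -p "$HADOOP_CONF_DIR"

# Register the mapreduce_shuffle auxiliary service and the class that implements it.
cat > "$HADOOP_CONF_DIR/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
EOF
```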
Now run start-dfs.sh and then start-yarn.sh from the sbin folder, and you will get the following screen:
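One step worth spelling out: before the very first start, HDFS has to be formatted with hdfs namenode -format. The sketch below wraps formatting (skipped when the NameNode directory already contains the current folder that formatting creates) and the two start scripts in start_single_node, and adds missing_daemons, which reads jps output and prints any of the five expected single-node daemons that is not running. Both function names are hypothetical, and the paths assume the ~/hadoop layout with data/namenode as the NameNode directory.

```bash
# Assumption: Hadoop in ~/hadoop, dfs.namenode.name.dir pointing at data/namenode.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop}"

# Hypothetical helper: format HDFS once, then start the HDFS and YARN daemons.
start_single_node() {
  if [ ! -d "$HADOOP_HOME/data/namenode/current" ]; then
    "$HADOOP_HOME/bin/hdfs" namenode -format
  fi
  "$HADOOP_HOME/sbin/start-dfs.sh" && "$HADOOP_HOME/sbin/start-yarn.sh"
}

# Hypothetical helper: read "pid Name" lines (jps output) on stdin and
# print every expected daemon whose name does not appear exactly.
missing_daemons() {
  local running d
  running=$(awk '{print $2}')
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if ! printf '%s\n' "$running" | grep -qx "$d"; then
      echo "$d"
    fi
  done
}
```

Usage: start_single_node && jps | missing_daemons prints nothing when all five daemons are up.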
Obtain the latest version of Spark from http://spark-project.org/download. To interact with the Hadoop Distributed File System (HDFS), you need a Spark build compiled against the same Hadoop version as your cluster. Go to http://spark.apache.org/downloads.html, choose the package type prebuilt for Hadoop 2.4, and download Spark. Note that Spark 1.1.0 uses Scala 2.10.x, so we need to install Scala.
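As a sketch, once the prebuilt package is downloaded, unpacking it and pointing Spark at the Hadoop configuration could look like the following. The tarball name follows the spark-<version>-bin-<package>.tgz pattern of the prebuilt downloads; the version, package and ~/hadoop paths are assumptions to adjust. HADOOP_CONF_DIR is the variable Spark uses to locate the cluster's client-side configuration (core-site.xml, hdfs-site.xml) when reading from HDFS.

```bash
# Assumptions: Spark 1.1.0 prebuilt for Hadoop 2.4, downloaded into the current directory.
SPARK_VERSION="1.1.0"
SPARK_PACKAGE="hadoop2.4"
SPARK_DIR="spark-${SPARK_VERSION}-bin-${SPARK_PACKAGE}"
SPARK_TGZ="${SPARK_DIR}.tgz"

# Hypothetical helper: unpack the downloaded archive into the home directory.
install_spark() {
  tar -xzf "$SPARK_TGZ" -C "$HOME"
}

# For ~/.bashrc: where Spark lives and where it finds the Hadoop client configuration.
export SPARK_HOME="$HOME/$SPARK_DIR"
export HADOOP_CONF_DIR="$HOME/hadoop/etc/hadoop"
export PATH="$PATH:$SPARK_HOME/bin"
```

Usage: install_spark && spark-shell starts the Scala shell from the unpacked package.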
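Let's install Scala. The commands below fetch Scala 2.9.3, which is not the 2.10.x line that Spark 1.1.0 is built with; to match Spark, pick a 2.10.x archive from the same location and adjust the file names accordingly:
wget http://www.scala-lang.org/files/archive/scala-2.9.3.tgz
tar -xvf scala-2.9.3.tgz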
You will probably want to add these to your .bashrc file or equivalent:
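A minimal sketch of such .bashrc lines, assuming the archive was extracted in your home directory; change the folder name if you chose a different Scala release.

```bash
# For ~/.bashrc. Assumption: scala-2.9.3.tgz was extracted in your home directory.
export SCALA_HOME="$HOME/scala-2.9.3"
# Make scala and scalac available from any directory.
export PATH="$PATH:$SCALA_HOME/bin"
```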
We also need something called sbt. sbt stands for Simple Build Tool, but to me it seems more complicated than Maven. You can still build with Maven; however, I would suggest getting acquainted with sbt if you are interested in exploring Scala in general.
More in the next post.