R programming

H2O and Machine Learning

Working with H2O has been quite an experience so far. Let's look at how to set it up. H2O can run as a standalone server, be installed as an R package, or be deployed on Hadoop. The standalone setup is quite simple.

1. Download the zip file.
2. Unzip it: unzip h2o-version.zip
3. cd into the extracted directory.
4. Start H2O: java -jar h2o.jar
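
Since the last step runs java -jar, a Java runtime has to be on the PATH, and unzip is needed for the archive. Here is a minimal shell sketch that checks for both before launching. The helper name require_cmd is made up for this example and is not part of H2O.

```shell
#!/bin/sh
# Hypothetical helper (not part of H2O): succeeds only if the named
# command can be found on the PATH, otherwise prints a message to stderr.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing required command: $1" >&2
    return 1
}

# Example use, run from the directory that holds h2o.jar:
#   require_cmd unzip && require_cmd java && java -jar h2o.jar
```

Because the checks are chained with &&, the server is only started when both commands are found.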

Open http://localhost:54321/ in a browser; the H2O web interface should load there.


Installing H2O in R takes a few more steps, especially on Ubuntu or other Linux distributions.

Install the package with the following command:

install.packages("h2o", repos=(c("http://s3.amazonaws.com/h2orelease/h2o/master/", getOption("repos"))))

Initialize the package and verify that H2O installed properly:


library(h2o)
localH2O <- h2o.init()


Installing on Hadoop requires a Cloudera, Hortonworks, or MapR distribution of Hadoop running on your system. These are the only distributions for which I found drivers inside the h2o/hadoop directory.


Computational Statistics setup

With the demand for analytics growing fast, many professionals are looking to open source systems for their analytics tasks. R is one of the best-known statistical programming tools and is widely used for scientific computing. R can also be linked with Hadoop to perform scalable big data analytics.

Here we are going to look at installing R and the RStudio IDE, specifically on Debian-based Linux. If you follow the commands in order, you will have R installed in no time!

sudo apt-get update
sudo apt-get install r-base

If you want to compile R packages from source, also install the following package:

sudo apt-get install r-base-dev

A number of R packages are available as Debian packages, with names starting with r-cran-. These are usually kept up to date. Some packages need build dependencies before they will compile, and the following command installs them for a given package:

sudo apt-get build-dep r-cran-
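
Debian names its packaged CRAN libraries r-cran- followed by the lower-cased CRAN name, so the name to pass to build-dep can be derived from the R package name. Here is a small sketch, assuming that convention holds for the package you want. The helper name cran_to_deb is hypothetical, and MASS is used only as an example.

```shell
#!/bin/sh
# Hypothetical helper: map a CRAN package name to the Debian package name,
# assuming the usual r-cran-<lower-cased name> convention.
cran_to_deb() {
    printf 'r-cran-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Example use (build-dep needs deb-src entries in the APT sources):
#   sudo apt-get build-dep "$(cran_to_deb MASS)"
```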

R is now ready. Type ‘R’ in a terminal to start its command-line interface.

Next, we would like to install RStudio. You can either download the latest Debian package from the RStudio site or, if you are on Ubuntu, open the Software Center and search for RStudio. If you choose to download the RStudio Debian package from the site, use the following commands:

wget http://download1.rstudio.org/rstudio-0.98.495-i386.deb
sudo dpkg -i rstudio-0.98.495-i386.deb

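In general, any downloaded .deb package can be installed the same way (-i is the short form of --install):

sudo dpkg --install package-name-here.deb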
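
The RStudio file downloaded above is the 32-bit (i386) build, so on a 64-bit system you would want the matching build instead. The sketch below maps the output of uname -m to the architecture label used in Debian package file names. The function name deb_arch_for is hypothetical, and whether a download exists for a given architecture is an assumption to check on the RStudio site.

```shell
#!/bin/sh
# Hypothetical helper: translate a machine name as printed by `uname -m`
# into the architecture label used in Debian package file names.
deb_arch_for() {
    case "$1" in
        x86_64)
            echo amd64 ;;
        i386|i486|i586|i686)
            echo i386 ;;
        *)
            echo "unsupported machine type: $1" >&2
            return 1 ;;
    esac
}

# Example use:
#   deb_arch_for "$(uname -m)"
```

On Debian itself, dpkg --print-architecture reports the same label directly. The helper is mainly useful when the mapping has to be worked out from a machine name alone.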