As we already know, Mahout is sunsetting support for its MapReduce algorithms and moving to more advanced data processing systems that are significantly faster than MapReduce. Today we will look at one of Mahout’s latest systems: the Mahout Scala and Spark bindings package.
If you have hands-on experience with either R’s command line or Julia on Linux, you will pick up this new package pretty quickly. Note: Julia is an open-source language for scientific computing and mathematical optimization.
Let’s look at how to set up the Mahout Spark shell on Linux without Hadoop. It is simple and straightforward if you follow these steps.
Note: Always check out the latest versions of Mahout and Spark; otherwise you will end up with a java.lang.AbstractMethodError (version mismatch).
First, let’s set up Spark:
It looks simple, but be careful about what you select. I would choose the latest version under Spark release and “source code” under package type.
Once downloaded, build using sbt. This will take close to an hour.
Second, clone Mahout 1.0 from GitHub:
git clone https://github.com/apache/mahout mahout
and build Mahout using Maven.
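The two build steps above can be sketched as a small script. This is only a sketch under assumptions: the directory names are hypothetical, `sbt/sbt assembly` is the launcher that Spark 1.x source trees shipped (newer trees use `build/sbt` instead, so check the README of the version you downloaded), and `mvn -DskipTests clean install` is a common way to build Mahout without running its test suite. By default the script only prints the commands; set `DRY_RUN=0` to actually run them.

```shell
#!/bin/sh
# Sketch only: paths and build commands are assumptions, not taken from the post.
SPARK_DIR="${SPARK_DIR:-$HOME/spark}"     # hypothetical: unpacked Spark source tree
MAHOUT_DIR="${MAHOUT_DIR:-$HOME/mahout}"  # hypothetical: the Mahout git clone
DRY_RUN="${DRY_RUN:-1}"                   # 1 = print the commands, 0 = run them

# Run a command inside a directory, or just print it in dry-run mode.
run_in() {
  dir="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "(cd $dir && $*)"
  else
    (cd "$dir" && "$@")
  fi
}

# Build Spark with its bundled sbt launcher (Spark 1.x layout assumed).
run_in "$SPARK_DIR" sbt/sbt assembly

# Build Mahout with Maven, skipping the tests to save time.
run_in "$MAHOUT_DIR" mvn -DskipTests clean install
```

Printing first makes it easy to confirm the paths before committing to a build that can take close to an hour.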
To start the Mahout Spark shell, go to the Spark folder and run:
Obtain the Spark master URL. If you are running on localhost, it would be:
In a mahoutspark.sh file, type in the following:
Save it and run “. mahoutspark.sh”, then go into the Mahout directory and run:
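As a rough illustration of the environment file described above, here is a minimal sketch. It assumes the three variables the Mahout shell launcher typically reads (`MAHOUT_HOME`, `SPARK_HOME`, `MASTER`). Every value is a placeholder: the two paths are hypothetical, and the master URL assumes a standalone Spark master on localhost at the default port 7077. Use the `spark://` URL that your own Spark master web UI reports.

```shell
#!/bin/sh
# Sketch of a mahoutspark.sh environment file. All values are placeholders
# (assumptions), not the ones from the original post.
cat > mahoutspark.sh <<'EOF'
export MAHOUT_HOME="$HOME/mahout"       # hypothetical: where Mahout was cloned and built
export SPARK_HOME="$HOME/spark"         # hypothetical: where Spark was built
export MASTER="spark://localhost:7077"  # assumed: local standalone master, default port
EOF

# Load the variables into the current shell session.
. ./mahoutspark.sh
echo "MAHOUT_HOME=$MAHOUT_HOME"
echo "SPARK_HOME=$SPARK_HOME"
echo "MASTER=$MASTER"

# With the Spark master running, the shell would then be launched with:
#   cd "$MAHOUT_HOME" && bin/mahout spark-shell
```

Sourcing the file with `.` (rather than executing it) matters here: the exports have to land in the same shell session that later starts the Mahout shell.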
You should then see the following screen: