Book Review: Apache Mahout Cookbook

Quick summary:
Very well written for developers who are new to both Mahout and machine learning, with walkthroughs and screenshots. However, if you have experience writing heuristics or expertise in machine learning, you can skip this book. It is concise and to the point, though there are a few clerical errors and typos. This book certainly makes a wonderful academic companion for anyone who plans to use Mahout in an academic research project.

Detailed Review:
When I was asked to review this book, I was skeptical because it includes a TSP (traveling salesman problem) recipe that is no longer supported by Mahout. I believe a technical cookbook should have real-world use cases, and here was a recipe that cannot be implemented in practice and therefore misrepresents Mahout's capabilities. However, when I read the book from chapter 1 onward, I found it so well written that anyone can follow how to set up and work with Mahout. Caveat: you should have some knowledge of software development and Java programming.

I disagree with comments that most of the recipes in this book can be found with a Google search. The book carefully explains each concept with output screenshots and also includes a walkthrough on how to implement it in NetBeans. I was glad to see the author using NetBeans; I personally support it, and it is easy to work with. Recipes such as importing/exporting data from HDFS/RDBMS and spectral clustering are highlights. The author does not assume that the reader is familiar with MySQL, so there is a walkthrough on installing it. The topic modeling and pattern mining recipes are also good to see.

There is an entire chapter walking through classification in Mahout (binary and multi-level classification), a topic for which plenty of tutorials are available on the web and which is covered well in MiA (Mahout in Action). The same goes for k-means. Also, based on discussions with the developers, it is fairly conclusive that a MapReduce version of genetic programming will probably not see the light of day in a future Mahout release. My personal recommendation is not to get too involved with chapter 10. The TSP example is basically a sample, not a real-life case. For those who want to learn more, I would suggest looking up the Watchmaker project. Instead of the outdated TSP demo, I would have liked to see a Hidden Markov Model case study, even though it is only partially parallelized.

I would personally like to see a second edition with more in-depth recipes in which data is extracted and cleansed using Pig/Hive and then fed to Mahout to produce meaningful results. I would like to see detailed coverage of building recommendation engines, building a fraud detection engine based on large amounts of data transformed using Pig, and finding hidden patterns with Hadoop ecosystem tools. The author's choice of preferred NoSQL database in the Mahout context would also be good to see.

You can buy this book at Packt Publishing


5 comments

  1. Hello, I believe your website might be having web browser compatibility problems. When I view your website in Safari, it looks fine; however, when I open it in Internet Explorer, it has some overlapping issues. I just wanted to give you a quick heads-up! Apart from that, excellent blog!

  2. Dear Pavan,

    First of all, thank you for taking the time to read my book so carefully. After working nine months on writing the book, knowing that someone spent his or her time reading it means a lot to me.

    I found your suggestions pertinent and straight to the point. If one can judge one's own work objectively, I have to say that I agree with you that genetic algorithms will probably never be implemented in Mahout using Hadoop, so the last chapter is basically an introduction to the TSP without any clue to real-world Mahout use. Anyway, the publisher and I will take your suggestions into account for a second revision of the book, with two goals in mind: making the book typo-free (nearly impossible) and adding the suggestions you gave us.

    In any case, thank you for your overall appreciation of the book despite the issues you pointed out.

    Piero
