Book Review: Apache Mahout Cookbook

Quick summary:
Very well written for developers who are new to both Mahout and machine learning, with walk-throughs and screenshots. However, if you have experience writing heuristics or have expertise in machine learning, you can skip this book. It is concise and to the point, though it has a few clerical errors and typos. This book certainly makes a wonderful academic companion for anyone planning to use Mahout in an academic research project.

Detailed Review:
When I was asked to review this book, I was skeptical because it includes a TSP recipe that is no longer supported by Mahout. A technical cookbook should have real-world use cases, and here was a recipe that cannot be practically implemented and therefore misrepresents Mahout’s capabilities. However, when I read the book from chapter 1 onward, I found it written so well that anyone can understand how to set up and work with Mahout. Caveat: you should have some knowledge of software development and Java programming.

I disagree with comments that most of the recipes in this book can be found with a Google search. The book carefully explains each concept with output screenshots and also walks through how to implement it in NetBeans. I am glad to see the author using NetBeans; I personally support that choice, and it is easy to work with. Recipes such as importing and exporting data from HDFS/RDBMS and spectral clustering are highlights. The author does not assume that the reader is familiar with MySQL, so there is a walkthrough on installing it. The coverage of topic modeling and pattern mining is also good to see.

There is an entire chapter on a classification walkthrough (for binary and multi-level classification) in Mahout, a topic for which plenty of tutorials are available on the web and which is well covered in Mahout in Action (MiA). The same goes for k-means. Also, based on discussions with developers, it seems fairly conclusive that the MapReduce version of genetic programming may not see the light of day in a future Mahout release. My personal recommendation is not to get too involved with chapter 10. The TSP example is also basically a sample and not a real-life one. For those who want to learn more, I would suggest looking up the Watchmaker project. Instead of the outdated TSP demo, I would have liked to see a Hidden Markov Model case study, even though it is only partially parallelized.

I personally would like to see a second edition with more in-depth recipes where data is extracted and cleansed using Pig/Hive, then fed to Mahout to produce meaningful results. I would like to see detailed coverage of building recommendation engines, building a fraud detection engine based on large amounts of data transformed using Pig, and finding hidden patterns where Hadoop ecosystem tools are put to use. The author’s preferred choice of NoSQL database in a Mahout context would also be good to see.

