Being a professional in the metadata “trade” also puts me at the centre of some data projects. Scientific data, public domain data, you name it. More than once I have found myself digging through data, sorting data, selecting and cleaning up data and metadata. Having lots of practical experience and also some background in statistics from my first degree and my MSc, I decided to take a break from the entrepreneurship literature and move into data mining and, hopefully, big data.
I identified some of the prominent books out there, but skimming through them, I saw that some may be a bit advanced for my taste, so as a first step, I went with the obvious option of “Data Mining for Dummies”. And now, some days after making this choice, I can say with certainty that this was the first and the last time I chose a book of this series ([something] for Dummies).
To begin with, I fully understand that this book was intended for “dummies”, which I guess means people who may have never heard of data mining and the like. Still, even with this premise, this book fails to deliver. It’s too big and too spread out. It repeats stuff over and over again, and its chapters are organised in a way that is kind of confusing. I did like the fact that it included some concepts and guidelines to begin with. It does provide a nice outline of the things that you need to master in order to be a data miner.
On the downside, it presents everything so simply, while at the same time offering screenshots of how you do things in the software explained, without any theoretical background whatsoever, leaving the reader confused in the end. The provocative statement that data mining is a craft, and that you don’t need theoretical models but can just work your way through by trial and error, is, simply put, “nonsense”. Plain and simple. How can you be a data miner if you don’t have a concrete understanding of what the models that you apply to your data do, or mean for that matter? You can’t, and you shouldn’t…!
The book does make some good points in some parts, but when other parts are just an oversimplification of literally everything, it kind of loses credibility. I liked the visual programming part and really appreciated the fact that the book pointed me in the right direction to download a couple of software packages to start playing with (KNIME and Weka). I also appreciated the introduction to CRISP-DM and the simplicity with which the process of data mining was explained.
Other than that, the book seems to contain about 200 pages of material (at most), stretched out to 411 pages or so by repeating itself over and over again. Too much repetition of the same things, too many naive statements about business processes and how data mining connects to business practices. And then you had the parts where the software was promoted, explaining how you do specific operations in each piece of software, with little if any support for someone who actually wants to test them.
This book should be avoided even by the real novices in data mining. Chances are that you will pick up the wrong things or the right things in the wrong way. And then, once you learn something, you will have to double the effort to unlearn it. I am already on the lookout for a nicer book on data mining, so I hope I can come back with a better book review as soon as possible. Would you like to help?
Some options for my next read include Beautiful Data by Orit Halpern, Big Data: A Revolution That Will Transform How We Live, Work, and Think by Mayer-Schönberger and Cukier, Mining of Massive Datasets by Leskovec, Rajaraman and Ullman, Data Science from Scratch by Joel Grus and last but not least, Big Data at Work: Dispelling the Myths, Uncovering the Opportunities by Thomas H. Davenport. Which one would you advocate for? Do you have anything else to recommend?
UPDATE: Searching a bit, I found a daunting (750 pages) but also promising read, and I selected it! It’s Data Mining: The Textbook, by Charu C. Aggarwal…! Wish me luck! 😉