Monday, February 17, 2014

Book review: Data Science for Business

Since I'm interested in data science but am a newcomer to this field, I decided to read an introductory book dedicated to the topic. The first book I read was Data Science for Business, written by Foster Provost and Tom Fawcett. Below are my descriptions of each main chapter of this book:
  • The first chapter is dedicated to an overall definition of data science, big data and similar terms.
  • The second chapter introduces "canonical data mining tasks".
  • Chapter 3 shows the first steps with supervised segmentation and decision trees.
  • The next chapter adds linear regression, support vector machines and logistic regression.
  • Chapter 5 - in my opinion the most useful one - defines overfitting. The authors show examples of how one can run into the overfitting problem, but also show how to avoid it and deal with potential problems.
  • Chapter 6 introduces additional data science tools: similarity, neighbors and clustering methods.
  • Chapter 7 focuses on aspects strictly related to applying the earlier mentioned tools to business, namely expected profit. This well-written chapter shows that beyond the pure data tools there is almost always a second, deeper layer: the business one.
  • Even a great data scientist at some point has to present results and hypotheses to stakeholders. Lots of complicated mathematical formulas are one option, but simple plots with some additional information can also visualize the ideas nicely. Chapter 8 describes some fundamental "curves" which are often used in data science.
  • In chapter 9, the authors describe Bayes' rule and discuss its advantages and disadvantages.
  • Chapter 10 is dedicated to "text mining". The authors acknowledge that they only scratch the surface of this topic. On the other hand, the reader can find here some basic ideas on how to work with text and how to start exploring different methods.
  • The final evaluation of the example problem used throughout the book is done in chapter 11.
  • Chapter 12 discusses other techniques for approaching analytical tasks: co-occurrence and associations (example usage: determining which items are bought together). Profiling, link prediction and data reduction are also discussed, with a nice example from the Netflix Prize. The authors also clearly explain why an ensemble of models can give better results in some cases.
  • In chapter 13, the authors show how to think about data science in a business context, and also point out how to work as a data scientist in a business environment.
  • The last chapter is dedicated to an overall summary. The authors give hints on how to ask data science related questions and how to think about data science in general.
Reading this book was very satisfying. I wasn't overwhelmed by an enormous quantity of new definitions, equations and examples. For a newcomer to data science, reading this book chapter after chapter is like following your mentor step by step. Using one main example throughout the whole book was a great idea: the reader can observe different techniques, and the problems related to them, applied to the same business situation. Business awareness is also raised from chapter to chapter. I recommend this book both to the data scientist wannabe and to the "suit" who wants to hire some geeks to examine the business possibilities offered by the gathered data.

Actually, I can't say anything bad about this book. Of course, I would just love to see a companion handbook with code in Python or R, but I guess there are plenty of such books.
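
To give a flavor of what such companion code might look like, here is a minimal Python sketch of the expected-profit reasoning from chapter 7. It is my own illustration, not code from the book: the function name, the customer names, the response probabilities and the dollar values are all made up for the example.

```python
def expected_profit(p_response, value_if_response, value_if_no_response):
    """Expected value of targeting one customer with an offer.

    p_response is the estimated probability that the customer responds;
    the two values are the profit (or loss) in each outcome.
    """
    return (p_response * value_if_response
            + (1 - p_response) * value_if_no_response)


# Hypothetical numbers: a responder brings $99 of profit,
# a non-responder costs $1 (the price of sending the offer).
VALUE_IF_RESPONSE = 99.0
VALUE_IF_NO_RESPONSE = -1.0

# Hypothetical response probabilities, e.g. produced by some model.
customers = {"alice": 0.05, "bob": 0.005, "carol": 0.02}

# Target only the customers whose expected profit is positive.
targets = [
    name
    for name, p in customers.items()
    if expected_profit(p, VALUE_IF_RESPONSE, VALUE_IF_NO_RESPONSE) > 0
]
print(targets)
```

With these assumed values the break-even response probability is 1%, so even customers who are very unlikely to respond can still be worth targeting. As I understand it, that is the kind of business-side conclusion the chapter is after.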
