The Data Science Handbook - #1 Overview

Nancy Chelaru-Centea

Dec. 8, 2018


I got it in my head the other day that I should use this blog to keep track of my progress in studying data science, not only to help myself retain the information better, but also to serve as a resource for others on the same path.

A good place to start, I thought, would be The Data Science Handbook (2017) by Field Cady, a data scientist at the Allen Institute for Artificial Intelligence. A quick look through the table of contents and preface suggests it is quite the compendium of data science skills.


Mission

"The candidate was smart and knowledgeable, but the interview made it painfully clear that they were unprepared for the daily work of a data scientist. What do you do as an interviewer when the candidate starts apologizing for wasting your time? ... I wrote this book in an attempt to help people like that out, by condensing data science’s various skill sets into a single, coherent volume." (pg. xvii)

"This book aims to teach you everything you’ll need to know to be a competent data scientist." (pg. 1)


Pitch

"It [the book] is hands‐on and to the point: ideal for somebody who needs to come up to speed quickly or solve a problem on a tight deadline." (pg. xviii)


Layout

Introduction - Becoming a Unicorn

Part I - The Stuff You'll Always Use

  • The Data Science Road Map
  • Programming Languages
  • Interlude: My Personal Toolkit
  • Data Munging: String Manipulation, Regular Expressions, and Data Cleaning
  • Visualizations and Simple Metrics
  • Machine Learning Overview
  • Interlude: Feature Extraction Ideas
  • Machine Learning Classification
  • Technical Communication and Documentation

Part II - Stuff You Still Need to Know

  • Unsupervised Learning: Clustering and Dimensionality Reduction
  • Regression
  • Data Encodings and File Formats
  • Big Data
  • Databases
  • Software Engineering Best Practices
  • Natural Language Processing
  • Time Series Analysis
  • Probability
  • Statistics
  • Programming Language Concepts
  • Performance and Computer Memory

Part III - Specialized or Advanced Topics

  • Computer Memory and Data Structures
  • Maximum Likelihood Estimation and Optimization
  • Advanced Classifiers
  • Stochastic Modeling

Parting Words - Your Future as a Data Scientist


The Bottom Line


The internet is overflowing with a dizzying variety of articles, tutorials, courses and books for learning data science (more on that later). Against that backdrop, a book that aims to teach me everything I need to know to be a "competent data scientist" is hard to pass up. Let's see if it delivers.