Statistical Learning

Mihai Chelaru-Centea

Online Course by Trevor Hastie and Rob Tibshirani

Resource last updated: Aug. 28, 2016


Overview

This course is the companion to the book An Introduction to Statistical Learning with Applications in R (ISLR), which can be downloaded for free. It's taught by two Stanford statisticians, Trevor Hastie and Rob Tibshirani. It looks at machine learning techniques and methodologies from a statistical point of view, and often provides the historical context behind them, as many were developed at Stanford.

Format

The lectures vary in length but usually run no more than 15 minutes. Each section has an accompanying set of slides in PDF format that the lecturers go through, and each section ends with hands-on tutorials in R that implement some of what was discussed in the lectures.

The pace is quite slow, and each lecture covers a few of the slides for each section in detail. The lecturers have great chemistry, and I found myself smiling a lot at seeing two friends making silly jokes while trying to explain sometimes pretty dry statistical concepts.

Content

Each lecture covers a distinct topic, from linear models and regularization to non-linear methods such as tree-based models (including bagging and random forests) and SVMs. There is good coverage of many traditional machine learning algorithms, although deep learning and neural networks get little mention, which is fine as they don't really fit the mold of statistical learning anyhow.

Difficulty

The topics are sometimes fairly advanced, and I would really recommend reading the ISLR book alongside the course to fully understand the content. I would say the course requires at least some background in statistics: when I took it, I had already taken a university-level statistics course that covered the basics of probability theory, which came in handy when Bayesian statistics came up. It also helps to know some calculus, although you can follow the concepts without fully understanding the math or the derivations they mention.

The Bottom Line


As data scientists, we benefit from a solid knowledge of the statistics that underlies many popular techniques. This course provides a good introduction, along with some of the historical context behind the field's development, which is interesting and relevant as the field continues to advance every year.

Since it's a free course with a free book that is a staple for machine learning practitioners wanting a good grounding in statistics, I would recommend taking a look. There's a lot of useful content squeezed into its short runtime, so definitely check this one out.