FastAI: Introduction to Machine Learning for Coders

Mihai Chelaru-Centea

Online Course by Jeremy Howard

Resource last updated: Sept. 21, 2018


Overview

The FastAI Machine Learning course is a recent addition to the panoply of online courses on data science and machine learning. It's taught by Jeremy Howard, a top Kaggle competitor and founder of Enlitic, a machine learning startup dedicated to revolutionizing healthcare.

Format

The course is divided into twelve lectures, although each lecture picks up where the last one left off, so they're not self-contained and the lecture titles are more of a guideline. This loose structure is a general theme of the course: it's taught live at the University of San Francisco and the lectures have a Q&A format, with periods of lecturing punctuated by intelligent questions from grad students pursuing a Master's in data science.

The course is taught in Python using the Jupyter Notebook environment, which lends itself well to data science in general and to the live Q&A format of the course in particular. It predominantly uses code from the fastai Python library, written by the authors of the course with a philosophy of being quick and easy to understand. Even so, a good portion of the course is spent implementing a random forest from scratch, which is refreshing, as most online courses I've encountered don't go into the guts of how their algorithms work.

Content

There is a heavy focus on tree-based methods such as random forests and gradient boosting, with a reasonably large section in the second half of the course dedicated to deep learning methods. The transition is logical, as it comes off the back of Howard explaining that trees are limited when it comes to extrapolating beyond the range of the data they were trained on, a limitation not shared by their neural net counterparts.
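
To make the extrapolation point concrete, here is a minimal sketch of my own (not code from the course or the fastai library). It fits a tiny hand-written regression tree to the line y = 2x on x = 0..9 and then asks for a prediction at x = 20. A plain least-squares line stands in for a model that can extrapolate; it is a deliberately simple substitute for a neural net, and all function names are made up for illustration.

```python
# Toy illustration only; not code from the course or the fastai library.

def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared errors around the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(points, depth):
    """Greedy 1-D regression tree over (x, y) pairs; leaves store a mean target."""
    ys = [y for _, y in points]
    # Candidate thresholds: every distinct x except the smallest,
    # so both sides of a split are always non-empty.
    thresholds = sorted({x for x, _ in points})[1:]
    if depth == 0 or not thresholds:
        return mean(ys)

    def cost(t):
        left = [y for x, y in points if x < t]
        right = [y for x, y in points if x >= t]
        return sse(left) + sse(right)

    t = min(thresholds, key=cost)
    left = [p for p in points if p[0] < t]
    right = [p for p in points if p[0] >= t]
    return (t, fit_tree(left, depth - 1), fit_tree(right, depth - 1))

def predict_tree(node, x):
    """Walk down the tree until a leaf (a plain number) is reached."""
    while isinstance(node, tuple):
        t, left, right = node
        node = left if x < t else right
    return node

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    mx = mean([x for x, _ in points])
    my = mean([y for _, y in points])
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x, _ in points))
    return slope, my - slope * mx

train = [(x, 2.0 * x) for x in range(10)]  # targets run from 0 to 18
tree = fit_tree(train, depth=4)
slope, intercept = fit_line(train)

tree_pred = predict_tree(tree, 20)      # x = 20 is outside the training range
line_pred = slope * 20 + intercept
print("tree:", tree_pred, "line:", line_pred)
```

Because every leaf holds an average of training targets, the tree's prediction at x = 20 can never exceed 18, the largest target it saw, whereas the fitted line carries the trend forward to 40.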

The deep learning portion of the course uses PyTorch, and here again a lot of time is spent implementing various aspects of neural networks by hand using the PyTorch API. This adds to the overall feeling that this is a course taught by a practising data scientist, and not simply an educator.

Every lecture offers a number of practical heuristics for the problems you will actually encounter, as well as advice on things like which parameter values work well in practice. There's also a lecture at the end that has a great list of machine learning applications in various industries, such as healthcare, finance, and marketing.

The last portion of the course is devoted to the ethical implications of deploying machine learning algorithms in real-world scenarios. This section has a number of case studies, involving companies such as Facebook and Volkswagen, that really drive home the point that you are responsible for the code you produce and the ramifications it can have for your users or other people.

Difficulty

This course does not assume much background in machine learning or programming, although I think it would be of great benefit to have knowledge of both of those subject areas, as well as a working knowledge of statistics. The course is at the Master's level, and a lot of the questions that get asked are fairly advanced.

Some of the deep learning lectures discuss matrix algebra and tensors, so it helps to have a basic idea of how they work. For that I'd recommend looking at the Linear Algebra Review section in Week 1 of Andrew Ng's Machine Learning course on Coursera.
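
For a taste of the kind of matrix algebra involved, here is a small sketch of my own, written in plain Python rather than PyTorch. It computes a matrix-vector product plus a bias, which is the core operation of a fully connected neural net layer. The function names and numbers are made up for illustration and are not taken from the course.

```python
# Illustrative sketch; names and numbers are invented, not from the course.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def linear_layer(W, b, x):
    """Compute W x + b for one input vector x."""
    return [wx + bi for wx, bi in zip(matvec(W, x), b)]

W = [[1.0, 2.0, 3.0],
     [0.0, -1.0, 0.5]]   # 2 outputs, 3 inputs
b = [0.5, 1.0]           # one bias per output
x = [1.0, 0.0, 2.0]      # a single 3-feature input

out = linear_layer(W, b, x)
print(out)
```

If you can follow why a 2-by-3 matrix turns a 3-element input into a 2-element output here, you have roughly the level of linear algebra those lectures lean on.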

The Bottom Line


Overall, I would say that the amount of value packed into the course's short runtime is hard to beat. This course is an absolute must for anyone interested in machine learning, particularly if you're dealing with structured datasets, and upon completing it you'll have the skills to go out and perform reasonably well in Kaggle competitions or other applications. Add to that the fact that this course is free and comes with its own Python library and an online forum for discussing both, and it's probably the best single course out there to date.