Data Science A-Z™: Real-Life Data Science Exercises Included

Mihai Chelaru-Centea

Online Course by Kirill Eremenko

Resource last updated: Nov. 22, 2018


Overview

This is yet another course from Kirill Eremenko, author of Confident Data Skills: Master the Fundamentals of Working with Data and Supercharge Your Career. Access to the course comes bundled with the book, so if you can get yourself a copy, perhaps from the library, then definitely check it out.

Format

This is quite a comprehensive course that focuses on multiple aspects of day-to-day work as a data scientist, including ETL (Extract, Transform, Load) with SSIS, creating visualizations with Tableau, and creating models using Gretl. The sections are once again modularized like the other A-Z series courses, allowing you to pick and choose which sections you want to focus on and ignore the ones that don't interest you.

The sections are broken down into multiple lectures, each with a descriptive title that tells you what the lecture covers. As a result, the lectures are usually under 10 minutes long, although a few go over 15 minutes. This makes it easy to go back and look for a particular topic, or to do only a bit at a time if you have limited time or like to take frequent breaks.

Content

The pacing is somewhat slow, as I've come to expect from courses in this A to Z series, but they're geared for beginners so this is understandable. The first section takes a hands-on approach to building visualizations using Tableau, after loading data in from a CSV file. Kirill shows off some of the most interesting features, but the sample data is not the most complex, and I think I would have benefited from a more interesting dataset that would allow a more comprehensive introduction to the software.

The section on ETL using SSIS introduces a range of errors that data scientists are likely to run into, and it really gave the impression that it came from the course author's personal experience working as a data scientist. Many of these are data wrangling tasks that involve loading data from a CSV file and using SQL to remove rows with corrupted data or fix typos.

This section is well-structured, with valuable best practices for naming your files and logging excluded data, as well as writing and using stored procedures for reproducibility, which is extremely important in a professional environment.
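For readers who don't have SSIS, the practice of logging excluded rows instead of silently dropping them can be sketched in a few lines of Python. This is only an illustration, not the course's approach (which uses SSIS and SQL); the column names `id` and `amount`, the validation rules, and the `split_rows` helper are all hypothetical.

```python
import csv
import io


def split_rows(csv_text, required=("id", "amount")):
    """Separate usable rows from rows that should be excluded and logged.

    The required columns and the numeric check on "amount" are
    assumptions made for this example, not rules from the course.
    """
    clean, excluded = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header is line 1, so data rows start at line 2
    # (assuming no field contains an embedded newline).
    for line_no, row in enumerate(reader, start=2):
        reason = None
        for col in required:
            if not (row.get(col) or "").strip():
                reason = f"missing {col}"
                break
        if reason is None:
            try:
                row["amount"] = float(row["amount"])
            except ValueError:
                reason = "amount is not numeric"
        if reason is None:
            clean.append(row)
        else:
            # Keep the raw row and the reason so the exclusion can be audited.
            excluded.append({"line": line_no, "reason": reason, "row": row})
    return clean, excluded


sample = "id,amount\n1,10.5\n2,abc\n,7\n4,3\n"
clean, excluded = split_rows(sample)
for item in excluded:
    print(item["line"], item["reason"])
```

Keeping the excluded rows alongside a reason, rather than discarding them, is what makes the cleaning step reviewable later.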

The third section involves building linear models using Gretl, and using logistic regression in the case of a classification task. I think the choice of software here is questionable when Python and R are freely available and more widely used, but I can see that this course is more geared towards data analyst types who are used to using point-and-click software, so I'll give that a pass.

The last section is all about advice on how to excel in a workplace environment. The advice is fairly generic, and you'll find many others on the internet saying the same thing. However, many of the points are backed with anecdotes, which make them easier to understand and implement in practice.

This type of content is not present in many other courses, although the FastAI Machine Learning course has a section on ethics that touches on how to respond to requests to produce unethical code in the workplace. I value this type of section being included in courses meant to prepare people for the professional world, and so it's refreshing to see it in this course.

Kirill also includes video footage of a presentation he did where he described a project he worked on to reduce staffing requirements at a company he worked for. The focus is on the delivery and the structure of the presentation, which is pretty short but really gets the point across. Getting a sense of what a real live data science presentation looks like is where most of the value in this course was for me, as I came into it already knowing how to do a lot of the visualization, data preprocessing, and modelling steps from other courses and projects.

The Bottom Line


Overall, an interesting course that provides some things I haven't seen offered in other data science courses and resources on the web. I hope to see more content like this focused on the presentation skills and professionalism aspects of data science, as there's already an overabundance of courses that just go into the modelling portion. The data wrangling portion is also great, although I would have preferred something that uses Python or R instead of SSIS, as not everyone will be working with that particular tool.

Learning Tableau is nice, and it does make visualizations quick and easy, although I had quite a bit of trouble getting some of my Excel files to work, and the software isn't free. Getting it free as a student is also quite difficult: they accept neither a picture of a student card nor a student email as proof, so you need some other form of documentation. That is a lot more hoops to jump through than something like GitHub Student or JetBrains Student License.