Hadoop Starter Kit

Mihai Chelaru-Centea

Online Course by Hadoop in Real World

Resource last updated: Feb. 1, 2017


Overview

This is one of the top results on Udemy when you search for Hadoop courses, and I can see why. For starters, it's free, so there's no paywall to climb over to get to the content. More importantly, it lays out the case for big data solutions clearly and introduces Hadoop as an answer to some of the challenges big data poses to analysts and programmers alike.

Format

The course is only 3.5 hours long and is divided into four main sections, each ending with a short quiz to test your understanding. The first section introduces the concept of big data and the challenges it poses. It also introduces a real-life example problem: calculating the maximum closing price for each stock symbol on an exchange.

The next section addresses some of the storage and computational problems of big data with the Hadoop Distributed File System (HDFS). The course then describes the MapReduce programming model and its three phases: map, shuffle, and reduce. This section includes two lectures that walk through a Java implementation of MapReduce for solving the maximum closing price problem. Finally, there is a section on Pig and Hive, two tools that build on the base functionality of Hadoop with a scripting language and SQL-like queries, respectively.
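To make the three phases concrete, here is a minimal sketch of the maximum closing price problem in plain Python. This is not the course's Java code and it does not use Hadoop at all. It only mimics map, shuffle, and reduce in memory on a single machine. The function names, the (symbol, close) record layout, and the sample prices are all made up for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (symbol, closing_price) pair per input record."""
    for symbol, close in records:
        yield symbol, close


def shuffle_phase(pairs):
    """Shuffle: group every emitted value under its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each symbol's prices to the maximum."""
    return {symbol: max(closes) for symbol, closes in grouped.items()}


def max_closing_prices(records):
    """Run the three phases in order on an in-memory list of records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


# Hypothetical sample data: (stock symbol, closing price)
sample = [("ABC", 10.5), ("XYZ", 20.0), ("ABC", 12.25), ("XYZ", 18.75)]
result = max_closing_prices(sample)
print(result)
```

On a real cluster, the map and reduce steps would run in parallel across many machines and the framework would handle the shuffle, but the division of labor between the three phases is the same idea.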

Content

The course covers the two main components of Hadoop, HDFS and MapReduce, in a fair bit of detail. A real-world case study frames each topic, with the bigger picture of big data analytics as the overall backdrop.

Some later lectures show actual operations on a real Hadoop cluster, the same one that students can access through the course, although it seems that to get access you have to sign up on the Hadoop in Real World website rather than just on Udemy. This gives the course a practical side, beyond theory and diagrams.

Difficulty

The lectures fit together nicely, with a logical flow and good production value. They are clearly scripted and read in a clear voice, with an emphatic delivery that really gets the points across. Add the fact that you don't need any programming experience to follow along, and the course is ideal for beginners. That said, you'd probably only become interested in Hadoop once you already know something about performing data analysis programmatically in some other language.

The Bottom Line


This course offers great value and packs a lot of information into a short runtime. Compared to a lot of MOOCs in other categories, this one is far better organized, and you get the sense that it's taught by practitioners who are experts in the subject matter, which I can't say of every instructor out there.

I would recommend this course to anyone looking for a quick and well-explained introduction to the Hadoop ecosystem. At 3.5 hours, you could get it done in a day, and then decide where to go from there for more information. The authors also offer a full course called Hadoop Developer in Real World. Udemy lets you preview the first lectures in each section of that course, so it might be worth a look if you're interested in going further.