If you're interested in Hadoop but don't know where to begin, this session will give you an idea of what you can do with the open-source platform. We will walk through an overview of the Hadoop architecture and become familiar with the overall platform and its solutions for warehousing, ETL, streaming data ingest, in-memory processing, and more. We will also compare Hadoop to SQL Server to help you understand when to deploy which technology.


The slides are available in HTML5 format. All modern browsers, including those on tablets and phones, should be able to navigate the slides successfully.

The slides are licensed under Creative Commons Attribution-ShareAlike.

Additional Media

A recording of this talk is available on my YouTube channel.

Links And Further Information

Hadoop Distributions

If you want to get started with Hadoop, there are a number of options available to you. The local sandboxes tend to be available as Azure or AWS virtual machines as well, so if you don't have a beefy machine at home, you can still get started pretty easily.

Local sandboxes:

Platform-as-a-Service offerings:

Interesting Links

Learning Resources

I'm not sure that any books are worth picking up, as these technologies change so fast. For example, a book on Hive development published in 2015 would be missing significant developments, particularly around Hive LLAP and Druid. If you really want to pick up a book, you might look at Spark: The Definitive Guide or Hadoop: The Definitive Guide. The Spark book is well written but not quite complete yet. The Hadoop book was released in 2015, so it's missing some important things; some of its chapters are also much better written than others.

Some of the foundational papers do hold up well, as they provide information on the underpinnings of these technologies. Examples include:

I have a few other talks in which I cover elements of Hadoop in detail.

I learned a good deal from the Hortonworks tutorials, which come in both written and video form. They are a good place to start.