Hadoop Ecosystem Concept Map

Urvashi Saxena
7 min read · Dec 15, 2021

Context: I did this research for a course in my MS in Data Analytics Engineering program at George Mason University.

Image sourced from Analytics Vidhya

The Hadoop ecosystem comprises several open-source tools that together provide a framework for processing data in a distributed manner across clusters of machines. Hadoop makes this distributed model possible through four main functions: Data Storage, Data Processing, Data Ingestion, and Data Administration. Let’s dive deep into the concept map!
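
To make the Data Storage function a little more concrete before we go further, here is a minimal sketch of putting a file into HDFS using Hadoop’s Java FileSystem API. The hdfs://localhost:9000 address and the file paths are my own assumptions for a local single-node setup, not something prescribed by the concept map.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutExample {
    public static void main(String[] args) throws Exception {
        // Hadoop normally reads fs.defaultFS from core-site.xml;
        // this URI is an assumption for a local pseudo-distributed cluster.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");

        FileSystem fs = FileSystem.get(conf);

        // Copy a local file into HDFS. HDFS splits the file into blocks
        // (128 MB by default) and replicates each block across DataNodes.
        fs.copyFromLocalFile(new Path("/tmp/input.txt"),
                             new Path("/user/hadoop/input.txt"));

        // List the target directory to confirm the upload.
        for (FileStatus status : fs.listStatus(new Path("/user/hadoop"))) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }

        fs.close();
    }
}
```

That block-level splitting and replication is exactly what lets the processing tools described next run on many machines at once.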

What is Hadoop?

Hadoop is a set of tools that supports running big data applications. Big data is any data characterized by large volume and wide variety. Because of that volume, it is hard to use traditional, non-scalable applications. Hadoop works by breaking the data down into pieces of roughly equal size and processing them in parallel. Hadoop can be broadly classified into two core components: the Hadoop Distributed File System (HDFS) for storage and MapReduce for computation. The set of tools that works in cohesion with HDFS and MapReduce is called Hadoop, and it is managed by the Apache Software Foundation.
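
To see how MapReduce computes those pieces individually, here is a minimal sketch of the classic word-count job, adapted from the standard Apache Hadoop example. Each mapper independently processes one split of the input and emits (word, 1) pairs; the framework groups the pairs by word and a reducer sums them. The input and output paths are supplied on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: each mapper gets one split of the input and
    // emits (word, 1) pairs independently of every other mapper.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: the framework groups all counts for the same word,
    // and the reducer sums them into a final total.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar, this runs with something like hadoop jar wordcount.jar WordCount /user/hadoop/input /user/hadoop/output, and Hadoop tries to schedule each mapper on a machine that already stores its block of the data.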

Hadoop Concept Map

Some of the essential characteristics of Hadoop are:

  • Distributed Models
