Hadoop Ecosystem Concept Map

Urvashi Saxena
7 min read · Dec 15, 2021

Context: I did this research for a course in my MS in Data Analytics Engineering program at George Mason University.

Image sourced from Analytics Vidhya

The Hadoop ecosystem comprises several open-source tools that together provide a framework for processing data in a distributed manner across clusters of machines. Hadoop makes this distributed model possible through four main functions: Data Storage, Data Processing, Data Ingestion, and Data Administration. Let’s dive deep into the concept map!
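
To make the Data Storage function a little more concrete before we go further, here is a minimal sketch of putting a file into HDFS using Hadoop’s Java FileSystem API. The hdfs://localhost:9000 address and the file paths are my own assumptions for a local single-node setup, not something prescribed by the concept map.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutExample {
    public static void main(String[] args) throws Exception {
        // Hadoop normally reads fs.defaultFS from core-site.xml;
        // this URI is an assumption for a local pseudo-distributed cluster.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");

        FileSystem fs = FileSystem.get(conf);

        // Copy a local file into HDFS. HDFS splits the file into blocks
        // (128 MB by default) and replicates each block across DataNodes.
        fs.copyFromLocalFile(new Path("/tmp/input.txt"),
                             new Path("/user/hadoop/input.txt"));

        // List the target directory to confirm the upload.
        for (FileStatus status : fs.listStatus(new Path("/user/hadoop"))) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }

        fs.close();
    }
}
```

That block-level splitting and replication is exactly what lets the processing tools described next run on many machines at once.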

What is Hadoop?

Hadoop is a set of tools that supports running big data applications. Big data is any data characterized by large volume and wide variety. Because of that volume, it is hard to use traditional, non-scalable applications. Hadoop works by breaking the data down into pieces of roughly equal size and processing them in parallel. Hadoop can be broadly classified into two core components: the Hadoop Distributed File System (HDFS) for storage and MapReduce for computation. The set of tools that works in cohesion with HDFS and MapReduce is called Hadoop, and it is managed by the Apache Software Foundation.
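
To see how MapReduce computes those pieces individually, here is a minimal sketch of the classic word-count job, adapted from the standard Apache Hadoop example. Each mapper independently processes one split of the input and emits (word, 1) pairs; the framework groups the pairs by word and a reducer sums them. The input and output paths are supplied on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: each mapper gets one split of the input and
    // emits (word, 1) pairs independently of every other mapper.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: the framework groups all counts for the same word,
    // and the reducer sums them into a final total.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar, this runs with something like hadoop jar wordcount.jar WordCount /user/hadoop/input /user/hadoop/output, and Hadoop tries to schedule each mapper on a machine that already stores its block of the data.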

Hadoop Concept Map

Some of the essential characteristics of Hadoop are:

  • Distributed Models
