HDFS Part 1
From now onward we will start learning about the various components of the Hadoop ecosystem. The focus will be on theory as well as implementation. This blog will give you the basics of Hadoop in an easy-to-follow way.
HDFS stands for Hadoop Distributed File System. It is a distributed file system that runs on commodity hardware, is fault tolerant, and is designed to be deployed on low-cost machines. HDFS is well suited to applications with large data sets.
In other words, we can think of HDFS as the storage layer for Hadoop-related data. HDFS follows a master-slave architecture and mainly consists of three components (a small code sketch follows the list):
- Name Node -- acts as the master and keeps track of the cluster's storage: the file system metadata, i.e. which blocks make up each file and on which Data Nodes they live.
- Data Node -- acts as a slave; stores the actual data blocks and serves read/write requests from clients.
- Secondary Name Node -- helps the Name Node by periodically merging the edit log into the file system image (checkpointing); despite the name, it is not a hot backup. In HA clusters a separate Standby Name Node takes over when the active Name Node goes down.
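To make the roles above concrete, here is a minimal sketch using the HDFS Java client (`org.apache.hadoop.fs.FileSystem`). It writes and reads a small file: the client asks the Name Node where the blocks should go, and the bytes themselves flow to and from the Data Nodes. The address `hdfs://localhost:9000` and the path `/tmp/hello.txt` are just assumptions for illustration; replace them with your cluster's values.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHelloWorld {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical Name Node address; use your cluster's fs.defaultFS here.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/tmp/hello.txt");

        // Write: the Name Node decides which Data Nodes hold the blocks,
        // then the client streams the bytes to those Data Nodes.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("Hello, HDFS!");
        }

        // Read: the Name Node returns the block locations,
        // and the bytes are read back from the Data Nodes.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }

        fs.close();
    }
}
```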
Previously, we used a single-node cluster (a Unix box where we could perform all Hadoop-related operations). Now, there are clusters with two such nodes, called HA (High Availability) clusters.
Why HA clusters were introduced:
In a single-node cluster, when the node goes down all Hadoop processing stops, because the whole cluster depends on that one node, and we have to wait until it comes back up. To avoid this scenario, HA was introduced. In an HA setup there are two nodes (active and standby); if the active node goes down, the standby node becomes the active node, so the cluster stays up.
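From the client's point of view, HA works because the client talks to a logical nameservice rather than a single host, and a failover proxy retries against whichever Name Node is currently active. Below is a minimal sketch of that client-side configuration; the nameservice name `mycluster`, the Name Node IDs `nn1`/`nn2`, and the hostnames are assumptions for illustration, not values from this post.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HaClientConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Clients address the logical nameservice, not a single Name Node host.
        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("dfs.nameservices", "mycluster");

        // The two Name Nodes (active + standby) behind the logical name.
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2.example.com:8020");

        // Failover proxy that routes requests to whichever Name Node is active.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // The client keeps working even if the active Name Node fails over to the standby.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to: " + fs.getUri());
        fs.close();
    }
}
```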
In the next post, I will explain more about HDFS, its nodes, and how HA nodes switch over.
Your feedback is precious to us; kindly leave your reviews.