Posts

Showing posts from December, 2016

HDFS Part -2

In this post we will learn how HDFS works; in the following posts we will cover some basic Hadoop commands. As we learned in the previous post, HDFS has three components: the name node, the data nodes, and the secondary name node. The data nodes, which are many in number, do the actual work of storing data, while the name node keeps track of which data nodes hold which blocks. Data is stored in HDFS in chunks called blocks. The default block size is 128 MB. Suppose we have a 1 GB file that we want to copy to HDFS. The file will be divided into blocks of 128 MB each, so it will be stored in 8 blocks (1024 MB / 128 MB = 8 blocks). Now suppose we have a file of 1100 MB; how will the data be stored? It will be stored in 9 blocks (8 blocks x 128 MB = 1024 MB; the remaining 76 MB goes into the 9th block). What will happen to remaini...
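The block arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation, not part of Hadoop itself; the function names and the 128 MB constant are assumptions based on the default block size discussed above.

```python
import math

# Default HDFS block size (128 MB in Hadoop 2.x and later); assumed here.
BLOCK_SIZE_MB = 128

def hdfs_block_count(file_size_mb):
    """Number of HDFS blocks needed for a file of the given size in MB."""
    return math.ceil(file_size_mb / BLOCK_SIZE_MB)

def last_block_size_mb(file_size_mb):
    """Size of the final (possibly partial) block in MB."""
    remainder = file_size_mb % BLOCK_SIZE_MB
    return remainder if remainder else BLOCK_SIZE_MB

# A 1 GB (1024 MB) file fills exactly 8 full blocks.
print(hdfs_block_count(1024))                          # 8
# An 1100 MB file needs 9 blocks; the 9th holds only 76 MB.
print(hdfs_block_count(1100), last_block_size_mb(1100))  # 9 76
```

Note that the last block only occupies as much space as the remaining data, which is why the 9th block above holds 76 MB rather than a full 128 MB.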

HDFS Part -1

From now onward we will start learning about the various components of the Hadoop ecosystem, with a focus on theory as well as implementation. This blog will give you all the basics of Hadoop in an easy-to-follow way. HDFS (Hadoop Distributed File System) is a distributed file system that runs on commodity hardware. It is fault tolerant and designed to be deployed on low-cost machines, and it is well suited to applications with large data sets. In other words, we can think of HDFS as the storage layer for Hadoop-related data. HDFS has a master-slave architecture and mainly consists of three components:
Name Node -- acts as the master and keeps track of the cluster's storage (file system metadata and block locations)
Data Node -- acts as a slave, storing the actual data blocks and serving read and write requests
Secondary Name Node -- periodically merges the name node's edit log into its file system image (checkpointing); note that it is a helper for the name node, not an automatic failover standby. P...