Posts

Showing posts from November, 2016

Big Data Intro Part-2

In the previous post, we learned the basics of big data and its 3 V's. In this post, we look at the main components of the Hadoop ecosystem, each of which we will discuss in depth in later posts. Below is a list of a few of them:

- HDFS is the distributed file system of the Hadoop ecosystem, used to store all the data.
- MapReduce is a programming model used to process and generate large data sets with a parallel, distributed algorithm.
- Hive provides an SQL-like interface to query data stored in the various databases and file systems that integrate with Hadoop.
- Impala is an open source parallel processing SQL engine for quick, low-latency analysis of data.
- Pig is a platform for analyzing large data sets, built around a high-level language.
- Sqoop is a tool designed to transfer bulk data between HDFS and relational databases.
- Oozie is a workflow scheduler used to manage Hadoop jobs.
- Scala is a scalable language. Scala code is compiled t...
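
To make the MapReduce idea mentioned above a little more concrete, here is a minimal sketch of a word count in plain Python. It only imitates the map, shuffle, and reduce steps in a single process and does not use Hadoop's actual API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all values by key, as a framework would do
    between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data at scale"]))
```

On a real cluster the map and reduce steps would run in parallel on many machines over data stored in HDFS; here they run one after another only to show the flow of data.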