Big Data Intro Part-2
In the previous post, we learned the basics of big data and its 3 V's. In this post, we look at the components of the Hadoop ecosystem, which we will discuss in depth in later posts. Below is a list of a few of them:

HDFS is the distributed file system of the Hadoop ecosystem, used to store all the data.
MapReduce is a programming model used to process and generate large data sets with a parallel, distributed algorithm.
Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop.
Impala is an open source parallel processing engine that performs quick, low-latency analysis on data.
Pig is a platform for analyzing large data sets, built around a high-level language (Pig Latin).
Sqoop is a tool designed to transfer bulk data between HDFS and relational databases.
Oozie is a workflow scheduler used to manage Hadoop jobs.
Scala is a scalable language. Scala code is compiled t...
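
To make the MapReduce entry in the list above a little more concrete, here is a minimal sketch of the idea as a word count in plain Python. It runs in a single process and does not use Hadoop at all; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this illustration and are not part of any Hadoop API. On a real cluster, the map and reduce steps would run in parallel on many machines and the framework would handle the grouping step.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce step: combine all counts emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce one after another in a single process."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

Each line is mapped independently of the others, which is why this step can be spread across many nodes. The reduce step only needs all the values for one key, so different keys can also be reduced in parallel.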