DistCP
In this post, we are going to learn about DistCp in Hadoop and its various aspects.

What is DistCp

DistCp (Distributed Copy) is a tool for copying large sets of data within a cluster or between clusters. It uses MapReduce for distribution, error handling, recovery, and reporting. It expands the list of files and directories into input for map tasks, each of which copies a partition of the files from the source to the destination cluster.

How DistCp Works

By default, DistCp skips any file that already exists at the destination. When run with the -update option, the DistCp command compares each file at the source and destination. If the destination file differs from the source file in size (or checksum), DistCp copies the file; if the files match, it skips the copy.

Why prefer DistCp over cp, get and put commands

DistCp copies data faster than the get/put commands because it spreads the copy across map tasks that run in parallel. Also, we cannot use put/get...
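
The size-based skip rule described under "How DistCp Works" can be illustrated with a small local sketch. This is not DistCp's implementation: it runs on ordinary local directories in a single process, and the function names `needs_copy` and `sync_by_size` are hypothetical, chosen only for this example. It assumes a flat directory of files and compares sizes only.

```python
import os
import shutil
import tempfile


def needs_copy(src_path, dst_path):
    """Return True when the destination is missing or differs in size."""
    if not os.path.exists(dst_path):
        return True
    return os.path.getsize(src_path) != os.path.getsize(dst_path)


def sync_by_size(src_dir, dst_dir):
    """Copy files from src_dir to dst_dir, skipping files of equal size.

    Hypothetical helper for illustration only; handles a flat directory.
    Returns two lists of file names: (copied, skipped).
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied, skipped = [], []
    for name in sorted(os.listdir(src_dir)):
        src_path = os.path.join(src_dir, name)
        if not os.path.isfile(src_path):
            continue  # subdirectories are ignored in this sketch
        dst_path = os.path.join(dst_dir, name)
        if needs_copy(src_path, dst_path):
            shutil.copyfile(src_path, dst_path)
            copied.append(name)
        else:
            skipped.append(name)
    return copied, skipped


if __name__ == "__main__":
    src = tempfile.mkdtemp()
    dst = tempfile.mkdtemp()
    with open(os.path.join(src, "part-0000"), "w") as f:
        f.write("some data")
    print(sync_by_size(src, dst))  # first run copies the file
    print(sync_by_size(src, dst))  # second run skips it (same size)
```

Note the limitation of a size-only check: a destination file with the same size but different content is skipped, which is why a checksum comparison is the safer test when file contents may have changed.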