Hadoop Distcp (Distributed copy) is a tool used for large inter- and intra-cluster copying. It uses MapReduce for distribution, error handling, recovery, and reporting. Because the copy runs as parallel map tasks, each of which copies a partition of the files under the source path, it can migrate large volumes of data quickly.
Since Hadoop-COS implements the semantics of Hadoop Distributed File System (HDFS), it can help you easily migrate data between COS and HDFS. This document will describe how to do so.
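Before you start, verify that Hadoop-COS is working by listing a COS bucket with the command below. If the contents of the bucket are listed correctly, Hadoop-COS works well, and you can begin the practice below.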
hadoop fs -ls cosn://examplebucket-1250000000/
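The check above can be wrapped in a small shell function so that a migration script stops early when Hadoop-COS is not working. This is a minimal sketch: the function name check_cosn is hypothetical, and it relies only on the exit status of hadoop fs -ls.

```shell
# Hypothetical helper: returns 0 if the bucket can be listed, 1 otherwise.
# It only inspects the exit status of `hadoop fs -ls`; the listing is discarded.
check_cosn() {
    bucket_uri="$1"
    if hadoop fs -ls "$bucket_uri" >/dev/null 2>&1; then
        echo "Hadoop-COS OK: $bucket_uri"
    else
        echo "Hadoop-COS check failed: $bucket_uri" >&2
        return 1
    fi
}
```

A script could then call check_cosn cosn://examplebucket-1250000000/ || exit 1 before starting any copy.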
Use Hadoop Distcp to migrate files in the /test directory of the local HDFS cluster to the COS bucket hdfs-test-1250000000:
hadoop distcp hdfs://10.0.0.3:9000/test cosn://hdfs-test-1250000000/
Hadoop Distcp starts a MapReduce job to copy the files and prints a brief report when the job finishes.
Then run the following command to list the directories and files that have just been migrated to the bucket:
hadoop fs -ls -R cosn://hdfs-test-1250000000/
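Beyond reading the listing, the migration can be sanity-checked by comparing the file count and total size on both sides. The sketch below is an illustration rather than part of Hadoop-COS: the function names count_files and verify_copy are hypothetical, and it assumes the standard hadoop fs -count output, whose columns are directory count, file count, content size, and path name. Directory counts are ignored because an object store may report them differently from HDFS.

```shell
# Hypothetical helper: prints "FILE_COUNT CONTENT_SIZE" for a path,
# taken from columns 2 and 3 of `hadoop fs -count`.
count_files() {
    hadoop fs -count "$1" | awk '{print $2, $3}'
}

# Hypothetical helper: returns 0 when source and destination report
# the same file count and the same total size in bytes.
verify_copy() {
    src_stats=$(count_files "$1")
    dst_stats=$(count_files "$2")
    if [ -n "$src_stats" ] && [ "$src_stats" = "$dst_stats" ]; then
        echo "match: $src_stats"
    else
        echo "mismatch: source=[$src_stats] destination=[$dst_stats]" >&2
        return 1
    fi
}
```

For example, verify_copy hdfs://10.0.0.3:9000/test followed by the destination path in the bucket. Use the recursive listing to confirm where the files actually landed before choosing that path. Matching counts and sizes do not prove the contents are identical, only that nothing obvious is missing.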
Hadoop Distcp is a tool that supports copying data between different clusters and file systems. To copy COS files to the local HDFS, simply use the object path in the COS bucket as the source path and the HDFS file path as the destination path.
hadoop distcp cosn://hdfs-test-1250000000/test hdfs://10.0.0.3:9000/
You can also pass the Hadoop-COS configuration as -D options directly on the Distcp command line. This works for migrating data from HDFS to COS, and vice versa.
Run the following command:
hadoop distcp -Dfs.cosn.impl=org.apache.hadoop.fs.CosFileSystem -Dfs.cosn.bucket.region=ap-XXX -Dfs.cosn.userinfo.secretId=AK**XXX -Dfs.cosn.userinfo.secretKey=XXXX -libjars /home/hadoop/hadoop-cos-2.6.5-shaded.jar cosn://bucketname-appid/test/ hdfs:///test/
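Because the command above is long, it can help to assemble it from variables and review it before running. The following is a minimal sketch under stated assumptions: the function name build_distcp_cmd and the environment variable names COS_REGION, COS_SECRET_ID, and COS_SECRET_KEY are hypothetical, while the -D property names and -libjars are taken from the command above. The function only prints the command; it does not run it.

```shell
# Hypothetical helper: prints the Distcp command instead of running it,
# so the options can be reviewed first.
# COS_REGION, COS_SECRET_ID and COS_SECRET_KEY are assumed variable names.
build_distcp_cmd() {
    src="$1"; dst="$2"; jar="$3"
    if [ -z "${COS_REGION:-}" ] || [ -z "${COS_SECRET_ID:-}" ] || [ -z "${COS_SECRET_KEY:-}" ]; then
        echo "COS_REGION, COS_SECRET_ID and COS_SECRET_KEY must be set" >&2
        return 1
    fi
    # One %s per option, in the same order as the documented command.
    printf 'hadoop distcp %s %s %s %s -libjars %s %s %s\n' \
        "-Dfs.cosn.impl=org.apache.hadoop.fs.CosFileSystem" \
        "-Dfs.cosn.bucket.region=$COS_REGION" \
        "-Dfs.cosn.userinfo.secretId=$COS_SECRET_ID" \
        "-Dfs.cosn.userinfo.secretKey=$COS_SECRET_KEY" \
        "$jar" "$src" "$dst"
}
```

Note that the printed command contains the secret key in plain text, so treat the output as sensitive.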
The Hadoop-COS JAR file specified with -libjars can be downloaded from the dep directory on GitHub.
For any other parameters, see Hadoop-COS.
Hadoop Distcp supports a variety of parameters. For example, you can use -m to specify the maximum number of map tasks for parallel replication, and -bandwidth to limit the maximum bandwidth used by each map task. For more information, see the DistCp Guide in the Apache Hadoop documentation.
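As an illustration of these two parameters, the sketch below wraps hadoop distcp with a cap on the number of map tasks and on the per-map bandwidth, which DistCp takes in MB per second. The function name distcp_limited and the default values 10 and 20 are hypothetical and chosen only for the example; suitable values depend on your cluster and network.

```shell
# Hypothetical wrapper: caps the number of map tasks (-m) and the
# per-map bandwidth in MB/s (-bandwidth). The defaults are illustrative only.
distcp_limited() {
    src="$1"; dst="$2"
    max_maps="${3:-10}"
    mb_per_map="${4:-20}"
    hadoop distcp -m "$max_maps" -bandwidth "$mb_per_map" "$src" "$dst"
}
```

For example, distcp_limited hdfs://10.0.0.3:9000/test cosn://hdfs-test-1250000000/ 4 50 runs the copy with at most 4 map tasks at up to 50 MB/s each, so the job as a whole should use at most about 200 MB/s.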