
Migrating Data from Native HDFS to CHDFS
Last updated: 2022-03-30 09:30:26

Preparations

1. On the Tencent Cloud official website, create a CHDFS instance and a mount point, and configure the permission information.
2. Access the created CHDFS instance from a CVM instance in a VPC. For more information, please see Creating CHDFS Instance.
3. After the mount is successful, open the Hadoop command line tool and run the following command to verify whether the CHDFS instance works properly.
hadoop fs -ls ofs://f4xxxxxxxxxxxxxxx.chdfs.ap-beijing.myqcloud.com/
If the command returns the directory listing without errors, the CHDFS instance works properly.
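Optionally, you can go one step further and verify read and write access with a short smoke test such as the sketch below. The directory name distcp_smoke_test and the local file /etc/hosts are placeholders chosen for illustration; any writable path will do.
# write a test file, read it back, then clean up (placeholder paths)
hadoop fs -mkdir ofs://f4xxxxxxxxxxxxxxx.chdfs.ap-beijing.myqcloud.com/distcp_smoke_test
hadoop fs -put /etc/hosts ofs://f4xxxxxxxxxxxxxxx.chdfs.ap-beijing.myqcloud.com/distcp_smoke_test/
hadoop fs -cat ofs://f4xxxxxxxxxxxxxxx.chdfs.ap-beijing.myqcloud.com/distcp_smoke_test/hosts
hadoop fs -rm -r ofs://f4xxxxxxxxxxxxxxx.chdfs.ap-beijing.myqcloud.com/distcp_smoke_test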


Migration

After the preparations are completed, you can use the standard DistCp tool from the Hadoop community to perform a full or incremental migration of your HDFS data. For more information, please see DistCp.
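For an incremental migration, a common pattern is to combine DistCp's -update and -delete options, as in the sketch below. The NameNode address, mount point domain name, and paths are placeholders, and the -skipcrccheck and -m 20 settings are illustrative assumptions rather than values from this document; -skipcrccheck is often used because HDFS and CHDFS compute different checksums, but check the DistCp documentation for your Hadoop version before relying on it.
# copy only new or changed files, remove files deleted at the source,
# skip cross-filesystem CRC comparison, and cap the job at 20 map tasks
hadoop distcp -update -delete -skipcrccheck -m 20 \
  hdfs://10.0.1.11:4007/testcp \
  ofs://f4xxxxxxxx-xxxx.chdfs.ap-beijing.myqcloud.com/testcp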

Notes

Some parameters of the Hadoop DistCp tool are incompatible with CHDFS. If you specify any of the parameters listed in the following table, they will not take effect.
Parameter | Description | Status
-p[rbax] | r: replication; b: block-size; a: ACL; x: XATTR | Not effective
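For example, a command such as the following (with the same placeholder paths as in the samples below) still copies the data, but the replication factor, block size, ACLs, and extended attributes of the source files are not carried over to CHDFS:
hadoop distcp -prbax hdfs://10.0.1.11:4007/testcp ofs://f4xxxxxxxx-xxxx.chdfs.ap-beijing.myqcloud.com/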

Samples

1. When the CHDFS instance is ready, run the following Hadoop command to perform data migration.
hadoop distcp hdfs://10.0.1.11:4007/testcp ofs://f4xxxxxxxx-xxxx.chdfs.ap-beijing.myqcloud.com/
Here, f4xxxxxxxx-xxxx.chdfs.ap-beijing.myqcloud.com is the mount point domain name; replace it with the domain name of your actual mount point.
2. After the Hadoop command is executed, the details of the migration will be printed in the log as shown below:
2019-12-31 10:59:31 [INFO ] [main:13300] [org.apache.hadoop.mapreduce.Job:] [Job.java:1385]
Counters: 38
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=387932
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1380
HDFS: Number of bytes written=74
HDFS: Number of read operations=21
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
OFS: Number of bytes read=0
OFS: Number of bytes written=0
OFS: Number of read operations=0
OFS: Number of large read operations=0
OFS: Number of write operations=0
Job Counters
Launched map tasks=3
Other local map tasks=3
Total time spent by all maps in occupied slots (ms)=419904
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=6561
Total vcore-milliseconds taken by all map tasks=6561
Total megabyte-milliseconds taken by all map tasks=6718464
Map-Reduce Framework
Map input records=3
Map output records=2
Input split bytes=408
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=179
CPU time spent (ms)=4830
Physical memory (bytes) snapshot=1051619328
Virtual memory (bytes) snapshot=12525191168
Total committed heap usage (bytes)=1383071744
File Input Format Counters
Bytes Read=972
File Output Format Counters
Bytes Written=74
org.apache.hadoop.tools.mapred.CopyMapper$Counter
BYTESSKIPPED=5
COPY=1
SKIP=2
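After the job completes, you can optionally sanity-check the result by comparing the source and destination, for example with the commands below (the paths are the same placeholders used above). hadoop fs -count reports directory, file, and byte counts, and hadoop fs -du -s reports the total size.
# compare directory/file/byte counts and total sizes on both sides
hadoop fs -count hdfs://10.0.1.11:4007/testcp
hadoop fs -count ofs://f4xxxxxxxx-xxxx.chdfs.ap-beijing.myqcloud.com/testcp
hadoop fs -du -s hdfs://10.0.1.11:4007/testcp
hadoop fs -du -s ofs://f4xxxxxxxx-xxxx.chdfs.ap-beijing.myqcloud.com/testcp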