HDFS TO COS Tools
Last updated: 2018-07-19 10:31:00
HDFS TO COS copies data from HDFS to Tencent Cloud COS.
Installation and Configuration
For more information on installing and configuring the environment, please see Install and Configure Java.
Configuration and Usage
How to Configure
- For more information on how to install Hadoop-2.7.2 or later, please see Install and Test Hadoop.
- Download HDFS TO COS here and decompress it.
- Copy core-site.xml from the HDFS cluster to be synchronized into the conf folder. core-site.xml contains the configuration information of the NameNode.
- Edit the configuration file cos_info.conf and fill in the bucket, region, and API key. The bucket name must be the full name, including the appid provided by Tencent, e.g. "mybucket-125000000".
- Specify a location for the configuration file in the command line parameter. By default, it is located at
If the command line parameter conflicts with the configuration file, the command line parameter prevails.
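The full-name rule for the bucket can be checked before the tool is run. The following is a minimal shell sketch and not part of HDFS TO COS: the function name is_full_bucket_name is hypothetical, and it assumes only what the text states, namely that a full bucket name ends with a hyphen followed by a numeric appid.

```shell
# Hypothetical helper, not shipped with HDFS TO COS.
# Returns 0 if the name looks like "<name>-<numeric appid>", 1 otherwise.
is_full_bucket_name() {
  bucket_name=$1
  bucket_appid=${bucket_name##*-}    # text after the last hyphen
  bucket_prefix=${bucket_name%-*}    # text before the last hyphen

  # No hyphen at all: the expansion leaves the name unchanged.
  [ "$bucket_prefix" != "$bucket_name" ] || return 1
  # Nothing before the hyphen.
  [ -n "$bucket_prefix" ] || return 1
  # The appid must be non-empty and contain digits only.
  case $bucket_appid in
    ''|*[!0-9]*) return 1 ;;
  esac
  return 0
}
```

For example, is_full_bucket_name "mybucket-125000000" succeeds, while is_full_bucket_name "mybucket" fails because the appid suffix is missing.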
How to Use
(Take Linux as an example)
- If a file copied from HDFS to COS has the same name as a file already stored in COS, the existing file is overwritten.
./hdfs_to_cos_cmd --hdfs_path=/tmp/hive --cos_path=/hdfs/20170224/
- If a file copied from HDFS to COS has the same name and length as a file already stored in COS, the copy is skipped.
Only the file length is compared, because computing file digests on Hadoop is too expensive.
./hdfs_to_cos_cmd --hdfs_path=/tmp/hive --cos_path=/hdfs/20170224/ -skip_if_len_match
The decompressed tool directory contains the following:
- conf: configuration directory, which stores core-site.xml and cos_info.conf
- log: log directory
- src: Java source code
- dep: runnable JAR package generated by compilation
About Configuration Information
Make sure the configuration information is correct, including the bucket, region, and API key. The bucket name must be the full name, including the appid provided by Tencent, e.g. "mybucket-125000000". Also make sure the server time differs from Beijing time by no more than 1 minute. If it differs by more, correct the server time.
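The one-minute limit can be expressed as a small arithmetic check. This is a sketch under assumptions: the function name clock_skew_ok is hypothetical, both arguments are Unix epoch seconds (which are independent of time zone, so no conversion to Beijing time is needed), and obtaining a trustworthy reference time is left to the reader.

```shell
# Hypothetical helper, not shipped with HDFS TO COS.
# clock_skew_ok LOCAL_EPOCH REFERENCE_EPOCH
# Returns 0 if the two clocks differ by at most 60 seconds, 1 otherwise.
clock_skew_ok() {
  skew=$(( $1 - $2 ))
  # Take the absolute value of the difference.
  if [ "$skew" -lt 0 ]; then
    skew=$(( 0 - skew ))
  fi
  [ "$skew" -le 60 ]
}
```

A call might look like clock_skew_ok "$(date +%s)" "$reference_epoch", where reference_epoch is a placeholder for a value taken from a time source you trust.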
Make sure the server that runs the copy program can connect to the DataNodes. The NameNode may be reachable because it has a public IP, but the DataNodes that hold the file blocks have only private IPs and cannot be reached directly. It is recommended to run the synchronization program on a Hadoop node, so that both the NameNode and the DataNodes can be accessed.
First use a Hadoop command to download a file and check that the download works, and then use the synchronization tool to synchronize the data on Hadoop.
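The download check can be scripted. The sketch below assumes the hadoop command is on the PATH of the account that will run the synchronization; the function name preflight_hdfs_read is hypothetical. It uses the standard hadoop fs -get command to fetch one file into a temporary directory and reports whether that succeeded.

```shell
# Hypothetical helper, not shipped with HDFS TO COS.
# preflight_hdfs_read HDFS_FILE
# Returns 0 only if the current account can download HDFS_FILE.
preflight_hdfs_read() {
  hdfs_file=$1
  tmp_dir=$(mktemp -d) || return 1

  # hadoop fs -get copies a file from HDFS to the local file system.
  if hadoop fs -get "$hdfs_file" "$tmp_dir/"; then
    rc=0
  else
    rc=1
  fi

  # Remove the downloaded copy; only the result of the check is kept.
  rm -rf "$tmp_dir"
  return "$rc"
}
```

If the check fails for a file that exists, the cause is more likely permissions or DataNode connectivity than the synchronization tool itself.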
About File Overwriting
By default, if a file copied from HDFS to COS has the same name as a file already stored in COS, the existing file is overwritten. If -skip_if_len_match is specified, the copy is skipped when the file in HDFS has the same name and length as a file already stored in COS.
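The overwrite rule can be summarized as a per-file decision. The following shell sketch only restates the behavior described above; it is not the tool's actual implementation, and the function name copy_decision and its argument convention are hypothetical.

```shell
# Hypothetical illustration of the documented rule, not the tool's code.
# copy_decision HDFS_LEN COS_LEN SKIP_FLAG
#   HDFS_LEN:  length in bytes of the file in HDFS
#   COS_LEN:   length in bytes of the same-named file in COS, or empty if none
#   SKIP_FLAG: "1" if -skip_if_len_match was given, anything else otherwise
# Prints "skip" or "copy".
copy_decision() {
  hdfs_len=$1
  cos_len=$2
  skip_flag=$3

  if [ "$skip_flag" = "1" ] && [ -n "$cos_len" ] && [ "$hdfs_len" -eq "$cos_len" ]; then
    echo skip
  else
    # No flag, no existing file, or a different length: copy (and overwrite).
    echo copy
  fi
}
```

Because only lengths are compared, a file whose content changed but whose length did not would be skipped when -skip_if_len_match is given.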
About COS Path
The COS path is treated as a directory by default, and all files copied from HDFS are stored under that directory.
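To make the directory semantics concrete, the sketch below computes where one HDFS file would land. It rests on an assumption the text does not confirm: that a file's path relative to --hdfs_path is preserved under --cos_path. The function name cos_target_path is hypothetical.

```shell
# Hypothetical illustration, not the tool's code.
# ASSUMPTION: the path relative to --hdfs_path is kept under --cos_path.
# cos_target_path COS_PATH HDFS_PATH HDFS_FILE
cos_target_path() {
  cos_dir=${1%/}/            # treat the COS path as a directory: one trailing slash
  hdfs_root=${2%/}           # drop a trailing slash from the HDFS root, if any
  rel=${3#"$hdfs_root"/}     # file path relative to the HDFS root
  printf '%s%s\n' "$cos_dir" "$rel"
}
```

Under that assumption, cos_target_path /hdfs/20170224/ /tmp/hive /tmp/hive/a/b.txt prints /hdfs/20170224/a/b.txt, and the result is the same whether or not the COS path is written with a trailing slash.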