Hadoop tool v4

    Feature Description

    The Hadoop cosn plugin lets you run higher-level computing tasks on a file system that uses Tencent Cloud Object Storage (COS) as its underlying storage. You can use Hadoop-based big data engines such as MapReduce, Hive, Spark, and Tez to process data stored in Tencent COS.

    Operating Environment

    System Environment

    Linux/Windows system

    Software Dependencies

    Hadoop-2.7.2 or later

    Installation and Configuration

    For more information on how to install and configure Hadoop, please see Install and Test Hadoop.

    How to Use

    Install Maven

    Obtain cos-java-sdk

    Download link: cos-java-sdk

    Run the following command in the download directory to build the SDK. The resulting cos_api-4.2.jar is placed in the target directory:

    mvn clean package -Dmaven.test.skip=true
    

    Obtain hadoop-cos Plugin

    Download link: hadoop-cos plugin

    Because cosn depends on the SDK, copy the cos_api-4.2.jar built in the previous step to src/main/resources, then run the following command to build the plugin. The resulting hadoop-cos-2.7.2.jar is placed in the target directory:

    mvn clean package -Dmaven.test.skip=true
    

    Plugin Installation

    Modify hadoop-env.sh

    Open hadoop-env.sh in the $HADOOP_HOME/etc/hadoop directory and add the following content, which appends the cosn-related jar packages to the Hadoop classpath (HADOOP_CLASSPATH):

    for f in $HADOOP_HOME/share/hadoop/tools/lib/*.jar; do
      if [ "$HADOOP_CLASSPATH" ]; then
        export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
      else
        export HADOOP_CLASSPATH=$f
      fi
    done
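
    As a quick check of what the loop above produces, the following sketch splits a colon-separated classpath and lists any cosn-related jars on it. The helper name list_cosn_jars is hypothetical (it is not part of Hadoop or the plugin), and it assumes the jar file names used in this document (cos_api-*.jar and hadoop-cos-*.jar).

```shell
# Sketch: list cosn-related jars found on a colon-separated classpath.
# list_cosn_jars is a hypothetical helper, not part of Hadoop or the plugin.
list_cosn_jars() {
  # $1: classpath string, for example "$HADOOP_CLASSPATH"
  printf '%s\n' "$1" | tr ':' '\n' |
    grep -E '(^|/)(cos_api|hadoop-cos)-[^/]*\.jar$'
}
```

    For example, run list_cosn_jars "$HADOOP_CLASSPATH" after the loop has been evaluated. Empty output (exit status 1) suggests that the jars have not yet been copied into tools/lib.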
    

    Copy the cos_api-4.2.jar and hadoop-cos-2.7.2.jar built above to $HADOOP_HOME/share/hadoop/tools/lib. Also copy json-20140107.jar (under /org/json/json/20140107 in the local Maven repository) and httpmime-4.2.5.jar (under /org/apache/httpcomponents/httpmime/4.2.5) to the same directory.
    By default, the local Maven repository is ${user.home}/.m2/repository; its location can be changed with the localRepository setting in ${MAVEN_HOME}/conf/settings.xml.
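
    The copy step can be scripted. The sketch below is one possible way to do it, not an official installer: copy_cosn_jars is a hypothetical helper, and it assumes you have collected the two built jars into a single directory (passed as the second argument) and that the Maven artifacts sit at the paths named above.

```shell
# Sketch: copy the four jars named above into Hadoop's tools/lib directory.
# copy_cosn_jars is a hypothetical helper; the "build directory" holding both
# built jars is an assumption (the two builds use separate target directories).
copy_cosn_jars() {
  # $1: Hadoop home, $2: directory with the built jars, $3: local Maven repository
  local lib_dir="$1/share/hadoop/tools/lib"
  local jar missing=0
  for jar in \
    "$2/cos_api-4.2.jar" \
    "$2/hadoop-cos-2.7.2.jar" \
    "$3/org/json/json/20140107/json-20140107.jar" \
    "$3/org/apache/httpcomponents/httpmime/4.2.5/httpmime-4.2.5.jar"
  do
    if [ -f "$jar" ]; then
      cp "$jar" "$lib_dir/"
    else
      echo "missing: $jar" >&2
      missing=1
    fi
  done
  # Non-zero status if any jar could not be found.
  return "$missing"
}
```

    A typical call would be copy_cosn_jars "$HADOOP_HOME" /path/to/built-jars "$HOME/.m2/repository", where /path/to/built-jars is a placeholder for your own directory.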

    Modify Configuration File to Use the Plugin

    Modify $HADOOP_HOME/etc/hadoop/core-site.xml to add the COS user information and the implementation class. For example:

    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/data/rabbitliu/work/hadoop/hadoop_test</value>
        </property>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
        <property> 
            <name>dfs.name.dir</name>           
            <value>/data/rabbitliu/work/hadoop/hadoop_test/name</value> 
        </property>
        <property> 
            <name>fs.cos.userinfo.appid</name>           
            <value>1252448703</value> 
        </property>
        <property> 
            <name>fs.cos.userinfo.secretId</name>           
            <value>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</value> 
        </property>
        <property> 
            <name>fs.cos.userinfo.secretKey</name>           
            <value>xxxxxxxxxxxxxxxxxxxxxxxx</value> 
        </property>
        <property>
            <name>fs.cosn.impl</name>
            <value>org.apache.hadoop.fs.cosnative.NativeCosFileSystem</value>
        </property>
        <property>
            <name>fs.cos.buffer.dir</name>
            <value>/data/rabbitliu/work/hadoop/hadoop_test/cos_buf</value>
        </property>
        <property>
            <name>fs.cos.userinfo.region</name>
            <value>tj</value>
        </property>
    </configuration>
    

    Note:
    The configuration file contains the following COS-related properties:

    • fs.cos.userinfo.appid: The APPID of your Tencent Cloud account, shown under Console -> Account Info.
    • fs.cos.userinfo.secretId/secretKey: The API key information of your account, shown in the Cloud API Key Console.
    • fs.cosn.impl: The implementation class of cosn. It is always org.apache.hadoop.fs.cosnative.NativeCosFileSystem.
    • fs.cos.buffer.dir: A directory that already exists. Temporary files generated at runtime are stored here.
    • fs.cos.userinfo.region: Your region. Valid values are the region abbreviations in the List of Regions for Historical Versions, such as sh, gz, and sgp.
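
    Before starting Hadoop, it can help to confirm that all of the properties listed above are present. The following is a minimal sketch under stated assumptions: check_cosn_config is a hypothetical helper, it only looks for the <name> entries on a single line each, and it does not validate the values.

```shell
# Sketch: report which COS properties are missing from a core-site.xml file.
# check_cosn_config is a hypothetical helper; it only checks that each
# <name> entry exists and does not validate the configured values.
check_cosn_config() {
  # $1: path to core-site.xml
  local conf="$1"
  local prop missing=0
  for prop in \
    fs.cos.userinfo.appid \
    fs.cos.userinfo.secretId \
    fs.cos.userinfo.secretKey \
    fs.cosn.impl \
    fs.cos.buffer.dir \
    fs.cos.userinfo.region
  do
    # -F treats the property name as a literal string, so dots are not wildcards.
    if ! grep -F -q "<name>$prop</name>" "$conf"; then
      echo "missing property: $prop" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

    For example: check_cosn_config "$HADOOP_HOME/etc/hadoop/core-site.xml". The function returns a non-zero status and prints the missing names if any property is absent.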

    Usage (Linux Example)

    Use Common hadoop fs Commands

    Command format: hadoop fs -ls cosn://<bucket>/<path>. The following example uses a bucket named "example"; you can append a specific path after the bucket name.

    hadoop fs -ls  cosn://example/
    Found 7 items
    -rw-rw-rw-   1 example example       3669 2016-10-25 21:23 cosn://example/b.txt
    drwxrwxrwx   - example example          0 1970-01-01 08:00 cosn://example/dir1
    drwxrwxrwx   - example example          0 1970-01-01 08:00 cosn://example/mr
    -rw-rw-rw-   1 example example      16952 2016-10-25 21:37 cosn://example/qcloud_sign.proto
    -rw-rw-rw-   1 example example       2048 2016-10-25 21:48 cosn://example/test2K.txt
    -rw-rw-rw-   1 example example   52428800 2016-10-27 16:40 cosn://example/test50MB.txt
    drwxrwxrwx   - example example          0 1970-01-01 08:00 cosn://example/xx1
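
    Other hadoop fs subcommands take cosn:// paths in the same way. The sketch below is illustrative only: cosn_uri and cosn_roundtrip are hypothetical helpers, and the tmp/cosn-demo path inside the bucket is an assumption chosen for the example. cosn_roundtrip requires hadoop on the PATH and a configured bucket.

```shell
# Sketch: build cosn:// URIs and wrap a few common hadoop fs commands.
# cosn_uri and cosn_roundtrip are hypothetical helpers.
cosn_uri() {
  # $1: bucket name, $2: optional path inside the bucket
  local path="${2:-}"
  # Strip one leading slash so "/dir1" and "dir1" give the same URI.
  printf 'cosn://%s/%s\n' "$1" "${path#/}"
}

cosn_roundtrip() {
  # $1: bucket name, $2: local file to upload. Requires hadoop on PATH.
  local dir file
  dir=$(cosn_uri "$1" "tmp/cosn-demo")   # demo path is an assumption
  file="$dir/$(basename "$2")"
  hadoop fs -mkdir -p "$dir" &&
    hadoop fs -put "$2" "$dir/" &&
    hadoop fs -cat "$file" &&
    hadoop fs -rm "$file"
}
```

    For example, cosn_uri example mr/input prints cosn://example/mr/input, and cosn_roundtrip example ./b.txt uploads, prints, and then removes a copy of a local file.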
    

    Run the WordCount Example Provided with MapReduce

    Note:
    The file name hadoop-mapreduce-examples-2.7.2 in the following command applies only to Hadoop 2.7.2. For other versions, change it to the matching version number.

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount cosn://example/mr/input cosn://example/mr/output3
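
    To avoid editing the version number by hand, the examples jar can be located with a glob. This is a minimal sketch, assuming the standard share/hadoop/mapreduce layout used in the command above; find_examples_jar is a hypothetical helper.

```shell
# Sketch: locate the MapReduce examples jar without hard-coding the version.
# find_examples_jar is a hypothetical helper.
find_examples_jar() {
  # $1: Hadoop installation directory, for example "$HADOOP_HOME"
  local jar
  for jar in "$1"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
    # The glob stays unexpanded when nothing matches, so test for a real file.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no examples jar found under $1" >&2
  return 1
}
```

    It could then be used as: hadoop jar "$(find_examples_jar "$HADOOP_HOME")" wordcount cosn://example/mr/input cosn://example/mr/output3. Note that MapReduce jobs normally fail if the output directory already exists, so choose a new output path for each run.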
    

    On success, the job prints statistics such as the following:

        File System Counters
            COSN: Number of bytes read=72
            COSN: Number of bytes written=40
            COSN: Number of read operations=0
            COSN: Number of large read operations=0
            COSN: Number of write operations=0
            FILE: Number of bytes read=547350
            FILE: Number of bytes written=1155616
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=0
            HDFS: Number of bytes written=0
            HDFS: Number of read operations=0
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=0
        Map-Reduce Framework
            Map input records=5
            Map output records=7
            Map output bytes=59
            Map output materialized bytes=70
            Input split bytes=99
            Combine input records=7
            Combine output records=6
            Reduce input groups=6
            Reduce shuffle bytes=70
            Reduce input records=6
            Reduce output records=6
            Spilled Records=12
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=0
            Total committed heap usage (bytes)=653262848
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters 
            Bytes Read=36
        File Output Format Counters 
            Bytes Written=40
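
    To confirm that a job really read from and wrote to COS, you can pull out just the COSN lines from the statistics above. The following is a small sketch: cosn_counters is a hypothetical helper that reads saved job output on standard input, and job.log in the usage note is a placeholder file name.

```shell
# Sketch: extract the COSN counters from saved job output.
# cosn_counters is a hypothetical helper; it reads the job output on stdin
# and prints each COSN counter as "name=value".
cosn_counters() {
  sed -n 's/^[[:space:]]*COSN: //p'
}
```

    For example, cosn_counters < job.log. Depending on the logging configuration, the counters may be written to standard error, in which case redirect it (2>&1) when saving the job output.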
    