COS launched the metadata acceleration feature to provide high-performance file system features. Metadata acceleration leverages of the powerful metadata management feature of Cloud HDFS (CHDFS) at the underlying layer to allow users to use file system semantics to access COS. The system design metrics can reach a bandwidth of up to 2.4 GB/s, over 100,000 queries per second (QPS), and a latency of milliseconds. Buckets with metadata acceleration enabled can be widely used in scenarios such as big data, high-performance computing, machine learning, and artificial intelligence. For more information about metadata acceleration, see Metadata Acceleration Overview.
In the past, big data access based on COS is mainly implemented based on the Hadoop-COS tool. The Hadoop-COS tool internally adapts HCFS APIs to COS RESTful APIs to access data on COS. The difference in metadata organization between COS and file systems leads to the difference in metadata operation performance, which affects big data analysis performance. Buckets with metadata acceleration enabled are fully compatible with the HCFS protocol and can be accessed directly by using native HDFS APIs, saving the overhead of converting the HDFS protocol to COS protocol, and providing native HDFS features such as efficient directory rename (as an atomic operation), file atime and mtime update, efficient directory DU statistics, and POSIX ACL permission support.
Create a COS bucket and enable metadata acceleration for it.
After the bucket is created, go to the File List page of the bucket. You can upload or download files in the console.
Choose Performance Configuration > Metadata Acceleration in the left sidebar, and you can see that metadata acceleration is enabled. If this is the first time you create a bucket that requires metadata acceleration, perform the corresponding authorization operations as instructed. After you click Authorize, HDFS access is enabled by default, and you can view the information of the default bucket mount point.
If the system indicates that the corresponding HDFS file system is not found, submit a ticket.
The HDFS permission configuration is different from the native COS permission system. If you use HDFS to access a COS bucket, you are advised to configure HDFS permission to authorize machines in a specified VPC to access the COS bucket to achieve the same permission experience as that of the native HDFS.
5. By default, the HDFS protocol uses the native POSIX ACL mode for authentication. If you want to use Ranger authentication, you can select the Ranger authentication mode under HDFS authentication mode and configure Ranger addresses.
For how to configure the Ranger service and use the Ranger service to access COS via HDFS, see here.
core-site.xmlfor the computing cluster. For more information, see here. If you are using Tencent Cloud EMR, you can use the default EMR configuration directly without extra configuration.
fs.ofs.bucket.regionparameter is required. It specifies the COS region where the bucket resides, for example,
classpathpath (example path:
/usr/local/service/hadoop/share/hadoop/common/lib/; the path varies depending on component) on each server in the Hadoop cluster. Then restart resident services such as Yarn, Hive, Presto, and Impala.
In a big data scenario, you can perform the following steps to access a bucket with metadata acceleration enabled using HDFS:
core-stie.xml, configure HDFS related mount point information as shown in the Preparations part.