Druid Usage

Last updated: 2025-01-03 15:02:25

EMR allows you to deploy an E-MapReduce Druid cluster as an independent cluster, based on the following considerations:
- Use case: E-MapReduce Druid can be used without Hadoop to adapt to different business use cases.
- Resource preemption: E-MapReduce Druid has high memory requirements, especially on the Broker and Historical nodes. Because the resource usage of E-MapReduce Druid is not scheduled by Hadoop YARN, resource preemption tends to occur when the two run in the same cluster.
- Cluster size: As an infrastructure, a Hadoop cluster is generally large, while an E-MapReduce Druid cluster is relatively small. When they are deployed in the same cluster, resources may be wasted because of this size difference. Separate deployment is therefore more flexible.
Purchase suggestions
To purchase a Druid cluster, select Druid as the cluster type when creating the EMR cluster. The Druid cluster has built-in Hadoop HDFS and YARN services integrated with Druid, which are recommended for testing only. We strongly recommend you use a dedicated Hadoop cluster in the production environment.
To disable the built-in Hadoop services for the Druid cluster, go to the EMR console, select the target service pane on the Cluster services page, and click Operation > Pause service to suspend the service.
Configuring connectivity between Hadoop and Druid clusters
This section describes how to configure the connectivity between the Hadoop and Druid clusters. If you use the built-in Hadoop cluster in the Druid cluster (which is not recommended for the production environment), they can be properly connected with no additional settings required, and you can skip this section.
If you want to store the index data in the HDFS of another independent Hadoop cluster (which is recommended for the production environment), you need to configure the connectivity between the two clusters in the following steps:
1. Make sure that the Druid and Hadoop clusters can properly communicate with each other.
The two clusters should be in the same VPC. If they are in different VPCs, the two VPCs should be able to communicate with each other (through CCN or Peering Connection, for example).
2. Copy the core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml files in /usr/local/service/hadoop/etc/hadoop in the Hadoop cluster and paste them in /usr/local/service/druid/conf/druid/_common on each node in the E-MapReduce Druid cluster.
Note:
As the Druid cluster has a built-in Hadoop cluster, the relevant soft links to the files above already exist in the Druid path. You need to delete them first before copying the configuration files of another Hadoop cluster. In addition, you need to make sure that the file permissions are correct so that the files can be accessed by the hadoop user.
3. Modify the common.runtime.properties configuration file in Druid configuration management, save the change, and restart the Druid cluster services.
druid.storage.type: Defaults to hdfs and does not need to be modified.
druid.storage.storageDirectory: Configure the full path, which can be found in the `fs.defaultFS` configuration item in the `core-site.xml` file of the target Hadoop cluster.
- If the target Hadoop cluster is non-HA: hdfs://{namenode_ip}:4007
- If the target Hadoop cluster is HA: hdfs://HDFSXXXXX
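Steps 2 and 3 above can be sketched in Python as follows. This is a minimal illustration, not an official tool: it assumes the four configuration files have already been fetched from the Hadoop cluster into a local directory, and the function names `install_hadoop_conf` and `read_default_fs` and the 0644 file mode are hypothetical choices made for this example.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

# The four files named in step 2.
HADOOP_CONF_FILES = (
    "core-site.xml",
    "hdfs-site.xml",
    "yarn-site.xml",
    "mapred-site.xml",
)


def install_hadoop_conf(src_dir, dst_dir):
    """Replace the built-in soft links in dst_dir with copies from src_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    for name in HADOOP_CONF_FILES:
        target = dst / name
        # Delete the existing soft link (or stale file) before copying.
        if target.is_symlink() or target.exists():
            target.unlink()
        shutil.copy2(src / name, target)
        # Assumed mode: readable by all users, so the hadoop user can read it.
        target.chmod(0o644)


def read_default_fs(core_site_path):
    """Return the fs.defaultFS value from a core-site.xml file, or None."""
    root = ET.parse(core_site_path).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "fs.defaultFS":
            return prop.findtext("value", "").strip()
    return None
```

On a Druid node, `install_hadoop_conf` would be called with the local copy of the Hadoop configuration as `src_dir` and /usr/local/service/druid/conf/druid/_common as `dst_dir`. The value returned by `read_default_fs` is the path to use when setting druid.storage.storageDirectory.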
Using COS
E-MapReduce Druid can use COS as the deep storage. This section describes how to configure COS as the deep storage of the Druid cluster.
First, you need to make sure that COS has been activated for both the Druid cluster and the target Hadoop cluster. You can activate COS when purchasing the clusters or configure COS in the EMR console after purchasing them.
1. Modify the common.runtime.properties configuration file in Druid configuration management:
druid.storage.type: hdfs
druid.storage.storageDirectory: cosn://{bucket_name}/druid/segments
You can create the segments directory on COS and set its permissions in advance.
2. Modify the core-site.xml configuration file in HDFS configuration management:
Set fs.cosn.impl to org.apache.hadoop.fs.CosFileSystem.
Add a new configuration item fs.AbstractFileSystem.cosn.impl and set it to org.apache.hadoop.fs.CosN.
3. Put the JAR packages related to hadoop-cos (such as cos_api-bundle-5.6.69.jar and hadoop-cos-2.8.5-8.1.6.jar) into the /usr/local/service/druid/extensions/druid-hdfs-storage, /usr/local/service/druid/hadoopdependencies/hadoop-client/2.8.5, and /usr/local/service/hadoop/share/hadoop/common/lib/ directories on each node of the cluster.
4. Save the configuration and restart the Druid cluster services.
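The settings from steps 1 and 2 can be collected in one place and rendered in the formats the two configuration files expect. The following is a minimal sketch: the helper names and the bucket name in the usage note are hypothetical, while the property names and values are the ones listed above.

```python
import xml.etree.ElementTree as ET

# core-site.xml entries from step 2.
COSN_CORE_SITE = {
    "fs.cosn.impl": "org.apache.hadoop.fs.CosFileSystem",
    "fs.AbstractFileSystem.cosn.impl": "org.apache.hadoop.fs.CosN",
}


def cos_storage_properties(bucket_name):
    """common.runtime.properties entries from step 1 for a given bucket."""
    return {
        "druid.storage.type": "hdfs",
        "druid.storage.storageDirectory": f"cosn://{bucket_name}/druid/segments",
    }


def render_properties(settings):
    """Render settings as key=value lines for a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


def render_core_site_properties(settings):
    """Render settings as <property> elements for core-site.xml."""
    chunks = []
    for name, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        chunks.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(chunks)
```

For example, `render_properties(cos_storage_properties("examplebucket-1250000000"))` produces the two lines for common.runtime.properties, where `examplebucket-1250000000` is a placeholder bucket name. In the EMR console these values are entered through configuration management rather than by editing the files directly.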
Modifying Druid parameters
After you create the E-MapReduce Druid cluster, a set of configuration items is generated automatically. However, we recommend you adjust the memory configuration as needed to achieve optimal performance. You can do so on the [Configurations](https://www.tencentcloud.com/document/product/1026/31109) page in the EMR console.
When modifying the configuration, make sure that the following constraint still holds:
MaxDirectMemorySize >= druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)
Suggested values:
druid.processing.numThreads = Number of cores - 1 (or 1)
druid.processing.numMergeBuffers = max(2, druid.processing.numThreads / 4)
druid.server.http.numThreads = max(10, (Number of cores * 17) / 16 + 2) + 30
For more information on the configuration, see Configuration reference.
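The constraint and the suggested values can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes the divisions in the formulas are integer divisions.

```python
def suggest_processing_settings(cores, buffer_size_bytes):
    """Compute the suggested thread settings and the minimum direct memory.

    cores: number of CPU cores on the node.
    buffer_size_bytes: the value of druid.processing.buffer.sizeBytes.
    """
    # Number of cores - 1, but never less than 1.
    num_threads = max(1, cores - 1)
    # Integer division is an assumption made for this sketch.
    num_merge_buffers = max(2, num_threads // 4)
    http_threads = max(10, (cores * 17) // 16 + 2) + 30
    # Lower bound for MaxDirectMemorySize from the constraint above.
    min_direct_memory = buffer_size_bytes * (num_merge_buffers + num_threads + 1)
    return {
        "druid.processing.numThreads": num_threads,
        "druid.processing.numMergeBuffers": num_merge_buffers,
        "druid.server.http.numThreads": http_threads,
        "min_MaxDirectMemorySize_bytes": min_direct_memory,
    }
```

For a 16-core node with a 512 MiB processing buffer, this gives 15 processing threads, 3 merge buffers, 49 HTTP threads, and a minimum MaxDirectMemorySize of 19 x 512 MiB = 9,728 MiB.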
Using a router as a query node
Currently, a Druid cluster deploys the Broker process on the EMR master node by default. Because many other processes also run on the master node, they may interfere with each other, which can lead to insufficient memory and reduce query efficiency. In addition, many businesses require that the query nodes and core nodes be deployed separately. In these cases, you can add one or more router nodes in the console and install the Broker process on them to scale out the query nodes of the Druid cluster.
Accessing the web
The Druid web console is available on port 18888 of the master node. You need to configure the public IP for the master node on your own. After opening port 18888 in the security group and setting the bandwidth, you can access the console at http://{masterIp}:18888.
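Before opening the address in a browser, you can confirm that port 18888 is reachable from your machine. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and a successful TCP connection only shows that the security group allows traffic on the port, not that the console itself is healthy.

```python
import socket


def console_url(master_ip, port=18888):
    """Build the web console address for a given master node public IP."""
    return f"http://{master_ip}:{port}"


def port_is_open(host, port=18888, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If `port_is_open` returns False for the master node's public IP, check the security group rule for port 18888 and the bandwidth setting first.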
