Elastic MapReduce

Software Configuration

Last updated: 2023-12-27 10:02:29

Feature

Software configuration enables you to customize configurations of components such as HDFS, YARN, and Hive when creating a cluster.

Custom Software Configuration

Software programs such as Hadoop and Hive have many configuration items. By using the software configuration feature, you can customize component parameters when creating a cluster. During the configuration, you need to provide the corresponding JSON files as required. You can customize the files or generate them by exporting software configuration parameters of an existing cluster for quick cluster creation. For more information on how to export software configuration parameters, please see Exporting Software Configuration.
Currently, only parameters in the following files can be customized:
HDFS: core-site.xml, hdfs-site.xml, hadoop-env.sh, log4j.properties
YARN: yarn-site.xml, mapred-site.xml, fair-scheduler.xml, capacity-scheduler.xml, yarn-env.sh, mapred-env.sh
Hive: hive-site.xml, hive-env.sh, hive-log4j2.properties
Sample JSON file and description:
[
  {
    "serviceName": "HDFS",
    "classification": "hdfs-site.xml",
    "serviceVersion": "2.8.4",
    "properties": {
      "dfs.blocksize": "67108864",
      "dfs.client.slow.io.warning.threshold.ms": "900000",
      "output.replace-datanode-on-failure": "false"
    }
  },
  {
    "serviceName": "YARN",
    "classification": "yarn-site.xml",
    "serviceVersion": "2.8.4",
    "properties": {
      "yarn.app.mapreduce.am.staging-dir": "/emr/hadoop-yarn/staging",
      "yarn.log-aggregation.retain-check-interval-seconds": "604800",
      "yarn.scheduler.minimum-allocation-vcores": "1"
    }
  },
  {
    "serviceName": "YARN",
    "classification": "capacity-scheduler.xml",
    "serviceVersion": "2.8.4",
    "properties": {
      "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>\n<configuration><property>\n <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>\n <value>0.8</value>\n</property>\n<property>\n <name>yarn.scheduler.capacity.maximum-applications</name>\n <value>1000</value>\n</property>\n<property>\n <name>yarn.scheduler.capacity.root.default.capacity</name>\n <value>100</value>\n</property>\n<property>\n <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>\n <value>100</value>\n</property>\n<property>\n <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>\n <value>1</value>\n</property>\n<property>\n <name>yarn.scheduler.capacity.root.queues</name>\n <value>default</value>\n</property>\n</configuration>"
    }
  }
]
Configuration parameter descriptions:
serviceName: component name, which must be in uppercase.
classification: filename, which must be a full name with file extension.
serviceVersion: component version, which must be the same as the corresponding component version in the EMR product version.
properties: parameters that need to be customized.
To modify configuration parameters in capacity-scheduler.xml or fair-scheduler.xml, set the key in properties to "content" and set its value to the content of the entire file.
To adjust the component configuration of an existing cluster, use the component parameter configuration feature instead.
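Before importing a software-configuration JSON file, it can be useful to check it against the rules above. The following Python sketch is a hypothetical pre-check, not part of EMR itself; the rule set it enforces (uppercase serviceName, classification with a file extension, a single "content" key for whole-file configs) is taken from the descriptions above, and EMR performs its own validation on import.

```python
import json

# Files that must be supplied as a single "content" key holding the whole file.
WHOLE_FILE_CLASSIFICATIONS = {"capacity-scheduler.xml", "fair-scheduler.xml"}
REQUIRED_KEYS = {"serviceName", "classification", "serviceVersion", "properties"}

def validate_software_config(text):
    """Return a list of problems found in a software-configuration JSON string."""
    problems = []
    entries = json.loads(text)  # raises ValueError if not valid JSON
    for i, entry in enumerate(entries):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
            continue
        # serviceName must be in uppercase, e.g. "HDFS", "YARN".
        if entry["serviceName"] != entry["serviceName"].upper():
            problems.append(f"entry {i}: serviceName must be uppercase")
        # classification must be a full filename with extension.
        if "." not in entry["classification"]:
            problems.append(f"entry {i}: classification must include a file extension")
        # Scheduler files take the entire file body under a single "content" key.
        if entry["classification"] in WHOLE_FILE_CLASSIFICATIONS and \
                list(entry["properties"].keys()) != ["content"]:
            problems.append(f"entry {i}: use a single 'content' key for whole-file configs")
    return problems

sample = '''
[
  {
    "serviceName": "hdfs",
    "classification": "hdfs-site.xml",
    "serviceVersion": "2.8.4",
    "properties": {"dfs.blocksize": "67108864"}
  }
]
'''
print(validate_software_config(sample))
# the lowercase serviceName "hdfs" is reported as a problem
```

Running such a check locally catches formatting mistakes before cluster creation fails on an invalid file.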

Accessing External Clusters

After configuring the HDFS access address information of an external cluster, you can read data in it.

Configuration during purchase

EMR allows you to configure access to an external cluster when you create an EMR cluster. On the purchase page, you only need to import a JSON file that meets the requirements in the software configuration section. The example below is based on the following assumption:
Assumption: the nameservice of the external cluster to be accessed is HDFS8088, and its access configuration is as follows:
<property>
  <name>dfs.ha.namenodes.HDFS8088</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.http-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4008</value>
</property>
<property>
  <name>dfs.namenode.https-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4009</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4007</value>
</property>
<property>
  <name>dfs.namenode.http-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4008</value>
</property>
<property>
  <name>dfs.namenode.https-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4009</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4007</value>
</property>
JSON file and description: using the assumption above as an example, import the following JSON file (its content requirements are the same as those for custom software configuration) in the software configuration box.
[
  {
    "serviceName": "HDFS",
    "classification": "hdfs-site.xml",
    "serviceVersion": "2.7.3",
    "properties": {
      "newNameServiceName": "newEmrCluster",
      "dfs.ha.namenodes.HDFS8088": "nn1,nn2",
      "dfs.namenode.http-address.HDFS8088.nn1": "172.21.16.11:4008",
      "dfs.namenode.https-address.HDFS8088.nn1": "172.21.16.11:4009",
      "dfs.namenode.rpc-address.HDFS8088.nn1": "172.21.16.11:4007",
      "dfs.namenode.http-address.HDFS8088.nn2": "172.21.16.40:4008",
      "dfs.namenode.https-address.HDFS8088.nn2": "172.21.16.40:4009",
      "dfs.namenode.rpc-address.HDFS8088.nn2": "172.21.16.40:4007"
    }
  }
]

Configuration parameter description

serviceName: component name, which must be "HDFS".
classification: filename, which must be "hdfs-site.xml".
serviceVersion: component version, which must be the same as the corresponding component version in the EMR product version.
properties: the content must be the same as that in the assumption.
newNameServiceName: the nameservice of the newly created cluster; this parameter is optional. If it is left empty, its value is generated by the system; if it is specified, its value can contain only letters, digits, and hyphens.
Access to external clusters is supported only for high-availability clusters, and only for clusters with Kerberos disabled.
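Because the properties follow a fixed naming pattern (dfs.namenode.{rpc,http,https}-address.<nameservice>.<nn>), the JSON file above can be generated rather than written by hand. The following Python sketch assembles it from the external cluster's nameservice and NameNode hosts; the port numbers (4007 RPC, 4008 HTTP, 4009 HTTPS) mirror the example and are assumptions, so substitute your own cluster's ports.

```python
import json

def external_hdfs_entry(nameservice, namenodes, service_version="2.7.3",
                        new_nameservice=None):
    """Build the hdfs-site.xml software-configuration entry for an external
    HA cluster. `namenodes` maps logical names (nn1, nn2, ...) to host IPs.
    The ports below follow the documentation example (assumed, not fixed)."""
    props = {}
    if new_nameservice:
        # Optional nameservice for the newly created cluster.
        props["newNameServiceName"] = new_nameservice
    props[f"dfs.ha.namenodes.{nameservice}"] = ",".join(namenodes)
    for nn, host in namenodes.items():
        props[f"dfs.namenode.http-address.{nameservice}.{nn}"] = f"{host}:4008"
        props[f"dfs.namenode.https-address.{nameservice}.{nn}"] = f"{host}:4009"
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn}"] = f"{host}:4007"
    return [{
        "serviceName": "HDFS",
        "classification": "hdfs-site.xml",
        "serviceVersion": service_version,
        "properties": props,
    }]

entry = external_hdfs_entry(
    "HDFS8088",
    {"nn1": "172.21.16.11", "nn2": "172.21.16.40"},
    new_nameservice="newEmrCluster",
)
print(json.dumps(entry, indent=2))
```

Generating the file this way keeps the per-NameNode addresses consistent and avoids the easy mistake of mixing up nn1 and nn2 hosts.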

Configuration after purchase

EMR allows you to use the configuration distribution feature to access external clusters after creating an EMR cluster.
Assumption: the nameservice of the EMR cluster is HDFS80238 (for a non-high-availability cluster, the nameservice is usually masterIp:rpcPort, such as 172.21.0.11:4007). The nameservice of the external cluster to be accessed is HDFS8088, and its access configuration is as follows:
<property>
  <name>dfs.ha.namenodes.HDFS8088</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.http-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4008</value>
</property>
<property>
  <name>dfs.namenode.https-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4009</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4007</value>
</property>
<property>
  <name>dfs.namenode.http-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4008</value>
</property>
<property>
  <name>dfs.namenode.https-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4009</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4007</value>
</property>

If the EMR cluster already contains the information above, you can view it on the configuration distribution management page, or log in to the server and view the /usr/local/service/hadoop/etc/hadoop/hdfs-site.xml file.
1. Enter the Configuration Distribution page and select the hdfs-site.xml file of the HDFS component.
2. Change the value of the configuration item dfs.nameservices to HDFS80238,HDFS8088.
3. Add configuration items and their values.
| Configuration Item | Value |
| --- | --- |
| dfs.ha.namenodes.HDFS8088 | nn1,nn2 |
| dfs.namenode.http-address.HDFS8088.nn1 | 172.21.16.11:4008 |
| dfs.namenode.https-address.HDFS8088.nn1 | 172.21.16.11:4009 |
| dfs.namenode.rpc-address.HDFS8088.nn1 | 172.21.16.11:4007 |
| dfs.namenode.http-address.HDFS8088.nn2 | 172.21.16.40:4008 |
| dfs.namenode.https-address.HDFS8088.nn2 | 172.21.16.40:4009 |
| dfs.namenode.rpc-address.HDFS8088.nn2 | 172.21.16.40:4007 |
| dfs.client.failover.proxy.provider.HDFS8088 | org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider |
| dfs.internal.nameservices | HDFS80238 |
dfs.internal.nameservices must be added; otherwise, when the cluster is scaled out, the new DataNodes may report errors and be marked as dead by the NameNode.
4. Distribute the configuration by using the Configuration Distribution feature.
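After distribution, it is worth confirming that hdfs-site.xml on the nodes actually contains every item from the table above. The following Python sketch is a hypothetical check (not an EMR tool): it parses a Hadoop-style configuration file and reports expected items that are absent or have the wrong value. The path and values mirror this example; adapt them to your cluster.

```python
import xml.etree.ElementTree as ET

# Expected items from the steps above (example values; adjust for your cluster).
EXPECTED = {
    "dfs.nameservices": "HDFS80238,HDFS8088",
    "dfs.internal.nameservices": "HDFS80238",
    "dfs.ha.namenodes.HDFS8088": "nn1,nn2",
    "dfs.namenode.rpc-address.HDFS8088.nn1": "172.21.16.11:4007",
    "dfs.namenode.rpc-address.HDFS8088.nn2": "172.21.16.40:4007",
}

def missing_items(hdfs_site_path):
    """Return the expected items that are absent from, or wrong in, the
    given hdfs-site.xml (e.g. /usr/local/service/hadoop/etc/hadoop/hdfs-site.xml)."""
    root = ET.parse(hdfs_site_path).getroot()
    # Hadoop config files are <configuration><property><name/><value/>... .
    found = {p.findtext("name"): p.findtext("value")
             for p in root.iter("property")}
    return {k: v for k, v in EXPECTED.items() if found.get(k) != v}
```

An empty result means all the listed items were distributed; any returned entries name the configuration items that still need attention.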
For more information on relevant configurations and principles, please see the community documentation.
