This document describes how to create an EMR cluster in the EMR console.
Log in to the EMR console and click Create Cluster on the cluster list page.
Region: A region is the physical location of an IDC. Currently supported regions include Guangzhou, Shanghai, Beijing, Singapore, Silicon Valley, Chengdu, Nanjing, and Mumbai. Tencent Cloud products in different regions cannot communicate with each other over a private network.
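Because products in different regions cannot reach each other over a private network, it can help to check before creating the cluster that everything it depends on sits in the same region. The sketch below is only an illustration: the function name and the resource names are hypothetical, and the region of each resource is assumed to be known already.

```python
def cross_region_resources(cluster_region, resource_regions):
    """Return the names of resources that are NOT in the cluster's region.

    cluster_region:   region chosen for the EMR cluster, e.g. "Guangzhou"
    resource_regions: dict mapping a resource name (hypothetical labels)
                      to the region it lives in
    Resources returned here cannot be reached over a private network.
    """
    return sorted(
        name
        for name, region in resource_regions.items()
        if region != cluster_region
    )


if __name__ == "__main__":
    # Hypothetical resources the cluster would need to reach.
    resources = {"metadata-db": "Guangzhou", "data-store": "Shanghai"}
    for name in cross_region_resources("Guangzhou", resources):
        print(f"{name} is in another region; no private network access")
```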
Use Cases: Hadoop clusters support five use cases, namely, the default use case, ZooKeeper, HBase, Presto, and Kudu. You can choose one to deploy as needed.
Product Version and Components to Deploy: EMR recommends some commonly used combinations of components for Hadoop. You can also combine the components based on your needs.
Kerberos Secure Cluster: Specifies whether to enable Kerberos authentication for the cluster. This feature is not required for individual users and is disabled by default.
Software Configuration: You can create a cluster by entering custom parameters as required. The external cluster access feature is provided as well, so that you can read/write external cluster data after configuring the correct address information in relevant parameters.
Billing Mode: Pay-as-you-go.
AZ: Different AZs in the same region support different models and specifications. Tencent Cloud products in different regions cannot communicate with each other over a private network. The AZ cannot be changed after purchase. We recommend selecting the latest AZ closest to the region where your business data resides, to reduce access latency and increase download speed.
Cluster Network: To ensure the security of the EMR cluster, all nodes of the cluster are placed in a VPC; therefore, you need to set up a VPC before you can successfully create the EMR cluster.
Remote Login: Port 22 is usually used for remote login and opened on the newly created security group by default. You can close it based on your business needs.
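One way to confirm whether port 22 is actually reachable on a node is a plain TCP connection test from your own machine. This is a minimal sketch using only the Python standard library. The function name is hypothetical and the IP address in the usage example is a documentation placeholder, not a real node address.

```python
import socket


def is_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A False result means the connection was refused, timed out, or failed
    for another network reason (for example, the security group blocks it).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    node_ip = "203.0.113.10"  # placeholder address; replace with your node's IP
    state = "open" if is_port_open(node_ip) else "closed or filtered"
    print(f"Port 22 on {node_ip} is {state}")
```

Note that this only checks TCP reachability. It does not verify that the login credentials are valid.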
High Availability: High availability (HA) is enabled by default. The number of nodes deployed for each node type varies by cluster type, use case, and whether HA is enabled. For more information, see [Cluster Types].
Node Configuration: EMR offers multiple node types. You can select an appropriate model configuration for each node type based on your business needs.
Currently, up to 15 cloud disks of multiple types can be mounted to a core, task, or router node (each disk type can be selected only once).
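The disk limits above can be checked before submitting a node configuration. The sketch below assumes one reading of the rule: each disk type appears as a single entry with a count, and the counts together must not exceed 15. The function name and the disk type labels are hypothetical and do not come from the EMR console.

```python
from collections import Counter

MAX_CLOUD_DISKS = 15  # per-node limit stated in this document


def validate_cloud_disks(disks):
    """Check a node's cloud disk selection.

    disks: list of (disk_type, count) pairs, one pair per selection.
    Returns a list of problem descriptions; an empty list means the
    selection satisfies both rules.
    """
    problems = []

    # Rule 1: each disk type may be selected only once.
    selections = Counter(disk_type for disk_type, _ in disks)
    for disk_type, times in selections.items():
        if times > 1:
            problems.append(f"disk type {disk_type!r} selected {times} times")

    # Rule 2: at most 15 cloud disks in total.
    total = sum(count for _, count in disks)
    if total > MAX_CLOUD_DISKS:
        problems.append(f"{total} disks exceeds the limit of {MAX_CLOUD_DISKS}")

    return problems


if __name__ == "__main__":
    # Placeholder type labels, used only for illustration.
    print(validate_cloud_disks([("type-a", 10), ("type-b", 6)]))
```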
Placement Group: It is a policy for distributing and placing CVM instances on the underlying hardware. For more information, see [Placement Group].
Hive Metadatabase: When you choose to deploy the Hive component, there are two storage methods for Hive metadata: you can store the metadata in a MetaDB instance separately purchased for the cluster (default method) or associate the metadata with EMR-MetaDB or a self-built MySQL database. In the latter case, metadata will be stored in the associated database and will not be deleted when the cluster is terminated.
Project: Assign the current cluster to a project. To assign an instance to a new project, create a project first. For detailed directions, see [Creating Project].
Cluster Name: EMR clusters are differentiated by cluster name.
Login Method: Currently, EMR provides two ways to log in to cluster services, nodes, and MetaDB: custom password and associated key. SSH keys are only used for logging in to the EMR-UI quick entry. The default username is "root", and the username for the WebUI quick entry of the Superset component is "admin".
Add Bootstrap Action: A bootstrap action is a custom script executed when a cluster is created to help you modify the cluster environment, install third-party software, and use your own data.
Tag: You can add tags to clusters or node resources during resource creation to facilitate resource management. Up to 5 tags can be bound, and each tag key must be unique.
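The two tag rules (at most 5 tags, unique keys) are easy to check in a script before creating resources. This is a minimal sketch with a hypothetical function name. It models tags as plain key-value pairs and does not call any Tencent Cloud API.

```python
MAX_TAGS = 5  # limit stated in this document


def validate_tags(tags):
    """Check a list of (key, value) tag pairs.

    Returns a list of problem descriptions; an empty list means the
    tags satisfy both rules.
    """
    problems = []

    if len(tags) > MAX_TAGS:
        problems.append(f"{len(tags)} tags exceeds the limit of {MAX_TAGS}")

    seen = set()
    for key, _ in tags:
        if key in seen:
            problems.append(f"duplicate tag key: {key!r}")
        seen.add(key)

    return problems


if __name__ == "__main__":
    # Example tag names are placeholders.
    print(validate_tags([("team", "data"), ("env", "test"), ("team", "ops")]))
```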
Auto-Renewal: Starting seven days before the cluster expires, the system checks every day whether your account balance is sufficient to renew the cluster resources that have auto-renewal enabled.
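To see when the daily balance checks would begin for a given expiration date, the seven-day window can be computed directly. The sketch below is an illustration only: the function name is hypothetical, and whether the expiration day itself is included in the window is an assumption (it is excluded here).

```python
from datetime import date, timedelta

RENEWAL_CHECK_DAYS = 7  # window length stated in this document


def in_renewal_check_window(today, expiry):
    """Return True if `today` falls within the seven days before `expiry`.

    Assumption: the window starts exactly seven days before the expiry
    date and ends the day before it.
    """
    window_start = expiry - timedelta(days=RENEWAL_CHECK_DAYS)
    return window_start <= today < expiry


if __name__ == "__main__":
    expiry = date(2024, 3, 15)  # example date, not from the document
    first_check = expiry - timedelta(days=RENEWAL_CHECK_DAYS)
    print(f"Balance checks would start on {first_check.isoformat()}")
```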
After completing the configurations above, click Purchase to make the payment. Your EMR cluster will be created automatically once the payment is received. After about ten minutes, the new cluster will appear in the EMR console.
You can view the information of each node in the CVM console. To ensure your EMR cluster works properly, do not modify such information.
After the cluster is successfully created, you can log in to it to perform further configuration and other operations. For specific operations, see the following documents: