Elastic MapReduce

Elastic and open-source cloud-based Hadoop service

Overview

Combining cloud computing and community open-source technologies such as Hadoop, Hive, Spark, HBase, Presto, and Storm, Tencent Cloud Elastic MapReduce (EMR) provides secure and cost-effective cloud-based Hadoop services featuring high reliability and elastic scalability. Create a secure and reliable Hadoop cluster in a matter of minutes to analyze petabytes of data stored on the data nodes in the cluster or in COS.

Benefits

Flexibility

EMR allows you to launch a secure and reliable dedicated Hadoop cluster in minutes, either in the web-based console or via APIs. You can mix and match suitable editions of big data components such as Hive, Spark, HBase, and Presto to cater to your different business departments. Monitoring, alarm configuration, and operations tasks for nodes, components, and processes can all be performed in the console.

Elasticity

EMR enables you to scale managed Hadoop clusters manually or automatically according to your business curve or monitoring metrics. Because storage and computation are separated, you can even terminate a cluster when it is not needed to maximize resource efficiency.
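To make metric-driven scaling concrete, here is a minimal sketch of a threshold rule. It is not the EMR auto-scaling implementation or API; `ScalingRule`, `desired_node_count`, and every threshold value are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalingRule:
    """Hypothetical threshold rule; not an EMR API object."""
    scale_out_above: float  # metric value above which nodes are added
    scale_in_below: float   # metric value below which nodes are removed
    step: int               # nodes added or removed per decision
    min_nodes: int          # lower bound on cluster size
    max_nodes: int          # upper bound on cluster size


def desired_node_count(current_nodes: int, metric: float, rule: ScalingRule) -> int:
    """Return the node count the rule would target for one metric sample."""
    if metric > rule.scale_out_above:
        target = current_nodes + rule.step
    elif metric < rule.scale_in_below:
        target = current_nodes - rule.step
    else:
        target = current_nodes
    # Clamp so the cluster never leaves its configured bounds.
    return max(rule.min_nodes, min(rule.max_nodes, target))


# Assumed example values: the metric is a utilization ratio between 0 and 1.
rule = ScalingRule(scale_out_above=0.8, scale_in_below=0.3,
                   step=2, min_nodes=3, max_nodes=10)
print(desired_node_count(4, 0.9, rule))    # high load: scale out
print(desired_node_count(4, 0.1, rule))    # low load: scale in, clamped to min
print(desired_node_count(10, 0.95, rule))  # already at max: unchanged
```

A real rule would typically also apply a cooldown period between decisions so that a single noisy sample does not trigger repeated resizing.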

Reliability

EMR supports hot failover for CBS-based nodes. Its master/slave disaster recovery design starts the slave node within seconds when the master node fails, ensuring high availability of big data services. The metadata of components such as Hive supports remote disaster recovery. Computation-storage separation ensures high durability for data stored in COS. EMR also provides a comprehensive monitoring system that helps you quickly identify and locate cluster exceptions, keeping cluster operations stable.

Security

VPCs provide a convenient network isolation method that simplifies network policy planning for managed Hadoop clusters. You can create network ACLs and security groups to filter traffic at the subnet and host levels and meet your network security needs. Tencent Cloud's security reinforcement service provides an integrated security solution for EMR clusters, covering network protection, intrusion detection, and vulnerability protection.

Features

Quick Deployment

It takes just three steps to launch a dedicated big data cluster in the EMR console. EMR provides a wide range of open-source big data components that can be mixed and matched as needed during cluster creation, including but not limited to Hive, Spark, HBase, Presto, Flink, and Storm. In addition, you can retain creation parameters and reuse them through the APIs to recreate and terminate clusters as often as you want.

EMR supports cluster deployment on a variety of machine models, allowing you to configure the type and amount of CPU, memory, and storage capacity in a cluster for different business scenarios.
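One way to picture retaining creation parameters for repeated use is to keep them as a JSON document. The sketch below only shows that round trip; every field name in `creation_params` is hypothetical and does not reflect the actual EMR API request schema, which should be taken from the official API documentation.

```python
import json

# Hypothetical creation parameters. The keys are illustrative only and
# are NOT the real EMR API field names.
creation_params = {
    "cluster_name": "nightly-etl",
    "components": ["Hive", "Spark", "Presto"],
    "core_nodes": {"count": 3, "cpu_cores": 8, "memory_gb": 32, "disk_gb": 500},
}


def save_params(params: dict) -> str:
    """Serialize parameters to JSON with sorted keys for stable diffs."""
    return json.dumps(params, indent=2, sort_keys=True)


def load_params(text: str) -> dict:
    """Parse previously saved parameters back into a dictionary."""
    return json.loads(text)


saved = save_params(creation_params)
restored = load_params(saved)
print(restored == creation_params)
```

Keeping the saved document under version control makes it easy to recreate an identical cluster for a scheduled job and terminate it afterwards.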

Scenarios

Offline Data Analytics

After syncing massive amounts of logs from business servers for games, web applications, and mobile apps to EMR nodes or COS, you can use tools such as Hue to run Hive, Spark, Presto, and other mainstream computing frameworks and gain data insights promptly.

HBase

HBase is a highly scalable, column-oriented distributed big data storage system. EMR supports native HBase components, allowing you to create and use managed HBase clusters quickly and conveniently. Phoenix provides low-latency SQL access to HBase databases.
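As an illustration of SQL access through Phoenix, the sketch below uses the third-party `phoenixdb` Python driver, which talks to a Phoenix Query Server. It assumes that a Query Server is reachable from the client; the `events` table, its columns, and the URL passed by the caller are hypothetical. Note that Phoenix uses `UPSERT` rather than `INSERT`.

```python
# Hypothetical table and statements, used only for illustration.
CREATE_SQL = (
    "CREATE TABLE IF NOT EXISTS events "
    "(id BIGINT PRIMARY KEY, source VARCHAR, payload VARCHAR)"
)
UPSERT_SQL = "UPSERT INTO events (id, source, payload) VALUES (?, ?, ?)"
SELECT_SQL = "SELECT id, payload FROM events WHERE source = ?"


def query_events(query_server_url: str, source: str):
    """Write one row and read it back through a Phoenix Query Server.

    Assumes the phoenixdb package is installed and that query_server_url
    points at a running Phoenix Query Server.
    """
    import phoenixdb  # third-party driver, imported lazily

    conn = phoenixdb.connect(query_server_url, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(CREATE_SQL)
        cursor.execute(UPSERT_SQL, (1, source, "hello"))
        cursor.execute(SELECT_SQL, (source,))
        return cursor.fetchall()
    finally:
        conn.close()
```

The function is defined but not called here because it needs a live cluster; the endpoint to pass in depends on how Phoenix is deployed in your environment.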

Streaming Data Processing

Data generated in real time on business servers is pushed to messaging middleware through APIs and SDKs in your programs or tools. You can then analyze it in EMR with an appropriate stream processing engine to implement real-time computation and decision-making.

COS Data Analytics

With complete storage-computation separation, EMR can quickly analyze massive amounts of data stored in COS. This lets you take full advantage of the data synchronization tools provided by COS. In addition, multiple Hadoop cluster versions can analyze the same data, which keeps data consistent and resolves legacy issues caused by the coexistence of multi-version Hadoop clusters.
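A rough sketch of analyzing COS data from Spark follows. It assumes the cluster has the Hadoop-COS connector configured so that `cosn://` paths resolve, and that PySpark is available. The bucket name, the key prefix, and both helper functions are hypothetical.

```python
def cos_uri(bucket: str, key_prefix: str) -> str:
    """Build a cosn:// URI of the form used by the Hadoop-COS connector."""
    return f"cosn://{bucket}/{key_prefix.lstrip('/')}"


def count_log_lines(bucket: str, key_prefix: str) -> int:
    """Count text lines under a COS prefix with Spark.

    Assumes PySpark is installed and the cluster can read cosn:// paths.
    """
    from pyspark.sql import SparkSession  # imported lazily

    spark = SparkSession.builder.appName("cos-line-count").getOrCreate()
    try:
        # Each line of the input files becomes one row of the DataFrame.
        return spark.read.text(cos_uri(bucket, key_prefix)).count()
    finally:
        spark.stop()


# Placeholder bucket name and prefix, for illustration only.
print(cos_uri("examplebucket-1250000000", "/logs/2024/"))
```

Because the data stays in COS, the same path can be read by a different cluster, or by a cluster created later, without copying the data first.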