Hive Best Practices
Last updated: 2019-07-26 17:48:35
Hive on Tencent Cloud Elastic MapReduce (EMR) currently supports three execution engines: MapReduce (MR), TEZ, and Spark.
Select TEZ, if needed, when you purchase an EMR cluster. TEZ is generally the recommended execution engine because of its higher computing efficiency.
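The engine can also be checked and switched per session from the Hive CLI or Beeline. The following is a minimal sketch, assuming the TEZ component was selected when the cluster was purchased. `hive.execution.engine` is the standard Hive property; the table name `demo_logs` is hypothetical.

```sql
-- Print the engine currently in effect for this session.
SET hive.execution.engine;

-- Switch this session to TEZ (assumes TEZ is installed on the cluster).
SET hive.execution.engine=tez;

-- Queries issued afterwards in this session run on TEZ.
-- demo_logs is a hypothetical table used only for illustration.
SELECT COUNT(*) FROM demo_logs;
```

A session-level `SET` affects only the current connection, so it is a low-risk way to compare engines on the same query before changing any cluster-wide default.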
Currently, Tencent Cloud supports the following storage media: local data disk, HDD cloud disk, SSD cloud disk, and COS (most cost-efficient).
Tencent Cloud supports multiple compression algorithms, such as Snappy and LZO. We recommend storing your files in ORC or Parquet format for better space utilization and computing efficiency.
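As an illustration of the format recommendation above, the sketch below declares one table in ORC and one in Parquet, both with Snappy compression. The table and column names are hypothetical; `orc.compress` and `parquet.compression` are the standard Hive table properties for these formats.

```sql
-- Hypothetical table stored as ORC with Snappy compression.
CREATE TABLE page_views_orc (
  user_id BIGINT,
  url     STRING,
  view_ts TIMESTAMP
)
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY");

-- The same hypothetical layout stored as Parquet with Snappy compression.
CREATE TABLE page_views_parquet (
  user_id BIGINT,
  url     STRING,
  view_ts TIMESTAMP
)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression" = "SNAPPY");
```

Both formats are columnar, so queries that read only a few columns scan less data than they would against plain text files.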
How to choose the best query engine?
Currently, EMR supports the SQL-based query engines Presto, SparkSQL, and Hive. Presto can query a variety of data sources, while SparkSQL suits applications that require low latency. For general-purpose data queries, use Hive with TEZ.
If you use COS as the underlying storage, we recommend using external tables to prevent data from being accidentally deleted. If your data is stored in HDFS, we recommend enabling the HDFS Recycle Bin so that deleted data can be recovered.
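The external-table recommendation can be sketched as follows. The bucket name, path, table name, and columns are all hypothetical, and the `cosn://` URI scheme is an assumption based on the Hadoop-COS connector; check the location format that your cluster actually uses.

```sql
-- Hypothetical bucket and path; replace with your own COS location.
-- The cosn:// scheme is assumed here, not taken from this document.
CREATE EXTERNAL TABLE access_logs (
  ip      STRING,
  request STRING,
  status  INT
)
STORED AS ORC
LOCATION 'cosn://examplebucket-1250000000/warehouse/access_logs/';

-- Dropping an external table removes only the metastore entry;
-- the files under LOCATION are left in place.
DROP TABLE access_logs;
```

For data in HDFS, the Recycle Bin (trash) is governed by the standard Hadoop property `fs.trash.interval` in `core-site.xml`, which sets the retention time in minutes; a value of 0 disables it. How that property is exposed in the EMR console is not covered here.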