Big data instances are designed specifically for scenarios such as Hadoop distributed computing, massive log processing, distributed file systems, and large data warehouses. This CVM instance type is mainly used to address the cloud computing and storage needs of massive business data.
Big data instances are suitable for customers in the Internet, gaming, finance, and other industries that require big data computing and storage analysis, as well as for business scenarios involving massive data storage and offline computing. They meet the storage capacity and private network bandwidth requirements of distributed computing workloads such as Hadoop.
In addition, because distributed computing services such as Hadoop provide a highly available architecture, big data instances can use local storage, keeping the total cost close to that of a self-built Hadoop cluster in an IDC while still delivering massive storage capacity and high performance.
Note: For more information on instance specifications, see the "Big Data Family" section in Instance Types.
Big data instances use local disks as data disks, which may lose data (for example, when the host crashes). If your application cannot guarantee data reliability on its own, we recommend choosing an instance type that can use cloud disks as data disks.
The table below shows the local disk data status after different operations are performed on an instance with local disks.
| Operation | Local Disk Data Status | Description |
|---|---|---|
| Log in to an instance to restart it, restart an instance on the console, or forcibly restart an instance | Retained | The local disk storage and its data are retained. |
| Log in to an instance to shut it down, shut down an instance on the console, or forcibly shut down an instance | Retained | The local disk storage and its data are retained. |
| Terminate the instance on the console | Erased | The local disk storage is erased. No data is retained. |
Note: Do not store business data that needs to be retained long term on a local disk. Back up data in advance and adopt a highly available architecture. We recommend storing data that requires long-term retention on a CBS disk.
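As a minimal illustration of this recommendation, the Python sketch below copies a directory from the local data disk to a mounted CBS disk for retention. The mount points are assumptions for illustration only; replace them with your actual paths.

```python
# Minimal sketch: copy data that must be retained long term from the local
# data disk to a mounted CBS disk. Both paths below are hypothetical.
import shutil
from pathlib import Path

LOCAL_DISK_DIR = Path("/data0/reports")      # hypothetical local-disk directory
CBS_DISK_DIR = Path("/cbs/backup/reports")   # hypothetical CBS mount point


def back_up_to_cbs(src: Path, dst: Path) -> None:
    """Copy the whole directory tree to the CBS disk, overwriting existing files."""
    dst.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dst, dirs_exist_ok=True)  # requires Python 3.8+


if __name__ == "__main__":
    back_up_to_cbs(LOCAL_DISK_DIR, CBS_DISK_DIR)
    print(f"Backed up {LOCAL_DISK_DIR} to {CBS_DISK_DIR}")
```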
Local disks can only be purchased when a big data instance is created. The instance specification determines the number and capacity of local disks you can purchase.
No.
No.
Big data instances feature massive data storage and use local HDDs as data disks. To prevent data loss (for example, when the host crashes or a local disk is damaged), we recommend using a redundancy policy, such as a file system that supports redundancy and fault tolerance (for example, HDFS or MapR-FS). We also recommend regularly backing up data to a persistent storage system, such as Tencent Cloud COS. For more information, see Cloud Object Storage.
If a local disk is damaged, you will need to shut down the CVM instance so that we can replace the disk. If the CVM instance crashes, we will notify you and repair it.
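The sketch below illustrates both recommendations, assuming a Hadoop client and the COS Python SDK (cos-python-sdk-v5) are installed on the instance. The paths, replication factor, bucket name, region, and credentials are placeholders, not values from this page.

```python
# Minimal sketch: keep HDFS data redundant across local disks, then push an
# off-cluster backup copy to COS. All names below are illustrative assumptions.
import subprocess

from qcloud_cos import CosConfig, CosS3Client


def ensure_hdfs_replication(path: str, factor: int = 3) -> None:
    """Set the HDFS replication factor for a path so that a single
    local-disk failure does not lose data."""
    subprocess.run(
        ["hdfs", "dfs", "-setrep", "-w", str(factor), path],
        check=True,
    )


def back_up_to_cos(local_file: str, bucket: str, key: str, region: str,
                   secret_id: str, secret_key: str) -> None:
    """Upload one local file to a COS bucket as a backup copy."""
    client = CosS3Client(CosConfig(Region=region, SecretId=secret_id,
                                   SecretKey=secret_key))
    client.upload_file(Bucket=bucket, LocalFilePath=local_file, Key=key)


if __name__ == "__main__":
    ensure_hdfs_replication("/warehouse/orders", factor=3)      # hypothetical HDFS path
    back_up_to_cos("/data0/export/orders.parquet",              # hypothetical local file
                   bucket="example-bucket-1250000000",
                   key="backup/orders.parquet",
                   region="ap-guangzhou",
                   secret_id="YOUR_SECRET_ID",
                   secret_key="YOUR_SECRET_KEY")
```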
High IO I2 instances feature ultra-high IOPS and are designed for business scenarios with low latency and heavy random IO, such as high-performance databases (relational databases, NoSQL, etc.). Big data instances are designed for business scenarios that require high sequential read/write throughput and low-cost storage of massive data, offering high storage cost-efficiency and high private network bandwidth.
Big data D2 instances provide local disks with the following sequential read/write throughput.
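If you want to verify sequential throughput on your own instance, the minimal sketch below runs fio (assumed to be installed) from Python. The test file path and size are illustrative; the test file consumes space on the local disk.

```python
# Minimal sketch: measure sequential read/write throughput of a local data
# disk with fio. The file path and test size below are assumptions.
import subprocess

TEST_FILE = "/data0/fio-seq.test"   # hypothetical path on the local data disk


def run_fio(mode: str) -> None:
    """Run a sequential test ("write" or "read") with 1 MiB blocks."""
    subprocess.run(
        [
            "fio",
            f"--name=seq-{mode}",
            f"--filename={TEST_FILE}",
            f"--rw={mode}",          # sequential write or read
            "--bs=1M",
            "--size=4G",
            "--direct=1",            # bypass the page cache
            "--ioengine=libaio",
            "--group_reporting",
        ],
        check=True,
    )


if __name__ == "__main__":
    run_fio("write")   # lay down the test file and measure sequential write
    run_fio("read")    # re-read the same file to measure sequential read
```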
Cloud Block Storage (CBS) is a highly available, highly reliable, low-cost, and customizable block storage device. It can be attached to a CVM instance as an independent, scalable disk, providing efficient and reliable storage. It stores data at the block level and uses a three-copy distributed mechanism to ensure data reliability for CVM instances, meeting the requirements of different use cases. The local disks of big data instances, by contrast, are designed for business scenarios that require high sequential read/write performance on massive local data sets, such as Hadoop distributed computing, large-scale parallel computing, and data warehousing.