In a typical Spark-HDFS offline data analytics framework, RDD reads/writes and shuffle writes to disk are all sequential I/O; only shuffle reads are random I/O. Sequential I/O accounts for 95% of the I/O. CBS delivers excellent multi-threaded concurrent throughput, enabling efficient offline data processing at the terabyte and petabyte scale for Hadoop MapReduce, HDFS, and Spark.
With multi-disk concurrency, a single HDFS cluster can achieve a throughput of up to 1 GB/s.
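As a rough illustration of what a 1 GB/s cluster throughput means for offline jobs, the following Python sketch estimates the minimum time for one full sequential pass over a dataset. The function name `scan_seconds`, the use of decimal units, and the 1.5 TB example size (the data volume used in the test environment described in this section) are assumptions made for illustration. Real jobs rarely sustain peak throughput, so treat the result as a lower bound.

```python
def scan_seconds(data_bytes: float, throughput_bytes_per_s: float) -> float:
    """Lower bound on the time to read data_bytes once at a sustained throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return data_bytes / throughput_bytes_per_s


# Assumption: decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes).
TB = 10**12
GB = 10**9

seconds = scan_seconds(1.5 * TB, 1 * GB)
print(f"{seconds:.0f} s (~{seconds / 60:.0f} min) for one full sequential pass")
```

Under these assumptions, a single pass over 1.5 TB takes at least 1,500 seconds, or about 25 minutes. Multi-stage jobs that read and write the data several times scale accordingly.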
CBS supports big data applications such as data analytics, data mining, and business intelligence for companies like Xiaohongshu, Giant Interactive Group, Ele, Yoho!BUY, and wepiao.com.
Deployment environment: 5 CVM servers (each with 12-Core 40 GB RAM), simulating offline analysis of a 1.5 TB data volume.
Test performance:
SSD is ideal for scenarios with high requirements for I/O performance and data reliability. It is particularly suitable for medium and large relational database applications like PostgreSQL, MySQL, Oracle, and SQL Server; for I/O-intensive core business systems with high data reliability requirements; and for medium and large development and testing environments with high data reliability requirements.
SSD offers both data reliability and high performance. It consistently provides reliable support for companies such as Heroes Evolved, Wendao, Yoho!BUY, wepiao.com, and Xiaohongshu.
Deployment environment: 4 CVM servers (with 4-Core 8 GB RAM). Each has one 800 GB SSD cloud disk mounted, with MySQL version 5.5.42 deployed.
Test performance: an OLTP workload was simulated using sysbench, with a test set of 10 million records. In this test, TPS and QPS reached 1,616 and 29,000 respectively, meaning a single disk is sufficient to support 10,000 concurrent transactions per second.
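To relate the two reported figures, the short Python sketch below derives the average number of queries issued per transaction from the QPS and TPS values above. The function name `queries_per_transaction` is hypothetical, and the calculation assumes both figures were measured over the same run and the same scope (per disk or per cluster), which the test description does not state.

```python
def queries_per_transaction(qps: float, tps: float) -> float:
    """Average number of queries issued per committed transaction."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return qps / tps


# Figures reported in the sysbench test above.
ratio = queries_per_transaction(29_000, 1_616)
print(f"about {ratio:.1f} queries per transaction")
```

A ratio in the high teens indicates that each simulated transaction bundles many individual statements, which is why QPS is much higher than TPS in this test.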