An easy-to-use, fully managed, high-performance and highly elastic distributed cloud data warehousing suite at petabyte scale
Tencent Sparkling Data Warehouse Suite provides a fully managed, easy-to-use and high-performance petabyte-scale cloud data warehousing solution. Built on the Apache Spark framework, Sparkling lets you create an enterprise-grade distributed cloud data warehouse with thousands of nodes in a few minutes and scale it flexibly as needed. Sparkling includes Data Studio, a one-stop big data development and data science platform for cluster management, data integration, metadata management, workflow development, data processing and result visualization. It integrates closely with Business Intelligence to support application data mart construction, offline processing of massive amounts of data, data modeling, ad hoc query analysis, data mining and visual exploration. In addition, its cross-source joint analysis feature lets you analyze data held in other data stores such as COS and CDB, so that you can focus on mining and exploring the value of your data.
Sparkling offers strong elastic scalability. Computation and storage are separated, and the worker nodes of a cluster are divided into core nodes and elastic compute nodes. Using the Tencent Cloud Console or cloud API, you can quickly scale out by large numbers of nodes, manually or automatically, and scale compute and storage resources up or down. Elastic compute nodes also support automated scale-in, so that cluster size follows changes in business load.
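As a rough illustration of the core/elastic split described above, the sketch below computes a target elastic node count from a load figure. It is not Sparkling's scaling policy: the function name, the `tasks_per_node` capacity and the `max_elastic` cap are all hypothetical values chosen for the example. The only idea taken from the text is that core nodes stay fixed while elastic compute nodes are added or removed.

```python
import math


def target_elastic_nodes(pending_tasks: int, core_nodes: int,
                         tasks_per_node: int = 10,
                         max_elastic: int = 100) -> int:
    """Return the elastic node count a hypothetical policy would request.

    Core nodes are never removed; only the elastic pool grows or shrinks.
    """
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    # Total nodes needed to absorb the current backlog.
    needed_total = math.ceil(pending_tasks / tasks_per_node)
    # Whatever the core nodes cannot absorb is handed to elastic nodes.
    elastic = max(0, needed_total - core_nodes)
    # Never exceed the configured cap on the elastic pool.
    return min(elastic, max_elastic)
```

When the backlog drops below what the core nodes can handle, the function returns 0, which corresponds to scaling the elastic pool in completely.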
Sparkling includes Data Studio, a one-stop data engineering and data science platform. It provides a visual interface for cluster management and monitoring, data integration, metadata management, ETL, data processing and computation, data analysis and visualization, and workflow task management and collaboration. This removes the cumbersome operations and parameter tuning otherwise required for the underlying infrastructure and the data warehouse kernel. Sparkling is fully compatible with the ANSI SQL 2003 standard, so enterprise-class data warehouses can be built using standard SQL.
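To give a sense of what standard SQL at the 2003 level covers, the sketch below runs a window-function query, a feature standardized in SQL:2003. It uses Python's built-in `sqlite3` module purely as a local stand-in engine (it needs a SQLite build with window-function support, version 3.25 or later). The `sales` table and its rows are made-up sample data, and whether this exact statement runs unchanged on Sparkling is an assumption based only on the stated compatibility.

```python
import sqlite3

# Running total per region, expressed with a SQL:2003 window function.
QUERY = """
SELECT region,
       order_day,
       amount,
       SUM(amount) OVER (PARTITION BY region ORDER BY order_day) AS running_total
FROM sales
ORDER BY region, order_day
"""


def running_totals():
    """Load hypothetical sample rows into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (region TEXT, order_day TEXT, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("north", "2024-01-01", 100),
            ("north", "2024-01-02", 50),
            ("south", "2024-01-01", 70),
        ],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

Calling `running_totals()` returns one tuple per input row, with the cumulative amount for that row's region in the last column.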
Sparkling can extend its storage to COS cloud storage for virtually unlimited capacity. It supports high-speed data import from a wide variety of tools and data sources, such as traditional relational databases, CKafka and key-value databases, enabling unified analysis of multi-source cloud data.
Built on the Apache Spark ecosystem, Sparkling applies techniques such as distributed multi-level caching, index optimization, off-heap memory management, high-performance columnar storage and cost-based optimization (CBO). It supports high-performance parallel data loading and access as well as multi-dimensional data exploration, with batch processing efficiency several times higher than that of traditional databases.
Sparkling stores data in three copies across master and slave nodes, enabling transparent failover and disaster recovery backup. User clusters are deployed separately and support VPC isolation, providing multiple layers of data access security. User actions are logged for auditing purposes to help protect the security of your data.
To enable ad hoc analysis and pay-as-you-go pricing, Sparkling is designed to be serverless and can be used out of the box with zero deployment and operations costs. You do not have to purchase or manage clusters; you simply pay for what you use. This pricing model delivers high cost-effectiveness with guaranteed performance and security.
The exclusive usage mode of Sparkling provides cluster management and monitoring modules that support cluster creation, automated scaling, cluster configuration, start/stop, and intelligent resource monitoring and alerting. Daily operations and cluster performance tuning can be carried out through the cluster management function.
Sparkling can access and integrate a wide range of heterogeneous data sources. Data from traditional relational databases, COS, local files and key-value stores can be extracted, transformed and loaded into Sparkling storage using the Data Studio console.
Sparkling offers a task scheduling and management module that supports time-driven and event-driven DAG task scheduling. It also provides comprehensive task monitoring capabilities to support ETL and the processing of operational data.
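DAG scheduling means a task runs only after every task it depends on has finished. The sketch below shows that idea with Python's standard `graphlib` module (Python 3.9 or later), grouping tasks into waves that could run in parallel. It is an illustration of the general technique, not Sparkling's scheduler, and the task names in the usage note are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group tasks into waves; each wave depends only on earlier waves.

    `dependencies` maps a task name to the set of tasks that must finish
    before it can start. A cycle raises graphlib.CycleError.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # validates the graph and detects cycles
    waves = []
    while sorter.is_active():
        # Every task whose prerequisites are all marked done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # Marking the wave as done unlocks the tasks that wait on it.
        sorter.done(*ready)
    return waves
```

For example, with two extract tasks, two load tasks that each depend on one extract, and a report task that depends on both loads, the function yields three waves: the extracts, then the loads, then the report. A time-driven trigger would decide when the first wave starts, while an event-driven trigger corresponds to a `done` call releasing the next wave.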
Sparkling has a metadata management module that enables registration, import, storage, retrieval, export and release of technical, management and business metadata, while providing a rich set of data management capabilities such as data maps, data dictionaries, data lineage and impact analysis, metadata version management, metadata statistical analysis and data quality reports.
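Impact analysis over data lineage can be pictured as a graph walk: starting from a dataset that changes, follow the lineage edges to find everything derived from it. The sketch below is a minimal breadth-first version of that idea. The `lineage` mapping and the dataset names in the usage note are hypothetical, and this is not the metadata module's actual API.

```python
from collections import deque


def downstream_impact(lineage, changed):
    """Return every dataset derived, directly or indirectly, from `changed`.

    `lineage` maps a dataset name to the datasets built directly from it.
    """
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            # Skip datasets already visited so that cycles cannot loop forever.
            if child not in seen and child != changed:
                seen.add(child)
                queue.append(child)
    return sorted(seen)
```

Given a lineage in which `clean_orders` feeds `daily_sales` and `customer_ltv`, and `daily_sales` feeds `exec_dashboard`, a change to `clean_orders` affects all three downstream datasets, while a change to `exec_dashboard` affects none.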
Sparkling provides a notebook-based data development module for ETL, data processing, data computation and other programming needs. The beta version currently supports only SQL. Common languages such as Python, Scala, Java and R will be supported in the future.
Sparkling features a project management module that enables you to create project spaces according to your organization's internal product lines, teams and projects and manage project personnel and notebooks.
Sparkling meets the pressing needs of industries such as gaming, finance, retail and industrial engineering by providing a tool to centrally manage and analyze management and business data on user behavior, staffing, procurement, sales, assets and the supply chain. This produces a comprehensive view of data across the organization, helping you understand overall operational conditions and make rapid, accurate decisions.
With its log standardization and normalization mechanism, Sparkling lets you conveniently analyze petabytes of structured or semi-structured data such as user behavior logs and system logs, generate cookie-based consumer profiles and deliver personalized recommendations to users. This significantly improves the efficiency of targeted marketing. It also supports real-time data access and in-depth integration with COS.
With its easy-to-use machine learning framework, interactive collaborative programming environment and real-time data query and analysis capabilities, Sparkling gives data scientists powerful tools for data modeling. It also helps business managers refine corporate operations and gain deeper business insight.