Overview
Use Cases
Node Specification
Cluster Architecture
CREATE TABLE <table_name> (...) [ DISTRIBUTED BY (<column> [,..] ) | DISTRIBUTED RANDOMLY | DISTRIBUTED REPLICATED ]
CREATE TABLE statement supports the following three distribution policy clauses:DISTRIBUTED BY (column, [ ... ]) specifies to distribute data rows among nodes (segments) according to the hash value of the distribution column. The same value will be always hashed to the same segment. Choosing a unique distribution key (such as the primary key) will ensure a more even data distribution. Hash distribution is the default distribution policy for tables, and if the DISTRIBUTED clause is not provided when the table is created, the primary key or the first eligible column of the table will be used as the distribution key. If there are no eligible columns in the table, the distribution policy will degrade to random distribution.DISTRIBUTED RANDOMLY specifies to distribute data evenly among nodes (segments) in a circular manner. Unlike in the hash distribution policy, data rows with the same value are not necessarily located on the same segment. Although random distribution ensures an even data distribution, it is only recommended when the table doesn't have a suitable discretely distributed data column that can be used as the hash distribution column.DISTRIBUTED REPLICATED specifies to distribute data in a replicated manner; that is, each node (segment) has all the data in the table. In this distribution policy, data is evenly distributed as each segment stores the same data rows. When large tables are joined with small tables, specifying a sufficiently small table as replicated may also improve the performance.

CREATE TABLE products (name varchar(40),prod_id integer,supplier_id integer)DISTRIBUTED BY (prod_id);
CREATE TABLE random_stuff (things text,doodads text,etc text)DISTRIBUTED RANDOMLY;
CREATE TABLE replicated_stuff (things text,doodads text,etc text)DISTRIBUTED REPLICATED;
UPDATE and DELETE statements, Tencent Cloud TCHouse-P has the feature of pruning segments by distribution key. For example, if the products table uses prod_id as the distribution key, the following query will only be sent to segments that satisfy prod_id=101 for execution, which greatly improves the SQL execution performance:select * from products where prod_id = 101;
create table t1(c1 int, c2 int) distributed by (c1,c2);

create table t1(c1 int, c2 int, primary key (c1)) distributed by (c2);will fail to create.
create table t1(c1 int, c2 int) distributed by (c1);select gp_segment_id,count(1) from t1 group by 1 order by 2 desc;gp_segment_id | count---------------+--------0 | 10001 | 68(2 rows)
ALTER TABLE <table_name> SET WITH (REORGANIZE=true)DISTRIBUTED BY (<distribution columns>);
Was this page helpful?
You can also Contact sales or Submit a Ticket for help.
Help us improve! Rate your documentation experience in 5 mins.
Feedback