Overview
Concepts
Cluster Architecture
Strengths
Scenarios
__DORIS_DELETE_SIGN__ to implement this. Since we are only doing batch deletion on the unique model, we just need to add a hidden column with a Boolean type of aggregate function replace. In BE, all aggregate write procedures are the same as those of a normal column. There are two read schemes:
When FE encounters select * etc. expansion, it excludes the __DORIS_DELETE_SIGN__ column and defaults to add a __DORIS_DELETE_SIGN__ != true condition. BE always adds a column during reading, to determine whether to delete it according to the condition.DELETE ON expression. Other aggregate behaviors are the same as the column with the aggregate function replace.__DORIS_DELETE_SIGN__ != true to all olapScanNodes that have the hidden column. BE is unaware of this process and operates normally.enable_batch_delete_by_default=true in the FE configuration file. After FE is rebooted, all new tables will support batch deletion. This option is set to false by default.ALTER TABLE tablename ENABLE FEATURE "BATCH_DELETE". This operation is essentially a schema change operation, which returns immediately, and the completion of the operation can be confirmed with the show alter table column command.SET show_hidden_columns=true, and then use the desc tablename command. If the output includes the column __DORIS_DELETE_SIGN__, batch deletion is supported; if not, it is not supported.Stream Load method adds a column field in the header to set the delete mark column, as shown in the example -H "columns: k1, k2, label_c3" -H "merge_type: [MERGE|APPEND|DELETE]" -H "delete: label_c3=1".Broker Load involves setting the delete mark column in the PROPERTIES section. The syntax is as follows:LOAD LABEL db1.label1([MERGE|APPEND|DELETE] DATA INFILE("hdfs://abc.com:8888/user/palo/test/ml/file1")INTO TABLE tbl1COLUMNS TERMINATED BY ","(tmp_c1,tmp_c2, label_c3)SET(id=tmp_c2,name=tmp_c1,)[DELETE ON label_c3=true])WITH BROKER 'broker'("username"="user","password"="pass")PROPERTIES("timeout" = "3600");
Routine Load adds a map to the columns field, where the map follows the same syntax as aboveCREATE ROUTINE LOAD example_db.test1 ON example_tbl[WITH MERGE|APPEND|DELETE]COLUMNS(k1, k2, k3, v1, v2, label),WHERE k1 > 100 and k2 like "%doris%"[DELETE ON label=true]PROPERTIES("desired_concurrent_number"="3","max_batch_interval" = "20","max_batch_rows" = "300000","max_batch_size" = "209715200","strict_mode" = "false")FROM KAFKA("kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092","kafka_topic" = "my_topic","kafka_partitions" = "0,1,2,3","kafka_offsets" = "101,0,0,200");
Stream Load might be executed out of order in Doris, it is necessary to use MERGE along with load sequence when importing if it is not Stream Load. You can refer to Sequence Column in the documentation for the specific syntax.DELETE ON can only be used together with MERGE.mysql> SET show_hidden_columns=true;Query OK, 0 rows affected (0.00 sec)mysql> DESC test;+-----------------------+--------------+------+-------+---------+---------+| Field | Type | Null | Key | Default | Extra |+-----------------------+--------------+------+-------+---------+---------+| name | VARCHAR(100) | No | true | NULL | || gender | VARCHAR(10) | Yes | false | NULL | REPLACE || age | INT | Yes | false | NULL | REPLACE || __DORIS_DELETE_SIGN__ | TINYINT | No | false | 0 | REPLACE |+-----------------------+--------------+------+-------+---------+---------+4 rows in set (0.00 sec)
curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv" -H "merge_type: APPEND" -T ~/table1_data http://127.0.0.1:8130/api/test/table1/_stream_load
curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv" -T ~/table1_data http://127.0.0.1:8130/api/test/table1/_stream_load
curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv" -H "merge_type: DELETE" -T ~/table1_data http://127.0.0.1:8130/api/test/table1/_stream_load
+--------+----------+----------+------+| siteid | citycode | username | pv |+--------+----------+----------+------+| 3 | 2 | tom | 2 || 4 | 3 | bush | 3 || 5 | 3 | helen | 3 |+--------+----------+----------+------+
3,2,tom,0
+--------+----------+----------+------+| siteid | citycode | username | pv |+--------+----------+----------+------+| 4 | 3 | bush | 3 || 5 | 3 | helen | 3 |+--------+----------+----------+------+
site_id=1.curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv" -H "merge_type: MERGE" -H "delete: siteid=1" -T ~/table1_data http://127.0.0.1:8130/api/test/table1/_stream_load
+--------+----------+----------+------+| siteid | citycode | username | pv |+--------+----------+----------+------+| 4 | 3 | bush | 3 || 5 | 3 | helen | 3 || 1 | 1 | jim | 2 |+--------+----------+----------+------+
2,1,grace,23,2,tom,21,1,jim,2
+--------+----------+----------+------+| siteid | citycode | username | pv |+--------+----------+----------+------+| 4 | 3 | bush | 3 || 2 | 1 | grace | 2 || 3 | 2 | tom | 2 || 5 | 3 | helen | 3 |+--------+----------+----------+------+
Query OK indicates success.mysql> delete from test_tbl PARTITION p1 where k1 = 1;Query OK, 0 rows affected (0.04 sec){'label':'delete_e7830c72-eb14-4cb9-bbb6-eebd4511d251', 'status':'VISIBLE', 'txnId':'4005'}
mysql> delete from test_tbl PARTITION p1 where k1 = 1;Query OK, 0 rows affected (0.04 sec){'label':'delete_e7830c72-eb14-4cb9-bbb6-eebd4511d251', 'status':'COMMITTED', 'txnId':'4005', 'err':'delete job is committed but may be taking effect later' }
affected rows: Indicates the number of rows affected by this deletion. Since Doris's deletion is currently logical, this value is always 0.
label: An automatically generated label, which is the identifier of the import job. Each import job has a label that is unique within a single database.
status: Indicates whether the data deletion is visible. If visible, it displays VISIBLE; if not, it shows COMMITTED.
txnId: Transaction ID corresponding to this Delete job.
err: This field displays detailed information about this deletion.mysql> delete from test_tbl partition p1 where k1 > 80;ERROR 1064 (HY000): errCode = 2, detailMessage = {Error reason}
(tablet=replica).mysql> delete from test_tbl partition p1 where k1 > 80;ERROR 1064 (HY000): errCode = 2, detailMessage = failed to delete replicas from job: 4005, Unfinished replicas:10000=60000, 10001=60000, 10002=60000
ERROR 1064 (HY000), it means the deletion fails.Query OK, it means the deletion is successful.status is COMMITTED, data is still not visible and you may want to wait a while before using the show delete command to view the result.status is VISIBLE, it means the data deletion is successful.tablet_delete_timeout_second
The timeout of the delete operation itself can change according to the number of tablets under a given partition, and this configuration setting contributes to the average timeout time of one tablet. The default value is 2.
If the specified partition contains 5 tablets for this deletion, the timeout for deletion is 10 seconds. Since this is lower than the minimum timeout of 30 seconds, the final timeout is 30 seconds.load_straggler_wait_second
If the user anticipates a large volume of data and the upper limit of 5 minutes is not sufficient, they can adjust the timeout limit using this configuration. The default value is 300.TIMEOUT = MIN(load_straggler_wait_second, MAX(30, tablet_delete_timeout_second * tablet_num))
query_timeout
Since delete is a SQL command, the deletion statement would be session limited and the timeout would be affected by the query_timeout value in the session. We can increase the timeout by using the SET query_timeout = xxx command. The time unit is in seconds.max_allowed_in_element_num_of_delete
If users require more elements while using an IN predicate, this setting can be used to adjust the maximum number of allowable elements. The default value is 1024.SHOW DELETE [FROM db_name]
mysql> show delete from test_db;+-----------+---------------+---------------------+-----------------+----------+| TableName | PartitionName | CreateTime | DeleteCondition | State |+-----------+---------------+---------------------+-----------------+----------+| empty_tbl | p3 | 2020-04-15 23:09:35 | k1 EQ "1" | FINISHED || test_tbl | p4 | 2020-04-15 23:09:53 | k1 GT "80" | FINISHED |+-----------+---------------+---------------------+-----------------+----------+2 rows in set (0.00 sec)
label. For relevant concepts, view the INSERT INTO document.HELP DELETE in the command line of the Mysql client to access more help information.フィードバック