ClickHouse SQL Syntax

Last updated: 2020-09-02 11:23:21

    Data Type

    ClickHouse supports multiple data types such as integer, floating point, character, date, enumeration, and array.

    Type list

    Type Name Type Identifier Value Range or Description
    Integer 1-byte integer Int8 -128–127
    2-byte integer Int16 -32768–32767
    4-byte integer Int32 -2147483648–2147483647
    8-byte integer Int64 -9223372036854775808–9223372036854775807
    1-byte unsigned integer UInt8 0–255
    2-byte unsigned integer UInt16 0–65535
    4-byte unsigned integer UInt32 0–4294967295
    8-byte unsigned integer UInt64 0–18446744073709551615
    Floating point Single-precision floating point Float32 6–7 significant digits
    Double-precision floating point Float64 15–16 significant digits
    Custom-precision floating point Decimal32(S) 1–9 significant digits (specified by `S`)
    Decimal64(S) 10–18 significant digits (specified by `S`)
    Decimal128(S) 19–38 significant digits (specified by `S`)
    Character Varchar String The string length is unlimited
    Char FixedString(N) The string length is fixed
    UUID UUID The `UUID` is generated by the built-in function `generateUUIDv4`
    Time Date Date It stores the year, month, and day in the format of `yyyy-MM-dd`
    Timestamp (second-level) DateTime(timezone) Unix timestamp that is accurate down to the second
    Timestamp (custom precision) DateTime(precision, timezone) You can specify the time precision
    Enumeration 1-byte enumeration Enum8 256 values (-128–127) are provided
    2-byte enumeration Enum16 65,536 values (-32768–32767) are provided
    Array Array Array(T) It indicates an array consisting of data in `T` type. You are not recommended to use nested arrays
    • You can use UInt8 to store boolean values and limit the value to 0 or 1.
    • For more information on other data types, please see the official documentation.

    Use cases

    Enumeration

    The following sample code is used to store the gender information of users in a site:

    CREATE TABLE user (uid Int16, name String, gender Enum('male'=1, 'female'=2)) ENGINE=Memory;
    
    INSERT INTO user VALUES (1, 'Gary', 'male'), (2, 
    'Jaco', 'female');
    
    # Query data
    SELECT * FROM user;
    
    ┌─uid─┬─name─┬─gender─┐
    │   1 │ Gary │ male   │
    │   2 │ Jaco │ female │
    └─────┴──────┴────────┘
    
    # Use the `cast` function to query the enumerated integers
    SELECT uid, name, CAST(gender, 'Int8') FROM user;
    
    ┌─uid─┬─name─┬─CAST(gender, 'Int8')─┐
    │   1 │ Gary │                    1 │
    │   2 │ Jaco │                    2 │
    └─────┴──────┴──────────────────────┘

    Array

    The following sample code is used to record the IDs of users who log in to the site every day so as to analyze active users.

    CREATE TABLE userloginlog (logindate Date, uids Array(String)) ENGINE=TinyLog;
    
    INSERT INTO userloginlog VALUES ('2020-01-02', ['Gary', 'Jaco']), ('2020-02-03', ['Jaco', 'Sammie']);
    
    # Query result
    SELECT * FROM userloginlog;
    
    ┌──logindate─┬─uids──────────────┐
    │ 2020-01-02 │ ['Gary','Jaco']   │
    │ 2020-02-03 │ ['Jaco','Sammie'] │
    └────────────┴───────────────────┘

    Creating Database/Table

    ClickHouse uses the CREATE statement to create a database or table.

    CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster] [ENGINE = engine(...)]
    
    CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
    (
        name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [compression_codec] [TTL expr1],
        name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [compression_codec] [TTL expr2],
        ...
    ) ENGINE = engine

    Databases and tables can be created on local disks or in a distributed manner. Distributed creation can be implemented in the following two methods:

    • Run the CREATE statement on all servers where clickhouse-server resides.
    • Run the ON CLUSTER clause to create a database/table with ZooKeeper.

    If you use clickhouse-client to query a local table of server B on server A, the error "Table xxx doesn't exist.." will be reported. If you want that all servers in the cluster can query a table, you are recommended to use distributed tables.

    For more information, please see CREATE Queries.

    Query

    ClickHouse uses the SELECT statement to query data.

    SELECT [DISTINCT] expr_list
    [FROM [db.]table | (subquery) | table_function] [FINAL]
    [SAMPLE sample_coeff]
    [GLOBAL] [ANY|ALL] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER] JOIN (subquery)|table USING columns_list
    [PREWHERE expr]
    [WHERE expr]
    [GROUP BY expr_list] [WITH TOTALS]
    [HAVING expr]
    [ORDER BY expr_list]
    [LIMIT [offset_value, ]n BY columns]
    [LIMIT [n, ]m]
    [UNION ALL ...]
    [INTO OUTFILE filename]
    [FORMAT format]

    For more information, please see SELECT Queries Syntax.

    Batch Write

    ClickHouse uses the INSERT INTO statement to write data.

    INSERT INTO [db.]table [(c1, c2, c3)] VALUES (v11, v12, v13), (v21, v22, v23), ...
    
    INSERT INTO [db.]table [(c1, c2, c3)] SELECT ...

    For more information, please see INSERT.

    Data Deletion

    ClickHouse uses the DROP or TRUNCATE statement to delete data.

    DROP deletes metadata and data, while TRUNCATE deletes only data.

    DROP DATABASE [IF EXISTS] db [ON CLUSTER cluster]
    DROP [TEMPORARY] TABLE [IF EXISTS] [db.]name [ON CLUSTER cluster]
    
    TRUNCATE TABLE [IF EXISTS] [db.]name [ON CLUSTER cluster]

    Table Structure Modification

    ClickHouse uses the ALTER statement to modify the table structure.

    # Column operations on table
    ALTER TABLE [db].name [ON CLUSTER cluster] ADD COLUMN [IF NOT EXISTS] name [type] [default_expr] [codec] [AFTER name_after]
    ALTER TABLE [db].name [ON CLUSTER cluster] DROP COLUMN [IF EXISTS] name
    ALTER TABLE [db].name [ON CLUSTER cluster] CLEAR COLUMN [IF EXISTS] name IN PARTITION partition_name
    ALTER TABLE [db].name [ON CLUSTER cluster] COMMENT COLUMN [IF EXISTS] name 'comment'
    ALTER TABLE [db].name [ON CLUSTER cluster] MODIFY COLUMN [IF EXISTS] name [type] [default_expr] [TTL]
    
    # Partition operations on table
    ALTER TABLE table_name DETACH PARTITION partition_expr
    ALTER TABLE table_name DROP PARTITION partition_expr
    ALTER TABLE table_name CLEAR INDEX index_name IN PARTITION partition_expr
    
    # Attribute operations on table
    ALTER TABLE table-name MODIFY TTL ttl-expression

    For more information, please see ALTER.

    Information Viewing

    • SHOW statement
      It is used to display information such as databases, processing lists, tables, and dictionaries.
      SHOW DATABASES [INTO OUTFILE filename] [FORMAT format]
      SHOW PROCESSLIST [INTO OUTFILE filename] [FORMAT format]
      SHOW [TEMPORARY] TABLES [{FROM | IN} <db>] [LIKE '<pattern>' | WHERE expr] [LIMIT <N>] [INTO OUTFILE <filename>] [FORMAT <format>]
      SHOW DICTIONARIES [FROM <db>] [LIKE '<pattern>'] [LIMIT <N>] [INTO OUTFILE <filename>] [FORMAT <format>]
      For more information, please see SHOW Queries.
    • DESCRIBE statement
      It is used to view table metadata.
      DESC|DESCRIBE TABLE [db.]table [INTO OUTFILE filename] [FORMAT format]

    Functions

    There are two types of ClickHouse functions: regular functions and aggregate functions. The difference is that a regular function can generate the result for each row, while an aggregate function needs a set of data to generate the result.

    Regular functions

    Arithmetic functions

    In this type of functions, all fields in the table engage in the arithmetic calculation.

    Function Name Purpose Use Case
    plus(a, b), a + b Calculates the sum of two fields plus(table.field1, table.field2)
    minus(a, b), a - b Calculates the difference between two fields minus(table.field1, table.field2)
    multiply(a, b), a * b Calculates the product of two fields multiply(table.field1, table.field2)
    divide(a, b), a / b Calculates the quotient of two fields divide(table.field1, table.field2)
    modulo(a, b), a % b Calculates the remainder between two fields modulo(table.field1, table.field2)
    abs(a) Calculates absolute value abs(table.field1)
    negate(a) Calculates opposite negate(table.field1)

    Comparison functions

    Function Name Purpose Use Case
    =, == Determines whether the values are the same table.field1 = value
    !=, <> Determines whether the values are different table.field1 != value
    > Determines whether the former value is greater than the latter value table.field1 > value
    >= Determines whether the former value is greater than or equal to the latter value table.field1 >= value
    < Determines whether the former value is smaller than the latter value table.field1 < value
    <= Determines whether the former value is smaller than or equal to the latter value table.field1 <= value

    Logical operation functions

    Function Name Purpose Use Case
    AND Returns result if both two conditions are met -
    OR Returns result if either condition is met -
    NOT Returns result if none conditions are met -

    Type conversion functions

    Overflow may occur when you use a type conversion function. The data types of overflowed values are the same as those in C.

    Function Name Purpose Use Case
    toInt(8|16|32|64) Converts String value to Int value The result of toInt8('128') is -127
    toUInt(8|16|32|64) Converts String value to UInt value The result of toUInt8('128') is 128
    toInt(8|16|32|64)OrZero Converts String value to Int value and returns 0 if failed The result of toInt8OrZero('a') is 0
    toUInt(8|16|32|64)OrZero Converts String value to UInt value and returns 0 if failed The result of toUInt8OrZero('a') is 0
    toInt(8|16|32|64)OrNull Converts String value to Int value and returns NULL if failed The result of toInt8OrNull('a') is NULL
    toUInt(8|16|32|64)OrNull Converts String value to UInt value and returns NULL if failed The result of toUInt8OrNull('a') is NULL

    Functions similar to those above are also provided for the floating point and date types.

    For more information, please see Type Conversion Functions.

    Date functions

    For more information, please see Functions for Working with Dates and Times.

    String functions

    For more information, please see Functions for Working with Strings.

    UUID

    For more information, please see Functions for Working with UUID.

    JSON processing functions

    For more information, please see Functions for Working with JSON.

    Aggregate functions

    Function Name Purpose Use Case
    count Counts the number of rows or non-NULL values count(expr), COUNT(DISTINCT expr), count(), count(*)
    any(x) Returns the first encountered value. The result is indeterminate any(column)
    anyHeavy(x) Returns a frequently occurring value using the heavy hitters algorithm. The result is generally indeterminate anyHeavy(column)
    anyLast(x) Returns the last encountered value. The result is indeterminate anyLast(column)
    groupBitAnd Applies bitwise AND groupBitAnd(expr)
    groupBitOr Applies bitwise OR groupBitOr(expr)
    groupBitXor Applies bitwise XOR groupBitXor(expr)
    groupBitmap Returns cardinality groupBitmap(expr)
    min(x) Calculates the minimum min(column)
    max(x) Calculates the maximum max(x)
    argMin(arg, val) Returns the arg value for a minimal val value argMin(c1, c2)
    argMax(arg, val) Returns the arg value for a maximum val value argMax(c1, c2)
    sum(x) Calculates the sum sum(x)
    sumWithOverflow(x) Calculates the sum. If the sum exceeds the maximum value for this data type, the function will return an error sumWithOverflow(x)
    sumMap(key, value) Sums the value array of the same key and returns the tuples of value and key arrays: keys in sorted order, and value ​​sum of corresponding keys -
    skewPop Calculates skewness skewPop(expr)
    skewSamp Calculates sample skewness skewSamp(expr)
    kurtPop Calculates kurtosis kurtPop(expr)
    kurtSamp Calculates sample kurtosis kurtSamp(expr)
    timeSeriesGroupSum(uid, timestamp, value) Sums the timestamps in time-series grouped by uid. It uses linear interpolation to add missing sample timestamps before summing the values -
    timeSeriesGroupRateSum(uid, ts, val) Calculates the rate of time-series grouped by uid and then sum rates together -
    avg(x) Calculates the average -
    uniq Calculates the approximate number of different values uniq(x[, ...])
    uniqCombined Calculates the approximate number of different values. Compared with uniq, it consumes less memory and is more accurate but has slightly lower performance uniqCombined(HLL_precision)(x[, ...]), uniqCombined(x[, ...])
    uniqCombined64 Functions in the same way as uniqCombined but uses 64-bit values to reduce the probability of result value overflow -
    uniqHLL12 Calculates the approximate number of different values. It is not recommended. Please use uniq and uniqCombined instead -
    uniqExact Calculates the exact number of different values uniqExact(x[, ...])
    groupArray(x), groupArray(max_size)(x) Returns an array of x values. The array size can be specified by max_size -
    groupArrayInsertAt(value, position) Inserts value into array at specified position -
    groupArrayMovingSum - -
    groupArrayMovingAvg - -
    groupUniqArray(x), groupUniqArray(max_size)(x) - -
    quantile - -
    quantileDeterministic - -
    quantileExact - -
    quantileExactWeighted - -
    quantileTiming - -
    quantileTimingWeighted - -
    quantileTDigest - -
    quantileTDigestWeighted - -
    median - -
    quantiles(level1, level2, …)(x) - -
    varSamp(x) - -
    varPop(x) - -
    stddevSamp(x) - -
    stddevPop(x) - -
    topK(N)(x) - -
    topKWeighted - -
    covarSamp(x, y) - -
    covarPop(x, y) - -
    corr(x, y) - -
    categoricalInformationValue - -
    simpleLinearRegression - -
    stochasticLinearRegression - -
    stochasticLogisticRegression - -
    groupBitmapAnd - -
    groupBitmapOr - -
    groupBitmapXor - -

    Dictionary

    A dictionary is a mapping between a key and attributes and can be used as a function for query, which is simpler and more efficient than the method of combining referencing tables with a JOIN clause.

    There are two types of data dictionaries, namely, internal and external dictionaries.

    Internal dictionary

    ClickHouse supports one type of internal dictionaries, i.e., geobase. For more information on the supported functions, please see Functions for Working with Yandex.Metrica Dictionaries.

    External dictionary

    ClickHouse allows you to add external dictionaries from multiple data sources. For more information on the supported data sources, please see Sources of External Dictionaries.

    Was this page helpful?

    Was this page helpful?

    • Not at all
    • Not very helpful
    • Somewhat helpful
    • Very helpful
    • Extremely helpful
    Send Feedback
    Help