| Component | Version Requirement |
| --- | --- |
| TencentDB for PostgreSQL | Version 14 or later, upgraded to the latest kernel minor version. For the upgrade procedure, see Upgrade Kernel Minor Version. |
| pgvector extension | 0.8.2 or later |
| pgvectorscale extension | 0.9.0 or later |
```sql
-- Step 1: Install the pgvector extension (skip this step if it is already installed)
CREATE EXTENSION IF NOT EXISTS vector;

-- Step 2: Install the pgvectorscale extension
CREATE EXTENSION IF NOT EXISTS vectorscale;
```
```sql
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('vector', 'vectorscale');
```
```sql
-- Create a document table containing 128-dimensional vectors.
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT,
    embedding vector(128)  -- Adjust the vector dimension according to your actual model.
);
```
```sql
-- Cosine distance (most commonly used, suitable for semantic search)
CREATE INDEX idx_doc_cosine ON documents
USING diskann (embedding vector_cosine_ops);

-- Euclidean distance (L2)
CREATE INDEX idx_doc_l2 ON documents
USING diskann (embedding vector_l2_ops);

-- Inner product distance
CREATE INDEX idx_doc_ip ON documents
USING diskann (embedding vector_ip_ops);
```
```sql
-- Insert a single record
INSERT INTO documents (title, content, embedding)
VALUES (
    'TencentDB for PostgreSQL Introduction',
    'PostgreSQL is a powerful, open-source relational database...',
    '[0.1, 0.2, -0.3, ...]'::vector
);

-- Batch insertion (COPY or multi-row INSERT is recommended)
INSERT INTO documents (title, content, embedding) VALUES
    ('Document 1', 'Content 1', '[0.1, 0.2, ...]'::vector),
    ('Document 2', 'Content 2', '[0.3, 0.4, ...]'::vector);
```
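pgvector accepts vector values as bracketed string literals, so application code typically serializes a list of floats before binding it as a query parameter. The helper below (a hypothetical name, not part of pgvector itself) sketches that formatting step in Python, independent of any particular database driver.

```python
def to_vector_literal(values):
    """Format a sequence of floats as a pgvector literal such as '[0.1, 0.2, -0.3]'.

    The resulting string can be bound as an ordinary text parameter and
    cast with ::vector on the SQL side.
    """
    return "[" + ", ".join(repr(float(v)) for v in values) + "]"

# Driver-agnostic sketch of a parameterized INSERT:
embedding = [0.1, 0.2, -0.3]
sql = "INSERT INTO documents (title, content, embedding) VALUES (%s, %s, %s::vector)"
params = ("Doc", "Body", to_vector_literal(embedding))
```

Binding the literal as a parameter (rather than concatenating it into the SQL string) keeps the statement safe from injection and lets the server cache the plan.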
```sql
-- Query the top 5 most similar documents by cosine distance
SELECT id, title, embedding <=> '[0.1, 0.2, ...]'::vector AS distance
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 5;

-- L2 distance query
SELECT id, title, embedding <-> '[0.1, 0.2, ...]'::vector AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, ...]'::vector
LIMIT 5;

-- Inner product query
SELECT id, title, embedding <#> '[0.1, 0.2, ...]'::vector AS distance
FROM documents
ORDER BY embedding <#> '[0.1, 0.2, ...]'::vector
LIMIT 5;
```
| Operator | Distance Type | Typical Scenarios |
| --- | --- | --- |
| <=> | Cosine distance | Semantic similarity search (recommended) |
| <-> | Euclidean distance (L2) | Distance calculations in coordinate space |
| <#> | Negative inner product | Dot-product similarity |
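The math behind the three operators can be shown without a database. The plain-Python sketch below mirrors pgvector's definitions: cosine distance is 1 minus cosine similarity, and <#> returns the *negative* inner product (so that smaller values still mean "more similar" in ORDER BY).

```python
import math

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):          # corresponds to <->
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):      # corresponds to <=>
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)

def neg_inner_product(a, b):    # corresponds to <#>
    return -inner_product(a, b)

a, b = [1.0, 0.0], [0.0, 1.0]
print(cosine_distance(a, b))    # 1.0 (orthogonal vectors)
print(l2_distance(a, b))        # ~1.414 (sqrt of 2)
```

Because <#> negates the inner product, ORDER BY embedding <#> query ascending returns the vectors with the *largest* dot product first.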
| Parameter | Default Value | Description |
| --- | --- | --- |
| num_neighbors | 32 | Number of neighbors per node in the graph. Larger values improve recall but increase index size and build time. |
| search_list_size | 100 | Size of the candidate list during a search. Larger values improve recall but increase query latency. |
```sql
-- High-recall configuration (for accuracy-critical scenarios)
CREATE INDEX idx_high_recall ON documents
USING diskann (embedding vector_cosine_ops)
WITH (num_neighbors = 64, search_list_size = 200);

-- High-throughput configuration (for latency-sensitive scenarios)
CREATE INDEX idx_low_latency ON documents
USING diskann (embedding vector_cosine_ops)
WITH (num_neighbors = 32, search_list_size = 80);

-- Large-dataset configuration (millions of vectors or more)
CREATE INDEX idx_large_scale ON documents
USING diskann (embedding vector_cosine_ops)
WITH (num_neighbors = 50, search_list_size = 150);
```
| Data Scale | num_neighbors | search_list_size | Expected Recall |
| --- | --- | --- | --- |
| < 100,000 | 32 | 100 | ≥ 95% |
| 100,000 - 1,000,000 | 50 | 150 | ≥ 90% |
| > 1,000,000 | 64 | 200 | ≥ 85% |
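The tuning table above can be encoded as a small helper that picks index options from a row count and emits the matching DDL. The function name is hypothetical; the thresholds and values are taken directly from the table.

```python
def diskann_params(row_count):
    """Suggest DiskANN build options per the tuning table above (a sketch)."""
    if row_count < 100_000:
        return {"num_neighbors": 32, "search_list_size": 100}
    elif row_count <= 1_000_000:
        return {"num_neighbors": 50, "search_list_size": 150}
    return {"num_neighbors": 64, "search_list_size": 200}

# Generate the CREATE INDEX statement for a mid-sized table:
opts = diskann_params(500_000)
ddl = (
    "CREATE INDEX idx_doc_cosine ON documents "
    "USING diskann (embedding vector_cosine_ops) "
    f"WITH (num_neighbors = {opts['num_neighbors']}, "
    f"search_list_size = {opts['search_list_size']});"
)
```

Treat these values as starting points: measure recall and latency on your own data before settling on a configuration.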
| Scenario | Recommended Index | Reason |
| --- | --- | --- |
| Dataset under 1 million vectors, latency-sensitive | HNSW | Fast build, low query latency |
| Dataset over 1 million vectors | DiskANN | Low memory footprint, disk-friendly |
| Constrained memory resources | DiskANN | Storage footprint is roughly 60% of HNSW's |
| Frequent data updates | DiskANN | Supports concurrent inserts and mixed read/write workloads |
| Extremely high recall required (> 99%) | HNSW | Slightly better precision |
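The selection criteria above can be folded into a simple decision helper. This is a sketch with hypothetical names; it prioritizes the recall requirement first, then falls back to the scale and resource criteria from the table.

```python
def choose_index(row_count, memory_constrained=False,
                 frequent_updates=False, target_recall=0.95):
    """Pick 'hnsw' or 'diskann' following the guidance table above (a sketch)."""
    if target_recall > 0.99:
        # Extremely high recall favors HNSW's slightly better precision.
        return "hnsw"
    if row_count > 1_000_000 or memory_constrained or frequent_updates:
        return "diskann"
    return "hnsw"
```

In practice the criteria can conflict (e.g. a huge dataset that also needs > 99% recall); benchmark both index types on a representative sample before committing.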
```sql
-- Create a knowledge base table
CREATE TABLE knowledge_base (
    id SERIAL PRIMARY KEY,
    doc_title TEXT,
    chunk_text TEXT,
    embedding vector(1024),  -- Using the BGE-M3 model as an example
    metadata JSONB
);

-- Create a DiskANN index
CREATE INDEX idx_kb_diskann ON knowledge_base
USING diskann (embedding vector_cosine_ops)
WITH (num_neighbors = 50, search_list_size = 150);

-- Retrieve the 5 document chunks most relevant to the question
SELECT chunk_text, metadata,
       embedding <=> $1::vector AS distance
FROM knowledge_base
ORDER BY embedding <=> $1::vector
LIMIT 5;
```
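Upstream of the SQL above, a RAG pipeline splits each source document into overlapping chunks before embedding them into chunk_text rows. A minimal fixed-size character chunker, shown below as an assumption-level sketch (not tied to any specific framework), illustrates that step; the overlap preserves context across chunk boundaries.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character chunks for embedding (a sketch)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # each chunk starts `step` chars after the previous one
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the remainder is already covered by this chunk
    return chunks
```

Each chunk is then run through the embedding model (e.g. BGE-M3 for vector(1024)) and inserted alongside its text and metadata; production pipelines usually split on sentence or token boundaries rather than raw characters.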
```sql
-- Product vector table
CREATE TABLE product_embeddings (
    product_id BIGINT PRIMARY KEY,
    product_name TEXT,
    category TEXT,
    embedding vector(256)
);

-- Similar-product search filtered by category
SELECT product_id, product_name,
       embedding <=> $1::vector AS distance
FROM product_embeddings
WHERE category = 'Electronics'
ORDER BY embedding <=> $1::vector
LIMIT 20;
```
```sql
-- Image vector table
CREATE TABLE image_vectors (
    image_id BIGINT PRIMARY KEY,
    image_url TEXT,
    description TEXT,
    embedding vector(512)  -- CLIP model output
);

CREATE INDEX idx_img_diskann ON image_vectors
USING diskann (embedding vector_cosine_ops);

-- Search for images using an image as the query
SELECT image_id, image_url, description,
       embedding <=> $1::vector AS distance
FROM image_vectors
ORDER BY embedding <=> $1::vector
LIMIT 10;
```
```sql
-- Rebuild the index
REINDEX INDEX idx_doc_cosine;

-- Or rebuild concurrently (does not block queries)
REINDEX INDEX CONCURRENTLY idx_doc_cosine;
```
```sql
-- View index size
SELECT pg_size_pretty(pg_relation_size('idx_doc_cosine')) AS index_size;

-- View index usage
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname LIKE '%diskann%';
```
| Distance Type | Operator | Index Operator Class | Corresponding Function |
| --- | --- | --- | --- |
| Cosine distance | <=> | vector_cosine_ops | cosine_distance() |
| Euclidean distance (L2) | <-> | vector_l2_ops | l2_distance() |
| Negative inner product | <#> | vector_ip_ops | inner_product() |