tencent cloud

TencentDB for PostgreSQL

DokumentasiTencentDB for PostgreSQLAI CapabilitiesPractical TutorialBuilding a GraphRAG Knowledge Graph Application Based on Apache AGE

Building a GraphRAG Knowledge Graph Application Based on Apache AGE

Unduh
Mode fokus
Ukuran font
Terakhir diperbarui: 2026-06-08 10:32:14

Overview

GraphRAG is an advanced search paradigm that combines knowledge graphs with RAG. It constructs a knowledge graph through entity extraction, stores it in Apache AGE, and integrates vector search using pgvector. This approach enables more accurate multi-hop reasoning capabilities compared to pure vector-based RAG.

GraphRAG vs. Traditional RAG

Level
Traditional RAG
GraphRAG
Search Methods
Pure vector similarity
Graph traversal + vector hybrid
Multi-hop Reasoning
Not supported
Reasoning via graph relationships
Entity Relationships
Implicit in text
Explicitly stored as a graph schema
Global Comprehension
Limited to a single chunk
Obtains a global perspective via graphs
Applicable Scenario
Simple Q&A
Complex reasoning, relationship querying

Architecture Design

┌──────────────────────────────────────────────────────┐
│ GraphRAG System │
├──────────────────────────────────────────────────────┤
│ │
│ Document Input → Entity Extraction → Graph Construction → Index Creation │
│ │
User Query → Entity Recognition → Graph Search + Vector Search → Fusion Generation │
│ │
├──────────────────────────────────────────────────────┤
│ Storage Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ Apache AGE │ │ pgvector │ │ Relational Table │ │
│ │ (Knowledge Graph) │ │ (Document Vector) │ │ (Metadata) │ │
│ └──────────────┘ └──────────────┘ └───────────┘ │
│ │
│ AI Layer │
│ ┌─────────────────────────────────────────────┐ │
│ │ tencentdb_ai (Entity Extraction / Relation Recognition / Generation) │ │
│ └─────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘

Database Preparation

-- Create the extension
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS age;

-- Set search_path
SET search_path = ag_catalog, "$user", public;

-- Create a knowledge graph
SELECT create_graph('knowledge_graph');

-- Document storage table
CREATE TABLE graphrag.documents (
id BIGSERIAL PRIMARY KEY,
title TEXT,
content TEXT NOT NULL,
embedding vector(1024),
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX ON graphrag.documents
USING hnsw (embedding vector_cosine_ops);

-- Entity table (auxiliary index)
CREATE TABLE graphrag.entities (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
entity_type TEXT NOT NULL,
description TEXT,
embedding vector(1024),
source_doc_id BIGINT REFERENCES graphrag.documents(id),
UNIQUE(name, entity_type)
);

CREATE INDEX ON graphrag.entities
USING hnsw (embedding vector_cosine_ops);

Core Process

Note:
The vector dimension must match the output dimension of the large model you are using. Common model dimensions: OpenAI text-embedding-3-small (1536 dimensions), BGE-M3 (1024 dimensions), ChatGLM Embedding (1024 dimensions).

Entity Extraction

-- Use a large model to extract entities and relations from documents
CREATE OR REPLACE FUNCTION graphrag.extract_entities(doc_content TEXT)
RETURNS JSONB AS $$
DECLARE
result TEXT;
BEGIN
SELECT tencentdb_ai.chat_completions(
'<your-llm-model>',
'Extract entities and relations from the following text and return them in JSON format.' ||
'Format: {"entities": [{"name": "entity name", "type": "type", "description": "description"}], ' ||
"relations": [{"source": "source entity", "target": "target entity", "relation": "relation type", "description": "description"}]}' ||
E'\\n\\nText: ' || doc_content
) INTO result;

RETURN result::JSONB;
END;
$$ LANGUAGE plpgsql;

Knowledge Graph Construction (Including Security Verification)

The entity.type / relation.relation from the LLM are used as Cypher Tags in queries. You must validate them against an allowlist to prevent arbitrary strings from entering Cypher:
-- Allowlist for permitted entity types and relation types
CREATE TABLE IF NOT EXISTS graphrag.allowed_entity_types (type TEXT PRIMARY KEY);
CREATE TABLE IF NOT EXISTS graphrag.allowed_relation_types (type TEXT PRIMARY KEY);

INSERT INTO graphrag.allowed_entity_types(type) VALUES
('Person'), ('Company'), ('Product'), ('Technology'), ('Concept')
ON CONFLICT DO NOTHING;

INSERT INTO graphrag.allowed_relation_types(type) VALUES
('WORKS_AT'), ('USES'), ('HAS_FEATURE'), ('RELATED_TO'), ('PART_OF')
ON CONFLICT DO NOTHING;

-- Store the extracted entities in the graph
CREATE OR REPLACE FUNCTION graphrag.build_graph(doc_id BIGINT)
RETURNS VOID AS $$
DECLARE
doc_content TEXT;
extracted JSONB;
entity JSONB;
relation JSONB;
e_type TEXT;
r_type TEXT;
BEGIN
SELECT content INTO doc_content FROM graphrag.documents WHERE id = doc_id;
extracted := graphrag.extract_entities(doc_content);

-- Create entity nodes (with allowlist validation for types)
FOR entity IN SELECT * FROM jsonb_array_elements(extracted->'entities')
LOOP
e_type := entity->>'type';
IF NOT EXISTS (SELECT 1 FROM graphrag.allowed_entity_types WHERE type = e_type) THEN
RAISE NOTICE 'Skip entity with disallowed type: %', e_type;
CONTINUE;
END IF;

PERFORM * FROM cypher('knowledge_graph', format($$
MERGE (n:%I {name: %L})
SET n.description = %L, n.doc_id = %s
$$, e_type, entity->>'name', entity->>'description', doc_id))
AS (v agtype);
END LOOP;

-- Create relations (with allowlist validation for relation types)
FOR relation IN SELECT * FROM jsonb_array_elements(extracted->'relations')
LOOP
r_type := relation->>'relation';
IF NOT EXISTS (SELECT 1 FROM graphrag.allowed_relation_types WHERE type = r_type) THEN
RAISE NOTICE 'Skip relation with disallowed type: %', r_type;
CONTINUE;
END IF;

PERFORM * FROM cypher('knowledge_graph', format($$
MATCH (a {name: %L}), (b {name: %L})
MERGE (a)-[r:%I]->(b)
SET r.description = %L
$$, relation->>'source', relation->>'target',
r_type, relation->>'description'))
AS (r agtype);
END LOOP;
END;
$$ LANGUAGE plpgsql;
Key Points:
Node Tags / Relation Types: Use dual validation with %I safe identifiers and an allowlist.
Attribute values: Use %L for safe escaping.
If it fails the allowlist check: skip it directly, log the event, and do not allow any string to enter the Cypher statement.

GraphRAG Search

-- Hybrid Search: Graph Traversal + Vector Search
CREATE OR REPLACE FUNCTION graphrag.hybrid_retrieve(query TEXT, top_k INT DEFAULT 5)
RETURNS TABLE(source TEXT, content TEXT, score FLOAT) AS $$
BEGIN
-- Vector Search Documentation
RETURN QUERY
SELECT 'vector'::TEXT, d.content,
(1 - (d.embedding <=> tencentdb_ai.get_embedding('bge-m3', query)))::FLOAT
FROM graphrag.documents d
ORDER BY d.embedding <=> tencentdb_ai.get_embedding('bge-m3', query)
LIMIT top_k;

-- Graph Search for Relevant Entities
RETURN QUERY
SELECT 'graph'::TEXT,
(a->>'name' || ' --[' || r->>'relation' || ']--> ' || b->>'name')::TEXT,
0.8::FLOAT
FROM cypher('knowledge_graph', format($$
MATCH (a)-[r]->(b)
WHERE a.name CONTAINS %L OR b.name CONTAINS %L
RETURN properties(a) as a, properties(r) as r, properties(b) as b
LIMIT %s
$$, query, query, top_k)) AS (a agtype, r agtype, b agtype);
END;
$$ LANGUAGE plpgsql;

Generating Answers

-- GraphRAG Q&A
CREATE OR REPLACE FUNCTION graphrag.answer(question TEXT)
RETURNS TEXT AS $$
DECLARE
context TEXT;
answer TEXT;
BEGIN
SELECT string_agg('[' || source || '] ' || content, E'\\n')
INTO context
FROM graphrag.hybrid_retrieve(question, 5);

SELECT tencentdb_ai.chat_completions(
'<your-llm-model>',
Answer the question based on the following knowledge graph information and document content. Mark knowledge graph information as [graph] and document content as [vector].
E'\\n\\nContext:\\n' || COALESCE(context, 'None') ||
E'\\n\\nQuestion: ' || question
) INTO answer;

RETURN answer;
END;
$$ LANGUAGE plpgsql;

Usage Examples

-- Import Documents and Build Graph
INSERT INTO graphrag.documents (title, content, embedding)
VALUES (
'PostgreSQL and AI',
TencentDB for PostgreSQL supports directly invoking large models within SQL through the tencentdb_ai plugin...
tencentdb_ai.get_embedding('bge-m3', 'Tencent Cloud PostgreSQL supports directly calling large models within SQL through the tencentdb_ai plugin')
);

-- Build Graph
SELECT graphrag.build_graph(1);

-- GraphRAG Q&A
SELECT graphrag.answer('What AI capabilities does PostgreSQL support?');

Best Practices

1. Entity Deduplication: Use MERGE to avoid duplicate entities.
2. Incremental Update: New documents are only built incrementally, not rebuilt in full.
3. Graph Index: Create an index for frequently queried attributes.
4. Community Detection: Document clustering is performed using graph algorithms.
5. Hybrid Recall: Graph context + vector documents complement each other.
6. Allowlist Protection: Entity types / relation types must be validated against the allowlist to prevent arbitrary string injection from LLMs.

References

Bantuan dan Dukungan

Apakah halaman ini membantu?

masukan