Building a GraphRAG Knowledge Graph Application Based on Apache AGE

Unduh
Mode fokus
Ukuran font
Terakhir diperbarui: 2026-06-08 10:32:14
Overview
GraphRAG is an advanced search paradigm that combines knowledge graphs with RAG. It constructs a knowledge graph through entity extraction, stores it in Apache AGE, and integrates vector search using pgvector. This approach enables more accurate multi-hop reasoning capabilities compared to pure vector-based RAG.
GraphRAG vs. Traditional RAG
Level
Traditional RAG
GraphRAG
Search Methods
Pure vector similarity
Graph traversal + vector hybrid
Multi-hop Reasoning
Not supported
Reasoning via graph relationships
Entity Relationships
Implicit in text
Explicitly stored as a graph schema
Global Comprehension
Limited to a single chunk
Obtains a global perspective via graphs
Applicable Scenario
Simple Q&A
Complex reasoning, relationship querying
Architecture Design
┌──────────────────────────────────────────────────────┐
│                GraphRAG System                          │
├──────────────────────────────────────────────────────┤
│                                                      │
│  Document Input → Entity Extraction → Graph Construction → Index Creation           │
│                                                      │
│  User Query → Entity Recognition → Graph Search + Vector Search → Fusion Generation   │
│                                                      │
├──────────────────────────────────────────────────────┤
│  Storage Layer                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────┐   │
│  │ Apache AGE   │  │ pgvector     │  │ Relational Table    │   │
│  │ (Knowledge Graph)   │  │ (Document Vector)   │  │ (Metadata)  │   │
│  └──────────────┘  └──────────────┘  └───────────┘   │
│                                                      │
│  AI Layer                                               │
│  ┌─────────────────────────────────────────────┐     │
│  │ tencentdb_ai (Entity Extraction / Relation Recognition / Generation)  │     │
│  └─────────────────────────────────────────────┘     │
└──────────────────────────────────────────────────────┘
Database Preparation
-- Create the extension
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS age;
﻿
-- Set search_path
SET search_path = ag_catalog, "$user", public;
﻿
-- Create a knowledge graph
SELECT create_graph('knowledge_graph');
﻿
-- Document storage table
CREATE TABLE graphrag.documents (
    id BIGSERIAL PRIMARY KEY,
    title TEXT,
    content TEXT NOT NULL,
    embedding vector(1024),
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW()
);
﻿
CREATE INDEX ON graphrag.documents
USING hnsw (embedding vector_cosine_ops);
﻿
-- Entity table (auxiliary index)
CREATE TABLE graphrag.entities (
    id BIGSERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    entity_type TEXT NOT NULL,
    description TEXT,
    embedding vector(1024),
    source_doc_id BIGINT REFERENCES graphrag.documents(id),
    UNIQUE(name, entity_type)
);
﻿
CREATE INDEX ON graphrag.entities
USING hnsw (embedding vector_cosine_ops);
Core Process
Note:
The vector dimension must match the output dimension of the large model you are using. Common model dimensions: OpenAI text-embedding-3-small (1536 dimensions), BGE-M3 (1024 dimensions), ChatGLM Embedding (1024 dimensions).
Entity Extraction
-- Use a large model to extract entities and relations from documents
CREATE OR REPLACE FUNCTION graphrag.extract_entities(doc_content TEXT)
RETURNS JSONB AS $$
DECLARE
    result TEXT;
BEGIN
    SELECT tencentdb_ai.chat_completions(
        '<your-llm-model>',
        'Extract entities and relations from the following text and return them in JSON format.' ||
        'Format: {"entities": [{"name": "entity name", "type": "type", "description": "description"}], ' ||
        "relations": [{"source": "source entity", "target": "target entity", "relation": "relation type", "description": "description"}]}' ||
        E'\\n\\nText: ' || doc_content
    ) INTO result;
﻿
    RETURN result::JSONB;
END;
$$ LANGUAGE plpgsql;
Knowledge Graph Construction (Including Security Verification)
The entity.type / relation.relation from the LLM are used as Cypher Tags in queries. You must validate them against an allowlist to prevent arbitrary strings from entering Cypher:
-- Allowlist for permitted entity types and relation types
CREATE TABLE IF NOT EXISTS graphrag.allowed_entity_types (type TEXT PRIMARY KEY);
CREATE TABLE IF NOT EXISTS graphrag.allowed_relation_types (type TEXT PRIMARY KEY);
﻿
INSERT INTO graphrag.allowed_entity_types(type) VALUES
    ('Person'), ('Company'), ('Product'), ('Technology'), ('Concept')
ON CONFLICT DO NOTHING;
﻿
INSERT INTO graphrag.allowed_relation_types(type) VALUES
    ('WORKS_AT'), ('USES'), ('HAS_FEATURE'), ('RELATED_TO'), ('PART_OF')
ON CONFLICT DO NOTHING;
﻿
-- Store the extracted entities in the graph
CREATE OR REPLACE FUNCTION graphrag.build_graph(doc_id BIGINT)
RETURNS VOID AS $$
DECLARE
    doc_content TEXT;
    extracted JSONB;
    entity JSONB;
    relation JSONB;
    e_type TEXT;
    r_type TEXT;
BEGIN
    SELECT content INTO doc_content FROM graphrag.documents WHERE id = doc_id;
    extracted := graphrag.extract_entities(doc_content);
﻿
    -- Create entity nodes (with allowlist validation for types)
    FOR entity IN SELECT * FROM jsonb_array_elements(extracted->'entities')
    LOOP
        e_type := entity->>'type';
        IF NOT EXISTS (SELECT 1 FROM graphrag.allowed_entity_types WHERE type = e_type) THEN
            RAISE NOTICE 'Skip entity with disallowed type: %', e_type;
            CONTINUE;
        END IF;
﻿
        PERFORM * FROM cypher('knowledge_graph', format($$
            MERGE (n:%I {name: %L})
            SET n.description = %L, n.doc_id = %s
        $$, e_type, entity->>'name', entity->>'description', doc_id))
        AS (v agtype);
    END LOOP;
﻿
    -- Create relations (with allowlist validation for relation types)
    FOR relation IN SELECT * FROM jsonb_array_elements(extracted->'relations')
    LOOP
        r_type := relation->>'relation';
        IF NOT EXISTS (SELECT 1 FROM graphrag.allowed_relation_types WHERE type = r_type) THEN
            RAISE NOTICE 'Skip relation with disallowed type: %', r_type;
            CONTINUE;
        END IF;
﻿
        PERFORM * FROM cypher('knowledge_graph', format($$
            MATCH (a {name: %L}), (b {name: %L})
            MERGE (a)-[r:%I]->(b)
            SET r.description = %L
        $$, relation->>'source', relation->>'target',
            r_type, relation->>'description'))
        AS (r agtype);
    END LOOP;
END;
$$ LANGUAGE plpgsql;
Key Points:
Node Tags / Relation Types: Use dual validation with %I safe identifiers and an allowlist.
Attribute values: Use %L for safe escaping.
If it fails the allowlist check: skip it directly, log the event, and do not allow any string to enter the Cypher statement.
GraphRAG Search
-- Hybrid Search: Graph Traversal + Vector Search
CREATE OR REPLACE FUNCTION graphrag.hybrid_retrieve(query TEXT, top_k INT DEFAULT 5)
RETURNS TABLE(source TEXT, content TEXT, score FLOAT) AS $$
BEGIN
    -- Vector Search Documentation
    RETURN QUERY
    SELECT 'vector'::TEXT, d.content,
           (1 - (d.embedding <=> tencentdb_ai.get_embedding('bge-m3', query)))::FLOAT
    FROM graphrag.documents d
    ORDER BY d.embedding <=> tencentdb_ai.get_embedding('bge-m3', query)
    LIMIT top_k;
﻿
    -- Graph Search for Relevant Entities
    RETURN QUERY
    SELECT 'graph'::TEXT,
           (a->>'name' || ' --[' || r->>'relation' || ']--> ' || b->>'name')::TEXT,
           0.8::FLOAT
    FROM cypher('knowledge_graph', format($$
        MATCH (a)-[r]->(b)
        WHERE a.name CONTAINS %L OR b.name CONTAINS %L
        RETURN properties(a) as a, properties(r) as r, properties(b) as b
        LIMIT %s
    $$, query, query, top_k)) AS (a agtype, r agtype, b agtype);
END;
$$ LANGUAGE plpgsql;
Generating Answers
-- GraphRAG Q&A
CREATE OR REPLACE FUNCTION graphrag.answer(question TEXT)
RETURNS TEXT AS $$
DECLARE
    context TEXT;
    answer TEXT;
BEGIN
    SELECT string_agg('[' || source || '] ' || content, E'\\n')
    INTO context
    FROM graphrag.hybrid_retrieve(question, 5);
﻿
    SELECT tencentdb_ai.chat_completions(
        '<your-llm-model>',
        Answer the question based on the following knowledge graph information and document content. Mark knowledge graph information as [graph] and document content as [vector].
        E'\\n\\nContext:\\n' || COALESCE(context, 'None') ||
        E'\\n\\nQuestion: ' || question
    ) INTO answer;
﻿
    RETURN answer;
END;
$$ LANGUAGE plpgsql;
Usage Examples
-- Import Documents and Build Graph
INSERT INTO graphrag.documents (title, content, embedding)
VALUES (
    'PostgreSQL and AI',
    TencentDB for PostgreSQL supports directly invoking large models within SQL through the tencentdb_ai plugin...
    tencentdb_ai.get_embedding('bge-m3', 'Tencent Cloud PostgreSQL supports directly calling large models within SQL through the tencentdb_ai plugin')
);
﻿
-- Build Graph
SELECT graphrag.build_graph(1);
﻿
-- GraphRAG Q&A
SELECT graphrag.answer('What AI capabilities does PostgreSQL support?');
Best Practices
1. Entity Deduplication: Use MERGE to avoid duplicate entities.
2. Incremental Update: New documents are only built incrementally, not rebuilt in full.
3. Graph Index: Create an index for frequently queried attributes.
4. Community Detection: Document clustering is performed using graph algorithms.
5. Hybrid Recall: Graph context + vector documents complement each other.
6. Allowlist Protection: Entity types / relation types must be validated against the allowlist to prevent arbitrary string injection from LLMs.
References
Apache AGE Graph Database Usage Guide.
pgvector Vector Extension Usage Guide.
tencentdb_ai Plugin Feature Introduction.
Bantuan dan Dukungan

Apakah halaman ini membantu?
Anda juga dapat Menghubungi Penjualan atau Mengirimkan Tiket untuk meminta bantuan.
masukan
Level	Traditional RAG	GraphRAG
Search Methods	Pure vector similarity	Graph traversal + vector hybrid
Multi-hop Reasoning	Not supported	Reasoning via graph relationships
Entity Relationships	Implicit in text	Explicitly stored as a graph schema
Global Comprehension	Limited to a single chunk	Obtains a global perspective via graphs
Applicable Scenario	Simple Q&A	Complex reasoning, relationship querying
tencent cloud

TencentDB for PostgreSQL

Building a GraphRAG Knowledge Graph Application Based on Apache AGE

Overview

GraphRAG vs. Traditional RAG

Architecture Design

Database Preparation

Core Process

Entity Extraction

Knowledge Graph Construction (Including Security Verification)

GraphRAG Search

Generating Answers

Usage Examples

Best Practices

References

Bantuan dan Dukungan