Knowledge Base (RAG) - Technical Deep Dive
Overview
Spritz uses Retrieval Augmented Generation (RAG) to enhance AI agent responses with custom knowledge bases. The system uses vector embeddings stored in PostgreSQL with pgvector for semantic search.
Architecture
Components
- Content Ingestion: URLs are fetched and parsed
- Text Chunking: Content is split into manageable chunks
- Embedding Generation: Chunks are converted to vector embeddings using Google's text-embedding-004 model
- Vector Storage: Embeddings stored in PostgreSQL with the pgvector extension
- Semantic Search: Query embeddings are matched against stored chunks using cosine similarity
- Context Injection: Relevant chunks are injected into the LLM prompt
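Before diving into each stage, here is a minimal sketch of how the indexing stages compose. Every helper named below (fetchContent, extractText, chunkText, generateEmbedding, storeChunk) is a hypothetical stand-in for a step detailed later in this document, not Spritz's actual internal API:
// Hypothetical stand-ins for the pipeline steps described below.
declare function fetchContent(url: string): Promise<string>;
declare function extractText(html: string): string;
declare function chunkText(text: string): string[];
declare function generateEmbedding(text: string): Promise<number[]>;
declare function storeChunk(
  agentId: string, knowledgeId: string,
  content: string, embedding: number[], index: number,
): Promise<void>;

// End-to-end indexing flow: ingest, extract, chunk, embed, store.
async function indexKnowledgeSource(agentId: string, knowledgeId: string, url: string) {
  const html = await fetchContent(url); // Content Ingestion
  const text = extractText(html);       // strip markup, keep prose
  const chunks = chunkText(text);       // Text Chunking
  for (const [index, content] of chunks.entries()) {
    const embedding = await generateEmbedding(content);                // Embedding Generation
    await storeChunk(agentId, knowledgeId, content, embedding, index); // Vector Storage
  }
}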
Database Schema
shout_agent_knowledge
Stores knowledge base URLs and their processing status:
CREATE TABLE shout_agent_knowledge (
  id UUID PRIMARY KEY,
  agent_id UUID REFERENCES shout_agents(id) ON DELETE CASCADE,
  title TEXT NOT NULL,
  url TEXT NOT NULL,
  content_type TEXT DEFAULT 'webpage', -- 'webpage', 'github', 'docs'
  status TEXT DEFAULT 'pending', -- 'pending', 'processing', 'indexed', 'failed'
  error_message TEXT,
  embedding_id TEXT, -- Reference to external embedding service (if used)
  chunk_count INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  indexed_at TIMESTAMPTZ,
  UNIQUE(agent_id, url)
);
shout_knowledge_chunks
Stores the actual content chunks with vector embeddings:
CREATE TABLE shout_knowledge_chunks (
  id UUID PRIMARY KEY,
  agent_id UUID REFERENCES shout_agents(id) ON DELETE CASCADE,
  knowledge_id UUID NOT NULL, -- References shout_agent_knowledge
  content TEXT NOT NULL,
  embedding vector(768), -- pgvector type, 768 dimensions
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Index for vector similarity search
CREATE INDEX ON shout_knowledge_chunks
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
Embedding Model
Spritz uses Google's text-embedding-004 model:
- Dimensions: 768
- Distance Metric: Cosine similarity
- API: Google GenAI Embedding API
For chat generation, Spritz uses gemini-2.0-flash from Google's Gemini API. This model provides:
- Fast response times (optimized for real-time chat)
- High-quality responses suitable for most use cases
- Good balance between cost and capability
All agents use this model by default. Model availability may change as Google updates their API.
Generating Embeddings
import { GoogleGenAI } from "@google/genai";

// Assumes the GEMINI_API_KEY env var holds a Google AI Studio key.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function generateQueryEmbedding(query: string): Promise<number[] | null> {
  const result = await ai.models.embedContent({
    model: "text-embedding-004",
    contents: query,
  });
  // One embedding per input; take the first (and only) vector.
  return result.embeddings?.[0]?.values || null;
}
Vector Search
Similarity Search Function
The database uses a PostgreSQL function for vector similarity search:
CREATE OR REPLACE FUNCTION match_knowledge_chunks(
  p_agent_id UUID,
  p_query_embedding vector(768),
  p_match_count INT DEFAULT 5,
  p_match_threshold FLOAT DEFAULT 0.3
)
RETURNS TABLE (
  id UUID,
  content TEXT,
  similarity FLOAT,
  metadata JSONB
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    kc.id,
    kc.content,
    1 - (kc.embedding <=> p_query_embedding) AS similarity,
    kc.metadata
  FROM shout_knowledge_chunks kc
  WHERE
    kc.agent_id = p_agent_id
    AND 1 - (kc.embedding <=> p_query_embedding) > p_match_threshold
  ORDER BY kc.embedding <=> p_query_embedding
  LIMIT p_match_count;
END;
$$;
Query Implementation
async function getRAGContext(
  agentId: string,
  message: string,
): Promise<string | null> {
  // Generate embedding for the query
  const queryEmbedding = await generateQueryEmbedding(message);
  if (!queryEmbedding) return null;

  // Search for relevant chunks
  const { data: chunks, error } = await db.rpc("match_knowledge_chunks", {
    p_agent_id: agentId,
    p_query_embedding: `[${queryEmbedding.join(",")}]`,
    p_match_count: 5,
    p_match_threshold: 0.3, // 30% similarity threshold
  });
  if (error || !chunks?.length) return null;

  // Format context with relevance scores
  const context = chunks
    .map(
      (chunk: { content: string; similarity: number }) =>
        `[Relevance: ${(chunk.similarity * 100).toFixed(0)}%]\n${chunk.content}`,
    )
    .join("\n\n---\n\n");

  return context;
}
Indexing Process
Step 1: URL Submission
POST /api/agents/:id/knowledge
{
  "url": "https://example.com/docs",
  "title": "Example Documentation",
  "content_type": "webpage"
}
Step 2: Content Fetching
The system supports two scraping methods:
Basic Scraping (Default)
Available to all users. Fetches URL content using simple HTML parsing:
- HTML pages: Extracts text, removes scripts/styles/navigation
- GitHub repos: Fetches markdown files
- Documentation sites: Parses structured content
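As a rough illustration, basic scraping for HTML pages could look like the sketch below, assuming Node 18+ (global fetch) and naive regex-based tag stripping; the actual parser may differ:
// Naive HTML-to-text extraction: fine for static pages, but no
// JavaScript rendering (that is what Firecrawl is for, see below).
async function basicScrape(url: string): Promise<string> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Fetch failed: ${res.status}`);
  const html = await res.text();
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ") // drop inline scripts
    .replace(/<style[\s\S]*?<\/style>/gi, " ")   // drop stylesheets
    .replace(/<nav[\s\S]*?<\/nav>/gi, " ")       // drop navigation
    .replace(/<[^>]+>/g, " ")                    // strip remaining tags
    .replace(/\s+/g, " ")                        // collapse whitespace
    .trim();
}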
Firecrawl Integration (Official Agents Only)
For official agents, Spritz integrates with Firecrawl for high-quality web scraping:
POST /api/agents/:id/knowledge
{
  "url": "https://docs.example.com",
  "title": "Example Documentation",
  "scrapeMethod": "firecrawl", // Use Firecrawl instead of basic
  "crawlDepth": 3, // Follow links up to 3 levels deep
  "excludePatterns": ["/blog/*", "/changelog/*"], // Skip these paths
  "autoSync": true, // Enable automatic re-indexing
  "syncIntervalHours": 24 // Re-index every 24 hours
}
Firecrawl Features:
- JavaScript rendering for SPAs
- Multi-page crawling for documentation sites
- Clean markdown output optimized for RAG
- Anti-bot bypass capabilities
- Up to 50 pages per knowledge source
Firecrawl integration is available only for official agents due to API costs. Regular agents use the basic scraping method which works well for most static content.
Step 3: Chunking
Content is split into chunks:
- Chunk Size: ~500-1000 tokens
- Overlap: ~100 tokens between chunks
- Strategy: Semantic chunking when possible
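A simple sliding-window chunker meets these targets. The sketch below approximates tokens as ~4 characters and splits on character offsets; real semantic chunking (splitting on headings and paragraph boundaries) is more involved:
const CHARS_PER_TOKEN = 4; // rough heuristic; real tokenizers vary

// Fixed-size windows with overlap so context carries across chunk boundaries.
function chunkText(text: string, chunkTokens = 800, overlapTokens = 100): string[] {
  const size = chunkTokens * CHARS_PER_TOKEN;
  const step = (chunkTokens - overlapTokens) * CHARS_PER_TOKEN;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // final window reached the end
  }
  return chunks;
}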
Step 4: Embedding Generation
Each chunk is embedded:
// Embed and store each chunk; chunks.entries() supplies the index
// recorded in the metadata.
for (const [index, chunk] of chunks.entries()) {
  const embedding = await generateEmbedding(chunk.content);
  await db.from("shout_knowledge_chunks").insert({
    agent_id: agentId,
    knowledge_id: knowledgeId,
    content: chunk.content,
    embedding: `[${embedding.join(",")}]`, // pgvector accepts this literal form
    metadata: {
      source_url: url,
      chunk_index: index,
      title: title,
    },
  });
}
Step 5: Status Update
await db
  .from("shout_agent_knowledge")
  .update({
    status: "indexed",
    chunk_count: chunks.length,
    indexed_at: new Date().toISOString(),
  })
  .eq("id", knowledgeId);
Context Injection
When a user sends a message, the system:
- Generates an embedding for the query
- Searches for top 5 most similar chunks (30% threshold)
- Formats chunks with relevance scores
- Injects into the prompt:
const ragContext = await getRAGContext(agentId, message);
const fullMessage =
  message +
  `\n\nRelevant context from knowledge base:\n${ragContext}\n\n` +
  `Use this context to inform your response when relevant.`;
Performance Considerations
Index Optimization
- IVFFlat Index: Used for approximate nearest neighbor search
- Lists Parameter: 100 (tuned for ~1M vectors)
- Rebuild Threshold: Rebuild the index after roughly 10x row growth, so the IVFFlat centroids track the new data distribution
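Two knobs matter in practice: lists at index build time, and ivfflat.probes at query time (pgvector defaults probes to 1; raising it trades latency for recall). A hedged sketch using node-postgres, assuming a DATABASE_URL env var and Postgres's auto-generated index name:
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function tuneVectorSearch() {
  // Raise probes for better recall at some latency cost (pgvector default: 1).
  // SET LOCAL scopes the setting to the enclosing transaction.
  await pool.query("BEGIN");
  await pool.query("SET LOCAL ivfflat.probes = 10");
  // ... run match_knowledge_chunks / similarity queries here ...
  await pool.query("COMMIT");

  // After ~10x growth, rebuild so the IVFFlat centroids are re-sampled.
  // REINDEX ... CONCURRENTLY must run outside a transaction block.
  await pool.query("REINDEX INDEX CONCURRENTLY shout_knowledge_chunks_embedding_idx");
}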
Caching
- Query embeddings are not cached (queries vary)
- Chunk embeddings are cached in database
- Search results cached for 5 minutes per query
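The per-query result cache could be as simple as an in-memory map with a TTL. A sketch, where the key scheme and eviction shown here are assumptions, not necessarily the actual implementation:
const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes
const searchCache = new Map<string, { context: string; expires: number }>();

function getCachedContext(agentId: string, query: string): string | null {
  const entry = searchCache.get(`${agentId}:${query}`);
  if (!entry || entry.expires < Date.now()) return null; // miss or expired
  return entry.context;
}

function setCachedContext(agentId: string, query: string, context: string): void {
  searchCache.set(`${agentId}:${query}`, {
    context,
    expires: Date.now() + CACHE_TTL_MS,
  });
}
Note that in a serverless deployment such a map is per-instance and best-effort; a shared store such as Redis would be needed for cross-instance caching.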
Cost Optimization
- Batch Embedding: Process multiple chunks in parallel (see the sketch below)
- Selective Indexing: Only index when use_knowledge_base is enabled
- Cleanup: Remove old chunks when knowledge is deleted
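For the batching point above: the @google/genai SDK accepts an array of inputs per embedContent call, which cuts request volume. A sketch reusing the ai client from earlier; the batch size is an assumption, so check current API limits:
// Embed chunks in batches rather than one request per chunk.
async function embedChunks(chunks: string[]): Promise<number[][]> {
  const BATCH_SIZE = 100; // assumption; verify against current API limits
  const vectors: number[][] = [];
  for (let i = 0; i < chunks.length; i += BATCH_SIZE) {
    const batch = chunks.slice(i, i + BATCH_SIZE);
    const result = await ai.models.embedContent({
      model: "text-embedding-004",
      contents: batch, // one call, many inputs
    });
    for (const e of result.embeddings ?? []) {
      vectors.push(e.values ?? []);
    }
  }
  return vectors;
}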
Auto-Sync for Knowledge Sources
Official agents can enable automatic re-indexing to keep knowledge bases up-to-date with source content changes.
How Auto-Sync Works
- Cron Job: Runs every 6 hours (configurable via Vercel Cron)
- Eligibility Check: Only processes sources with auto_sync: true on official agents
- Interval Check: Respects each source's sync_interval_hours setting
- Re-indexing: Fetches fresh content, generates new embeddings, replaces old chunks
Database Schema Updates
ALTER TABLE shout_agent_knowledge
  ADD COLUMN scrape_method TEXT DEFAULT 'basic',     -- 'basic' or 'firecrawl'
  ADD COLUMN crawl_depth INTEGER DEFAULT 1,          -- Max depth for Firecrawl crawls
  ADD COLUMN exclude_patterns TEXT[],                -- URL patterns to skip
  ADD COLUMN auto_sync BOOLEAN DEFAULT false,        -- Enable automatic re-indexing
  ADD COLUMN sync_interval_hours INTEGER DEFAULT 24, -- Hours between syncs
  ADD COLUMN last_synced_at TIMESTAMPTZ;             -- Last successful sync time
Cron Endpoint
GET /api/cron/sync-knowledge
Authorization: Bearer ${CRON_SECRET}
The cron job:
- Queries all knowledge sources with auto_sync: true on official agents
- Filters to those due for a sync (based on sync_interval_hours and last_synced_at)
- Re-indexes each source sequentially with rate limiting
- Updates last_synced_at on success
Auto-sync is limited to official agents to manage Firecrawl and embedding API costs. Each sync operation generates new embeddings for all chunks in the knowledge source.
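Putting those steps together, the sync pass might look roughly like the sketch below. It assumes the Supabase-style db client used elsewhere in this doc, omits the official-agent filter, and treats reindexSource as a hypothetical helper wrapping the indexing pipeline:
// Hypothetical helper: fetch, re-chunk, re-embed, replace old chunks.
declare function reindexSource(source: {
  id: string;
  url: string;
  sync_interval_hours: number;
  last_synced_at: string | null;
}): Promise<void>;

async function syncKnowledgeSources() {
  const { data: sources, error } = await db
    .from("shout_agent_knowledge")
    .select("*")
    .eq("auto_sync", true);
  if (error || !sources) return;

  const now = Date.now();
  for (const source of sources) {
    const last = source.last_synced_at ? new Date(source.last_synced_at).getTime() : 0;
    if (now - last < source.sync_interval_hours * 60 * 60 * 1000) continue; // not due yet

    await reindexSource(source);
    await db
      .from("shout_agent_knowledge")
      .update({ last_synced_at: new Date().toISOString() })
      .eq("id", source.id);

    await new Promise((r) => setTimeout(r, 1000)); // crude rate limiting between sources
  }
}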
Best Practices
- Chunk Size: Keep chunks between 500-1000 tokens for optimal retrieval
- Overlap: Use 10-20% overlap to maintain context across chunks
- Metadata: Store source URLs and titles for citation
- Threshold: Adjust similarity threshold (0.3 default) based on use case
- Refresh: Re-index when source content changes (or use auto-sync for official agents)
- Exclude Patterns: For documentation sites, exclude changelog, blog, and versioned paths to reduce noise
Troubleshooting
Low Relevance Results
- Issue: Chunks not matching queries well
- Solution: Lower similarity threshold, increase chunk overlap, improve chunking strategy
Slow Indexing
- Issue: Large documents taking too long
- Solution: Process in batches, use async processing, optimize chunking
High Costs
- Issue: Too many embedding API calls
- Solution: Cache embeddings, batch requests, limit chunk size
API Reference
Index Knowledge
POST /api/agents/:id/knowledge/index
{
  "knowledge_id": "uuid"
}
Get Knowledge Base
GET /api/agents/:id/knowledge
// Returns: Array of knowledge items with status
Delete Knowledge
DELETE /api/agents/:id/knowledge/:knowledge_id
// Cascades to delete all associated chunks
Next Steps
- AI Architecture — Chat flow, RAG retrieval (match_knowledge_chunks), indexing pipeline, and code examples
- MCP Servers & API Tools — External tools for agents
- Agents API Reference — Knowledge endpoints