
Chat Engine

Memory Management Specification

Last Updated: 2026-04-07
Status: Active (v2: memory delegated to AI Brain + Contact Intelligence)


1. Architecture Change: Chat Engine No Longer Owns Memory

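In v1, the Chat Engine owned its own tri-layer memory (pgvector, Redis, PostgreSQL). In v2, all long-term memory is delegated to AI Brain and Contact Intelligence; the Chat Engine keeps only short-term session history: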

Memory Type | Service | How Chat Engine Uses It
Business knowledge (RAG) | AI Brain (ai_knowledge_*) | Brain searches during LLM orchestration
Per-tenant business memory | AI Brain (ai_memory) | Brain injects into system prompt
Per-contact memory | Contact Intelligence (ci_contact_memory) | Chat Engine calls GET /context/{id}
Per-contact mood/state | Contact Intelligence (ci_contact_state) | Returned in context for tone adaptation
Per-contact entities | Contact Intelligence (ci_contact_entities) | Returned in context for personalization
Session history | Chat Engine (own DB + KV cache) | Only recent messages for thread display

2. What Chat Engine Still Manages

2.1 Session History (Short-Term)

Chat Engine maintains a sliding window of recent messages per session:

  • Hot cache: CF KV with 24h TTL

    • Key: chat:{tenantId}:{sessionId}:history
    • Value: JSON array of last 20 messages (role, content, timestamp)
    • Written on every message, trimmed to 20
  • Cold storage: PostgreSQL chat_messages table

    • Permanent append-only ledger
    • Used for Unified Inbox display, analytics, GDPR export
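The hot-cache rules above (key format, 20-message window, 24h TTL) can be sketched as a few pure helpers. This is a minimal illustration, not the service's actual code: the names `historyKey`, `appendAndTrim`, and the `ChatMessage` shape are hypothetical, and the timestamp format is an assumption because the spec does not fix one.

```typescript
// Hypothetical message shape; the spec only lists (role, content, timestamp).
interface ChatMessage {
  role: string;
  content: string;
  timestamp: string; // assumed to be an ISO 8601 string
}

const HISTORY_LIMIT = 20;                 // "last 20 messages"
const HISTORY_TTL_SECONDS = 24 * 60 * 60; // "24h TTL"

// Builds the KV key in the documented format chat:{tenantId}:{sessionId}:history.
function historyKey(tenantId: string, sessionId: string): string {
  return `chat:${tenantId}:${sessionId}:history`;
}

// Appends one message and keeps only the newest `limit` entries.
// Returns a new array so the caller's copy is never mutated.
function appendAndTrim(
  history: readonly ChatMessage[],
  message: ChatMessage,
  limit: number = HISTORY_LIMIT,
): ChatMessage[] {
  const next = [...history, message];
  return next.slice(Math.max(0, next.length - limit));
}
```

On each incoming message, a caller would read the cached array, pass it through `appendAndTrim`, and write the result back under `historyKey(...)` with a TTL of `HISTORY_TTL_SECONDS`, alongside the append to `chat_messages`.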

2.2 Session History Assembly

Before calling AI Brain, Chat Engine assembles the conversation history and sends it together with the contact context:

// Chat Engine builds history from KV cache or DB fallback
const history = await getSessionHistory(sessionId)
 
// Send to Brain with contact context
const response = await brain.chat({
  message: userMessage,
  conversation_history: history,
  contact_context: ciContext.context_text,  // from Contact Intelligence
  personality: tenantPersonality,           // from AI Brain personality config
})
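The "KV cache or DB fallback" step could look roughly like the sketch below. The `HistoryCache` and `MessageStore` interfaces and the function name `loadSessionHistory` are hypothetical stand-ins for the real KV and PostgreSQL clients; the signature differs from the `getSessionHistory(sessionId)` call above because the documented KV key also needs a tenant ID.

```typescript
interface ChatMessage {
  role: string;
  content: string;
  timestamp: string;
}

// Hypothetical storage abstractions (assumptions, not the real client APIs).
interface HistoryCache {
  get(key: string): Promise<string | null>;
}
interface MessageStore {
  recentMessages(sessionId: string, limit: number): Promise<ChatMessage[]>;
}

// Parses the cached JSON array; returns null on a miss or a corrupt value
// so the caller can fall back to the database.
function parseCachedHistory(raw: string | null): ChatMessage[] | null {
  if (raw === null) return null;
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as ChatMessage[]) : null;
  } catch {
    return null;
  }
}

// Tries the hot cache first, then falls back to cold storage.
async function loadSessionHistory(
  cache: HistoryCache,
  store: MessageStore,
  tenantId: string,
  sessionId: string,
  limit: number = 20,
): Promise<ChatMessage[]> {
  const raw = await cache.get(`chat:${tenantId}:${sessionId}:history`);
  const cached = parseCachedHistory(raw);
  if (cached !== null) {
    return cached.slice(Math.max(0, cached.length - limit));
  }
  return store.recentMessages(sessionId, limit);
}
```

Treating a corrupt cache entry the same as a miss keeps a bad KV value from blocking the conversation; whether the real service also rewrites the cache after a fallback is not stated in this spec.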

3. Contact Intelligence Context Flow

3.1 Before AI Response (Fetch Context)

GET /context/{contact_id}?query={user_message}

Returns:

{
  "contact": { "displayName": "Arjun", "channel": "whatsapp" },
  "state": {
    "mood": "frustrated",
    "moodConfidence": 0.9,
    "energy": "high",
    "relationshipStage": "building",
    "activeStreak": 5,
    "churnRisk": 0.15
  },
  "memories": [
    { "memoryType": "fact", "content": "Runs a bakery in Mumbai called Sweet Dreams" },
    { "memoryType": "preference", "content": "Prefers short blog posts" }
  ],
  "entities": [
    { "entityType": "pet", "displayName": "Bruno", "metadata": { "species": "dog" } },
    { "entityType": "person", "displayName": "Priya", "metadata": { "relation": "wife" } }
  ],
  "context_text": "Contact: Arjun (whatsapp)\nMood: frustrated...\nMemories:\n- ...",
  "memory_budget": 500
}
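For callers, the response above can be described with a type and a small URL helper. The field names are taken from the sample payload; everything else here (the `ContactContext` name, `buildContextUrl`, `isContactContext`, and the base URL in the usage note) is a hypothetical sketch, and the real service may return additional or optional fields.

```typescript
// Shape inferred from the sample response; optionality is an assumption.
interface ContactContext {
  contact: { displayName: string; channel: string };
  state: {
    mood: string;
    moodConfidence: number;
    energy: string;
    relationshipStage: string;
    activeStreak: number;
    churnRisk: number;
  };
  memories: { memoryType: string; content: string }[];
  entities: { entityType: string; displayName: string; metadata: Record<string, unknown> }[];
  context_text: string;
  memory_budget: number;
}

// Builds GET /context/{contact_id}?query={user_message} with both parts
// percent-encoded, so free-text messages cannot break the URL.
function buildContextUrl(baseUrl: string, contactId: string, userMessage: string): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/context/${encodeURIComponent(contactId)}?query=${encodeURIComponent(userMessage)}`;
}

// Shallow runtime check of two top-level fields before trusting the payload.
function isContactContext(value: unknown): value is ContactContext {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.context_text === "string" && typeof v.memory_budget === "number";
}
```

For example, `buildContextUrl("https://ci.example.internal", "c_123", "where is my order?")` yields a URL whose query string is `query=where%20is%20my%20order%3F`.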

3.2 After AI Response (Ingest Interaction)

POST /ingest { contact_id, message, role: "user", conversation_id }

Contact Intelligence returns 202 Accepted immediately and processes the following asynchronously:

  • Mood classification (small, Haiku-class LLM)
  • Memory extraction (rules-based, < 10ms)
  • Entity extraction (rules-based, < 5ms)
  • Embedding generation (fire-and-forget)
  • Relationship stage progression
  • Churn risk recalculation
  • Streak tracking
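Because ingestion is asynchronous and answers with 202, Chat Engine does not need to wait on it before delivering the reply. A minimal sketch of that call pattern follows; `Sender`, `buildIngestPayload`, and `ingestInBackground` are hypothetical names, and the payload type only allows `role: "user"` because that is the only value this spec shows.

```typescript
// Payload fields as documented for POST /ingest.
interface IngestPayload {
  contact_id: string;
  message: string;
  role: "user";
  conversation_id: string;
}

// Hypothetical transport abstraction standing in for the real HTTP client.
type Sender = (path: string, body: string) => Promise<{ status: number }>;

function buildIngestPayload(
  contactId: string,
  message: string,
  conversationId: string,
): IngestPayload {
  return {
    contact_id: contactId,
    message,
    role: "user",
    conversation_id: conversationId,
  };
}

// Sends the payload without awaiting it. Transport failures and any status
// other than 202 are reported through onError instead of being thrown.
function ingestInBackground(
  send: Sender,
  payload: IngestPayload,
  onError: (error: unknown) => void,
): void {
  send("/ingest", JSON.stringify(payload))
    .then((res) => {
      if (res.status !== 202) {
        onError(new Error(`unexpected ingest status ${res.status}`));
      }
    })
    .catch(onError);
}
```

Routing errors to a callback keeps an ingestion outage from failing the customer-facing reply; whether the real service retries failed ingests is not covered here.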

3.3 Mood-Based Tone Adaptation

The context_text field includes tone-adaptation hints, which AI Brain injects into its system prompt:

Mood | Adaptation
frustrated | Be patient, acknowledge difficulty, offer concrete solutions. No cheerful emojis.
sad | Be empathetic and supportive. Don't be overly cheerful.
anxious | Be calm and reassuring. Break things into small steps.
excited | Match their energy. Be enthusiastic.
confused | Explain clearly, use examples, ask clarifying questions.
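The mapping above is essentially a lookup table. The sketch below shows one way it could be encoded; `toneHint` is a hypothetical helper, and returning `null` for moods outside the table is an assumption, since this spec does not say how other moods are handled.

```typescript
// Mood-to-hint table copied from the spec.
const TONE_HINTS = new Map<string, string>([
  ["frustrated", "Be patient, acknowledge difficulty, offer concrete solutions. No cheerful emojis."],
  ["sad", "Be empathetic and supportive. Don't be overly cheerful."],
  ["anxious", "Be calm and reassuring. Break things into small steps."],
  ["excited", "Match their energy. Be enthusiastic."],
  ["confused", "Explain clearly, use examples, ask clarifying questions."],
]);

// Returns the adaptation hint for a mood, or null when the mood is not in
// the table (assumed behaviour: no hint is added to the prompt).
function toneHint(mood: string): string | null {
  return TONE_HINTS.get(mood) ?? null;
}
```

A `Map` is used rather than a plain object so that inherited property names such as `constructor` can never match as a mood.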

3.4 Memory Budget by Relationship Stage

Stage | Token Budget | Context Density
new | 0 tokens | No memories injected (first interaction)
building | ~500 tokens | Basic facts only
established | ~1200 tokens | Facts + preferences + recent episodes
deep | ~2000 tokens | Full context, entity references
fading | ~800 tokens | Key facts to re-engage
dormant | ~200 tokens | Minimal, just name + last topic
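To make the budget table concrete, the sketch below picks memories in order until the stage's budget is used up. It is illustrative only: the budgets are the approximate figures from the table, `estimateTokens` uses a rough 4-characters-per-token heuristic that is an assumption (the real tokenizer is not specified), and the per-stage filtering by memory type described in the "Context Density" column is not modelled.

```typescript
type RelationshipStage = "new" | "building" | "established" | "deep" | "fading" | "dormant";

// Approximate token budgets from the table above.
const MEMORY_BUDGET_TOKENS: Record<RelationshipStage, number> = {
  new: 0,
  building: 500,
  established: 1200,
  deep: 2000,
  fading: 800,
  dormant: 200,
};

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Takes memories in the given order and stops at the first one that would
// exceed the stage's budget. Callers are expected to pre-sort by relevance.
function selectMemories(memories: readonly string[], stage: RelationshipStage): string[] {
  const budget = MEMORY_BUDGET_TOKENS[stage];
  const selected: string[] = [];
  let used = 0;
  for (const memory of memories) {
    const cost = estimateTokens(memory);
    if (used + cost > budget) break;
    selected.push(memory);
    used += cost;
  }
  return selected;
}
```

With stage `new` the budget is 0, so no memory is ever selected, which matches the "No memories injected" row.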

4. Knowledge Base (Handled by AI Brain)

Chat Engine does NOT manage knowledge. AI Brain handles:

  • Upload & ingestion: POST /brain/knowledge → chunk → embed → pgvector
  • Semantic search: During LLM orchestration, Brain searches ai_knowledge_chunks automatically
  • RAG injection: Brain injects top-K chunks into system prompt before LLM call

Chat Engine's role: pass the user message to Brain. Brain handles the rest.


5. Data Flow Summary

Customer message arrives
  ├─ Chat Engine: persist to chat_messages + KV cache
  ├─ Contact Intelligence: GET /context → memories, mood, entities
  ├─ AI Brain: POST /chat → orchestration pipeline:
  │     ├─ Router (intent classification)
  │     ├─ Memory retrieval (per-tenant from ai_memory)
  │     ├─ Knowledge search (RAG from ai_knowledge_chunks)
  │     ├─ Specialist prompt (with contact context + personality)
  │     ├─ LLM streaming response
  │     └─ Tool execution if needed
  ├─ Chat Engine: deliver response to channel
  └─ Contact Intelligence: POST /ingest (async)
        ├─ Mood classification
        ├─ Memory extraction
        ├─ Entity extraction
        └─ Relationship/churn update