Memory Management Specification · Chat Engine

Last Updated: 2026-04-07
Status: Active (v2: memory delegated to AI Brain + Contact Intelligence)

1. Architecture Change: Chat Engine No Longer Owns Memory

In v1, the Chat Engine owned its own tri-layer memory (pgvector, Redis, PostgreSQL). In v2, long-term memory is delegated to AI Brain and Contact Intelligence; Chat Engine keeps only short-term session history:

Memory Type	Service	How Chat Engine Uses It
Business knowledge (RAG)	AI Brain (`ai_knowledge_*`)	Brain searches during LLM orchestration
Per-tenant business memory	AI Brain (`ai_memory`)	Brain injects into system prompt
Per-contact memory	Contact Intelligence (`ci_contact_memory`)	Chat Engine calls `GET /context/{contact_id}`
Per-contact mood/state	Contact Intelligence (`ci_contact_state`)	Returned in context for tone adaptation
Per-contact entities	Contact Intelligence (`ci_contact_entities`)	Returned in context for personalization
Session history	Chat Engine (own DB + KV cache)	Recent messages only, for thread display and as conversation history sent to Brain

2. What Chat Engine Still Manages

2.1 Session History (Short-Term)

Chat Engine maintains a sliding window of recent messages per session:

Hot cache: CF KV with 24h TTL
- Key: chat:{tenantId}:{sessionId}:history
- Value: JSON array of the last 20 messages (role, content, timestamp)
- Rewritten on every message and trimmed to the 20 most recent
Cold storage: PostgreSQL chat_messages table
- Permanent append-only ledger
- Used for Unified Inbox display, analytics, GDPR export
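
The sliding window described above can be sketched as follows. This is a minimal illustration rather than the Chat Engine's actual code: `ChatMessage`, `historyKey`, and `appendToWindow` are hypothetical names, the ISO timestamp format is an assumption, and the KV write itself (with its 24h TTL) is left out.

```typescript
// Hypothetical message shape; the spec only lists role, content, timestamp.
interface ChatMessage {
  role: "user" | "assistant"; // assumed set of roles
  content: string;
  timestamp: string;          // assumed to be an ISO 8601 string
}

// Window size taken from the spec ("last 20 messages").
const MAX_HISTORY = 20;

// Builds the KV key in the documented chat:{tenantId}:{sessionId}:history format.
function historyKey(tenantId: string, sessionId: string): string {
  return `chat:${tenantId}:${sessionId}:history`;
}

// Returns a new array with the message appended, keeping only the newest `max` entries.
function appendToWindow(
  history: readonly ChatMessage[],
  message: ChatMessage,
  max: number = MAX_HISTORY,
): ChatMessage[] {
  return [...history, message].slice(-max);
}
```

The trimmed array would then be serialized with `JSON.stringify` and written under `historyKey(tenantId, sessionId)`.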

2.2 Session History Assembly

For each AI request, Chat Engine assembles the conversation history and sends it to AI Brain together with the contact context:

// Chat Engine builds history from the KV cache, falling back to the DB on a cache miss
const history = await getSessionHistory(sessionId);

// ciContext is the Contact Intelligence response from GET /context/{contact_id} (section 3.1)
const response = await brain.chat({
  message: userMessage,
  conversation_history: history,
  contact_context: ciContext.context_text,  // from Contact Intelligence
  personality: tenantPersonality,           // from AI Brain personality config
});
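
One way the KV-then-DB fallback behind `getSessionHistory` might look is sketched below. The `KvStore` and `MessageStore` interfaces are hypothetical stand-ins for the KV binding and the chat_messages query layer. Passing `tenantId` alongside `sessionId` is an assumption based on the key format in 2.1, and re-warming the cache after a miss is likewise an assumption.

```typescript
interface HistoryMessage {
  role: string;
  content: string;
  timestamp: string;
}

// Hypothetical minimal interfaces; the real KV and DB clients will differ.
interface KvStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, ttlSeconds: number): Promise<void>;
}

interface MessageStore {
  // Expected to return the newest `limit` messages of a session, oldest first.
  recentMessages(sessionId: string, limit: number): Promise<HistoryMessage[]>;
}

const CACHE_WINDOW = 20;                // last 20 messages, per the spec
const CACHE_TTL_SECONDS = 24 * 60 * 60; // 24h TTL, per the spec

async function loadSessionHistory(
  kv: KvStore,
  db: MessageStore,
  tenantId: string,
  sessionId: string,
): Promise<HistoryMessage[]> {
  const key = `chat:${tenantId}:${sessionId}:history`;

  // Hot path: the cached JSON array is used as-is.
  const cached = await kv.get(key);
  if (cached !== null) {
    return JSON.parse(cached) as HistoryMessage[];
  }

  // Cache miss (for example after the TTL expired): rebuild from cold storage.
  const fromDb = await db.recentMessages(sessionId, CACHE_WINDOW);

  // Assumption: the cache is re-warmed so the next read takes the hot path.
  await kv.put(key, JSON.stringify(fromDb), CACHE_TTL_SECONDS);
  return fromDb;
}
```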

3. Contact Intelligence Context Flow

3.1 Before AI Response (Fetch Context)

GET /context/{contact_id}?query={user_message}

Returns:

{
  "contact": { "displayName": "Arjun", "channel": "whatsapp" },
  "state": {
    "mood": "frustrated",
    "moodConfidence": 0.9,
    "energy": "high",
    "relationshipStage": "building",
    "activeStreak": 5,
    "churnRisk": 0.15
  },
  "memories": [
    { "memoryType": "fact", "content": "Runs a bakery in Mumbai called Sweet Dreams" },
    { "memoryType": "preference", "content": "Prefers short blog posts" }
  ],
  "entities": [
    { "entityType": "pet", "displayName": "Bruno", "metadata": { "species": "dog" } },
    { "entityType": "person", "displayName": "Priya", "metadata": { "relation": "wife" } }
  ],
  "context_text": "Contact: Arjun (whatsapp)\nMood: frustrated...\nMemories:\n- ...",
  "memory_budget": 500
}
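
For callers, the response above could be typed roughly as follows. The field names mirror the sample response, but which fields are optional and what values they can take are assumptions. `buildContextUrl` is a hypothetical helper that shows the user message being percent-encoded before it is placed in the `query` parameter.

```typescript
// Shape inferred from the sample response; not an authoritative schema.
interface ContactContext {
  contact: { displayName: string; channel: string };
  state: {
    mood: string;
    moodConfidence: number;
    energy: string;
    relationshipStage: string;
    activeStreak: number;
    churnRisk: number;
  };
  memories: { memoryType: string; content: string }[];
  entities: { entityType: string; displayName: string; metadata: Record<string, string> }[];
  context_text: string;
  memory_budget: number;
}

// Builds the request URL, percent-encoding both the contact id and the user message.
function buildContextUrl(baseUrl: string, contactId: string, userMessage: string): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const id = encodeURIComponent(contactId);
  const query = encodeURIComponent(userMessage);
  return `${base}/context/${id}?query=${query}`;
}
```

Without encoding, characters such as `?`, `&`, or `#` in the user message would change the meaning of the URL.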

3.2 After AI Response (Ingest Interaction)

POST /ingest { contact_id, message, role: "user", conversation_id }

Contact Intelligence acknowledges the request with 202 and handles the following asynchronously:

Mood classification (small, Haiku-class LLM)
Memory extraction (rules-based, < 10ms)
Entity extraction (rules-based, < 5ms)
Embedding generation (fire-and-forget)
Relationship stage progression
Churn risk recalculation
Streak tracking
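
On the Chat Engine side, the ingest call can be fired without blocking reply delivery. The sketch below assumes a hypothetical `PostFn` transport that resolves with the HTTP status code. `ingestInBackground` and the `onError` hook are illustrative names, and the `"assistant"` role is an assumption, since the spec only shows `"user"`.

```typescript
interface IngestPayload {
  contact_id: string;
  message: string;
  role: "user" | "assistant"; // the spec only shows "user"
  conversation_id: string;
}

// Hypothetical transport: resolves with the HTTP status code of the response.
type PostFn = (path: string, body: IngestPayload) => Promise<number>;

// Starts the ingest request and returns immediately; failures go to onError
// instead of propagating into the message-handling path.
function ingestInBackground(
  post: PostFn,
  payload: IngestPayload,
  onError: (error: unknown) => void = () => {},
): void {
  post("/ingest", payload)
    .then((status) => {
      // Anything other than 202 Accepted is treated as a failure.
      if (status !== 202) {
        throw new Error(`unexpected ingest status ${status}`);
      }
    })
    .catch(onError);
}
```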

3.3 Mood-Based Tone Adaptation

The context_text field includes a mood-specific adaptation hint, which is injected into AI Brain's system prompt:

Mood	Adaptation
frustrated	Be patient, acknowledge difficulty, offer concrete solutions. No cheerful emojis.
sad	Be empathetic and supportive. Don't be overly cheerful.
anxious	Be calm and reassuring. Break things into small steps.
excited	Match their energy. Be enthusiastic.
confused	Explain clearly, use examples, ask clarifying questions.
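
The table above amounts to a lookup from mood to hint. A minimal sketch follows, assuming the lookup lives wherever context_text is assembled. `MOOD_HINTS` and `moodHint` are hypothetical names, and returning an empty string for moods not in the table is an assumption.

```typescript
type Mood = "frustrated" | "sad" | "anxious" | "excited" | "confused";

// Hint texts copied from the adaptation table.
const MOOD_HINTS: Record<Mood, string> = {
  frustrated: "Be patient, acknowledge difficulty, offer concrete solutions. No cheerful emojis.",
  sad: "Be empathetic and supportive. Don't be overly cheerful.",
  anxious: "Be calm and reassuring. Break things into small steps.",
  excited: "Match their energy. Be enthusiastic.",
  confused: "Explain clearly, use examples, ask clarifying questions.",
};

// Returns the hint for a known mood, or "" when the mood has no table entry (assumed behavior).
function moodHint(mood: string): string {
  return Object.prototype.hasOwnProperty.call(MOOD_HINTS, mood)
    ? MOOD_HINTS[mood as Mood]
    : "";
}
```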

3.4 Memory Budget by Relationship Stage

Stage	Token Budget	Context Density
new	0 tokens	No memories injected (first interaction)
building	~500 tokens	Basic facts only
established	~1200 tokens	Facts + preferences + recent episodes
deep	~2000 tokens	Full context, entity references
fading	~800 tokens	Key facts to re-engage
dormant	~200 tokens	Minimal, just name + last topic
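
To illustrate how a stage budget could cap injected memories, the sketch below picks memories greedily until the budget is spent. It rests on several assumptions: tokens are estimated at roughly four characters each (a real tokenizer would differ), memories arrive already ordered by priority, and the per-stage density rules (facts only, facts plus preferences, and so on) are ignored. `STAGE_BUDGET`, `estimateTokens`, and `selectMemories` are hypothetical names.

```typescript
type RelationshipStage = "new" | "building" | "established" | "deep" | "fading" | "dormant";

// Approximate budgets from the table above.
const STAGE_BUDGET: Record<RelationshipStage, number> = {
  new: 0,
  building: 500,
  established: 1200,
  deep: 2000,
  fading: 800,
  dormant: 200,
};

// Crude estimate: about 4 characters per token (assumption).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keeps memories in their given order until the next one would exceed the budget.
function selectMemories(stage: RelationshipStage, memories: readonly string[]): string[] {
  let remaining = STAGE_BUDGET[stage];
  const selected: string[] = [];
  for (const memory of memories) {
    const cost = estimateTokens(memory);
    if (cost > remaining) break;
    selected.push(memory);
    remaining -= cost;
  }
  return selected;
}
```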

4. Knowledge Base (Handled by AI Brain)

Chat Engine does NOT manage knowledge. AI Brain handles:

Upload & ingestion: POST /brain/knowledge → chunk → embed → pgvector
Semantic search: During LLM orchestration, Brain searches ai_knowledge_chunks automatically
RAG injection: Brain injects top-K chunks into system prompt before LLM call

Chat Engine's role: pass the user message to Brain. Brain handles the rest.

5. Data Flow Summary

Customer message arrives
  │
  ├─ Chat Engine: persist to chat_messages + KV cache
  │
  ├─ Contact Intelligence: GET /context/{contact_id} → memories, mood, entities
  │
  ├─ AI Brain: POST /chat → orchestration pipeline:
  │     ├─ Router (intent classification)
  │     ├─ Memory retrieval (per-tenant from ai_memory)
  │     ├─ Knowledge search (RAG from ai_knowledge_chunks)
  │     ├─ Specialist prompt (with contact context + personality)
  │     ├─ LLM streaming response
  │     └─ Tool execution if needed
  │
  ├─ Chat Engine: deliver response to channel
  │
  └─ Contact Intelligence: POST /ingest (async)
        ├─ Mood classification
        ├─ Memory extraction
        ├─ Entity extraction
        └─ Relationship/churn update
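
The Chat Engine's side of this flow can be summarized in a short orchestration sketch. Everything here is hypothetical: `TurnDeps` and `handleTurn` are illustrative names, each dependency stands in for a real client, and conversation history, personality, streaming, and tool execution are omitted for brevity.

```typescript
// Hypothetical dependencies, one per step of the flow above.
interface TurnDeps {
  persist(message: string): Promise<void>;
  fetchContext(contactId: string, query: string): Promise<{ context_text: string }>;
  askBrain(input: { message: string; contact_context: string }): Promise<string>;
  deliver(reply: string): Promise<void>;
  ingest(contactId: string, message: string): Promise<void>;
}

async function handleTurn(deps: TurnDeps, contactId: string, message: string): Promise<string> {
  // 1. Persist the incoming message (chat_messages + KV cache).
  await deps.persist(message);

  // 2. Fetch per-contact context from Contact Intelligence.
  const context = await deps.fetchContext(contactId, message);

  // 3. Let AI Brain run its orchestration pipeline.
  const reply = await deps.askBrain({ message, contact_context: context.context_text });

  // 4. Deliver the response to the channel.
  await deps.deliver(reply);

  // 5. Ingest asynchronously: started but not awaited, so it cannot delay the turn.
  deps.ingest(contactId, message).catch(() => {
    // A real implementation would log the failure here.
  });

  return reply;
}
```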