Last Updated: 2026-04-07
Status: Active (v2: memory delegated to AI Brain + Contact Intelligence)
1. Architecture Change: Chat Engine No Longer Owns Memory
In v1, the Chat Engine owned its own tri-layer memory (pgvector, Redis, PostgreSQL). In v2, long-term memory is delegated to AI Brain and Contact Intelligence; Chat Engine keeps only short-term session history:
| Memory Type | Service | How Chat Engine Uses It |
|---|---|---|
| Business knowledge (RAG) | AI Brain (ai_knowledge_*) | Brain searches during LLM orchestration |
| Per-tenant business memory | AI Brain (ai_memory) | Brain injects into system prompt |
| Per-contact memory | Contact Intelligence (ci_contact_memory) | Chat Engine calls GET /context/{id} |
| Per-contact mood/state | Contact Intelligence (ci_contact_state) | Returned in context for tone adaptation |
| Per-contact entities | Contact Intelligence (ci_contact_entities) | Returned in context for personalization |
| Session history | Chat Engine (own DB + KV cache) | Only recent messages for thread display |
2. What Chat Engine Still Manages
2.1 Session History (Short-Term)
Chat Engine maintains a sliding window of recent messages per session:
- Hot cache: CF KV with 24h TTL
  - Key: chat:{tenantId}:{sessionId}:history
  - Value: JSON array of last 20 messages (role, content, timestamp)
  - Written on every message, trimmed to 20
- Cold storage: PostgreSQL chat_messages table
  - Permanent append-only ledger
  - Used for Unified Inbox display, analytics, GDPR export
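The hot-cache write path described above can be sketched as follows. This is a minimal illustration, not the actual Chat Engine code: the `ChatMessage` shape, the `"assistant"` role, the ISO timestamp format, and the names `historyKey`, `appendAndTrim`, and `recordMessage` are all assumptions. The `KvStore` interface mirrors only the `get`/`put` subset of a Workers KV binding.

```typescript
// Hypothetical message shape, based on the fields listed above (role, content, timestamp).
interface ChatMessage {
  role: "user" | "assistant"; // "assistant" is an assumption
  content: string;
  timestamp: string; // assumed to be an ISO 8601 string
}

const MAX_HISTORY = 20; // window size stated above
const HISTORY_TTL_SECONDS = 24 * 60 * 60; // 24h TTL stated above

// Builds the KV key in the documented format.
function historyKey(tenantId: string, sessionId: string): string {
  return `chat:${tenantId}:${sessionId}:history`;
}

// Returns a new array with the message appended, keeping only the newest MAX_HISTORY entries.
function appendAndTrim(history: readonly ChatMessage[], message: ChatMessage): ChatMessage[] {
  return [...history, message].slice(-MAX_HISTORY);
}

// Minimal KV surface used by this sketch (assumption: a subset of the KV binding).
interface KvStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options: { expirationTtl: number }): Promise<void>;
}

// Read-modify-write of the session window. Not atomic: concurrent writers could race.
async function recordMessage(
  kv: KvStore,
  tenantId: string,
  sessionId: string,
  message: ChatMessage,
): Promise<ChatMessage[]> {
  const key = historyKey(tenantId, sessionId);
  const raw = await kv.get(key);
  const current: ChatMessage[] = raw ? JSON.parse(raw) : [];
  const next = appendAndTrim(current, message);
  await kv.put(key, JSON.stringify(next), { expirationTtl: HISTORY_TTL_SECONDS });
  return next;
}
```

Because the TTL is passed on every write, it is refreshed each time a message arrives, so in this sketch an active session would not expire mid-conversation.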
2.2 Session History Assembly
When AI Brain needs conversation history:

    // Chat Engine builds history from KV cache or DB fallback
    const history = await getSessionHistory(sessionId)
    // Send to Brain with contact context
    const response = await brain.chat({
      message: userMessage,
      conversation_history: history,
      contact_context: ciContext.context_text, // from Contact Intelligence
      personality: tenantPersonality, // from AI Brain personality config
    })

3. Contact Intelligence Context Flow
3.1 Before AI Response (Fetch Context)

    GET /context/{contact_id}?query={user_message}

Returns:

    {
      "contact": { "displayName": "Arjun", "channel": "whatsapp" },
      "state": {
        "mood": "frustrated",
        "moodConfidence": 0.9,
        "energy": "high",
        "relationshipStage": "building",
        "activeStreak": 5,
        "churnRisk": 0.15
      },
      "memories": [
        { "memoryType": "fact", "content": "Runs a bakery in Mumbai called Sweet Dreams" },
        { "memoryType": "preference", "content": "Prefers short blog posts" }
      ],
      "entities": [
        { "entityType": "pet", "displayName": "Bruno", "metadata": { "species": "dog" } },
        { "entityType": "person", "displayName": "Priya", "metadata": { "relation": "wife" } }
      ],
      "context_text": "Contact: Arjun (whatsapp)\nMood: frustrated...\nMemories:\n- ...",
      "memory_budget": 500
    }

3.2 After AI Response (Ingest Interaction)

    POST /ingest { contact_id, message, role: "user", conversation_id }

Contact Intelligence handles (async, returns 202):
- Mood classification (LLM haiku-class)
- Memory extraction (rules-based, < 10ms)
- Entity extraction (rules-based, < 5ms)
- Embedding generation (fire-and-forget)
- Relationship stage progression
- Churn risk recalculation
- Streak tracking
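Since the ingest endpoint replies with 202 and processes asynchronously, the caller does not need to wait for it. A minimal sketch of that fire-and-forget call follows. The `PostFn` transport, `buildIngestPayload`, `ingestInteraction`, and the `onError` hook are hypothetical names introduced for illustration; only the payload fields and the 202 status come from the description above.

```typescript
// Payload fields as listed in the POST /ingest line above; string types are assumed.
interface IngestPayload {
  contact_id: string;
  message: string;
  role: "user";
  conversation_id: string;
}

// Hypothetical transport: any HTTP client wrapper that resolves with the response status.
type PostFn = (path: string, body: IngestPayload) => Promise<{ status: number }>;

function buildIngestPayload(contactId: string, message: string, conversationId: string): IngestPayload {
  return { contact_id: contactId, message, role: "user", conversation_id: conversationId };
}

// Starts the request and returns immediately; failures go to onError instead of the caller.
function ingestInteraction(
  post: PostFn,
  payload: IngestPayload,
  onError: (err: unknown) => void = () => {},
): void {
  post("/ingest", payload)
    .then((res) => {
      // 202 Accepted is the documented reply; anything else is reported, not thrown.
      if (res.status !== 202) onError(new Error(`unexpected ingest status ${res.status}`));
    })
    .catch(onError);
}
```

Routing errors to a callback keeps a slow or failing Contact Intelligence call from delaying delivery of the AI response, at the cost of the caller never learning whether ingestion succeeded.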
3.3 Mood-Based Tone Adaptation
The context_text includes adaptation hints injected into AI Brain's system prompt:
| Mood | Adaptation |
|---|---|
| frustrated | Be patient, acknowledge difficulty, offer concrete solutions. No cheerful emojis. |
| sad | Be empathetic and supportive. Don't be overly cheerful. |
| anxious | Be calm and reassuring. Break things into small steps. |
| excited | Match their energy. Be enthusiastic. |
| confused | Explain clearly, use examples, ask clarifying questions. |
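The table above is effectively a lookup from mood to prompt hint. A minimal sketch of that lookup, assuming moods outside the table get no hint; `adaptationHint`, `withMoodHint`, and the `Tone:` prefix are illustrative names, not the actual format of `context_text`:

```typescript
// Mood-to-hint pairs copied from the table above.
const MOOD_ADAPTATIONS: ReadonlyMap<string, string> = new Map<string, string>([
  ["frustrated", "Be patient, acknowledge difficulty, offer concrete solutions. No cheerful emojis."],
  ["sad", "Be empathetic and supportive. Don't be overly cheerful."],
  ["anxious", "Be calm and reassuring. Break things into small steps."],
  ["excited", "Match their energy. Be enthusiastic."],
  ["confused", "Explain clearly, use examples, ask clarifying questions."],
]);

// Returns the hint for a mood, or undefined for moods the table does not cover.
function adaptationHint(mood: string): string | undefined {
  return MOOD_ADAPTATIONS.get(mood);
}

// Appends the hint to the context text; the "Tone:" label is a hypothetical format.
function withMoodHint(contextText: string, mood: string): string {
  const hint = adaptationHint(mood);
  return hint ? `${contextText}\nTone: ${hint}` : contextText;
}
```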
3.4 Memory Budget by Relationship Stage
| Stage | Token Budget | Context Density |
|---|---|---|
| new | 0 tokens | No memories injected (first interaction) |
| building | ~500 tokens | Basic facts only |
| established | ~1200 tokens | Facts + preferences + recent episodes |
| deep | ~2000 tokens | Full context, entity references |
| fading | ~800 tokens | Key facts to re-engage |
| dormant | ~200 tokens | Minimal, just name + last topic |
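One way to apply these budgets is to pick memories greedily until the stage's allowance is used up. The sketch below rests on several assumptions: the budgets are treated as hard caps even though the table gives approximate values, token cost is estimated at roughly four characters per token rather than with a real tokenizer, and memories arrive already ordered by priority. `estimateTokens` and `selectMemories` are hypothetical helpers.

```typescript
type RelationshipStage = "new" | "building" | "established" | "deep" | "fading" | "dormant";

// Budgets from the table above, treated here as hard caps.
const MEMORY_BUDGET_TOKENS: Record<RelationshipStage, number> = {
  new: 0,
  building: 500,
  established: 1200,
  deep: 2000,
  fading: 800,
  dormant: 200,
};

// Same fields as the entries in the "memories" array of the context response.
interface Memory {
  memoryType: string;
  content: string;
}

// Rough estimate only: about 4 characters per token (assumption, not a tokenizer).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keeps memories in the given order until the next one would exceed the stage budget.
function selectMemories(stage: RelationshipStage, memories: readonly Memory[]): Memory[] {
  const budget = MEMORY_BUDGET_TOKENS[stage];
  const selected: Memory[] = [];
  let used = 0;
  for (const memory of memories) {
    const cost = estimateTokens(memory.content);
    if (used + cost > budget) break;
    selected.push(memory);
    used += cost;
  }
  return selected;
}
```

The `building` value lines up with the `memory_budget` of 500 shown in the sample context response for a contact in that stage.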
4. Knowledge Base (Handled by AI Brain)
Chat Engine does NOT manage knowledge. AI Brain handles:
- Upload & ingestion: POST /brain/knowledge → chunk → embed → pgvector
- Semantic search: During LLM orchestration, Brain searches ai_knowledge_chunks automatically
- RAG injection: Brain injects top-K chunks into system prompt before LLM call
Chat Engine's role: pass the user message to Brain. Brain handles the rest.
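To make "top-K" concrete, the sketch below ranks chunks by cosine similarity to a query embedding and keeps the best K. It is only an in-memory illustration of the idea: AI Brain is described as searching pgvector, and neither its similarity metric nor its value of K is specified here. The `Chunk` shape, `cosineSimilarity`, and `topK` are hypothetical.

```typescript
// Hypothetical in-memory stand-in for a row of ai_knowledge_chunks.
interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity of two vectors; returns 0 if either has zero length or zero norm.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  const n = Math.min(a.length, b.length);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every chunk against the query, sorts by descending score, and keeps the first k.
function topK(query: readonly number[], chunks: readonly Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

A database-side search would normally use an index instead of scoring every row, so this linear scan is suitable only for reasoning about the behavior.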
5. Data Flow Summary

    Customer message arrives
    │
    ├─ Chat Engine: persist to chat_messages + KV cache
    │
    ├─ Contact Intelligence: GET /context → memories, mood, entities
    │
    ├─ AI Brain: POST /chat → orchestration pipeline:
    │    ├─ Router (intent classification)
    │    ├─ Memory retrieval (per-tenant from ai_memory)
    │    ├─ Knowledge search (RAG from ai_knowledge_chunks)
    │    ├─ Specialist prompt (with contact context + personality)
    │    ├─ LLM streaming response
    │    └─ Tool execution if needed
    │
    ├─ Chat Engine: deliver response to channel
    │
    └─ Contact Intelligence: POST /ingest (async)
         ├─ Mood classification
         ├─ Memory extraction
         ├─ Entity extraction
         └─ Relationship/churn update
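From the Chat Engine's side, the flow above reduces to five ordered steps. The sketch below shows that ordering with every external service injected as a dependency. All names in `PipelineDeps` and `IncomingMessage` are hypothetical, and the Brain reply is simplified to a single string even though the real pipeline streams its response.

```typescript
// Hypothetical dependency surface; none of these names come from the real services.
interface PipelineDeps {
  persistMessage(sessionId: string, content: string): Promise<void>;
  getSessionHistory(sessionId: string): Promise<unknown[]>;
  fetchContactContext(contactId: string, query: string): Promise<{ context_text: string }>;
  brainChat(request: {
    message: string;
    conversation_history: unknown[];
    contact_context: string;
  }): Promise<string>;
  deliver(sessionId: string, reply: string): Promise<void>;
  ingest(contactId: string, message: string, conversationId: string): void;
}

interface IncomingMessage {
  sessionId: string;
  contactId: string;
  conversationId: string;
  text: string;
}

async function handleCustomerMessage(deps: PipelineDeps, msg: IncomingMessage): Promise<string> {
  // 1. Persist the incoming message (chat_messages + KV cache).
  await deps.persistMessage(msg.sessionId, msg.text);
  // 2. Fetch per-contact context from Contact Intelligence.
  const context = await deps.fetchContactContext(msg.contactId, msg.text);
  // 3. Hand the message, recent history, and contact context to AI Brain.
  const history = await deps.getSessionHistory(msg.sessionId);
  const reply = await deps.brainChat({
    message: msg.text,
    conversation_history: history,
    contact_context: context.context_text,
  });
  // 4. Deliver the response to the channel.
  await deps.deliver(msg.sessionId, reply);
  // 5. Kick off ingestion without awaiting it (Contact Intelligence processes asynchronously).
  deps.ingest(msg.contactId, msg.text, msg.conversationId);
  return reply;
}
```

Injecting the services keeps the ordering testable with in-memory fakes and leaves routing, RAG, and tool execution inside AI Brain, where the flow places them.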