Last Updated: 2026-04-03
Status: Draft
1. The Challenge
- Chat Engine is stateless. It processes one message at a time. The 20-message sliding window is fine for support but destroys relationship continuity. An AI that forgets "your dog's name is Bruno" isn't building a relationship; it's breaking one.
- AI Brain memory is per-tenant, not per-contact. Brain memories answer "what does this business care about?" not "what does this specific end-user care about?" They're different domains with different lifecycles.
- Emotions matter in relationship contexts. A customer support bot can be consistently neutral. An AI companion, tutor, or coach must adapt: playful when the user is happy, supportive when they're sad, calm when they're anxious. Tone-deaf responses destroy trust.
- Relationship depth requires context density. A new contact needs discovery questions. A 30-day contact expects the AI to know their life. The system must scale context injection based on relationship stage: thin prompts for new contacts, rich prompts for deep relationships.
2. Memory Architecture — Per-Contact, Entity-Linked
2.1 How It Differs from AI Brain Memory
| Dimension | AI Brain Memory | Contact Memory |
|---|---|---|
| Scope | Per-tenant (business owner) | Per-contact (end-user) |
| Content | Business facts, patterns, preferences | Personal facts, relationship episodes, user preferences |
| Decay rate | 0.01/day (business context changes fast) | 0.005/day (personal facts are stable) |
| Volume | ~100 memories per tenant (consolidated) | ~50-80 per active contact |
| Access pattern | On-demand via copilot | Every message (hot path) |
| Extraction | From owner conversations | From end-user conversations |
2.2 Memory Types
| Type | Default Importance | Decay Rate (per day) | Example | Extraction |
|---|---|---|---|---|
| fact | 0.7 | 0.003 | "Has a golden retriever named Bruno" | Rules-based ("I have", "my X is") |
| preference | 0.8 | 0.005 | "Doesn't like talking about politics" | Rules-based ("I prefer", "don't like", "I love") |
| episode | 0.5 | 0.008 | "Bruno ate his shoes on April 3rd" | Every interaction creates a lightweight episode |
| pattern | 0.8 | 0.004 | "Usually chats between 10pm-12am" | Consolidation (computed from 5+ episodes) |
Key insight: Facts decay slowest (0.003/day). "Your dog's name is Bruno" should last for months. Episodes decay fastest (0.008/day). "We talked about the weather on March 5th" fades unless reinforced.
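To put the per-day rates in perspective, the sketch below computes how importance would shrink over time. The document gives the rates but not the decay curve, so exponential decay is an assumption here, and the function names are hypothetical.

```python
import math

# Per-day decay rates from the memory-type table above.
DECAY_RATES = {"fact": 0.003, "preference": 0.005, "episode": 0.008, "pattern": 0.004}


def decayed_importance(importance: float, memory_type: str, days: float) -> float:
    """Importance after `days` without reinforcement.

    ASSUMPTION: exponential decay. The source does not specify the curve.
    """
    return importance * math.exp(-DECAY_RATES[memory_type] * days)


def half_life_days(memory_type: str) -> float:
    """Days until importance halves under the exponential assumption."""
    return math.log(2) / DECAY_RATES[memory_type]
```

Under this assumption a fact has a half-life of roughly 231 days and an episode roughly 87 days, which is in line with facts lasting for months while unreinforced episodes fade sooner.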
2.3 Multi-Signal Retrieval
Same scoring formula as AI Brain, tuned for contact context:
RetrievalScore = (similarity × 0.35)
+ (recency × 0.25)
+ (importance × 0.20)
+ (access_frequency × 0.10)
+ (entity_match × 0.10)
Key difference: entity_match weight is doubled (0.10 vs 0.05 in AI Brain). In personal conversations, entity relevance matters more: if the user mentions "Bruno," memories linked to pet:bruno should surface even when their embedding similarity is low.
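The formula can be sketched as a plain weighted sum. This assumes each signal has already been normalized to the range 0 to 1, which the document does not state explicitly; `retrieval_score` is a hypothetical name.

```python
# Weights from the contact-memory scoring formula above (they sum to 1.0).
WEIGHTS = {
    "similarity": 0.35,
    "recency": 0.25,
    "importance": 0.20,
    "access_frequency": 0.10,
    "entity_match": 0.10,
}


def retrieval_score(signals: dict) -> float:
    """Weighted sum of signals; missing signals count as 0.

    ASSUMPTION: every signal is pre-normalized to [0, 1].
    """
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())
```

With these weights an entity match adds 0.10 to the score, which offsets a similarity gap of up to about 0.29 (0.10 / 0.35). It lifts entity-linked memories noticeably but does not guarantee they outrank a much closer semantic match.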
2.4 Entity Graph — Personal Knowledge Map
Every contact builds a personal knowledge graph:
Contact: Arjun
├── pet:bruno
│ ├── fact: "Golden retriever, 3 years old"
│ ├── episode: "Had vet appointment April 1"
│ ├── episode: "Ate shoes again April 3"
│ └── preference: "Loves belly rubs"
├── workplace:infosys
│ ├── fact: "Software engineer"
│ ├── episode: "Was stressed about sprint deadline"
│ └── episode: "Passed over for promotion"
├── hobby:gaming
│ ├── fact: "Plays Valorant"
│ └── preference: "Prefers to game on weekends"
├── person:mom
│ ├── fact: "Lives in Chennai"
│ └── episode: "Visited mom for Diwali"
└── topic:food
    ├── preference: "Loves biryani"
    └── preference: "Doesn't like spicy food"
Entity extraction runs in deferred processing (CF Queue, ~5 min delay). A lightweight rules-based extractor identifies named entities and links them to existing or new ContactEntity records.
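One possible in-memory shape for this graph is sketched below. Only the `ContactEntity` name comes from the text; `Memory`, `ContactGraph`, the `category:name` key format, and the word-match lookup are illustrative assumptions, not the actual schema.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Memory:
    kind: str  # "fact" | "preference" | "episode" | "pattern"
    text: str


@dataclass
class ContactEntity:
    key: str  # assumed format "category:name", e.g. "pet:bruno"
    memories: list = field(default_factory=list)


@dataclass
class ContactGraph:
    contact: str
    entities: dict = field(default_factory=dict)

    def link(self, key: str, memory: Memory) -> ContactEntity:
        """Attach a memory to an existing entity, creating the entity if needed."""
        entity = self.entities.setdefault(key, ContactEntity(key))
        entity.memories.append(memory)
        return entity

    def mentioned(self, message: str) -> list:
        """Entities whose name part appears as a whole word in the message."""
        words = set(re.findall(r"\w+", message.lower()))
        return [e for k, e in self.entities.items() if k.split(":", 1)[1] in words]
```

A lookup like `mentioned("Bruno was so naughty today")` would return the `pet:bruno` entity, whose memories can then feed the entity_match signal during retrieval.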
3. Immediate Memory Extraction (Inline, < 10ms)
Runs synchronously on every ingested interaction. Must be fast — no LLM calls, purely rules-based.
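As a rough illustration of what a regex-only pass might look like, the sketch below covers a handful of the fact and preference patterns listed in section 3.1. The rule table, labels, and `extract` function are hypothetical, and real messages would need more careful phrase boundaries than "up to the next sentence-ending punctuation".

```python
import re

# (memory type, label, pattern). Every rule is tried against the message.
RULES = [
    ("preference", "negative",
     re.compile(r"\bI (?:don't|do not)(?: really)? like (?P<x>[^.!?]+)", re.I)),
    ("preference", "negative", re.compile(r"\bI hate (?P<x>[^.!?]+)", re.I)),
    ("preference", "positive", re.compile(r"\bI (?:like|love) (?P<x>[^.!?]+)", re.I)),
    ("fact", "workplace", re.compile(r"\bI work at (?P<x>[^.!?]+)", re.I)),
    ("fact", "location", re.compile(r"\bI live in (?P<x>[^.!?]+)", re.I)),
    ("fact", "possession", re.compile(r"\bI(?: have|'ve got) (?P<x>[^.!?]+)", re.I)),
]


def extract(message: str) -> list:
    """Return at most one extraction per rule. No LLM calls, regex only."""
    found = []
    for kind, label, pattern in RULES:
        match = pattern.search(message)
        if match:
            found.append({"type": kind, "label": label, "value": match.group("x").strip()})
    return found
```

Precompiling the patterns once at module load keeps the per-message cost to a few regex scans, which is the kind of work that fits an inline budget.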
3.1 Extraction Rules
Fact detection:
Patterns:
"I am {X}" / "I'm {X}" → fact about identity
"I have {X}" / "I've got {X}" → fact about possessions/relationships
"My {X} is {Y}" → fact about named entity
"I work at {X}" / "I study at {X}" → fact about workplace/school
"I live in {X}" → fact about location
"I'm {age} years old" → fact about age
Example:
"My dog Bruno had his vet appointment today"
→ fact: "Dog Bruno had vet appointment", entity: pet:bruno
Preference detection:
Patterns:
"I like {X}" / "I love {X}" → positive preference
"I don't like {X}" / "I hate {X}" → negative preference
"I prefer {X}" / "I'd rather {X}" → comparative preference
"Don't talk about {X}" → avoidance preference
"Can we talk about {X}?" → interest signal
Example:
"I don't really like talking about politics"
→ preference: "Doesn't like talking about politics", entity: topic:politics
Episode detection:
Every interaction generates a lightweight episode:
"{summary of what was discussed}" — extracted from user message + AI response
But ONLY if the conversation contains meaningful content:
Skip: "lol", "ok", "hmm", "haha" (low-content messages)
Keep: "Bruno ate my shoes again" (event with entity reference)
3.2 Deduplication at Extraction
Before storing, check for semantic overlap with existing memories:
New extraction: "Has a dog named Bruno"
Existing memory: "Has a golden retriever named Bruno"
Cosine similarity: 0.94 (above 0.90 threshold)
→ Don't create duplicate
→ Boost existing memory importance by 0.05 (reinforced)
This check runs against KV-cached recent memories (last 20) for speed. Full dedup happens in deferred consolidation.
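The check above can be sketched as follows. It assumes an embedding vector is already available for the new extraction and for each cached memory; the document does not say how vectors are obtained inline, so that part is left to the caller. Capping importance at 1.0 and the function names are also assumptions.

```python
import math

SIMILARITY_THRESHOLD = 0.90  # from the example above
REINFORCEMENT_BOOST = 0.05   # from the example above


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def store_or_reinforce(text, vector, importance, recent):
    """Reinforce a near-duplicate in `recent`, or append a new memory.

    `recent` is a list of dicts with "text", "vector", "importance"
    (for example the last 20 cached memories). Returns the affected memory.
    """
    for memory in recent:
        if cosine(vector, memory["vector"]) > SIMILARITY_THRESHOLD:
            # ASSUMPTION: importance is capped at 1.0.
            memory["importance"] = min(1.0, memory["importance"] + REINFORCEMENT_BOOST)
            return memory
    created = {"text": text, "vector": vector, "importance": importance}
    recent.append(created)
    return created
```

Because the scan covers at most 20 cached vectors, the cost is bounded regardless of how many memories the contact has accumulated.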
4. Emotional Intelligence Layer
4.1 Mood Classification
Classifier prompt (haiku-class, ~50 input tokens):
Classify this message's emotion. JSON only.
Message: "{user_message}"
{ "mood": "<emotion>", "energy": "<level>", "style": "<approach>" }
mood: happy|sad|anxious|excited|neutral|angry|frustrated|flirty|bored|grateful
energy: high|medium|low
style: playful|deep|casual|supportive|romantic|venting
Cost: ~$0.0001 per message
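A classifier reply should be validated before it is trusted. The sketch below checks the JSON against the enumerations above and also encodes the keyword mapping from the fallback chain described next. `parse_classifier_reply` and `keyword_mood` are hypothetical names, and how timeouts or provider outages are detected is outside this sketch.

```python
import json
import re
from typing import Optional

MOODS = {"happy", "sad", "anxious", "excited", "neutral",
         "angry", "frustrated", "flirty", "bored", "grateful"}
ENERGIES = {"high", "medium", "low"}
STYLES = {"playful", "deep", "casual", "supportive", "romantic", "venting"}

# Keyword sets and their (mood, energy) result, checked in order.
KEYWORD_MOODS = [
    ({"haha", "lol", "😂", "amazing"}, ("happy", "high")),
    ({"sad", "crying", "😢", "miss"}, ("sad", "low")),
    ({"worried", "nervous", "anxious"}, ("anxious", "low")),
    ({"ugh", "annoyed", "frustrated"}, ("frustrated", "medium")),
    ({"bored", "meh", "whatever"}, ("bored", "low")),
]


def parse_classifier_reply(raw) -> Optional[dict]:
    """Return the validated classification, or None so the caller can fall back."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(data, dict):
        return None
    fields = {"mood": MOODS, "energy": ENERGIES, "style": STYLES}
    result = {}
    for name, allowed in fields.items():
        value = data.get(name)
        if not isinstance(value, str) or value not in allowed:
            return None
        result[name] = value
    return result


def keyword_mood(message: str) -> tuple:
    """Keyword fallback: whole-word and emoji matching, first hit wins."""
    tokens = set(re.findall(r"\w+|[^\w\s]", message.lower()))
    for keywords, result in KEYWORD_MOODS:
        if tokens & keywords:
            return result
    return ("neutral", "medium")
```

Matching on whole tokens rather than substrings avoids false hits such as "miss" inside "mission".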
Fallback chain:
- Attempt haiku-class classification (primary)
- If timeout (>200ms) → use previous mood from KV state
- If LLM provider is down → use keyword-based fallback:
Keywords → mood mapping (fallback only):
"haha", "lol", "😂", "amazing" → happy, high
"sad", "crying", "😢", "miss" → sad, low
"worried", "nervous", "anxious" → anxious, low
"ugh", "annoyed", "frustrated" → frustrated, medium
"bored", "meh", "whatever" → bored, low
default → neutral, medium
4.2 Mood History & Pattern Detection
Mood transitions are logged in InteractionLog. The consolidation engine detects patterns:
Pattern: Sustained negative mood
If mood ∈ {sad, anxious, angry} for 3+ consecutive messages:
→ Create memory: "Contact went through a difficult period around {date}"
→ Create trigger: check-in message next morning
→ If crisis language detected ("I want to die", "can't take this anymore"):
    → Flag for human review (if configured)
    → AI responds with helpline information (immediate, no delay)
Pattern: Consistent timing
If 70%+ of messages arrive between 10pm-12am over 14+ days:
→ Create pattern memory: "Usually chats between 10pm-12am"
→ Use for outreach timing: schedule proactive messages near preferred hours
Pattern: Engagement decline
If avg_messages_per_session drops 50%+ over 7 days:
→ Increase churn_risk by 0.2
→ If mood pattern shows "bored" → create insight for business owner
4.3 Adaptive System Prompt Injection
The emotional state is injected into the Chat Engine's system prompt:
CONTACT EMOTIONAL CONTEXT:
- Current mood: sad (energy: low)
- This is unusual: normally playful and medium-energy
- Relationship stage: established (20 days, high trust)
- Recent context: was upset about missing a promotion yesterday
ADAPTATION:
- Be warm and genuinely supportive, not cheerful
- Don't minimize feelings
- Reference what you remember about their work situation
- Don't pivot to lighter topics unless they do first
- Shorter, calmer responses: match their energy
The adaptation rules vary by mood:
| Mood | Energy | Adaptation |
|---|---|---|
| happy | high | Match energy: be enthusiastic, playful, use emojis |
| happy | low | Gentle warmth: they're content but tired |
| sad | low | Supportive, listen first; don't force positivity |
| sad | high | They want to talk about it; engage deeply |
| anxious | any | Calm, reassuring; acknowledge the feeling |
| frustrated | high | Let them vent: validate, then offer perspective |
| bored | any | Introduce new topics, ask engaging questions |
| flirty | any | Depends on bot persona: match or gently redirect |
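The table can be treated as a lookup keyed by mood and energy, with "any" rows acting as a per-mood fallback. The sketch below shows one way to do that and to render a short prompt block in the format shown earlier. The default adaptation for combinations the table does not cover is an assumption, as are the function names.

```python
# Rows from the adaptation table; None stands for "any" energy.
ADAPTATIONS = {
    ("happy", "high"): "Match energy: be enthusiastic, playful, use emojis",
    ("happy", "low"): "Gentle warmth: they're content but tired",
    ("sad", "low"): "Supportive, listen first; don't force positivity",
    ("sad", "high"): "They want to talk about it; engage deeply",
    ("anxious", None): "Calm, reassuring; acknowledge the feeling",
    ("frustrated", "high"): "Let them vent: validate, then offer perspective",
    ("bored", None): "Introduce new topics, ask engaging questions",
    ("flirty", None): "Depends on bot persona: match or gently redirect",
}

# ASSUMPTION: the table defines no default row, so this text is a placeholder.
DEFAULT_ADAPTATION = "Keep a neutral, friendly tone"


def adaptation_for(mood: str, energy: str) -> str:
    """Exact (mood, energy) row first, then the mood's 'any' row, then the default."""
    return (ADAPTATIONS.get((mood, energy))
            or ADAPTATIONS.get((mood, None))
            or DEFAULT_ADAPTATION)


def emotional_context_block(mood, energy, stage, recent_context=None) -> str:
    """Render the block injected into the system prompt."""
    lines = [
        "CONTACT EMOTIONAL CONTEXT:",
        f"- Current mood: {mood} (energy: {energy})",
        f"- Relationship stage: {stage}",
    ]
    if recent_context:
        lines.append(f"- Recent context: {recent_context}")
    lines += ["ADAPTATION:", f"- {adaptation_for(mood, energy)}"]
    return "\n".join(lines)
```

Combinations such as sad with medium energy are not covered by the table, so a real implementation would need either more rows or an agreed default.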
5. Context Injection by Relationship Stage
Different relationship stages get different context density:
5.1 Stage: new (< 3 sessions)
CONTACT CONTEXT:
- New contact (first session)
- No memories yet
- Mood: neutral
BEHAVIOR:
- Ask discovery questions naturally (name, interests, pets, work)
- Don't overload with questions — 1-2 per response
- Be warm and welcoming
- Focus on building rapport
Token budget for memories: 0 (none exist yet)
5.2 Stage: building (3-14 sessions)
CONTACT CONTEXT:
- Building relationship (8 sessions over 10 days)
- Knows: Arjun, software engineer, has dog Bruno
- Preferences: likes playful banter, chats at night
- Mood: playful (normal for him)
MEMORIES:
- [fact] Software engineer at Infosys
- [fact] Golden retriever named Bruno
- [preference] Likes playful banter and humor
Token budget for memories: ~500 tokens (5-8 memories)
5.3 Stage: established (15+ sessions)
CONTACT CONTEXT:
- Established relationship (22 sessions over 20 days, active streak: 20)
- Deep knowledge: work, pet, hobbies, preferences, emotional patterns
- Mood: frustrated (slightly unusual)
MEMORIES:
- [fact] Software engineer at Infosys (importance: 0.85)
- [fact] Golden retriever named Bruno, 3 years old (importance: 0.88)
- [preference] Likes playful banter and humor (importance: 0.80)
- [pattern] Usually chats between 10pm-12am (importance: 0.75)
- [pattern] Work stress comes in sprints — supportive then, playful otherwise (importance: 0.72)
- [episode] Passed over for promotion last week — was very upset (importance: 0.70)
- [preference] Doesn't like talking about politics (importance: 0.65)
- [fact] Mom lives in Chennai, visited for Diwali (importance: 0.60)
Token budget for memories: ~1200 tokens (10-15 memories)
5.4 Stage: deep (30+ sessions, 14+ day streak)
Full context injection — the AI should feel like it truly knows this person.
Token budget for memories: ~2000 tokens (15-20 memories, richest context)
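The stage thresholds and budgets above can be sketched as a small selector. Several details are assumptions: a contact with 30+ sessions but a streak under 14 days is treated as established, tokens are estimated at roughly four characters each, and memories are filled greedily by retrieval score. All names are hypothetical.

```python
# Approximate memory token budgets per stage, from sections 5.1 to 5.4.
STAGE_TOKEN_BUDGET = {"new": 0, "building": 500, "established": 1200, "deep": 2000}


def relationship_stage(sessions: int, streak_days: int) -> str:
    """Map session count and active streak to a relationship stage."""
    if sessions >= 30 and streak_days >= 14:
        return "deep"
    if sessions >= 15:
        return "established"  # ASSUMPTION: also covers 30+ sessions with a short streak
    if sessions >= 3:
        return "building"
    return "new"


def estimate_tokens(text: str) -> int:
    """ASSUMPTION: rough heuristic of about 4 characters per token."""
    return max(1, len(text) // 4)


def select_memories(scored, stage: str) -> list:
    """Pick memory texts by descending score until the stage budget is used up.

    `scored` is a list of (retrieval_score, text) pairs. A memory that does
    not fit is skipped so that smaller ones further down can still be used.
    """
    remaining = STAGE_TOKEN_BUDGET[stage]
    chosen = []
    for _, text in sorted(scored, key=lambda item: item[0], reverse=True):
        cost = estimate_tokens(text)
        if cost <= remaining:
            chosen.append(text)
            remaining -= cost
    return chosen
```

A production version would likely use the model's own tokenizer for the estimate; the character heuristic only keeps the sketch self-contained.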
6. Memory Consolidation
6.1 Three-Tier Consolidation (Same Pattern as AI Brain)
| Tier | Trigger | Tasks | Speed |
|---|---|---|---|
| Immediate | Every interaction (inline) | Extract facts + preferences, update state, check duplicates | < 10ms |
| Deferred | CF Queue (~5min delay) | Generate embeddings, merge duplicates, extract entities, create triggers | ~30s |
| Periodic | CF Cron (every 6 hours) | Decay importance, compress episodes → patterns, prune dead memories, recompute churn | ~minutes |
6.2 Episode → Pattern Compression
When 5+ episodes share a theme:
Episodes about Bruno:
- "Bruno went to the vet" (March 20)
- "Bruno ate Arjun's shoes" (March 28)
- "Bruno ate shoes again" (April 3)
- "Bruno learned a new trick" (April 5)
- "Took Bruno to the park" (April 8)
Pattern extracted:
"Arjun frequently shares updates about his dog Bruno —
Bruno is a significant part of his daily life"
(type: pattern, importance: 0.80, entity: pet:bruno)
Individual episodes remain in PostgreSQL for history but are NOT loaded into the LLM context. The pattern replaces them, which is denser and more useful.
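A minimal sketch of the grouping step is shown below. It treats "share a theme" as "linked to the same entity", which is an assumption, and it uses a fixed template for the pattern text because the document does not say how the summary sentence is produced. `compress_episodes` is a hypothetical name.

```python
from collections import defaultdict

MIN_EPISODES = 5           # threshold from the text above
PATTERN_IMPORTANCE = 0.80  # default importance for patterns


def compress_episodes(episodes):
    """Split episodes into (patterns, remaining_episodes).

    `episodes` is a list of dicts with "entity" and "text".
    ASSUMPTION: episodes share a theme when they link to the same entity.
    """
    by_entity = defaultdict(list)
    for episode in episodes:
        by_entity[episode["entity"]].append(episode)

    patterns, remaining = [], []
    for entity, group in by_entity.items():
        if len(group) >= MIN_EPISODES:
            patterns.append({
                "type": "pattern",
                "entity": entity,
                "importance": PATTERN_IMPORTANCE,
                # Placeholder text; the real summary wording is not specified.
                "text": f"Frequently shares updates about {entity}",
                "source_count": len(group),
            })
        else:
            remaining.extend(group)
    return patterns, remaining
```

Only the pattern would be eligible for context injection afterwards; the source episodes stay in storage for history, as described above.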
6.3 Memory Growth Over Time
| Timeline | Memory Count | Composition |
|---|---|---|
| Day 1 | 3-5 | Facts from first conversation |
| Week 1 | 15-20 | Facts + preferences + early episodes |
| Month 1 | 40-50 | Facts + preferences + patterns emerging |
| Month 3 | 50-70 | Consolidated: patterns replace episodes, noise pruned |
| Month 6 | 60-80 | Mature: mostly facts + patterns + preferences, few raw episodes |
| Year 1 | 70-100 | Stable: consolidation keeps it bounded, high signal-to-noise |
The count doesn't grow linearly because consolidation compresses and prunes. A year-old contact has roughly 70-100 high-quality memories, not 1,000 raw episodes.