Last Updated: 2026-04-03 Status: Draft
1. High-Level Design (HLD)
1.1 Core Architectural Components
| Component | Responsibility | Technology |
|---|---|---|
| Contact Intelligence Service | API layer — context retrieval, ingestion, outreach management | Hono on Cloudflare Workers |
| Mood Classifier | Detect emotional state from message text per interaction | Haiku-class LLM (~50 input / ~10 output tokens, runs in parallel with the context fetch) |
| Memory Store | Per-contact long-term memory with entity linking | Neon PostgreSQL + pgvector |
| State Manager | Hot contact state (mood, streak, stage) for fast reads | CF KV + PostgreSQL |
| Outreach Engine | Fire proactive triggers — scheduled, inactivity, milestones | CF Cron Triggers + CF Queues |
| Consolidation Engine | Merge duplicates, compress episodes, decay stale memories | CF Queues (deferred) + Cron (periodic) |
| Analytics Aggregator | Engagement metrics, churn risk scoring, retention computation | PostgreSQL views + Cron |
1.2 Overall System Diagram (HLD)
2. Low-Level Design (LLD)
2.1 Request Lifecycle — Context Retrieval (Hot Path)
This is the critical hot path — called on every incoming message. Must be fast (< 30ms).
Performance budget:
| Operation | Budget | Notes |
|---|---|---|
| KV state read | < 5ms | Edge-cached, single key lookup |
| pgvector query | < 20ms | HNSW index, tenant+contact filtered |
| Mood classifier | < 100ms | Haiku-class, but runs in parallel |
| Total response | < 30ms | KV and pgvector are the gates (mood is async-optional) |
Key design choice: The mood classifier runs in parallel with KV + pgvector reads. If it's slower than the memory fetch, we return the previous mood from KV state and update it asynchronously. The chat engine never waits for mood classification.
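A minimal sketch of this hot path follows. It assumes hypothetical `readState` / `searchMemories` / `classifyMood` / `writeMood` dependencies; none of these names come from the service itself. The response waits only on the KV and pgvector reads. The classifier result is used only if it has already arrived, and the mood write-back is returned as a promise that, on Cloudflare Workers, would typically be handed to `ctx.waitUntil()`.

```typescript
// Hypothetical shapes for illustration; the real service types may differ.
type Mood = { mood: string; energy: string; style: string }

interface ContactSnapshot {
  mood: Mood // previous mood, stored in KV state
  stage: string
}

interface HotPathDeps {
  readState(contactId: string): Promise<ContactSnapshot> // KV single-key read
  searchMemories(contactId: string, text: string): Promise<string[]> // pgvector HNSW query
  classifyMood(text: string): Promise<Mood> // Haiku-class LLM call
  writeMood(contactId: string, mood: Mood): Promise<void> // async KV update
}

interface ContextResult {
  state: ContactSnapshot
  memories: string[]
  mood: Mood
  moodIsFresh: boolean
  background: Promise<void> // mood write-back; hand to ctx.waitUntil() on Workers
}

async function getContext(deps: HotPathDeps, contactId: string, text: string): Promise<ContextResult> {
  // Start the classifier first; a failure resolves to undefined instead of rejecting.
  const classified: { mood?: Mood } = {}
  const moodPromise: Promise<Mood | undefined> = deps.classifyMood(text).then(
    (m) => {
      classified.mood = m
      return m
    },
    () => undefined,
  )

  // KV and pgvector are the gates: the response waits only for these two reads.
  const [state, memories] = await Promise.all([
    deps.readState(contactId),
    deps.searchMemories(contactId, text),
  ])

  // Use the fresh mood only if it already arrived; otherwise fall back to KV state.
  const mood = classified.mood ?? state.mood
  const background = moodPromise.then(async (m) => {
    if (m) await deps.writeMood(contactId, m)
  })

  return { state, memories, mood, moodIsFresh: classified.mood !== undefined, background }
}
```

Either way the classifier's result is written back, so the next message sees a mood that is at most one message old.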
2.2 Request Lifecycle — Ingestion (Post-Response)
After the Chat Engine generates a response, it sends the interaction back for processing:
Why split immediate vs deferred?
The Chat Engine needs a fast acknowledgment (< 50ms). Embedding generation, duplicate merging, and trigger creation are not time-sensitive, so they can be processed in the background.
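A minimal sketch of the immediate/deferred split, assuming a hypothetical `IngestDeps` interface: `insertInteraction` and `touchState` stand in for the PostgreSQL and hot-state writes, and `enqueue` for a queue producer (in a Worker this could wrap a Cloudflare Queues binding's `sendBatch`). The job kinds are illustrative names for the three deferred tasks listed above.

```typescript
// Illustrative shapes; the real interaction schema is not specified here.
interface Interaction {
  tenantId: string
  contactId: string
  userMessage: string
  aiResponse: string
  at: Date
}

// One job per deferred task: embedding, duplicate merging, trigger creation.
type DeferredJob =
  | { kind: "embed"; interactionId: string }
  | { kind: "merge_duplicates"; tenantId: string; contactId: string }
  | { kind: "create_triggers"; tenantId: string; contactId: string }

interface IngestDeps {
  insertInteraction(i: Interaction): Promise<string> // immediate PostgreSQL write, returns row id
  touchState(tenantId: string, contactId: string, at: Date): Promise<void> // hot state, e.g. last_active_at
  enqueue(jobs: DeferredJob[]): Promise<void> // queue producer
}

async function ingest(deps: IngestDeps, i: Interaction): Promise<{ interactionId: string }> {
  // Immediate path: only the writes the next message depends on, run concurrently.
  const [interactionId] = await Promise.all([
    deps.insertInteraction(i),
    deps.touchState(i.tenantId, i.contactId, i.at),
  ])

  // Deferred path: one batched enqueue; a queue consumer does the slow work later.
  await deps.enqueue([
    { kind: "embed", interactionId },
    { kind: "merge_duplicates", tenantId: i.tenantId, contactId: i.contactId },
    { kind: "create_triggers", tenantId: i.tenantId, contactId: i.contactId },
  ])

  // Acknowledge as soon as the queue has accepted the jobs.
  return { interactionId }
}
```

The raw interaction is durable before the acknowledgment, so a slow or failed consumer delays embeddings but loses nothing.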
2.3 Mood Classification Pipeline
Challenge: Running an LLM call on every message adds latency and cost. Must be ultra-lightweight.
Solution: Haiku-class model with a tiny prompt (~50 input tokens, ~10 output tokens).
Prompt:
Classify the mood from this message. Respond with JSON only.
Message: "{user_message}"
Output: { "mood": "<emotion>", "energy": "<level>", "style": "<approach>" }
mood: happy, sad, anxious, excited, neutral, angry, frustrated, flirty, bored, grateful
energy: high, medium, low
style: playful, deep, casual, supportive, romantic, venting
Cost: ~$0.0001 per classification. At 10,000 messages/day, that is ~$1/day.
Fallback: If the classifier fails or times out, use the previous mood from KV state. Mood doesn't change drastically between adjacent messages — staleness of one message is acceptable.
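A minimal sketch of building the prompt and enforcing the fallback, assuming the classifier returns raw text and that a failure or timeout is reported as `null` (both assumptions). `buildMoodPrompt` and `parseMood` are hypothetical names. Quoting the message with `JSON.stringify` keeps a user message containing `"` from breaking the prompt, and any output outside the fixed label sets is treated like a failure.

```typescript
// Allowed labels, copied from the prompt above.
const MOODS = ["happy", "sad", "anxious", "excited", "neutral", "angry", "frustrated", "flirty", "bored", "grateful"] as const
const ENERGIES = ["high", "medium", "low"] as const
const STYLES = ["playful", "deep", "casual", "supportive", "romantic", "venting"] as const

type Mood = {
  mood: (typeof MOODS)[number]
  energy: (typeof ENERGIES)[number]
  style: (typeof STYLES)[number]
}

// JSON.stringify quotes and escapes the user text before it is embedded in the prompt.
function buildMoodPrompt(userMessage: string): string {
  return [
    "Classify the mood from this message. Respond with JSON only.",
    `Message: ${JSON.stringify(userMessage)}`,
    'Output: { "mood": "<emotion>", "energy": "<level>", "style": "<approach>" }',
    `mood: ${MOODS.join(", ")}`,
    `energy: ${ENERGIES.join(", ")}`,
    `style: ${STYLES.join(", ")}`,
  ].join("\n")
}

function isOneOf<T extends string>(allowed: readonly T[], value: unknown): value is T {
  return typeof value === "string" && (allowed as readonly string[]).includes(value)
}

// raw === null means the classifier call failed or timed out.
function parseMood(raw: string | null, previous: Mood): { mood: Mood; fresh: boolean } {
  const fallback = { mood: previous, fresh: false }
  if (raw === null) return fallback
  let data: unknown
  try {
    data = JSON.parse(raw)
  } catch {
    return fallback // not valid JSON
  }
  if (typeof data !== "object" || data === null) return fallback
  const { mood, energy, style } = data as Record<string, unknown>
  // Reject labels outside the fixed vocabulary instead of passing them downstream.
  if (isOneOf(MOODS, mood) && isOneOf(ENERGIES, energy) && isOneOf(STYLES, style)) {
    return { mood: { mood, energy, style }, fresh: true }
  }
  return fallback
}
```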
2.4 Outreach Engine — Proactive Message Delivery
Rate limiting rules (non-negotiable):
| Rule | Enforcement |
|---|---|
| Max 2 re-engagement messages per inactivity period | Counter on contact_triggers |
| Min 3 days between inactivity messages | last_outreach_at check |
| Max 1 recurring message per day | Dedup by contact_id + trigger_type + date |
| If contact says "stop" → disable all outreach permanently | outreach_disabled: true on contact_state |
| If contact dormant (30+ days) → stop all outreach | relationship_stage: "dormant" check |
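A minimal sketch of the application-level rate check that runs before each send. The trigger types and the `reengagement_count` field (standing in for the "counter on contact_triggers") are hypothetical; the other field names follow the table. The database unique constraint stays the backstop.

```typescript
// Hypothetical trigger types and state shape for illustration.
type TriggerType = "inactivity" | "recurring" | "milestone"

interface OutreachState {
  contact_id: string
  outreach_disabled: boolean
  relationship_stage: string
  reengagement_count: number // re-engagement sends in the current inactivity period (assumed column)
  last_outreach_at: Date | null
}

type Decision = { ok: true } | { ok: false; reason: string }

const DAY_MS = 24 * 60 * 60 * 1000

// contact_id + trigger_type + UTC calendar date
function dedupKey(contactId: string, type: TriggerType, at: Date): string {
  return `${contactId}:${type}:${at.toISOString().slice(0, 10)}`
}

function canSendOutreach(s: OutreachState, type: TriggerType, now: Date, sentToday: ReadonlySet<string>): Decision {
  // Global stops apply to every trigger type.
  if (s.outreach_disabled) return { ok: false, reason: "contact opted out" }
  if (s.relationship_stage === "dormant") return { ok: false, reason: "contact is dormant" }

  if (type === "inactivity") {
    if (s.reengagement_count >= 2) return { ok: false, reason: "max 2 per inactivity period" }
    const sinceLast = s.last_outreach_at ? now.getTime() - s.last_outreach_at.getTime() : Infinity
    if (sinceLast < 3 * DAY_MS) return { ok: false, reason: "min 3 days between inactivity messages" }
  }

  if (type === "recurring" && sentToday.has(dedupKey(s.contact_id, type, now))) {
    return { ok: false, reason: "max 1 recurring message per day" }
  }
  return { ok: true }
}
```

Returning a reason instead of a bare boolean makes skipped sends easy to log and audit.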
2.5 Relationship Stage Computation
The relationship stage is computed from engagement signals, not hardcoded:
new → < 3 sessions
building → 3-14 sessions, active in last 7 days
established → 15+ sessions, active in last 7 days
deep → 30+ sessions, 14+ day active streak
fading → Was building/established, inactive for 5+ days
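dormant → Inactive for 30+ days
type RelationshipStage = 'new' | 'building' | 'established' | 'deep' | 'fading' | 'dormant'
// Minimal shape of the contact_state fields this function reads
interface ContactState { last_active_at: Date; total_sessions: number; active_streak: number }
const MS_PER_DAY = 24 * 60 * 60 * 1000
// Checks run top to bottom: inactivity wins over session counts
function computeRelationshipStage(state: ContactState, now: Date = new Date()): RelationshipStage {
  const daysSinceActive = (now.getTime() - state.last_active_at.getTime()) / MS_PER_DAY
  if (daysSinceActive >= 30) return 'dormant'
  // Any contact past 'new' (including 'deep') becomes 'fading' after 5+ inactive days
  if (daysSinceActive >= 5 && state.total_sessions >= 3) return 'fading'
  if (state.total_sessions >= 30 && state.active_streak >= 14) return 'deep'
  if (state.total_sessions >= 15) return 'established'
  if (state.total_sessions >= 3) return 'building'
  return 'new'
}
2.6 Churn Risk Scoring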
Computed hourly via cron, stored on contact_state:
ChurnRisk = (inactivity_signal × 0.40)
+ (engagement_decline × 0.30)
+ (session_shortening × 0.20)
+ (mood_decline × 0.10)
inactivity_signal: days_since_last_active / 7, clamped [0, 1]
engagement_decline: 1 - (messages_this_week / messages_last_week), clamped [0, 1]
session_shortening: 1 - (avg_session_this_week / avg_session_last_week), clamped [0, 1]
mood_decline: negative_mood_count_this_week / total_messages_this_week
A churn_risk > 0.6 is flagged for outreach. A churn_risk > 0.8 is surfaced to the business owner via AI Brain insights.
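A minimal sketch of the score and thresholds, assuming hypothetical field names for the weekly aggregates. One behavior is not specified by the formula and is an assumption here: a ratio whose denominator is zero (no messages or sessions last week, or no messages this week for mood_decline) contributes 0 instead of dividing by zero.

```typescript
// Weekly aggregates per contact; field names are hypothetical.
interface ChurnInputs {
  daysSinceLastActive: number
  messagesThisWeek: number
  messagesLastWeek: number
  avgSessionThisWeek: number
  avgSessionLastWeek: number
  negativeMoodCountThisWeek: number
}

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x))

// ASSUMPTION: no baseline (previous === 0) means no measurable decline.
const declineVs = (current: number, previous: number): number =>
  previous > 0 ? clamp01(1 - current / previous) : 0

function churnRisk(c: ChurnInputs): number {
  const inactivity = clamp01(c.daysSinceLastActive / 7)
  const engagement = declineVs(c.messagesThisWeek, c.messagesLastWeek)
  const shortening = declineVs(c.avgSessionThisWeek, c.avgSessionLastWeek)
  const moodDecline = c.messagesThisWeek > 0 ? clamp01(c.negativeMoodCountThisWeek / c.messagesThisWeek) : 0
  return inactivity * 0.4 + engagement * 0.3 + shortening * 0.2 + moodDecline * 0.1
}

// A score above 0.8 also exceeds 0.6, so it keeps the outreach flag.
type ChurnAction = "none" | "outreach" | "outreach_and_owner_insight"

function churnAction(risk: number): ChurnAction {
  if (risk > 0.8) return "outreach_and_owner_insight"
  if (risk > 0.6) return "outreach"
  return "none"
}
```

For example, a contact last active 3.5 days ago, with 2 messages this week versus 10 last week, average sessions down from 12 to 3 minutes, and 1 negative-mood message out of 2, scores 0.5 × 0.40 + 0.8 × 0.30 + 0.75 × 0.20 + 0.5 × 0.10 = 0.20 + 0.24 + 0.15 + 0.05 = 0.64. That crosses the outreach threshold but not the owner-insight one.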
3. Data Flow Guarantees
- Context retrieval never blocks on embedding generation. Memories are embedded during deferred processing. Until they are embedded, they are still stored in PostgreSQL with their content and can be retrieved by exact entity match, just not by vector similarity. No memory is lost during the embedding delay.
- Mood staleness is bounded to one message. If the mood classifier is slow, we return the previous mood. Worst case: one AI response uses a mood that is one message stale. Mood is updated asynchronously and is ready for the next message.
- Outreach never spams. Hard rate limits are enforced at the database level (unique constraints) and at the application level (rate check before send). A bug in trigger creation cannot cause duplicate sends, because the delivery layer deduplicates by contact_id + trigger_type + date.
- Contact data isolation is absolute. Every query is scoped to tenant_id AND contact_id. Tenant A's contacts are invisible to Tenant B, with PostgreSQL row-level security (RLS) as the safety net. A contact's memories are never shared with another contact within the same tenant.
- Memory deletion is irreversible and complete. When a contact says "forget me", ALL rows in contact_memory, contact_state, and contact_triggers for that contact are deleted. The operation is synchronous, so the next message starts from a blank slate.
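A minimal sketch of the delivery-layer deduplication, using an in-memory `Set` as a stand-in for a unique constraint on (contact_id, trigger_type, send date). The class and message shape are hypothetical. The key is claimed before sending, so the check and the claim happen in one synchronous step and two concurrent `deliver` calls for the same key cannot both send. That makes delivery at-most-once: a send that fails after the claim is not retried the same day (a design assumption, not something the guarantee above specifies).

```typescript
// Hypothetical message shape for illustration.
type TriggerType = "inactivity" | "recurring" | "milestone"

interface OutreachMessage {
  contactId: string
  triggerType: TriggerType
  body: string
}

// In-memory stand-in for a UNIQUE (contact_id, trigger_type, send_date) constraint.
class DedupingDelivery {
  private readonly claimed = new Set<string>()
  private readonly send: (m: OutreachMessage) => Promise<void>

  constructor(send: (m: OutreachMessage) => Promise<void>) {
    this.send = send
  }

  private key(m: OutreachMessage, at: Date): string {
    return `${m.contactId}|${m.triggerType}|${at.toISOString().slice(0, 10)}` // UTC date
  }

  // Returns false without sending if this contact already got this trigger type today.
  async deliver(m: OutreachMessage, at: Date = new Date()): Promise<boolean> {
    const k = this.key(m, at)
    if (this.claimed.has(k)) return false
    this.claimed.add(k) // claim first, like an INSERT that must succeed before sending
    await this.send(m)
    return true
  }
}
```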
4. Open Design Decisions (TBD)
- Should mood classification use an LLM or a fine-tuned classifier? An LLM (Haiku-class) is simpler and more accurate for nuanced emotions but costs ~$0.0001/message. A fine-tuned BERT-class model would cost almost nothing per message to run on the edge, but it needs training data and infrastructure. Start with the LLM and evaluate at scale.
- Cross-channel contact merging. If Arjun chats on WhatsApp and later switches to Telegram, should we merge the profiles? Matching by phone number works for WhatsApp and Telegram. Web widget users have no phone number, so matching is harder. Start with same-channel identity; add cross-channel merging in v2.
- Memory limits per contact. Should we cap the number of memories per contact? At 100+ memories, vector retrieval is still fast (HNSW), but the memories injected into the prompt context bloat it. Suggested: a soft cap of 100 active memories, with consolidation compressing older ones. Monitor and adjust.
- Who pays for mood classification? It is currently bundled into platform costs. At scale (1M messages/day), mood classification alone costs ~$100/day. Should this be a premium feature, or should it be absorbed into platform pricing?