System Architecture: HLD & LLD (Contact Intelligence)

Last Updated: 2026-04-03 · Status: Draft

1. High-Level Design (HLD)

1.1 Core Architectural Components

Component	Responsibility	Technology
Contact Intelligence Service	API layer — context retrieval, ingestion, outreach management	Hono on Cloudflare Workers
Mood Classifier	Detect emotional state from message text per interaction	Haiku-class LLM (~50 input / ~10 output tokens, runs in parallel with retrieval)
Memory Store	Per-contact long-term memory with entity linking	Neon PostgreSQL + pgvector
State Manager	Hot contact state (mood, streak, stage) for fast reads	CF KV + PostgreSQL
Outreach Engine	Fire proactive triggers — scheduled, inactivity, milestones	CF Cron Triggers + CF Queues
Consolidation Engine	Merge duplicates, compress episodes, decay stale memories	CF Queues (deferred) + Cron (periodic)
Analytics Aggregator	Engagement metrics, churn risk scoring, retention computation	PostgreSQL views + Cron

1.2 Overall System Diagram (HLD)

2. Low-Level Design (LLD)

2.1 Request Lifecycle — Context Retrieval (Hot Path)

This is the critical hot path: it runs on every incoming message and must complete in under 30ms.

Performance budget:

Operation	Budget	Notes
KV state read	< 5ms	Edge-cached, single key lookup
pgvector query	< 20ms	HNSW index, tenant+contact filtered
Mood classifier	< 100ms	Haiku-class; runs in parallel and does not gate the response
Total response	< 30ms	Gated by KV + pgvector only; a late mood falls back to the previous value

Key design choice: The mood classifier runs in parallel with KV + pgvector reads. If it's slower than the memory fetch, we return the previous mood from KV state and update it asynchronously. The chat engine never waits for mood classification.
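A minimal sketch of this parallel fetch, assuming hypothetical dependency functions (`readState`, `searchMemories`, `classifyMood`, `saveMood`) that stand in for the KV read, the pgvector query, the LLM call, and the asynchronous KV write. The classifier is started first and never awaited on the response path: if it has produced a mood by the time both memory reads finish, that mood is used; otherwise the previous mood from KV state is returned.

```typescript
interface Mood {
  mood: string
  energy: string
  style: string
}

interface ContactState {
  mood: Mood // last known mood, as cached in KV
}

// Hypothetical dependencies; in production these would wrap KV, pgvector and the LLM call.
interface RetrievalDeps {
  readState: () => Promise<ContactState>
  searchMemories: () => Promise<string[]>
  classifyMood: () => Promise<Mood>
  saveMood: (mood: Mood) => Promise<void>
}

interface RetrievedContext {
  mood: Mood
  moodIsFresh: boolean
  memories: string[]
  // Background work that must outlive the response (e.g. handed to waitUntil on Workers).
  pendingMoodUpdate: Promise<void>
}

async function retrieveContext(deps: RetrievalDeps): Promise<RetrievedContext> {
  const latest: { mood?: Mood } = {}

  // Start classification immediately; the hot path never awaits it.
  const pendingMoodUpdate = deps
    .classifyMood()
    .then(async (mood) => {
      latest.mood = mood // used by this request only if it lands before the reads finish
      await deps.saveMood(mood) // persisted for the next message either way
    })
    .catch(() => {
      // Classification or the KV write failed: KV keeps the previous mood.
    })

  // The gates: KV state and pgvector results, fetched concurrently.
  const [state, memories] = await Promise.all([deps.readState(), deps.searchMemories()])

  return {
    mood: latest.mood ?? state.mood,
    moodIsFresh: latest.mood !== undefined,
    memories,
    pendingMoodUpdate,
  }
}
```

On Cloudflare Workers, `pendingMoodUpdate` would typically be passed to the execution context's `waitUntil()` so the mood write completes after the response has been returned.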

2.2 Request Lifecycle — Ingestion (Post-Response)

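After the Chat Engine generates a response, it sends the interaction back for processing in two phases: an immediate write that is acknowledged quickly, and deferred work that runs in the background.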

Why split immediate vs deferred?

The Chat Engine needs a fast acknowledgment (< 50ms). Embedding generation, duplicate merging, and trigger creation are not time-sensitive, so they are processed in the background.
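One way to realize the split, sketched under assumptions: the immediate phase writes the raw memory row with a null embedding and updates hot state, then enqueues a single job for the deferred phase and acknowledges. `MemoryStore`, `StateStore`, `QueueLike`, and the job shape are hypothetical; `QueueLike.send` mirrors the `send()` method of a Cloudflare Queues producer binding.

```typescript
interface Interaction {
  tenantId: string
  contactId: string
  userMessage: string
  at: Date
}

// Hypothetical interfaces standing in for PostgreSQL, KV-backed state and CF Queues.
interface MemoryStore {
  insertRaw(row: { tenantId: string; contactId: string; content: string; embedding: null }): Promise<string>
}
interface StateStore {
  touch(tenantId: string, contactId: string, at: Date): Promise<void> // last_active_at, session counters
}
interface QueueLike {
  send(body: unknown): Promise<void>
}

interface DeferredJob {
  kind: 'embed_merge_triggers'
  tenantId: string
  contactId: string
  memoryId: string
}

// Immediate phase: only cheap writes, so the Chat Engine gets a fast acknowledgment.
async function ingest(
  interaction: Interaction,
  memories: MemoryStore,
  state: StateStore,
  queue: QueueLike,
): Promise<{ accepted: true; memoryId: string }> {
  const { tenantId, contactId } = interaction
  const [memoryId] = await Promise.all([
    // Stored without an embedding; still retrievable by exact entity match until embedded.
    memories.insertRaw({ tenantId, contactId, content: interaction.userMessage, embedding: null }),
    state.touch(tenantId, contactId, interaction.at),
  ])
  // Deferred phase: embedding, duplicate merging and trigger creation run off the hot path.
  const job: DeferredJob = { kind: 'embed_merge_triggers', tenantId, contactId, memoryId }
  await queue.send(job)
  return { accepted: true, memoryId }
}
```

Enqueuing one job per interaction keeps the acknowledgment to two parallel writes plus one queue send; the consumer can then batch embedding calls across contacts.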

2.3 Mood Classification Pipeline

Challenge: Running an LLM call on every message adds latency and cost. Must be ultra-lightweight.

Solution: Haiku-class model with a tiny prompt (~50 input tokens, ~10 output tokens).

Prompt:

Classify the mood from this message. Respond with JSON only.
Message: "{user_message}"
Output: { "mood": "<emotion>", "energy": "<level>", "style": "<approach>" }
 
mood: happy, sad, anxious, excited, neutral, angry, frustrated, flirty, bored, grateful
energy: high, medium, low
style: playful, deep, casual, supportive, romantic, venting

Cost: ~$0.0001 per classification, so 10,000 messages/day costs ~$1/day.

Fallback: If the classifier fails or times out, use the previous mood from KV state. Mood doesn't change drastically between adjacent messages — staleness of one message is acceptable.
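A hedged sketch of the classifier's input and output handling. The hypothetical `buildMoodPrompt` fills the prompt template, using `JSON.stringify` so quotes inside the message cannot break the quoted string; the hypothetical `parseMood` accepts a reply only if it is valid JSON whose fields all come from the allowed value lists, and otherwise returns the previous mood, matching the fallback rule. The LLM call itself is out of scope.

```typescript
const MOODS = ['happy', 'sad', 'anxious', 'excited', 'neutral', 'angry', 'frustrated', 'flirty', 'bored', 'grateful'] as const
const ENERGIES = ['high', 'medium', 'low'] as const
const STYLES = ['playful', 'deep', 'casual', 'supportive', 'romantic', 'venting'] as const

interface Mood {
  mood: (typeof MOODS)[number]
  energy: (typeof ENERGIES)[number]
  style: (typeof STYLES)[number]
}

function buildMoodPrompt(userMessage: string): string {
  // JSON.stringify quotes and escapes the message, so embedded quotes cannot break the template.
  return [
    'Classify the mood from this message. Respond with JSON only.',
    `Message: ${JSON.stringify(userMessage)}`,
    'Output: { "mood": "<emotion>", "energy": "<level>", "style": "<approach>" }',
  ].join('\n')
}

function isOneOf<T extends string>(allowed: readonly T[], value: unknown): value is T {
  return typeof value === 'string' && (allowed as readonly string[]).includes(value)
}

// Returns the classified mood, or `previous` if the reply is malformed or off-vocabulary.
function parseMood(raw: string, previous: Mood): Mood {
  let data: unknown
  try {
    data = JSON.parse(raw)
  } catch {
    return previous
  }
  if (typeof data !== 'object' || data === null) return previous
  const { mood, energy, style } = data as Record<string, unknown>
  if (isOneOf(MOODS, mood) && isOneOf(ENERGIES, energy) && isOneOf(STYLES, style)) {
    return { mood, energy, style }
  }
  return previous
}
```

Treating an off-vocabulary reply the same as a timeout keeps downstream prompt logic limited to the fixed label sets.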

2.4 Outreach Engine — Proactive Message Delivery

Rate limiting rules (non-negotiable):

Rule	Enforcement
Max 2 re-engagement messages per inactivity period	Counter on `contact_triggers`
Min 3 days between inactivity messages	`last_outreach_at` check
Max 1 recurring message per day	Dedup by `contact_id + trigger_type + date`
If contact says "stop" → disable all outreach permanently	`outreach_disabled: true` on `contact_state`
If contact dormant (30+ days) → stop all outreach	`relationship_stage: "dormant"` check
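The rules above can be checked in one place before any send. A minimal sketch, assuming a hypothetical `OutreachState` snapshot (the re-engagement counter is assumed to reset when the contact becomes active again), hypothetical trigger-type labels, and a set of already-delivered dedup keys standing in for the database unique constraint. Dates are bucketed in UTC, which is itself an assumption, since a contact's local day may differ.

```typescript
type TriggerType = 'inactivity' | 'recurring' | 'milestone' // hypothetical labels

interface OutreachState {
  outreach_disabled: boolean
  relationship_stage: string
  last_outreach_at: Date | null // last inactivity (re-engagement) message
  reengagement_count: number // sent during the current inactivity period
}

const DAY_MS = 24 * 60 * 60 * 1000

// contact_id + trigger_type + date, with the date taken in UTC.
function dedupKey(contactId: string, trigger: TriggerType, now: Date): string {
  return `${contactId}:${trigger}:${now.toISOString().slice(0, 10)}`
}

function checkOutreach(
  contactId: string,
  trigger: TriggerType,
  state: OutreachState,
  sentKeys: ReadonlySet<string>,
  now: Date,
): { allowed: boolean; reason?: string } {
  if (state.outreach_disabled) return { allowed: false, reason: 'contact opted out' }
  if (state.relationship_stage === 'dormant') return { allowed: false, reason: 'contact dormant' }
  // Covers "max 1 recurring message per day" and blocks duplicate sends of any trigger type.
  if (sentKeys.has(dedupKey(contactId, trigger, now))) {
    return { allowed: false, reason: 'already sent today' }
  }
  if (trigger === 'inactivity') {
    if (state.reengagement_count >= 2) return { allowed: false, reason: 'max 2 per inactivity period' }
    if (state.last_outreach_at && now.getTime() - state.last_outreach_at.getTime() < 3 * DAY_MS) {
      return { allowed: false, reason: 'min 3 days between inactivity messages' }
    }
  }
  return { allowed: true }
}
```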

2.5 Relationship Stage Computation

The relationship stage is computed from engagement signals, not hardcoded:

new         → < 3 sessions
building    → 3-14 sessions, active in last 7 days
established → 15+ sessions, active in last 7 days
deep        → 30+ sessions, 14+ day active streak
fading      → Was building/established, inactive for 5+ days
dormant     → Inactive for 30+ days

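type RelationshipStage = 'new' | 'building' | 'established' | 'deep' | 'fading' | 'dormant'

interface ContactState { // subset of contact_state used here
  last_active_at: Date
  total_sessions: number
  active_streak: number // consecutive active days
}

const MS_PER_DAY = 24 * 60 * 60 * 1000

function computeRelationshipStage(state: ContactState, now: Date = new Date()): RelationshipStage {
  const daysSinceActive = (now.getTime() - state.last_active_at.getTime()) / MS_PER_DAY
  // Inactivity is checked first so a lapsed contact never reports an engaged stage ("N+ days" means >= N).
  if (daysSinceActive >= 30) return 'dormant'
  if (daysSinceActive >= 5 && state.total_sessions >= 3) return 'fading'
  if (state.total_sessions >= 30 && state.active_streak >= 14) return 'deep'
  if (state.total_sessions >= 15) return 'established'
  if (state.total_sessions >= 3) return 'building'
  return 'new'
}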

2.6 Churn Risk Scoring

Computed hourly via cron, stored on contact_state:

ChurnRisk = (inactivity_signal × 0.40)
          + (engagement_decline × 0.30)
          + (session_shortening × 0.20)
          + (mood_decline × 0.10)
 
inactivity_signal:    days_since_last_active / 7, clamped [0, 1]
engagement_decline:   1 - (messages_this_week / messages_last_week), clamped [0, 1]
session_shortening:   1 - (avg_session_this_week / avg_session_last_week), clamped [0, 1]
mood_decline:         negative_mood_count_this_week / total_messages_this_week

A churn_risk above 0.6 flags the contact for outreach; above 0.8, the contact is also surfaced to the business owner via AI Brain insights.
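The formula translates directly into a scoring function. A sketch, assuming a hypothetical `WeeklyStats` input produced by the hourly cron and assuming `total_messages_this_week` equals `messages_this_week`. The formula does not say what happens when a denominator is zero, so the sketch treats a signal with no baseline (no messages or sessions last week, or no messages this week for mood) as contributing 0; that choice is an assumption worth settling explicitly.

```typescript
interface WeeklyStats {
  daysSinceLastActive: number
  messagesThisWeek: number // assumed to equal total_messages_this_week
  messagesLastWeek: number
  avgSessionThisWeek: number // any consistent unit, e.g. minutes
  avgSessionLastWeek: number
  negativeMoodCountThisWeek: number
}

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x))

// Assumption: a zero baseline contributes no risk for that signal.
const decline = (current: number, previous: number): number =>
  previous > 0 ? clamp01(1 - current / previous) : 0

function churnRisk(s: WeeklyStats): number {
  const inactivity = clamp01(s.daysSinceLastActive / 7)
  const engagement = decline(s.messagesThisWeek, s.messagesLastWeek)
  const shortening = decline(s.avgSessionThisWeek, s.avgSessionLastWeek)
  const mood = s.messagesThisWeek > 0 ? clamp01(s.negativeMoodCountThisWeek / s.messagesThisWeek) : 0
  return inactivity * 0.4 + engagement * 0.3 + shortening * 0.2 + mood * 0.1
}

// Thresholds from the text: > 0.6 outreach, > 0.8 also an AI Brain insight.
function churnAction(risk: number): 'none' | 'outreach' | 'outreach_and_insight' {
  if (risk > 0.8) return 'outreach_and_insight'
  if (risk > 0.6) return 'outreach'
  return 'none'
}
```

As a worked example, a contact silent for 7 days after 20 messages and 10-minute sessions the previous week scores 1 × 0.40 + 1 × 0.30 + 1 × 0.20 + 0 × 0.10 = 0.9 and lands in the top band.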

3. Data Flow Guarantees

Context retrieval never blocks on embedding generation. Memories are embedded during deferred processing. Until embedded, they are still stored in PostgreSQL with content and can be retrieved by exact entity match — just not by vector similarity. No memory is lost during the embedding delay.
Mood staleness is bounded to one message. If the mood classifier is slow, we return the previous mood. The worst case: one AI response uses a mood that's one message stale. Mood is updated asynchronously and ready for the next message.
Outreach never spams. Hard rate limits are enforced at the database level (unique constraints) and application level (rate check before send). A bug in trigger creation cannot cause duplicate sends — the delivery layer deduplicates by contact_id + trigger_type + date.
Contact data isolation is absolute. Every query scopes to tenant_id AND contact_id. Tenant A's contacts are invisible to Tenant B. RLS is the safety net. A contact's memories are never shared with another contact within the same tenant.
Memory deletion is irreversible and complete. When a contact says "forget me", ALL rows in contact_memory, contact_state, and contact_triggers for that contact are deleted. The operation is synchronous — the next message starts from a blank slate.
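A sketch of the "forget me" path, assuming a hypothetical `Db` interface with a transaction helper and pg-style `$1` parameters, plus a hypothetical KV key layout. Every statement filters on both `tenant_id` and `contact_id`, matching the isolation guarantee. Because the State Manager also keeps hot state in KV, the sketch evicts that key as well; note that Cloudflare KV is eventually consistent, so a delete may not be visible at every edge location immediately, which is worth weighing against the "blank slate on the next message" guarantee.

```typescript
// Hypothetical minimal interfaces (e.g. wrapping a Postgres driver and a KV binding).
interface Tx {
  query(sql: string, params: unknown[]): Promise<{ rowCount: number }>
}
interface Db {
  transaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T>
}
interface KvLike {
  delete(key: string): Promise<void>
}

// Child tables first, in case they reference contact_state via foreign keys (assumed).
const CONTACT_TABLES = ['contact_memory', 'contact_triggers', 'contact_state'] as const

async function forgetContact(db: Db, kv: KvLike, tenantId: string, contactId: string): Promise<number> {
  // All-or-nothing: either every row for this contact is deleted, or none is.
  const deleted = await db.transaction(async (tx) => {
    let total = 0
    for (const table of CONTACT_TABLES) {
      // Table names come from the fixed list above, never from user input.
      const res = await tx.query(`DELETE FROM ${table} WHERE tenant_id = $1 AND contact_id = $2`, [tenantId, contactId])
      total += res.rowCount
    }
    return total
  })
  // Hypothetical key layout for the hot-state cache.
  await kv.delete(`state:${tenantId}:${contactId}`)
  return deleted
}
```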

4. Open Design Decisions (TBD)

Should mood classification use an LLM or a fine-tuned classifier? LLM (haiku-class) is simpler and more accurate for nuanced emotions but costs ~$0.0001/message. A fine-tuned BERT-class model would have near-zero marginal cost on the edge but needs training data and infra. Start with LLM, evaluate at scale.
Cross-channel contact merging. If Arjun chats on WhatsApp and later switches to Telegram, should we merge the profiles? Phone-number matching is straightforward on WhatsApp, but Telegram bots only see a user's phone number if the user explicitly shares their contact. Web widget users have no phone number, so matching is harder still. Start with same-channel identity, add cross-channel merging in v2.
Memory limits per contact. Should we cap the number of memories per contact? At 100+ memories, vector retrieval is still fast (HNSW), but context injection bloats. Suggested: soft cap at 100 active memories, consolidation compresses older ones. Monitor and adjust.
Who pays for mood classification? Currently bundled into platform costs. At scale (1M messages/day), mood classification alone costs ~$100/day. Should this be a premium feature? Or absorb into platform pricing?