Contact Intelligence

System Architecture: HLD & LLD — Contact Intelligence

Last Updated: 2026-04-03
Status: Draft


1. High-Level Design (HLD)

1.1 Core Architectural Components

Component | Responsibility | Technology
--- | --- | ---
Contact Intelligence Service | API layer: context retrieval, ingestion, outreach management | Hono on Cloudflare Workers
Mood Classifier | Detect emotional state from message text per interaction | Haiku-class LLM (~20 tokens, inline)
Memory Store | Per-contact long-term memory with entity linking | Neon PostgreSQL + pgvector
State Manager | Hot contact state (mood, streak, stage) for fast reads | CF KV + PostgreSQL
Outreach Engine | Fire proactive triggers: scheduled, inactivity, milestones | CF Cron Triggers + CF Queues
Consolidation Engine | Merge duplicates, compress episodes, decay stale memories | CF Queues (deferred) + Cron (periodic)
Analytics Aggregator | Engagement metrics, churn risk scoring, retention computation | PostgreSQL views + Cron

1.2 Overall System Diagram (HLD)


2. Low-Level Design (LLD)

2.1 Request Lifecycle — Context Retrieval (Hot Path)

This is the critical hot path — called on every incoming message. Must be fast (< 30ms).

Performance budget:

Operation | Budget | Notes
--- | --- | ---
KV state read | < 5ms | Edge-cached, single key lookup
pgvector query | < 20ms | HNSW index, tenant+contact filtered
Mood classifier | < 100ms | Haiku-class, but runs in parallel
Total response | < 30ms | KV and pgvector are the gates (mood is async-optional)

Key design choice: The mood classifier runs in parallel with KV + pgvector reads. If it's slower than the memory fetch, we return the previous mood from KV state and update it asynchronously. The chat engine never waits for mood classification.
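
A minimal sketch of this parallel-with-fallback pattern, assuming injected dependencies. The names `Deps`, `getContext`, `readState`, `searchMemories`, `classifyMood`, and `saveMood` are hypothetical stand-ins for the KV read, the pgvector query, the LLM call, and the asynchronous state update; none of them come from the actual codebase.

```typescript
// Hypothetical shapes; the service's real types are not shown in this document.
interface Mood { mood: string; energy: string; style: string }
interface HotState { mood: Mood }

interface Deps {
  readState(contactId: string): Promise<HotState>;                    // KV read
  searchMemories(contactId: string, text: string): Promise<string[]>; // pgvector query
  classifyMood(text: string): Promise<Mood>;                          // LLM call
  saveMood(contactId: string, mood: Mood): Promise<void>;             // async state update
}

interface ContextResult { memories: string[]; mood: Mood; moodIsStale: boolean }

async function getContext(deps: Deps, contactId: string, text: string): Promise<ContextResult> {
  // Holder for the classifier result, filled in only if it finishes in time.
  const fresh: { mood: Mood | null } = { mood: null };

  // Start the classifier without awaiting it. Whenever it finishes, record the
  // result and persist it so the next message sees it. Failures are swallowed:
  // the previous mood simply stays in place.
  deps.classifyMood(text)
    .then((m) => {
      fresh.mood = m;
      return deps.saveMood(contactId, m);
    })
    .catch(() => undefined);

  // The two gating reads run concurrently; only these are awaited.
  const [state, memories] = await Promise.all([
    deps.readState(contactId),
    deps.searchMemories(contactId, text),
  ]);

  // Use the fresh mood if the classifier beat the reads, else the stored one.
  const mood = fresh.mood ?? state.mood;
  return { memories, mood, moodIsStale: fresh.mood === null };
}
```

The response time here depends only on the slower of the two reads. A slow or failed classifier costs nothing on the hot path; it only means the returned mood is the stored one.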

2.2 Request Lifecycle — Ingestion (Post-Response)

After the Chat Engine generates a response, it sends the interaction back for processing:

Why split immediate vs deferred?

The Chat Engine needs a fast acknowledgment (< 50ms). Embedding generation, duplicate merging, and trigger creation are not time-sensitive — they can process in the background.
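
The split can be sketched as follows. This is an illustration under assumptions, not the service's handler: `Interaction`, `JobQueue`, `SaveInteraction`, and `ingest` are hypothetical names, and treating the raw write as the only immediate step is an assumption based on the guarantee that un-embedded memories are already stored in PostgreSQL.

```typescript
// Hypothetical shapes for illustration only.
interface Interaction {
  tenantId: string;
  contactId: string;
  userMessage: string;
  aiResponse: string;
}

type DeferredTask = "embed" | "merge_duplicates" | "create_triggers";
interface DeferredJob { interaction: Interaction; tasks: DeferredTask[] }

// Minimal producer interface; a real queue binding would sit behind this.
interface JobQueue { send(job: DeferredJob): Promise<void> }

// Persists the raw interaction content (no embedding yet).
type SaveInteraction = (interaction: Interaction) => Promise<void>;

interface Ack { accepted: true; deferred: DeferredTask[] }

async function ingest(
  interaction: Interaction,
  save: SaveInteraction,
  queue: JobQueue,
): Promise<Ack> {
  // Immediate: store the content so nothing is lost while embedding is pending.
  await save(interaction);

  // Deferred: enqueue the slow work and return without waiting for it to run.
  const tasks: DeferredTask[] = ["embed", "merge_duplicates", "create_triggers"];
  await queue.send({ interaction, tasks });

  return { accepted: true, deferred: tasks };
}
```

The acknowledgment waits only for one write and one enqueue; the three deferred tasks run later in whatever consumer drains the queue.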

2.3 Mood Classification Pipeline

Challenge: Running an LLM call on every message adds latency and cost. Must be ultra-lightweight.

Solution: Haiku-class model with a tiny prompt (~50 input tokens, ~10 output tokens).

Prompt:

Classify the mood from this message. Respond with JSON only.
Message: "{user_message}"
Output: { "mood": "<emotion>", "energy": "<level>", "style": "<approach>" }
 
mood: happy, sad, anxious, excited, neutral, angry, frustrated, flirty, bored, grateful
energy: high, medium, low
style: playful, deep, casual, supportive, romantic, venting

Cost: ~$0.0001 per classification. At 10,000 messages/day = ~$1/day.

Fallback: If the classifier fails or times out, use the previous mood from KV state. Mood doesn't change drastically between adjacent messages — staleness of one message is acceptable.
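
A sketch of how the prompt could be built and the reply validated against the allowed labels, with the fallback applied to malformed output as well as to failures. `buildMoodPrompt` and `parseMood` are hypothetical helper names, and escaping the message with `JSON.stringify` is an added assumption so that quotes in user text cannot break the prompt.

```typescript
// Allowed labels, copied from the prompt specification above.
const MOODS = ["happy", "sad", "anxious", "excited", "neutral",
               "angry", "frustrated", "flirty", "bored", "grateful"] as const;
const ENERGY = ["high", "medium", "low"] as const;
const STYLES = ["playful", "deep", "casual", "supportive", "romantic", "venting"] as const;

interface MoodResult {
  mood: (typeof MOODS)[number];
  energy: (typeof ENERGY)[number];
  style: (typeof STYLES)[number];
}

// Builds the classifier prompt. JSON.stringify adds the surrounding quotes
// and escapes any quotes or newlines inside the user's message.
function buildMoodPrompt(userMessage: string): string {
  return [
    "Classify the mood from this message. Respond with JSON only.",
    `Message: ${JSON.stringify(userMessage)}`,
    'Output: { "mood": "<emotion>", "energy": "<level>", "style": "<approach>" }',
    "",
    `mood: ${MOODS.join(", ")}`,
    `energy: ${ENERGY.join(", ")}`,
    `style: ${STYLES.join(", ")}`,
  ].join("\n");
}

// Type guard: true when `v` is one of the allowed string labels.
function isOneOf<T extends string>(allowed: readonly T[], v: unknown): v is T {
  return typeof v === "string" && allowed.some((a) => a === v);
}

// Parses the model reply; any malformed or out-of-vocabulary reply
// falls back to the previous mood.
function parseMood(raw: string, previous: MoodResult): MoodResult {
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data === "object" && data !== null) {
      const { mood, energy, style } = data as Record<string, unknown>;
      if (isOneOf(MOODS, mood) && isOneOf(ENERGY, energy) && isOneOf(STYLES, style)) {
        return { mood, energy, style };
      }
    }
  } catch {
    // Not valid JSON: fall through to the previous mood.
  }
  return previous;
}
```

Validating against the fixed vocabularies keeps an unexpected label (for example, a mood the model invents) from reaching stored contact state.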

2.4 Outreach Engine — Proactive Message Delivery

Rate limiting rules (non-negotiable):

Rule | Enforcement
--- | ---
Max 2 re-engagement messages per inactivity period | Counter on contact_triggers
Min 3 days between inactivity messages | last_outreach_at check
Max 1 recurring message per day | Dedup by contact_id + trigger_type + date
If contact says "stop" → disable all outreach permanently | outreach_disabled: true on contact_state
If contact dormant (30+ days) → stop all outreach | relationship_stage: "dormant" check
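
As an illustration, the application-level check for an inactivity message might look like the sketch below. `OutreachState`, `canSendInactivityMessage`, and the `reengagement_count` field are hypothetical names (the document only says a counter lives on contact_triggers); the thresholds come from the rules above.

```typescript
// Hypothetical view of the fields the rules depend on.
interface OutreachState {
  outreach_disabled: boolean;
  relationship_stage: string;
  last_outreach_at: Date | null;
  reengagement_count: number; // messages already sent in the current inactivity period
}

type Decision = { allowed: true } | { allowed: false; reason: string };

const DAY_MS = 24 * 60 * 60 * 1000;
const MAX_REENGAGEMENT = 2;
const MIN_GAP_DAYS = 3;

function canSendInactivityMessage(s: OutreachState, now: Date): Decision {
  // Permanent opt-out wins over everything else.
  if (s.outreach_disabled) return { allowed: false, reason: "contact opted out" };

  // Dormant contacts receive no outreach at all.
  if (s.relationship_stage === "dormant") return { allowed: false, reason: "contact is dormant" };

  // At most two re-engagement messages per inactivity period.
  if (s.reengagement_count >= MAX_REENGAGEMENT) {
    return { allowed: false, reason: "re-engagement limit reached" };
  }

  // At least three full days since the previous outreach.
  if (s.last_outreach_at !== null) {
    const elapsedMs = now.getTime() - s.last_outreach_at.getTime();
    if (elapsedMs < MIN_GAP_DAYS * DAY_MS) {
      return { allowed: false, reason: "too soon since last outreach" };
    }
  }

  return { allowed: true };
}
```

Returning a reason alongside the refusal makes it possible to log why a trigger was suppressed without re-deriving the rule that fired.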

2.5 Relationship Stage Computation

The relationship stage is computed from engagement signals, not hardcoded:

new         → < 3 sessions
building    → 3-14 sessions, active in last 7 days
established → 15+ sessions, active in last 7 days
deep        → 30+ sessions, 14+ day active streak
fading      → Was building/established, inactive for 5+ days
dormant     → Inactive for 30+ days
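
// daysBetween() and now() are date helpers assumed to exist elsewhere in the codebase.
function computeRelationshipStage(state: ContactState): RelationshipStage {
  const daysSinceActive = daysBetween(state.last_active_at, now())

  // Inactivity takes precedence over session counts.
  // "30+ days" and "5+ days" are inclusive, hence >= rather than >.
  if (daysSinceActive >= 30) return 'dormant'
  // Only contacts that had reached at least 'building' can fade.
  if (daysSinceActive >= 5 && state.total_sessions >= 3) return 'fading'

  // Recently active contacts: check the most demanding stage first.
  if (state.total_sessions >= 30 && state.active_streak >= 14) return 'deep'
  if (state.total_sessions >= 15) return 'established'
  if (state.total_sessions >= 3) return 'building'
  return 'new'
}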

2.6 Churn Risk Scoring

Computed hourly via cron, stored on contact_state:

ChurnRisk = (inactivity_signal × 0.40)
          + (engagement_decline × 0.30)
          + (session_shortening × 0.20)
          + (mood_decline × 0.10)
 
inactivity_signal:    days_since_last_active / 7, clamped [0, 1]
engagement_decline:   1 - (messages_this_week / messages_last_week), clamped [0, 1]
session_shortening:   1 - (avg_session_this_week / avg_session_last_week), clamped [0, 1]
mood_decline:         negative_mood_count_this_week / total_messages_this_week

A churn_risk above 0.6 is flagged for outreach; above 0.8, the contact is surfaced to the business owner via AI Brain insights.
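
The weighted sum can be sketched as below. `WeeklySignals`, `decline`, and `churnRisk` are hypothetical names. Two choices here are assumptions the formula does not specify: a zero denominator (no messages or sessions last week, or no messages this week) contributes 0 to its term, and `total_messages_this_week` is taken to be the same count as `messages_this_week`.

```typescript
// Hypothetical input record; field names follow the formula above.
interface WeeklySignals {
  days_since_last_active: number;
  messages_this_week: number;
  messages_last_week: number;
  avg_session_this_week: number;
  avg_session_last_week: number;
  negative_mood_count_this_week: number;
}

// Clamp a value into [0, 1].
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Relative week-over-week decline, clamped to [0, 1].
// Assumption: with no baseline from last week, report no decline.
function decline(current: number, previous: number): number {
  if (previous <= 0) return 0;
  return clamp01(1 - current / previous);
}

function churnRisk(s: WeeklySignals): number {
  const inactivity = clamp01(s.days_since_last_active / 7);
  const engagement = decline(s.messages_this_week, s.messages_last_week);
  const shortening = decline(s.avg_session_this_week, s.avg_session_last_week);
  // Assumption: no messages this week means no mood signal.
  const mood = s.messages_this_week > 0
    ? clamp01(s.negative_mood_count_this_week / s.messages_this_week)
    : 0;

  return 0.40 * inactivity + 0.30 * engagement + 0.20 * shortening + 0.10 * mood;
}
```

Worked example: a contact inactive for 7 days (signal 1.0), with messages halved from 10 to 5 (0.5), average session length halved (0.5), and 1 negative message out of 5 (0.2) scores 0.40 + 0.15 + 0.10 + 0.02 = 0.67, which is above the 0.6 outreach threshold and below the 0.8 escalation threshold.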


3. Data Flow Guarantees

  1. Context retrieval never blocks on embedding generation. Memories are embedded during deferred processing. Until embedded, they are still stored in PostgreSQL with content and can be retrieved by exact entity match — just not by vector similarity. No memory is lost during the embedding delay.

  2. Mood staleness is bounded to one message. If the mood classifier is slow, we return the previous mood. The worst case: one AI response uses a mood that's one message stale. Mood is updated asynchronously and ready for the next message.

  3. Outreach never spams. Hard rate limits are enforced at the database level (unique constraints) and application level (rate check before send). A bug in trigger creation cannot cause duplicate sends — the delivery layer deduplicates by contact_id + trigger_type + date.

  4. Contact data isolation is absolute. Every query scopes to tenant_id AND contact_id. Tenant A's contacts are invisible to Tenant B. RLS is the safety net. A contact's memories are never shared with another contact within the same tenant.

  5. Memory deletion is irreversible and complete. When a contact says "forget me", ALL rows in contact_memory, contact_state, and contact_triggers for that contact are deleted. The operation is synchronous — the next message starts from a blank slate.
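
Guarantee 3 relies on a deduplication key of contact_id + trigger_type + date. A minimal sketch of that key follows, with an in-memory `Set` standing in for the database unique constraint. `dedupKey` and `DeliveryDeduper` are hypothetical names, and using the UTC calendar date is an assumption; the document does not say which time zone defines "date".

```typescript
// Builds the per-day dedup key. The date part is the UTC calendar day
// in YYYY-MM-DD form, taken from the ISO 8601 timestamp.
function dedupKey(contactId: string, triggerType: string, at: Date): string {
  const day = at.toISOString().slice(0, 10);
  return `${contactId}:${triggerType}:${day}`;
}

// In-memory stand-in for a unique constraint on the key.
class DeliveryDeduper {
  private readonly seen = new Set<string>();

  // Returns true only the first time a given key is presented.
  shouldSend(contactId: string, triggerType: string, at: Date): boolean {
    const key = dedupKey(contactId, triggerType, at);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```

Because the check sits in the delivery layer, two triggers created by mistake for the same contact, type, and day collapse to a single send regardless of how they were created.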


4. Open Design Decisions (TBD)

  1. Should mood classification use an LLM or a fine-tuned classifier? An LLM (Haiku-class) is simpler and more accurate for nuanced emotions but costs ~$0.0001/message. A fine-tuned BERT-class model would add no per-message API cost when run on the edge, but needs training data and infrastructure. Start with the LLM and evaluate at scale.

  2. Cross-channel contact merging. If Arjun chats on WhatsApp and later switches to Telegram, should we merge the profiles? Matching by phone number works for WhatsApp+Telegram. Web widget users have no phone number — matching is harder. Start with same-channel identity, add cross-channel merging in v2.

  3. Memory limits per contact. Should we cap the number of memories per contact? At 100+ memories, vector retrieval is still fast (HNSW), but context injection bloats. Suggested: soft cap at 100 active memories, consolidation compresses older ones. Monitor and adjust.

  4. Who pays for mood classification? It is currently bundled into platform costs. At scale (1M messages/day), mood classification alone costs ~$100/day. Should this be a premium feature, or absorbed into platform pricing?
