System Architecture: HLD & LLD

Last Updated: 2026-04-07
Status: Active (v2, updated for AI Brain + Contact Intelligence integration)

This document defines the High-Level Design (HLD) and Low-Level Design (LLD) for the LogicSpike Chat Engine, an omnichannel AI chatbot platform.

1. High-Level Design (HLD)

The Chat Engine is the thin orchestration layer that connects customers (via WhatsApp, Telegram, Website Widget) to the AI pipeline. It does NOT own LLM calls, memory, or knowledge — those are delegated to AI Brain and Contact Intelligence.

1.1 Architecture Principle: Thin Orchestrator

The Chat Engine's job is narrow:

Receive messages from channels (webhooks, widget SSE)
Enrich with contact context (call Contact Intelligence)
Route to AI Brain for LLM response generation
Deliver the response back to the channel
Ingest the interaction into Contact Intelligence for memory/mood tracking

Everything else — LLM orchestration, memory, knowledge RAG, mood classification, entity extraction, outreach — is handled by other services.

1.2 Core Architectural Components

Component	Responsibility	Technology
API Gateway	Entry point, rate limiting, JWT validation	Cloudflare Gateway
Webhook Ingress	Accept WhatsApp/Telegram webhooks, verify signatures, enqueue	Hono on CF Workers
Widget API	SSE streaming for website widget	Hono on CF Workers
Chat Orchestrator	Fetch context → call Brain → deliver response	CF Workers
Contact Intelligence	Per-contact memory, mood, entities, outreach	Separate service (port 8794)
AI Brain	LLM orchestration, knowledge RAG, tools	Separate service (port 8793)
Communication Service	Email/SMS delivery (SMTP, Twilio)	Separate service (port 8790)
Unified Inbox	Real-time WebSocket for dashboard agents	CF Durable Objects / SSE

1.3 System Diagram

2. Message Flow (Critical Path)

2.1 Inbound Message Processing

Customer sends "How much do your cakes cost?"
  │
  ├─ 1. Webhook/Widget receives message
  │     └─ Verify signature (WhatsApp HMAC / widget token)
  │     └─ ACK 200 immediately (< 100ms)
  │
  ├─ 2. Resolve or create contact
  │     └─ POST /contacts { external_id, channel, display_name }
  │     └─ Contact Intelligence returns contact_id
  │
  ├─ 3. Fetch contact context (parallel)
  │     └─ GET /context/{contact_id}?query="How much do cakes cost"
  │     └─ Returns: mood, memories, entities, context_text, adaptation hints
  │     └─ Target: < 30ms
  │
  ├─ 4. Call AI Brain for response
  │     └─ POST /brain/chat { message, context_text, personality }
  │     └─ Brain handles: routing → specialist → tools → RAG → streaming
  │     └─ Returns: SSE stream of tokens
  │
  ├─ 5. Deliver response to customer
  │     └─ Widget: stream SSE tokens directly
  │     └─ WhatsApp/Telegram: buffer full response, send via API
  │
  ├─ 6. Ingest interaction into Contact Intelligence
  │     └─ POST /ingest { contact_id, message, role: "user" }
  │     └─ POST /ingest { contact_id, message, role: "assistant" }
  │     └─ CI handles: mood classification, memory extraction, entity extraction
  │     └─ Returns 202 (async processing)
  │
  └─ 7. Persist to DB + broadcast to dashboard
        └─ INSERT into chat_messages
        └─ WebSocket/SSE event to Unified Inbox
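
Steps 2-7 of the flow above can be sketched as one function with its collaborators injected. This is a minimal sketch for a buffered channel (WhatsApp/Telegram), not the actual implementation: the `Deps` interface, `InboundMessage`, and every function name are hypothetical, and only the endpoint paths named in the comments come from the flow above.

```typescript
// Hypothetical collaborator interfaces; only the endpoint paths in the
// comments are taken from the inbound flow.
interface ContactContext { contextText: string }

interface Deps {
  resolveContact(externalId: string, channel: string): Promise<string>;            // POST /contacts
  fetchContext(contactId: string, query: string): Promise<ContactContext | null>;  // GET /context/{contact_id}
  callBrain(message: string, contextText: string): Promise<string>;                // POST /brain/chat (buffered)
  deliver(channel: string, externalId: string, text: string): Promise<void>;
  ingest(contactId: string, message: string, role: "user" | "assistant"): Promise<void>; // POST /ingest
  persist(contactId: string, userText: string, replyText: string): Promise<void>;
}

interface InboundMessage { externalId: string; channel: string; text: string }

// Runs steps 2-7 in order and returns the reply that was delivered.
async function handleInbound(msg: InboundMessage, deps: Deps): Promise<string> {
  const contactId = await deps.resolveContact(msg.externalId, msg.channel); // step 2
  const context = await deps.fetchContext(contactId, msg.text);             // step 3
  // A missing context (null) degrades to an empty context_text.
  const reply = await deps.callBrain(msg.text, context?.contextText ?? ""); // step 4
  await deps.deliver(msg.channel, msg.externalId, reply);                   // step 5
  // Step 6: the two ingests are independent, so they run concurrently.
  await Promise.all([
    deps.ingest(contactId, msg.text, "user"),
    deps.ingest(contactId, reply, "assistant"),
  ]);
  await deps.persist(contactId, msg.text, reply);                           // step 7
  return reply;
}
```

Injecting the collaborators keeps the orchestrator free of LLM, memory, and delivery logic, and lets each step be replaced with a fake in tests.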

2.2 What Chat Engine Does NOT Do

Concern	Handled By
LLM model selection	AI Brain (registry.ts)
Knowledge base RAG	AI Brain (knowledge/ingest.ts)
Memory extraction	Contact Intelligence (ingest)
Mood classification	Contact Intelligence (mood.ts)
Entity graph	Contact Intelligence (entities.ts)
Proactive outreach	Contact Intelligence (outreach-scanner)
Tool execution (blog, content)	AI Brain (tools/)
Email/SMS delivery	Communication Service
Cost governance	AI Brain (cost-governor)

3. Low-Level Design

3.1 Session Management

Sessions are lightweight — the heavy context lives in Contact Intelligence.

interface ChatSession {
  id: string
  tenantId: string
  contactId: string          // → Contact Intelligence contact
  channelId: string          // → ChannelIntegration
  status: "ai_handled" | "human_escalated" | "closed"
  assignedAgentId?: string
  summary?: string           // AI-generated on close
  slaBreachAt?: Date
  messageCount: number
  createdAt: Date
  lastMessageAt: Date
}

Session cache: CF KV with 24h TTL (replaces Redis from v1 docs).

Key: session:{tenantId}:{contactId}:active
Value: session ID + last 20 message IDs
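
A minimal sketch of how the cache entry might be written, assuming a Workers KV namespace is bound to the Chat Engine. The `KVLike` interface, the `SessionCacheValue` shape, and the function names are hypothetical; the key format, the 24h TTL, and the 20-message window come from the text above.

```typescript
// Minimal subset of a Workers KV binding; `put` accepts an
// `expirationTtl` option expressed in seconds.
interface KVLike {
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

const SESSION_TTL_SECONDS = 24 * 60 * 60; // 24h TTL
const MAX_CACHED_MESSAGE_IDS = 20;        // "last 20 message IDs"

// Key format from the design: session:{tenantId}:{contactId}:active
function sessionCacheKey(tenantId: string, contactId: string): string {
  return `session:${tenantId}:${contactId}:active`;
}

// Hypothetical value shape: session ID plus the most recent message IDs.
interface SessionCacheValue { sessionId: string; messageIds: string[] }

// Returns a new value with the ID appended, trimmed to the newest 20.
function appendMessageId(value: SessionCacheValue, messageId: string): SessionCacheValue {
  const messageIds = [...value.messageIds, messageId].slice(-MAX_CACHED_MESSAGE_IDS);
  return { sessionId: value.sessionId, messageIds };
}

// Writes the entry; each write resets the 24h expiry.
async function writeSessionCache(
  kv: KVLike,
  tenantId: string,
  contactId: string,
  value: SessionCacheValue,
): Promise<void> {
  await kv.put(sessionCacheKey(tenantId, contactId), JSON.stringify(value), {
    expirationTtl: SESSION_TTL_SECONDS,
  });
}
```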

3.2 Webhook Ingress (WhatsApp)

POST /webhooks/whatsapp/{tenantId}
  ├─ Verify HMAC-SHA256 signature (< 1ms)
  ├─ Parse webhook payload → normalize to internal format
  ├─ Dedup check: unique constraint on (external_message_id, channelId)
  ├─ Find or create ChatSession
  ├─ Resolve ContactIntelligence contact
  ├─ Hand off orchestration pipeline (steps 2-7 above) to run asynchronously
  └─ Return 200 OK without waiting for the pipeline (ACK target: < 100ms)

Idempotency: external_message_id + channelId unique constraint prevents duplicate processing on webhook retries.
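
The signature check in the webhook flow can be sketched with the Web Crypto API, which is available in Cloudflare Workers. This sketch assumes the signature arrives as `sha256=<hex digest>` (the format Meta uses for the `X-Hub-Signature-256` header) and that the handler has the raw, unparsed request body; the function names are hypothetical.

```typescript
const encoder = new TextEncoder();

// Decodes a hex string into bytes; returns null for malformed input.
function hexToBytes(hex: string) {
  if (hex.length === 0 || hex.length % 2 !== 0 || !/^[0-9a-f]+$/i.test(hex)) return null;
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return out;
}

// Returns true only if the header carries a valid HMAC-SHA256 of rawBody.
// The comparison is done by crypto.subtle.verify rather than by comparing
// hex strings in application code.
async function verifyWebhookSignature(
  rawBody: string,
  signatureHeader: string | null,
  appSecret: string,
): Promise<boolean> {
  if (!signatureHeader || !signatureHeader.startsWith("sha256=")) return false;
  const signature = hexToBytes(signatureHeader.slice("sha256=".length));
  if (!signature) return false;
  const key = await crypto.subtle.importKey(
    "raw",
    encoder.encode(appSecret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["verify"],
  );
  return crypto.subtle.verify("HMAC", key, signature, encoder.encode(rawBody));
}
```

The body must be verified exactly as received; re-serializing parsed JSON can change the bytes and make a valid signature fail.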

3.3 Widget API (SSE)

POST /chat/widget/message
  ├─ Validate widget token against ChannelIntegration
  ├─ CORS check: origin must match registered domain
  ├─ Run orchestration pipeline
  ├─ Stream AI Brain response as SSE events
  │     event: token → { text: "Our" }
  │     event: token → { text: " cakes" }
  │     event: done  → { session_id, message_id }
  └─ Return SSE stream
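
The event framing and the origin check above can be sketched as two small pure functions. The `WidgetEvent` type and both function names are hypothetical, and the sketch assumes the registered domain is stored as a bare hostname; the `token` and `done` event shapes follow the flow above.

```typescript
type WidgetEvent =
  | { event: "token"; data: { text: string } }
  | { event: "done"; data: { session_id: string; message_id: string } };

// Serializes one event in text/event-stream framing: an "event:" line,
// a "data:" line, then a blank line that terminates the event.
// JSON.stringify never emits raw newlines, so one data line is enough.
function formatSseEvent(e: WidgetEvent): string {
  return `event: ${e.event}\ndata: ${JSON.stringify(e.data)}\n\n`;
}

// Compares the hostname of the request's Origin header with the domain
// registered for the widget.
function isAllowedOrigin(origin: string | null, registeredDomain: string): boolean {
  if (!origin) return false;
  try {
    return new URL(origin).hostname === registeredDomain;
  } catch {
    return false; // malformed Origin header
  }
}
```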

3.4 Human Escalation & Handoff

AI Brain returns [[HANDOFF]] in response
  ├─ Chat Engine intercepts (does NOT send to customer)
  ├─ Send fallback message: "Connecting you to a team member..."
  ├─ Update session: status = "human_escalated"
  ├─ Set sla_breach_at = NOW() + 15 minutes
  ├─ Fire WebSocket event: chat:escalated
  └─ Agent sees it in Unified Inbox
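
The interception step can be sketched as a pure function that decides what the customer sees and how the session changes. The `HandoffResult` type and `applyHandoff` are hypothetical names; the marker, the fallback text, and the 15-minute SLA come from the flow above.

```typescript
const HANDOFF_MARKER = "[[HANDOFF]]";
const SLA_MINUTES = 15;
const HANDOFF_FALLBACK = "Connecting you to a team member...";

type SessionStatus = "ai_handled" | "human_escalated" | "closed";

interface HandoffResult {
  customerText: string;   // what is actually delivered to the customer
  status: SessionStatus;  // new session status
  slaBreachAt?: Date;     // set only when escalating
}

// If the Brain response contains the marker, the response is withheld
// and replaced by the fallback; otherwise it passes through unchanged.
function applyHandoff(brainResponse: string, now: Date): HandoffResult {
  if (brainResponse.indexOf(HANDOFF_MARKER) === -1) {
    return { customerText: brainResponse, status: "ai_handled" };
  }
  return {
    customerText: HANDOFF_FALLBACK,
    status: "human_escalated",
    slaBreachAt: new Date(now.getTime() + SLA_MINUTES * 60 * 1000),
  };
}
```

On the streaming widget path the marker would have to be detected before any tokens are flushed to the client; that buffering is not shown here.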

SLA Breach handling:

Cron sweeps every 1 minute for unassigned escalated sessions past SLA
Sends fallback: "Our team is offline. We'll email you back!"
Fires urgent alert via Communication Service
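
The selection rule for the sweep can be sketched as a filter over escalated sessions. `SweepSession` and `findBreachedSessions` are hypothetical names, and treating a session as breached once the deadline is reached (rather than strictly after it) is an assumption.

```typescript
interface SweepSession {
  id: string;
  status: "ai_handled" | "human_escalated" | "closed";
  assignedAgentId?: string;
  slaBreachAt?: Date;
}

// Selects sessions that are escalated, still unassigned, and whose SLA
// deadline has been reached at `now`.
function findBreachedSessions(sessions: SweepSession[], now: Date): SweepSession[] {
  return sessions.filter(
    (s) =>
      s.status === "human_escalated" &&
      s.assignedAgentId === undefined &&
      s.slaBreachAt !== undefined &&
      s.slaBreachAt.getTime() <= now.getTime(),
  );
}
```

A once-per-minute schedule corresponds to the cron expression `* * * * *`. The sweep would also need to mark each session as notified so the fallback is not sent again on the next run; that bookkeeping is not shown.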

3.5 Contact Intelligence Integration Points

Chat Engine Event	CI API Call	Purpose
New message from customer	`POST /contacts`	Resolve/create contact
Before AI response	`GET /context/{id}?query=...`	Fetch memories, mood, adaptation hints
After AI response	`POST /ingest` (user msg)	Mood classification, memory extraction
After AI response	`POST /ingest` (assistant msg)	Track assistant messages
Session closed	(none)	CI tracks session end via inactivity
Outreach trigger fires	CI calls Chat Engine delivery	Send proactive message to channel

3.6 Website Widget

Bootstrapper script (~3KB) → sandboxed <iframe> on widget.logicspike.com
postMessage API for host ↔ iframe communication
SSE connection for streaming AI responses
Auto-reconnect on network drop
Widget token validated against ChannelIntegration.identifier (allowed domain)

4. Technology Choices (Updated)

Concern	v1 Docs (March 2026)	v2 Decision (April 2026)
Runtime	Node.js (Express/Hono)	Hono on Cloudflare Workers
Queue	Redis Streams / BullMQ	CF Queues (or direct invocation)
Session Cache	Redis	CF KV (24h TTL)
LLM Orchestration	In-service (own RAG pipeline)	Delegated to AI Brain service
Memory/Mood	Not planned	Contact Intelligence service
Embedding Provider	OpenAI text-embedding-3-small	Gemini embedding-001 (free tier)
LLM Provider	gpt-4o-mini / gemini-1.5-flash	Anthropic (primary) + Gemini (fallback)
Database	PostgreSQL + pgvector	Neon PostgreSQL + pgvector
Real-time	Socket.io WebSocket	SSE (simpler, CF Workers compatible)

5. Data Flow Guarantees

At-Least-Once Delivery: Webhooks retry if we fail to ACK. Dedup via external_message_id.
Idempotence: Unique constraint on external_message_id + channelId.
Ordering: Messages within a session processed sequentially (CF Queue concurrency key).
Context Never Blocks: The Contact Intelligence context fetch has a 30ms budget. If CI does not answer within it, the Chat Engine proceeds without context.
Mood Never Blocks: Mood classification is async (ingest returns 202). Response delivery is not delayed.
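
The "Context Never Blocks" guarantee can be sketched as a race between the context fetch and a timer. `withTimeout` is a hypothetical helper, not part of any library, and mapping a failed fetch to the same fallback is an assumption.

```typescript
// Resolves with `fallback` if `work` has not settled within `ms`
// milliseconds. A rejection from `work` is also mapped to the fallback,
// so a failing dependency never blocks or fails the response.
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  return Promise.race([work.catch(() => fallback), timeout]).finally(() => {
    // Clear the timer so it does not outlive the race.
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

A call such as `withTimeout(fetchContext(contactId, query), 30, null)` (with `fetchContext` standing in for the CI call) would yield `null` when CI is slow or down. The helper only stops waiting; it does not cancel the underlying request, which would need an `AbortController` or similar.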

6. Service Bindings

# Chat Engine wrangler.toml
[[services]]
binding = "BRAIN_SERVICE"
service = "logicspike-brain-service"
 
[[services]]
binding = "CI_SERVICE"
service = "logicspike-contact-intelligence"
 
[[services]]
binding = "COMMS_SERVICE"
service = "logicspike-communication"

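The Chat Engine calls Brain, CI, and the Communication Service via Cloudflare service bindings, which route Worker-to-Worker calls inside Cloudflare without a public network hop (near-zero added latency in production).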
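
A minimal sketch of calling the Brain through its binding. The `ServiceBinding` and `Env` interfaces are simplified stand-ins for the Workers runtime types, `BrainChatBody` and the function names are hypothetical, and the hostname `brain.internal` is a placeholder: the binding decides which Worker receives the request, but `fetch()` still needs an absolute URL.

```typescript
// Simplified shape of a service binding: it exposes fetch() like a
// remote origin.
interface ServiceBinding {
  fetch(input: Request): Promise<Response>;
}

// Binding names match the wrangler.toml above.
interface Env {
  BRAIN_SERVICE: ServiceBinding;
  CI_SERVICE: ServiceBinding;
  COMMS_SERVICE: ServiceBinding;
}

// Hypothetical request body; field names follow POST /brain/chat in 2.1.
interface BrainChatBody {
  message: string;
  context_text: string;
  personality: string;
}

// Builds the request; the hostname is a placeholder, the path is what
// the Brain service sees.
function buildBrainRequest(body: BrainChatBody): Request {
  return new Request("https://brain.internal/brain/chat", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
}

// Sends the request through the binding and surfaces non-2xx responses.
async function callBrain(env: Env, body: BrainChatBody): Promise<Response> {
  const res = await env.BRAIN_SERVICE.fetch(buildBrainRequest(body));
  if (!res.ok) throw new Error(`Brain returned ${res.status}`);
  return res;
}
```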