
Chat Engine

System Architecture: HLD & LLD

Last Updated: 2026-04-07
Status: Active (v2, updated for AI Brain + Contact Intelligence integration)

This document defines the High-Level Design (HLD) and Low-Level Design (LLD) for the LogicSpike Chat Engine, an omnichannel AI chatbot platform.


1. High-Level Design (HLD)

The Chat Engine is the thin orchestration layer that connects customers (via WhatsApp, Telegram, Website Widget) to the AI pipeline. It does NOT own LLM calls, memory, or knowledge — those are delegated to AI Brain and Contact Intelligence.

1.1 Architecture Principle: Thin Orchestrator

The Chat Engine's job is narrow:

  1. Receive messages from channels (WhatsApp/Telegram webhooks, widget API)
  2. Enrich with contact context (call Contact Intelligence)
  3. Route to AI Brain for LLM response generation
  4. Deliver the response back to the channel
  5. Ingest the interaction into Contact Intelligence for memory/mood tracking

Everything else — LLM orchestration, memory, knowledge RAG, mood classification, entity extraction, outreach — is handled by other services.

1.2 Core Architectural Components

| Component | Responsibility | Technology |
|---|---|---|
| API Gateway | Entry point, rate limiting, JWT validation | Cloudflare Gateway |
| Webhook Ingress | Accept WhatsApp/Telegram webhooks, verify signatures, enqueue | Hono on CF Workers |
| Widget API | SSE streaming for website widget | Hono on CF Workers |
| Chat Orchestrator | Fetch context → call Brain → deliver response | CF Workers |
| Contact Intelligence | Per-contact memory, mood, entities, outreach | Separate service (port 8794) |
| AI Brain | LLM orchestration, knowledge RAG, tools | Separate service (port 8793) |
| Communication Service | Email/SMS delivery (SMTP, Twilio) | Separate service (port 8790) |
| Unified Inbox | Real-time WebSocket for dashboard agents | CF Durable Objects / SSE |

1.3 System Diagram


2. Message Flow (Critical Path)

2.1 Inbound Message Processing

Customer sends "How much do your cakes cost?"

  ├─ 1. Webhook/Widget receives message
  │     └─ Verify signature (WhatsApp HMAC / widget token)
  │     └─ ACK 200 immediately (< 100ms)

  ├─ 2. Resolve or create contact
  │     └─ POST /contacts { external_id, channel, display_name }
  │     └─ Contact Intelligence returns contact_id

  ├─ 3. Fetch contact context (parallel)
  │     └─ GET /context/{contact_id}?query="How much do cakes cost"
  │     └─ Returns: mood, memories, entities, context_text, adaptation hints
  │     └─ Target: < 30ms

  ├─ 4. Call AI Brain for response
  │     └─ POST /brain/chat { message, context_text, personality }
  │     └─ Brain handles: routing → specialist → tools → RAG → streaming
  │     └─ Returns: SSE stream of tokens

  ├─ 5. Deliver response to customer
  │     └─ Widget: stream SSE tokens directly
  │     └─ WhatsApp/Telegram: buffer full response, send via API

  ├─ 6. Ingest interaction into Contact Intelligence
  │     └─ POST /ingest { contact_id, message, role: "user" }
  │     └─ POST /ingest { contact_id, message, role: "assistant" }
  │     └─ CI handles: mood classification, memory extraction, entity extraction
  │     └─ Returns 202 (async processing)

  └─ 7. Persist to DB + broadcast to dashboard
        └─ INSERT into chat_messages
        └─ WebSocket/SSE event to Unified Inbox
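
The flow above can be sketched as one function over injected dependencies. This is a minimal illustration, not the service's actual code: `PipelineDeps`, `InboundMessage`, `handleInbound`, and every method name are hypothetical stand-ins for the real Contact Intelligence, AI Brain, and channel clients, and streaming is collapsed into a single reply string for brevity.

```typescript
// Hypothetical dependency interfaces; the real service clients are not shown in this document.
interface ContactContext { contextText: string; mood?: string }

interface PipelineDeps {
  resolveContact(externalId: string, channel: string): Promise<string>;            // step 2
  fetchContext(contactId: string, query: string): Promise<ContactContext | null>;  // step 3
  generateReply(message: string, contextText: string): Promise<string>;            // step 4
  deliver(channel: string, externalId: string, text: string): Promise<void>;       // step 5
  ingest(contactId: string, message: string, role: "user" | "assistant"): Promise<void>; // step 6
  persist(contactId: string, userText: string, replyText: string): Promise<void>;  // step 7
}

interface InboundMessage { externalId: string; channel: string; text: string }

async function handleInbound(msg: InboundMessage, deps: PipelineDeps): Promise<string> {
  const contactId = await deps.resolveContact(msg.externalId, msg.channel);

  // Context is best-effort: a failed fetch degrades to an empty context.
  const context = await deps.fetchContext(contactId, msg.text).catch(() => null);

  const reply = await deps.generateReply(msg.text, context?.contextText ?? "");
  await deps.deliver(msg.channel, msg.externalId, reply);

  // Ingest both sides of the exchange; CI does the heavy processing asynchronously.
  await Promise.all([
    deps.ingest(contactId, msg.text, "user"),
    deps.ingest(contactId, reply, "assistant"),
  ]);

  await deps.persist(contactId, msg.text, reply);
  return reply;
}
```

Passing the dependencies in keeps the orchestrator thin and lets each step be replaced with a fake in tests.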

2.2 What Chat Engine Does NOT Do

| Concern | Handled by (not Chat Engine) |
|---|---|
| LLM model selection | AI Brain (registry.ts) |
| Knowledge base RAG | AI Brain (knowledge/ingest.ts) |
| Memory extraction | Contact Intelligence (ingest) |
| Mood classification | Contact Intelligence (mood.ts) |
| Entity graph | Contact Intelligence (entities.ts) |
| Proactive outreach | Contact Intelligence (outreach-scanner) |
| Tool execution (blog, content) | AI Brain (tools/) |
| Email/SMS delivery | Communication Service |
| Cost governance | AI Brain (cost-governor) |

3. Low-Level Design

3.1 Session Management

Sessions are lightweight — the heavy context lives in Contact Intelligence.

interface ChatSession {
  id: string
  tenantId: string
  contactId: string          // → Contact Intelligence contact
  channelId: string          // → ChannelIntegration
  status: "ai_handled" | "human_escalated" | "closed"
  assignedAgentId?: string
  summary?: string           // AI-generated on close
  slaBreachAt?: Date
  messageCount: number
  createdAt: Date
  lastMessageAt: Date
}

Session cache: CF KV with 24h TTL (replaces Redis from v1 docs).

  • Key: session:{tenantId}:{contactId}:active
  • Value: session ID + last 20 message IDs
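
A small sketch of the cache key and value described above. The helper names (`sessionCacheKey`, `appendMessageId`) and the `CachedSession` shape are illustrative assumptions; only the key pattern, the 24h TTL, and the 20-ID window come from this document.

```typescript
const SESSION_TTL_SECONDS = 24 * 60 * 60; // 24h TTL from the design above
const MAX_CACHED_MESSAGE_IDS = 20;        // "last 20 message IDs"

// Assumed value shape: the document only says "session ID + last 20 message IDs".
interface CachedSession {
  sessionId: string;
  lastMessageIds: string[];
}

// Builds the key pattern session:{tenantId}:{contactId}:active.
function sessionCacheKey(tenantId: string, contactId: string): string {
  return `session:${tenantId}:${contactId}:active`;
}

// Returns a new cache value with the message ID appended, keeping only the newest 20.
function appendMessageId(cached: CachedSession, messageId: string): CachedSession {
  const ids = cached.lastMessageIds.concat(messageId).slice(-MAX_CACHED_MESSAGE_IDS);
  return { sessionId: cached.sessionId, lastMessageIds: ids };
}
```

With Workers KV, the value would typically be written as `await env.SESSIONS.put(key, JSON.stringify(value), { expirationTtl: SESSION_TTL_SECONDS })`, where `SESSIONS` is a hypothetical KV namespace binding name.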

3.2 Webhook Ingress (WhatsApp)

POST /webhooks/whatsapp/{tenantId}
  ├─ Verify HMAC-SHA256 signature (< 1ms)
  ├─ Parse webhook payload → normalize to internal format
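  ├─ Dedup check: unique constraint on (external_message_id, channelId)
  ├─ Find or create ChatSession
  ├─ Resolve ContactIntelligence contact
  ├─ Hand off to orchestration pipeline (steps 3-7 above) via queue or direct invocation
  └─ Return 200 OK without waiting for the AI response (see the < 100ms ACK target in 2.1)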
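
The "normalize to internal format" step might look like the sketch below. The `NormalizedMessage` shape is a hypothetical internal format, and the `WhatsAppWebhook` type models only the handful of fields used here, following the WhatsApp Cloud API text-message webhook layout as commonly documented; verify the field names against the current Meta documentation before relying on them.

```typescript
// Hypothetical internal format; the real one is not specified in this document.
interface NormalizedMessage {
  externalMessageId: string;
  externalContactId: string;
  displayName?: string;
  text: string;
}

// Partial shape of a WhatsApp Cloud API webhook (only the fields this sketch reads).
interface WhatsAppWebhook {
  entry?: Array<{
    changes?: Array<{
      value?: {
        contacts?: Array<{ wa_id?: string; profile?: { name?: string } }>;
        messages?: Array<{ id: string; from: string; type: string; text?: { body: string } }>;
      };
    }>;
  }>;
}

function normalizeWhatsApp(payload: WhatsAppWebhook): NormalizedMessage[] {
  const out: NormalizedMessage[] = [];
  for (const entry of payload.entry ?? []) {
    for (const change of entry.changes ?? []) {
      const value = change.value;
      if (!value) continue;
      const name = value.contacts?.[0]?.profile?.name;
      for (const m of value.messages ?? []) {
        // Non-text messages (images, status updates, ...) are skipped in this sketch.
        if (m.type !== "text" || !m.text) continue;
        out.push({
          externalMessageId: m.id,
          externalContactId: m.from,
          displayName: name,
          text: m.text.body,
        });
      }
    }
  }
  return out;
}
```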

Idempotency: external_message_id + channelId unique constraint prevents duplicate processing on webhook retries.
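
To illustrate the idempotency rule, here is an in-memory stand-in for the database unique constraint. `DedupStore` and `markIfNew` are hypothetical names; in the real system the check is the constraint itself, not application memory.

```typescript
// In-memory stand-in for a unique constraint on (channelId, external_message_id).
class DedupStore {
  private seen = new Set<string>();

  // Returns true the first time a (channelId, externalMessageId) pair is seen,
  // and false on every retry of the same webhook delivery.
  markIfNew(channelId: string, externalMessageId: string): boolean {
    // JSON-encode the pair so that ("a:b", "c") and ("a", "b:c") cannot collide.
    const key = JSON.stringify([channelId, externalMessageId]);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```

Against PostgreSQL the same effect is usually achieved with `INSERT ... ON CONFLICT (channel_id, external_message_id) DO NOTHING` and a check of the affected row count; the column names here are assumptions.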

3.3 Widget API (SSE Streaming)

POST /chat/widget/message
  ├─ Validate widget token against ChannelIntegration
  ├─ CORS check: origin must match registered domain
  ├─ Run orchestration pipeline
  ├─ Stream AI Brain response as SSE events
  │     event: token → { text: "Our" }
  │     event: token → { text: " cakes" }
  │     event: done  → { session_id, message_id }
  └─ Return SSE stream
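
The event stream above can be serialized with a small helper. This is a sketch: `WidgetEvent`, `formatSseEvent`, and `encodeReply` are illustrative names, and the payload fields mirror the `token` and `done` events shown in the flow.

```typescript
type WidgetEvent =
  | { event: "token"; data: { text: string } }
  | { event: "done"; data: { session_id: string; message_id: string } };

// Serializes one event in the text/event-stream wire format.
// JSON.stringify escapes newlines, so the data field always stays on one line.
function formatSseEvent(e: WidgetEvent): string {
  return `event: ${e.event}\ndata: ${JSON.stringify(e.data)}\n\n`;
}

// Encodes a full reply: one token event per chunk, then a closing done event.
function encodeReply(tokens: string[], sessionId: string, messageId: string): string {
  const frames = tokens.map((text) => formatSseEvent({ event: "token", data: { text } }));
  frames.push(formatSseEvent({ event: "done", data: { session_id: sessionId, message_id: messageId } }));
  return frames.join("");
}
```

In a Worker, each frame would be written to the response stream as it is produced, with the response header `Content-Type: text/event-stream`.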

3.4 Human Escalation & Handoff

AI Brain returns [[HANDOFF]] in response
  ├─ Chat Engine intercepts (does NOT send to customer)
  ├─ Send fallback message: "Connecting you to a team member..."
  ├─ Update session: status = "human_escalated"
  ├─ Set sla_breach_at = NOW() + 15 minutes
  ├─ Fire WebSocket event: chat:escalated
  └─ Agent sees it in Unified Inbox
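
A minimal sketch of the interception step, assuming the Brain response is available as a complete string. `interceptHandoff` and `HandoffDecision` are hypothetical names; the marker, fallback text, and 15-minute SLA come from the flow above.

```typescript
const HANDOFF_MARKER = "[[HANDOFF]]";
const SLA_MINUTES = 15;
const HANDOFF_FALLBACK = "Connecting you to a team member...";

interface HandoffDecision {
  escalate: boolean;
  customerText: string; // what the customer actually receives
  slaBreachAt?: Date;   // set only when escalating
}

// Decides what to send to the customer and whether to escalate the session.
function interceptHandoff(brainText: string, now: Date): HandoffDecision {
  if (brainText.indexOf(HANDOFF_MARKER) === -1) {
    return { escalate: false, customerText: brainText };
  }
  // The raw Brain output is withheld; the customer sees only the fallback message.
  return {
    escalate: true,
    customerText: HANDOFF_FALLBACK,
    slaBreachAt: new Date(now.getTime() + SLA_MINUTES * 60 * 1000),
  };
}
```

On the streaming widget path the marker could arrive split across tokens, so an implementation would likely need to hold back a few characters before forwarding them; that detail is outside this sketch.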

SLA Breach handling:

  • A cron job sweeps every minute for escalated sessions that are still unassigned past their SLA deadline
  • Sends fallback: "Our team is offline. We'll email you back!"
  • Fires urgent alert via Communication Service
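
The selection logic of the sweep can be sketched as a pure filter. `EscalatedSession` is a trimmed, hypothetical view of `ChatSession`, and `findSlaBreaches` is an illustrative name; in production this would more likely be a SQL query than an in-memory filter.

```typescript
// Trimmed view of ChatSession with only the fields the sweep needs.
interface EscalatedSession {
  id: string;
  status: "ai_handled" | "human_escalated" | "closed";
  assignedAgentId?: string;
  slaBreachAt?: Date;
}

// Returns escalated sessions with no assigned agent whose SLA deadline has passed.
function findSlaBreaches(sessions: EscalatedSession[], now: Date): EscalatedSession[] {
  return sessions.filter(
    (s) =>
      s.status === "human_escalated" &&
      s.assignedAgentId === undefined &&
      s.slaBreachAt !== undefined &&
      s.slaBreachAt.getTime() <= now.getTime()
  );
}
```

Because the sweep runs every minute, a real implementation would also need to record that a session has already been notified, otherwise the fallback message would repeat on each pass.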

3.5 Contact Intelligence Integration Points

| Chat Engine Event | CI API Call | Purpose |
|---|---|---|
| New message from customer | POST /contacts | Resolve/create contact |
| Before AI response | GET /context/{id}?query=... | Fetch memories, mood, adaptation hints |
| After AI response | POST /ingest (user msg) | Mood classification, memory extraction |
| After AI response | POST /ingest (assistant msg) | Track assistant messages |
| Session closed | (none) | CI tracks via inactivity |
| Outreach trigger fires | CI calls Chat Engine delivery | Send proactive message to channel |

3.6 Embeddable Widget Architecture

  • Bootstrapper script (~3KB) → sandboxed <iframe> on widget.logicspike.com
  • postMessage API for host ↔ iframe communication
  • SSE connection for streaming AI responses
  • Auto-reconnect on network drop
  • Widget token validated against ChannelIntegration.identifier (allowed domain)
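
The domain check behind the widget token could be as simple as the sketch below. `isAllowedOrigin` is a hypothetical helper, and exact hostname equality is an assumption; the document does not say whether subdomains of the registered domain should also be accepted.

```typescript
// Compares the request's Origin header against the domain registered for the widget.
function isAllowedOrigin(origin: string | null, registeredDomain: string): boolean {
  if (!origin) return false;
  let host: string;
  try {
    // Parse rather than substring-match, so "shop.example.com.evil.test" cannot pass.
    host = new URL(origin).hostname.toLowerCase();
  } catch {
    return false; // not a parseable origin
  }
  return host === registeredDomain.toLowerCase();
}
```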

4. Technology Choices (Updated)

| Concern | v1 Docs (March 2026) | v2 Decision (April 2026) |
|---|---|---|
| Runtime | Node.js (Express/Hono) | Hono on Cloudflare Workers |
| Queue | Redis Streams / BullMQ | CF Queues (or direct invocation) |
| Session Cache | Redis | CF KV (24h TTL) |
| LLM Orchestration | In-service (own RAG pipeline) | Delegated to AI Brain service |
| Memory/Mood | Not planned | Contact Intelligence service |
| Embedding Provider | OpenAI text-embedding-3-small | Gemini embedding-001 (free tier) |
| LLM Provider | gpt-4o-mini / gemini-1.5-flash | Anthropic (primary) + Gemini (fallback) |
| Database | PostgreSQL + pgvector | Neon PostgreSQL + pgvector |
| Real-time | Socket.io WebSocket | SSE (simpler, CF Workers compatible) |

5. Data Flow Guarantees

  1. At-Least-Once Delivery: Webhooks retry if we fail to ACK. Dedup via external_message_id.
  2. Idempotence: Unique constraint on external_message_id + channelId.
  3. Ordering: Messages within a session processed sequentially (CF Queue concurrency key).
  4. Context Never Blocks: Contact Intelligence context fetch has 30ms SLA. If CI is slow, proceed without context.
  5. Mood Never Blocks: Mood classification is async (ingest returns 202). Response delivery is not delayed.
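
Guarantee 4 ("Context Never Blocks") amounts to racing the context fetch against a time budget. The sketch below shows one way to do that; `withBudget` and `CONTEXT_BUDGET_MS` are illustrative names, and treating a rejected fetch the same as a timeout is an assumption.

```typescript
const CONTEXT_BUDGET_MS = 30; // the 30ms budget for the CI context fetch

// Resolves with the work's value if it settles within the budget,
// otherwise with the fallback. Rejections also resolve to the fallback.
function withBudget<T>(work: Promise<T>, budgetMs: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), budgetMs);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        clearTimeout(timer);
        resolve(fallback);
      }
    );
  });
}
```

A call site might read `const ctx = await withBudget(fetchContext(contactId, query), CONTEXT_BUDGET_MS, null)`, where `fetchContext` is a hypothetical client function. This helper only stops waiting; cancelling the underlying request would additionally need an abort signal passed to `fetch`.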

6. Service Bindings

# Chat Engine wrangler.toml
[[services]]
binding = "BRAIN_SERVICE"
service = "logicspike-brain-service"
 
[[services]]
binding = "CI_SERVICE"
service = "logicspike-contact-intelligence"
 
[[services]]
binding = "COMMS_SERVICE"
service = "logicspike-communication"

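The Chat Engine calls AI Brain, Contact Intelligence, and the Communication Service through Cloudflare service bindings, so in production these inter-Worker calls do not go through a public URL and add negligible network latency.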
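
As a rough sketch of how a binding is used from Worker code, the example below resolves a contact through `CI_SERVICE`. `ServiceBinding` and `BindingResponse` are simplified stand-ins for the Workers runtime's `Fetcher` and `Response` types, `resolveContact` is a hypothetical helper, and the JSON response shape (`{ contact_id }`) is assumed from the message flow in section 2.1.

```typescript
// Simplified structural stand-ins for the Workers `Response` and `Fetcher` types.
interface BindingResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
interface ServiceBinding {
  fetch(
    url: string,
    init?: { method?: string; headers?: Record<string, string>; body?: string }
  ): Promise<BindingResponse>;
}

// Mirrors the three [[services]] bindings declared in wrangler.toml.
interface Env {
  BRAIN_SERVICE: ServiceBinding;
  CI_SERVICE: ServiceBinding;
  COMMS_SERVICE: ServiceBinding;
}

async function resolveContact(
  env: Env,
  externalId: string,
  channel: string,
  displayName: string
): Promise<string> {
  // The hostname is a placeholder: a service binding routes by binding, not by DNS.
  const res = await env.CI_SERVICE.fetch("https://contact-intelligence/contacts", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ external_id: externalId, channel, display_name: displayName }),
  });
  if (!res.ok) throw new Error(`CI /contacts failed with status ${res.status}`);
  const body = (await res.json()) as { contact_id?: string };
  if (!body.contact_id) throw new Error("CI response did not include contact_id");
  return body.contact_id;
}
```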
