Last Updated: 2026-04-07
Status: Active (v2: updated for AI Brain + Contact Intelligence integration)
This document defines the High-Level Design (HLD) and Low-Level Design (LLD) for the LogicSpike Chat Engine, an omnichannel AI chatbot platform.
1. High-Level Design (HLD)
The Chat Engine is the thin orchestration layer that connects customers (via WhatsApp, Telegram, Website Widget) to the AI pipeline. It does NOT own LLM calls, memory, or knowledge — those are delegated to AI Brain and Contact Intelligence.
1.1 Architecture Principle: Thin Orchestrator
The Chat Engine's job is narrow:
- Receive messages from channels (webhooks, widget SSE)
- Enrich with contact context (call Contact Intelligence)
- Route to AI Brain for LLM response generation
- Deliver the response back to the channel
- Ingest the interaction into Contact Intelligence for memory/mood tracking
Everything else — LLM orchestration, memory, knowledge RAG, mood classification, entity extraction, outreach — is handled by other services.
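The five responsibilities above can be sketched as one function. This is a minimal illustration under stated assumptions, not the actual implementation: the `Deps` interface and its method names (`resolveContact`, `fetchContext`, `askBrain`, `deliver`, `ingest`) are hypothetical stand-ins for the Contact Intelligence, AI Brain, and channel calls, and treating a failed context fetch the same as a slow one is an assumption.

```typescript
// Hypothetical message and dependency shapes; the real services sit behind HTTP or service bindings.
interface InboundMessage {
  tenantId: string;
  channel: "whatsapp" | "telegram" | "widget";
  externalId: string;
  text: string;
}

interface Deps {
  resolveContact(msg: InboundMessage): Promise<string>; // returns contact_id
  fetchContext(contactId: string, query: string): Promise<string>; // returns context_text
  askBrain(message: string, contextText: string): Promise<string>;
  deliver(msg: InboundMessage, reply: string): Promise<void>;
  ingest(contactId: string, message: string, role: "user" | "assistant"): Promise<void>;
}

// Step 1 (receive) is the caller handing us `msg`; steps 2-5 follow in order.
async function handleInbound(msg: InboundMessage, deps: Deps): Promise<string> {
  const contactId = await deps.resolveContact(msg);
  // Context is best-effort: fall back to an empty context instead of failing the reply.
  const contextText = await deps.fetchContext(contactId, msg.text).catch(() => "");
  const reply = await deps.askBrain(msg.text, contextText);
  await deps.deliver(msg, reply);
  // Both sides of the exchange are ingested for memory/mood tracking.
  await Promise.all([
    deps.ingest(contactId, msg.text, "user"),
    deps.ingest(contactId, reply, "assistant"),
  ]);
  return reply;
}
```

Injecting the dependencies keeps the orchestrator free of any LLM or memory logic, which matches the thin-orchestrator principle and makes the flow testable with fakes.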
1.2 Core Architectural Components
| Component | Responsibility | Technology |
|---|---|---|
| API Gateway | Entry point, rate limiting, JWT validation | Cloudflare Gateway |
| Webhook Ingress | Accept WhatsApp/Telegram webhooks, verify signatures, enqueue | Hono on CF Workers |
| Widget API | SSE streaming for website widget | Hono on CF Workers |
| Chat Orchestrator | Fetch context → call Brain → deliver response | CF Workers |
| Contact Intelligence | Per-contact memory, mood, entities, outreach | Separate service (port 8794) |
| AI Brain | LLM orchestration, knowledge RAG, tools | Separate service (port 8793) |
| Communication Service | Email/SMS delivery (SMTP, Twilio) | Separate service (port 8790) |
| Unified Inbox | Real-time WebSocket for dashboard agents | CF Durable Objects / SSE |
1.3 System Diagram
2. Message Flow (Critical Path)
2.1 Inbound Message Processing
Customer sends "How much do your cakes cost?"
│
├─ 1. Webhook/Widget receives message
│ └─ Verify signature (WhatsApp HMAC / widget token)
│ └─ ACK 200 immediately (< 100ms)
│
├─ 2. Resolve or create contact
│ └─ POST /contacts { external_id, channel, display_name }
│ └─ Contact Intelligence returns contact_id
│
├─ 3. Fetch contact context (parallel)
│ └─ GET /context/{contact_id}?query="How much do cakes cost"
│ └─ Returns: mood, memories, entities, context_text, adaptation hints
│ └─ Target: < 30ms
│
├─ 4. Call AI Brain for response
│ └─ POST /brain/chat { message, context_text, personality }
│ └─ Brain handles: routing → specialist → tools → RAG → streaming
│ └─ Returns: SSE stream of tokens
│
├─ 5. Deliver response to customer
│ └─ Widget: stream SSE tokens directly
│ └─ WhatsApp/Telegram: buffer full response, send via API
│
├─ 6. Ingest interaction into Contact Intelligence
│ └─ POST /ingest { contact_id, message, role: "user" }
│ └─ POST /ingest { contact_id, message, role: "assistant" }
│ └─ CI handles: mood classification, memory extraction, entity extraction
│ └─ Returns 202 (async processing)
│
└─ 7. Persist to DB + broadcast to dashboard
  └─ INSERT into chat_messages
  └─ WebSocket/SSE event to Unified Inbox
2.2 What Chat Engine Does NOT Do
| Concern | Handled By | NOT by Chat Engine |
|---|---|---|
| LLM model selection | AI Brain (registry.ts) | |
| Knowledge base RAG | AI Brain (knowledge/ingest.ts) | |
| Memory extraction | Contact Intelligence (ingest) | |
| Mood classification | Contact Intelligence (mood.ts) | |
| Entity graph | Contact Intelligence (entities.ts) | |
| Proactive outreach | Contact Intelligence (outreach-scanner) | |
| Tool execution (blog, content) | AI Brain (tools/) | |
| Email/SMS delivery | Communication Service | |
| Cost governance | AI Brain (cost-governor) | |
3. Low-Level Design
3.1 Session Management
Sessions are lightweight — the heavy context lives in Contact Intelligence.
interface ChatSession {
  id: string
  tenantId: string
  contactId: string // → Contact Intelligence contact
  channelId: string // → ChannelIntegration
  status: "ai_handled" | "human_escalated" | "closed"
  assignedAgentId?: string
  summary?: string // AI-generated on close
  slaBreachAt?: Date
  messageCount: number
  createdAt: Date
  lastMessageAt: Date
}
Session cache: CF KV with 24h TTL (replaces Redis from v1 docs).
- Key: `session:{tenantId}:{contactId}:active`
- Value: session ID + last 20 message IDs
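A minimal sketch of the cache entry described above, assuming the value is stored as a JSON object. The names `sessionCacheKey`, `CachedSession`, and `appendMessageId` are illustrative and not taken from the codebase.

```typescript
const SESSION_TTL_SECONDS = 24 * 60 * 60; // 24h TTL from the design
const MAX_CACHED_MESSAGE_IDS = 20;

// Assumed value shape: the session ID plus a rolling window of message IDs.
interface CachedSession {
  sessionId: string;
  messageIds: string[]; // oldest first, capped at MAX_CACHED_MESSAGE_IDS
}

function sessionCacheKey(tenantId: string, contactId: string): string {
  return `session:${tenantId}:${contactId}:active`;
}

// Returns a new entry with the ID appended and anything beyond the last 20 dropped.
function appendMessageId(entry: CachedSession, messageId: string): CachedSession {
  const messageIds = [...entry.messageIds, messageId].slice(-MAX_CACHED_MESSAGE_IDS);
  return { ...entry, messageIds };
}
```

With a Workers KV binding, an entry like this would typically be written with `put(key, JSON.stringify(entry), { expirationTtl: SESSION_TTL_SECONDS })`.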
3.2 Webhook Ingress (WhatsApp)
POST /webhooks/whatsapp/{tenantId}
├─ Verify HMAC-SHA256 signature (< 1ms)
├─ Parse webhook payload → normalize to internal format
├─ Dedup check: external_message_id unique constraint
├─ Find or create ChatSession
├─ Resolve ContactIntelligence contact
├─ Run orchestration pipeline (steps 2-7 above)
└─ Return 200 OK
Idempotency: a unique constraint on `external_message_id` + `channelId` prevents duplicate processing on webhook retries.
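The signature check in the first step can be sketched as follows. The sketch assumes the signature arrives as `sha256=<hex digest>`, computed over the raw request body with a shared app secret (the scheme Meta documents for the `X-Hub-Signature-256` header). It uses Node's `crypto` module, which on Workers requires the `nodejs_compat` flag; Web Crypto (`crypto.subtle`) would be the alternative. `verifyWebhookSignature` is a hypothetical helper name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true only if the header carries a valid HMAC-SHA256 of the raw body.
function verifyWebhookSignature(
  appSecret: string,
  rawBody: string,
  signatureHeader: string,
): boolean {
  const prefix = "sha256=";
  if (!signatureHeader.startsWith(prefix)) return false;
  const received = Buffer.from(signatureHeader.slice(prefix.length), "hex");
  const expected = createHmac("sha256", appSecret).update(rawBody, "utf8").digest();
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The check must run against the exact bytes received, before any JSON parsing, because re-serializing the payload can change the digest.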
3.3 Widget API (SSE Streaming)
POST /chat/widget/message
├─ Validate widget token against ChannelIntegration
├─ CORS check: origin must match registered domain
├─ Run orchestration pipeline
├─ Stream AI Brain response as SSE events
│ event: token → { text: "Our" }
│ event: token → { text: " cakes" }
│ event: done → { session_id, message_id }
└─ Return SSE stream
3.4 Human Escalation & Handoff
AI Brain returns [[HANDOFF]] in response
├─ Chat Engine intercepts (does NOT send to customer)
├─ Send fallback message: "Connecting you to a team member..."
├─ Update session: status = "human_escalated"
├─ Set sla_breach_at = NOW() + 15 minutes
├─ Fire WebSocket event: chat:escalated
└─ Agent sees it in Unified Inbox
SLA Breach handling:
- Cron sweeps every 1 minute for unassigned escalated sessions past SLA
- Sends fallback: "Our team is offline. We'll email you back!"
- Fires urgent alert via Communication Service
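The handoff and SLA rules above can be sketched as two pure functions. This is an illustration only: `applyHandoff`, `findBreached`, and the trimmed-down `EscalatableSession` type are hypothetical names, and detecting the marker with a substring match is an assumption about how `[[HANDOFF]]` appears in the Brain response.

```typescript
const HANDOFF_MARKER = "[[HANDOFF]]";
const SLA_MINUTES = 15;

// Subset of ChatSession fields relevant to escalation.
interface EscalatableSession {
  id: string;
  status: "ai_handled" | "human_escalated" | "closed";
  assignedAgentId?: string;
  slaBreachAt?: Date;
}

// If the Brain reply contains the marker, return an escalated copy; otherwise return the session unchanged.
function applyHandoff(
  session: EscalatableSession,
  brainReply: string,
  now: Date,
): EscalatableSession {
  if (!brainReply.includes(HANDOFF_MARKER)) return session;
  return {
    ...session,
    status: "human_escalated",
    slaBreachAt: new Date(now.getTime() + SLA_MINUTES * 60_000),
  };
}

// Selection rule for the one-minute sweep: escalated, unassigned, and at or past the deadline.
function findBreached(sessions: EscalatableSession[], now: Date): EscalatableSession[] {
  return sessions.filter(
    (s) =>
      s.status === "human_escalated" &&
      s.assignedAgentId === undefined &&
      s.slaBreachAt !== undefined &&
      s.slaBreachAt.getTime() <= now.getTime(),
  );
}
```

Passing `now` explicitly keeps both functions deterministic, so the cron sweep and the escalation path can be tested without mocking the clock.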
3.5 Contact Intelligence Integration Points
| Chat Engine Event | CI API Call | Purpose |
|---|---|---|
| New message from customer | POST /contacts | Resolve/create contact |
| Before AI response | GET /context/{id}?query=... | Fetch memories, mood, adaptation hints |
| After AI response | POST /ingest (user msg) | Mood classification, memory extraction |
| After AI response | POST /ingest (assistant msg) | Track assistant messages |
| Session closed | (none — CI tracks via inactivity) | |
| Outreach trigger fires | CI calls Chat Engine delivery | Send proactive message to channel |
3.6 Embeddable Widget Architecture
- Bootstrapper script (~3KB) → sandboxed `<iframe>` on widget.logicspike.com
- `postMessage` API for host ↔ iframe communication
- SSE connection for streaming AI responses
- Auto-reconnect on network drop
- Widget token validated against `ChannelIntegration.identifier` (allowed domain)
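The domain check behind the widget token can be sketched as below. The helper name `isAllowedOrigin` is hypothetical, and two rules in it are assumptions rather than documented behaviour: the hostname must match the registered domain exactly (no subdomain wildcards), and only HTTPS origins are accepted.

```typescript
// Compares the request's Origin header with the domain stored in ChannelIntegration.identifier.
function isAllowedOrigin(originHeader: string | null, allowedDomain: string): boolean {
  if (!originHeader) return false;
  let origin: URL;
  try {
    origin = new URL(originHeader);
  } catch {
    return false; // malformed Origin header
  }
  // URL parsing lowercases the hostname, so only the registered value needs normalizing.
  return origin.protocol === "https:" && origin.hostname === allowedDomain.toLowerCase();
}
```

Parsing the header with `URL` instead of using a substring or suffix test avoids accepting look-alike hosts such as `shop.example.com.evil.io`.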
4. Technology Choices (Updated)
| Concern | v1 Docs (March 2026) | v2 Decision (April 2026) |
|---|---|---|
| Runtime | Node.js (Express/Hono) | Hono on Cloudflare Workers |
| Queue | Redis Streams / BullMQ | CF Queues (or direct invocation) |
| Session Cache | Redis | CF KV (24h TTL) |
| LLM Orchestration | In-service (own RAG pipeline) | Delegated to AI Brain service |
| Memory/Mood | Not planned | Contact Intelligence service |
| Embedding Provider | OpenAI text-embedding-3-small | Gemini embedding-001 (free tier) |
| LLM Provider | gpt-4o-mini / gemini-1.5-flash | Anthropic (primary) + Gemini (fallback) |
| Database | PostgreSQL + pgvector | Neon PostgreSQL + pgvector |
| Real-time | Socket.io WebSocket | SSE (simpler, CF Workers compatible) |
5. Data Flow Guarantees
- At-Least-Once Delivery: Webhooks retry if we fail to ACK. Dedup via `external_message_id`.
- Idempotence: Unique constraint on `external_message_id` + `channelId`.
- Ordering: Messages within a session processed sequentially (CF Queue concurrency key).
- Context Never Blocks: Contact Intelligence context fetch has 30ms SLA. If CI is slow, proceed without context.
- Mood Never Blocks: Mood classification is async (ingest returns 202). Response delivery is not delayed.
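The "Context Never Blocks" guarantee can be sketched with a small timeout wrapper. This is one possible approach, not the documented implementation: `withTimeout` and `contextOrEmpty` are hypothetical names, and using an empty string as the "no context" value is an assumption.

```typescript
// Resolves with `fallback` if the promise rejects or has not settled within `timeoutMs`.
function withTimeout<T>(promise: Promise<T>, timeoutMs: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), timeoutMs);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        clearTimeout(timer);
        resolve(fallback);
      },
    );
  });
}

const CONTEXT_BUDGET_MS = 30; // the 30ms budget from the guarantees above

// Proceeds with an empty context when Contact Intelligence is slow or failing.
function contextOrEmpty(fetchContext: () => Promise<string>): Promise<string> {
  return withTimeout(fetchContext(), CONTEXT_BUDGET_MS, "");
}
```

The wrapper only stops waiting; it does not cancel the underlying request. Cancelling it would need an `AbortSignal` passed to the fetch call.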
6. Service Bindings
# Chat Engine wrangler.toml
[[services]]
binding = "BRAIN_SERVICE"
service = "logicspike-brain-service"
[[services]]
binding = "CI_SERVICE"
service = "logicspike-contact-intelligence"
[[services]]
binding = "COMMS_SERVICE"
service = "logicspike-communication"
The Chat Engine calls Brain and CI via Cloudflare service bindings (zero-latency inter-worker calls in production).
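A call through one of these bindings might look like the sketch below. A service binding exposes a `fetch()` method on the Worker's `env`; the local `ServiceBinding` interface, the `callBrain` helper, and the placeholder hostname `brain.internal` are assumptions made for illustration. The `/brain/chat` path and the `message` and `context_text` fields come from the message flow in section 2.1; `personality` is omitted for brevity.

```typescript
// Minimal local shape of a service binding: it exposes fetch(), like any Worker.
interface ServiceBinding {
  fetch(request: Request): Promise<Response>;
}

// Binding names match the wrangler.toml above.
interface Env {
  BRAIN_SERVICE: ServiceBinding;
  CI_SERVICE: ServiceBinding;
  COMMS_SERVICE: ServiceBinding;
}

async function callBrain(env: Env, message: string, contextText: string): Promise<Response> {
  // The hostname is a placeholder; the binding determines which Worker receives the request.
  const request = new Request("https://brain.internal/brain/chat", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ message, context_text: contextText }),
  });
  return env.BRAIN_SERVICE.fetch(request);
}
```

Because the binding is just an object with `fetch()`, a fake can be substituted in tests without any network access.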