Database Strategy & Schema Architecture · Architecture

Last Updated: 2026-02-28

1. High-Level Strategy: "Building vs Floor"

To scale LogicSpike as a multi-tenant platform, we categorize data into two levels.

🏢 Level 1: The Building (Public / Shared)

Data that exists outside any specific tenant. This is the Platform Infrastructure.

Database Location: public schema (default).
Access Pattern: Highly read-heavy, global integrity.
Examples:
- Who are the users? (users)
- Which tenants exist? (tenants)
- Who owns what? (memberships)
- What services are available? (services - e.g., Blog, Store)

🚪 Level 2: The Floor (Tenant Private)

Data that belongs exclusively to one tenant.

Database Location: Shared Database, but logically isolated via tenant_id (Discriminator Column).
Access Pattern: Strictly scoped. Every query MUST have WHERE tenant_id = ?.
Examples:
- Blog Posts (posts)
- Products (products)
- Customers (customers)
- API Keys (api_keys)
- Media Files (media_files)

3. Deep Dive: Tenancy Model Comparison

We evaluated two primary patterns for LogicSpike.

Option A: Shared Schema (Selected)

All tenants share the same tables. Rows are distinguished by a tenant_id column.

Option B: Schema-per-Tenant

Each tenant has their own Postgres Schema (e.g., tenant_101.posts, tenant_102.posts).

The Comparison Matrix

Feature	Shared Schema (Option A)	Schema-per-Tenant (Option B)
Simplicity	🟢 High. Standard SQL. ORMs like Prism/Drizzle handle it easily.	🔴 Low. Need complex routing logic to switch schemas per request.
Migrations	🟢 Fast. Run `ALTER TABLE` once.	🔴 Slow & Risky. Must run `ALTER TABLE` 10,000 times for 10,000 tenants. If #500 fails, you have a split-brain state.
Connection Pooling	🟢 Efficient. One pool serves all.	🟠 Complex. Apps like PgBouncer struggle with thousands of schema search paths.
Data Isolation	🟠 Logical. Relies on `WHERE tenant_id = ?`. Bug in code = Data Leak.	🟢 Physical. Database engine enforces separation. Harder to leak data.
Performance (Small Scale)	🟢 Fast.	🟢 Fast.
Performance (Massive Scale)	🟠 Index Bloat. Indexes get huge. Needs Partitioning eventually.	🟢 Scalable. Each tenant's index is small.
Cross-Tenant Analytics	🟢 Easy. `SELECT count(*) FROM posts`.	🔴 Hard. Need an ETL pipeline to aggregate data from 10k schemas.
Cost	🟢 Low. Standard instances.	🟠 Medium. Higher RAM usage for schema metadata.
"Noisy Neighbor"	🔴 Risk. One huge tenant can slow down the DB for everyone.	🟠 Risk. Still shares CPU/IO, but indexes are separate.

Why we chose Shared Schema

Operational Sanity: Managing migrations for 1000s of schemas is a DevOps nightmare for a startup team.
Tooling Support: Drizzle ORM and Next.js work flawlessly with Shared Schema. Schema switching often requires hacks or complex middleware.
Analytics: As an eCommerce/Blog platform, you will want to ask "How many posts were created today across the platform?". Shared schema makes this a 1ms query.

Mitigation for Shared Schema Risks

Row Level Security (RLS): We can enable Postgres RLS to enforce tenant_id checks at the database engine level, mimicking the isolation safety of Schema-per-Tenant.
Partitioning: When we hit 100M rows, we can partition the posts table by tenant_id.

4. The Enterprise Exception: Hybrid Tenancy (Handling "Astha")

You asked: "What if a large company (Astha) wants their own separate database?"

We handle this via a Hybrid Pattern.

The Strategy

99% of users stay in the Shared Pool (Public Cloud). 1% of users (Enterprise) get a Dedicated Database (Private Cloud).

Implementation

We modify the tenants table (Level 1) to include a "Router" config.

// Table: tenants
{
  id: "tenant_astha_corp",
  name: "Astha Corp",
  plan: "enterprise",
  // The magic field
  databaseConfig: {
     type: "isolated", 
     connectionUrl: "postgres://user:pass@sensitive-db-cluster.aws.com/astha_db"
  } 
}

// Table: tenants (General User)
{
  id: "tenant_small_shop",
  name: "Small Shop",
  plan: "basic",
  databaseConfig: null // Implies: Use default shared pool
}

Application Logic (The "Router")

Request comes in: astha.logicspike.com
Lookup: Middleware checks tenants table for id: "tenant_astha_corp".
Decision:
- If databaseConfig is present -> Create/Get connection to Dedicated DB.
- If databaseConfig is null -> Use Shared DB Pool.
Execute: Run the query SELECT * FROM posts.

Pros & Cons of Hybrid

✅ High Revenue: We can charge Astha 10x-100x more for this isolation.
✅ Performance: Astha's load doesn't kill the shared pool.
⚠️ Ops Complexity: Schema migrations must now run on the Shared DB AND Astha's DB. We need a strict CI/CD pipeline to ensure they don't drift out of sync.

5. Billing Strategy: "Bundle vs A La Carte"

We support both predefined plans and custom service selection (e.g., "Just the Blog").

The Model

Plans (The Menu): Standard bundles (e.g., "Pro").
Custom Selection: Users can pick individual services.
Tenant Services (Entitlement): The single source of truth for access.

Logic Flow

Standard Plan: User buys "Pro". System copies "Pro" features into tenant_services.
Custom: User selects "Blog" + "Chatbot". System inserts them into tenant_services.
Add-on: User is on "Pro" but adds "Chatbot". System adds "Chatbot" row to tenant_services.

Result: The application ONLY checks tenant_services. It ignores how they got there.

6. Schema Comparison: Current vs Proposed

A. Users Table

Feature	Current Schema (`core-database`)	Proposed Domain Model
Tenant Link	`tenant_id` (Hard link)	REMOVED. Users are global.
Role	`role` (String: "owner")	REMOVED. Roles are per-tenant.
Auth	`email`, `password_hash`	Moved to `user_auth_providers` (already started).

B. Access Control (New)

Entity	Current Status	Proposed Solution
Membership	❌ Does not exist.	NEW TABLE: `memberships(user_id, tenant_id, role_id)`
Roles	❌ Hardcoded strings.	NEW TABLE: `roles(id, tenant_id, permissions)`
Invites	❌ Does not exist.	NEW TABLE: `invitations(email, tenant_id, token)`

C. Tenant Data — ✅ Implemented

Existing tables for api_keys, media_files, services, tenant_services are correctly designed for "Level 2" (Floor). They all have tenant_id FKs.

NOTE

Since this doc was written, the following tables have been fully implemented:

memberships ✅
roles (system + custom per-tenant) ✅
invitations ✅
Full billing schema (11 tables) ✅ — See docs/billing/BILLING_STATUS.md
user_notification_preferences ✅
Communication schema (provider_configs, message_logs, message_events) ✅

See schema_specification.md for the complete, up-to-date schema.

4. Proposed Table categorization

🌍 Building Level (Global)

Tables that define the platform structure.

users (Identity only) ✅
user_auth_providers (Passwords/OAuth) ✅
user_notification_preferences (Email pref toggles) ✅
tenants (Floor metadata) ✅
memberships (Who has keys to which floor) ✅
roles (What keys open which doors — System + Custom) ✅
services (List of available "Appliances" to rent) ✅
invitations (Temporary passes in the Lobby) ✅
plans (Subscription tiers) ✅
plan_features / plan_service_limits (What each plan includes) ✅
subscriptions (Active contracts) ✅
verifications (OTP codes) ✅

🔒 Floor Level (Tenant Specific)

Tables that comprise the user's business data.

posts (Blog Service) ✅
api_keys (Gateway / Integrations) ✅
media_files (Media Service) ✅
tenant_services (Entitlements — which services are enabled + limits) ✅
tenant_coins / coin_transactions / coin_packs (Billing economy) ✅
tenant_addons / addon_catalog (Purchasable extras) ✅
processed_payment_events (Webhook idempotency) ✅

📡 Communication Schema

Isolated in communication Postgres schema.

communication.provider_configs ✅
communication.message_logs ✅
communication.message_events ✅

📝 Not Yet Implemented

audit_logs — Spec exists at audit_log_spec.md