Last Updated: 2026-05-06
Status: Draft
Service: apps/communication, apps/newsletter-service
Owner: Vlozi platform
This doc scopes the migration of Vlozi's email sending stack from Resend to AWS SES, and adds per-tenant domain verification so customers can send newsletter campaigns from their own domain (e.g. news@theirbrand.com) instead of the platform's hello@vlozi.app.
The two changes are bundled because they share the same data-model and dispatch-layer rewrite, and a clean cut is cheaper pre-launch than retrofitting later.
1. Why
1.1 The Driver
Vlozi tenants want to send newsletter campaigns from their own verified domain so recipients see a familiar From: address and replies go to the tenant. Today, comms_sender_settings.fromEmail accepts any value but does not actually verify the domain — sends fail silently at the provider, and there is no DKIM/SPF state machine.
1.2 Why Swap Providers Now
Resend on Pro caps verified domains and bills per domain at scale. SES is ~4× cheaper and has effectively no domain ceiling (10k identities default, raisable). For a multi-tenant product where every customer gets a verified domain, the math flips early.
1.3 Why Bundle the Two
Both changes touch the same files: apps/communication/src/index.ts (dispatchSend, sendViaResend, the webhook route) and the comms schema. Doing them sequentially means rewriting dispatchSend twice. Bundling matches our "no production users yet" posture — see project_pre_launch.md.
2. Scope
2.1 In Scope
| # | Deliverable |
|---|---|
| 1 | Replace Resend HTTP calls with AWS SES v2 SendEmail (signed via SigV4 from Workers using aws4fetch). |
| 2 | Replace Resend Svix-signed webhook with SNS-signed webhook receiving SES Configuration Set events. |
| 3 | New comms_tenant_sending_domains table (Drizzle export tenantSendingDomains): stores per-tenant domain identities, DKIM records, verification status. |
| 4 | New API routes for tenant domain lifecycle: add, fetch DNS records, check status, delete. |
| 5 | Background re-checker (Durable Object alarm, per feedback_cron_to_do_scheduler.md) that polls GetEmailIdentity until verified or failed. |
| 6 | Update dispatchSend to require a verified tenant domain when sending non-system traffic, and use the tenant's Configuration Set for event scoping. |
| 7 | Update apps/seller-dashboard — domain settings page (DNS-record copy panel + status badge). |
| 8 | Cutover plan with parallel-run window and rollback path. |
2.2 Out of Scope
| # | Deferred |
|---|---|
| 1 | Multi-region SES. Single region (ap-south-1) for v1. |
| 2 | Dedicated IPs / IP pools. Adds $24.95/mo per IP. Revisit when one tenant >100k/mo. |
| 3 | Bring-Your-Own-AWS (tenant-managed SES via AssumeRole). Enterprise-tier feature. |
| 4 | SMS provider swap. Twilio path stays as-is. |
| 5 | Inbound email (reply-handling). Out of scope until contact-intelligence ships. |
| 6 | Per-tenant suppression lists. Account-level suppression + existing newsletter bounce kill-switch is sufficient at launch. |
2.3 Non-Goals
- Backwards-compatibility shims for Resend. Pre-launch; see project_pre_launch.md. The provider: "resend" literal in messageLogs.provider flips to "ses" and stale rows get deleted.
- A pluggable multi-provider abstraction. We build a thin EmailProvider interface so the code stays testable, but we ship with one concrete implementation. Universal-adaptor ambitions in adr-providers.md remain aspirational.
3. Architecture
3.1 Current State
3.2 Target State
3.3 Verification Flow (Sequence)
4. Data Model Changes
4.1 New Table: comms_tenant_sending_domains
import { index, jsonb, pgTable, text, timestamp, uniqueIndex } from "drizzle-orm/pg-core"

export const tenantSendingDomains = pgTable(
  "comms_tenant_sending_domains",
  {
    id: text("id").primaryKey(), // dom_<nanoid>
    tenantId: text("tenant_id").notNull(),
    domain: text("domain").notNull(), // "news.brand.com"
    // SES identity
    sesIdentityArn: text("ses_identity_arn"), // arn:aws:ses:...
    sesRegion: text("ses_region").notNull(), // "ap-south-1"
    configurationSetName: text("configuration_set_name").notNull(),
    // Verification state
    verificationStatus: text("verification_status")
      .notNull()
      .default("pending"), // pending | verified | failed | temporary_failure
    dkimTokens: jsonb("dkim_tokens"), // [{ name, value, status }]
    dkimStatus: text("dkim_status"), // SUCCESS | FAILED | PENDING | NOT_STARTED
    // Lifecycle
    verifiedAt: timestamp("verified_at"),
    lastCheckedAt: timestamp("last_checked_at"),
    failureReason: text("failure_reason"),
    createdAt: timestamp("created_at").defaultNow().notNull(),
    updatedAt: timestamp("updated_at").defaultNow().notNull(),
  },
  (t) => ({
    // Lookup by tenant for the list route; domain is unique across all tenants.
    tenantIdx: index("comms_sending_domains_tenant_idx").on(t.tenantId),
    domainUnique: uniqueIndex("comms_sending_domains_domain_unique").on(t.domain),
  })
)
IMPORTANT
domain is globally unique, not unique per tenant. SES rejects duplicate identity creation across the AWS account, so we must enforce uniqueness at our layer to give a clean error before hitting SES.
4.2 Modified Table: comms_sender_settings
Repurpose the existing table. The fromEmail column stays, but its semantics change: it is now the local-part + domain of an address that must belong to a verified row in tenant_sending_domains.
Add a foreign-key column for explicit binding:
sendingDomainId: text("sending_domain_id"), // FK → tenant_sending_domains.id
4.3 Modified Table: comms_message_logs & comms_message_events
- provider column accepts "ses" (was "resend").
- messageEvents.source accepts "ses".
No schema change — just enum widening. Old "resend" rows get cleaned up at cutover (pre-launch, no historical preservation requirement).
4.4 Migration Strategy
Single Drizzle migration adds the new table + the FK column. No data backfill — there are no production tenants. Existing senderSettings.fromEmail values are wiped by the migration's UPDATE ... SET from_email = NULL step so dev tenants are forced through the new verification flow.
5. API Surface
5.1 New Routes (on apps/communication)
POST /v1/sending-domains
Add a new domain for the authenticated tenant. Calls SES CreateEmailIdentity, persists DKIM tokens, returns DNS records.
Request:
{ "domain": "news.brand.com" }
Response (201):
{
  "id": "dom_a1b2c3",
  "domain": "news.brand.com",
  "status": "pending",
  "dnsRecords": [
    { "type": "CNAME", "name": "abc._domainkey.news.brand.com", "value": "abc.dkim.amazonses.com" },
    { "type": "CNAME", "name": "def._domainkey.news.brand.com", "value": "def.dkim.amazonses.com" },
    { "type": "CNAME", "name": "ghi._domainkey.news.brand.com", "value": "ghi.dkim.amazonses.com" }
  ]
}
GET /v1/sending-domains
List the tenant's domains with current status.
GET /v1/sending-domains/:id
Fetch a single domain (used by the dashboard to poll status).
POST /v1/sending-domains/:id/check
Force an immediate GetEmailIdentity poll (manual "Check now" button in the UI). Rate-limited to 1/min per domain.
DELETE /v1/sending-domains/:id
Calls DeleteEmailIdentity on SES, removes the row. Rejects if the domain is referenced by any non-archived campaign.
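For POST /v1/sending-domains, the dnsRecords array in the response can be derived mechanically from the DKIM tokens SES returns. The sketch below assumes Easy DKIM with three tokens, as in the example response; DnsRecord and buildDkimRecords are hypothetical names, not existing code.

```typescript
// Hypothetical helper: turn SES Easy DKIM tokens into the CNAME records
// a tenant has to publish. Names here are illustrative, not existing code.
interface DnsRecord {
  type: "CNAME";
  name: string;
  value: string;
}

function buildDkimRecords(domain: string, dkimTokens: string[]): DnsRecord[] {
  return dkimTokens.map((token) => ({
    type: "CNAME" as const,
    name: `${token}._domainkey.${domain}`,
    value: `${token}.dkim.amazonses.com`,
  }));
}

const exampleRecords = buildDkimRecords("news.brand.com", ["abc", "def", "ghi"]);
console.log(exampleRecords);
```

Keeping this as a pure function means the route handler and the dashboard's copy panel can share one source of truth for record names.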
5.2 New Webhook
POST /v1/webhooks/ses
Receives SNS-signed event notifications from the SNS topic that the per-tenant Configuration Sets publish to (one shared topic per environment, see §6.4). Replaces /v1/webhooks/resend.
SNS message types handled:
| Type | Action |
|---|---|
| SubscriptionConfirmation | Auto-confirm by GET-ing the SubscribeURL (one-time per topic). |
| Notification | Parse Message field as SES event JSON, route to the same internal/event + internal/bounce fan-out. |
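The two message types in the table above could be dispatched roughly as follows. This is a sketch only: SnsEnvelope, SnsAction, and routeSnsMessage are hypothetical names, and it assumes the SNS signature has already been verified (§6.3) before this function runs.

```typescript
// Assumed subset of the SNS envelope that the webhook needs.
interface SnsEnvelope {
  Type: string;
  MessageId: string;
  Message: string;
  SubscribeURL?: string;
}

// Hypothetical result type describing what the webhook should do next.
type SnsAction =
  | { kind: "confirm"; subscribeUrl: string }
  | { kind: "event"; messageId: string; sesEvent: unknown }
  | { kind: "ignore"; reason: string };

function routeSnsMessage(envelope: SnsEnvelope): SnsAction {
  switch (envelope.Type) {
    case "SubscriptionConfirmation":
      if (!envelope.SubscribeURL) {
        return { kind: "ignore", reason: "missing SubscribeURL" };
      }
      // Caller performs the GET on subscribeUrl to confirm the subscription.
      return { kind: "confirm", subscribeUrl: envelope.SubscribeURL };
    case "Notification":
      try {
        // The SES event is JSON-encoded inside the Message string.
        return {
          kind: "event",
          messageId: envelope.MessageId,
          sesEvent: JSON.parse(envelope.Message),
        };
      } catch {
        return { kind: "ignore", reason: "Message is not valid JSON" };
      }
    default:
      return { kind: "ignore", reason: `unhandled type ${envelope.Type}` };
  }
}
```

Returning an action value instead of performing the side effect keeps the routing logic testable without network access.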
5.3 Modified: POST /v1/send and /internal/send
Sender-resolution logic at dispatchSend gains a verification gate:
// pseudo
if (tenantId !== "system") {
  const fromAddress = body.from ?? settings.fromEmail
  const domain = parseDomain(fromAddress)
  const sendingDomain = await lookupVerifiedDomain(tenantId, domain)
  if (!sendingDomain) {
    return { ok: false, error: "DOMAIN_NOT_VERIFIED" }
  }
  // pass sendingDomain.configurationSetName to SES SendEmail
}
System tenant (OTPs, platform mail) keeps using the platform-owned vlozi.app identity. No verification check.
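The gate depends on a parseDomain helper that the pseudocode leaves undefined. One possible shape is sketched below; the behaviour for display-name addresses and the in-memory verifiedDomains set (standing in for the lookupVerifiedDomain database query) are assumptions for illustration.

```typescript
// Hypothetical helper: extract the lowercased domain from a From address.
// Accepts "news@brand.com" and "Brand <news@brand.com>".
// Returns null when no domain can be found.
function parseDomain(fromAddress: string): string | null {
  const angle = fromAddress.match(/<([^<>]+)>\s*$/);
  const address = (angle?.[1] ?? fromAddress).trim();
  const at = address.lastIndexOf("@");
  if (at <= 0 || at === address.length - 1) return null;
  return address.slice(at + 1).toLowerCase();
}

// Stand-in for the gate: the real lookup would query the sending-domains
// table for a row with verificationStatus "verified".
function isSenderAllowed(
  tenantId: string,
  fromAddress: string,
  verifiedDomains: ReadonlySet<string>,
): boolean {
  if (tenantId === "system") return true; // platform mail skips the check
  const domain = parseDomain(fromAddress);
  return domain !== null && verifiedDomains.has(domain);
}
```

Lowercasing before the lookup matters because the uniqueness index on domain is case-sensitive text.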
5.4 Removed: POST /v1/webhooks/resend
Deleted. All Resend bindings removed from wrangler.toml.
6. SES Provider Implementation
6.1 SDK Choice — aws4fetch
The official aws-sdk doesn't run on Cloudflare Workers (Node-only deps). Use aws4fetch — a tiny SigV4 signer (~2KB, pure browser/edge-compatible).
import { AwsClient } from "aws4fetch"

// SigV4 signer; credentials come from Cloudflare Secrets.
const aws = new AwsClient({
  accessKeyId: env.AWS_ACCESS_KEY_ID,
  secretAccessKey: env.AWS_SECRET_ACCESS_KEY,
  region: env.AWS_REGION,
  service: "ses",
})

// SES v2 SendEmail: POST /v2/email/outbound-emails with a JSON body.
const res = await aws.fetch(
  `https://email.${env.AWS_REGION}.amazonaws.com/v2/email/outbound-emails`,
  {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      FromEmailAddress: opts.from,
      Destination: { ToAddresses: [opts.to] },
      Content: {
        Simple: {
          Subject: { Data: opts.subject },
          Body: { Html: { Data: opts.html }, Text: { Data: opts.text } },
        },
      },
      ConfigurationSetName: opts.configurationSetName,
      // undefined keys are dropped by JSON.stringify
      ReplyToAddresses: opts.replyTo ? [opts.replyTo] : undefined,
    }),
  }
)
6.2 Module Layout
apps/communication/src/providers/
  email-provider.ts   # interface { send, verifyDomain, getDomainStatus }
  ses-provider.ts     # concrete impl using aws4fetch
  sns-verify.ts       # SNS message signature verification
  index.ts            # factory + cached singleton
6.3 SNS Signature Verification
SNS signs each notification with an RSA key; the matching X.509 certificate is published at the SigningCertURL given on the message. The digest is SHA-1 when SignatureVersion is "1" (the SNS default) and SHA-256 when it is "2", so the verifier must choose the hash from that field. Verification steps:
- Validate SigningCertURL matches ^https://sns\.[a-z0-9-]+\.amazonaws\.com/SimpleNotificationService-[a-f0-9]+\.pem$. Reject anything else (defends against forged-cert attacks).
- Fetch the cert (cache by URL; these are stable per topic).
- Build the canonical string from message fields in spec order.
- Verify with crypto.subtle.verify("RSASSA-PKCS1-v1_5", publicKey, signature, message).
WARNING
The cert URL validation is load-bearing. Without it, an attacker can publish their own cert at any URL, sign a fake notification, and bypass verification. Use a strict regex, not a .includes("amazonaws.com") check.
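The first and third verification steps are pure string work and can be sketched without any crypto. The field order below follows the SNS signing documentation as best understood here; the names isTrustedCertUrl and buildStringToSign are hypothetical, and fetching the cert and calling crypto.subtle.verify are deliberately left out.

```typescript
// Strict allow-list for the certificate URL, using the pattern from the steps above.
const SNS_CERT_URL =
  /^https:\/\/sns\.[a-z0-9-]+\.amazonaws\.com\/SimpleNotificationService-[a-f0-9]+\.pem$/;

function isTrustedCertUrl(url: string): boolean {
  return SNS_CERT_URL.test(url);
}

// Fields that take part in the signature, in signing order.
// Subject is optional on notifications and is skipped when absent.
const NOTIFICATION_FIELDS = ["Message", "MessageId", "Subject", "Timestamp", "TopicArn", "Type"];
const SUBSCRIPTION_FIELDS = [
  "Message", "MessageId", "SubscribeURL", "Timestamp", "Token", "TopicArn", "Type",
];

function buildStringToSign(msg: Record<string, string | undefined>): string {
  const fields = msg["Type"] === "Notification" ? NOTIFICATION_FIELDS : SUBSCRIPTION_FIELDS;
  let out = "";
  for (const key of fields) {
    const value = msg[key];
    if (value === undefined) continue;
    // Each pair is "name\nvalue\n", with no separator between pairs.
    out += `${key}\n${value}\n`;
  }
  return out;
}
```

The anchored regex rejects look-alike hosts such as sns.ap-south-1.amazonaws.com.evil.example, which a substring check would accept.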
6.4 Configuration Set Strategy
- One Configuration Set per tenant. Named vlozi-tenant-{tenantId}.
- Each config set has a single SNS event destination → one shared SNS topic per environment (prod / staging).
- Why per-tenant: lets us add per-tenant bounce-rate alarms and suppression policies later without re-tagging messages. Cost is zero.
- Created lazily in createSendingDomain the first time a tenant adds a domain.
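Because the Configuration Set name embeds the tenant ID, it may be worth validating the result before calling SES. The sketch below assumes SES's naming rule is letters, digits, hyphens, and underscores with a 64-character limit; confirm that against the SES v2 API reference. configurationSetName is a hypothetical helper.

```typescript
// Assumed SES constraint: up to 64 characters from [A-Za-z0-9_-].
const CONFIG_SET_NAME = /^[A-Za-z0-9_-]{1,64}$/;

// Hypothetical helper: derive and validate the per-tenant Configuration Set name.
function configurationSetName(tenantId: string): string {
  const name = `vlozi-tenant-${tenantId}`;
  if (!CONFIG_SET_NAME.test(name)) {
    throw new Error(`tenant id cannot be used in a Configuration Set name: ${tenantId}`);
  }
  return name;
}
```

Failing here gives a clear error at domain-creation time instead of an opaque SES validation error later.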
6.5 IAM
Single platform-level IAM user with this policy. Credentials stored in Cloudflare Secrets as AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": ["ses:SendEmail", "ses:SendRawEmail"], "Resource": "*" },
    { "Effect": "Allow", "Action": ["ses:CreateEmailIdentity", "ses:GetEmailIdentity", "ses:DeleteEmailIdentity", "ses:ListEmailIdentities", "ses:PutEmailIdentityConfigurationSetAttributes"], "Resource": "*" },
    { "Effect": "Allow", "Action": ["ses:CreateConfigurationSet", "ses:GetConfigurationSet", "ses:CreateConfigurationSetEventDestination"], "Resource": "*" }
  ]
}
7. Verifier Durable Object
7.1 Why a DO and Not a Cron
Per feedback_cron_to_do_scheduler.md, per-entity polling on a global cron is wasteful. A DomainVerifier DO (one per domain) wakes on alarm, polls SES once, reschedules with backoff, and self-deletes when terminal.
7.2 Lifecycle
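class DomainVerifier implements DurableObject {
  async alarm() {
    // SES reports SUCCESS | FAILED | PENDING | NOT_STARTED | TEMPORARY_FAILURE
    const dkimStatus = await ses.getEmailIdentity(this.domain)
    // Map the SES value onto our own verification_status vocabulary.
    const verificationStatus =
      dkimStatus === "SUCCESS" ? "verified" : dkimStatus === "FAILED" ? "failed" : "pending"
    await this.db.update(tenantSendingDomains)
      .set({ dkimStatus, verificationStatus, lastCheckedAt: new Date() })
      .where(eq(tenantSendingDomains.id, this.domainId))
    if (dkimStatus === "SUCCESS" || dkimStatus === "FAILED") return // terminal
    this.attempts += 1 // count this poll so the cap below can trigger
    if (this.attempts > MAX_ATTEMPTS) {
      await this.markFailed("verification_timeout")
      return
    }
    await this.storage.setAlarm(Date.now() + this.nextBackoff())
  }
}
7.3 Backoff Schedule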
| Attempt | Delay |
|---|---|
| 1 | +1 min |
| 2 | +5 min |
| 3 | +15 min |
| 4 | +1 hr |
| 5 | +6 hr |
| 6 | +24 hr |
| 7 | +24 hr (final) → mark temporary_failure |
Total window: ~55 hours (about 2.3 days) if every attempt is used. Most domains verify on attempt 1 or 2.
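The schedule in the table can be expressed as a lookup, which also makes the total window easy to check by summing the delays. BACKOFF_MINUTES and backoffDelayMs are illustrative names; the DO's actual nextBackoff() may be structured differently.

```typescript
const MINUTE_MS = 60 * 1000;

// Delay before each attempt, in minutes, taken from the table above.
const BACKOFF_MINUTES = [1, 5, 15, 60, 360, 1440, 1440];

// Returns the delay before the given 1-based attempt,
// or null once the schedule is exhausted.
function backoffDelayMs(attempt: number): number | null {
  const minutes = BACKOFF_MINUTES[attempt - 1];
  return minutes === undefined ? null : minutes * MINUTE_MS;
}

// Sum of all delays: 3321 minutes, i.e. about 55 hours.
const totalWindowMinutes = BACKOFF_MINUTES.reduce((sum, m) => sum + m, 0);
console.log(totalWindowMinutes, (totalWindowMinutes / 60).toFixed(1));
```

A null return is the natural signal for the alarm handler to stop rescheduling and record the terminal state.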
8. Migration Phases
8.1 Phase 1 — Provider Swap (Behind a Flag) — 3 days
Goal: SES path runs end-to-end against vlozi.app (the only verified domain), Resend path remains as fallback.
| # | Task |
|---|---|
| 1 | Add aws4fetch dependency. |
| 2 | Verify vlozi.app in SES (manual Console step). Submit production-access ticket the same day — has a 24h SLA. |
| 3 | Implement SesProvider.send(). |
| 4 | Implement /v1/webhooks/ses + SNS signature verification. |
| 5 | Add EMAIL_PROVIDER env var (resend or ses). Switch in dispatchSend. |
| 6 | Add per-environment SNS topic + Configuration Set vlozi-platform. |
| 7 | Update apps/communication/test — port send-resend.test.ts to send-ses.test.ts with aws4fetch mocked. |
| 8 | Deploy with EMAIL_PROVIDER=ses in staging. |
Exit criteria: Newsletter test campaign delivers via SES; bounce/open events arrive at /v1/webhooks/ses; events fan out to newsletter /internal/event.
8.2 Phase 2 — Domain Data Model + APIs — 4 days
Goal: Tenant can add a domain through the API and see DKIM records, but dispatchSend still ignores the table.
| # | Task |
|---|---|
| 1 | Drizzle migration: comms_tenant_sending_domains + FK on comms_sender_settings. |
| 2 | Implement the 5 routes in §5.1. |
| 3 | Implement DomainVerifier DO + alarm handler. |
| 4 | Wire DO scheduling into POST /v1/sending-domains. |
| 5 | Add tests: create → poll → verified, create → poll → failed, duplicate domain rejection, delete with active campaign rejection. |
Exit criteria: Adding a domain via API returns DKIM records; after publishing the records to a test DNS, verification flips to verified within 5 min.
8.3 Phase 3 — Dispatch Gate + Dashboard UI — 4 days
Goal: Tenants must use a verified domain to send; UI shipped.
| # | Task |
|---|---|
| 1 | Modify dispatchSend to enforce verified-domain gate (skipped for system tenant). |
| 2 | Modify dispatchSend to pass ConfigurationSetName per tenant. |
| 3 | Build /dashboard/settings/sending-domains page. Must follow the editorial style per feedback_dashboard_editorial_style.md — sharp/monochrome/rounded-none, reference BlogOverview.tsx. |
| 4 | DNS-record copy panel: 3 CNAME rows with one-click copy + status pill (pending/verified/failed). |
| 5 | "Check now" button → POST /v1/sending-domains/:id/check. |
| 6 | Delete confirmation modal. |
| 7 | Block "Send Campaign" in newsletter UI when no verified domain exists; route to settings page with explainer banner. |
Exit criteria: A tenant can sign up, add a domain, publish DNS, see verification flip in the UI within 5 min, send a campaign from news@theirdomain.com, and recipients see correct DKIM signing.
8.4 Phase 4 — Cutover & Cleanup — 1 day
| # | Task |
|---|---|
| 1 | Flip EMAIL_PROVIDER=ses in production. |
| 2 | Delete sendViaResend, verifySvixSignature, /v1/webhooks/resend route. |
| 3 | Remove RESEND_API_KEY and RESEND_WEBHOOK_SECRET from all wrangler.toml and Cloudflare Secrets. |
| 4 | Remove resend literal from messageLogs.provider docstring. |
| 5 | Drop the EMAIL_PROVIDER flag — there's only one provider now. |
| 6 | Update scope-definition.md, api-spec.md, and the schema doc to reflect SES-only state. |
Exit criteria: grep -ri resend apps/communication apps/newsletter-service returns zero hits outside changelogs.
9. Cutover & Rollback
9.1 Cutover Window
Pre-launch — there are no live customers. Cutover happens during business hours; any breakage costs internal dev time only.
9.2 Rollback (Phase 1 only)
Phase 1 keeps Resend code intact. Flipping EMAIL_PROVIDER=resend reverts to Resend within one Worker deploy (~30s). After Phase 4 the rollback path is gone — re-introducing Resend means a code revert.
9.3 Post-launch Failure Modes
| Failure | Detection | Response |
|---|---|---|
| SES rate limit (sending pause) | SES sends Reject event → status failed in messageLogs | Account-level alarm on bounce rate (>5%); auto-disable sending. |
| SNS topic unsubscribed | Webhook stops receiving events; engagement counters stale | Daily dashboard widget on event-arrival lag; manual re-subscribe. |
| Tenant DNS change breaks DKIM | SES GetEmailIdentity flips to FAILED | DO re-verifies on its next scheduled poll; UI badge flips to "DNS error". Auto re-trigger DO if a campaign tries to send and finds verified more than 24h old. |
| AWS credential leak | External (rotation playbook) | Rotate via IAM; deploy new secret. Per feedback_never_paste_secrets.md, don't paste rotated keys in chat. |
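The "verified more than 24h old" re-trigger in the DKIM row could be a small staleness check at send time. This sketch interprets "old" as time since lastCheckedAt, which is an assumption; needsReverification is a hypothetical name.

```typescript
const REVERIFY_AFTER_MS = 24 * 60 * 60 * 1000;

// Assumption: staleness is measured from the row's lastCheckedAt timestamp.
// A domain that has never been checked is treated as stale.
function needsReverification(lastCheckedAt: Date | null, now: Date): boolean {
  if (lastCheckedAt === null) return true;
  return now.getTime() - lastCheckedAt.getTime() > REVERIFY_AFTER_MS;
}
```

Passing now as a parameter keeps the check deterministic in tests.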
10. Tradeoffs & Risks
10.1 Shared Sending Reputation
All tenants share the platform's SES sending reputation. One spammy customer's bounces hurt everyone's deliverability.
Mitigations (in order of cost):
- Per-tenant Configuration Sets with bounce/complaint thresholds → CloudWatch alarm → auto-disable tenant's Configuration Set. Free; ship at v1.
- Account-level suppression list + existing newsletter bounce kill-switch. Free; already wired.
- Dedicated IPs ($24.95/mo each). Defer until one tenant >100k/mo.
- Cross-account isolation for enterprise: deferred (see §2.2).
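The first mitigation reduces to a threshold decision per tenant. The sketch below shows that decision only; the threshold is passed in because its value is still an open question (§11), and the minSent floor is an added assumption so that a handful of early bounces does not disable a new tenant. TenantSendStats and shouldDisableTenant are hypothetical names.

```typescript
// Hypothetical per-tenant counters over some rolling window.
interface TenantSendStats {
  sent: number;
  bounced: number;
}

// Returns true when the tenant's bounce rate exceeds the threshold.
// minSent is an assumed volume floor, not something the design specifies.
function shouldDisableTenant(
  stats: TenantSendStats,
  bounceThreshold: number,
  minSent = 100,
): boolean {
  if (stats.sent < minSent) return false;
  return stats.bounced / stats.sent > bounceThreshold;
}
```

For example, 60 bounces out of 1,000 sends is a 6% rate, which would trip a 5% threshold but not a 10% one.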
10.2 SES Sandbox Mode
New SES accounts are sandboxed: only verified recipients, 200/day, 1/sec. Production access requires a support ticket (~24h). File the ticket on day 1 of Phase 1.
10.3 Workers ↔ AWS Latency
SES SendEmail from a Worker in Asia → SES ap-south-1 is ~30ms. From a Worker in Frankfurt → ap-south-1 is ~150ms. With ~10k newsletter emails per campaign serialized through one queue consumer, that adds up. Newsletter already uses Cloudflare Queue with batching — verify queue concurrency settles at a reasonable parallelism (≥10).
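A back-of-envelope estimate makes the latency concern concrete. The function below is a simplified model that considers only round-trip time and parallelism; it ignores the SES account send-rate limit, retries, and queue overhead. estimateCampaignSeconds is an illustrative name.

```typescript
// Rough wall-clock estimate: emails are sent in waves of `parallelism`,
// each wave taking one round trip of `perCallMs`.
function estimateCampaignSeconds(
  emails: number,
  perCallMs: number,
  parallelism: number,
): number {
  const waves = Math.ceil(emails / parallelism);
  return (waves * perCallMs) / 1000;
}

// 10k emails from Frankfurt (~150ms per call):
console.log(estimateCampaignSeconds(10000, 150, 1));  // fully serialized: 1500 s (25 min)
console.log(estimateCampaignSeconds(10000, 150, 10)); // parallelism 10: 150 s
// 10k emails from Asia (~30ms per call) at parallelism 10:
console.log(estimateCampaignSeconds(10000, 30, 10));  // 30 s
```

Under this model, parallelism of 10 brings even the worst-case region down to a few minutes per campaign.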
10.4 SNS Webhook Replay Attacks
SNS doesn't include a timestamp window in its signature. An attacker who captures one signed notification can replay it. Mitigation: dedupe on messageEvents.id (use SNS MessageId as the primary key). The existing messageEvents insert already keys on crypto.randomUUID() — switch to SNS MessageId for SES events.
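The dedupe rule amounts to "first write for a MessageId wins". In production that would be the primary-key constraint on messageEvents.id; the in-memory class below is only a stand-in to show the behaviour, and SnsEventDeduper is a hypothetical name.

```typescript
// In-memory stand-in for a primary-key insert that ignores conflicts.
class SnsEventDeduper {
  private readonly seen = new Set<string>();

  // Returns true the first time a MessageId is seen, false for replays.
  accept(messageId: string): boolean {
    if (this.seen.has(messageId)) return false;
    this.seen.add(messageId);
    return true;
  }
}

const deduper = new SnsEventDeduper();
const firstDelivery = deduper.accept("sns-message-id-1");
const replayedDelivery = deduper.accept("sns-message-id-1");
console.log(firstDelivery, replayedDelivery);
```

The same check also absorbs legitimate SNS redeliveries, so engagement counters are not double-counted.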
10.5 Vendor Lock-In
SES API is non-portable. Migrating off SES later means rewriting dispatchSend again. Acceptable cost — SES is the cheapest credible option, and the EmailProvider interface keeps the swap cost bounded to one file.
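To illustrate how the interface bounds the swap cost, here is one possible shape for EmailProvider. The method names come from the module layout in §6.2; every signature, the SendOptions type, the fake implementation, and getEmailProvider are assumptions for illustration.

```typescript
// Assumed option shape, mirroring the fields used in the SendEmail call.
interface SendOptions {
  from: string;
  to: string;
  subject: string;
  html: string;
  text?: string;
  replyTo?: string;
  configurationSetName?: string;
}

// Method names from §6.2; signatures are hypothetical.
interface EmailProvider {
  send(opts: SendOptions): Promise<{ messageId: string }>;
  verifyDomain(domain: string): Promise<{ dkimTokens: string[] }>;
  getDomainStatus(domain: string): Promise<string>;
}

// Test double: records sends instead of calling SES.
class FakeEmailProvider implements EmailProvider {
  readonly sent: SendOptions[] = [];
  async send(opts: SendOptions): Promise<{ messageId: string }> {
    this.sent.push(opts);
    return { messageId: `fake-${this.sent.length}` };
  }
  async verifyDomain(_domain: string): Promise<{ dkimTokens: string[] }> {
    return { dkimTokens: ["abc", "def", "ghi"] };
  }
  async getDomainStatus(_domain: string): Promise<string> {
    return "PENDING";
  }
}

// Factory with a cached singleton, as described for providers/index.ts.
let cachedProvider: EmailProvider | undefined;
function getEmailProvider(create: () => EmailProvider): EmailProvider {
  if (!cachedProvider) cachedProvider = create();
  return cachedProvider;
}
```

With this shape, dispatchSend depends only on EmailProvider, so replacing SES later would mean adding one new implementation file and changing the factory.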
11. Open Questions
| # | Question | Owner |
|---|---|---|
| 1 | Region pinning: ap-south-1 (cheapest, India-aligned) or us-east-1 (largest pool, lower latency for global recipients)? Decision impacts deliverability outside India. | Founder |
| 2 | Do we accept news@vlozi.app subdomain delegation as an interim option for tenants who don't own a domain? Lowers onboarding friction; reuses platform reputation. | Product |
| 3 | Should DELETE /v1/sending-domains/:id cascade-delete or hard-block when bound to a sender_settings row? Lean: hard-block + ask user to update settings first. | Engineering |
| 4 | Per-tenant bounce-rate threshold for auto-disable — start at 5% or 10%? AWS pauses the whole account at 10%. | Engineering |
| 5 | Do we expose a "send test" button in the UI that sends to the tenant's own login email before allowing campaign sends? Catches DKIM-but-DMARC-misaligned cases. | Product |
12. References
- Current implementation: apps/communication/src/index.ts (dispatchSend, sendViaResend, /v1/webhooks/resend)
- Current schema: apps/communication/src/db/schema.ts
- Newsletter dispatch caller: apps/newsletter-service/src/lib/campaign-send.ts (processCampaignSendJob)
- Provider abstraction context: adr-providers.md
adr-providers.md - SES v2 API: https://docs.aws.amazon.com/ses/latest/APIReference-V2/
- aws4fetch: https://github.com/mhart/aws4fetch
- SNS message signing: https://docs.aws.amazon.com/sns/latest/dg/sns-verify-signature-of-message.html