Learning

Architecture Decision: Team Management

Problem Statement

We need to implement the Team Management APIs (Invitations, Memberships, Roles). We have two primary architectural choices:

  1. Add these features directly to the existing apps/manager service.
  2. Create a brand new, independent microservice specifically for Team Management (e.g., apps/team-service).

Option 1: Add to apps/manager (The Monolithic / Core Service Approach)

In this approach, Team Management becomes a module within the manager application, similar to how authentication (auth) is currently handled.

Pros

  • Data Locality: Team management heavily relies on users, tenants, and roles. The manager service already deals with these entities extensively. Keeping them together avoids complex cross-service data fetching or synchronization.
  • Reduced Operational Overhead: We don't have to manage, deploy, monitor, and scale an entirely new service. One fewer service means less infrastructure complexity.
  • Simpler Transactions: If an action requires updating a user, their tenant, and their membership simultaneously, doing it within one service makes database transactions straightforward.
  • Shared Middleware: We can easily reuse the existing authentication and database middleware already set up in apps/manager.
  • Faster Initial Development: Less boilerplate code compared to spinning up a new Hono app, configuring its database bindings, and setting up CI/CD.
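
The "Simpler Transactions" point above can be illustrated with a minimal sketch. Everything in it is hypothetical: the Store shape, the acceptInvitation function, and the copy-then-commit helper standing in for a real database transaction are assumptions for illustration, not code from apps/manager.

```typescript
// Hypothetical in-memory stand-in for the manager database; not the real schema.
type Role = "owner" | "admin" | "member";

interface Invitation {
  id: string;
  tenantId: string;
  email: string;
  role: Role;
  status: "pending" | "accepted";
}

interface Membership {
  userId: string;
  tenantId: string;
  role: Role;
}

interface Store {
  invitations: Invitation[];
  memberships: Membership[];
}

// Runs `work` against a copy of the store and commits only if it does not throw.
// If `work` throws, the original store is left untouched (a rollback).
function transaction<T>(store: Store, work: (draft: Store) => T): T {
  const draft: Store = {
    invitations: store.invitations.map((i) => ({ ...i })),
    memberships: store.memberships.map((m) => ({ ...m })),
  };
  const result = work(draft);
  store.invitations = draft.invitations;
  store.memberships = draft.memberships;
  return result;
}

// Marks the invitation as accepted and creates the membership as one unit.
function acceptInvitation(store: Store, invitationId: string, userId: string): Membership {
  return transaction(store, (draft) => {
    const invitation = draft.invitations.find((i) => i.id === invitationId);
    if (!invitation || invitation.status !== "pending") {
      throw new Error("invitation not found or already used");
    }
    const tenantId = invitation.tenantId;
    invitation.status = "accepted";

    const duplicate = draft.memberships.some(
      (m) => m.userId === userId && m.tenantId === tenantId,
    );
    if (duplicate) {
      // Throwing here also discards the status change made above.
      throw new Error("user is already a member of this tenant");
    }

    const membership: Membership = { userId, tenantId, role: invitation.role };
    draft.memberships.push(membership);
    return membership;
  });
}
```

Because both writes happen in one process against one store, a failure halfway through leaves no partial state. With two services, the same guarantee would need some form of cross-service coordination.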

Cons

  • Increased Complexity in manager: As the application grows, the manager service might become a "monolith" that is harder to understand and maintain if boundaries aren't strictly enforced internally.
  • Coupled Deployments: A bug in the team management module could bring down the entire manager service, affecting unrelated features like core authentication.
  • Scaling: We must scale the entire manager service even if only the Team Management endpoints are experiencing high traffic.

Option 2: Create a New Microservice (apps/team-service)

In this approach, we create a specialized worker/service that handles only the team management domain.

Pros

  • Strong Boundaries (Separation of Concerns): The code for team management is completely isolated, making it easier for a specialized team to work on it without stepping on toes.
  • Independent Scalability: If team operations (like sending thousands of invites) become a bottleneck, we can scale this service independently of the core manager service.
  • Fault Isolation: A crash or memory leak in the team service won't directly crash the manager service (core auth and tenant routing can stay up).
  • Technology Flexibility: If, in the future, team operations require a different technology stack, a separate microservice makes that easier.

Cons

  • High Complexity: Distributed systems are hard. We now have to manage inter-service communication.
  • Data Consistency: How does the team service know about users and tenants? It either needs direct database access (which couples the services at the DB layer, an anti-pattern), or it needs to query the manager service via API, introducing latency and potential failure points.
  • Distributed Authorization: The team service needs to verify JWTs or session tokens. It needs access to the same secrets or an introspection endpoint on the manager.
  • More Boilerplate: Requires setting up a whole new Cloudflare Worker, routing, CI/CD pipelines, and internal networking (service bindings).
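
To make the data consistency and authorization costs above concrete, here is a minimal sketch of what a permission check might look like from inside a separate team service. The names (LookupResult, MembershipLookup, decide, canManageTeam) and the choice of "owner" and "admin" as the allowed roles are hypothetical. The transport is injected rather than tied to a specific Cloudflare API, so treat this as an illustration of the failure modes, not as the actual integration.

```typescript
// Result of one remote lookup: a role (or null for "no membership"),
// or a transport-level failure.
type LookupResult =
  | { kind: "ok"; role: string | null }
  | { kind: "unavailable"; reason: string };

// Hypothetical transport. In a Worker this might wrap a call to the manager
// over a service binding; it is injected here so the sketch is self-contained.
type MembershipLookup = (userId: string, tenantId: string) => Promise<LookupResult>;

interface Decision {
  allowed: boolean;
  reason: string;
}

// Fail closed: if the manager cannot be reached, deny rather than guess.
function decide(result: LookupResult, allowedRoles: string[]): Decision {
  if (result.kind === "unavailable") {
    return { allowed: false, reason: `manager unavailable: ${result.reason}` };
  }
  const role = result.role;
  if (role === null) {
    return { allowed: false, reason: "not a member" };
  }
  if (!allowedRoles.some((r) => r === role)) {
    return { allowed: false, reason: "insufficient role" };
  }
  return { allowed: true, reason: "ok" };
}

// Every call pays one network round trip and inherits the manager's availability.
async function canManageTeam(
  lookup: MembershipLookup,
  userId: string,
  tenantId: string,
): Promise<Decision> {
  let result: LookupResult;
  try {
    result = await lookup(userId, tenantId);
  } catch (err) {
    result = {
      kind: "unavailable",
      reason: err instanceof Error ? err.message : "unknown error",
    };
  }
  return decide(result, ["owner", "admin"]);
}
```

Note the third outcome that does not exist in-process: the check can fail for reasons unrelated to the user's permissions, and the team service has to choose a policy for that case.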

Conclusion & Recommendation

Recommendation: Option 1 (Add to apps/manager)

Why? The docs/architecture/api_spec_team.md file itself notes:

  Service Owner: apps/manager (The Platform Core Service)
  Why? Team Management is inseparable from Authentication and Tenant Metadata. Splitting it would cause distributed complexity.

Team management is inherently intertwined with identity (users) and tenancy. To authorize almost any action in the platform, we need to answer "Is User X a member of Tenant Y with Role Z?". If this logic lived in a separate microservice, any request to another service could require an extra network hop to the team service to check permissions, adding latency and a new failure point to the request path.

Keeping it in the manager service (which acts as the core gateway/identity provider) ensures these checks remain fast and transactionally safe.
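
As a rough illustration of the "Is User X a member of Tenant Y with Role Z?" check when it runs in-process, consider the sketch below. The role names, the owner > admin > member ordering, and the helper names (buildIndex, hasRole) are assumptions made for the example; the real role model is defined by the API spec, not by this snippet.

```typescript
type Role = "owner" | "admin" | "member";

interface Membership {
  userId: string;
  tenantId: string;
  role: Role;
}

// Assumed ordering for illustration only: owner > admin > member.
const ROLE_RANK: Record<Role, number> = { member: 1, admin: 2, owner: 3 };

// Builds a composite key; JSON encoding avoids collisions between ids
// that happen to contain the separator character.
function key(userId: string, tenantId: string): string {
  return JSON.stringify([userId, tenantId]);
}

// Indexes memberships so that each check is a single map lookup.
function buildIndex(memberships: Membership[]): Map<string, Role> {
  const index = new Map<string, Role>();
  for (const m of memberships) {
    index.set(key(m.userId, m.tenantId), m.role);
  }
  return index;
}

// Answers: is `userId` a member of `tenantId` with at least the `required` role?
function hasRole(
  index: Map<string, Role>,
  userId: string,
  tenantId: string,
  required: Role,
): boolean {
  const role = index.get(key(userId, tenantId));
  return role !== undefined && ROLE_RANK[role] >= ROLE_RANK[required];
}
```

In this form the check is a local function call with no network dependency, which is the property the recommendation relies on.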
