Scaling

Single brain replica

For most deployments one brain replica is plenty. The resources that could become a bottleneck are:

Postgres: plenty of headroom; see the self-hosting guide for sizing.
WebSocket connections: one socket per agent at ~10 KiB of RAM in steady state, so 1000 agents use roughly 10 MiB.
Event persistence: events are written to Postgres in batches; the brain handles thousands of events per second on modest hardware.
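The page says only that events are batched into Postgres. As a sketch of what size-or-age batching commonly looks like, and assuming nothing about z4j's internals, the hypothetical `EventBatcher` below buffers events and hands back a batch once either limit is reached. The class name, the default limits, and the explicit `now` argument (injected so the behaviour is deterministic) are all illustrative. A real writer would persist each returned batch with one multi-row `INSERT` or a `COPY`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class EventBatcher:
    """Hypothetical size-or-age batcher; an illustration, not z4j's code."""

    max_batch: int = 500        # flush once this many events are buffered
    max_delay_s: float = 0.25   # ...or once the oldest buffered event is this old
    _buffer: list[Any] = field(default_factory=list, init=False, repr=False)
    _first_at: float | None = field(default=None, init=False, repr=False)

    def add(self, event: Any, now: float) -> list[Any] | None:
        """Buffer one event; return a batch to persist if a limit was reached."""
        if not self._buffer:
            self._first_at = now  # start the age clock with the first event
        self._buffer.append(event)
        if len(self._buffer) >= self.max_batch:
            return self._drain()
        return self.tick(now)

    def tick(self, now: float) -> list[Any] | None:
        """Call periodically so a quiet stream still flushes on age alone."""
        if self._first_at is not None and now - self._first_at >= self.max_delay_s:
            return self._drain()
        return None

    def _drain(self) -> list[Any]:
        batch, self._buffer, self._first_at = self._buffer, [], None
        return batch


batcher = EventBatcher(max_batch=3, max_delay_s=1.0)
for i, t in enumerate([0.0, 0.1, 0.2]):
    batch = batcher.add(f"e{i + 1}", now=t)
print(batch)  # → ['e1', 'e2', 'e3']
```

Flushing on age as well as size keeps latency bounded when traffic is light, while the size limit caps memory and transaction size under bursts.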

Horizontal scaling

Multiple brain replicas are supported on Postgres. The brain selects its registry and dashboard-fan-out backend from Z4J_REGISTRY_BACKEND:

postgres_notify (the default on Postgres) — agent commands and dashboard updates fan out across replicas via Postgres LISTEN/NOTIFY. Each agent’s WebSocket lives on whichever replica it happened to connect to; commands minted on any replica route to the right one through the registry. Dashboard subscribers connected to one replica still see events captured by another.
local (forced on SQLite, since SQLite has no LISTEN/NOTIFY) — single-process only.
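The selection rules in the list above can be modelled as a small function. This is a hypothetical sketch, not the brain's code: `select_registry_backend` and its arguments are invented for illustration. The real brain may warn or refuse when `postgres_notify` is requested on SQLite; this sketch simply falls back to `local`.

```python
from __future__ import annotations

VALID_BACKENDS = {"postgres_notify", "local"}


def select_registry_backend(database_url: str, requested: str | None) -> str:
    """Hypothetical model of how Z4J_REGISTRY_BACKEND is resolved.

    Mirrors the documented rules: postgres_notify is the default on
    Postgres, and local is forced on SQLite.
    """
    if requested is not None and requested not in VALID_BACKENDS:
        raise ValueError(f"unknown registry backend: {requested!r}")
    scheme = database_url.split(":", 1)[0].lower()
    if scheme.startswith("sqlite"):
        # SQLite has no LISTEN/NOTIFY, so cross-replica fan-out is impossible.
        return "local"
    if scheme.startswith("postgres"):
        # Honour an explicit choice, otherwise default to postgres_notify.
        return requested or "postgres_notify"
    raise ValueError(f"unsupported database URL scheme: {scheme!r}")


print(select_registry_backend("postgresql://db/z4j", None))  # → postgres_notify
print(select_registry_backend("sqlite:///z4j.db", None))     # → local
```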

What you still need to provide yourself:

Sticky session routing on /ws — each agent’s WebSocket must pin to one brain pod. Configure your load balancer’s session affinity (e.g. nginx-ingress’s nginx.ingress.kubernetes.io/affinity: cookie, an ALB target-group’s stickiness, or your service-mesh equivalent).
TLS termination in front of the brain. The brain itself speaks plaintext WebSocket on its bind port; production deployments put a reverse proxy in front.

Dashboard fan-out runs over WebSocket (/ws/dashboard), not SSE; cross-replica delivery is handled by the postgres_notify DashboardHub.
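To make the cross-replica routing concrete, here is an in-process simulation. `FakeNotifyBus` stands in for Postgres LISTEN/NOTIFY, and a shared dict stands in for the registry that records which replica holds each agent's socket. The class names, channel names, and registry shape are hypothetical; the point is the two delivery patterns: commands are routed to the one owning replica, while dashboard events go to every replica.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


class FakeNotifyBus:
    """In-process stand-in for Postgres LISTEN/NOTIFY: every listener on a
    channel receives every payload published to that channel."""

    def __init__(self) -> None:
        self._listeners: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def listen(self, channel: str, callback: Callable[[dict], None]) -> None:
        self._listeners[channel].append(callback)

    def notify(self, channel: str, payload: dict) -> None:
        for callback in self._listeners[channel]:
            callback(payload)


class Replica:
    """Hypothetical brain replica holding some agent and dashboard sockets."""

    def __init__(self, name: str, bus: FakeNotifyBus, registry: dict[str, str]) -> None:
        self.name = name
        self.bus = bus
        self.registry = registry  # shared agent_id -> replica name
        self.local_agents: set[str] = set()
        self.sent_to_agents: list[tuple[str, str]] = []  # (agent_id, command) written to sockets
        self.sent_to_dashboards: list[str] = []          # events pushed to local dashboards
        bus.listen(f"agent_commands_{name}", self._on_command)
        bus.listen("dashboard_events", self._on_event)

    def connect_agent(self, agent_id: str) -> None:
        self.local_agents.add(agent_id)
        self.registry[agent_id] = self.name  # record which replica holds the socket

    def send_command(self, agent_id: str, command: str) -> None:
        # The command may be minted on a replica that does not hold the
        # agent's socket, so look up the owner and notify its channel.
        owner = self.registry.get(agent_id)
        if owner is None:
            raise LookupError(f"agent {agent_id!r} is not connected to any replica")
        self.bus.notify(f"agent_commands_{owner}", {"agent_id": agent_id, "command": command})

    def capture_event(self, event: str) -> None:
        # Events fan out to every replica so all dashboards see them.
        self.bus.notify("dashboard_events", {"event": event})

    def _on_command(self, payload: dict) -> None:
        if payload["agent_id"] in self.local_agents:
            self.sent_to_agents.append((payload["agent_id"], payload["command"]))

    def _on_event(self, payload: dict) -> None:
        self.sent_to_dashboards.append(payload["event"])


bus, registry = FakeNotifyBus(), {}
a, b = Replica("a", bus, registry), Replica("b", bus, registry)
b.connect_agent("agent-1")
a.send_command("agent-1", "retry")  # minted on a, delivered by b
print(b.sent_to_agents)  # → [('agent-1', 'retry')]
```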

Scaling Postgres

Read replicas can offload dashboard reads but do not help the hot event-persistence path, which always writes to the primary.
Native partitioning on events(received_at) is built in; retention drops the oldest partition once it ages past Z4J_EVENT_RETENTION_DAYS.
Set statement_timeout on the brain’s database role (Z4J_DB_STATEMENT_TIMEOUT_MS) so runaway queries are aborted.
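Retention works at partition granularity: a partition can be dropped only once its upper bound is older than the retention cutoff, because before that it may still hold rows inside the window. The sketch below picks droppable partitions under an assumed layout of daily range partitions on received_at; `Partition`, `partitions_to_drop`, and the naming scheme are hypothetical. The actual removal would be a Postgres `DROP TABLE` on each selected partition (or `ALTER TABLE ... DETACH PARTITION` followed by `DROP TABLE`).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Partition:
    """One range partition of events, covering [lower, upper) of received_at."""

    name: str
    lower: datetime
    upper: datetime


def partitions_to_drop(partitions: list[Partition], retention_days: int,
                       now: datetime) -> list[str]:
    """Return names of partitions lying entirely before the cutoff, oldest first."""
    cutoff = now - timedelta(days=retention_days)
    # A partition whose upper bound is after the cutoff may still contain
    # rows inside the retention window, so it must be kept.
    return [p.name for p in sorted(partitions, key=lambda p: p.lower)
            if p.upper <= cutoff]


# Usage with assumed daily partitions (the naming scheme is hypothetical).
utc = timezone.utc
daily = [
    Partition(f"events_2024_01_{d:02d}",
              datetime(2024, 1, d, tzinfo=utc),
              datetime(2024, 1, d + 1, tzinfo=utc))
    for d in range(1, 6)
]
print(partitions_to_drop(daily, retention_days=3,
                         now=datetime(2024, 1, 6, 12, tzinfo=utc)))
# → ['events_2024_01_01', 'events_2024_01_02']
```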

Scaling agents

Agents scale with your app. Each app process runs an agent; because the worker-first protocol identifies each worker by (agent_id, worker_id), the workers of a multi-worker server (gunicorn, uwsgi) coexist under a single agent identity. Agents do not coordinate with each other, so deploying more app replicas simply registers more workers.
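A brain-side registry keyed the way the protocol describes might look like the hypothetical `WorkerRegistry` below. Keying on the (agent_id, worker_id) pair lets several gunicorn or uwsgi workers share one agent_id without overwriting each other, and a reconnecting worker replaces its own entry instead of appearing twice. The class and its methods are illustrative only.

```python
from __future__ import annotations

from collections import defaultdict


class WorkerRegistry:
    """Hypothetical view of connected workers, keyed by (agent_id, worker_id)."""

    def __init__(self) -> None:
        self._workers: dict[tuple[str, str], dict] = {}

    def register(self, agent_id: str, worker_id: str, **info) -> None:
        # Re-registering the same pair (e.g. after a reconnect) overwrites
        # the old entry instead of creating a duplicate.
        self._workers[(agent_id, worker_id)] = info

    def unregister(self, agent_id: str, worker_id: str) -> None:
        self._workers.pop((agent_id, worker_id), None)

    def workers_for(self, agent_id: str) -> list[str]:
        return sorted(w for (a, w) in self._workers if a == agent_id)

    def agents(self) -> dict[str, int]:
        """Number of live workers per agent identity."""
        counts: dict[str, int] = defaultdict(int)
        for agent_id, _ in self._workers:
            counts[agent_id] += 1
        return dict(counts)


reg = WorkerRegistry()
reg.register("billing-api", "gunicorn-1", pid=101)
reg.register("billing-api", "gunicorn-2", pid=102)
reg.register("billing-api", "gunicorn-1", pid=201)  # reconnect, same worker
print(reg.workers_for("billing-api"))  # → ['gunicorn-1', 'gunicorn-2']
```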

When you hit a ceiling

If you are running 500+ agents or 100M+ events per day, file an issue. We want the feedback.
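For scale, the thresholds above convert as follows, using the ~10 KiB-per-socket figure quoted earlier on this page. These helpers are plain arithmetic, not z4j settings: 100M events per day averages roughly 1,157 events per second (real traffic is rarely uniform, so peaks usually run higher), and 500 agents need only about 4.9 MiB of socket memory.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
KIB_PER_AGENT_SOCKET = 10       # steady-state figure quoted on this page


def average_events_per_second(events_per_day: float) -> float:
    """Mean rate implied by a daily volume; peaks are usually higher."""
    return events_per_day / SECONDS_PER_DAY


def socket_memory_mib(agents: int) -> float:
    """Approximate steady-state RAM for agent WebSockets, in MiB."""
    return agents * KIB_PER_AGENT_SOCKET / 1024


print(round(average_events_per_second(100_000_000)))  # → 1157
print(socket_memory_mib(500))                          # → 4.8828125
```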