Scaling
Single brain replica
For most deployments one brain replica is plenty. The potential bottlenecks are:
- Postgres — plenty of room; sizing in the self-hosting guide.
- WebSocket connections — one socket per agent, ~10 KiB RAM steady-state. 1000 agents is roughly 10 MiB.
- Event persistence — batched into Postgres; the brain handles thousands of events per second on modest hardware.
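The WebSocket memory figure above is simple arithmetic, and a short sketch makes it easy to plug in your own agent count. The function name is hypothetical, and the 10 KiB per socket is the approximate steady-state figure quoted above rather than a measured constant.

```python
KIB = 1024
MIB = 1024 * KIB

# Approximate steady-state cost of one agent socket, taken from the text above.
PER_SOCKET_BYTES = 10 * KIB


def websocket_memory_mib(agent_count: int, per_socket_bytes: int = PER_SOCKET_BYTES) -> float:
    """Estimate the RAM, in MiB, held by idle agent WebSocket connections."""
    return agent_count * per_socket_bytes / MIB


if __name__ == "__main__":
    for agents in (100, 1_000, 10_000):
        print(f"{agents:>6} agents ~ {websocket_memory_mib(agents):.1f} MiB")
```

Even at 10,000 agents the estimate stays under 100 MiB, so socket memory is unlikely to be the first limit you reach.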
Horizontal scaling
Multiple brain replicas are supported on Postgres. The brain selects its registry and dashboard-fan-out backend from Z4J_REGISTRY_BACKEND:
- postgres_notify (the default on Postgres): agent commands and dashboard updates fan out across replicas via Postgres LISTEN/NOTIFY. Each agent’s WebSocket lives on whichever replica it happened to connect to; commands minted on any replica route to the right one through the registry. Dashboard subscribers connected to one replica still see events captured by another.
- local (forced on SQLite, since SQLite has no LISTEN/NOTIFY): single-process only.
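The selection rules above can be sketched as a small function. This is an illustration of the described behavior, not the brain's actual code: the function name, the URL-prefix check, and the assumption that an explicit setting takes precedence on Postgres are all hypothetical.

```python
import os
from typing import Mapping, Optional


def pick_registry_backend(database_url: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Illustrative only: mirrors the rules described above."""
    env = os.environ if env is None else env
    if database_url.startswith("sqlite"):
        # SQLite has no LISTEN/NOTIFY, so "local" is forced regardless of the setting.
        return "local"
    # On Postgres, postgres_notify is the default; this sketch assumes an
    # explicit Z4J_REGISTRY_BACKEND value overrides it.
    return env.get("Z4J_REGISTRY_BACKEND", "postgres_notify")
```

Passing the environment in as a mapping keeps the sketch easy to exercise without touching the real process environment.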
What you still need to provide yourself:
- Sticky session routing on /ws: each agent’s WebSocket must pin to one brain pod. Configure your load balancer’s session affinity (e.g. nginx-ingress’s nginx.ingress.kubernetes.io/affinity: cookie, an ALB target group’s stickiness, or your service-mesh equivalent).
- TLS termination in front of the brain. The brain itself speaks plaintext WebSocket on its bind port; production deployments put a reverse proxy in front.
The dashboard fan-out runs over WebSocket (/ws/dashboard), not SSE; cross-replica delivery is handled by the postgres_notify DashboardHub.
Scaling Postgres
- Read replicas help dashboards but not the hot event-persist path.
- Native partitioning on events(received_at) is built in; partition retention drops the oldest partition once it ages past Z4J_EVENT_RETENTION_DAYS.
- Set statement_timeout on the brain’s database role to prevent runaway queries (Z4J_DB_STATEMENT_TIMEOUT_MS).
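To make the retention rule concrete, here is a minimal sketch of how a retention window maps to droppable partitions. It assumes one partition per calendar day, which the text above does not specify, and the function name is hypothetical; the brain's own retention job may work differently.

```python
from datetime import date, timedelta
from typing import Iterable, List


def partitions_to_drop(partition_days: Iterable[date], retention_days: int, today: date) -> List[date]:
    """Return the daily partitions that fall entirely before the retention cutoff."""
    # retention_days plays the role of Z4J_EVENT_RETENTION_DAYS.
    cutoff = today - timedelta(days=retention_days)
    # Assumption: each partition is identified by the day it covers.
    return sorted(day for day in partition_days if day < cutoff)
```

With a 30-day retention on 2024-01-31 the cutoff is 2024-01-01, so partitions for 2023-12-31 and earlier qualify while the 2024-01-01 partition is kept.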
Scaling agents
Agents scale with your app. One agent per app process; the worker-first protocol identifies each worker by (agent_id, worker_id) so multi-worker servers (gunicorn, uwsgi) coexist under a single agent identity. No coordination between agents; deploying more app replicas registers more workers automatically.
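The (agent_id, worker_id) keying can be pictured with a toy in-memory registry. The class and method names are hypothetical and this is not z4j's API; it only shows how several worker processes can sit under one agent identity without colliding.

```python
from typing import List, Set, Tuple

WorkerKey = Tuple[str, str]  # (agent_id, worker_id)


class WorkerRegistry:
    """Toy registry keyed by (agent_id, worker_id); illustrative only."""

    def __init__(self) -> None:
        self._workers: Set[WorkerKey] = set()

    def register(self, agent_id: str, worker_id: str) -> None:
        # Adding the same pair twice is a no-op in this sketch.
        self._workers.add((agent_id, worker_id))

    def workers_for(self, agent_id: str) -> List[str]:
        """List the worker ids registered under one agent identity."""
        return sorted(w for a, w in self._workers if a == agent_id)


if __name__ == "__main__":
    registry = WorkerRegistry()
    for worker in ("pid-101", "pid-102", "pid-101"):
        registry.register("web", worker)
    print(registry.workers_for("web"))
```

Because the key is the pair rather than the agent alone, two gunicorn workers sharing an agent_id stay distinct, and adding app replicas only adds entries.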
When you hit a ceiling
If you are running 500+ agents or 100M+ events per day, file an issue. We want the feedback.