WebSocket protocol
Protocol version: 2. v2 adds a per-frame HMAC envelope plus replay protection; v1 is not accepted on the wire. See the wire protocol concept for the narrative; this page is the schema reference. The canonical definitions live in z4j_core.transport.frames.
Endpoint
    wss://<brain>/ws
    Authorization: Bearer <agent-token>

No subprotocol. Plain WebSocket carrying JSON frames.
Frame envelope
Every frame:

    {
      v: 2,                 // protocol version
      type: "<string>",
      id: string,           // 1..64 chars, agent-generated
      ts: string | null,    // RFC 3339, optional
      // signed frames also carry:
      nonce: string,        // up to 32 chars
      seq: number,          // monotonic per agent
      hmac: string,         // base64; computed by FrameSigner
      payload: { ... }      // type-specific
    }

Stateful frames (event_batch, event_batch_ack, heartbeat, command, command_ack, command_result, registry_delta, error, agent_status) are signed. The handshake pair (hello / hello_ack) is unsigned because the agent and brain are still negotiating which key to use.
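To make the signed fields concrete, here is a minimal sketch of attaching and checking nonce, seq, and hmac. This page does not describe how FrameSigner canonicalizes a frame or which digest it uses, so the sorted-key JSON serialization, HMAC-SHA256, the strictly increasing seq rule, and the names canonical_bytes, sign_frame, and ReplayGuard are all assumptions for illustration, not the z4j implementation.

```python
import base64
import hashlib
import hmac
import json
import secrets


def canonical_bytes(frame: dict) -> bytes:
    """Serialize every field except `hmac` deterministically (assumed scheme)."""
    unsigned = {k: v for k, v in frame.items() if k != "hmac"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_frame(frame: dict, key: bytes, seq: int) -> dict:
    """Return a copy of `frame` with nonce, seq, and a base64 HMAC attached."""
    signed = dict(frame, nonce=secrets.token_hex(16), seq=seq)  # 32-char nonce
    digest = hmac.new(key, canonical_bytes(signed), hashlib.sha256).digest()
    signed["hmac"] = base64.b64encode(digest).decode("ascii")
    return signed


class ReplayGuard:
    """Hypothetical receiver-side check: valid HMAC and strictly increasing seq."""

    def __init__(self) -> None:
        self.last_seq = -1

    def verify(self, frame: dict, key: bytes) -> bool:
        expected = hmac.new(key, canonical_bytes(frame), hashlib.sha256).digest()
        try:
            received = base64.b64decode(frame.get("hmac", ""), validate=True)
        except (ValueError, TypeError):
            return False  # missing or malformed hmac field
        if not hmac.compare_digest(expected, received):
            return False  # tampered frame or wrong key
        seq = frame.get("seq")
        if not isinstance(seq, int) or seq <= self.last_seq:
            return False  # replayed or out-of-order frame
        self.last_seq = seq
        return True
```

The guard only advances last_seq after the HMAC has been verified, so a forged frame cannot push the counter forward. A fuller receiver would probably also remember recent nonce values; that is left out here.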
Agent to brain
hello
First frame the agent sends. The brain validates protocol_version and accepts only "2".
    {
      type: "hello",
      payload: {
        protocol_version: "2",
        agent_version: string,
        framework: string,        // "django" | "flask" | "fastapi" | "bare"
        engines: string[],        // up to 64
        schedulers: string[],     // up to 64
        capabilities: Record<string, string[]>,
        host: Record<string, any>,
        // optional, worker-first protocol (one connection per worker):
        worker_id?: string,
        worker_role?: "web" | "task" | "scheduler" | "beat" | "other",
        worker_pid?: number,
        worker_started_at?: string
      }
    }

event_batch
The hot path. events is capped at 5000 entries; the agent’s batcher caps itself at 500.
    {
      type: "event_batch",
      payload: {
        events: Record<string, any>[]
      }
    }

heartbeat
    { type: "heartbeat", payload: {} }

Default cadence is 10 seconds; the brain returns its preferred heartbeat_interval_seconds in hello_ack.
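The agent-to-brain frames described so far (hello, event_batch, heartbeat) can be assembled roughly as follows. This is a sketch of the unsigned part of the envelope only; the helper names (make_frame, hello_frame, event_batch_frames, heartbeat_frame) and the use of a UUID for id are assumptions, not names from z4j_core.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Iterator

AGENT_BATCH_CAP = 500    # the agent batcher's own cap
BRAIN_BATCH_CAP = 5000   # hard cap on events per event_batch


def make_frame(frame_type: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Wrap a payload in the unsigned fields of the v2 envelope."""
    return {
        "v": 2,
        "type": frame_type,
        "id": uuid.uuid4().hex,  # 32 chars, within the 1..64 limit
        "ts": datetime.now(timezone.utc).isoformat(),  # RFC 3339 timestamp
        "payload": payload,
    }


def hello_frame(agent_version: str, framework: str,
                engines: list[str], schedulers: list[str]) -> dict[str, Any]:
    """Build the first frame of a connection; worker_* fields are omitted."""
    if len(engines) > 64 or len(schedulers) > 64:
        raise ValueError("engines and schedulers are limited to 64 entries each")
    return make_frame("hello", {
        "protocol_version": "2",
        "agent_version": agent_version,
        "framework": framework,
        "engines": engines,
        "schedulers": schedulers,
        "capabilities": {},
        "host": {},
    })


def event_batch_frames(events: list[dict[str, Any]],
                       cap: int = AGENT_BATCH_CAP) -> Iterator[dict[str, Any]]:
    """Split events into event_batch frames of at most `cap` entries each."""
    if not 1 <= cap <= BRAIN_BATCH_CAP:
        raise ValueError(f"cap must be between 1 and {BRAIN_BATCH_CAP}")
    for start in range(0, len(events), cap):
        yield make_frame("event_batch", {"events": events[start:start + cap]})


def heartbeat_frame() -> dict[str, Any]:
    """Heartbeats carry an empty payload."""
    return make_frame("heartbeat", {})
```

With the default cap, 1200 queued events become three frames of 500, 500, and 200 events, each well under the 5000-entry limit.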
command_result
    {
      type: "command_result",
      payload: {
        command_id: string,
        ok: boolean,
        error?: { code: string, message: string },
        result?: any
      }
    }

registry_delta
Schedule / engine registry updates. The brain treats it as additive state.
Brain to agent
hello_ack
Brain’s response to a successful hello.
    {
      type: "hello_ack",
      payload: {
        protocol_version: "2",
        brain_version: string,
        agent_id: string,
        project_id: string,
        session_id: string,
        heartbeat_interval_seconds: 10,   // default
        max_frame_size_bytes: 1048576     // default 1 MiB
      }
    }

event_batch_ack
Round-trip ack so the agent knows which buffered batch it can drop.
    {
      type: "event_batch_ack",
      payload: {
        acked_id: string,   // matches the original event_batch.id
        received: number,
        accepted: number,
        rejected: number
      }
    }

command
Dispatched in response to a REST call against /api/v1/projects/{slug}/commands/.... The set of valid verb values matches the routes there: retry_task, cancel_task, bulk_retry, purge_queue, restart_worker, pool_resize, add_consumer, cancel_consumer, rate_limit.
    {
      type: "command",
      payload: {
        command_id: string,
        verb: string,
        args: Record<string, any>
      }
    }

command_ack
Brain confirms receipt of a command_result before the agent drops its in-flight record.
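The acknowledgement flow for the frames above can be sketched from the agent's side: keep each event_batch until its event_batch_ack arrives, run each command, and hold the command_result until the brain acknowledges it. The AgentState class, the handler registry, and the error codes unknown_verb and handler_failed are hypothetical. This page does not give the command_ack payload, so on_command_ack takes the command id directly rather than guessing a field name.

```python
from typing import Any, Callable

# Verbs listed for the command frame on this page.
VALID_VERBS = {
    "retry_task", "cancel_task", "bulk_retry", "purge_queue", "restart_worker",
    "pool_resize", "add_consumer", "cancel_consumer", "rate_limit",
}

Handler = Callable[[dict[str, Any]], Any]


class AgentState:
    """Hypothetical bookkeeping for unacknowledged batches and results."""

    def __init__(self, handlers: dict[str, Handler]) -> None:
        self.handlers = handlers
        self.buffered_batches: dict[str, dict[str, Any]] = {}  # frame id -> frame
        self.inflight_results: dict[str, dict[str, Any]] = {}  # command_id -> payload

    def on_event_batch_ack(self, payload: dict[str, Any]) -> None:
        """Drop the buffered batch named by acked_id; unknown ids are ignored."""
        self.buffered_batches.pop(payload["acked_id"], None)

    def on_command(self, payload: dict[str, Any]) -> dict[str, Any]:
        """Run a command and return the command_result payload to send back."""
        command_id, verb = payload["command_id"], payload["verb"]
        result: dict[str, Any] = {"command_id": command_id, "ok": False}
        handler = self.handlers.get(verb)
        if verb not in VALID_VERBS or handler is None:
            result["error"] = {"code": "unknown_verb",
                               "message": f"unsupported verb: {verb}"}
        else:
            try:
                result.update(ok=True, result=handler(payload.get("args", {})))
            except Exception as exc:  # report handler failures to the brain
                result["error"] = {"code": "handler_failed", "message": str(exc)}
        # Keep the result until the brain confirms receipt with command_ack.
        self.inflight_results[command_id] = result
        return result

    def on_command_ack(self, command_id: str) -> None:
        """Drop the in-flight record once the brain has the result."""
        self.inflight_results.pop(command_id, None)
```

Holding results until command_ack means an agent that reconnects mid-command still has something to resend, at the cost of the brain possibly seeing a result twice.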
error
Fatal protocol error; the brain will close the socket immediately after.
agent_status
Brain pushes status changes (e.g. another worker joined / left under the same agent_id).
Close codes
Observed on the brain side:
| Code | Meaning |
|---|---|
| 1000 | Normal closure. |
| 1011 | Internal error. |
| 4400 | Malformed handshake or non-hello first frame. |
| 4401 | Bearer rejected. |
| 4408 | Idle timeout reached without a heartbeat. |
| 4426 | Protocol version not in SUPPORTED_PROTOCOLS. |
| 4429 | Per-agent connection cap exceeded. |
Dashboard-side (/ws/dashboard) uses a separate set: 4400 (bad request), 4401 (no session), 4402 (session-bound origin mismatch), 4403 (insufficient role), 4408 (idle).
Reconnect
Reconnect with exponential backoff on 1006 (abnormal closure, reported locally when the connection drops without a close frame), 1011, 4408, and similar transient codes. On 4401, 4426, or 4429, stop and surface the error; these are configuration problems.
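The reconnect rule above can be sketched as a single decision function. The base delay, the 60-second ceiling, the use of full jitter, and the name next_delay are assumptions, since this page only asks for exponential backoff. Codes outside the listed transient set are treated as "stop" here, which is a conservative guess rather than documented behavior.

```python
import random
from typing import Optional

RETRYABLE = {1006, 1011, 4408}   # transient codes named on this page
FATAL = {4401, 4426, 4429}       # configuration problems: do not retry


def next_delay(close_code: int, attempt: int, base: float = 1.0,
               cap: float = 60.0,
               rng: Optional[random.Random] = None) -> Optional[float]:
    """Return seconds to wait before reconnecting, or None to stop."""
    if close_code in FATAL or close_code not in RETRYABLE:
        return None  # surface the error to the operator instead of retrying
    source = rng if rng is not None else random
    # Exponential growth, clamped so very large attempt counts stay cheap.
    ceiling = min(cap, base * (2 ** min(attempt, 30)))
    return source.uniform(0, ceiling)  # full jitter spreads out reconnects
```

A caller would reset attempt to zero after a successful hello_ack and increment it after each failed connection.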