Skip to content

Wire protocol

Protocol version: 2. v2 adds a per-frame HMAC envelope plus replay protection; v1 is not accepted on the wire. The canonical definitions live in z4j_core.transport.frames; see the WebSocket protocol reference for the full schema.

  • WebSocket, TLS in production (wss://), plaintext for local dev (ws://).
  • One JSON object per WebSocket frame, UTF-8. There is no WS-layer batching; the event_batch frame carries the application-level batch.
  • Authentication: Authorization: Bearer <agent-token> on the WebSocket handshake.
  • HTTP longpoll fallback for networks that block WebSockets: POST /api/v1/agent/events to upload signed frames and GET /api/v1/agent/commands to poll for inbound commands. Same z4j_core frame types, same HMAC envelope.

Every frame carries the version, an id, and a type. Stateful frames also carry the HMAC envelope (nonce, seq, hmac); handshake frames (hello / hello_ack) are unsigned because the agent and brain are still negotiating which key to use.

{
"v": 2,
"type": "event_batch",
"id": "<agent-generated, 1..64 chars>",
"ts": "2026-04-16T12:34:56.789Z",
"nonce": "<urlsafe random>",
"seq": 4281,
"hmac": "<base64 HMAC over the canonical envelope>",
"payload": { "events": [] }
}

First frame the agent sends. Declares the agent’s protocol version, framework, engines, schedulers, and host info. The brain validates compatibility before accepting the connection.

The hot path. payload.events is capped at 5000 entries; the agent’s batcher caps itself at 500. Signed in v2 so a stolen bearer token alone cannot forge events.

Default cadence is 10 seconds; the brain returns its preferred heartbeat_interval_seconds in hello_ack and the agent honours that.

Reply to a brain-initiated command. Correlated by command_id.

Engine / scheduler registry updates after hello (e.g. a new queue appeared, a worker capability changed). Treated as additive state on the brain.

Carries agent_id, project_id, session_id, plus tuning parameters (heartbeat interval, max frame size).

Round-trip ack so the agent knows which buffered batch it can drop. Includes the original frame id (acked_id) so the agent’s in-flight map can match precisely.

Dispatched in response to a REST call against /api/v1/projects/{slug}/commands/.... The verb set matches the routes there: retry_task, cancel_task, bulk_retry, purge_queue, restart_worker, pool_resize, add_consumer, cancel_consumer, rate_limit. There are no schedule-CRUD commands on the wire — schedules are stored in the brain and the scheduler adapters read them directly.

Brain confirms receipt of a command_result before the agent drops its in-flight record.

Fatal protocol error; the brain will close the socket immediately after.

Brain pushes status changes (e.g. another worker joined or left under the same agent_id).

  • Agent to brain: the agent’s seq is monotonic per signed-frame stream. Replay protection rejects out-of-order or repeated nonces.
  • The brain dedupes accepted events by (agent_id, seq, event_id) so a retried batch after a brain-side restart is idempotent.
  • The event_batch_ack carries received, accepted, rejected so the agent can confirm exactly which rows landed.
  • Brain to agent: command_id is a UUID. The agent replies with the same command_id in command_result. Commands time out server-side and surface as command_timeout errors (504) on the REST side.

hello.payload.protocol_version must be "2". A mismatch closes the WebSocket with code 4426 (Upgrade Required). The agent logs and does not reconnect until upgraded.

Forward-compat: agents and brain ignore unknown fields. New command verbs added in a minor are rejected by older agents with ok: false, error: "unsupported_command"; the brain surfaces that as a command failure rather than a protocol error.