Troubleshooting

  1. Check the agent’s app logs for [z4j] lines at boot. Look for handshake errors.
  2. Verify Z4J_BRAIN_URL uses wss:// in production, ws:// in local dev.
  3. Verify the token is the one shown at mint time (tokens are not recoverable - re-mint if lost).
  4. Egress firewall: the agent’s host must reach the brain on TCP 443 (or wherever your proxy is).
  5. Proxy WebSocket passthrough: nginx-ingress needs proxy_set_header Upgrade $http_upgrade and proxy_set_header Connection "upgrade".
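
Steps 2 and 4 above can be checked from the agent's host before digging into logs. The following is a minimal sketch, not part of z4j: the helper names `check_brain_url`, `brain_endpoint`, and `can_reach` are hypothetical, and the default ports (443 for wss://, 80 for ws://) are an assumption that may not match your proxy.

```python
import socket
from urllib.parse import urlparse


def check_brain_url(url: str, production: bool) -> list[str]:
    """Return a list of problems found in a Z4J_BRAIN_URL value (hypothetical helper)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("ws", "wss"):
        problems.append(f"scheme {parsed.scheme!r} is not ws:// or wss://")
    elif production and parsed.scheme != "wss":
        problems.append("production should use wss://, not ws://")
    if not parsed.hostname:
        problems.append("URL has no hostname")
    return problems


def brain_endpoint(url: str) -> tuple[str, int]:
    """Extract (host, port) from the URL, assuming 443 for wss and 80 for ws."""
    parsed = urlparse(url)
    default_port = 443 if parsed.scheme == "wss" else 80
    return parsed.hostname, parsed.port or default_port


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_reach(*brain_endpoint(url))` run on the agent's host tells you whether the egress firewall allows the connection at all. It does not prove that the proxy passes the WebSocket upgrade through, so step 5 still needs checking separately.
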

  1. Agent is online (Agents page shows online)?
  2. Engine is auto-detected (agent drawer → engines list)?
  3. For Django: INSTALLED_APPS includes z4j_django after any Celery apps?
  4. For Flask: z4j.init_app(app, ...) was called on the app factory?
  5. For FastAPI: agent is inside the lifespan context manager?
  6. Task names in registry (agent drawer → registry)?
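
The Django ordering check in step 3 is easy to get wrong by eye in a long settings file. Here is a small sketch of how it could be checked. The set of Celery app labels is an assumption (`django_celery_beat` and `django_celery_results` are common ones; adjust it to your project), and `z4j_django_order_ok` is a hypothetical helper, not a z4j API.

```python
# Assumed Celery-related Django apps; extend this set to match your project.
CELERY_APPS = {"django_celery_beat", "django_celery_results"}


def z4j_django_order_ok(installed_apps: list[str]) -> bool:
    """Return True if z4j_django is listed and comes after every Celery app."""
    if "z4j_django" not in installed_apps:
        return False
    z4j_index = installed_apps.index("z4j_django")
    celery_indexes = [
        i for i, app in enumerate(installed_apps) if app in CELERY_APPS
    ]
    # With no Celery apps present, all() over an empty list is True.
    return all(i < z4j_index for i in celery_indexes)
```

In a Django shell you could call it as `z4j_django_order_ok(list(settings.INSTALLED_APPS))` after `from django.conf import settings`.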

You’ve upgraded the brain to a newer major version than the agent. Re-deploy agents with pip install -U z4j-*. Agents up to one major version behind the brain still work but may lack new features.

Someone modified the audit_log table directly, or a backup restore is incomplete.

  1. Identify first_broken_id.
  2. If known-intentional (e.g., planned DB surgery), document the break externally. The chain from there is not recoverable.
  3. If unexpected, treat as a compromise event - preserve the DB, alert security, investigate.
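
To see why a break cannot be repaired from that point on, it helps to picture a generic hash chain. The sketch below is illustrative only: the row layout (`id`, `payload`, `prev_hash`, `row_hash`), the SHA-256 construction, and the `GENESIS` value are all assumptions and do not describe the actual audit_log schema or z4j's verification code.

```python
import hashlib
from typing import NamedTuple, Optional

GENESIS = "0" * 64  # assumed starting value for the first row's prev_hash


class AuditRow(NamedTuple):
    id: int
    payload: str
    prev_hash: str
    row_hash: str


def compute_hash(prev_hash: str, payload: str) -> str:
    """Hash a row's payload together with the previous row's hash."""
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_row(rows: list[AuditRow], payload: str) -> None:
    """Append a new row linked to the current tail of the chain."""
    prev_hash = rows[-1].row_hash if rows else GENESIS
    next_id = rows[-1].id + 1 if rows else 1
    rows.append(AuditRow(next_id, payload, prev_hash, compute_hash(prev_hash, payload)))


def first_broken_id(rows: list[AuditRow]) -> Optional[int]:
    """Return the id of the first row that fails verification, or None if intact."""
    expected_prev = GENESIS
    for row in rows:
        link_ok = row.prev_hash == expected_prev
        hash_ok = row.row_hash == compute_hash(row.prev_hash, row.payload)
        if not (link_ok and hash_ok):
            return row.id
        expected_prev = row.row_hash
    return None
```

Because each row's hash depends on the one before it, editing or deleting a single row invalidates the link at that row, and nothing after it can be trusted without recomputing hashes, which would defeat the purpose of the log.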

Another agent already has that (project_id, name). Pick a different name or delete the old agent first.

The scheduler backend doesn’t support writes (e.g. celery-beat with PersistentScheduler, rq-scheduler in v1.0). See schedulers overview.

  1. SMTP configured? Run z4j-brain smtp-test --to you@example.com.
  2. From: domain has SPF / DKIM / DMARC?
  3. Sender reputation good? Gmail drops many SMTP senders silently.

  1. Check rate(z4j_events_persisted_total[1m]). Is one agent emitting millions of events?
  2. Are redaction patterns looping on giant payloads? The redactor has a 2 MiB payload cap.
  3. Is there a hot endpoint? Check z4j_http_request_duration_seconds.

The target agent disconnected between the UI showing the action and the brain dispatching it. Wait for the agent to reconnect, or pick a different agent handling the same engine.

Include the brain logs (with X-Request-Id), the agent logs, and a description of the sequence of events, then file an issue at github.com/z4jdev/z4j/issues.