Troubleshooting
Agent shows offline
- Check the agent’s app logs for [z4j] lines at boot. Look for handshake errors.
- Verify Z4J_BRAIN_URL uses wss:// in production, ws:// in local dev.
- Verify the token is the one shown at mint time (tokens are not recoverable - re-mint if lost).
- Egress firewall: the agent’s host must reach the brain on TCP 443 (or wherever your proxy is).
- Proxy WebSocket passthrough: nginx-ingress needs proxy_set_header Upgrade and proxy_set_header Connection.
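The URL and firewall checks above can be scripted before digging through logs. The following is a minimal sketch, not part of z4j: the function names and the example hostnames are hypothetical, and it only encodes the rules stated above (wss:// in production, ws:// in local dev, TCP reachability of the brain).

```python
import socket
from urllib.parse import urlparse


def check_brain_url(url: str, production: bool) -> list[str]:
    """Return a list of problems found in a brain URL (empty list means OK)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("ws", "wss"):
        problems.append(f"scheme {parsed.scheme!r} is not a WebSocket scheme")
    elif production and parsed.scheme != "wss":
        problems.append("production should use wss://, not ws://")
    if not parsed.hostname:
        problems.append("URL has no hostname")
    return problems


def default_port(url: str) -> int:
    """Port the agent host must reach: explicit port, else 443 for wss, 80 for ws."""
    parsed = urlparse(url)
    if parsed.port is not None:
        return parsed.port
    return 443 if parsed.scheme == "wss" else 80


def can_reach(url: str, timeout: float = 5.0) -> bool:
    """Try a plain TCP connection from this host to the brain's host and port."""
    parsed = urlparse(url)
    try:
        with socket.create_connection((parsed.hostname, default_port(url)), timeout=timeout):
            return True
    except OSError:
        return False
```

Run it on the agent's host, for example check_brain_url(os.environ["Z4J_BRAIN_URL"], production=True) followed by can_reach(...). A successful TCP connection only rules out the firewall; it does not prove that the proxy passes the WebSocket upgrade.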
Events don’t appear
- Agent is online (Agents page shows online)?
- Engine is auto-detected (agent drawer → engines list)?
- For Django: INSTALLED_APPS includes z4j_django after any Celery apps?
- For Flask: z4j.init_app(app, ...) was called on the app factory?
- For FastAPI: agent is inside the lifespan context manager?
- Task names in registry (agent drawer → registry)?
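For the Django item in the checklist, the ordering can be checked mechanically. This is a small sketch under assumptions: the helper name is hypothetical, and which apps count as "Celery apps" is a guess (django_celery_beat and django_celery_results are used as examples); adjust the tuple to match your project.

```python
def z4j_after_celery(
    installed_apps,
    celery_apps=("django_celery_beat", "django_celery_results"),  # assumed examples
):
    """Return True if z4j_django is listed after every Celery app that is present."""
    apps = list(installed_apps)
    if "z4j_django" not in apps:
        return False
    z4j_index = apps.index("z4j_django")
    # Only compare against Celery apps that are actually installed.
    return all(apps.index(name) < z4j_index for name in celery_apps if name in apps)
```

In a Django shell this could be called as z4j_after_celery(settings.INSTALLED_APPS) after importing settings from django.conf.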
“Agent version mismatch” banner
You’ve upgraded the brain to a newer major than the agent. Re-deploy agents with pip install -U z4j-*. Agents up to one major version behind still work but may lack new features.
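The compatibility rule can be written out as a short comparison of major versions. This is an illustrative sketch only: the function names and the status strings are made up here, and the cases the text does not cover (an agent newer than the brain, or more than one major behind) are labelled as such rather than guessed.

```python
def major(version: str) -> int:
    """Extract the major component from a version string such as '2.4.1' or 'v2.4.1'."""
    return int(version.lstrip("v").split(".")[0])


def agent_status(brain_version: str, agent_version: str) -> str:
    """Classify an agent relative to the brain using the one-major-behind rule."""
    gap = major(brain_version) - major(agent_version)
    if gap < 0:
        return "not covered"  # agent newer than brain: the text above says nothing
    if gap == 0:
        return "ok"
    if gap == 1:
        return "works, may lack new features"
    return "upgrade required"  # assumption: more than one major behind is unsupported
```

For example, agent_status("3.0.0", "2.4.1") returns "works, may lack new features", which is the situation where re-deploying the agents is advisable but not urgent.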
Audit chain verify fails
Someone modified the audit_log table directly, or a backup restore is incomplete.
- Identify first_broken_id.
- If known-intentional (e.g., planned DB surgery), document the break externally. The chain from there is not recoverable.
- If unexpected, treat as a compromise event - preserve the DB, alert security, investigate.
- If unexpected, treat as a compromise event - preserve the DB, alert security, investigate.
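To see why a direct edit or a partial restore surfaces as a single first_broken_id, it helps to look at how a hash chain is verified in general. The sketch below is a generic SHA-256 chain, not z4j's actual schema or hashing scheme: the row layout, the GENESIS value and every function name are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first row's prev_hash


def row_hash(prev_hash: str, row_id: int, payload: dict) -> str:
    """Hash a row together with the hash of the row before it."""
    body = json.dumps({"id": row_id, "payload": payload}, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Append a row whose hash commits to the previous row's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    row_id = len(chain) + 1
    chain.append({
        "id": row_id,
        "payload": payload,
        "prev_hash": prev,
        "hash": row_hash(prev, row_id, payload),
    })


def first_broken_id(chain: list):
    """Return the id of the first row that fails verification, or None if intact."""
    prev = GENESIS
    for row in chain:
        expected = row_hash(prev, row["id"], row["payload"])
        if row["prev_hash"] != prev or row["hash"] != expected:
            return row["id"]
        prev = row["hash"]
    return None
```

In this sketch, editing the payload of row 2 makes verification fail at row 2, and deleting row 2 makes it fail at row 3, because row 3 still points at the missing row's hash. The verifier stops at the first failure, so it reports where the break starts and says nothing about the rows after it.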
409 conflict_duplicate_name on mint-token
Another agent already has that (project_id, name). Pick a different name or delete the old agent first.
Schedules showing read_only
The scheduler backend doesn’t support writes (e.g. celery-beat with PersistentScheduler, rq-scheduler in v1.0). See schedulers overview.
Password reset email not arriving
- SMTP configured? z4j-brain smtp-test --to you@example.com.
- From: domain has SPF / DKIM / DMARC?
- Sender reputation good? Gmail drops many SMTP senders silently.
Very high CPU
- Check rate(z4j_events_persisted_total[1m]) - is one agent emitting millions of events?
- Redaction patterns not looping on giant payloads? The redactor has a 2 MiB payload cap.
- Hot endpoint - check z4j_http_request_duration_seconds.
503 agent_offline on a retry
The target agent dropped offline between the UI showing the action and the brain dispatching it. Wait for the agent to reconnect, or pick a different agent handling the same engine.
When stuck
File an issue at github.com/z4jdev/z4j/issues. Include the brain logs (with X-Request-Id), the agent logs, and a description of the sequence of events.