Migrate from Flower to z4j

This guide walks through replacing Flower with z4j on a production Celery deployment. Both tools observe the same Celery broker events, so you can run them side by side during the cut-over and turn Flower off only when you are confident z4j covers your workflow.

End state: persistent task history in Postgres, retry/cancel/bulk actions from the dashboard, full Celery Beat schedule CRUD, RBAC with invitations, an HMAC-chained audit log, and the same dashboard for any other engines you add later (RQ, Dramatiq, Huey, arq, taskiq).

Plan for a Friday afternoon, not a sprint. Most teams cut over in a few hours.

Who this guide is for

You are running Celery 5.6 or newer with Flower in front of it, and you have outgrown Flower for at least one of these reasons:

Worker crashes leave tasks “started” forever because Flower has no reconciler.
Flower forgets everything on restart, and you need to know what failed last week.
You want bulk retry, schedule CRUD, or RBAC and Flower does not have them.
Your stack is going multi-engine (RQ alongside Celery, for example) and you want one dashboard.
An auditor asked for a tamper-evident log of who retried which job and when.

If none of that applies and you are happy with Flower, do not migrate. Flower is a fine tool inside the scope it was built for.

Prerequisites

Celery 5.6 or newer.
A reachable Postgres or SQLite path for z4j. Production should use Postgres; SQLite is fine for evaluation.
The same broker URL and result backend the workers already use. No data migration is required.
Permission to add one pip package (z4j-celery) to the worker venv and to run one extra container or process for z4j.

Phase 1: Install z4j alongside Flower

Flower keeps running unchanged. z4j is added in parallel. Neither tool interferes with the other.

# Option A: pip + bundled SQLite, fastest path for evaluation
pip install z4j
z4j serve

# Option B: docker compose, recommended for anything other than a laptop
git clone https://github.com/z4jdev/z4j
cd z4j
docker compose up -d

The first boot prints a setup URL to stderr. Open it, create the first admin, and you have an empty dashboard.

Verify the install:

z4j check    # config + DB connectivity + alembic head
z4j status   # user/project/agent/task counts, version

At this point z4j is running but has no agents and no events. Flower is still your source of truth.

Phase 2: Add the z4j-celery agent to your workers

Install the adapter into the venv your Celery workers use:

pip install z4j-celery

Set up to five environment variables on the worker process. The first three are mandatory; the last two are optional and have sensible defaults:

export Z4J_BRAIN_URL="https://z4j.example.com"
export Z4J_TOKEN="<bearer-from-the-dashboard>"
export Z4J_HMAC_SECRET="<hmac-from-the-dashboard>"
export Z4J_PROJECT_ID="default"          # optional, defaults to "default"
export Z4J_AGENT_NAME="celery-prod-1"    # optional, derived from hostname if unset

Mint the token and HMAC secret in the dashboard under Project > Agents > New agent. The dashboard prints them once; store them in your secrets manager.
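A missing variable is easier to catch before the worker restarts than after. The following is a minimal preflight sketch, not part of z4j: the helper name missing_z4j_vars is hypothetical, and the list of mandatory variables simply mirrors the description above.

```python
import os

# Variable names come from this guide; treating these three as mandatory
# follows its description and is not read from z4j itself.
REQUIRED = ("Z4J_BRAIN_URL", "Z4J_TOKEN", "Z4J_HMAC_SECRET")


def missing_z4j_vars(env=None):
    """Return the mandatory variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name, "").strip()]
```

Calling missing_z4j_vars() with no arguments inspects the current process environment; an empty list means all three mandatory values are present. It says nothing about whether the token or secret is valid.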

Restart one worker. You should see this on stdout:

INFO:z4j.celery.worker_bootstrap:z4j worker bootstrap: agent runtime started

If you do not see that line, the agent is not running. Common causes:

Worker was started as celery beat or celery inspect, not celery worker. The bootstrap signal only fires under celery worker.
Z4J_BRAIN_URL is unreachable from the worker. Check DNS and TLS from inside the worker pod or container.
Z4J_TOKEN is wrong. z4j logs an unauthorized request when this happens.
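To check the DNS and TLS cause from inside the worker pod or container, a short standard-library probe is often enough. This is a sketch under assumptions: the function names are hypothetical, and it only confirms that the host resolves, accepts a TCP connection, and completes a TLS handshake. It does not test the token or the WebSocket upgrade.

```python
import socket
import ssl
from urllib.parse import urlsplit


def endpoint_from_url(url):
    """Split a URL into (host, port, use_tls), defaulting the port by scheme."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an http(s) URL: {url!r}")
    use_tls = parts.scheme == "https"
    port = parts.port or (443 if use_tls else 80)
    return parts.hostname, port, use_tls


def probe(url, timeout=5.0):
    """Connect to the URL's host and, for https, complete a TLS handshake.

    Raises socket.gaierror on DNS failure, ssl.SSLError on certificate
    problems, and OSError on refused or timed-out connections.
    Returns the negotiated TLS version, or None for plain http.
    """
    host, port, use_tls = endpoint_from_url(url)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        if use_tls:
            context = ssl.create_default_context()
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return tls.version()
    return None
```

Run probe(os.environ["Z4J_BRAIN_URL"]) from a Python shell in the same container as the worker, so the result reflects the worker's own network path and CA bundle.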

Once the first worker reports in, restart the rest. New tasks should appear in the z4j dashboard within seconds.

Phase 3: Validate parity for a week

Run Flower and z4j side by side for one full business cycle. The goal is to make sure z4j shows the same operational picture you trusted Flower for, plus the new things you came for.

A reasonable validation checklist:

Live tasks appear in z4j as they appear in Flower.
Task counts per queue match between the two dashboards.
A task you intentionally fail shows up with the full traceback in z4j.
A task you intentionally retry from z4j actually re-runs.
A scheduled task fires at its expected time.
The audit log entry for the retry shows your user, the timestamp, and the action.
Worker restart does not lose tasks (z4j picks them up on reconciliation).
Secret arguments are redacted by default in the z4j detail view.

If any of these fail, file an issue at github.com/z4jdev/z4j/issues with the agent log and z4j log. Do not turn Flower off until they all pass.
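For the per-queue count check, comparing two sets of numbers by eye gets tedious over a week. A small helper like the one below can flag the queues that disagree. It is a sketch only: the counts are assumed to be copied by hand from each dashboard into plain dictionaries, and the function name and tolerance parameter are hypothetical.

```python
def diff_queue_counts(flower_counts, z4j_counts, tolerance=0):
    """Return {queue: (flower, z4j)} where the counts differ by more than tolerance."""
    mismatches = {}
    for queue in sorted(set(flower_counts) | set(z4j_counts)):
        in_flower = flower_counts.get(queue, 0)  # a queue missing on one side counts as 0
        in_z4j = z4j_counts.get(queue, 0)
        if abs(in_flower - in_z4j) > tolerance:
            mismatches[queue] = (in_flower, in_z4j)
    return mismatches
```

A small non-zero tolerance can be reasonable, because the two dashboards are unlikely to be read at exactly the same instant.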

Phase 4: Cut over

Three things to do, in order:

Move any operator playbooks or runbooks that reference flower.example.com to point at z4j.example.com.
Stop the Flower process or remove the flower service from docker-compose.yml. Keep the entry commented out in git for one release cycle in case you need to roll back.
Tell the team. Include a one-paragraph diff so they know what changed and where the new buttons are.

z4j keeps recording from now on. Flower’s last in-memory snapshot is gone the moment you stop it, but you no longer care.

Rollback plan

If something goes wrong in the first 48 hours, recovery is fast because z4j and Flower are independent.

# Re-enable Flower (it never had state to migrate, so it just starts).
docker compose up -d flower

# Stop the z4j agent on workers (uninstall or unset Z4J_BRAIN_URL).
# Workers keep running normally; only the agent thread stops phoning home.
pip uninstall -y z4j-celery

Operators can keep using Flower exactly as before. z4j retains the history it has already recorded; you can come back to it later.

Frequently asked questions

Will z4j double-process my tasks?

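No. The z4j agent only observes: it reads broker events, worker signals, and the result backend, and it never enqueues or executes tasks on its own. A task re-runs only when an operator explicitly triggers an action such as retry from the dashboard. Running z4j alongside Flower or any other observer is safe.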

Does z4j handle Celery chords, groups, and chains?

Yes. The z4j-celery adapter understands chord/group/chain parents and renders them as a tidy-tree DAG with runtime badges per node. Flower shows these as a flat parent/child list, which is one of the things you may have come here to fix.

What about Celery Beat?

Run pip install z4j-celerybeat on the host that runs celery beat. The dashboard then exposes full schedule CRUD: create, edit, delete, enable/disable, and trigger-now. It writes back to whichever scheduler celery beat uses (django-celery-beat, redbeat, or the default file-backed scheduler).

Do I need to change my broker, result backend, or worker code?

No. z4j-celery is a passive observer. Your CELERY_BROKER_URL, CELERY_RESULT_BACKEND, task definitions, and worker invocation stay exactly as they are.

Will my secret task arguments leak to the dashboard?

Secrets are scrubbed by default. The redaction layer recursively walks args and kwargs and replaces values that match common credential patterns (password, token, secret, api_key, authorization, etc.). You can extend the patterns or mark specific tasks as fully redacted in the project settings.
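To make the recursive walk concrete, here is an illustrative sketch of key-based redaction. It is not z4j's implementation: the pattern list is taken from the examples above, the mask string is made up, and matching on key names (rather than on the values themselves) is an assumption.

```python
import re

# Pattern list copied from the guide's examples; the real list may differ.
SENSITIVE_KEY = re.compile(r"password|token|secret|api_key|authorization", re.IGNORECASE)
MASK = "[REDACTED]"  # hypothetical placeholder text


def redact(value, key=None):
    """Return a copy of value with anything stored under a sensitive key masked."""
    if key is not None and SENSITIVE_KEY.search(str(key)):
        return MASK
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Positional items have no key, so only dicts nested inside them are masked.
        return type(value)(redact(item) for item in value)
    return value
```

With this approach a bare positional secret, such as a token passed as the first argument, is not caught. That is the kind of case where marking the whole task as fully redacted is the safer choice.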

Does z4j work with django-celery-beat / redbeat?

Yes for both. The z4j-celerybeat adapter detects the active scheduler and writes to the correct backend. Schedule entries you create in the z4j dashboard are visible to celery beat, and entries you create directly in django-celery-beat are visible in the z4j dashboard.

What happens to historical tasks Flower had cached?

They are gone the moment Flower restarts, the same as before. z4j starts a fresh, persistent record from the moment its agent connects. There is no migration step because there is nothing in Flower to migrate from.

How much overhead does the z4j agent add?

The agent buffers events in memory and ships them to z4j in batches over a single WebSocket. Per-task overhead is on the order of microseconds at the agent and a single Postgres insert at z4j. At thousands of tasks per second, z4j’s batched writes outperform Flower’s full re-render of in-memory state.
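The buffering pattern described here can be sketched in a few lines. The class below is illustrative only and is not the agent's code: the class name, the batch size, and the send callable are all hypothetical stand-ins for whatever the agent actually uses.

```python
class EventBatcher:
    """Collect events in memory and hand them to `send` in batches."""

    def __init__(self, send, max_batch=100):
        self._send = send            # callable that accepts a list of events
        self._max_batch = max_batch
        self._pending = []

    def add(self, event):
        """Queue one event; ship the batch once it reaches max_batch."""
        self._pending.append(event)
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self):
        """Ship whatever is pending; do nothing if the buffer is empty."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        self._send(batch)
```

The per-task cost is one list append, which is why the overhead at the worker stays small. A real agent would presumably also flush on a timer so that a quiet queue does not hold events back indefinitely.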

Can I still use Flower’s HTTP API for scripts I have already written?

z4j ships a richer REST and WebSocket API documented under API reference. Anything you scripted against Flower’s API has an equivalent in z4j, usually with more filters and an audit-log entry recording who called it.

Is there a paid version with different features?

No. z4j is open source under split licensing: z4j is AGPL v3, the adapters are Apache 2.0. Every feature is in the open-source release. There is no commercial gate, no telemetry, and no phone-home.

Common pitfalls

A short list of things that have tripped operators in the wild:

Worker started without celery worker. The bootstrap signal only fires under that command. celery beat, celery inspect, and celery purge will not start the agent.
Reverse proxy strips the WebSocket upgrade. The agent uses a long-lived WebSocket. nginx, Cloudflare, and Traefik all need explicit upgrade handling. See the TLS setup guide for working configs.
Token shared across multiple agents. Each worker host should get its own agent token. Sharing one token means you cannot tell which host is reporting which event.
HMAC clock skew. The HMAC frames embed a timestamp. If the worker clock drifts by more than five minutes from the z4j server clock, z4j rejects the frames. Run NTP everywhere.
Forgotten Z4J_PROJECT_ID. All agents in a project share a project namespace. Workers without an explicit Z4J_PROJECT_ID land in the default project, which is fine for single-team setups but confusing in a multi-tenant deployment.
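The clock-skew pitfall is easier to reason about with the mechanism in front of you. The sketch below shows a generic timestamped HMAC check using Python's standard library. The frame layout, the function names, and the use of SHA-256 are assumptions for illustration; only the five-minute window comes from this guide.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # the five-minute window described in this guide


def sign_frame(secret, payload, timestamp=None):
    """Return (timestamp, hex signature) covering the timestamp and the payload."""
    timestamp = int(time.time()) if timestamp is None else int(timestamp)
    message = str(timestamp).encode() + b"." + payload
    return timestamp, hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_frame(secret, payload, timestamp, signature, now=None):
    """Accept a frame only if it is fresh and its signature matches."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False  # sender and receiver clocks disagree by too much
    _, expected = sign_frame(secret, payload, timestamp)
    return hmac.compare_digest(expected, signature)
```

Because the timestamp is part of the signed message, a frame with a valid signature is still rejected when the two clocks disagree by more than the window. A drifting worker clock therefore looks like an authentication failure rather than a network problem.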

Next steps

The Celery adapter reference covers what events z4j captures and how reconciliation works.
The Production checklist walks through hardening for self-hosting.
The Comparison page shows how z4j stacks up against other dashboards if you are still evaluating.
For the high-level story, see the Flower vs. z4j landing page on z4j.com.