Sentry

z4j brain ships an optional Sentry hook. With a DSN configured, unhandled exceptions inside HTTP handlers and background workers are captured to a Sentry project of your choosing. The integration is off by default, opt-in via a single env var, and runs every event through a redaction pass that strips Authorization headers, cookies, webhook URLs, and OAuth-style query tokens before the SDK sends it.

What the brain captures

Source	Without Sentry	With Sentry
Unhandled exception in an HTTP handler	500 response, structured stderr log	Same, plus a Sentry event with stack, route name, and HTTP method
Background worker crash (registry, audit retention, schedule fires)	Structlog error event	Same, plus Sentry event tagged with worker name
`logger.exception(...)` in domain code	Structlog error event	Same, plus Sentry event with the call site
Successful requests	Counted in Prometheus, no log line	Optionally captured as transactions when `Z4J_SENTRY_TRACES_SAMPLE_RATE > 0`

Migrations (alembic upgrade head) and one-shot CLI commands (z4j init, z4j audit verify, z4j reset-mfa) deliberately do NOT init Sentry. Failures on those paths already land on the operator’s terminal and in the audit log; adding a network round trip for every CLI invocation would be net negative.

Enabling

Install the optional dependency, then set the DSN:

pip install 'z4j[sentry]'

# In your env file or systemd unit
Z4J_SENTRY_DSN=https://<public-key>@<instance>/<project-id>

Restart the brain. On boot you should see:

INFO z4j.brain.observability.sentry: Sentry initialised (environment=production, release=z4j@1.6.0, traces=0.000, profiles=0.000)

If sentry-sdk is not installed when the DSN is set, the brain logs a single WARNING explaining what to install and continues running without Sentry. There is no fallback retry, no init backoff, and no exception that propagates out of create_app.
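
The start-up behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the brain's actual code: the function name `maybe_init_sentry` and the logger name are hypothetical, and the real integration passes more options to the SDK than a bare DSN.

```python
import importlib.util
import logging
from typing import Optional

logger = logging.getLogger("sentry_init_sketch")  # hypothetical logger name


def maybe_init_sentry(dsn: Optional[str]) -> bool:
    """Return True only if the Sentry SDK was actually initialised."""
    # An unset or empty DSN disables the integration entirely.
    if not dsn:
        return False
    # Probe for the SDK without importing it, so a missing package
    # never raises out of application start-up.
    if importlib.util.find_spec("sentry_sdk") is None:
        logger.warning(
            "Z4J_SENTRY_DSN is set but sentry-sdk is not installed; "
            "run: pip install 'z4j[sentry]'. Continuing without Sentry."
        )
        return False
    import sentry_sdk  # imported lazily, only once it is known to exist

    sentry_sdk.init(dsn=dsn)
    return True
```

The point of the shape is that every branch returns instead of raising: a missing DSN and a missing package both degrade to "no Sentry" with at most one log line.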

Settings

All settings are prefixed Z4J_ and read from your env file or the process environment.

Variable	Default	Notes
`Z4J_SENTRY_DSN`	unset	When unset OR empty, every other knob below is ignored. SecretStr at the Pydantic layer, so the value is never echoed in startup logs or in a Pydantic validation traceback.
`Z4J_SENTRY_ENVIRONMENT`	unset	Override the `environment` tag Sentry attaches to events. Defaults to `Z4J_ENVIRONMENT` (`production` / `staging` / `dev`). Use this when several brains route into one Sentry project and you want to distinguish them (e.g. `staging-eu` vs `staging-us`).
`Z4J_SENTRY_TRACES_SAMPLE_RATE`	`0.0`	Fraction of requests captured as transactions, in `[0.0, 1.0]`. The default of 0 keeps Sentry in error-only mode. `0.05` is a reasonable starting point; raise it once you have a feel for event volume and your Sentry quota.
`Z4J_SENTRY_PROFILES_SAMPLE_RATE`	`0.0`	Fraction of sampled transactions that are also profiled, in `[0.0, 1.0]`. The effective profiling rate is bounded above by `Z4J_SENTRY_TRACES_SAMPLE_RATE` (no transaction, no profile). Leave at 0 unless traces are already on.
`Z4J_SENTRY_SEND_DEFAULT_PII`	`false`	Asks the SDK to attach identifying data (IP, username, raw cookies) to events. The brain’s redaction pass still runs when this is on, so Authorization headers, OAuth tokens, webhook URLs, cookies, and the user block’s email, username, and ip_address stay redacted.

Out-of-range sample rates fail validation at startup. A value such as Z4J_SENTRY_TRACES_SAMPLE_RATE=1.5 raises a Pydantic ValidationError before the FastAPI app is built, so you see the error on the terminal at boot rather than at runtime.
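
To make the fail-fast behaviour concrete, here is a small stand-in that uses only the standard library. The brain itself validates through Pydantic; this sketch swaps in a frozen dataclass and a plain ValueError, and the names `SentrySampleRates` and `load_sample_rates` are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SentrySampleRates:
    """Stdlib stand-in for the range check Pydantic performs at startup."""

    traces: float = 0.0
    profiles: float = 0.0

    def __post_init__(self) -> None:
        for name, value in (("traces", self.traces), ("profiles", self.profiles)):
            # Chained comparison also rejects NaN, which compares False.
            if not 0.0 <= value <= 1.0:
                raise ValueError(
                    f"{name} sample rate must be in [0.0, 1.0], got {value}"
                )


def load_sample_rates(env: Mapping[str, str]) -> SentrySampleRates:
    """Build the settings object from an environment-like mapping."""
    return SentrySampleRates(
        traces=float(env.get("Z4J_SENTRY_TRACES_SAMPLE_RATE", "0.0")),
        profiles=float(env.get("Z4J_SENTRY_PROFILES_SAMPLE_RATE", "0.0")),
    )
```

Because the check runs while the settings object is constructed, a bad value stops the process before any request handling code exists.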

What gets redacted

Every event is passed through z4j_brain.observability.sentry.scrub_event before reaching the SDK transport. The scrubber strips:

Request headers (case-insensitive): Authorization, Proxy-Authorization, Cookie, Set-Cookie, X-Z4J-Signature, X-Z4J-Audit-Signature, X-Z4J-API-Key, X-API-Key, X-Auth-Token, X-CSRF-Token, X-CSRFToken, plus the IP-chain set X-Forwarded-For / X-Real-IP / Forwarded / CF-Connecting-IP / True-Client-IP / Fastly-Client-IP / X-Cluster-Client-IP / Remote-User. The header name stays so the Sentry event still shows “Authorization was set”; the value is replaced with [REDACTED by z4j].
Request env (CGI-style): the request.env dict from the WSGI/ASGI integration. HTTP_* keys map back to header names and are redacted via the same header list. Operator-added keys whose name matches a credential pattern (password, secret, token, api_key, etc.) have their values redacted.
Query parameters: token, access_token, refresh_token, id_token, api_key, apikey, key, secret, password, code (covers OAuth callbacks and MFA verification codes), signature, sig, session, csrf. Values stripped, parameter names kept.
Request cookies: the entire request.cookies block, wholesale.
Request body: the entire request.data block, wholesale. Webhook payloads outbound from the notification dispatchers can carry workflow tokens, and the brain prefers a coarse strip over a per-channel allowlist.
User block (event["user"]): email, username, ip_address are redacted even when send_default_pii=true so the brain’s PII-redaction promise holds. The id field (a UUID) is kept so Sentry issue-grouping works.
logentry: event["logentry"].message, formatted, and params (dict OR positional list). Top-level event["message"] is scrubbed identically.
Exception values and stacktraces: every event["exception"]["values"][i]["value"] (the str(exc), which routinely includes the offending URL) is scrubbed. Each stacktrace frame is walked: vars (locals) via the value-key pattern; context_line / pre_context / post_context (the surrounding source lines) via URL scrubbing; filename / abs_path / module redacted when the path itself matches a credential pattern.
Threads: same stacktrace shape as exception values; same scrubber pass.
Transaction: when the transaction string is a raw URL (unmatched route case) it is URL-scrubbed.
Spans + contexts.trace.data: span data.http.url / url.full / url.query URL attributes are scrubbed. Same applies to the contexts.trace.data block.
Outbound URLs in breadcrumbs (httpx / requests log lines): the query string is re-scrubbed with the same key set above; the host and path stay visible. Token-in-path webhook URLs (Slack / Discord / Teams / PagerDuty / Workflow webhooks) are replaced with <scheme>://<host>/[REDACTED by z4j] when they appear in free-form log message text.
extra / tags / contexts blocks: any key matching password, passwd, _pass, secret, token, api_key, auth, signature, private_key, bot_token, webhook_url, integration_key, recovery_code, or mfa_secret (substring, case-insensitive) has its value redacted. Nested dicts and lists of dicts are walked up to 32 levels deep; deeper structures are replaced with [REDACTED by z4j], so a hostile event with 1000-deep nesting cannot exhaust the recursion limit, crash the scrubber, and slip through unredacted.

The redactor is a pure function, exercised by tests/unit/test_sentry_observability.py. A change that drops a header from the redaction list or weakens the query-key scrub will fail tests before it ships.
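
Three of the rules above (header values, query parameters, and depth-limited key matching) can be illustrated with a short sketch. It is not the real scrub_event: the function names are hypothetical, the key sets are abbreviated subsets of the lists above, and the exact depth accounting is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

REDACTED = "[REDACTED by z4j]"
MAX_DEPTH = 32
# Abbreviated subsets of the documented lists, for illustration only.
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key", "x-forwarded-for"}
SENSITIVE_QUERY_KEYS = {"token", "access_token", "api_key", "code", "signature"}
SENSITIVE_KEY_PARTS = ("password", "secret", "token", "api_key", "webhook_url")


def scrub_headers(headers: dict) -> dict:
    """Keep every header name; replace the value of sensitive ones."""
    return {
        name: REDACTED if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def scrub_url_query(url: str) -> str:
    """Redact sensitive query values; host, path, and parameter names stay."""
    parts = urlsplit(url)
    pairs = [
        (key, REDACTED if key.lower() in SENSITIVE_QUERY_KEYS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # urlencode percent-encodes the marker; the secret itself is gone.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


def scrub_mapping(value, depth: int = 0):
    """Walk dicts and lists, redacting values stored under credential-like keys."""
    if depth > MAX_DEPTH:
        return REDACTED  # too deep: redact rather than recurse further
    if isinstance(value, dict):
        return {
            key: (
                REDACTED
                if any(part in str(key).lower() for part in SENSITIVE_KEY_PARTS)
                else scrub_mapping(item, depth + 1)
            )
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [scrub_mapping(item, depth + 1) for item in value]
    return value
```

Each function returns a new structure and never mutates its input, which is what makes this style of redactor straightforward to unit test: a fixed event goes in, and the assertion is simply that no secret appears in what comes out.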

Webhook URL caveat

The Slack, Discord, Microsoft Teams, and PagerDuty channels store their destinations in a webhook_url config field. The scrubber redacts the URL when it appears under a key matching webhook_url, but the URL’s path often contains the credential itself (Slack and Discord both embed a secret token in the path). If you flip send_default_pii=true AND traces are on AND the brain dispatches a delivery that fails, the breadcrumb URL scrubber scrubs the query string but keeps the path, so the credential-bearing path can reach Sentry. Keep send_default_pii=false (the default) if you treat the dispatcher URLs as secrets.
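
The difference between a query-only scrub and whole-path redaction is easiest to see side by side. The sketch below is illustrative only: both function names are hypothetical, the query handling is simplified to dropping the query entirely, and the URL is made up.

```python
from urllib.parse import urlsplit, urlunsplit

REDACTED = "[REDACTED by z4j]"


def drop_query(url: str) -> str:
    """Query-only scrub (simplified): the path, and any token in it, survives."""
    return urlunsplit(urlsplit(url)._replace(query=""))


def redact_webhook_path(url: str) -> str:
    """Whole-path redaction: only scheme and host remain visible."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.hostname}/{REDACTED}"


# Hypothetical URL; the last path segment stands in for the credential.
url = "https://hooks.example.test/services/T000/B000/not-a-real-token?retry=1"
print(drop_query(url))           # the token-bearing path is still present
print(redact_webhook_path(url))  # the path is replaced by the marker
```

A URL that only passes through the first kind of scrub still carries its credential, which is why the caveat above matters for token-in-path webhooks.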

Release tags

The brain reads its installed package version and uses z4j@<version> as the Sentry release tag. This is what powers Sentry’s “first seen in” attribution and lets you correlate a spike of issues with a specific deploy. If the package version cannot be resolved (e.g. an editable install in a packaging mode that does not register metadata), the release tag is omitted; Sentry handles that fine but you lose the issue-to-release correlation.
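
One plausible way to derive such a tag is through importlib.metadata, sketched below. The function name `resolve_release` is hypothetical, and the distribution name `z4j` is an assumption taken from the install command; the brain may resolve its version differently.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def resolve_release(dist_name: str = "z4j") -> Optional[str]:
    """Return '<dist>@<version>', or None when no package metadata is registered."""
    try:
        return f"{dist_name}@{version(dist_name)}"
    except PackageNotFoundError:
        # No registered metadata: omit the release tag instead of guessing.
        return None
```

Returning None rather than a placeholder string keeps unresolved builds from being grouped under a fake release in Sentry.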

Disabling

Unset Z4J_SENTRY_DSN and restart the brain. The SDK does not need to be uninstalled; an unset DSN is a complete no-op. No further outbound connections are attempted.

Workers

Adapter-side workers (z4j-celery, z4j-django, etc.) currently do NOT initialise their own Sentry client; only the brain process does. If you want Sentry on the worker side, install sentry-sdk in the worker’s venv and call sentry_sdk.init(...) from your worker entrypoint. Worker integration via the same scrubber surface is a candidate for a later release.
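
A worker entrypoint could wire this up along the following lines. This is a sketch under assumptions: the function name `init_worker_sentry` is hypothetical, and it reads the generic SENTRY_DSN and SENTRY_ENVIRONMENT variables rather than any Z4J_ setting, since the worker packages define none for Sentry.

```python
import os
from typing import Mapping, Optional


def init_worker_sentry(env: Optional[Mapping[str, str]] = None) -> bool:
    """Initialise Sentry in a worker process; return False when no DSN is set."""
    env = os.environ if env is None else env
    dsn = env.get("SENTRY_DSN", "")
    if not dsn:
        return False
    import sentry_sdk  # must be installed in the worker's own venv

    sentry_sdk.init(
        dsn=dsn,
        environment=env.get("SENTRY_ENVIRONMENT"),
        traces_sample_rate=0.0,  # errors only, mirroring the brain's default
        send_default_pii=False,  # the brain's scrubber does not run here
    )
    return True
```

Call it once at the top of the worker entrypoint, before any tasks are registered. Events sent this way do not pass through the brain's redaction pass, so keeping send_default_pii off is the cautious choice.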