Skip to content

fix: security audit remediation#226

Merged
pratyush618 merged 30 commits into
masterfrom
fix/security-audit-remediation
May 30, 2026
Merged

fix: security audit remediation#226
pratyush618 merged 30 commits into
masterfrom
fix/security-audit-remediation

Conversation

@pratyush618
Copy link
Copy Markdown
Collaborator

@pratyush618 pratyush618 commented May 30, 2026

Summary

Hardens taskito against the issues found in a full security audit: untrusted-payload
RCE, dashboard authz gaps, webhook SSRF, proxy/interception abuse, and several DoS /
info-leak vectors. 29 focused commits grouped into phases; each fix is self-contained
with tests where applicable.

The central change in posture: storage write-access is no longer implicitly equivalent
to worker RCE. The new opt-in SignedSerializer and a documented trust model make that
boundary explicit.

What changed

Dashboard / API

  • Settings API never exposes or accepts internal auth: keys (password hashes,
    sessions, CSRF secret).
  • Admin role is enforced server-side on every mutating route; viewers are read-only.
  • Session/CSRF cookies carry Secure; /metrics + /readiness can be gated behind
    TASKITO_DASHBOARD_METRICS_TOKEN. Bootstrap admin password is popped from the env.
  • Job-list API surfaces only the final traceback line; full traces stay on the per-job
    errors endpoint.

Serialization / proxies / interception

  • New opt-in SignedSerializer(inner, key) — HMAC-SHA256 over payloads, rejecting
    forged bytes before deserialization.
  • Recipe-signing key hardened + a loud warning when proxies run unsigned.
  • File-proxy allowlist bypass (prefix + symlink) closed; type_path reconstruction
    restricted to safe kinds (no os.system-style targets).

Webhooks

  • Target URL re-validated at delivery time (closes the DNS-rebinding SSRF window) and
    redirects are no longer followed.

Locks / limits / Rust core

  • DistributedLock.info() masks a foreign holder's owner_id.
  • Per-enqueue payload size cap (max_payload_bytes, default 1 MiB); NUL idempotency
    separator; scaler binds to localhost by default.
  • Scheduler skips dispatch on a claim error (no duplicate execution); Postgres schema
    identifier quoted; Redis locks set a native TTL with an atomic reap; prefork recovers
    from a poisoned slot mutex; workflow fan-out capped.

Docs

  • New security guide (trust model + hardening checklist) plus updates to the
    serializers, dashboard-auth, webhooks, and locking guides.

Testing

  • Rust: cargo check (default/postgres/redis/workflows) + clippy + cargo test — green.
  • Python: full pytest tests/ — 1037 passed, 4 skipped, 0 failed.
  • ruff + mypy clean.

Known gaps / follow-ups (not in this PR)

  • Autoscale max_spawn_per_tick cap is not yet implemented (Low-severity DoS).
  • max_retries=0 still falls back to the policy default (needs a None/0 contract change
    across the PyO3 boundary).
  • Contrib FastAPI/Flask routers remain unauthenticated by default (documented, not
    default-denied); EncryptedSerializer AAD binding not added.

Summary by CodeRabbit

  • New Features

    • Payload integrity via SignedSerializer and configurable max payload size; fan-out creation capped.
  • Bug Fixes

    • Scheduler claim handling hardened to avoid duplicate dispatches; atomic lock reaper to prevent TOCTOU deletions; file-proxy path checks fixed to prevent symlink/sibling-prefix bypass.
  • Security & Hardening

    • Dashboard roles (admin/viewer), optional metrics bearer token, secure cookie defaults with CLI override, services default to localhost; webhook delivery now re-validates targets and blocks redirects.
  • Documentation

    • Added comprehensive security guide and updated auth/webhook/serializer docs.
  • Tests

    • New tests for serializers, error summaries, proxies, and security checks.

Review Change Stack

@github-actions github-actions Bot added the rust label May 30, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5149577d-511b-4fef-93cf-2d3350c5315f

📥 Commits

Reviewing files that changed from the base of the PR and between a4730b8 and 557e525.

📒 Files selected for processing (1)
  • py_src/taskito/proxies/reconstruct.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • py_src/taskito/proxies/reconstruct.py

📝 Walkthrough

</review_stack_artifact_start -->

Walkthrough

Security and reliability hardening across Taskito: atomic Redis lock TTL operations and owner masking, HMAC-signed payloads with size limits, webhook delivery-time validation and no-redirects, dashboard role/token/config protections, safer type/file reconstruction, scheduler claim error handling, mutex-poison recovery, and supporting docs/tests.

Changes

Distributed Lock Expiry & Owner Privacy

Layer / File(s) Summary
Lock expiry TOCTOU-safe atomicity
crates/taskito-core/src/storage/redis_backend/locks.rs
EXTEND_LOCK_SCRIPT and ACQUIRE_LOCK_SCRIPT now apply PEXPIREAT; new REAP_LOCK_SCRIPT re-checks expiry inside the script and atomically deletes only expired locks; reap_expired_locks invokes the script per key.
Lock owner ID privacy and masking
py_src/taskito/locks.py, docs/content/docs/guides/reliability/locking.mdx
DistributedLock.info() masks returned owner_id with "***" for non-owner callers; docs add an "owner token is sensitive" callout.

Payload Integrity & Size Control

Layer / File(s) Summary
SignedSerializer implementation and export
py_src/taskito/serializers.py, py_src/taskito/__init__.py, tests/core/test_serializers.py
New SignedSerializer class prepends HMAC-SHA256 digest to serialized payloads and verifies on load; validates 32+ byte keys; exported from package; tests added for round-trip and integrity failures.
Payload size enforcement at enqueue
py_src/taskito/app.py
DEFAULT_MAX_PAYLOAD_BYTES (1 MiB) and Queue.max_payload_bytes enforce payload-size caps; _check_payload_size() applied during enqueue, batched dispatch, and enqueue_many; 0 disables cap; idempotency key now uses NUL separator.
SignedSerializer and payload documentation
docs/content/docs/guides/extensibility/serializers.mdx
Documentation section for SignedSerializer with example, warning, and guidance for combining with encryption.

Webhook SSRF Closure & Validation

Layer / File(s) Summary
Delivery-time validation and redirect prevention
py_src/taskito/webhooks.py
Re-validates webhook URL at delivery, uses a no-redirect HTTP opener, and records a single failed delivery when URL validation fails.
SSRF guard and validation documentation
docs/content/docs/guides/extensibility/events-webhooks.mdx
Docs updated to state validation occurs at registration and delivery time and that redirects are not followed.

Dashboard Security (Auth, Roles, Tokens, Settings)

Layer / File(s) Summary
Configurable cookie Secure flag and TLS handling
py_src/taskito/dashboard/server.py
serve_dashboard/_make_handler accept secure_cookies (default True); cookie builder conditionally appends Secure; OAuth redirect cookies updated.
Role-based authorization and admin gating
py_src/taskito/dashboard/routes.py, py_src/taskito/dashboard/server.py
requires_admin(path, method) added; mutating /api/ routes require "admin" role except self-service paths; non-admins receive 403.
Metrics and readiness bearer token gating
py_src/taskito/dashboard/server.py
_metrics_token_ok() reads TASKITO_DASHBOARD_METRICS_TOKEN; when set, /metrics and /readiness require Authorization: Bearer <token> with constant-time comparison.
Protected auth: namespace for settings
py_src/taskito/dashboard/handlers/settings.py
_is_protected() detects auth: prefix; list/get/delete handlers hide or treat protected keys as absent; key validation rejects protected keys.
Admin password environment security
py_src/taskito/dashboard/auth.py
bootstrap_admin_from_env reads and pops TASKITO_DASHBOARD_ADMIN_PASSWORD from os.environ.
CLI flags and service binding defaults
py_src/taskito/cli.py, py_src/taskito/scaler.py
--insecure-cookies added to dashboard CLI; serve_dashboard called with secure_cookies=not args.insecure_cookies; scaler default host changed to 127.0.0.1.
Dashboard authentication and security documentation
docs/content/docs/guides/dashboard/authentication.mdx
Docs now note taskito_session is Secure, /metrics and /readiness may require bearer token, and roles are admin/viewer with viewer receiving 403 on mutating routes.
Dashboard settings API security test
tests/dashboard/test_dashboard_settings.py
Test updated to assert /api/settings never exposes auth:-prefixed keys.

Proxy & Interception Type Safety & Signing

Layer / File(s) Summary
File path allowlist bypass prevention via containment
py_src/taskito/proxies/handlers/file.py
FileHandler.reconstruct uses resolved-path containment (os.path.commonpath) via _is_allowed and opens the resolved path after validation to prevent symlink-swap and sibling-prefix bypass.
Type-import and reconstructor validation
py_src/taskito/interception/converters.py
_import_type ensures the resolved attribute is a class; reconstruct_* functions check enum/dataclass/pydantic/named-tuple shapes and raise ValueError for mismatches.
Recipe signing for str and bytes secrets
py_src/taskito/proxies/signing.py
_key_bytes normalizer accepts str or bytes; sign_recipe/verify_recipe updated to accept `str
Per-process warning for unsigned recipes
py_src/taskito/proxies/reconstruct.py
Module-level _UNSIGNED_WARNED and _warn_unsigned_once() log a single-process warning when reconstructing unsigned recipes.
Interception and proxy security tests
tests/resources/test_interception.py, tests/resources/test_proxies.py
New tests assert malicious type_path values are rejected and that file-handler sibling-prefix bypass is blocked.

Scheduler & Storage Robustness

Layer / File(s) Summary
Scheduler claim error resilience
crates/taskito-core/src/scheduler/poller.rs
claim_for_dispatch now returns Ok(false) and logs a warning on storage.claim_execution errors (skip this tick) instead of proceeding as if claimed.
Postgres SQL identifier escaping
crates/taskito-core/src/storage/postgres/mod.rs
pg_quote_ident added and used for SET search_path and CREATE SCHEMA IF NOT EXISTS to safely quote schema identifiers.
Mutex poison recovery for slot operations
crates/taskito-python/src/prefork/slot.rs
recover_poison helper unwraps poisoned mutexes; slot operations use unwrap_or_else(recover_poison) to avoid panics from poisoned locks.
Workflow fan-out child expansion cap
crates/taskito-python/src/py_queue/workflow_ops/fan_out.rs
MAX_FAN_OUT added and expand_fan_out rejects overly large child expansions with a PyValueError.

Result Reporting & Service Defaults

Layer / File(s) Summary
Error summary truncation for result reporting
py_src/taskito/result.py, tests/core/test_result.py
_summarize_error() extracts the final non-empty traceback line and truncates to _ERROR_SUMMARY_MAX (500) with ellipsis; JobResult.to_dict() returns summarized error; tests added.
Scaler loopback binding default
py_src/taskito/scaler.py
serve_scaler default host changed to 127.0.0.1 and docs updated.

Comprehensive Security Guide & Operations Docs

Layer / File(s) Summary
Production security trust model and hardening guide
docs/content/docs/guides/operations/security.mdx
New security guide covering trust model, payload auth/encryption, dashboard controls, webhook protections, proxy signing/allowlists/type restrictions, lock privacy, scaler binding, and a hardening checklist.
Operations guide metadata update
docs/content/docs/guides/operations/meta.json
Added "security" to the pages list.
Webhook test fixture SSRF allowlist adjustment
tests/observability/test_webhooks.py
webhook_server fixture updated to accept monkeypatch and set TASKITO_WEBHOOKS_ALLOW_PRIVATE=1 for loopback test server under stricter validation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • ByteVeda/taskito#217: Both PRs modify the scheduler's claim/dispatch path in crates/taskito-core/src/scheduler/poller.rs, specifically the claim_for_dispatch function's job-claim error-handling semantics.

Suggested labels

dashboard

Poem

🐰 I hopped through code to mend a thread,
Locks now close safe where the reapers tread,
Payloads signed with a secret key,
No redirects, no poisoned spree—
Secure gardens bloom where the rabbit led.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main objective of the PR: security hardening based on audit findings, addressing multiple security vulnerabilities across the codebase.
Docstring Coverage ✅ Passed Docstring coverage is 83.51% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/security-audit-remediation

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/dashboard/test_dashboard_settings.py (1)

97-105: 💤 Low value

Consider adding explicit tests for protected key rejection.

The test correctly verifies that auth: keys are hidden from the list endpoint. For completeness, consider adding tests that verify:

  1. GET /api/settings/auth:users returns 404
  2. PUT /api/settings/auth:test returns 400 with "setting key is reserved"
  3. DELETE /api/settings/auth:users returns 404
🧪 Example additional test cases
def test_get_protected_setting_returns_404(dashboard_server: tuple[AuthedClient, Queue]) -> None:
    client, _ = dashboard_server
    with pytest.raises(urllib.error.HTTPError) as exc_info:
        client.get("/api/settings/auth:users")
    assert exc_info.value.code == 404


def test_put_protected_setting_returns_400(dashboard_server: tuple[AuthedClient, Queue]) -> None:
    client, _ = dashboard_server
    with pytest.raises(urllib.error.HTTPError) as exc_info:
        client.put("/api/settings/auth:test", {"value": "hacked"})
    assert exc_info.value.code == 400


def test_delete_protected_setting_returns_404(dashboard_server: tuple[AuthedClient, Queue]) -> None:
    client, _ = dashboard_server
    with pytest.raises(urllib.error.HTTPError) as exc_info:
        client.delete("/api/settings/auth:users")
    assert exc_info.value.code == 404
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/dashboard/test_dashboard_settings.py` around lines 97 - 105, Add
explicit tests to ensure protected "auth:" keys are rejected: create three new
test functions similar to test_get_settings_hides_auth_keys that call the
settings endpoints for an auth key (e.g., GET /api/settings/auth:users, PUT
/api/settings/auth:test with a payload, and DELETE /api/settings/auth:users) and
assert the correct HTTP error codes (GET and DELETE return 404, PUT returns 400
with the "setting key is reserved" condition). Use the same dashboard_server
fixture/AuthedClient as in test_get_settings_hides_auth_keys and assert the
raised urllib.error.HTTPError.code values accordingly.
py_src/taskito/webhooks.py (1)

41-53: ⚡ Quick win

Refine urllib redirect-refusal guidance in _NoRedirect (py_src/taskito/webhooks.py:41-53)

Returning None from _NoRedirect.redirect_request already causes urllib to raise urllib.error.HTTPError for 302 redirects (tested via opener.open(.../redirect/1) yielding HTTPError code 302), so the “may vary / continue error-handling chain” concern for 302 is overstated. If you want uniform refusal behavior across other 3xx codes (301/303/307/308), add small regression tests for each status (or explicitly handle the relevant http_error_3xx methods).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@py_src/taskito/webhooks.py` around lines 41 - 53, The comment says returning
None from _NoRedirect.redirect_request already causes urllib to raise HTTPError
for 302 but may not uniformly refuse all 3xx codes; update the implementation or
tests accordingly: either add explicit
http_error_3xx/http_error_301/http_error_303/http_error_307/http_error_308
handlers on the _NoRedirect class to consistently refuse all 3xx responses, or
add small regression tests exercising opener.open(...) via _NO_REDIRECT_OPENER
against endpoints returning 301/303/307/308 to assert HTTPError is raised;
reference _NoRedirect.redirect_request and the module-level _NO_REDIRECT_OPENER
when adding the handlers/tests so behavior is uniform and covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@py_src/taskito/proxies/reconstruct.py`:
- Around line 27-42: The unsigned-recipe warning is never emitted because the
code only handles the signed path where signing_secret is present; update the
branch that currently checks signing_secret (the same place handling the signed
verification path) to add an else branch that calls _warn_unsigned_once() when
signing_secret is absent so the warning is emitted exactly once; keep using the
existing _warn_unsigned_once() helper (and do not duplicate its logic) and
ensure you call it in the same function that currently processes the signed path
(the code block that references signing_secret) so unsigned reconstructions are
logged.

---

Nitpick comments:
In `@py_src/taskito/webhooks.py`:
- Around line 41-53: The comment says returning None from
_NoRedirect.redirect_request already causes urllib to raise HTTPError for 302
but may not uniformly refuse all 3xx codes; update the implementation or tests
accordingly: either add explicit
http_error_3xx/http_error_301/http_error_303/http_error_307/http_error_308
handlers on the _NoRedirect class to consistently refuse all 3xx responses, or
add small regression tests exercising opener.open(...) via _NO_REDIRECT_OPENER
against endpoints returning 301/303/307/308 to assert HTTPError is raised;
reference _NoRedirect.redirect_request and the module-level _NO_REDIRECT_OPENER
when adding the handlers/tests so behavior is uniform and covered.

In `@tests/dashboard/test_dashboard_settings.py`:
- Around line 97-105: Add explicit tests to ensure protected "auth:" keys are
rejected: create three new test functions similar to
test_get_settings_hides_auth_keys that call the settings endpoints for an auth
key (e.g., GET /api/settings/auth:users, PUT /api/settings/auth:test with a
payload, and DELETE /api/settings/auth:users) and assert the correct HTTP error
codes (GET and DELETE return 404, PUT returns 400 with the "setting key is
reserved" condition). Use the same dashboard_server fixture/AuthedClient as in
test_get_settings_hides_auth_keys and assert the raised
urllib.error.HTTPError.code values accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c9061650-318e-4fcf-9bcb-b656de46fec9

📥 Commits

Reviewing files that changed from the base of the PR and between 957e67f and a4730b8.

📒 Files selected for processing (33)
  • crates/taskito-core/src/scheduler/poller.rs
  • crates/taskito-core/src/storage/postgres/mod.rs
  • crates/taskito-core/src/storage/redis_backend/locks.rs
  • crates/taskito-python/src/prefork/slot.rs
  • crates/taskito-python/src/py_queue/workflow_ops/fan_out.rs
  • docs/content/docs/guides/dashboard/authentication.mdx
  • docs/content/docs/guides/extensibility/events-webhooks.mdx
  • docs/content/docs/guides/extensibility/serializers.mdx
  • docs/content/docs/guides/operations/meta.json
  • docs/content/docs/guides/operations/security.mdx
  • docs/content/docs/guides/reliability/locking.mdx
  • py_src/taskito/__init__.py
  • py_src/taskito/app.py
  • py_src/taskito/cli.py
  • py_src/taskito/dashboard/auth.py
  • py_src/taskito/dashboard/handlers/settings.py
  • py_src/taskito/dashboard/routes.py
  • py_src/taskito/dashboard/server.py
  • py_src/taskito/interception/converters.py
  • py_src/taskito/locks.py
  • py_src/taskito/proxies/handlers/file.py
  • py_src/taskito/proxies/reconstruct.py
  • py_src/taskito/proxies/signing.py
  • py_src/taskito/result.py
  • py_src/taskito/scaler.py
  • py_src/taskito/serializers.py
  • py_src/taskito/webhooks.py
  • tests/core/test_result.py
  • tests/core/test_serializers.py
  • tests/dashboard/test_dashboard_settings.py
  • tests/observability/test_webhooks.py
  • tests/resources/test_interception.py
  • tests/resources/test_proxies.py

Comment thread py_src/taskito/proxies/reconstruct.py
@pratyush618 pratyush618 self-assigned this May 30, 2026
@pratyush618 pratyush618 merged commit 0f65cce into master May 30, 2026
23 checks passed
@pratyush618 pratyush618 deleted the fix/security-audit-remediation branch May 30, 2026 19:18
@pratyush618 pratyush618 mentioned this pull request May 30, 2026
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant