fix(logs): run PII redaction over HTTP and fix Presidio provisioning #5143
TheodoreSpeaks wants to merge 3 commits into
- resolve the guardrails venv via candidate paths and fail fast instead of silently falling back to system python3 (the misleading "Presidio not installed" that broke redaction and the guardrails block in deployed runtimes)
- install the en_core_web_lg spaCy model in setup.sh and app.Dockerfile
- route log redaction through an internal /api/guardrails/mask-batch endpoint so Presidio always runs in the app container, including async executions that persist inside the trigger.dev runtime
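A minimal sketch of the "candidate paths, fail fast" idea from the first bullet. The function name resolveGuardrailsPython appears in this PR, but the signature, the injected `exists` predicate, and the directory layout below are assumptions made for illustration, not the PR's actual code.

```typescript
// Hypothetical sketch: the signature and paths are illustrative assumptions.
type Exists = (path: string) => boolean;

// Returns the first venv interpreter that exists among the candidates.
// There is deliberately no fallback to the system python3: a missing venv
// should fail loudly rather than surface later as "Presidio not installed".
function resolveGuardrailsPython(candidates: string[], exists: Exists): string {
  for (const venvDir of candidates) {
    const python = `${venvDir}/bin/python3`;
    if (exists(python)) return python;
  }
  throw new Error(`Guardrails venv not found; checked: ${candidates.join(", ")}`);
}
```

In a Node runtime the caller would likely pass `existsSync` from `node:fs` as the predicate; injecting it keeps the sketch testable without touching the filesystem.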
PR Summary
Medium Risk
Overview
Execution log redaction now calls Presidio provisioning: Tests cover the new route and HTTP client;
Reviewed by Cursor Bugbot for commit d0e573b.
Greptile Summary
This PR fixes a production regression where PII log-redaction was scrubbing everything to
Confidence Score: 5/5
Safe to merge: the change reroutes PII masking through a well-guarded HTTP endpoint and tightens venv provisioning without introducing new failure modes.
The HTTP client correctly chunks by bytes and count, mints a fresh JWT per chunk, and carries an abort signal. The fail-safe chain (non-2xx → throw → REDACTION_FAILED) prevents PII from ever being persisted on failure. Unit tests cover auth rejection, chunking, error propagation, and empty input. No files require special attention.
Important Files Changed
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant TD as trigger.dev runtime
participant NX as Next.js app container
participant PY as Presidio (Python venv)
Note over TD,PY: Log persist path (both runtimes)
TD->>NX: POST /api/guardrails/mask-batch
Note right of TD: internal JWT, chunked ≤2000 items / 256KB
activate NX
NX->>NX: checkInternalAuth()
NX->>NX: parseRequest() — Zod validation
NX->>PY: spawn venv/bin/python3 validate_pii.py
PY-->>NX: "__SIM_RESULT__={masked:[...]}"
NX-->>TD: "200 { masked: [...] }"
deactivate NX
Note over TD: maskPIIBatchViaHttp merges chunk results, preserves order
Note over NX,PY: On venv missing: resolveGuardrailsPython throws
Note over NX,PY: route returns structured 500
Note over NX,PY: caller scrubs to [REDACTION_FAILED]
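The last three notes in the diagram describe a fail-closed caller: any failure on the masking path ends with the values scrubbed rather than persisted raw. A minimal sketch of that behaviour follows; `maskOrScrub` and the `MaskBatch` type are hypothetical names introduced here, and only the `[REDACTION_FAILED]` placeholder comes from the PR.

```typescript
// Hypothetical sketch: helper and type names are illustrative assumptions.
const REDACTION_FAILED = "[REDACTION_FAILED]";

// Any async masker, e.g. an HTTP client that throws on a non-2xx response.
type MaskBatch = (texts: string[]) => Promise<string[]>;

// Fail closed: if masking throws for any reason, every value is replaced
// with the placeholder so unmasked text is never returned to the caller.
async function maskOrScrub(texts: string[], maskBatch: MaskBatch): Promise<string[]> {
  if (texts.length === 0) return [];
  try {
    return await maskBatch(texts);
  } catch {
    return texts.map(() => REDACTION_FAILED);
  }
}
```

The trade-off is that an outage of the masking endpoint loses log detail, which is the safer direction for a redaction feature than leaking PII.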
Reviews (4): Last reviewed commit: "fix(guardrails): mint internal token per..."
@greptile review
@BugBot review
- chunk maskPIIBatchViaHttp by count (2000) and bytes (256KB) so large executions split across requests and never hit the contract's 100k cap
- add AbortSignal.timeout(45s) per request so a slow/unreachable app container aborts and the caller scrubs, instead of hanging the trigger.dev job
- catch maskPIIBatch failures in the route: log and return a structured 500 (broken venv fails loudly server-side; caller still scrubs, no leak)
- add mask-client tests (order across chunks, count split, non-2xx, empty)
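The count-and-bytes split from the first bullet can be sketched as a pure function. This is an illustration under assumptions: `chunkForMasking` is a hypothetical name, the limits mirror the 2000-item and 256KB figures quoted above, and the real client may measure request size differently (for example, on the serialized JSON body rather than on raw string bytes).

```typescript
// Hypothetical sketch: function name and size accounting are assumptions.
const MAX_ITEMS = 2000;
const MAX_BYTES = 256 * 1024;

// Splits texts into ordered chunks that respect both an item-count cap and
// a UTF-8 byte cap. A single item larger than maxBytes gets its own chunk.
function chunkForMasking(
  texts: string[],
  maxItems: number = MAX_ITEMS,
  maxBytes: number = MAX_BYTES
): string[][] {
  const encoder = new TextEncoder();
  const chunks: string[][] = [];
  let current: string[] = [];
  let currentBytes = 0;

  for (const text of texts) {
    const size = encoder.encode(text).length;
    const wouldOverflow = current.length >= maxItems || currentBytes + size > maxBytes;
    if (current.length > 0 && wouldOverflow) {
      chunks.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(text);
    currentBytes += size;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Because chunks are built in input order and never reordered, concatenating the per-chunk results restores the original positions.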
@greptile review
@BugBot review
A single token (5min TTL) could expire mid-batch when a large execution fans out into many sequential chunk requests; mint one per request instead.
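A minimal sketch of minting one token per request, assuming the chunks are sent sequentially. `maskChunksSequentially`, `MintToken`, and `PostChunk` are hypothetical names; the sender is injected and is assumed to apply its own per-request timeout, and the length check is an added safeguard rather than something the PR is known to do.

```typescript
// Hypothetical sketch: all names here are illustrative assumptions.
type MintToken = () => Promise<string>;
type PostChunk = (chunk: string[], token: string) => Promise<string[]>;

// Sends chunks one at a time, minting a fresh short-lived token for each
// request so a long batch cannot outlive a single token's TTL.
async function maskChunksSequentially(
  chunks: string[][],
  mintToken: MintToken,
  postChunk: PostChunk
): Promise<string[]> {
  const masked: string[] = [];
  for (const chunk of chunks) {
    const token = await mintToken();
    const result = await postChunk(chunk, token);
    // Assumed safeguard: a length mismatch would break positional merging.
    if (result.length !== chunk.length) {
      throw new Error(`mask response length ${result.length} != request length ${chunk.length}`);
    }
    masked.push(...result);
  }
  return masked;
}
```

Minting inside the loop costs one extra signing operation per chunk, which is cheap compared with the failure mode of a token expiring partway through a large execution.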
@greptile review
@BugBot review
✅ Bugbot reviewed your changes and found no new issues!
Summary
- PII log redaction was scrubbing everything to [REDACTION_FAILED] in staging because Presidio is not actually runnable in the deployed runtimes (the guardrails PII block hit the same wall but fails open, so it looked fine).
- Resolve the guardrails venv via candidate paths (see isolated-vm.ts) and throw instead of silently falling back to system python3; that fallback produced the misleading "Presidio not installed".
- Install en_core_web_lg (which the default AnalyzerEngine() loads) in setup.sh and docker/app.Dockerfile; it was never provisioned in the image.
- Route log redaction through an internal POST /api/guardrails/mask-batch endpoint (internal-auth) instead of spawning Python in-process, so Presidio always runs in the app container, including async executions that persist inside the trigger.dev runtime, where there is no Python.
Type of Change
Testing
- pii-redaction.test.ts + new mask-batch/route.test.ts (9 passing)
- bun run check:api-validation:strict passes (route baseline 859→860)
- bun run lint clean
- sim build: compiled + TypeScript clean
Checklist