fix(ci): restore CI security scanning pipeline consistency #1085

Closed
Wikid82 wants to merge 128 commits into main from development

Wikid82 (Owner) commented Jun 17, 2026

Summary

  • Phase 1 (pre-PR): Programmatically dismissed 4 zombie code scanning alerts (#1459–#1462) via the GitHub Code Scanning REST API. All four had most_recent_instance.state: fixed but had remained state: open since 2026-05-27 because of a known gap in GitHub's alert deduplication engine. Each was dismissed with the reason won't fix and a comment referencing PR #1038 (Weekly: Promote nightly to main (2026-05-25)) and the May 25 image rebuild.

  • Commit 1 — docker-build.yml + security-pr.yml:

    • Add trivyignores: '.trivyignore' to both the table-output and SARIF Trivy scan steps in docker-build.yml, so suppressed CVEs are excluded consistently with the PR scan steps that already had this setting.
    • Align the github/codeql-action/upload-sarif SHA in security-pr.yml from the stale pre-v4.36.2 pin (34950e1b) to the canonical v4.36.2 SHA (8aad20d1) used by every other security workflow.
  • Commit 2 — security-weekly-rebuild.yml:

    • Add trivyignores: '.trivyignore' to the SARIF Trivy scan step.
    • Add id: upload-trivy-weekly to the upload step and a new "Verify SARIF was uploaded" step that fails the workflow visibly if the SARIF file is absent, the upload step did not succeed, or the file is not valid JSON — preventing the silent-discard failure mode that caused zombie alerts Feb–May 2026.
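
The three checks described for the "Verify SARIF was uploaded" step can be sketched as follows. This is a minimal illustration, not the workflow's actual script: the function name, the order of the checks, and the way the upload outcome is passed in are assumptions. The result count mirrors what the test plan calls SARIF_RESULT_COUNT.

```python
import json
from pathlib import Path


def verify_sarif(path: str, upload_outcome: str) -> int:
    """Return the number of SARIF results, or raise if any check fails.

    upload_outcome is assumed to be the outcome string of the upload step
    (for example "success" or "failure"); how the real step reads it is
    not shown in this PR.
    """
    sarif_path = Path(path)
    if not sarif_path.is_file():
        raise FileNotFoundError(f"SARIF file not found: {path}")
    if upload_outcome != "success":
        raise RuntimeError(f"upload step outcome was {upload_outcome!r}")
    try:
        document = json.loads(sarif_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise ValueError(f"SARIF file is not valid JSON: {exc}") from exc
    if not isinstance(document, dict):
        raise ValueError("SARIF document must be a JSON object")
    # SARIF 2.1.0 stores findings under runs[].results[].
    return sum(len(run.get("results") or []) for run in document.get("runs", []))
```

Raising on any failed check is what turns a silently discarded upload into a visible workflow failure.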

Test plan

  • Verify alerts #1459–#1462 show state: dismissed in the GitHub Security tab Closed section
  • Confirm no open alerts show "last updated May 2026"
  • After merge, trigger a docker-build.yml run on main and confirm the Trivy steps complete with trivyignores honoured and a new analysis record created for category trivy-image-scan
  • After next weekly rebuild run (or workflow_dispatch), confirm "Verify SARIF was uploaded" step passes and SARIF_RESULT_COUNT appears in the job summary
  • Run gh api '/repos/Wikid82/Charon/code-scanning/alerts?state=open' --paginate | jq '[.[] | select(.most_recent_instance.state == "fixed")] | length' and confirm the output is 0 (the endpoint is quoted so the shell does not treat ? as a glob character)
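
For reference, the jq filter in the last check amounts to the selection below. This is a sketch over already-fetched alert objects; the function name and the sample alert numbers in the usage note are hypothetical, and only the state and most_recent_instance.state fields named in this PR are read.

```python
from typing import Any, Dict, List


def zombie_alerts(alerts: List[Dict[str, Any]]) -> List[int]:
    """Return the numbers of alerts that are open although their most
    recent instance is already marked fixed."""
    numbers = []
    for alert in alerts:
        instance = alert.get("most_recent_instance") or {}
        if alert.get("state") == "open" and instance.get("state") == "fixed":
            numbers.append(alert["number"])
    return numbers
```

An empty list corresponds to the expected jq output of 0. For example, given one open alert numbered 1 whose latest instance is fixed and one open alert numbered 2 whose latest instance is open, only 1 is reported.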

Wikid82 and others added 30 commits June 2, 2026 11:08
Propagate changes from main into development
chore(deps): update github-actions-non-major
…ntries

Renovate's automated update regenerated package-lock.json incorrectly,
omitting top-level node_modules entries for eslint and vite. This caused
npm ci to fail in CI during dependency installation. Regenerating with
Node v22.22.1 and npm v11.16.0 restores the correct entries.

The supply-chain Grype scan last ran on Feb 4, 2026 due to a cascade of
compounding failures. This commit resolves all root causes:

- Twelve .trivyignore CVE suppressions expired between Apr 30 and May 25,
  causing the Trivy PR gate to block all PR merges and starve the pipeline
  of push events. All entries extended 60–90 days with appropriate review
  comments; no entry exceeds Sep 1, 2026.

- Ten .grype.yaml suppressions also expired in May, meaning Grype scans
  that did run would immediately fail on HIGH findings and produce no fresh
  SARIF. All entries extended with matching dates.

- The supply-chain-pr.yml job condition had a dead workflow_run branch and
  was missing the push and schedule event names, silently skipping the
  verify-supply-chain job on every push to main. Added push and schedule to
  the condition.

- Added a weekly schedule trigger (Mondays at 02:00 UTC) so scans run
  regardless of PR activity. Added development to push branches to match
  docker-build.yml scope.

- Removed continue-on-error: true from the SARIF upload step so upload
  failures surface as visible workflow failures rather than silent no-ops.

- Simplified concurrency.group to remove dead workflow_run expressions.
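
Expired suppressions like the ones described above can be caught before they block the PR gate. The sketch below assumes .trivyignore entries use Trivy's `exp:YYYY-MM-DD` suffix; the repository's actual file is not shown in this PR, so the format, the function name, and the placeholder CVE IDs in the usage note are assumptions. An entry is treated as expired only after its date has passed.

```python
from datetime import date
from typing import List, Tuple


def expired_entries(text: str, today: date) -> List[Tuple[str, date]]:
    """Return (vulnerability ID, expiry date) for entries whose exp: date
    is earlier than today."""
    expired = []
    for raw in text.splitlines():
        # Drop comments and blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        vuln_id = fields[0]
        for field in fields[1:]:
            if field.startswith("exp:"):
                expiry = date.fromisoformat(field[len("exp:"):])
                if expiry < today:
                    expired.append((vuln_id, expiry))
    return expired
```

Calling it with a file containing `CVE-0000-0001 exp:2026-05-25` and a `today` of June 1, 2026 reports that entry; entries without an `exp:` field never expire and are skipped.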

Refs: GitHub Code Scanning "last scanned Feb 4, 2026" alert

Add anti-FOUC inline script to index.html that applies the stored theme
class synchronously before React mounts. Switch ThemeContext to useLayoutEffect
for synchronous class application, add explicit light-mode CSS overrides, update
CSP to allowlist the inline script hash, and add a Playwright regression suite.
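
Allowlisting an inline script by hash means adding a `'sha256-…'` source to the CSP script-src directive, where the value is the base64-encoded SHA-256 digest of the exact script text between the script tags. A minimal sketch of that computation follows; the function name is hypothetical and the project's own tooling for producing the hash is not shown in this PR.

```python
import base64
import hashlib


def csp_script_hash(script_body: str) -> str:
    """Return the CSP hash source for an inline script.

    script_body must match the inline script byte for byte, including
    whitespace, or the browser will refuse to run it.
    """
    digest = hashlib.sha256(script_body.encode("utf-8")).digest()
    return "'sha256-" + base64.b64encode(digest).decode("ascii") + "'"
```

Because the hash covers every byte, any edit to the inline script requires regenerating the value in the CSP header.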
Update GO_VERSION from 1.26.3 to 1.26.4 in all 9 CI workflow files and
fix go.goroot in .vscode/settings.json to point to /usr/local/go where
1.26.4 is installed, replacing the missing sdk/go1.26.4 path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Switch setup-go from go-version env var to go-version-file: backend/go.mod
so the action reads the required version directly from go.mod instead of
relying on a cached toolchain version that may lag behind. Change
GOTOOLCHAIN from auto to local across all workflows so Go uses exactly the
version installed by setup-go without attempting auto-downloads that can
silently fall back to an older release.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Upgrades github.com/buger/jsonparser to v1.1.2 in the CrowdSec
dependency patch block to fix a panic in Delete() caused by a
negative slice index on malformed JSON input. Affects both the
crowdsec and cscli binaries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore(deps): update go-non-major to v1.75.0
chore(deps): update go-non-major to v1.2.0
Wikid82 and others added 29 commits June 11, 2026 23:55
* chore(ci): bump e2e workflow Node to 24.12.0 and track NODE_VERSION via Renovate

Node 20 is EOL and will be unsupported by npm 12. Adds a Renovate custom
manager so all workflow NODE_VERSION pins receive update PRs.

* fix(security): disable dependency install scripts for all npm installs

Adopts npm v12's secure default today: every npm ci/install call site
(CI workflows, Dockerfile, Makefile, scripts, package.json pre-hooks)
now passes --ignore-scripts, and unrs-resolver's postinstall is
explicitly denied via allowScripts (it ships prebuilt binaries; the
script is only a fallback build). Verified: clean installs, frontend
build, type-check, and full unit suite all pass with scripts disabled.

---------

Co-authored-by: GitHub Actions <actions@github.com>

* chore(deps): update npm-non-major to ^10.5.0

* fix: regenerate frontend lock file to restore missing eslint@10.5.0 entries

Renovate's automated update removed top-level node_modules entries for
eslint@10.5.0 (and transitive deps eslint-visitor-keys, ignore) from
frontend/package-lock.json, causing all CI jobs to fail at npm ci.

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: GitHub Actions <actions@github.com>

Automated checksum update for GeoLite2-Country.mmdb database.

Old: abce3a42f4f6bfb2c90cded582341da6764f5e152782ce6c832bc8fa1d873778
New: 11b88595d026953920668d91f6d531057b397f05170237fc98a13a8b051ab861

Auto-generated by: .github/workflows/update-geolite2.yml

Co-authored-by: Wikid82 <176516789+Wikid82@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* feat(models): add RequestLog model and AutoMigrate registration

Adds RequestLog struct to record proxied HTTP requests for the enhanced
dashboard statistics feature (issue #25). Includes BeforeCreate hook for
UUID generation, compound (host_id, timestamp) indexes, and GDPR-safe
pseudonymised client IP hashing. Registers model in AutoMigrate.

* feat(services): add StatsIngester for log fan-out and batch DB writes

- Add stats_types.go with StatsPushData, HostStat, StatusStat,
  StatsPushMessage, and BroadcastHub interface (avoids import cycles)
- Add StatsIngester: channel buffer=1000, batch flush at 100 entries
  or 500ms interval via CreateInBatches; atomic dropped-count tracking
- Hash client IPs with SHA-256 (first 16 bytes, hex) for GDPR safety
- Add LogWatcher.RegisterIngester() + fan-out in broadcast()
- TDD: 6 tests covering count-flush, timer-flush, back-pressure,
  graceful Stop drain, IP hashing determinism, and fan-out wiring
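
The IP hashing and the count-or-timer flush rule described above can be sketched as below. This is an illustration only, not the Go implementation: the names are hypothetical, the hash is assumed to be unsalted, the 1000-entry channel buffer and dropped-count tracking are not modelled, and the clock is passed in explicitly so the behaviour is deterministic.

```python
import hashlib
from typing import Callable, List


def hash_client_ip(ip: str) -> str:
    """SHA-256 of the IP, truncated to the first 16 bytes, hex-encoded
    (32 hex characters). Assumes no salt is mixed in."""
    return hashlib.sha256(ip.encode("utf-8")).digest()[:16].hex()


class Batcher:
    """Flush when batch_size entries are pending or interval_s has elapsed."""

    def __init__(self, write_batch: Callable[[List], None],
                 batch_size: int = 100, interval_s: float = 0.5) -> None:
        self._write_batch = write_batch
        self._batch_size = batch_size
        self._interval_s = interval_s
        self._pending: List = []
        self._last_flush = 0.0

    def add(self, entry, now: float) -> None:
        self._pending.append(entry)
        if len(self._pending) >= self._batch_size:
            self.flush(now)

    def tick(self, now: float) -> None:
        # Called periodically by a timer; flushes a partial batch.
        if self._pending and now - self._last_flush >= self._interval_s:
            self.flush(now)

    def flush(self, now: float) -> None:
        if self._pending:
            self._write_batch(self._pending)
            self._pending = []
        self._last_flush = now
```

Flushing on whichever limit is hit first keeps write volume bounded under load while ensuring a quiet proxy still persists its few entries promptly.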

* feat(services): add StatsService with aggregation queries and TTL cache

Adds StatsService providing GetSummary (30s TTL cache), GetTopHosts,
GetStatusDistribution, GetTrafficVolume, and GetCertExpiry with input
validation allowlists. Extends stats_types.go with StatsSummary,
TrafficBucket, and CertExpiry types.
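
A 30-second TTL cache in front of an expensive summary query could look roughly like this. The class and parameter names are hypothetical and the sketch makes no claim about how StatsService stores or invalidates its cached value; the injectable clock exists only to make the expiry rule easy to test.

```python
import threading
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Cache a single computed value and recompute it once ttl_s has passed."""

    def __init__(self, loader: Callable[[], Any], ttl_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl_s = ttl_s
        self._clock = clock
        self._lock = threading.Lock()
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def get(self) -> Any:
        with self._lock:
            now = self._clock()
            if self._expires_at is None or now >= self._expires_at:
                # Cache miss or stale entry: run the expensive query again.
                self._value = self._loader()
                self._expires_at = now + self._ttl_s
            return self._value
```

Holding the lock while loading means concurrent callers wait for one query instead of each issuing their own.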

* feat(frontend/api): add stats API client and TypeScript type definitions

Add typed API functions and interfaces for all 8 stats endpoints (summary,
top-hosts, status-distribution, traffic-volume, cert-expiry, requests, health,
and WebSocket hub) with full Vitest test coverage (33 tests).

* feat(frontend/hooks): add useStats and useStatsWebSocket hooks

Add six TanStack Query hooks (useStatsSummary, useTopHosts,
useStatusDistribution, useTrafficVolume, useCertExpiry, useStatsHealth)
with stable query keys and appropriate polling intervals. Add
useStatsWebSocket hook that tracks live summary updates via the stats
WebSocket and disables REST polling when connected. Full Vitest coverage
for all hooks (22 tests).

Also remove the unicorn/no-array-for-each ESLint rule, which was removed upstream in unicorn v66.

* feat(frontend/components): add stats chart and widget components

Add 8 pure presentational components under frontend/src/components/stats/:
- RequestCountWidget: 3-stat card for 24h/7d/30d request counts
- TopHostsChart: horizontal bar chart (recharts BarChart)
- StatusDistributionChart: donut chart with accessible HTML summary list
- TrafficVolumeChart: line chart with KB/MB Y-axis formatting
- CertExpiryList: accessible table with red/amber/green day-based color coding
- ServiceHealthWidget: WebSocket live/offline indicator + dropped-event warning
- PeriodSelector: controlled radio button group for 24h/7d/30d
- BucketSelector: controlled radio button group for 1h/6h/1d

All components are pure (no data fetching), strictly typed with no `any` types,
and keyboard accessible. Includes 58 Vitest unit tests covering loading states,
data rendering, color coding, and interaction callbacks.
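
The KB/MB axis formatting used by TrafficVolumeChart can be illustrated with a small helper. The 1024-byte base, the one-decimal precision, and the function name are assumptions; the component's real formatter is not shown in this PR.

```python
def format_bytes(n: float) -> str:
    """Format a non-negative byte count as B, KB, or MB."""
    kb = 1024
    mb = 1024 * 1024
    if n >= mb:
        return f"{n / mb:.1f} MB"
    if n >= kb:
        return f"{n / kb:.1f} KB"
    return f"{n:.0f} B"
```

Keeping the formatter a pure function is what lets the unit tests exercise each unit branch without rendering a chart.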

* feat(frontend/dashboard): integrate stats sections into Dashboard page

Add a responsive Statistics section to the Dashboard page below the
existing content. Uses useStatsWebSocket for live updates, useState for
period/bucket controls, and the six stats hooks + eight stats components
(RequestCountWidget, ServiceHealthWidget, CertExpiryList,
TrafficVolumeChart, TopHostsChart, StatusDistributionChart,
PeriodSelector, BucketSelector). Layout is mobile-first with single
column on small screens, 2-col on sm/md, 3-col top row on lg.
Adds dashboard.statistics and dashboard.trafficVolume i18n keys to all
five locale files. Expands Dashboard tests from 3 to 12 cases.

* test(e2e): add Playwright tests for enhanced dashboard statistics

- tests/stats.spec.ts: 12 E2E tests covering all 9 required scenarios
  (stats heading, period selector, bucket selector, request count widget,
  service health widget, cert expiry section, traffic/top-hosts/status
  distribution chart containers) plus accessibility radio-count assertion
- backend/internal/api/handlers/stats_api_integration_test.go: adds
  TestStatsAPI_CertExpiry_366Days_Returns400 to cover the upper-bound
  validation (within_days > 365 returns HTTP 400); simplifies function
  signature to remove unused return value
- backend/internal/services/stats_ingester_test.go: adds
  TestStatsIngester_RegisterHub and TestStatsIngester_ToRequestLog_InvalidTimestamp
  to cover the RegisterHub wiring path and the timestamp parse-error fallback

All 12 E2E tests pass against the running E2E container at :8080.
Backend unit tests pass (88.4% coverage, above 87% minimum).
Frontend tests pass (87.86% statement coverage, above 85% minimum).
GORM scan: 0 CRITICAL/HIGH findings.
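
The within_days validation exercised by the integration tests (values above 365 and non-integer values both yield HTTP 400) might be expressed as below. The default of 30, the lower bound of 1, and all names are assumptions made for this sketch; only the 365 upper bound and the non-integer rejection come from the commit messages.

```python
import re
from typing import Optional


class BadRequest(ValueError):
    """Raised for input that a handler would answer with HTTP 400."""


def parse_within_days(raw: Optional[str], default: int = 30,
                      maximum: int = 365) -> int:
    if raw is None or raw == "":
        return default  # hypothetical default when the parameter is absent
    if not re.fullmatch(r"[+-]?[0-9]+", raw):
        raise BadRequest("within_days must be an integer")
    value = int(raw)
    if value < 1 or value > maximum:
        raise BadRequest(f"within_days must be between 1 and {maximum}")
    return value
```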

* docs: update ARCHITECTURE.md and features.md for stats subsystem

* feat(api): add stats handler and WebSocket hub for dashboard stats

* test: fix patch coverage for stats subsystem

Adds targeted tests to cover all previously uncovered patch lines:

Backend:
- stats_ws_hub_test.go (new): full hub coverage — constructor, non-blocking
  broadcast, ctx cancel exit, client broadcast, slow-client drop, client
  unregister, StatsWS upgrade-error path, StatsWS nil-hub close
- stats_handler_test.go: error-path 500s for all six handlers, non-integer
  within_days → 400, invalid limit param silently ignored
- stats_ingester_test.go: Stop flushes batches > batchSize; Run drains big
  batch on ctx cancel (covers batchSize branch in drain loop)
- stats_service_test.go: GetTrafficVolume 6h and 1d buckets; GetSummary DB error

Frontend:
- StatusDistributionChart: extended recharts mock calls Pie label/Tooltip
  content; adds 1xx test to cover statusClass "other" return
- TrafficVolumeChart: mock calls YAxis tickFormatter with MB/KB/B values
  and Tooltip content to cover formatBytes branches
- TopHostsChart: mock calls Tooltip content including hostname ?? label fallback
- CertExpiryList: adds undefined-data test to cover (data ?? []) branch
- useStatsWebSocket: adds non-stats_update message test for the else branch

* feat(frontend): add widget tooltips, top-hosts color coding, and hide/show controls

Adds ELI5 info tooltips to all 6 dashboard stats widgets, a color-coded
legend for the Top Hosts chart, and a per-widget visibility toggle
persisted in localStorage so users can hide widgets they don't need.

* fix(frontend): add aria-expanded to sidebar accordion buttons

Adding the Dashboard "Customize" button (which also carries
aria-expanded) shifted DOM order and caused the WebKit navigation E2E
test to target it instead of the sidebar, since the sidebar's
collapsible accordion buttons never actually exposed aria-expanded.
Add the missing attribute to the real sidebar toggles and scope the
test to the sidebar so it tests what it claims to.

* fix(api): join proxy_hosts to populate Top Hosts hostname

GetTopHosts only selected host_id and a count, never the hostname, so
every entry in the Top Hosts legend and tooltip rendered with a blank
or duplicate label. With every chart category collapsing to the same
empty string, Recharts merged the bars and hovering any bar showed the
same (highest) data point. Join proxy_hosts by UUID to populate the
real hostname, falling back to host_id if the host has since been
deleted.

* fix(database): run SQLite integrity check in background to avoid blocking startup

PRAGMA quick_check was running synchronously in Connect() and could take
well over a minute on larger databases (observed 93.5s), despite being
intended as a non-blocking, warn-only check.

* fix(database): give startup integrity check its own connection

The previous fix moved PRAGMA quick_check to a goroutine, but the main
pool is capped at one connection (SQLite single-writer constraint), so
the background check still held that connection for the full scan and
blocked AutoMigrate behind it. Open a dedicated connection for the
check so it no longer serializes against the rest of startup.
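
The idea of running the integrity check in the background on its own connection can be sketched with Python's sqlite3 module. This is an analogy to the Go fix rather than a port of it: the function and callback names are hypothetical, and errors are reported instead of raised to match the warn-only intent.

```python
import sqlite3
import threading
from typing import Callable, List


def start_integrity_check(db_path: str,
                          on_result: Callable[[List[str]], None]) -> threading.Thread:
    """Run PRAGMA quick_check on a dedicated connection in a background thread.

    on_result receives the rows returned by the check; a healthy database
    yields the single row "ok".
    """
    def run() -> None:
        try:
            # A separate connection, so the check never occupies the
            # application's single pooled connection.
            conn = sqlite3.connect(db_path)
            try:
                rows = [row[0] for row in conn.execute("PRAGMA quick_check")]
            finally:
                conn.close()
        except sqlite3.Error as exc:
            rows = [f"error: {exc}"]
        on_result(rows)

    thread = threading.Thread(target=run, name="sqlite-quick-check", daemon=True)
    thread.start()
    return thread
```

Startup continues immediately after the thread is started; the caller only logs whatever on_result receives.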

* fix(api): use proxy host Name instead of domain names for Top Hosts

GetTopHosts joined proxy_hosts.domain_names, showing raw domains in
the legend/tooltip instead of the user-assigned host Name. Switch the
join to proxy_hosts.name to match what users configured.

* fix(database): silence record-not-found noise in GORM query logs

Optional lookups (e.g. caddy.keepalive_idle/keepalive_count settings)
return ErrRecordNotFound when unset, which call sites already handle
via `if err == nil` fallbacks. GORM's default logger still logged
these as errors on every startup. Configure IgnoreRecordNotFoundError
so only real query errors and slow queries are logged.

* fix(api): match Top Hosts by domain instead of ProxyHost UUID

RequestLog.HostID stores the raw Host header seen by the proxy (a
domain), not the ProxyHost UUID, so the previous join on
proxy_hosts.uuid = request_logs.host_id never matched and silently
fell back to showing the domain. Build a domain -> name lookup from
proxy_hosts.domain_names (which can hold several comma-separated
domains per host) and resolve hostnames against that instead.
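
The domain-to-name lookup described above can be sketched as follows. The function names are hypothetical, and the lowercasing, whitespace trimming, and first-host-wins behaviour for duplicate domains are assumptions; falling back to the raw value when no host matches follows the earlier Top Hosts fix.

```python
from typing import Dict, Iterable, Tuple


def build_domain_lookup(hosts: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each domain to its proxy host Name.

    hosts yields (name, domain_names) pairs where domain_names is a
    comma-separated string that may hold several domains.
    """
    lookup: Dict[str, str] = {}
    for name, domain_names in hosts:
        for domain in domain_names.split(","):
            domain = domain.strip().lower()
            if domain:
                lookup.setdefault(domain, name)
    return lookup


def resolve_hostname(host_id: str, lookup: Dict[str, str]) -> str:
    """Return the configured host Name, or the raw Host header value
    if no proxy host claims that domain."""
    return lookup.get(host_id.strip().lower(), host_id)
```

Building the map once per query avoids a join that cannot match, since request_logs.host_id holds a domain rather than a UUID.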

* feat(stats): add warmup guide and improve empty state UX

- Log successful LogWatcher startup with path info
- Add deployment/warmup guide for stats feature (docs)
- Improve TrafficVolumeChart empty state with helpful messaging
- Add data collection info to tooltip when no data available

* test(TrafficVolumeChart): update empty state text assertions

* fix(stats): replace CSS variable with hex literal in TrafficVolumeChart

SVG presentation attributes cannot resolve CSS custom properties defined
as space-separated RGB triplets (Tailwind v4 token format). Replace
`var(--color-brand-500)` with `LINE_COLOR = '#3b82f6'` so Recharts
renders a visible line stroke.

Add unit tests asserting stroke is a valid hex value and tooltip renders
correctly. Add Playwright step verifying recharts-line-curve path exists
when data is present.

* fix(stats): add LineChart mock and update QA report for feature/stats

* test: fix recharts mock to drop importActual for Vitest ESM compat

* chore(deps): bump go-toml to v2.4.0 and prometheus/common to v0.69.0

---------

Co-authored-by: GitHub Actions <actions@github.com>

…m overrides

Force js-yaml to >=4.2.0 and markdown-it to >=14.2.0 via npm overrides to address
GHSA-h67p-54hq-rp68 and GHSA-6v5v-wf23-fmfq in markdownlint-cli2 transitive deps.
Also correct lint:md scripts to use valid markdownlint-cli2 negated glob syntax
instead of the unsupported --ignore flag.

Add trivyignores: '.trivyignore' to the main-image table and SARIF Trivy
scan steps in docker-build.yml so suppressed CVEs are excluded from both
log output and SARIF uploads, consistent with the PR scan steps.

Align the upload-sarif action SHA in security-pr.yml from the stale
pre-v4.36.2 pin to the canonical 8aad20d SHA (v4.36.2) used by every
other security workflow in this repository.

Add trivyignores: '.trivyignore' to the SARIF scan step in
security-weekly-rebuild.yml so accepted/pending CVE suppressions are
excluded from the weekly Security tab upload, matching all other scan
workflows.

Add id: upload-trivy-weekly to the upload step and a new
'Verify SARIF was uploaded' step that validates the SARIF file exists,
that the upload succeeded, and that the file is parseable JSON. Fails the
workflow visibly instead of silently discarding SARIF upload errors,
preventing the pipeline from going stale undetected as happened Feb-May 2026.
Wikid82 closed this Jun 17, 2026
2 participants