test: add summary-cache pytest-benchmark suite with CI regression gate by clean6378-max-it · Pull Request #120 · cppalliance/cppa-cursor-browser

clean6378-max-it · 2026-06-24T19:20:01Z

Closes #115

Summary

Adds a pytest-benchmark suite for services/summary_cache.py with synthetic fixtures and a CI regression gate. No production code changes.

New tests/benchmarks/test_summary_cache_bench.py measures cache hit, cache miss, fingerprint cost (10/50/200 workspace entries), and round-trip write+read latency
New tests/benchmarks/conftest.py builds synthetic workspace trees and isolates cache files to temp dirs (no real Cursor storage)
benchmarks/baselines.json records initial gated means (1.5× slack from local reference run; refresh from ubuntu-latest CI artifact after first green job)
scripts/check_benchmark_regression.py fails CI when any benchmark mean exceeds baseline by >20%
pyproject.toml: add pytest-benchmark dev dep; default --benchmark-skip so normal pytest stays fast
.github/workflows/tests.yml: new benchmarks job on ubuntu-latest

Sprint item #7 (June Week 4, Wednesday). Sequences before Thursday #8 (unified perf benchmark suite).

Test plan

pytest tests/benchmarks/test_summary_cache_bench.py --benchmark-only -o addopts=
python scripts/check_benchmark_regression.py benchmark-results.json benchmarks/baselines.json
pytest tests/benchmarks/test_summary_cache_bench.py -o addopts= (benchmarks skipped via --benchmark-skip)
python -m unittest discover tests -q
Confirm benchmarks CI job is green on ubuntu-latest

Summary by CodeRabbit

New Features
- Added CI performance benchmarking for summary-cache (cache hit/miss, workspace fingerprinting, and cache round-trips).
Bug Fixes
- Updated test execution to avoid inheriting unrelated pytest options.
Chores
- Introduced a benchmark regression gate with updated baselines and CI artifact handling; refreshed benchmark baseline data and ignored generated benchmark outputs.
- Enabled pytest-benchmark in dev tooling and configured pytest benchmark defaults.
Tests
- Added new benchmark test coverage and a dedicated unit test suite for the regression gate logic.

coderabbitai · 2026-06-24T19:20:19Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds summary-cache benchmarks, refreshed baselines, a regression-check script, and CI workflow steps to run, validate, and upload benchmark results.

Changes

Summary Cache Benchmark Validation

Layer / File(s)	Summary
Benchmark config and fixtures `pyproject.toml`, `requirements-lock.txt`, `tests/benchmarks/conftest.py`	Adds `pytest-benchmark` to development dependencies, configures pytest benchmark defaults and marker registration, and adds synthetic workspace fixtures for summary-cache benchmarks.
Benchmark cases `tests/benchmarks/test_summary_cache_bench.py`	Adds benchmark cases for cache hits, cache misses, workspace fingerprinting at 10/50/200 entries, and a set-then-get round trip.
Regression script and baselines `benchmarks/baselines.json`, `scripts/check_benchmark_regression.py`, `tests/test_check_benchmark_regression.py`	Refreshes stored benchmark timings and adds a CLI that loads benchmark results, compares means to stored baselines, and reports regressions or malformed inputs, with test coverage for comparison and validation paths.
CI benchmark job `.github/workflows/tests.yml`, `.gitignore`	Adds a workflow job that runs the benchmarks, checks regression output, uploads `benchmark-results.json`, and ignores generated benchmark artifacts.

Sequence Diagram(s)

sequenceDiagram
  participant benchmarks_job as benchmarks job
  participant pytest as pytest
  participant regression_script as scripts/check_benchmark_regression.py
  participant upload_artifact as actions/upload-artifact

  benchmarks_job->>pytest: run tests/benchmarks/ with --benchmark-only
  pytest-->>benchmarks_job: benchmark-results.json
  benchmarks_job->>regression_script: compare benchmark-results.json to benchmarks/baselines.json
  regression_script-->>benchmarks_job: exit code 0, 1, or 2
  benchmarks_job->>upload_artifact: upload benchmark-results.json with if: always()

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

cppalliance/cppa-cursor-browser#19 — Both PRs modify .github/workflows/tests.yml, so they may overlap in CI job structure and workflow behavior.

Suggested reviewers

wpak-ai
bradjin8

Poem

(_/)
A rabbit hopped through benchmark snow,
With cache hits warm and misses slow.
Baselines glowed in CI light,
And regressions stayed out of sight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: adding a summary-cache benchmark suite and CI regression gate.
Linked Issues check	✅ Passed	The PR matches the issue goals: synthetic summary-cache benchmarks, recorded baselines, and a CI gate for >20% regressions.
Out of Scope Changes check	✅ Passed	The changes are focused on benchmark validation and CI support; no unrelated feature or production code changes stand out.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch test/summary-cache-benchmarks

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

scripts/check_benchmark_regression.py (1)
95-99: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

A missing benchmark result silently passes the gate.

When a baseline benchmark has no current result, the script only prints a WARN and continues. If a gated benchmark is renamed, removed, or fails to run, the regression gate will report success and the performance coverage silently disappears. Consider treating a missing gated benchmark as a failure (or a separate exit code) so coverage gaps are surfaced.
♻️ Possible approach
     failures: list[str] = []
+    missing: list[str] = []
     for name, base in baseline_means.items():
         cur = flat.get(name)
         if cur is None:
-            print(f"WARN: no current result for baseline {name!r}; skipping")
+            print(f"MISSING: no current result for baseline {name!r}")
+            missing.append(name)
             continue
Then fail (or warn-as-error) when missing is non-empty, per your policy.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/check_benchmark_regression.py` around lines 95 - 99, The benchmark
gating in check_benchmark_regression.py currently skips missing current results
in the baseline loop, which can let renamed, removed, or failing benchmarks pass
unnoticed. Update the logic around the baseline_means/flat comparison so missing
gated benchmarks are collected and treated as a failure (or a distinct non-zero
exit condition) instead of only printing a warning. Make the behavior explicit
in the final exit path of the script so coverage gaps are surfaced alongside the
existing regression checks.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmarks/baselines.json`:
- Around line 1-15: The benchmark baselines in the summary-cache group are still
using Windows-derived values with 1.5x slack, but the regression gate runs on
ubuntu-latest. Refresh the values from a green ubuntu-latest benchmark artifact
and keep the _note consistent with the source platform; if platform-matched
baselines are not yet available, make the gate non-blocking until they are.
Focus on the baselines.json entries under groups.summary-cache so the
check_benchmark_regression.py comparisons are against the same OS.

In `@pyproject.toml`:
- Around line 39-44: The pytest configuration currently applies --benchmark-skip
globally in [tool.pytest.ini_options], which breaks the regular integration
pytest run because pytest-benchmark is not installed there. Update the test
setup so the benchmark plugin is available wherever this addopts setting is
used, or move the --benchmark-skip option to only the benchmark-specific job;
check the pytest configuration and the workflows that invoke python -m pytest,
especially the benchmark-related markers and options.

---

Nitpick comments:
In `@scripts/check_benchmark_regression.py`:
- Around line 95-99: The benchmark gating in check_benchmark_regression.py
currently skips missing current results in the baseline loop, which can let
renamed, removed, or failing benchmarks pass unnoticed. Update the logic around
the baseline_means/flat comparison so missing gated benchmarks are collected and
treated as a failure (or a distinct non-zero exit condition) instead of only
printing a warning. Make the behavior explicit in the final exit path of the
script so coverage gaps are surfaced alongside the existing regression checks.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1cc8514e-6127-42e8-95a6-134382f2ed25

📥 Commits

Reviewing files that changed from the base of the PR and between 793b6f2 and c126227.

📒 Files selected for processing (7)

.github/workflows/tests.yml
.gitignore
benchmarks/baselines.json
pyproject.toml
scripts/check_benchmark_regression.py
tests/benchmarks/conftest.py
tests/benchmarks/test_summary_cache_bench.py

clean6378-max-it · 2026-06-24T19:44:57Z

@CodeRabbit, review this PR in more depth

coderabbitai · 2026-06-24T19:47:28Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== .github/workflows/tests.yml ==="
cat -n .github/workflows/tests.yml

echo ""
echo "=== pyproject.toml ==="
cat -n pyproject.toml

echo ""
echo "=== scripts/check_benchmark_regression.py ==="
cat -n scripts/check_benchmark_regression.py

echo ""
echo "=== tests/benchmarks/conftest.py ==="
cat -n tests/benchmarks/conftest.py

echo ""
echo "=== tests/benchmarks/test_summary_cache_bench.py ==="
cat -n tests/benchmarks/test_summary_cache_bench.py

echo ""
echo "=== benchmarks/baselines.json ==="
cat -n benchmarks/baselines.json

echo ""
echo "=== .gitignore (last 20 lines) ==="
tail -20 .gitignore

echo ""
echo "=== requirements-lock.txt ==="
cat -n requirements-lock.txt

echo ""
echo "=== services/summary_cache.py ==="
cat -n services/summary_cache.py

echo ""
echo "=== Directory tree of tests/benchmarks/ ==="
tree tests/benchmarks/ 2>/dev/null || find tests/benchmarks/ -type f

echo ""
echo "=== Directory tree of scripts/ ==="
tree scripts/ 2>/dev/null || find scripts/ -type f

echo ""
echo "=== benchmarks/ directory ==="
tree benchmarks/ 2>/dev/null || find benchmarks/ -type f

Length of output: 44058

@clean6378-max-it Here is a deep review of PR #120.

Deep Review: PR `#120` — pytest-benchmark suite for `summary_cache.py`

The PR is well-structured overall — no production code is touched, the fixture isolation via monkeypatch is correct, and the regression gate script has solid error-handling. However, there are several issues ranging from critical to minor that should be addressed before merge.

🔴 Critical

1. Baseline name format may never match pytest-benchmark output (`benchmarks/baselines.json` + `scripts/check_benchmark_regression.py`)

The regression gate in check_regression() (Line 96) looks up current results by name from the baselines. The baselines use short names like test_summary_cache_hit and test_fingerprint_workspace_entries[10].

pytest-benchmark emits the name field as the short test node name (e.g. test_summary_cache_hit) in some configurations, but in others it emits the full node ID including the module path (tests/benchmarks/test_summary_cache_bench.py::test_summary_cache_hit). If the format doesn't match, every baseline entry will show as MISSING and the gate will always fail (exit code 1) — or always pass if the lookup silently misses.

You should verify the exact format by inspecting an actual benchmark-results.json produced by the CI job and cross-referencing the name field against the keys in baselines.json. Consider adding a normalization step in load_results() that strips the module prefix:

# In load_results(), after extracting `name`:
# Normalize "tests/benchmarks/test_x.py::test_foo[param]" → "test_foo[param]"
name = str(name).rsplit("::", 1)[-1]

2. Baselines appear manually fabricated — future-dated and unverified (`benchmarks/baselines.json`)

"updated": "2026-06-25T12:00:00Z",

The timestamp is in the future and the _note claims these values come from "ubuntu-latest CI benchmark-results.json", but there is no CI artifact proving that. Values like 6.3e-05 for both cache hit and cache miss (identical to 6 decimal places) suggest they were hand-written, not measured. If these baselines are too tight for a slower runner, the gate will spuriously fail; if too loose, the gate provides no real protection.

Recommendation: Do not commit hand-crafted baselines. Run the benchmarks in CI first, capture the artifact, update the file from the actual numbers, and commit that as a follow-up.

3. `benchmarks` CI job has no `needs:` dependency on `unittest` (`.github/workflows/tests.yml` Line 219)

The benchmark job runs in parallel with unit tests. If unit tests fail due to a production code regression, benchmarks can still be reported as "passing" — giving a misleading green signal.

# Add this to the benchmarks job:
needs: [unittest]

🟡 Medium

4. `addopts = "--benchmark-disable"` is missing from `pyproject.toml` (Line 39–43)

The PR objectives state: "adds default benchmark skipping for normal test runs." But the [tool.pytest.ini_options] section has no addopts at all:

[tool.pytest.ini_options]
testpaths = ["tests"]
markers = [
    "benchmark: performance benchmarks (pytest-benchmark)",
]

Since testpaths includes tests/, tests/benchmarks/ is collected on every plain pytest invocation. Without --benchmark-disable, pytest-benchmark will execute benchmark functions normally (one call each, with timing overhead). This slows down the developer loop and contradicts the stated goal.

[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "--benchmark-disable"
markers = [
    "benchmark: performance benchmarks (pytest-benchmark)",
]

Note: the CI benchmark job already overrides this with -o addopts=, so adding it won't break CI.

5. `sys.path` manipulation in `conftest.py` is fragile (Lines 12–14)

REPO_ROOT = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
if REPO_ROOT not in sys.path:
    sys.path.insert(0, REPO_ROOT)

This is three levels of dirname that breaks silently if the file moves. The idiomatic solution is to install the package in editable mode in CI (pip install -e .) and remove the manual path insertion entirely. Alternatively, the [tool.pytest.ini_options] section can set pythonpath = ["."] (pytest ≥ 7.0):

[tool.pytest.ini_options]
pythonpath = ["."]

6. `pytest-benchmark` not pinned in `requirements-lock.txt` (`.github/workflows/tests.yml` Line 236)

python -m pip install 'pytest>=8,<9' 'pytest-benchmark>=4,<5'

All other runtime deps are resolved from requirements-lock.txt for determinism, but pytest-benchmark is installed ad-hoc. A transitive dep (e.g. py-cpuinfo, attrs) could change versions silently. pytest-benchmark should be pinned in the lock file, or at minimum pinned to a specific patch version in the workflow.

7. `stale_fingerprint` relies on an undocumented assumption (`tests/benchmarks/conftest.py` Line 88–89)

`@pytest.fixture`
def stale_fingerprint(workspace_fingerprint: dict[str, Any]) -> dict[str, Any]:
    return {**workspace_fingerprint, "global_db_mtime_ns": 9_999_999_999}

This works because global_db_path=None causes _file_mtime_ns(None) to return None, making the real fingerprint's global_db_mtime_ns equal to None. Setting it to 9_999_999_999 creates a mismatch. But this is completely implicit — if fingerprint_workspace_storage ever changes its handling of None paths, this fixture silently stops simulating a miss.

A more robust approach is to use a sentinel that always differs:

`@pytest.fixture`
def stale_fingerprint(workspace_fingerprint: dict[str, Any]) -> dict[str, Any]:
    """Return a fingerprint guaranteed to differ from the stored one."""
    return {**workspace_fingerprint, "rules_digest": "deadbeefdeadbeef"}

This works regardless of what global_db_mtime_ns is.

🔵 Minor

8. Missing `tests/benchmarks/init.py`

tests/benchmarks/
├── conftest.py
└── test_summary_cache_bench.py

The directory has no __init__.py. While modern pytest handles this correctly, the rest of tests/ likely uses __init__.py files (consistent with python -m unittest discover). Adding one avoids potential namespace collisions if identically-named modules exist elsewhere.

9. `load_baseline_means` silently skips non-dict group values (`scripts/check_benchmark_regression.py` Line 69)

if not isinstance(value, dict):
    continue  # silently skips

A typo in baselines.json (e.g. a string value for a group) is silently ignored, which could cause a group's benchmarks to never be gated. Consider issuing a warning or raising BenchmarkDataError.

10. Redundant `str()` cast in `load_results` (`scripts/check_benchmark_regression.py` Line 43)

name = str(name)  # name was already set from entry["name"] above

name is set on Line 37 directly from entry["name"], which would have been accepted as str already. The redundant cast can be removed.

11. Missing blank line before first test function (`tests/benchmarks/test_summary_cache_bench.py` Line 16)

PEP 8 requires two blank lines between the import block and the first top-level definition:

from services.summary_cache import (
    fingerprint_workspace_storage,
    get_cached_projects,
    set_cached_projects,
)
                                         # ← two blank lines needed here
`@pytest.mark.benchmark`(group="summary-cache")
def test_summary_cache_hit(...):

12. No unit tests for `check_benchmark_regression.py`

The regression gate script has good internal logic (load_results, load_baseline_means, check_regression) but zero test coverage. Given that this script is the only thing blocking regressions in CI, a small test file (e.g. tests/test_check_benchmark_regression.py) covering the happy path, missing-benchmark, and threshold-exceeded cases would greatly increase confidence.

Summary table

Severity	Issue	File
🔴 Critical	Baseline name format may mismatch pytest-benchmark output	`baselines.json`, `check_benchmark_regression.py`
🔴 Critical	Baselines are future-dated and appear manually fabricated	`baselines.json`
🔴 Critical	`benchmarks` job has no `needs: [unittest]`	`tests.yml`
🟡 Medium	Missing `addopts = "--benchmark-disable"`	`pyproject.toml`
🟡 Medium	Fragile `sys.path` manipulation	`conftest.py`
🟡 Medium	`pytest-benchmark` not pinned in lock file	`tests.yml`, `requirements-lock.txt`
🟡 Medium	`stale_fingerprint` relies on implicit `None` assumption	`conftest.py`
🔵 Minor	Missing `__init__.py` in `tests/benchmarks/`	—
🔵 Minor	Silent skip of non-dict group values	`check_benchmark_regression.py`
🔵 Minor	Redundant `str()` cast	`check_benchmark_regression.py`
🔵 Minor	Missing blank line after imports	`test_summary_cache_bench.py`
🔵 Minor	No unit tests for the regression gate script	—

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/check_benchmark_regression.py`:
- Around line 17-19: The normalize_benchmark_name helper currently uses
rsplit("::", 1), which can incorrectly strip parts of parametrized benchmark
names when parameter values include "::". Update normalize_benchmark_name to
split on the first "::" only so pytest node prefixes are removed without
truncating names like test_x[param::v], and keep the behavior consistent for
both short and full benchmark names.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c745c7f7-791c-4802-9b4a-d7dc19639498

📥 Commits

Reviewing files that changed from the base of the PR and between e4d79a6 and 83bb248.

📒 Files selected for processing (8)

.github/workflows/tests.yml
benchmarks/baselines.json
pyproject.toml
scripts/check_benchmark_regression.py
tests/benchmarks/__init__.py
tests/benchmarks/conftest.py
tests/benchmarks/test_summary_cache_bench.py
tests/test_check_benchmark_regression.py

✅ Files skipped from review due to trivial changes (1)

benchmarks/baselines.json

🚧 Files skipped from review as they are similar to previous changes (2)

tests/benchmarks/test_summary_cache_bench.py
.github/workflows/tests.yml

…, min-rounds)

wpak-ai

Update the docstrings and you're ready to go.

…, min-rounds)

test: add summary-cache pytest-benchmark suite with CI regression gate

c126227

coderabbitai Bot reviewed Jun 24, 2026

View reviewed changes

Comment thread benchmarks/baselines.json

Comment thread pyproject.toml

fix(ci): unblock unittest matrix and seed ubuntu benchmark baselines

e4d79a6

fix(test): address PR #120 benchmark review (gate, CI, baselines)

83bb248

coderabbitai Bot reviewed Jun 24, 2026

View reviewed changes

Comment thread scripts/check_benchmark_regression.py Outdated

fix(test): normalize benchmark names without truncating param values

b1c2f93

bradjin8 requested changes Jun 24, 2026

View reviewed changes

clean6378-max-it added 2 commits June 26, 2026 00:03

fix(ci): address PR #120 review (gate tests in matrix, benchmark-skip…

f93160f

…, min-rounds)

Fix: Restored the standard # via flask comment

be3f46c

clean6378-max-it requested a review from bradjin8 June 25, 2026 16:23

bradjin8 approved these changes Jun 25, 2026

View reviewed changes

bradjin8 requested a review from wpak-ai June 25, 2026 18:34

cppalliance deleted a comment from coderabbitai Bot Jun 25, 2026

wpak-ai approved these changes Jun 25, 2026

View reviewed changes

Comment thread .github/workflows/tests.yml Outdated

Comment thread tests/benchmarks/conftest.py Outdated

Fix issue referral numbers

33a780e

clean6378-max-it merged commit ecd08c6 into master Jun 25, 2026
17 checks passed

clean6378-max-it added a commit that referenced this pull request Jun 25, 2026

fix(test): address PR #120 benchmark review (gate, CI, baselines)

bfa0e43

clean6378-max-it added a commit that referenced this pull request Jun 25, 2026

fix(ci): address PR #120 review (gate tests in matrix, benchmark-skip…

aafe4ab

…, min-rounds)

clean6378-max-it deleted the test/summary-cache-benchmarks branch June 25, 2026 19:37

clean6378-max-it mentioned this pull request Jun 25, 2026

ci: unified benchmark suite with full baselines and regression gate #121

Merged

9 tasks

Uh oh!

Conversation

clean6378-max-it commented Jun 24, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

clean6378-max-it commented Jun 24, 2026

Uh oh!

coderabbitai Bot commented Jun 24, 2026

Deep Review: PR #120 — pytest-benchmark suite for summary_cache.py

🔴 Critical

1. Baseline name format may never match pytest-benchmark output (benchmarks/baselines.json + scripts/check_benchmark_regression.py)

2. Baselines appear manually fabricated — future-dated and unverified (benchmarks/baselines.json)

3. benchmarks CI job has no needs: dependency on unittest (.github/workflows/tests.yml Line 219)

🟡 Medium

4. addopts = "--benchmark-disable" is missing from pyproject.toml (Line 39–43)

5. sys.path manipulation in conftest.py is fragile (Lines 12–14)

6. pytest-benchmark not pinned in requirements-lock.txt (.github/workflows/tests.yml Line 236)

7. stale_fingerprint relies on an undocumented assumption (tests/benchmarks/conftest.py Line 88–89)

🔵 Minor

8. Missing tests/benchmarks/__init__.py

9. load_baseline_means silently skips non-dict group values (scripts/check_benchmark_regression.py Line 69)

10. Redundant str() cast in load_results (scripts/check_benchmark_regression.py Line 43)

11. Missing blank line before first test function (tests/benchmarks/test_summary_cache_bench.py Line 16)

12. No unit tests for check_benchmark_regression.py

Summary table

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wpak-ai left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

clean6378-max-it commented Jun 24, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading

Deep Review: PR `#120` — pytest-benchmark suite for `summary_cache.py`

1. Baseline name format may never match pytest-benchmark output (`benchmarks/baselines.json` + `scripts/check_benchmark_regression.py`)

2. Baselines appear manually fabricated — future-dated and unverified (`benchmarks/baselines.json`)

3. `benchmarks` CI job has no `needs:` dependency on `unittest` (`.github/workflows/tests.yml` Line 219)

4. `addopts = "--benchmark-disable"` is missing from `pyproject.toml` (Line 39–43)

5. `sys.path` manipulation in `conftest.py` is fragile (Lines 12–14)

6. `pytest-benchmark` not pinned in `requirements-lock.txt` (`.github/workflows/tests.yml` Line 236)

7. `stale_fingerprint` relies on an undocumented assumption (`tests/benchmarks/conftest.py` Line 88–89)

8. Missing `tests/benchmarks/init.py`

9. `load_baseline_means` silently skips non-dict group values (`scripts/check_benchmark_regression.py` Line 69)

10. Redundant `str()` cast in `load_results` (`scripts/check_benchmark_regression.py` Line 43)

11. Missing blank line before first test function (`tests/benchmarks/test_summary_cache_bench.py` Line 16)

12. No unit tests for `check_benchmark_regression.py`