Skip to content

feat: integrate openms-insight into quantms#28

Open
hjn0415a wants to merge 2 commits into
OpenMS:mainfrom
hjn0415a:feature/integrate-openms-insight
Open

feat: integrate openms-insight into quantms#28
hjn0415a wants to merge 2 commits into
OpenMS:mainfrom
hjn0415a:feature/integrate-openms-insight

Conversation

@hjn0415a

@hjn0415a hjn0415a commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Implement 5 new Streamlit analysis pipeline pages powered by the openms-insight engine to handle downstream proteomics workflows:

  • Data Filtering (filtering.py):

    • Support Low Abundance, Low Repeatability, and Low Variance filters using Polars LazyFrame.
  • Missing Value Imputation (imputation.py):

    • Connect upstream filtered datasets using session state fallback.
    • Provide group-aware MAR (mean/median) and technical dropout MNAR (row/global minimum) methods.
  • Data Normalization & Scaling (normalization.py):

    • Build a 3-step preprocessing chain: Mathematical transformation, column-wise sample normalization (PQn, Quantile, Reference Feature), and row-wise scaling.
  • Statistical Inference (statistical.py):

    • Dynamically route test methods (limma-like, Welch, ANOVA) based on detected group counts.
    • Integrate multiple testing corrections (BH, Bonferroni).
  • GO Enrichment Analysis (enrichment.py):

    • Fetch foreground lists dynamically via P-value and Log2FC constraints.
    • Query MyGene.info API and render interactive Plotly charts across BP, CC, and MF tabs.

Summary by CodeRabbit

  • New Features
    • Added dedicated workflow pages for data filtering, imputation, normalization, statistical inference, and GO enrichment.
    • Updated Results navigation to include GO enrichment (and kept spectral library).
  • Bug Fixes
    • Strengthened handling of missing prerequisite results (e.g., statistics) with clearer messaging and safe stops.
  • Refactor
    • Updated results visualization pipeline: volcano and heatmap now use interactive components; PCA generation is delegated to a dedicated PCA plot tool.
    • Streamlined abundance results to focus on protein-level display without extra statistical columns.
  • Chores
    • Added polars to required dependencies.

Implement 5 new Streamlit analysis pipeline pages powered by the openms-insight engine to handle downstream proteomics workflows:

- Data Filtering (filtering.py):
  * Support Low Abundance, Low Repeatability, and Low Variance filters using Polars LazyFrame.

- Missing Value Imputation (imputation.py):
  * Connect upstream filtered datasets using session state fallback.
  * Provide group-aware MAR (mean/median) and technical dropout MNAR (row/global minimum) methods.

- Data Normalization & Scaling (normalization.py):
  * Build a 3-step preprocessing chain: Mathematical transformation, column-wise sample normalization (PQn, Quantile, Reference Feature), and row-wise scaling.

- Statistical Inference (statistical.py):
  * Dynamically route test methods (limma-like, Welch, ANOVA) based on detected group counts.
  * Integrate multiple testing corrections (BH, Bonferroni).

- GO Enrichment Analysis (enrichment.py):
  * Fetch foreground lists dynamically via P-value and Log2FC constraints.
  * Query MyGene.info API and render interactive Plotly charts across BP, CC, and MF tabs.
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

The PR changes abundance loading to return a pivot table and group map, adds filtering/imputation/normalization/statistical/enrichment pages, and rewires heatmap/PCA/volcano results to updated data sources and OpenMS Insight components. The results navigation also replaces Proteomics LFQ with Pathway Analysis.

Changes

Abundance preprocessing, analysis, and results

Layer / File(s) Summary
Abundance loading and protein table
src/common/results_helpers.py, content/results_abundance.py
load_abundance_data now returns pivot_df and clean_group_map, and the Protein-Level table rebuilds Intensity from sample columns instead of showing the old statistical fields.
Filtering and imputation pages
content/filtering.py, content/imputation.py
The filtering page writes filtered_df from the abundance pivot table, and the imputation page uses filtered_df or the pivot table to produce imputed_df.
Normalization, statistics, and enrichment
content/normalization.py, content/statistical.py, content/enrichment.py
The normalization page selects imputed_df, filtered_df, or the pivot table before storing normalized_df; the statistical page selects normalized_df, imputed_df, filtered_df, or the pivot table before storing statistics_df; the enrichment page reads statistics_df to render GO term results.
Results visualizations and navigation
content/results_heatmap.py, content/results_pca.py, content/results_volcano.py, quantms_protein_heatmap/manifest.json, quantms_volcano_plot/manifest.json, requirements.txt, app.py
Heatmap, PCA, and volcano pages now consume the updated abundance/statistics artifacts and render OpenMS Insight components; the manifests and polars dependency support those components, and the results navigation now includes Pathway Analysis instead of Proteomics LFQ.

Sequence Diagram(s)

sequenceDiagram
  participant FilteringPage as content/filtering.py
  participant ImputationPage as content/imputation.py
  participant NormalizationPage as content/normalization.py
  participant StatisticalPage as content/statistical.py
  participant EnrichmentPage as content/enrichment.py
  participant SessionState as st.session_state
  participant EnrichmentEngine as calculate_go_enrichment
  FilteringPage->>SessionState: store filtered_df
  ImputationPage->>SessionState: store imputed_df
  NormalizationPage->>SessionState: store normalized_df
  StatisticalPage->>SessionState: store statistics_df
  EnrichmentPage->>SessionState: read statistics_df
  EnrichmentPage->>EnrichmentEngine: calculate_go_enrichment(...)
  EnrichmentEngine-->>EnrichmentPage: status and GO results
Loading

Poem

A bunny hopped through data bright,
From pivots grew a richer sight.
With filters, folds, and GO terms too,
New charts and pages sprang anew.
🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: integrating OpenMS-Insight into QuantMS.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.23.0)
content/results_pca.py

┌──────────────┐
│ Opengrep CLI │
└──────────────┘

�[32m✔�[39m �[1mOpengrep OSS�[0m
�[32m✔�[39m Basic security coverage for first-party code vulnerabilities.

[00.16][ERROR]: unable to find a config; path .coderabbit-opengrep-fallback.yml does not exist

content/results_volcano.py

┌──────────────┐
│ Opengrep CLI │
└──────────────┘

�[32m✔�[39m �[1mOpengrep OSS�[0m
�[32m✔�[39m Basic security coverage for first-party code vulnerabilities.

[00.17][ERROR]: unable to find a config; path .coderabbit-opengrep-fallback.yml does not exist

quantms_volcano_plot/manifest.json

┌──────────────┐
│ Opengrep CLI │
└──────────────┘

�[32m✔�[39m �[1mOpengrep OSS�[0m
�[32m✔�[39m Basic security coverage for first-party code vulnerabilities.

[00.12][ERROR]: unable to find a config; path .coderabbit-opengrep-fallback.yml does not exist

  • 4 others

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (5)
content/imputation.py (1)

65-134: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use an @st.fragment for the imputation controls/results section.

This page adds interactive widgets plus button-driven rendering without fragment scoping, so each change reruns the full page. As per coding guidelines, **/*.py: Use @st.fragment decorator for interactive UI updates without full page reloads.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/imputation.py` around lines 65 - 134, The imputation controls/results
block in content/imputation.py should be scoped into a Streamlit fragment so
widget changes and the Apply Imputation button do not rerun the whole page. Wrap
the interactive section that builds metadata_pl, renders the selectbox/radio
controls, and handles the imputation execution around impute_mar and
impute_smallest_value with an `@st.fragment-decorated` function, then call that
fragment from the page so the UI updates stay localized.

Source: Coding guidelines

content/normalization.py (1)

79-176: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Put the normalization UI into an @st.fragment.

This workflow is entirely widget-driven, but it is still executed at page scope, so every interaction reruns the whole page. As per coding guidelines, **/*.py: Use @st.fragment decorator for interactive UI updates without full page reloads.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/normalization.py` around lines 79 - 176, The normalization widget
workflow in the current page-scope block should be moved into an `@st.fragment` so
interactions only rerun that UI segment. Wrap the UI and button-driven pipeline
logic around the selectboxes, st.text_input, and the "Apply Normalization
Pipelines" flow in a fragment function, then call that fragment from the page
body so the rest of the page is not re-executed on every widget change.

Source: Coding guidelines

content/statistical.py (1)

85-158: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Extract the statistical controls/results into an @st.fragment.

This page adds interactive analysis widgets without fragment scoping, so the entire page reruns on every control change. As per coding guidelines, **/*.py: Use @st.fragment decorator for interactive UI updates without full page reloads.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/statistical.py` around lines 85 - 158, The statistical controls and
results block in the main Streamlit flow reruns the entire page on every widget
change; move the interactive UI and execution logic into an `@st.fragment` so
updates are scoped locally. Wrap the sections that build metadata_pl, render the
selectboxes, and handle the Run Statistical Analysis button/results inside a
fragment function, keeping the existing calculate_statistical_tests and
adjust_fdr_lazy calls together there.

Source: Coding guidelines

content/filtering.py (1)

58-163: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Move the interactive filter block into an @st.fragment.

The controls and result view are rendered at module scope, so every widget interaction reruns the whole page instead of just this analysis section. As per coding guidelines, **/*.py: Use @st.fragment decorator for interactive UI updates without full page reloads.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/filtering.py` around lines 58 - 163, The filter UI and result
rendering in the filtering section are still running at module scope, so every
widget change reruns the whole page; wrap this interactive block in an
`@st.fragment` to keep updates scoped. Move the controls, Apply Filter button, and
filtered result display into a fragment function in content/filtering.py, and
keep the existing filter_method, filter_low_abundance, filter_low_repeatability,
filter_low_variance, and filtered_df logic inside that fragment so the behavior
stays the same but only this section refreshes.

Source: Coding guidelines

content/enrichment.py (1)

50-140: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use an @st.fragment for the enrichment controls and result tabs.

This page is another interactive analysis surface implemented at top level, so the whole page reruns on every control change. As per coding guidelines, **/*.py: Use @st.fragment decorator for interactive UI updates without full page reloads.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/enrichment.py` around lines 50 - 140, The enrichment controls and
result tabs are implemented directly in the top-level Streamlit flow, causing
full-page reruns on every widget change. Refactor the UI block around the
threshold inputs, the run button, and the tabbed GO results into a dedicated
function and decorate it with `@st.fragment` so only that section updates
interactively. Keep the existing logic in calculate_go_enrichment, the
target_p_col/p_label setup, and the tabs rendering inside the fragment.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@content/enrichment.py`:
- Around line 38-39: The fallback navigation in the `st.page_link` call
currently points to the wrong page module, which can break the prerequisite flow
if that target does not exist. Update the link in `content/enrichment.py` to use
the same registered statistics page path that `app.py` registers for the
statistics screen, keeping the label and icon unchanged.
- Around line 45-54: The enrichment setup currently validates only ProteinName,
but the later call path also depends on log2FC and a valid p-value column. In
content/enrichment.py, extend the existing preflight checks near target_p_col
selection to verify final_statistics_report contains log2FC plus either p-adj or
p-value before reaching calculate_go_enrichment. If any required column is
missing, show a clear st.error and st.stop early so the page fails fast with a
useful message.

In `@content/filtering.py`:
- Around line 142-145: The filter update in filtering.py should also invalidate
dependent analysis state so downstream pages stop using stale data. In the
filtering flow where filtered_df is written in the filtering logic, clear or
remove the existing imputed_df, normalized_df, and statistics_df entries from
st.session_state whenever the filter result changes. Make the change in the same
place that updates filtered_df so normalization and statistical paths that read
from imputed_df/normalized_df in content/normalization.py and
content/statistical.py will recompute from the new filtered data instead of
reusing old cached values.

In `@content/imputation.py`:
- Around line 124-128: After re-imputing in the imputation flow, clear any
downstream derived session-state artifacts that depend on the old data, not just
overwrite imputed_df. Update the logic around imputed_lazy.collect().to_pandas()
and the st.session_state assignments so normalized_df and statistics_df are
removed or reset whenever imputed_df is refreshed, ensuring later steps like
statistical and enrichment processing do not consume stale outputs.

In `@content/normalization.py`:
- Around line 166-170: The normalization step updates normalized_df but leaves
any existing statistics_df in session state, so downstream enrichment can use
stale statistics after a new normalization run. Update the normalization flow in
the final checkpointing block that sets st.session_state["normalized_df"] to
also clear or invalidate st.session_state["statistics_df"] whenever the
normalized data changes, so content/enrichment.py will not reuse mismatched
statistics.

In `@content/statistical.py`:
- Around line 103-157: The `anova` option in the `selected_method` flow is
incompatible with the downstream `log2FC`-based schema used after
`calculate_statistical_tests` and by consumers like `content/enrichment.py`.
Update the 3+ group branch in `content/statistical.py` so `method_options` only
includes methods that still produce a single per-row `log2FC`/p-value contract,
or change the downstream contract and documentation if ANOVA must remain. Make
sure the selected method path, result table messaging, and any dependent
filtering logic stay consistent with the same output columns.
- Around line 100-115: The statistical test selector in the group_count == 2
branch currently exposes "paired" even though no pair metadata is captured, so
remove that option from method_options in the selectbox flow and keep only
unpaired methods like "limma_like" and "welch"; update any related help_text or
validation in content/statistical.py so the UI no longer suggests paired
analysis until explicit pair assignment support exists.

In `@src/common/results_helpers.py`:
- Around line 234-246: The sample grouping logic in the results helper is
truncating to only the first two groups, which drops any additional groups
before downstream processing. Update the grouping and sample selection in the
results-building flow around the unique symbols df, sample_group_df,
group1_samples, group2_samples, and all_samples so it preserves every group
present in the data instead of hard-coding groups[0] and groups[1]. Make sure
the pivot/multi-group path receives the full set of samples and groups, with no
silent filtering when more than two groups exist.

---

Nitpick comments:
In `@content/enrichment.py`:
- Around line 50-140: The enrichment controls and result tabs are implemented
directly in the top-level Streamlit flow, causing full-page reruns on every
widget change. Refactor the UI block around the threshold inputs, the run
button, and the tabbed GO results into a dedicated function and decorate it with
`@st.fragment` so only that section updates interactively. Keep the existing logic
in calculate_go_enrichment, the target_p_col/p_label setup, and the tabs
rendering inside the fragment.

In `@content/filtering.py`:
- Around line 58-163: The filter UI and result rendering in the filtering
section are still running at module scope, so every widget change reruns the
whole page; wrap this interactive block in an `@st.fragment` to keep updates
scoped. Move the controls, Apply Filter button, and filtered result display into
a fragment function in content/filtering.py, and keep the existing
filter_method, filter_low_abundance, filter_low_repeatability,
filter_low_variance, and filtered_df logic inside that fragment so the behavior
stays the same but only this section refreshes.

In `@content/imputation.py`:
- Around line 65-134: The imputation controls/results block in
content/imputation.py should be scoped into a Streamlit fragment so widget
changes and the Apply Imputation button do not rerun the whole page. Wrap the
interactive section that builds metadata_pl, renders the selectbox/radio
controls, and handles the imputation execution around impute_mar and
impute_smallest_value with an `@st.fragment-decorated` function, then call that
fragment from the page so the UI updates stay localized.

In `@content/normalization.py`:
- Around line 79-176: The normalization widget workflow in the current
page-scope block should be moved into an `@st.fragment` so interactions only rerun
that UI segment. Wrap the UI and button-driven pipeline logic around the
selectboxes, st.text_input, and the "Apply Normalization Pipelines" flow in a
fragment function, then call that fragment from the page body so the rest of the
page is not re-executed on every widget change.

In `@content/statistical.py`:
- Around line 85-158: The statistical controls and results block in the main
Streamlit flow reruns the entire page on every widget change; move the
interactive UI and execution logic into an `@st.fragment` so updates are scoped
locally. Wrap the sections that build metadata_pl, render the selectboxes, and
handle the Run Statistical Analysis button/results inside a fragment function,
keeping the existing calculate_statistical_tests and adjust_fdr_lazy calls
together there.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3f986db5-e0e1-48bf-91e8-54d28dc2a8c8

📥 Commits

Reviewing files that changed from the base of the PR and between 83646ec and f2e0acb.

📒 Files selected for processing (8)
  • app.py
  • content/enrichment.py
  • content/filtering.py
  • content/imputation.py
  • content/normalization.py
  • content/results_abundance.py
  • content/statistical.py
  • src/common/results_helpers.py

Comment thread content/enrichment.py
Comment on lines +38 to +39
st.page_link(
"content/results_statistics.py", label="Go to Statistical Inference", icon="🔬"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Point this fallback link at the registered statistics page.

app.py registers content/statistical.py, but this button sends users to content/results_statistics.py. If that file is absent, the prerequisite flow dead-ends here.

Proposed fix
     st.page_link(
-        "content/results_statistics.py", label="Go to Statistical Inference", icon="🔬"
+        "content/statistical.py", label="Go to Statistical Inference", icon="🔬"
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
st.page_link(
"content/results_statistics.py", label="Go to Statistical Inference", icon="🔬"
st.page_link(
"content/statistical.py", label="Go to Statistical Inference", icon="🔬"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/enrichment.py` around lines 38 - 39, The fallback navigation in the
`st.page_link` call currently points to the wrong page module, which can break
the prerequisite flow if that target does not exist. Update the link in
`content/enrichment.py` to use the same registered statistics page path that
`app.py` registers for the statistics screen, keeping the label and icon
unchanged.

Comment thread content/enrichment.py
Comment on lines +45 to +54
id_col = "ProteinName"
if id_col not in final_statistics_report.columns:
st.error(f"❌ Structural Error: Column '{id_col}' is missing from the active matrix context.")
st.stop()

# --- SECTION 1: Parameter Setup & Dynamic Cutoff Labels ---
st.subheader("Configure Enrichment Thresholds")

# Check if target p-value should be adjusted or raw based on previous selections (Fallback safely to 'p-adj')
target_p_col = "p-adj" if "p-adj" in final_statistics_report.columns else "p-value"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Validate every column the enrichment call depends on.

This only checks ProteinName, but the execution path also requires log2FC and one of p-adj / p-value. Right now the page can pass an incompatible statistics table into calculate_go_enrichment and fail much later with a less useful error.

Proposed fix
-# Identify target identifier columns dynamically
-id_col = "ProteinName"
-if id_col not in final_statistics_report.columns:
-    st.error(f"❌ Structural Error: Column '{id_col}' is missing from the active matrix context.")
-    st.stop()
+required_cols = {"ProteinName", "log2FC"}
+missing_cols = required_cols - set(final_statistics_report.columns)
+if missing_cols:
+    st.error(
+        f"❌ Structural Error: Missing required columns: {', '.join(sorted(missing_cols))}."
+    )
+    st.stop()
+
+id_col = "ProteinName"
 
 # --- SECTION 1: Parameter Setup & Dynamic Cutoff Labels ---
 st.subheader("Configure Enrichment Thresholds")
 
-# Check if target p-value should be adjusted or raw based on previous selections (Fallback safely to 'p-adj')
-target_p_col = "p-adj" if "p-adj" in final_statistics_report.columns else "p-value"
+if "p-adj" in final_statistics_report.columns:
+    target_p_col = "p-adj"
+elif "p-value" in final_statistics_report.columns:
+    target_p_col = "p-value"
+else:
+    st.error("❌ Structural Error: Missing both 'p-adj' and 'p-value'.")
+    st.stop()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
id_col = "ProteinName"
if id_col not in final_statistics_report.columns:
st.error(f"❌ Structural Error: Column '{id_col}' is missing from the active matrix context.")
st.stop()
# --- SECTION 1: Parameter Setup & Dynamic Cutoff Labels ---
st.subheader("Configure Enrichment Thresholds")
# Check if target p-value should be adjusted or raw based on previous selections (Fallback safely to 'p-adj')
target_p_col = "p-adj" if "p-adj" in final_statistics_report.columns else "p-value"
required_cols = {"ProteinName", "log2FC"}
missing_cols = required_cols - set(final_statistics_report.columns)
if missing_cols:
st.error(
f"❌ Structural Error: Missing required columns: {', '.join(sorted(missing_cols))}."
)
st.stop()
id_col = "ProteinName"
# --- SECTION 1: Parameter Setup & Dynamic Cutoff Labels ---
st.subheader("Configure Enrichment Thresholds")
if "p-adj" in final_statistics_report.columns:
target_p_col = "p-adj"
elif "p-value" in final_statistics_report.columns:
target_p_col = "p-value"
else:
st.error("❌ Structural Error: Missing both 'p-adj' and 'p-value'.")
st.stop()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/enrichment.py` around lines 45 - 54, The enrichment setup currently
validates only ProteinName, but the later call path also depends on log2FC and a
valid p-value column. In content/enrichment.py, extend the existing preflight
checks near target_p_col selection to verify final_statistics_report contains
log2FC plus either p-adj or p-value before reaching calculate_go_enrichment. If
any required column is missing, show a clear st.error and st.stop early so the
page fails fast with a useful message.

Comment thread content/filtering.py
Comment on lines +142 to +145
# Collect the evaluated lazy graph and convert back to Pandas for visualization
filtered_df = filtered_lazy.collect().to_pandas()
st.session_state["filtered_df"] = filtered_df

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Invalidate downstream analysis state when the filter output changes.

After Line 144 writes a new filtered_df, the old imputed_df, normalized_df, and statistics_df remain live. content/normalization.py prefers imputed_df over filtered_df, and content/statistical.py prefers normalized_df/imputed_df, so later pages can keep running on stale pre-filter data.

Proposed fix
     # Collect the evaluated lazy graph and convert back to Pandas for visualization
     filtered_df = filtered_lazy.collect().to_pandas()
+    for key in ("imputed_df", "normalized_df", "statistics_df"):
+        st.session_state.pop(key, None)
     st.session_state["filtered_df"] = filtered_df
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Collect the evaluated lazy graph and convert back to Pandas for visualization
filtered_df = filtered_lazy.collect().to_pandas()
st.session_state["filtered_df"] = filtered_df
# Collect the evaluated lazy graph and convert back to Pandas for visualization
filtered_df = filtered_lazy.collect().to_pandas()
for key in ("imputed_df", "normalized_df", "statistics_df"):
st.session_state.pop(key, None)
st.session_state["filtered_df"] = filtered_df
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/filtering.py` around lines 142 - 145, The filter update in
filtering.py should also invalidate dependent analysis state so downstream pages
stop using stale data. In the filtering flow where filtered_df is written in the
filtering logic, clear or remove the existing imputed_df, normalized_df, and
statistics_df entries from st.session_state whenever the filter result changes.
Make the change in the same place that updates filtered_df so normalization and
statistical paths that read from imputed_df/normalized_df in
content/normalization.py and content/statistical.py will recompute from the new
filtered data instead of reusing old cached values.

Comment thread content/imputation.py
Comment on lines +124 to +128
# Resolve lazy graph optimization tree and push to display data frame structure
imputed_df = imputed_lazy.collect().to_pandas()

# 💾 Save current output into Session State for down-stream processing (Normalization, Statistics)
st.session_state["imputed_df"] = imputed_df

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Clear derived normalization/statistics outputs after re-imputation.

When Line 128 overwrites imputed_df, the old normalized_df and statistics_df are left intact. content/statistical.py will keep preferring normalized_df, and content/enrichment.py can still read stale statistics_df, so the pipeline becomes internally inconsistent.

Proposed fix
     # Resolve lazy graph optimization tree and push to display data frame structure
     imputed_df = imputed_lazy.collect().to_pandas()

     # 💾 Save current output into Session State for down-stream processing (Normalization, Statistics)
+    for key in ("normalized_df", "statistics_df"):
+        st.session_state.pop(key, None)
     st.session_state["imputed_df"] = imputed_df
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Resolve lazy graph optimization tree and push to display data frame structure
imputed_df = imputed_lazy.collect().to_pandas()
# 💾 Save current output into Session State for down-stream processing (Normalization, Statistics)
st.session_state["imputed_df"] = imputed_df
# Resolve lazy graph optimization tree and push to display data frame structure
imputed_df = imputed_lazy.collect().to_pandas()
# 💾 Save current output into Session State for down-stream processing (Normalization, Statistics)
for key in ("normalized_df", "statistics_df"):
st.session_state.pop(key, None)
st.session_state["imputed_df"] = imputed_df
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/imputation.py` around lines 124 - 128, After re-imputing in the
imputation flow, clear any downstream derived session-state artifacts that
depend on the old data, not just overwrite imputed_df. Update the logic around
imputed_lazy.collect().to_pandas() and the st.session_state assignments so
normalized_df and statistics_df are removed or reset whenever imputed_df is
refreshed, ensuring later steps like statistical and enrichment processing do
not consume stale outputs.

Comment thread content/normalization.py
Comment on lines +166 to +170
# Finalize and collect pipeline query graph optimizations
normalized_df = processing_lazy.collect().to_pandas()

# 💾 Save processing checkpoint inside Session State for Downstream (Statistics Block)
st.session_state["normalized_df"] = normalized_df

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Invalidate stale statistics after normalization changes.

Line 170 updates normalized_df, but any existing statistics_df survives. Since content/enrichment.py reads statistics_df directly from session state, users can see enrichment results for a different normalization run until they manually rerun statistics.

Proposed fix
         # Finalize and collect pipeline query graph optimizations
         normalized_df = processing_lazy.collect().to_pandas()

         # 💾 Save processing checkpoint inside Session State for Downstream (Statistics Block)
+        st.session_state.pop("statistics_df", None)
         st.session_state["normalized_df"] = normalized_df
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Finalize and collect pipeline query graph optimizations
normalized_df = processing_lazy.collect().to_pandas()
# 💾 Save processing checkpoint inside Session State for Downstream (Statistics Block)
st.session_state["normalized_df"] = normalized_df
# Finalize and collect pipeline query graph optimizations
normalized_df = processing_lazy.collect().to_pandas()
# 💾 Save processing checkpoint inside Session State for Downstream (Statistics Block)
st.session_state.pop("statistics_df", None)
st.session_state["normalized_df"] = normalized_df
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/normalization.py` around lines 166 - 170, The normalization step
updates normalized_df but leaves any existing statistics_df in session state, so
downstream enrichment can use stale statistics after a new normalization run.
Update the normalization flow in the final checkpointing block that sets
st.session_state["normalized_df"] to also clear or invalidate
st.session_state["statistics_df"] whenever the normalized data changes, so
content/enrichment.py will not reuse mismatched statistics.

Comment thread content/statistical.py
Comment on lines +100 to +115
if group_count == 2:
method_options = ["limma_like", "welch", "paired"]
help_text = "'limma_like' uses Empirical Bayes variance shrinking. 'welch' is for unequal variances. 'paired' is for dependent samples."
elif group_count >= 3:
method_options = ["limma_like", "anova"]
help_text = "'limma_like' supports multi-group design matrices. 'anova' computes standard row-wise One-way ANOVA."
else:
st.error("❌ Statistical testing requires at least 2 unique sample groups.")
st.stop()

selected_method = st.selectbox(
"Select Statistical Test",
options=method_options,
index=0,
help=help_text
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Remove the paired-test option until pair metadata exists.

Nothing in this flow captures pair assignments, so paired can only pair samples by incidental column order. That makes the result statistically wrong for ordinary unpaired cohorts.

Proposed fix
     if group_count == 2:
-        method_options = ["limma_like", "welch", "paired"]
-        help_text = "'limma_like' uses Empirical Bayes variance shrinking. 'welch' is for unequal variances. 'paired' is for dependent samples."
+        method_options = ["limma_like", "welch"]
+        help_text = "'limma_like' uses Empirical Bayes variance shrinking. 'welch' is for unequal variances."
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if group_count == 2:
method_options = ["limma_like", "welch", "paired"]
help_text = "'limma_like' uses Empirical Bayes variance shrinking. 'welch' is for unequal variances. 'paired' is for dependent samples."
elif group_count >= 3:
method_options = ["limma_like", "anova"]
help_text = "'limma_like' supports multi-group design matrices. 'anova' computes standard row-wise One-way ANOVA."
else:
st.error("❌ Statistical testing requires at least 2 unique sample groups.")
st.stop()
selected_method = st.selectbox(
"Select Statistical Test",
options=method_options,
index=0,
help=help_text
)
if group_count == 2:
method_options = ["limma_like", "welch"]
help_text = "'limma_like' uses Empirical Bayes variance shrinking. 'welch' is for unequal variances."
elif group_count >= 3:
method_options = ["limma_like", "anova"]
help_text = "'limma_like' supports multi-group design matrices. 'anova' computes standard row-wise One-way ANOVA."
else:
st.error("❌ Statistical testing requires at least 2 unique sample groups.")
st.stop()
selected_method = st.selectbox(
"Select Statistical Test",
options=method_options,
index=0,
help=help_text
)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/statistical.py` around lines 100 - 115, The statistical test selector
in the group_count == 2 branch currently exposes "paired" even though no pair
metadata is captured, so remove that option from method_options in the selectbox
flow and keep only unpaired methods like "limma_like" and "welch"; update any
related help_text or validation in content/statistical.py so the UI no longer
suggests paired analysis until explicit pair assignment support exists.

Comment thread content/statistical.py
Comment on lines +103 to +157
elif group_count >= 3:
method_options = ["limma_like", "anova"]
help_text = "'limma_like' supports multi-group design matrices. 'anova' computes standard row-wise One-way ANOVA."
else:
st.error("❌ Statistical testing requires at least 2 unique sample groups.")
st.stop()

selected_method = st.selectbox(
"Select Statistical Test",
options=method_options,
index=0,
help=help_text
)

with col2:
st.markdown("### 🛡️ 2. Multiple Testing Correction (FDR)")
selected_fdr = st.selectbox(
"Select FDR Adjustment Strategy",
options=["BH", "Bonferroni", "None"],
index=0,
help="'BH' (Benjamini-Hochberg) controls False Discovery Rate. 'Bonferroni' is strict Family-Wise Error Rate control."
)

# --- SECTION 3: Statistical Query Execution ---
st.markdown("<br>", unsafe_allow_html=True)
if st.button("Run Statistical Analysis", type="primary"):

# Convert active pandas dataframe into polars lazyframe graph
stats_lazy = pl.from_pandas(base_df).lazy()

try:
# Execute Chain 1: Calculate core statistics (Adds log2FC, stat, p-value)
stats_lazy = calculate_statistical_tests(
quantification_data=stats_lazy,
metadata=metadata_pl,
method=selected_method
)

# Execute Chain 2: Adjust Multiple Testing (Adds p-adj)
stats_lazy = adjust_fdr_lazy(
quantification_data=stats_lazy,
strategy=selected_fdr
)

# Resolve lazy graph optimization tree and bring back to pandas memory
statistics_df = stats_lazy.collect().to_pandas()

# 💾 Save processing checkpoint inside Session State for Downstream (e.g., Volcano plot, Volcano/Heatmap UI)
st.session_state["statistics_df"] = statistics_df

st.success(f"Successfully calculated **{selected_method}** test with **{selected_fdr}** FDR correction!")

# Display the finalized statistics table view
st.subheader("Statistical Analysis Results")
st.markdown(f"Generated framework containing columns: `ProteinName`, `log2FC`, `stat`, `p-value`, `p-adj`, `PeptideSequence`")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

The ANOVA path does not match the downstream log2FC contract.

For 3+ groups, this page allows anova, but Line 157 documents a log2FC output and content/enrichment.py filters on |log2FC|. A one-way ANOVA does not produce a single fold-change per protein, so this branch is incompatible with the current downstream schema.

🧰 Tools
🪛 Ruff (0.15.18)

[error] 157-157: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/statistical.py` around lines 103 - 157, The `anova` option in the
`selected_method` flow is incompatible with the downstream `log2FC`-based schema
used after `calculate_statistical_tests` and by consumers like
`content/enrichment.py`. Update the 3+ group branch in `content/statistical.py`
so `method_options` only includes methods that still produce a single per-row
`log2FC`/p-value contract, or change the downstream contract and documentation
if ANOVA must remain. Make sure the selected method path, result table
messaging, and any dependent filtering logic stay consistent with the same
output columns.

Comment on lines 234 to +246
groups = sorted(df["Group"].unique())

if len(groups) < 2:
return None

group1, group2 = groups[:2]

# Compute statistics per protein
stats_rows = []
for protein, protein_df in df.groupby("ProteinName"):
g1_vals = protein_df[protein_df["Group"] == group1]["Intensity"].values
g2_vals = protein_df[protein_df["Group"] == group2]["Intensity"].values

if len(g1_vals) < 2 or len(g2_vals) < 2:
pval = np.nan
else:
_, pval = ttest_ind(g1_vals, g2_vals, equal_var=False)

mean_g1 = np.mean(g1_vals) if len(g1_vals) > 0 else np.nan
mean_g2 = np.mean(g2_vals) if len(g2_vals) > 0 else np.nan

log2fc = np.log2(mean_g2 / mean_g1) if mean_g1 > 0 else np.nan

stats_rows.append({
"ProteinName": protein,
"log2FC": log2fc,
"p-value": pval,
})

stats_df = pd.DataFrame(stats_rows)

if not stats_df.empty:
mask = stats_df["p-value"].notna()
if mask.any():
_, p_adj, _, _ = multipletests(stats_df.loc[mask, "p-value"], method="fdr_bh")
stats_df.loc[mask, "p-adj"] = p_adj
else:
stats_df["p-adj"] = np.nan

# Order samples by group (group2 first, then group1)
# 3. Define ordering and build sample arrays
sample_group_df = df[["Sample", "Group"]].drop_duplicates()
group2_samples = sample_group_df[sample_group_df["Group"] == group2]["Sample"].tolist()
group1_samples = sample_group_df[sample_group_df["Group"] == group1]["Sample"].tolist()
all_samples = group2_samples + group1_samples

# Build pivot table
group1_samples = sample_group_df[sample_group_df["Group"] == groups[0]][
"Sample"
].tolist()
group2_samples = sample_group_df[sample_group_df["Group"] == groups[1]][
"Sample"
].tolist()
all_samples = group1_samples + group2_samples

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Don't truncate the pivot table to the first two groups.

all_samples is built from groups[0] and groups[1] only, so any third-or-later group is dropped from pivot_df. That silently removes data before it reaches the multi-group paths introduced in this PR.

Proposed fix
-    groups = sorted(df["Group"].unique())
+    groups = sorted(df["Group"].unique())
     if len(groups) < 2:
         return None

     # 3. Define ordering and build sample arrays
     sample_group_df = df[["Sample", "Group"]].drop_duplicates()
-    group1_samples = sample_group_df[sample_group_df["Group"] == groups[0]][
-        "Sample"
-    ].tolist()
-    group2_samples = sample_group_df[sample_group_df["Group"] == groups[1]][
-        "Sample"
-    ].tolist()
-    all_samples = group1_samples + group2_samples
+    all_samples = []
+    for group in groups:
+        all_samples.extend(
+            sample_group_df.loc[sample_group_df["Group"] == group, "Sample"].tolist()
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
groups = sorted(df["Group"].unique())
if len(groups) < 2:
return None
group1, group2 = groups[:2]
# Compute statistics per protein
stats_rows = []
for protein, protein_df in df.groupby("ProteinName"):
g1_vals = protein_df[protein_df["Group"] == group1]["Intensity"].values
g2_vals = protein_df[protein_df["Group"] == group2]["Intensity"].values
if len(g1_vals) < 2 or len(g2_vals) < 2:
pval = np.nan
else:
_, pval = ttest_ind(g1_vals, g2_vals, equal_var=False)
mean_g1 = np.mean(g1_vals) if len(g1_vals) > 0 else np.nan
mean_g2 = np.mean(g2_vals) if len(g2_vals) > 0 else np.nan
log2fc = np.log2(mean_g2 / mean_g1) if mean_g1 > 0 else np.nan
stats_rows.append({
"ProteinName": protein,
"log2FC": log2fc,
"p-value": pval,
})
stats_df = pd.DataFrame(stats_rows)
if not stats_df.empty:
mask = stats_df["p-value"].notna()
if mask.any():
_, p_adj, _, _ = multipletests(stats_df.loc[mask, "p-value"], method="fdr_bh")
stats_df.loc[mask, "p-adj"] = p_adj
else:
stats_df["p-adj"] = np.nan
# Order samples by group (group2 first, then group1)
# 3. Define ordering and build sample arrays
sample_group_df = df[["Sample", "Group"]].drop_duplicates()
group2_samples = sample_group_df[sample_group_df["Group"] == group2]["Sample"].tolist()
group1_samples = sample_group_df[sample_group_df["Group"] == group1]["Sample"].tolist()
all_samples = group2_samples + group1_samples
# Build pivot table
group1_samples = sample_group_df[sample_group_df["Group"] == groups[0]][
"Sample"
].tolist()
group2_samples = sample_group_df[sample_group_df["Group"] == groups[1]][
"Sample"
].tolist()
all_samples = group1_samples + group2_samples
groups = sorted(df["Group"].unique())
if len(groups) < 2:
return None
# 3. Define ordering and build sample arrays
sample_group_df = df[["Sample", "Group"]].drop_duplicates()
all_samples = []
for group in groups:
all_samples.extend(
sample_group_df.loc[sample_group_df["Group"] == group, "Sample"].tolist()
)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/common/results_helpers.py` around lines 234 - 246, The sample grouping
logic in the results helper is truncating to only the first two groups, which
drops any additional groups before downstream processing. Update the grouping
and sample selection in the results-building flow around the unique symbols df,
sample_group_df, group1_samples, group2_samples, and all_samples so it preserves
every group present in the data instead of hard-coding groups[0] and groups[1].
Make sure the pivot/multi-group path receives the full set of samples and
groups, with no silent filtering when more than two groups exist.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
requirements.txt (1)

152-153: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Drop the duplicate polars requirement.

polars is already pinned with a floor at Line 144. Keeping a second bare entry here is redundant and makes future version updates easier to miss.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@requirements.txt` around lines 152 - 153, Remove the redundant bare polars
entry from the requirements list, since polars is already specified earlier with
a version floor. Keep the pinned occurrence only and update the dependency block
so the requirements remain singular and easier to maintain.
content/results_volcano.py (2)

44-84: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Wrap the threshold controls and render path in @st.fragment.

Both sliders currently rerun the full page and rebuild the component on every change. A fragment keeps those interactions scoped to the plot.

As per coding guidelines, **/*.py: Use @st.fragment decorator for interactive UI updates without full page reloads.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/results_volcano.py` around lines 44 - 84, The threshold sliders and
VolcanoPlot render path are triggering full page reruns instead of a scoped
update. Move the UI block in results_volcano.py into a Streamlit fragment by
wrapping the section that creates fc_thresh, p_thresh, VolcanoPlot, and the call
through volcano_plot_component in a function decorated with `@st.fragment`, so
slider changes only refresh the plot area and not the entire page.

Source: Coding guidelines


61-73: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Set an adjusted-p-value Y label explicitly.

This plot is driven by p-adj, but the generated quantms_volcano_plot/manifest.json still labels the Y axis as -log10(p-value). That will misstate what the chart is showing unless you override the label here.

Possible fix
 volcano_plot_component = VolcanoPlot(
     cache_id="quantms_volcano_plot",
     data=volcano_pl_lazy,
     log2fc_column="log2FC",
     pvalue_column="p-adj",        
     label_column="ProteinName",   
+    y_label="-log10(adjusted p-value)",
     up_color="`#E74C3C`",          
     down_color="`#3498DB`",        
     ns_color="`#95A5A6`",          
     show_threshold_lines=True,
     threshold_line_style="dash",
 )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/results_volcano.py` around lines 61 - 73, The VolcanoPlot setup is
using p-adj as the p-value source but still leaves the generated Y-axis label
misleading. Update the VolcanoPlot constructor in results_volcano.py to
explicitly set the Y-axis label for adjusted p-values, using the component’s
label override option rather than the default -log10(p-value) text. Keep the
existing cache_id, pvalue_column, and other settings unchanged.
content/results_heatmap.py (1)

45-97: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Wrap the slider-driven render path in @st.fragment.

Changing top_n reruns the full page, including data prep and component setup. Moving this block into a fragment keeps the update local and matches the repo pattern.

As per coding guidelines, **/*.py: Use @st.fragment decorator for interactive UI updates without full page reloads.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/results_heatmap.py` around lines 45 - 97, The `top_n` slider-driven
heatmap render path in `results_heatmap.py` is causing a full-page rerun instead
of a local UI update. Wrap the block that builds `var_series`, `top_proteins`,
`heatmap_z`, `melted_df`, and renders `Heatmap`/`state_manager` in an
`@st.fragment`-decorated function so the `st.slider` interaction updates only
that section. Keep the existing logic and symbols like `top_n`, `Heatmap`, and
`state_manager`, but move them into the fragment to match the repo’s interactive
update pattern.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@content/results_heatmap.py`:
- Around line 45-97: The `top_n` slider-driven heatmap render path in
`results_heatmap.py` is causing a full-page rerun instead of a local UI update.
Wrap the block that builds `var_series`, `top_proteins`, `heatmap_z`,
`melted_df`, and renders `Heatmap`/`state_manager` in an
`@st.fragment`-decorated function so the `st.slider` interaction updates only
that section. Keep the existing logic and symbols like `top_n`, `Heatmap`, and
`state_manager`, but move them into the fragment to match the repo’s interactive
update pattern.

In `@content/results_volcano.py`:
- Around line 44-84: The threshold sliders and VolcanoPlot render path are
triggering full page reruns instead of a scoped update. Move the UI block in
results_volcano.py into a Streamlit fragment by wrapping the section that
creates fc_thresh, p_thresh, VolcanoPlot, and the call through
volcano_plot_component in a function decorated with `@st.fragment`, so slider
changes only refresh the plot area and not the entire page.
- Around line 61-73: The VolcanoPlot setup is using p-adj as the p-value source
but still leaves the generated Y-axis label misleading. Update the VolcanoPlot
constructor in results_volcano.py to explicitly set the Y-axis label for
adjusted p-values, using the component’s label override option rather than the
default -log10(p-value) text. Keep the existing cache_id, pvalue_column, and
other settings unchanged.

In `@requirements.txt`:
- Around line 152-153: Remove the redundant bare polars entry from the
requirements list, since polars is already specified earlier with a version
floor. Keep the pinned occurrence only and update the dependency block so the
requirements remain singular and easier to maintain.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ffb4a024-5385-44da-b83e-14d74207c3d2

📥 Commits

Reviewing files that changed from the base of the PR and between f2e0acb and ec4abbd.

⛔ Files ignored due to path filters (3)
  • quantms_protein_heatmap/preprocessed/level_0.parquet is excluded by !**/*.parquet
  • quantms_protein_heatmap/preprocessed/level_1.parquet is excluded by !**/*.parquet
  • quantms_volcano_plot/preprocessed/volcanoData.parquet is excluded by !**/*.parquet
📒 Files selected for processing (8)
  • app.py
  • content/results_heatmap.py
  • content/results_pca.py
  • content/results_volcano.py
  • quantms_protein_heatmap/manifest.json
  • quantms_volcano_plot/manifest.json
  • requirements.txt
  • src/common/results_helpers.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/common/results_helpers.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant