Release/5.70.2 #69
Merged
fix(project-settings): add toast success + Sync OS -> enterprise See merge request dkinternal/testgen/dataops-testgen!534
- create_connection
- update_connection
- test_connection
feat(mcp): add tools to manage database connections See merge request dkinternal/testgen/dataops-testgen!529
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
feat(mcp): accept hygiene issue id in source data tools (TG-1087) See merge request dkinternal/testgen/dataops-testgen!533
- create_table_group
- update_table_group
- add_tables_to_group
- preview_table_group
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(mcp): add tools to manage table groups See merge request dkinternal/testgen/dataops-testgen!530
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fix: harden profiling/test-execution against NULL connection.max_query_chars (TG-1114) See merge request dkinternal/testgen/dataops-testgen!536
…hygiene
- Label run identifiers by kind (Test Run / Profiling Run) in MCP output instead of "Job ID"
- Single JOB_STATUS_LABEL in common.enums; run-summary models alias it
- Strip enum descriptions from the OpenAPI schema so enum docstrings don't leak into the public docs
- Add PublicJobKey so the API exposes only externally-triggerable job kinds; remove dead backfill job source
- Generalize an OAuth route docstring; drop an incidental JobKey re-export
TG-1116
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Miscellaneous cleanup: MCP run-kind labels, shared run-status labels, API schema hygiene See merge request dkinternal/testgen/dataops-testgen!537
Add a job_execution relationship to TestRun/ProfilingRun and migrate the entity-instance lifecycle readers (notification trigger logic, monitor end-time, CLI run summary) to read status/timestamps through it. Unify the profiling-run email onto the job-execution-sourced context (lowercase JobStatus + status_label) and retire the now-dead format_status helper. No behavior change; prepares for run-table deduplication (TG-1047).
TG-1115
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a Target Project dropdown to the Copy/Move Test dialog so tests can be copied or moved into table groups in another project. Switching the project re-scopes the Target Table Group and Test Suite options and resets downstream selections.
- Adds Authentication.get_projects_with_permission as the hook that populates the dropdown. Default returns all projects.
- on_copy_confirmed and on_move_confirmed re-resolve the target table group's project_code and revalidate the user's permission server-side before mutating, since the frontend selection is not trustworthy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
feat(test-definitions): support copy/move across projects (TG-1075) See merge request dkinternal/testgen/dataops-testgen!531
…d-job-execution-relationship-to-run-models
- get_failure_summary: validate group_by against the allowed set instead of falling through to a generic error
- list_test_suites: use the standard not-found-or-not-accessible wording on access denial; keep the specific message for the access-granted empty case
- get_failure_summary/get_failure_trend: reject a cross-project or inaccessible test suite / table group instead of silently returning empty; brings get_failure_trend up to the same validation bar
- MCP auth boundary: surface a clean authentication error when the token's user no longer exists or the token was revoked, instead of a generic "unexpected error" plus an error-level traceback. Introduce an AuthError domain exception raised by decode_jwt_token/authorize_token; REST's 401 path is unchanged
TG-1119
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a stable_passes counter to RunDiff, incremented when a shared test definition is Passed in both the baseline and target runs, and render a Stable passes row in the compare_test_runs summary table. The count is the largest bucket in a healthy suite and the natural denominator for the failure counts. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
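A minimal sketch of the stable-pass counting described above. Only `RunDiff` and `stable_passes` come from the commit message; the other counters, the `diff_runs` helper, and the dict-of-statuses input shape are hypothetical, and every non-"Passed" status is treated as a failure for simplicity.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RunDiff:
    """Illustrative stand-in; the real RunDiff is assumed to carry more fields."""
    stable_passes: int = 0
    new_failures: int = 0         # hypothetical counter
    resolved_failures: int = 0    # hypothetical counter
    persistent_failures: int = 0  # hypothetical counter


def diff_runs(baseline: Dict[str, str], target: Dict[str, str]) -> RunDiff:
    """Compare statuses of test definitions present in both runs."""
    diff = RunDiff()
    for test_id in baseline.keys() & target.keys():  # shared definitions only
        before, after = baseline[test_id], target[test_id]
        if before == "Passed" and after == "Passed":
            diff.stable_passes += 1
        elif before == "Passed":
            diff.new_failures += 1
        elif after == "Passed":
            diff.resolved_failures += 1
        else:
            diff.persistent_failures += 1
    return diff
```

Definitions that appear in only one run are ignored here, which matches the "shared test definition" wording but is otherwise an assumption about how the real diff treats them.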
fix(mcp): consistent error surfacing and scope validation See merge request dkinternal/testgen/dataops-testgen!541
…p-compare-runs-stable-pass-count
…d-job-execution-relationship-to-run-models
…enterprise' feat(mcp): surface stable-pass count in compare_test_runs See merge request dkinternal/testgen/dataops-testgen!542
…d-job-execution-relationship-to-run-models
CAT measures substitute baseline params (counts, averages, standard deviations) and tolerances directly into SQL. When a test definition has an empty value, the substitution produced invalid SQL such as CAST( AS FLOAT) — erroring the test and poisoning the aggregated CAT batch (forcing a slow single rerun). Add a shared null_if_empty() helper and apply it to the numeric baseline params and tolerances in both the execution query builder and the source data lookup service. BASELINE_VALUE (used as a quoted literal, a number, or an IN-list) and Freshness_Trend's BASELINE_SUM (a quoted timestamp the template handles via NULLIF) are left as-is, since "NULL" is not valid SQL in those contexts. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
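The failure mode above can be illustrated with a small sketch. The name `null_if_empty` comes from the commit message, but its body, the `render_measure` helper, and the `{BASELINE_AVG}` placeholder syntax are assumptions made for illustration, not the project's actual template mechanism.

```python
def null_if_empty(value) -> str:
    """Return the SQL literal NULL when a numeric parameter is missing or blank."""
    if value is None:
        return "NULL"
    text = str(value).strip()
    return text if text else "NULL"


def render_measure(template: str, **params) -> str:
    """Hypothetical substitution step: guard every numeric parameter."""
    guarded = {name: null_if_empty(value) for name, value in params.items()}
    return template.format(**guarded)


# Without the guard, an empty baseline renders "CAST( AS FLOAT)", which is invalid SQL.
unguarded = "CAST({BASELINE_AVG} AS FLOAT)".format(BASELINE_AVG="")
guarded = render_measure("CAST({BASELINE_AVG} AS FLOAT)", BASELINE_AVG="")
```

As the commit message notes, this guard only suits contexts where a bare NULL is valid SQL, which is why quoted literals and IN-lists are excluded.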
…s it
SQL Server and PostgreSQL reject TABLESAMPLE on views ("can only be used with
local tables" / "tables and materialized views"), so profiling a table group
containing a view with sampling enabled errored those columns. Add a
samples_views flavor capability (False for mssql/postgresql) and an object_type
signal from those flavors' DDF, and skip sampling for views on those flavors —
they are profiled in full.
Also compute the per-run sampling params once and share them between column
profiling and frequency analysis. Frequency analysis previously never sampled
(the secondary query's TABLESAMPLE branch was unreachable); it now samples the
same tables, with the view-skip applied. Sample-scale frequency counts are left
unscaled.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
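The view-skip decision described in this commit can be sketched as follows. The `samples_views` capability and the `object_type` signal are named in the message; the `SAMPLES_VIEWS` lookup table, the `should_sample` function, and the default of sampling views on other flavors are assumptions for illustration.

```python
from enum import Enum
from typing import Optional


class ObjectType(str, Enum):
    TABLE = "TABLE"
    VIEW = "VIEW"


# Hypothetical capability table: flavors whose sample clause rejects views.
SAMPLES_VIEWS = {"mssql": False, "postgresql": False}


def should_sample(flavor: str, object_type: Optional[ObjectType], sampling_enabled: bool) -> bool:
    """Decide once per table whether the sample clause may be applied."""
    if not sampling_enabled:
        return False
    # Assumption: flavors not listed can sample views, so they default to True.
    if object_type == ObjectType.VIEW and not SAMPLES_VIEWS.get(flavor, True):
        return False  # the view is profiled in full instead
    return True
```

Computing this decision once per run and passing the result to both column profiling and frequency analysis is what keeps the two stages sampling the same set of tables.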
- Data Catalog: show 'Object Type' (title-cased) as the first attribute on the table Characteristics card, mirroring 'Data Type' for columns. Sourced from data_table_chars.object_type via get_tables_by_condition.
- Table group form: under Sampling Parameters, show a yellow warning icon + note that views are profiled in full, only for flavors whose sample clause skips views (SQL Server, PostgreSQL) and only when sampling is enabled.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ls' into 'enterprise' refactor(runs): read run lifecycle through a job_execution relationship See merge request dkinternal/testgen/dataops-testgen!540
Address review feedback on MR !547:
- Narrow mssql sampleable_object_types to {TABLE} — SQL Server has no materialized
views, so the MATERIALIZED_VIEW entry never matched.
- Annotate object_type as ObjectType (not str) on ColumnChars and the DataTable
model/overview, per the StrEnum-everywhere convention.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… Settings tabs
The dialog had grown long and led with optional configuration ahead of the core test parameters. Restructure it:
- Keep the test type name, description, and usage notes as a header above the content.
- Add "Parameters" and "Settings" tabs. Parameters holds schema/table/column and the test-type parameters; Settings holds the description override, the external URL / custom metadata, the active/lock flags, and the severity/observability/impact-dimension overrides.
- Drop the now-redundant test type name/description that repeated inside the form (via a new opt-in hideHeader on TestDefinitionForm; the monitors editor keeps its header).
- Add an optional external activeTab state to the shared Tabs component and hoist it to the dialog so the selected tab survives the form's re-render on each field change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The tab panels render into the Tabs component's content container, which has no flex gap, so the per-field spacing the dialog had as a flex-column was lost. Wrap each tab's fields in a flex-column fx-gap-3 container to restore it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nderline
Address review feedback on the click-through link:
- Link referenced Tooltip without importing it, so any Link given a `tooltip` (the test-results external-link cell) threw at render. Import it from tooltip.js.
- In the result test-definition summary, the external URL used `underline` plus a full-width link, so the hover underline spanned the panel and read as a divider. Drop the underline (the open_in_new icon is enough affordance) and size the link to content (max-width instead of width) so long URLs still wrap.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rise' TG-1134: External URL and custom metadata on test definitions See merge request dkinternal/testgen/dataops-testgen!576
…nto 'enterprise' feat(oauth): personal access tokens and token management UI See merge request dkinternal/testgen/dataops-testgen!573
Three new MCP write tools in core and one in the enterprise project-management plugin, so an LLM can manage projects and test suites without falling back to the UI:
- update_project (core, administer permission): updates the full per-project settings surface exposed in the UI: display name, weighted DQ scoring toggle, DataOps Observability URL + key, data retention enable/days, and retention schedule cron + timezone. Side effects mirror the UI: weights flip triggers a background score recalc, retention toggles upsert/delete the scheduled cleanup job. observability_api_key is consumed but never echoed back (mcp-patterns secrets rule).
- create_test_suite (core, edit permission): under a table group. Accepts the full UI-editable surface in one call (name, description, default severity, observability export toggle, DataOps Observability component fields, and scoring exclusion) so the LLM doesn't need a follow-up update_test_suite to finish configuring a new suite. Always non-monitor.
- update_test_suite (core, edit permission): same surface, partial updates. Resolves through the existing monitor-filtered resolve_test_suite so a monitor-suite ID surfaces the unified "not found or not accessible" wording. Empty-string args on NullIfEmptyString columns normalize to None on the way in so the diff never reports a phantom change.
create_project (enterprise plugin, global_admin) registers through the TG-1124 plugin MCP-tools hook in testgen_project_management. Mirrors the UI's "Add Project" handler: persists the row and schedules the data-retention cleanup job at the project's defaults so MCP-created projects behave the same as UI-created ones.
Project creation is enterprise-only because it belongs to the project-management feature. global_admin is a User flag, not a per-project role, so the core mcp_permission decorator was extended to special-case it. Every other call site stays unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-1070 review S5)
The diff-table render loop was duplicated in three places (table_groups, projects, test_suites). The verify_access + Project.get + None-check trio in update_project also duplicated the resolve_* idiom already used for connections, table groups, and test suites. Both moves were flagged in the TG-1070 review.
- render_diff_table(doc, before, after, *, attrs, labels, secret_attrs, value_renderer) in common.py owns the changed-attrs iteration, label lookup, secret redaction, and table emit. Each tool retains its own _DIFF_ORDER / _DIFF_LABELS and its own snapshot function (different shapes per entity).
- resolve_project(project_code) joins the existing resolve_* family. Uses Project.get(code, Project.project_code.in_(allowed_codes)) so the unified not-found-or-not-accessible error path matches resolve_test_suite and resolve_table_group: out-of-scope projects never reveal whether they exist.
update_connection keeps its own custom diff renderer: the "[secret] (rotated)" cue it emits for secret writes is semantically distinct from the generic "[secret]" redaction and isn't worth bending the helper to model both.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
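A rough sketch of what a shared diff-table helper with this signature might look like. The parameter names follow the commit message, but the body, the use of a plain list of strings for `doc`, the default arguments, and the Markdown table layout are all assumptions; the real helper in common.py may differ.

```python
from typing import Callable, Dict, Iterable, List


def render_diff_table(
    doc: List[str],
    before: Dict[str, object],
    after: Dict[str, object],
    *,
    attrs: Iterable[str],
    labels: Dict[str, str],
    secret_attrs: Iterable[str] = (),
    value_renderer: Callable[[object], str] = str,
) -> None:
    """Append one table row per changed attribute, redacting secret values."""
    secrets = set(secret_attrs)
    changed = [attr for attr in attrs if before.get(attr) != after.get(attr)]
    if not changed:
        doc.append("No changes.")
        return
    doc.append("| Field | Before | After |")
    doc.append("| --- | --- | --- |")
    for attr in changed:
        if attr in secrets:
            old = new = "[secret]"  # never echo secret values back
        else:
            old = value_renderer(before.get(attr))
            new = value_renderer(after.get(attr))
        doc.append(f"| {labels.get(attr, attr)} | {old} | {new} |")
```

Each tool would still pass its own attribute order and labels, which is how the per-entity snapshot shapes stay outside the helper.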
feat(mcp): add CRUD write tools for projects and test suites (TG-1070) See merge request dkinternal/testgen/dataops-testgen!570
Adds a data_classification tag field to the data model, supporting three-level inheritance (column → table → table_group) throughout the application, consistent with existing tag fields.
- Migration 0193: ADD COLUMN data_classification to table_groups, data_table_chars, data_column_chars, score_definition_results_breakdown; recreate 8 scoring views with COALESCE inheritance
- Schema setup: ADD COLUMN and COALESCE in all 8 standard views
- ORM: data_classification added to TableGroup model and ScoreDefinitionBreakdownItem / ScoreCategory / SCORE_CATEGORIES
- Queries: COALESCE inheritance in profiling anomaly, test result, and scoring queries; added to TAG_FIELDS and scoring categories list
- Data Catalog: fix tag inheritance in list view (join table_groups in get_table_group_columns); fix CSV export to COALESCE from table_groups; add seeded filter defaults (Public/Internal/Confidential/Restricted) via merge_tag_defaults; Excel export column
- Table Group Form: Data Classification input in TaggingForm
- JS: TAG_KEYS/TAG_HELP in metadata_tags.js; three-level fallback chain in data_catalog.js
- Reports: data_classification in PDF hygiene and test result report metadata
- Import: "data classification" header mapped in import_metadata_dialog
- Tests: 5 unit tests for merge_tag_defaults dedupe logic
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
test: update TAG_FIELDS count to 9 after adding data_classification
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fix(TG-1085): address round-2 review comments
feat(TG-1085): add data_classification tag field See merge request dkinternal/testgen/dataops-testgen!547
… schema-change audit (TG-1092)
Three drill-down tools on top of the L1 group / table inventory:
- list_monitor_events(table_group_id, table_name, monitor_type, monitor_id?,
include_predictions?, limit?, page?) — per-table event history scoped to one
monitor type. monitor_id is required when monitor_type="metric" (Metric is
the only multi-instance type — singletons reject monitor_id with a clear
pointer). Per-type column shapes:
* Volume: Time | Status | Row count | Lower bound | Upper bound
* Freshness: Time | Status | Update detected | Detail
(parsed from the SQL template's structured message)
* Schema: Time | Status | Table change | Columns added | Columns dropped | Columns modified
* Metric: Time | Status | Value | Lower bound | Upper bound
(metric name in the heading, not as a column)
include_predictions=True appends a separate `## Forecast` section listing
future timestamps with predicted bounds — never interleaved into the
historical events. Schema short-circuits to "not applicable"; non-
Prediction-Model monitors render a "not available" note.
- list_monitors(table_group_id, table_name) — configured monitors for a
table. Per row: monitor_id, type (Title Case, reuses _MONITOR_LABEL),
metric_name, threshold mode (derived from history_calculation:
"PREDICT" → Prediction; non-empty otherwise and not Freshness →
Historical; empty → Static), bounds, and the metric expression for
Metric monitors. Sensitivity is surfaced as a top-level "Prediction
model sensitivity" field.
- list_monitor_schema_changes(table_group_id, table_name?, since?,
limit?, page?) — newest-first audit log of column-level schema events
(added / dropped / modified), independent of monitor-run results.
All three follow the L1 contract: gated on view, resolve_monitored_table_group
returns the literal "This table group is not monitored." when the group has
no linked monitor suite, and never expose internal test_type codes,
test_definition_id, or test_suite_id on the surface.
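The threshold-mode rule described for list_monitors can be sketched as a small function. `history_calculation` and the three mode labels come from the commit message; the `derive_threshold_mode` name, the enum class shown here, and the choice to treat a Freshness monitor with a non-"PREDICT" value as Static are assumptions, since the message does not spell out that case.

```python
from enum import Enum
from typing import Optional


class ThresholdMode(str, Enum):
    PREDICTION = "Prediction"
    HISTORICAL = "Historical"
    STATIC = "Static"


def derive_threshold_mode(history_calculation: Optional[str], monitor_type: str) -> ThresholdMode:
    """Map a monitor's history_calculation setting to a display label."""
    calc = (history_calculation or "").strip()
    if calc == "PREDICT":
        return ThresholdMode.PREDICTION
    if calc and monitor_type != "Freshness":
        return ThresholdMode.HISTORICAL
    # Empty value, or (by assumption) a Freshness monitor with another value.
    return ThresholdMode.STATIC
```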
Backing model methods:
- DataStructureLog ORM wraps the existing data_structure_log audit table
(no ORM mapping until now) and exposes list_for_table_group(*, table_name,
since, until, page, limit). The dashboard's get_data_structure_logs
delegates here.
- TestDefinition.list_monitor_configs_for_table returns MonitorConfig with
threshold_mode derived per row.
- TestDefinition.get_singleton_monitor(suite, table, type) for the non-
metric forecast lookup path.
- forecast_points_from_prediction(prediction, sensitivity) — standalone
helper that reads forecast rows from the prediction JSONB using epoch-ms
keys with numeric comparison (matches the dashboard's reader).
- TestResult.list_monitor_events_for_table runs the dashboard's CTE under
the ORM with optional monitor_type filtering. ORDER BY includes
results.id NULLS LAST, active_runs.id as a stable tiebreaker so the
Python-side pagination doesn't duplicate or skip rows on test_time ties.
- TestResult.list_metric_monitor_events — separate, simpler query path
for Metric (no run-by-type CROSS JOIN, no synthesized pending rows).
The dashboard's per-type events transform stays in monitors_dashboard.py
because its shape is bespoke to the chart payload; centralizing just the
SQL would still require a non-trivial adapter on the dashboard side.
monitor_id joins the Followable IDs table as an alias for the underlying
test_definition.id UUID — the abstraction stays stable if monitors move
to a dedicated table later.
Status helpers (_summary_status, _format_monitor_cell, _format_schema_cell,
_event_status) return Title Case ("Error", "Pending", "Training",
"Anomaly", "Ok"), matching the rest of the MCP layer. parse_monitor_type
accepts either case so values round-trip.
_parse_kv_pairs splits input_parameters on ";" (matches the writer side
at execute_tests_query.py:321 and the dashboard's dict_from_kv).
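For illustration, a ";"-delimited parser in the spirit of _parse_kv_pairs might look like the sketch below. Only the ";" pair separator is stated in the commit message; the "=" key/value separator, the `parse_kv_pairs` name, and the handling of blank or malformed segments are assumptions.

```python
from typing import Dict, Optional


def parse_kv_pairs(raw: Optional[str]) -> Dict[str, str]:
    """Split 'k1=v1;k2=v2' style text into a dict, skipping malformed segments."""
    pairs: Dict[str, str] = {}
    for segment in (raw or "").split(";"):
        # Partition on the first "=" only, so values may themselves contain "=".
        key, sep, value = segment.partition("=")
        key = key.strip()
        if not sep or not key:
            continue  # blank segment or no separator: ignore it
        pairs[key] = value.strip()
    return pairs
```

Matching the delimiter used on the writer side matters because a reader that splits on a different character would silently merge or truncate parameters.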
Closes TG-1092
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tools (TG-1092)
Review follow-ups on the monitor L2 read tools:
- list_monitor_events: validate the metric monitor_id as a UUID and scope the lookup to the suite + table + Metric_Trend, so a monitor_id belonging to a metric on another table or suite can't be rendered under the wrong heading.
- Forecasts now mirror the dashboard exactly. The forecast logic (predicted next-update window, coupled baseline-then-refresh band) moved into the UI-agnostic common.monitor_forecast module shared by the dashboard and the MCP tool. Volume/Metric monitors coupled to a Freshness monitor render the same band the UI plots; Freshness renders its predicted next-update window. No internal coupling terminology is exposed.
- Threshold-mode labels are a ThresholdMode StrEnum; DataStructureLog.list_for_table_group takes *clauses like its siblings (the dashboard keeps its exclusive lower-bound schema-change window).
- Type the forecast renderer, fix stale docstrings, and make the forecast-pending note accurate (a forecast appears once the Prediction Model has trained on enough history, not after the first run).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…otes (TG-1092)
Addresses review findings on the forecast work:
- Render the forecast only on the first page: it is forward-looking and page-independent, but its anchor (the latest event) was read from the current page, producing a stale forecast on page >= 2.
- Match the dashboard's coupled-forecast precondition: a freshness-coupled Volume/Metric monitor with no configured tolerance has no band on either surface (it was entering the coupled path on the coupling flag alone).
- Resolve schedule holidays only after the next-update-window guards pass, restoring the dashboard's short-circuit (no calendar work for non-Prediction monitors); next_update_window now takes the test suite.
- Exclude Error-status freshness events from the next-update anchor, matching the dashboard.
- Use an accurate note when a forecast is absent for a non-training reason (elapsed window / no baseline) instead of telling a trained monitor to keep training.
- Fix the dashboard unit test that imported the relocated forecast helper (this broke the Python Tests CI job), and add coverage for the coupled-without-tolerance case.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
feat(mcp): add monitor L2 read tools — per-table events, configs, and schema-change audit (TG-1092) See merge request dkinternal/testgen/dataops-testgen!563
…entation send_event enqueues to a bounded queue drained by a daemon worker batching up to 50 events; every MCP tool call auto-emits an mcp-tool-call event, with bounded shutdown drain. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
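The pattern described here (bounded queue, daemon worker, batches of up to 50, bounded shutdown drain) can be sketched with the standard library. Only `send_event` and the batch size of 50 come from the commit message; the `EventSender` class, the `flush` callback, the queue capacity, the drop-when-full behaviour, and the timeouts are assumptions made for illustration.

```python
import queue
import threading
from typing import Callable, List


class EventSender:
    def __init__(self, flush: Callable[[List[dict]], None], max_queue: int = 1000, batch_size: int = 50):
        self._queue: "queue.Queue[dict]" = queue.Queue(maxsize=max_queue)
        self._flush = flush  # hypothetical callback that posts one batch
        self._batch_size = batch_size
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def send_event(self, event: dict) -> bool:
        """Enqueue without blocking the caller; drop the event if the queue is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False

    def _next_batch(self) -> List[dict]:
        try:
            batch = [self._queue.get(timeout=0.1)]  # wait briefly for the first event
        except queue.Empty:
            return []
        while len(batch) < self._batch_size:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch

    def _run(self) -> None:
        # Keep draining until shutdown was requested and nothing is left.
        while not (self._stop.is_set() and self._queue.empty()):
            batch = self._next_batch()
            if batch:
                self._flush(batch)

    def shutdown(self, timeout: float = 2.0) -> None:
        """Ask the worker to finish and wait at most `timeout` seconds for the drain."""
        self._stop.set()
        self._worker.join(timeout)
```

Because the worker is a daemon thread and the join is time-limited, a slow telemetry endpoint can delay process exit by at most the shutdown timeout, at the cost of possibly losing events still queued.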
MCP & UI: async batched Mixpanel telemetry (TG-1076) See merge request dkinternal/testgen/dataops-testgen!571
The freshness-gated forecast band collapsed to a zero-width point at the flat anchor (lower=upper=baseline), rendering as an hourglass for a future next-update window and as a band detached from history for an imminent one. Carry the next-refresh tolerance across the whole forecast so the band stays a continuous width that connects to the historical band; the mean line still holds flat at baseline then steps. Display-only — no effect on anomaly evaluation. TG-1131 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fix: misc qa fixes See merge request dkinternal/testgen/dataops-testgen!579
…chedule id display (TG-1135)
- list_table_groups: join job_executions on profiling_runs.id (the shared-PK FK) instead of a non-existent profiling_runs.job_execution_id column, which raised a 500.
- get_profiling_run: render the hygiene breakdown from HygieneIssue.count_for_run so Potential PII stays separate from the "possible" likelihood bucket, matching the REST profiling-run issue_counts.
- create_test_run_schedule / create_profiling_schedule: flush after save so the default-generated JobSchedule id is populated before rendering (was "Schedule ID: —").
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HE8KakxW43wFn3pW5cqV5b
…ene rendering (TG-1135) get_profiling_run now reads the hygiene breakdown from HygieneIssue.count_for_run instead of the select_summary buckets, so the unit test mocks count_for_run and asserts the separated likelihood / Potential PII rendering. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HE8KakxW43wFn3pW5cqV5b
fix(mcp): list_table_groups 500, profiling-run PII counts, schedule id display See merge request dkinternal/testgen/dataops-testgen!580
datakitchen-devops approved these changes Jun 30, 2026
Features
Bug Fixes
Refactors