fix: restart MCP container on KB config change by moizpgedge · Pull Request #414 · pgEdge/control-plane

moizpgedge · 2026-06-18T17:32:11Z

KB config changes (path, provider, model, key) had no effect after an update: config.yaml is bind-mounted so SIGHUP-only reload kept the old KB path, causing "file not found" / SQLite errors. Embed a PGEDGE_CONFIG_VERSION hash in the MCP container env (like RAG) so a config change forces a restart and re-initializes the KB.

Also refresh the service instance's UpdatedAt before an update redeploy so the monitor doesn't briefly mark it failed during the restart.

coderabbitai · 2026-06-18T17:32:26Z

Warning

Review limit reached

@moizpgedge, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 45 minutes and 56 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 96d63a95-a2f2-43d2-9abf-e59438f17d53

📥 Commits

Reviewing files that changed from the base of the PR and between cef2f2d and 5726453.

📒 Files selected for processing (3)

server/internal/orchestrator/swarm/service_instance.go
server/internal/orchestrator/swarm/service_spec.go
server/internal/orchestrator/swarm/service_spec_test.go

📝 Walkthrough

Walkthrough

Introduces a shared serviceConfigHash helper that replaces ragConfigHash, and uses it to inject PGEDGE_CONFIG_VERSION into MCP container environment variables, causing Docker Swarm to restart MCP containers on config changes. ServiceInstanceResource.Update now refreshes the service instance state to running before redeployment to prevent false failure detection during the restart window.

Changes

MCP config-driven Swarm restart and pre-redeploy state refresh

Layer / File(s)	Summary
`serviceConfigHash` helper and `PGEDGE_CONFIG_VERSION` for MCP and RAG `server/internal/orchestrator/swarm/service_spec.go`, `server/internal/orchestrator/swarm/service_spec_test.go`	Renames `ragConfigHash` to `serviceConfigHash` and uses it for both MCP and RAG service specs. MCP container specs now include `PGEDGE_CONFIG_VERSION` computed from `ServiceSpec.Config`, so any config change causes a Swarm TaskTemplate update and container restart. Tests are updated to expect the env var on MCP specs and to call `serviceConfigHash`; a new test `TestServiceContainerSpec_MCPHasConfigVersionEnv` asserts the env var value tracks `kb_database_host_path` mutations.
Pre-redeploy state refresh and changelogs `server/internal/orchestrator/swarm/service_instance.go`, `changes/unreleased/Fixed-20260618-160000.yaml`, `changes/unreleased/Fixed-20260618-170000.yaml`	`ServiceInstanceResource.Update` sets the service instance state to `running` (best-effort) immediately before calling `deploy`, so the health monitor's grace period covers the container restart window and instances are not marked as failed. Two changelog entries document the MCP KB config restart fix and the false-failure prevention during update-driven restarts.

Poem

A hoppy fix for the MCP's plight,
Config hash changes now restart things right!
No more SIGHUP left all alone,
The instance state's refreshed before redeployment's zone.
🐇✨ Swarm restarts, grace periods hold—
KB settings take effect, as they were told!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'fix: restart MCP container on KB config change' clearly and specifically summarizes the main change in the changeset—implementing a mechanism to restart the MCP container when KB configuration changes.
Description check	✅ Passed	The PR description includes a clear summary of the problem and solution, explains the changes made, and includes release note entries. While the formal checklist items are not explicitly shown, the core required information is present.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch Bug/PLAT-655/kb_database_host_path-change-not-applied-on-update

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codacy-production · 2026-06-18T17:33:30Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 0 duplication

Metric Results

Duplication 0

View in Codacy

_{NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer}
_{TIP This summary will be updated as you push new changes.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/internal/orchestrator/swarm/service_instance.go`:
- Around line 147-154: The error handling in the pre-redeploy state-refresh
block is swallowing failures silently, making regressions opaque when the
injector or state-update calls fail. Instead of ignoring the error from
SetServiceInstanceState with underscore, capture the error and log it using
structured logging with zerolog when it occurs. Additionally, log a message when
the injector invocation fails (the first err from do.Invoke). Keep the operation
non-blocking so redeploy continues, but ensure both the injector failure and the
SetServiceInstanceState error are logged with appropriate context so
restart-window issues can be debugged effectively.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fdc38449-72f9-4392-a553-82e10f30b06f

📥 Commits

Reviewing files that changed from the base of the PR and between 847bb00 and cef2f2d.

📒 Files selected for processing (5)

changes/unreleased/Fixed-20260618-160000.yaml
changes/unreleased/Fixed-20260618-170000.yaml
server/internal/orchestrator/swarm/service_instance.go
server/internal/orchestrator/swarm/service_spec.go
server/internal/orchestrator/swarm/service_spec_test.go

coderabbitai · 2026-06-18T17:36:26Z

+	// Refresh UpdatedAt before redeploying so the monitor's running-state grace
+	// period covers the restart window. An update that changes the container
+	// (e.g. a config change) restarts it, during which the container has no
+	// status; without bumping UpdatedAt the monitor would briefly mark the
+	// instance failed mid-restart even though it recovers seconds later.
+	if dbSvc, err := do.Invoke[*database.Service](rc.Injector); err == nil {
+		_ = dbSvc.SetServiceInstanceState(ctx, s.DatabaseID, s.ServiceInstanceID, database.ServiceInstanceStateRunning)
+	}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Log best-effort state-refresh failures instead of swallowing them silently.

This block intentionally keeps redeploy non-blocking, but ignoring both injector/state-update errors makes restart-window regressions opaque when the pre-refresh fails.

As per coding guidelines, server/internal/**/*.go: “Use structured error logging with zerolog; map domain-specific errors to HTTP status codes via Goa”.

Proposed fix

// Refresh UpdatedAt before redeploying so the monitor's running-state grace // period covers the restart window. An update that changes the container // (e.g. a config change) restarts it, during which the container has no // status; without bumping UpdatedAt the monitor would briefly mark the // instance failed mid-restart even though it recovers seconds later. if dbSvc, err := do.Invoke[*database.Service](rc.Injector); err == nil { - _ = dbSvc.SetServiceInstanceState(ctx, s.DatabaseID, s.ServiceInstanceID, database.ServiceInstanceStateRunning) + if err := dbSvc.SetServiceInstanceState(ctx, s.DatabaseID, s.ServiceInstanceID, database.ServiceInstanceStateRunning); err != nil { + if logger, logErr := do.Invoke[zerolog.Logger](rc.Injector); logErr == nil { + logger.Warn(). + Err(err). + Str("service_instance_id", s.ServiceInstanceID). + Str("database_id", s.DatabaseID). + Msg("failed to refresh service instance running state before redeploy") + } + } }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// Refresh UpdatedAt before redeploying so the monitor's running-state grace

// period covers the restart window. An update that changes the container

// (e.g. a config change) restarts it, during which the container has no

// status; without bumping UpdatedAt the monitor would briefly mark the

// instance failed mid-restart even though it recovers seconds later.

if dbSvc, err := do.Invoke[*database.Service](rc.Injector); err == nil {

_ = dbSvc.SetServiceInstanceState(ctx, s.DatabaseID, s.ServiceInstanceID, database.ServiceInstanceStateRunning)

}

// Refresh UpdatedAt before redeploying so the monitor's running-state grace

// period covers the restart window. An update that changes the container

// (e.g. a config change) restarts it, during which the container has no

// status; without bumping UpdatedAt the monitor would briefly mark the

// instance failed mid-restart even though it recovers seconds later.

if dbSvc, err := do.Invoke[*database.Service](rc.Injector); err == nil {

if err := dbSvc.SetServiceInstanceState(ctx, s.DatabaseID, s.ServiceInstanceID, database.ServiceInstanceStateRunning); err != nil {

if logger, logErr := do.Invoke[zerolog.Logger](rc.Injector); logErr == nil {

logger.Warn().

Err(err).

Str("service_instance_id", s.ServiceInstanceID).

Str("database_id", s.DatabaseID).

Msg("failed to refresh service instance running state before redeploy")

}

}

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/internal/orchestrator/swarm/service_instance.go` around lines 147 - 154, The error handling in the pre-redeploy state-refresh block is swallowing failures silently, making regressions opaque when the injector or state-update calls fail. Instead of ignoring the error from SetServiceInstanceState with underscore, capture the error and log it using structured logging with zerolog when it occurs. Additionally, log a message when the injector invocation fails (the first err from do.Invoke). Keep the operation non-blocking so redeploy continues, but ensure both the injector failure and the SetServiceInstanceState error are logged with appropriate context so restart-window issues can be debugged effectively.

Source: Coding guidelines

KB config changes (path, provider, model, key) had no effect after an update: config.yaml is bind-mounted so SIGHUP-only reload kept the old KB path, causing "file not found" / SQLite errors. Embed a PGEDGE_CONFIG_VERSION hash in the MCP container env (like RAG) so a config change forces a restart and re-initializes the KB. Also refresh the service instance's UpdatedAt before an update redeploy so the monitor doesn't briefly mark it failed during the restart.

* fix: restart MCP container on KB config change KB config changes (path, provider, model, key) had no effect after an update: config.yaml is bind-mounted so SIGHUP-only reload kept the old KB path, causing "file not found" / SQLite errors. Embed a PGEDGE_CONFIG_VERSION hash in the MCP container env (like RAG) so a config change forces a restart and re-initializes the KB. Also refresh the service instance's UpdatedAt before an update redeploy so the monitor doesn't briefly mark it failed during the restart.

moizpgedge requested review from jason-lynch and usman-muzaffar555 June 18, 2026 17:32

coderabbitai Bot reviewed Jun 18, 2026

View reviewed changes

moizpgedge force-pushed the Bug/PLAT-655/kb_database_host_path-change-not-applied-on-update branch from cef2f2d to d6278f8 Compare June 18, 2026 17:38

small fix

5726453

jason-lynch approved these changes Jun 18, 2026

View reviewed changes

moizpgedge merged commit 0a7390c into main Jun 18, 2026
2 of 3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: restart MCP container on KB config change#414

fix: restart MCP container on KB config change#414
moizpgedge merged 2 commits into
mainfrom
Bug/PLAT-655/kb_database_host_path-change-not-applied-on-update

moizpgedge commented Jun 18, 2026

Uh oh!

coderabbitai Bot commented Jun 18, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Poem

❌ Failed checks (1 warning)

Uh oh!

codacy-production Bot commented Jun 18, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

moizpgedge commented Jun 18, 2026

Uh oh!

coderabbitai Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Poem

❌ Failed checks (1 warning)

Uh oh!

codacy-production Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Up to standards ✅

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Jun 18, 2026 •

edited

Loading

codacy-production Bot commented Jun 18, 2026 •

edited

Loading