Skip to content

fix: restart MCP container on KB config change#414

Merged
moizpgedge merged 2 commits into
mainfrom
Bug/PLAT-655/kb_database_host_path-change-not-applied-on-update
Jun 18, 2026
Merged

fix: restart MCP container on KB config change#414
moizpgedge merged 2 commits into
mainfrom
Bug/PLAT-655/kb_database_host_path-change-not-applied-on-update

Conversation

@moizpgedge

Copy link
Copy Markdown
Contributor

KB config changes (path, provider, model, key) had no effect after an update: config.yaml is bind-mounted so SIGHUP-only reload kept the old KB path, causing "file not found" / SQLite errors. Embed a PGEDGE_CONFIG_VERSION hash in the MCP container env (like RAG) so a config change forces a restart and re-initializes the KB.

Also refresh the service instance's UpdatedAt before an update redeploy so the monitor doesn't briefly mark it failed during the restart.

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@moizpgedge, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 45 minutes and 56 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 96d63a95-a2f2-43d2-9abf-e59438f17d53

📥 Commits

Reviewing files that changed from the base of the PR and between cef2f2d and 5726453.

📒 Files selected for processing (3)
  • server/internal/orchestrator/swarm/service_instance.go
  • server/internal/orchestrator/swarm/service_spec.go
  • server/internal/orchestrator/swarm/service_spec_test.go
📝 Walkthrough

Walkthrough

Introduces a shared serviceConfigHash helper that replaces ragConfigHash, and uses it to inject PGEDGE_CONFIG_VERSION into MCP container environment variables, causing Docker Swarm to restart MCP containers on config changes. ServiceInstanceResource.Update now refreshes the service instance state to running before redeployment to prevent false failure detection during the restart window.

Changes

MCP config-driven Swarm restart and pre-redeploy state refresh

Layer / File(s) Summary
serviceConfigHash helper and PGEDGE_CONFIG_VERSION for MCP and RAG
server/internal/orchestrator/swarm/service_spec.go, server/internal/orchestrator/swarm/service_spec_test.go
Renames ragConfigHash to serviceConfigHash and uses it for both MCP and RAG service specs. MCP container specs now include PGEDGE_CONFIG_VERSION computed from ServiceSpec.Config, so any config change causes a Swarm TaskTemplate update and container restart. Tests are updated to expect the env var on MCP specs and to call serviceConfigHash; a new test TestServiceContainerSpec_MCPHasConfigVersionEnv asserts the env var value tracks kb_database_host_path mutations.
Pre-redeploy state refresh and changelogs
server/internal/orchestrator/swarm/service_instance.go, changes/unreleased/Fixed-20260618-160000.yaml, changes/unreleased/Fixed-20260618-170000.yaml
ServiceInstanceResource.Update sets the service instance state to running (best-effort) immediately before calling deploy, so the health monitor's grace period covers the container restart window and instances are not marked as failed. Two changelog entries document the MCP KB config restart fix and the false-failure prevention during update-driven restarts.

Poem

A hoppy fix for the MCP's plight,
Config hash changes now restart things right!
No more SIGHUP left all alone,
The instance state's refreshed before redeployment's zone.
🐇✨ Swarm restarts, grace periods hold—
KB settings take effect, as they were told!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: restart MCP container on KB config change' clearly and specifically summarizes the main change in the changeset—implementing a mechanism to restart the MCP container when KB configuration changes.
Description check ✅ Passed The PR description includes a clear summary of the problem and solution, explains the changes made, and includes release note entries. While the formal checklist items are not explicitly shown, the core required information is present.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch Bug/PLAT-655/kb_database_host_path-change-not-applied-on-update

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codacy-production

codacy-production Bot commented Jun 18, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 0 duplication

Metric Results
Duplication 0

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/internal/orchestrator/swarm/service_instance.go`:
- Around line 147-154: The error handling in the pre-redeploy state-refresh
block is swallowing failures silently, making regressions opaque when the
injector or state-update calls fail. Instead of ignoring the error from
SetServiceInstanceState with underscore, capture the error and log it using
structured logging with zerolog when it occurs. Additionally, log a message when
the injector invocation fails (the first err from do.Invoke). Keep the operation
non-blocking so redeploy continues, but ensure both the injector failure and the
SetServiceInstanceState error are logged with appropriate context so
restart-window issues can be debugged effectively.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fdc38449-72f9-4392-a553-82e10f30b06f

📥 Commits

Reviewing files that changed from the base of the PR and between 847bb00 and cef2f2d.

📒 Files selected for processing (5)
  • changes/unreleased/Fixed-20260618-160000.yaml
  • changes/unreleased/Fixed-20260618-170000.yaml
  • server/internal/orchestrator/swarm/service_instance.go
  • server/internal/orchestrator/swarm/service_spec.go
  • server/internal/orchestrator/swarm/service_spec_test.go

Comment on lines +147 to +154
// Refresh UpdatedAt before redeploying so the monitor's running-state grace
// period covers the restart window. An update that changes the container
// (e.g. a config change) restarts it, during which the container has no
// status; without bumping UpdatedAt the monitor would briefly mark the
// instance failed mid-restart even though it recovers seconds later.
if dbSvc, err := do.Invoke[*database.Service](rc.Injector); err == nil {
_ = dbSvc.SetServiceInstanceState(ctx, s.DatabaseID, s.ServiceInstanceID, database.ServiceInstanceStateRunning)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Log best-effort state-refresh failures instead of swallowing them silently.

This block intentionally keeps redeploy non-blocking, but ignoring both injector/state-update errors makes restart-window regressions opaque when the pre-refresh fails.

As per coding guidelines, server/internal/**/*.go: “Use structured error logging with zerolog; map domain-specific errors to HTTP status codes via Goa”.

Proposed fix
 	// Refresh UpdatedAt before redeploying so the monitor's running-state grace
 	// period covers the restart window. An update that changes the container
 	// (e.g. a config change) restarts it, during which the container has no
 	// status; without bumping UpdatedAt the monitor would briefly mark the
 	// instance failed mid-restart even though it recovers seconds later.
 	if dbSvc, err := do.Invoke[*database.Service](rc.Injector); err == nil {
-		_ = dbSvc.SetServiceInstanceState(ctx, s.DatabaseID, s.ServiceInstanceID, database.ServiceInstanceStateRunning)
+		if err := dbSvc.SetServiceInstanceState(ctx, s.DatabaseID, s.ServiceInstanceID, database.ServiceInstanceStateRunning); err != nil {
+			if logger, logErr := do.Invoke[zerolog.Logger](rc.Injector); logErr == nil {
+				logger.Warn().
+					Err(err).
+					Str("service_instance_id", s.ServiceInstanceID).
+					Str("database_id", s.DatabaseID).
+					Msg("failed to refresh service instance running state before redeploy")
+			}
+		}
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Refresh UpdatedAt before redeploying so the monitor's running-state grace
// period covers the restart window. An update that changes the container
// (e.g. a config change) restarts it, during which the container has no
// status; without bumping UpdatedAt the monitor would briefly mark the
// instance failed mid-restart even though it recovers seconds later.
if dbSvc, err := do.Invoke[*database.Service](rc.Injector); err == nil {
_ = dbSvc.SetServiceInstanceState(ctx, s.DatabaseID, s.ServiceInstanceID, database.ServiceInstanceStateRunning)
}
// Refresh UpdatedAt before redeploying so the monitor's running-state grace
// period covers the restart window. An update that changes the container
// (e.g. a config change) restarts it, during which the container has no
// status; without bumping UpdatedAt the monitor would briefly mark the
// instance failed mid-restart even though it recovers seconds later.
if dbSvc, err := do.Invoke[*database.Service](rc.Injector); err == nil {
if err := dbSvc.SetServiceInstanceState(ctx, s.DatabaseID, s.ServiceInstanceID, database.ServiceInstanceStateRunning); err != nil {
if logger, logErr := do.Invoke[zerolog.Logger](rc.Injector); logErr == nil {
logger.Warn().
Err(err).
Str("service_instance_id", s.ServiceInstanceID).
Str("database_id", s.DatabaseID).
Msg("failed to refresh service instance running state before redeploy")
}
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/internal/orchestrator/swarm/service_instance.go` around lines 147 -
154, The error handling in the pre-redeploy state-refresh block is swallowing
failures silently, making regressions opaque when the injector or state-update
calls fail. Instead of ignoring the error from SetServiceInstanceState with
underscore, capture the error and log it using structured logging with zerolog
when it occurs. Additionally, log a message when the injector invocation fails
(the first err from do.Invoke). Keep the operation non-blocking so redeploy
continues, but ensure both the injector failure and the SetServiceInstanceState
error are logged with appropriate context so restart-window issues can be
debugged effectively.

Source: Coding guidelines

KB config changes (path, provider, model, key) had no effect after an
update: config.yaml is bind-mounted so SIGHUP-only reload kept the old
KB path, causing "file not found" / SQLite errors. Embed a
PGEDGE_CONFIG_VERSION hash in the MCP container env (like RAG) so a
config change forces a restart and re-initializes the KB.

Also refresh the service instance's UpdatedAt before an update redeploy
so the monitor doesn't briefly mark it failed during the restart.
@moizpgedge moizpgedge force-pushed the Bug/PLAT-655/kb_database_host_path-change-not-applied-on-update branch from cef2f2d to d6278f8 Compare June 18, 2026 17:38
@moizpgedge moizpgedge merged commit 0a7390c into main Jun 18, 2026
2 of 3 checks passed
moizpgedge added a commit that referenced this pull request Jun 18, 2026
* fix: restart MCP container on KB config change

KB config changes (path, provider, model, key) had no effect after an
update: config.yaml is bind-mounted so SIGHUP-only reload kept the old
KB path, causing "file not found" / SQLite errors. Embed a
PGEDGE_CONFIG_VERSION hash in the MCP container env (like RAG) so a
config change forces a restart and re-initializes the KB.

Also refresh the service instance's UpdatedAt before an update redeploy
so the monitor doesn't briefly mark it failed during the restart.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants