fix: restart MCP container on KB config change#414
Conversation
|
Warning Review limit reached
More reviews will be available in 45 minutes and 56 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughIntroduces a shared ChangesMCP config-driven Swarm restart and pre-redeploy state refresh
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Duplication | 0 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/internal/orchestrator/swarm/service_instance.go`:
- Around line 147-154: The error handling in the pre-redeploy state-refresh
block is swallowing failures silently, making regressions opaque when the
injector or state-update calls fail. Instead of ignoring the error from
SetServiceInstanceState with underscore, capture the error and log it using
structured logging with zerolog when it occurs. Additionally, log a message when
the injector invocation fails (the first err from do.Invoke). Keep the operation
non-blocking so redeploy continues, but ensure both the injector failure and the
SetServiceInstanceState error are logged with appropriate context so
restart-window issues can be debugged effectively.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fdc38449-72f9-4392-a553-82e10f30b06f
📒 Files selected for processing (5)
changes/unreleased/Fixed-20260618-160000.yamlchanges/unreleased/Fixed-20260618-170000.yamlserver/internal/orchestrator/swarm/service_instance.goserver/internal/orchestrator/swarm/service_spec.goserver/internal/orchestrator/swarm/service_spec_test.go
| // Refresh UpdatedAt before redeploying so the monitor's running-state grace | ||
| // period covers the restart window. An update that changes the container | ||
| // (e.g. a config change) restarts it, during which the container has no | ||
| // status; without bumping UpdatedAt the monitor would briefly mark the | ||
| // instance failed mid-restart even though it recovers seconds later. | ||
| if dbSvc, err := do.Invoke[*database.Service](rc.Injector); err == nil { | ||
| _ = dbSvc.SetServiceInstanceState(ctx, s.DatabaseID, s.ServiceInstanceID, database.ServiceInstanceStateRunning) | ||
| } |
There was a problem hiding this comment.
Log best-effort state-refresh failures instead of swallowing them silently.
This block intentionally keeps redeploy non-blocking, but ignoring both injector/state-update errors makes restart-window regressions opaque when the pre-refresh fails.
As per coding guidelines, server/internal/**/*.go: “Use structured error logging with zerolog; map domain-specific errors to HTTP status codes via Goa”.
Proposed fix
// Refresh UpdatedAt before redeploying so the monitor's running-state grace
// period covers the restart window. An update that changes the container
// (e.g. a config change) restarts it, during which the container has no
// status; without bumping UpdatedAt the monitor would briefly mark the
// instance failed mid-restart even though it recovers seconds later.
if dbSvc, err := do.Invoke[*database.Service](rc.Injector); err == nil {
- _ = dbSvc.SetServiceInstanceState(ctx, s.DatabaseID, s.ServiceInstanceID, database.ServiceInstanceStateRunning)
+ if err := dbSvc.SetServiceInstanceState(ctx, s.DatabaseID, s.ServiceInstanceID, database.ServiceInstanceStateRunning); err != nil {
+ if logger, logErr := do.Invoke[zerolog.Logger](rc.Injector); logErr == nil {
+ logger.Warn().
+ Err(err).
+ Str("service_instance_id", s.ServiceInstanceID).
+ Str("database_id", s.DatabaseID).
+ Msg("failed to refresh service instance running state before redeploy")
+ }
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Refresh UpdatedAt before redeploying so the monitor's running-state grace | |
| // period covers the restart window. An update that changes the container | |
| // (e.g. a config change) restarts it, during which the container has no | |
| // status; without bumping UpdatedAt the monitor would briefly mark the | |
| // instance failed mid-restart even though it recovers seconds later. | |
| if dbSvc, err := do.Invoke[*database.Service](rc.Injector); err == nil { | |
| _ = dbSvc.SetServiceInstanceState(ctx, s.DatabaseID, s.ServiceInstanceID, database.ServiceInstanceStateRunning) | |
| } | |
| // Refresh UpdatedAt before redeploying so the monitor's running-state grace | |
| // period covers the restart window. An update that changes the container | |
| // (e.g. a config change) restarts it, during which the container has no | |
| // status; without bumping UpdatedAt the monitor would briefly mark the | |
| // instance failed mid-restart even though it recovers seconds later. | |
| if dbSvc, err := do.Invoke[*database.Service](rc.Injector); err == nil { | |
| if err := dbSvc.SetServiceInstanceState(ctx, s.DatabaseID, s.ServiceInstanceID, database.ServiceInstanceStateRunning); err != nil { | |
| if logger, logErr := do.Invoke[zerolog.Logger](rc.Injector); logErr == nil { | |
| logger.Warn(). | |
| Err(err). | |
| Str("service_instance_id", s.ServiceInstanceID). | |
| Str("database_id", s.DatabaseID). | |
| Msg("failed to refresh service instance running state before redeploy") | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/internal/orchestrator/swarm/service_instance.go` around lines 147 -
154, The error handling in the pre-redeploy state-refresh block is swallowing
failures silently, making regressions opaque when the injector or state-update
calls fail. Instead of ignoring the error from SetServiceInstanceState with
underscore, capture the error and log it using structured logging with zerolog
when it occurs. Additionally, log a message when the injector invocation fails
(the first err from do.Invoke). Keep the operation non-blocking so redeploy
continues, but ensure both the injector failure and the SetServiceInstanceState
error are logged with appropriate context so restart-window issues can be
debugged effectively.
Source: Coding guidelines
KB config changes (path, provider, model, key) had no effect after an update: config.yaml is bind-mounted so SIGHUP-only reload kept the old KB path, causing "file not found" / SQLite errors. Embed a PGEDGE_CONFIG_VERSION hash in the MCP container env (like RAG) so a config change forces a restart and re-initializes the KB. Also refresh the service instance's UpdatedAt before an update redeploy so the monitor doesn't briefly mark it failed during the restart.
cef2f2d to
d6278f8
Compare
* fix: restart MCP container on KB config change KB config changes (path, provider, model, key) had no effect after an update: config.yaml is bind-mounted so SIGHUP-only reload kept the old KB path, causing "file not found" / SQLite errors. Embed a PGEDGE_CONFIG_VERSION hash in the MCP container env (like RAG) so a config change forces a restart and re-initializes the KB. Also refresh the service instance's UpdatedAt before an update redeploy so the monitor doesn't briefly mark it failed during the restart.
KB config changes (path, provider, model, key) had no effect after an update: config.yaml is bind-mounted so SIGHUP-only reload kept the old KB path, causing "file not found" / SQLite errors. Embed a PGEDGE_CONFIG_VERSION hash in the MCP container env (like RAG) so a config change forces a restart and re-initializes the KB.
Also refresh the service instance's UpdatedAt before an update redeploy so the monitor doesn't briefly mark it failed during the restart.