Stop zero-traffic App Engine versions after production deploys #3719
Open
MaxGhenis wants to merge 1 commit into
Flexible-environment versions keep their VMs running 24/7 while in SERVING state, even at 0% traffic. Each release leaked a staging and a prod version (4vCPU/24GB, ~$278/month each); 41 had accumulated by July 2026 (~$11k/month). Adds a post-promote job that stops SERVING versions beyond the newest two per prefix, never touching versions that hold traffic.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #3719 +/- ##
=======================================
Coverage 79.72% 79.72%
=======================================
Files 70 70
Lines 4326 4326
Branches 807 807
=======================================
Hits 3449 3449
Misses 657 657
Partials 220 220
☔ View full report in Codecov by Harness.
The problem
App Engine flexible environment keeps a VM running 24/7 for every version in SERVING state, even at a 0% traffic split — and our deploy workflow never stops old versions. Every release therefore leaked two 4vCPU/24GB VMs (staging + prod, ~$278/month each).
Impact on the GCP bill (billing account 0160DF-370818-B14FEA):
As of tonight the service had 42 SERVING versions (41 with zero traffic), the oldest from April 22 — a ~$10.6k/month run rate on this repo alone. policyengine-household-api had the same pattern (17 zombies, two dating to August 2025).
The fix
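Adds a `stop-old-app-engine-versions` job that runs after `promote-production`:
- Stops SERVING versions beyond the newest two per prefix (`prod-*`, `staging-*`), configurable via `KEEP_PER_PREFIX`.
- Never touches versions that hold traffic.
- Stops versions rather than deleting them (they can be restarted with `gcloud app versions start`), so rollback via the console keeps working.

Steady state becomes ≤4 idle-capable versions (~$1.1k/month ceiling) instead of unbounded growth.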
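The selection rule described here can be sketched in Python. This is an illustrative sketch only, not the code in the workflow: the `Version` record, the `versions_to_stop` helper, and the sample timestamps are hypothetical, and it assumes "newest" means ranking every version that shares a prefix by creation time. The real job presumably reads this data from `gcloud app versions list`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """Hypothetical record for one App Engine version."""
    id: str               # e.g. "prod-2393"
    status: str           # "SERVING" or "STOPPED"
    traffic_split: float  # fraction of traffic, 0.0 to 1.0
    created: str          # ISO 8601 timestamp; sorts chronologically as text


def versions_to_stop(versions, prefixes=("prod-", "staging-"), keep_per_prefix=2):
    """Return ids of SERVING, zero-traffic versions beyond the newest N per prefix."""
    by_prefix = defaultdict(list)
    for v in versions:
        for prefix in prefixes:
            if v.id.startswith(prefix):
                by_prefix[prefix].append(v)
                break  # versions matching no prefix are never considered

    to_stop = []
    for prefix in prefixes:
        newest_first = sorted(by_prefix[prefix], key=lambda v: v.created, reverse=True)
        for v in newest_first[keep_per_prefix:]:
            # Never stop a version that still holds traffic.
            if v.status == "SERVING" and v.traffic_split == 0:
                to_stop.append(v.id)
    return to_stop


# Made-up sample data; only prod-2393 and staging-2393 are named in this PR.
versions = [
    Version("prod-2393", "SERVING", 0.5, "2026-07-01T10:00:00"),
    Version("prod-2392", "SERVING", 0.0, "2026-06-28T10:00:00"),
    Version("prod-2391", "SERVING", 0.0, "2026-06-20T10:00:00"),
    Version("prod-2380", "SERVING", 0.5, "2026-05-01T10:00:00"),
    Version("staging-2393", "SERVING", 0.0, "2026-07-01T09:00:00"),
    Version("staging-2392", "SERVING", 0.0, "2026-06-28T09:00:00"),
    Version("staging-2391", "SERVING", 0.0, "2026-06-20T09:00:00"),
]

print(versions_to_stop(versions))  # ['prod-2391', 'staging-2391']
```

In this sample, `prod-2380` falls outside the newest two but is left alone because it still holds traffic. With two prefixes and two kept versions each, at most four zero-traffic versions stay in SERVING, which at ~$278/month each gives the ~$1.1k/month ceiling.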
Manual remediation already done (2026-07-02)
- `prod-2393` + `staging-2393` + prior pair).
- `min-instances=0` on two leftover `testing-codex-*` Cloud Run gateways there.

Follow-ups
🤖 Generated with Claude Code