build: pin DuckDB 1.5.4 (merged pg_duckdb #1025), fix Azure orphan reclaim#22
Conversation
|
Warning Review limit reached
More reviews will be available in 36 minutes and 30 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughBumps the patched DuckDB/pg_duckdb stack from version 1.5.3 to 1.5.4 across Dockerfiles, entrypoint, workflow, and all documentation (switching the pg_duckdb source pin from a moving PR HEAD to a fixed commit SHA). Separately, adds ChangesDuckDB 1.5.4 Version Bump
Compactor Flat-Walk Orphan Deletion
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 27 |
| Duplication | 0 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/installation.md (1)
1-242:⚠️ Potential issue | 🟡 MinorAdd
mkdocs build --strictvalidation to CI.Per coding guidelines,
docs/**must build withmkdocs build --strictwith no warnings. Currently this validation is absent from CI (only Go linting, tests, and license checks run). Add a CI step to validate the documentation builds successfully, or require local validation before merge.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/installation.md` around lines 1 - 242, The documentation build validation with `mkdocs build --strict` is missing from the CI pipeline. Add a new CI step in `.github/workflows/ci.yml` that runs the mkdocs strict build check to validate that all documentation files in the docs directory build without warnings. This step should run as part of the pre-commit gate (similar to the gofmt, golangci-lint, and unit test steps already present) to catch documentation issues before merge, and should also be included in the local validation script `./run-ci-local.sh`.Source: Coding guidelines
🧹 Nitpick comments (2)
cmd/compactor/flatwalk.go (2)
31-64: ⚡ Quick winUse a context parameter instead of
context.Background()in the iteration loop.Line 47 uses
context.Background()inside the loop, which means long-running listings cannot be cancelled. TheWalkDirmethod should accept a context (or the method signature should be changed toWalkDirCtx) to allow proper cancellation of the bucket listing operation.However, since
iceio.ListableIO.WalkDirhas a fixed signature without context, the pragmatic fix is to capture and use the context from the outer scope if available (e.g., store it inflatWalkIO).♻️ Proposed fix: capture context in flatWalkIO
type flatWalkIO struct { iceio.IO + ctx context.Context } func (f flatWalkIO) WalkDir(root string, fn stdfs.WalkDirFunc) error { bucket, err := bucketOf(f.IO) if err != nil { // Non-blob backend (e.g. local FS): defer to the wrapped walk. if lw, ok := f.IO.(iceio.ListableIO); ok { return lw.WalkDir(root, fn) } return err } u, err := url.Parse(root) if err != nil { return fmt.Errorf("invalid URL %s: %w", root, err) } prefix := strings.TrimPrefix(u.Path, "/") iter := bucket.List(&blob.ListOptions{Prefix: prefix}) // empty Delimiter => flat for { - obj, err := iter.Next(context.Background()) + obj, err := iter.Next(f.ctx) if err == io.EOF { break }Then in
withFlatWalk:- fsF := func(context.Context) (iceio.IO, error) { return flatWalkIO{realIO}, nil } + fsF := func(c context.Context) (iceio.IO, error) { return flatWalkIO{realIO, c}, nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/compactor/flatwalk.go` around lines 31 - 64, The WalkDir method in flatWalkIO uses context.Background() when calling iter.Next(), which prevents cancellation of long-running bucket listing operations. To fix this, add a context field to the flatWalkIO struct to store the execution context, then modify the iter.Next() call on the iteration loop to use this stored context instead of context.Background(). Ensure the context is properly initialized when flatWalkIO instances are created, likely in the withFlatWalk initialization logic.
66-86: 💤 Low valueReflection on internal fields is fragile; consider adding a version check or runtime test.
The
bucketOffunction relies on iceberg-go's internal struct having aBucketfield. If iceberg-go changes this in a future version, the compactor will fail at runtime with a confusing error. The comment acknowledges this mirrors iceberg-go's own pattern, which is reasonable.Consider adding a comment with the iceberg-go version this was validated against, or add a build-time/init-time sanity check if you want early failure detection.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/compactor/flatwalk.go` around lines 66 - 86, The bucketOf function relies on reflection to access internal iceberg-go fields without version control or validation, making it fragile to future breaking changes. To fix this, add a comment near the function indicating which version of iceberg-go this code was validated against, and add a runtime sanity check (such as in an init() function or at startup) that validates the Bucket field exists on the expected FileIO struct type early in the program lifecycle, ensuring early failure detection instead of runtime errors during actual bucket operations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@DUCKDB_1.5_PATCHED.md`:
- Around line 134-140: The postgres_scanner entry in the documentation table
shows it uses `main` @ `916d862b`, but this does not match the actual build
configuration in `docker/iceberg-azure-extension-config-v15.cmake` which pins it
to a different commit (`6b2b12cad3afef61e8a4637e714e8a88895fed1a`). Update the
postgres_scanner row in the table to reflect the actual build pin and add
clarification that it uses a specific commit (NOT `main`) because the main
branch requires a newer database-connector that is incompatible with DuckDB
v1.5.4.
---
Outside diff comments:
In `@docs/installation.md`:
- Around line 1-242: The documentation build validation with `mkdocs build
--strict` is missing from the CI pipeline. Add a new CI step in
`.github/workflows/ci.yml` that runs the mkdocs strict build check to validate
that all documentation files in the docs directory build without warnings. This
step should run as part of the pre-commit gate (similar to the gofmt,
golangci-lint, and unit test steps already present) to catch documentation
issues before merge, and should also be included in the local validation script
`./run-ci-local.sh`.
---
Nitpick comments:
In `@cmd/compactor/flatwalk.go`:
- Around line 31-64: The WalkDir method in flatWalkIO uses context.Background()
when calling iter.Next(), which prevents cancellation of long-running bucket
listing operations. To fix this, add a context field to the flatWalkIO struct to
store the execution context, then modify the iter.Next() call on the iteration
loop to use this stored context instead of context.Background(). Ensure the
context is properly initialized when flatWalkIO instances are created, likely in
the withFlatWalk initialization logic.
- Around line 66-86: The bucketOf function relies on reflection to access
internal iceberg-go fields without version control or validation, making it
fragile to future breaking changes. To fix this, add a comment near the function
indicating which version of iceberg-go this code was validated against, and add
a runtime sanity check (such as in an init() function or at startup) that
validates the Bucket field exists on the expected FileIO struct type early in
the program lifecycle, ensuring early failure detection instead of runtime
errors during actual bucket operations.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4dd25e04-8d01-4ca8-a373-5990b31f76c0
📒 Files selected for processing (13)
.github/workflows/base-image.ymlCLAUDE.mdDUCKDB_1.5_PATCHED.mdREADME.mdcmd/compactor/flatwalk.gocmd/compactor/go.modcmd/compactor/main.godocker/Dockerfile.duckdb15docker/Dockerfile.duckdb15-basedocker/entrypoint.shdocker/iceberg-azure-extension-config-v15.cmakedocs/architecture.mddocs/installation.md
fd51dfb to
db64ef4
Compare
Pins pg_duckdb to the merged PR #1025 (
c04e6a2) and DuckDB to the v1.5.4 tag —COLDFRONT_DUCKDB_VERSION/OVERRIDE_GIT_DESCRIBE=v1.5.4, patched duckdb-iceberg rebuilt against it. Also fixes Azure orphan reclamation, where iceberg-go's hierarchical walk hitreaddir: not implementedon a directory-marker collision — the compactor now flat-lists the table location. Closes #17.