Skip to content

build: pin DuckDB 1.5.4 (merged pg_duckdb #1025), fix Azure orphan reclaim#22

Merged
vyruss merged 3 commits into
mainfrom
build/duckdb-1.5.4
Jun 19, 2026
Merged

build: pin DuckDB 1.5.4 (merged pg_duckdb #1025), fix Azure orphan reclaim#22
vyruss merged 3 commits into
mainfrom
build/duckdb-1.5.4

Conversation

@vyruss

@vyruss vyruss commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Pins pg_duckdb to the merged PR #1025 (c04e6a2) and DuckDB to the v1.5.4 tag — COLDFRONT_DUCKDB_VERSION/OVERRIDE_GIT_DESCRIBE=v1.5.4, patched duckdb-iceberg rebuilt against it. Also fixes Azure orphan reclamation, where iceberg-go's hierarchical walk hit readdir: not implemented on a directory-marker collision — the compactor now flat-lists the table location. Closes #17.

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@vyruss, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 36 minutes and 30 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d27ced67-80f8-48a7-824c-3185d064e8bc

📥 Commits

Reviewing files that changed from the base of the PR and between f7ee473 and db64ef4.

📒 Files selected for processing (1)
  • DUCKDB_1.5_PATCHED.md
📝 Walkthrough

Walkthrough

Bumps the patched DuckDB/pg_duckdb stack from version 1.5.3 to 1.5.4 across Dockerfiles, entrypoint, workflow, and all documentation (switching the pg_duckdb source pin from a moving PR HEAD to a fixed commit SHA). Separately, adds flatwalk.go to the compactor, which overrides Iceberg-Go's hierarchical WalkDir with a flat prefix-based blob listing to fix orphan-deletion failures on GCS/Azure-like backends, and wires it into doOrphans.

Changes

DuckDB 1.5.4 Version Bump

Layer / File(s) Summary
Dockerfile.duckdb15-base build stage changes
docker/Dockerfile.duckdb15-base
Switches pg_duckdb source pin from PR #1025 HEAD to commit c04e6a2, sets OVERRIDE_GIT_DESCRIBE=v1.5.4, replaces the duckdb submodule in the iceberg-builder stage with a fresh v1.5.4 clone, and sets COLDFRONT_DUCKDB_VERSION=v1.5.4 in the runtime stage with an updated OCI label.
Entrypoint, Dockerfile.duckdb15, workflow, and cmake wiring
docker/entrypoint.sh, docker/Dockerfile.duckdb15, .github/workflows/base-image.yml, docker/iceberg-azure-extension-config-v15.cmake
Updates default COLDFRONT_DUCKDB_VERSION to v1.5.4 in entrypoint.sh and bumps version comment strings to 1.5.4 in the app Dockerfile, workflow header, and cmake file.
Documentation and reference updates
README.md, CLAUDE.md, DUCKDB_1.5_PATCHED.md, docs/architecture.md, docs/installation.md
Updates all version references, version-pin tables (commit SHAs, submodule tags), build instructions, and bare-metal prerequisites from 1.5.3 to 1.5.4.

Compactor Flat-Walk Orphan Deletion

Layer / File(s) Summary
flatWalkIO implementation and go.mod promotion
cmd/compactor/flatwalk.go, cmd/compactor/go.mod
Adds flatWalkIO overriding WalkDir with a flat bucket.List loop, a bucketOf reflection helper, flatDirEntry/flatFileInfo adapters over *blob.ListObject, and a withFlatWalk table rebuilder. Promotes gocloud.dev from indirect to direct dependency.
Compactor orphan deletion integration
cmd/compactor/main.go
doOrphans calls withFlatWalk to reassign the loaded table before orphan scanning, returning on error, so orphan deletion uses the flat traversal path.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the two main changes: DuckDB 1.5.4 pinning with pg_duckdb #1025 merged, and Azure orphan reclaim fix with flat-walk approach.
Description check ✅ Passed The description is directly related to the changeset, explaining both the version pinning and the Azure orphan reclamation fix with technical context about the flat-listing solution and issue closure.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch build/duckdb-1.5.4

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codacy-production

codacy-production Bot commented Jun 19, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 27 complexity · 0 duplication

Metric Results
Complexity 27
Duplication 0

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/installation.md (1)

1-242: ⚠️ Potential issue | 🟡 Minor

Add mkdocs build --strict validation to CI.

Per coding guidelines, docs/** must build with mkdocs build --strict with no warnings. Currently this validation is absent from CI (only Go linting, tests, and license checks run). Add a CI step to validate the documentation builds successfully, or require local validation before merge.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/installation.md` around lines 1 - 242, The documentation build
validation with `mkdocs build --strict` is missing from the CI pipeline. Add a
new CI step in `.github/workflows/ci.yml` that runs the mkdocs strict build
check to validate that all documentation files in the docs directory build
without warnings. This step should run as part of the pre-commit gate (similar
to the gofmt, golangci-lint, and unit test steps already present) to catch
documentation issues before merge, and should also be included in the local
validation script `./run-ci-local.sh`.

Source: Coding guidelines

🧹 Nitpick comments (2)
cmd/compactor/flatwalk.go (2)

31-64: ⚡ Quick win

Use a context parameter instead of context.Background() in the iteration loop.

Line 47 uses context.Background() inside the loop, which means long-running listings cannot be cancelled. The WalkDir method should accept a context (or the method signature should be changed to WalkDirCtx) to allow proper cancellation of the bucket listing operation.

However, since iceio.ListableIO.WalkDir has a fixed signature without context, the pragmatic fix is to capture and use the context from the outer scope if available (e.g., store it in flatWalkIO).

♻️ Proposed fix: capture context in flatWalkIO
 type flatWalkIO struct {
 	iceio.IO
+	ctx context.Context
 }
 
 func (f flatWalkIO) WalkDir(root string, fn stdfs.WalkDirFunc) error {
 	bucket, err := bucketOf(f.IO)
 	if err != nil {
 		// Non-blob backend (e.g. local FS): defer to the wrapped walk.
 		if lw, ok := f.IO.(iceio.ListableIO); ok {
 			return lw.WalkDir(root, fn)
 		}
 		return err
 	}
 	u, err := url.Parse(root)
 	if err != nil {
 		return fmt.Errorf("invalid URL %s: %w", root, err)
 	}
 	prefix := strings.TrimPrefix(u.Path, "/")
 	iter := bucket.List(&blob.ListOptions{Prefix: prefix}) // empty Delimiter => flat
 	for {
-		obj, err := iter.Next(context.Background())
+		obj, err := iter.Next(f.ctx)
 		if err == io.EOF {
 			break
 		}

Then in withFlatWalk:

-	fsF := func(context.Context) (iceio.IO, error) { return flatWalkIO{realIO}, nil }
+	fsF := func(c context.Context) (iceio.IO, error) { return flatWalkIO{realIO, c}, nil }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/compactor/flatwalk.go` around lines 31 - 64, The WalkDir method in
flatWalkIO uses context.Background() when calling iter.Next(), which prevents
cancellation of long-running bucket listing operations. To fix this, add a
context field to the flatWalkIO struct to store the execution context, then
modify the iter.Next() call on the iteration loop to use this stored context
instead of context.Background(). Ensure the context is properly initialized when
flatWalkIO instances are created, likely in the withFlatWalk initialization
logic.

66-86: 💤 Low value

Reflection on internal fields is fragile; consider adding a version check or runtime test.

The bucketOf function relies on iceberg-go's internal struct having a Bucket field. If iceberg-go changes this in a future version, the compactor will fail at runtime with a confusing error. The comment acknowledges this mirrors iceberg-go's own pattern, which is reasonable.

Consider adding a comment with the iceberg-go version this was validated against, or add a build-time/init-time sanity check if you want early failure detection.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/compactor/flatwalk.go` around lines 66 - 86, The bucketOf function relies
on reflection to access internal iceberg-go fields without version control or
validation, making it fragile to future breaking changes. To fix this, add a
comment near the function indicating which version of iceberg-go this code was
validated against, and add a runtime sanity check (such as in an init() function
or at startup) that validates the Bucket field exists on the expected FileIO
struct type early in the program lifecycle, ensuring early failure detection
instead of runtime errors during actual bucket operations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@DUCKDB_1.5_PATCHED.md`:
- Around line 134-140: The postgres_scanner entry in the documentation table
shows it uses `main` @ `916d862b`, but this does not match the actual build
configuration in `docker/iceberg-azure-extension-config-v15.cmake` which pins it
to a different commit (`6b2b12cad3afef61e8a4637e714e8a88895fed1a`). Update the
postgres_scanner row in the table to reflect the actual build pin and add
clarification that it uses a specific commit (NOT `main`) because the main
branch requires a newer database-connector that is incompatible with DuckDB
v1.5.4.

---

Outside diff comments:
In `@docs/installation.md`:
- Around line 1-242: The documentation build validation with `mkdocs build
--strict` is missing from the CI pipeline. Add a new CI step in
`.github/workflows/ci.yml` that runs the mkdocs strict build check to validate
that all documentation files in the docs directory build without warnings. This
step should run as part of the pre-commit gate (similar to the gofmt,
golangci-lint, and unit test steps already present) to catch documentation
issues before merge, and should also be included in the local validation script
`./run-ci-local.sh`.

---

Nitpick comments:
In `@cmd/compactor/flatwalk.go`:
- Around line 31-64: The WalkDir method in flatWalkIO uses context.Background()
when calling iter.Next(), which prevents cancellation of long-running bucket
listing operations. To fix this, add a context field to the flatWalkIO struct to
store the execution context, then modify the iter.Next() call on the iteration
loop to use this stored context instead of context.Background(). Ensure the
context is properly initialized when flatWalkIO instances are created, likely in
the withFlatWalk initialization logic.
- Around line 66-86: The bucketOf function relies on reflection to access
internal iceberg-go fields without version control or validation, making it
fragile to future breaking changes. To fix this, add a comment near the function
indicating which version of iceberg-go this code was validated against, and add
a runtime sanity check (such as in an init() function or at startup) that
validates the Bucket field exists on the expected FileIO struct type early in
the program lifecycle, ensuring early failure detection instead of runtime
errors during actual bucket operations.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4dd25e04-8d01-4ca8-a373-5990b31f76c0

📥 Commits

Reviewing files that changed from the base of the PR and between 90bf842 and f7ee473.

📒 Files selected for processing (13)
  • .github/workflows/base-image.yml
  • CLAUDE.md
  • DUCKDB_1.5_PATCHED.md
  • README.md
  • cmd/compactor/flatwalk.go
  • cmd/compactor/go.mod
  • cmd/compactor/main.go
  • docker/Dockerfile.duckdb15
  • docker/Dockerfile.duckdb15-base
  • docker/entrypoint.sh
  • docker/iceberg-azure-extension-config-v15.cmake
  • docs/architecture.md
  • docs/installation.md

Comment thread DUCKDB_1.5_PATCHED.md
@vyruss vyruss force-pushed the build/duckdb-1.5.4 branch from fd51dfb to db64ef4 Compare June 19, 2026 17:08
@vyruss vyruss merged commit 75b8200 into main Jun 19, 2026
6 checks passed
@vyruss vyruss deleted the build/duckdb-1.5.4 branch June 19, 2026 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Azure ADLS Gen2 backend — 3 bugs found: app role LocalFileSystem, concurrent writer 409, compactor manifest format-version mismatch

1 participant