Support native bit_and / bit_or / bit_xor aggregate #2362

Description

@zhuxiangyi

Background

Auron does not implement the bitwise aggregates bit_and / bit_or / bit_xor natively, so they fall back to the generic UDAF path (a JNI call back into the JVM) and lose vectorized acceleration. These aggregates are commonly used for flag/bitmask rollups (e.g. bit_or to union feature flags across rows, bit_and for common flags) and parity/checksum use cases.

Proposal

Add native bit_and / bit_or / bit_xor aggregates:

  1. Implement a generic AggBitwise<P> in datafusion-ext-plans (agg/bitwise.rs), parameterized by the bitwise operator, with type aliases AggBitAnd / AggBitOr / AggBitXor. The accumulator is a single column of the input type: the first non-null value initializes the slot, and every subsequent non-null value is folded in with the operator. Null inputs are skipped, so an all-null group yields null. Because the operators are associative and commutative, the result does not depend on the order in which rows are visited or partial buffers are merged. Only integral inputs (Int8/Int16/Int32/Int64) are accepted.
  2. Wire the three functions through the AggFunction enum, create_agg, the protobuf contract (BIT_AND / BIT_OR / BIT_XOR), the protobuf::AggFunction -> AggFunction conversion, and the window-aggregate mapping.
  3. Add the BitAndAgg / BitOrAgg / BitXorAgg expression conversions in NativeConverters.convertAggregateExpr.
  4. Declare the native aggregate buffer schema in NativeAggBase.computeNativeAggBufferDataTypes (Seq(dataType), a single column) so the partial -> shuffle -> final buffer schema matches the native side.
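The accumulator semantics from step 1 can be sketched in a few lines of plain Rust. This is only an illustration, not the proposed agg/bitwise.rs: the names BitwiseOp, And / Or / Xor, BitwiseAcc and fold_all are hypothetical, and the real implementation works on Arrow columns rather than on slices of Option<T>.

```rust
use std::marker::PhantomData;
use std::ops::{BitAnd, BitOr, BitXor};

/// Hypothetical operator marker; the real `P` in AggBitwise<P> may be shaped differently.
trait BitwiseOp {
    fn apply<T>(a: T, b: T) -> T
    where
        T: BitAnd<Output = T> + BitOr<Output = T> + BitXor<Output = T>;
}

struct And;
struct Or;
struct Xor;

impl BitwiseOp for And {
    fn apply<T>(a: T, b: T) -> T
    where
        T: BitAnd<Output = T> + BitOr<Output = T> + BitXor<Output = T>,
    {
        a & b
    }
}

impl BitwiseOp for Or {
    fn apply<T>(a: T, b: T) -> T
    where
        T: BitAnd<Output = T> + BitOr<Output = T> + BitXor<Output = T>,
    {
        a | b
    }
}

impl BitwiseOp for Xor {
    fn apply<T>(a: T, b: T) -> T
    where
        T: BitAnd<Output = T> + BitOr<Output = T> + BitXor<Output = T>,
    {
        a ^ b
    }
}

/// One accumulator slot for one group: `None` until a non-null value arrives.
struct BitwiseAcc<T, P> {
    slot: Option<T>,
    _op: PhantomData<P>,
}

impl<T, P> BitwiseAcc<T, P>
where
    T: Copy + BitAnd<Output = T> + BitOr<Output = T> + BitXor<Output = T>,
    P: BitwiseOp,
{
    fn new() -> Self {
        Self { slot: None, _op: PhantomData }
    }

    /// Null inputs are skipped; the first non-null value initializes the slot.
    fn update(&mut self, value: Option<T>) {
        if let Some(v) = value {
            self.slot = Some(match self.slot {
                Some(acc) => P::apply(acc, v),
                None => v,
            });
        }
    }

    /// Merging a partial buffer is the same fold applied to the other slot.
    fn merge(&mut self, other: &Self) {
        self.update(other.slot);
    }

    fn finish(&self) -> Option<T> {
        self.slot
    }
}

/// Convenience helper (hypothetical): fold a whole nullable column for one group.
fn fold_all<T, P>(values: &[Option<T>]) -> Option<T>
where
    T: Copy + BitAnd<Output = T> + BitOr<Output = T> + BitXor<Output = T>,
    P: BitwiseOp,
{
    let mut acc = BitwiseAcc::<T, P>::new();
    for v in values {
        acc.update(*v);
    }
    acc.finish()
}
```

Keeping the slot as an Option of the input type mirrors the single-column buffer described in step 4: the partial and final phases exchange exactly one nullable value per group.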

Scope / Non-goals

  • Integral inputs only (byte / short / int / long), matching Spark's BitAndAgg / BitOrAgg / BitXorAgg.
  • Window aggregates (bit_* over a window) reuse the same AggFunction.

Tests

  • Rust unit test agg_exec::test::test_agg_bitwise: partial -> final two-phase aggregation over a nullable integer column, verifying bit_and / bit_or / bit_xor (including null skipping).
  • Scala end-to-end test in AuronDataFrameAggregateSuite ("native bit_and / bit_or / bit_xor aggregate", spark34 + spark35): a grouped aggregate exercising the full partial -> shuffle -> final native path (including an all-null group), asserting correct values and that the plan offloads to NativeAggBase.
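The property both tests rely on (partial -> final merging gives the same answer regardless of merge order, and an all-null group stays null) can be shown with a small standalone model. This is a hedged sketch, not the actual test code: Buffer, fold, partial and final_merge are hypothetical names, and a HashMap stands in for the native aggregate buffer.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the per-group aggregate buffer: one nullable slot per key.
type Buffer = HashMap<&'static str, Option<i64>>;

/// Fold one nullable value into a nullable slot, skipping nulls.
fn fold(slot: Option<i64>, value: Option<i64>, op: fn(i64, i64) -> i64) -> Option<i64> {
    match (slot, value) {
        (Some(a), Some(b)) => Some(op(a, b)),
        (a, None) => a,
        (None, b) => b,
    }
}

/// Partial phase: aggregate the rows of a single partition.
fn partial(rows: &[(&'static str, Option<i64>)], op: fn(i64, i64) -> i64) -> Buffer {
    let mut buf = Buffer::new();
    for &(key, value) in rows {
        let slot = buf.entry(key).or_insert(None);
        *slot = fold(*slot, value, op);
    }
    buf
}

/// Final phase: merge the partial buffers (as if they had arrived after a shuffle).
fn final_merge(partials: &[Buffer], op: fn(i64, i64) -> i64) -> Buffer {
    let mut out = Buffer::new();
    for p in partials {
        for (&key, &value) in p {
            let slot = out.entry(key).or_insert(None);
            *slot = fold(*slot, value, op);
        }
    }
    out
}
```

Feeding the partial buffers to final_merge in either order should give identical results for each of the three operators, and a group whose inputs are all null should come out as None.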
