Feature: multi-term factorization across `Sum` summands by bimalgaudel · Pull Request #564 · ValeevGroup/SeQuant

bimalgaudel · 2026-06-24T21:27:31Z

Introduce opt::factorize_multiterm (multiterm.{hpp,cpp}), which pulls shared factors across the summands of a Sum (AB + AC -> A*(B + C)), generalized to N terms and to the two-sided biclique case (A+B)*(X+Y), emitting the result as a nested ExprPtr. Factorizations are applied only when cost-driven biclique selection shows a positive saving under the same cost model that drives single-term optimization.

Gate it behind a new OptimizeOptions::multiterm flag (MultiTermFactor, default Disable) so existing output is unchanged.

Wire the feature into the eval integration example: a multi_term optimization option and a print_exprs log option. node_() now optimizes the whole sum at once (reorder and multi-term act across summands, so per-term optimize() would defeat them) and can print the optimized expressions before evaluation.

Add test_multiterm.cpp and the corresponding CMake wiring.

Introduce opt::factorize_multiterm (multiterm.{hpp,cpp}), which pulls shared factors across the summands of a Sum (A*B + A*C -> A*(B + C)), generalized to N terms and to the two-sided biclique case (A+B)*(X+Y), emitting the result as a nested ExprPtr. Factorizations are applied only when cost-driven biclique selection shows a positive saving under the same cost model that drives single-term optimization. Gate it behind a new OptimizeOptions::multiterm flag (MultiTermFactor, default Disable) so existing output is unchanged. Wire the feature into the eval integration example: a `multi_term` optimization option and a `print_exprs` log option. node_() now optimizes the whole sum at once (reorder and multi-term act across summands, so per-term optimize() would defeat them) and can print the optimized expressions before evaluation. Add test_multiterm.cpp and the corresponding CMake wiring.

Copilot

Pull request overview

Adds an opt-in multi-term factorization pass to SeQuant’s optimizer to pull shared factors across Sum summands (including two-sided biclique cases), wires it into the eval integration example, and introduces dedicated unit tests.

Changes:

Introduce opt::factorize_multiterm (new multiterm.{hpp,cpp}) and gate it behind OptimizeOptions::multiterm (default Disable).
Integrate multi-term optimization and optional expression-print logging into the eval integration example configuration/driver.
Add comprehensive unit coverage for one-/two-sided folding, cost thresholding, sign/prefactor handling, bucketing, and canonicalization-phase behavior.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`SeQuant/core/optimize/options.hpp`	Adds `MultiTermFactor` and `OptimizeOptions::multiterm` flag.
`SeQuant/core/optimize/optimize.cpp`	Wires multi-term factorization into `optimize_impl` for `Sum`.
`SeQuant/core/optimize/multiterm.hpp`	Declares the multi-term factorization API and documents behavior.
`SeQuant/core/optimize/multiterm.cpp`	Implements cost-driven biclique selection + emission logic.
`tests/unit/test_multiterm.cpp`	New unit test suite covering structural and cost-model-driven behavior.
`tests/unit/CMakeLists.txt`	Adds `test_multiterm.cpp` to unit test sources.
`tests/integration/eval/options.hpp`	Adds `multi_term` optimization flag and `print_exprs` log flag.
`tests/integration/eval/options.cpp`	Parses/prints help for the new integration options.
`tests/integration/eval/calc.inp`	Enables `multi_term` and `print_exprs` in the example input.
`tests/integration/eval/calc_info.hpp`	Optimizes entire sums at once and optionally prints optimized summands.
`CMakeLists.txt`	Adds multiterm sources to the SeQuant optimize target.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…as separate knobs

ajay-mk · 2026-06-29T13:10:47Z

@bimalgaudel Do you have a corresponding MPQC branch for this?

Krzmbrzl

Does this also work for fusing transposed tensors such as ... t{a1,a2;i1,i2} + ... t{a2,a1;i2;i1} = ... (t{a1,a2;i1,i2} + t{a2,a1;i2;i1}) (as frequently appearing after spintracing of CC equations)?
Does this also work for fusing composite expressions such as A B C D + A B E F = A B (C D + E F)?
Does this also work for something like ACE + ADE + BCE + BDE + ACF + ADF + BCF + BDF = (A + B) (C + D) (E + F) where we end up with more than two fused factors?

I think neither of these have a test-case yet, so if supported we might want to add a test case ensuring that it stays that way :)

Krzmbrzl · 2026-06-29T13:17:05Z

Any specific reason for the name change from fusion to multi-term?

multi-term is the dual of single-term.
fusion is less common of a terminology (although I know I have used 'fusion' sometimes e.g. the branch name)

It might promise more than it holds though. As a proper dual it would basically perform a global optimization instead of "only" performing fusion after single-term optimization 👀

using fusion wouldn't make it clearer.. the global optimization rather than post-sto can be implemented by saying 'optimization scope extended' (it's incremental feature-wise--although more expensive performance wise). the results of such global optimizations is still multi-term factorization.

To me, "fusion" doesn't imply anything beyond what one finds in the docs. "Multiterm" can be (wrongly) understood by itself which might create wrong hopes (exactly because a global optimization is still multi term)

Krzmbrzl · 2026-06-29T13:18:45Z

+/// the summands of a Sum (\c A*B + A*C -> A*(B + C)) using a cost-driven
+/// biclique search. Opt-in; disabled by default so existing output is
+/// unchanged.
+enum class MultiTermFactor { Enable, Disable };


Not sure whether having a boolean-like enum is really needed given that we use a struct with an expressive field name for holding the associated value.

I'd say keep it this way. Being explicit and encode the type shall help us in the long term. I do agree that Enable/Disable, On/Off, Yes/No are inherently the same idea with frictional naming convention. For that a new repo-wide refactor is preferable.

Type-safety is great if you use positional arguments in function calls bit I don't see a clear benefit in the way it is used here. It does lead to code converting the enum into a boolean instead of using a switch though. So if we ever need more entries in the enum existing code will break. And id we don't expect to ever need more values, we might as well make it a Boolean in the first place 🤔

these enums can be used beyond just as part of the optimize options (e.g. for unit testing)

But do we actually do this? To me it seems like the only way to pass this into a useful place is via the options struct. It doesn't seem useful outside the optimization function either.

Krzmbrzl · 2026-06-29T13:21:25Z

+/// Set the occupied (i) and virtual (a) extents so the cost model is
+/// well-defined, then run \p body. Tests whose expected fold depends on the
+/// occ-vs-virt ordering pass explicit extents rather than coupling to the
+/// defaults (see the cost-driven-winner test).
+template <typename F>
+void with_sized_spaces(std::size_t occ, std::size_t virt, F&& body) {
+  using namespace sequant;
+  auto ctx_resetter = set_scoped_default_context(get_default_context().clone());
+  auto reg = get_default_context().mutable_index_space_registry();
+  reg->retrieve_ptr(L"i")->approximate_size(occ);
+  reg->retrieve_ptr(L"a")->approximate_size(virt);
+  std::forward<F>(body)();
+}


I would recommend merging all tests in here into a single test-case with multiple sections. Then you can set all of this in the outermost scope in your test case and don't have to set it for every test individually using this function. That probably makes the tests look a tad tidier as well (one less nesting level).

Krzmbrzl · 2026-06-29T13:24:03Z

+    SECTION("disabled is the default and a no-op vs explicit Disable") {
+      auto const def = optimize(expr);
+      auto const dis = optimize(expr, {.multiterm = MultiTermFactor::Disable});
+      REQUIRE(latex(def) == latex(dis));


why compare the latex representation instead of the expressions themselves? The LaTeX representation leaves room for the expressions being different without this test failing.

Krzmbrzl · 2026-06-29T13:25:45Z

+      // (no contraction to split), so extract_core() returns nullopt and it is
+      // never interned, scored, or consumed. Reassembly emits untouched
+      // summands first (original order), then the folds, so the result is
+      // [D, V*(T + B)].


Suggested change

// [D, V*(T + B)].

// D + V*(T + B)

Krzmbrzl · 2026-06-29T13:28:07Z

+      // The fold is present, with a 2-term partner sum (T + B).
+      auto const factored = find_factored(res);
+      REQUIRE(factored);
+      ExprPtr partner_sum;
+      for (auto const& f : factored->as<Product>().factors())
+        if (f->is<Sum>()) partner_sum = f;
+      REQUIRE(partner_sum);
+      REQUIRE(partner_sum->size() == 2);
+
+      // The bare tensor survived verbatim as a standalone summand (not folded
+      // in), and -- with NoReorder -- precedes the fold in the output.
+      auto const& smands = res->as<Sum>().summands();
+      REQUIRE(smands.front()->is<Tensor>());
+      REQUIRE(smands.front()->as<Tensor>().label() == L"D");
+
+      REQUIRE_THAT(expand(res->clone()), EquivalentTo(expr));


Perhaps these test cases can be made more concise by requiring equality to the actual expression object we think it should become? Aka. by using the REQUIRES_THAT(res, EquivalentTo("D + V (T + B)")) (only with real tensors in there) 👀

That would also make sure that we indeed get exactly the expression we are looking for instead of something that happens to fulfill the selected properties we test for explicitly. As a somewhat contrived example, this test would still pass if D was returned with modified indices.

Plus, it seems like it would make the tests easier to read and understand.

bimalgaudel · 2026-06-29T18:37:02Z

@bimalgaudel Do you have a corresponding MPQC branch for this?

No. Should only require a couple line additions in the SeQuant engine config and its default.

evaleev

Automated review of the multi-term factorization PR. No hard correctness bug surfaced — the coefficient/phase algebra is value-preserving by construction and the nested Product(Sum) fold shapes are handled safely by binarize/reorder. The inline comments below are behavior-stability, convention, doc-drift, and test-gap items.

evaleev · 2026-06-29T19:54:36Z

+  /// part of the optimize() call and therefore only takes effect when \c
+  /// single_term is also on; with \c single_term off, optimize() is not called
+  /// and this flag has no effect.
+  bool multi_term = true;


multi_term defaults to true here while the library OptimizeOptions::multiterm correctly defaults to Disable, and calc.inp now also ships multi_term yes — so multi-term factorization is on by default for the example, flipping its evaluation output. Worse, node_() only invokes optimize() (where multiterm lives) when single_term is on, so single_term no + multi_term yes silently drops factorization with no diagnostic. Consider defaulting this false (matching the "existing output unchanged" intent) or warning on the ignored combination.

evaleev · 2026-06-29T19:54:36Z

+/// the summands of a Sum (\c A*B + A*C -> A*(B + C)) using a cost-driven
+/// biclique search. Opt-in; disabled by default so existing output is
+/// unchanged.
+enum class MultiTermFactor { Enable, Disable };


enum class MultiTermFactor { Enable, Disable } makes the value-initialized state (MultiTermFactor{}, value 0) Enable, which contradicts the adjacent "disabled by default" comment and the safer zero-is-no-op convention. The member default = Disable saves the normal path, but any future value-initialization silently enables factorization. Reorder to { Disable, Enable }.

evaleev · 2026-06-29T19:54:36Z

+
+    // reorder is independent of multiterm
+    if (reorder && result->is<Sum>())
+      return ex<Sum>(opt::reorder(result->as<Sum>()));


Switching to the single-arg opt::reorder(Sum) discards the nodes vector just built above and re-binarizes every summand inside clusters(). When do_multiterm is true each summand is binarized twice (once for factorize_multiterm, once here). The deleted code passed pre-binarized nodes to reorder(sum, nodes) specifically "so they aren't re-built inside clusters()". Correctness and the sequential-binarize invariant are preserved, but this is a measurable optimization-time regression on large CC sums.

evaleev · 2026-06-29T19:54:36Z

+    // A header reports the per-equation term counts before/after optimization
+    // (multi-term factorization can merge summands), a footer closes the block.
+    if (log_opts.print_exprs) {
+      const std::size_t nb = expr->is<Sum>() ? expr->size() : 1;


nb (pre-opt term count) is taken from the original expr, but optimization runs on trimmed = tail_factor(expr). tail_factor rebuilds the sum via ex<Sum>(Sum{...}), which can merge now-identical summands or drop a zero, so the printed # terms before/after opt header can overstate the true pre-opt count fed to optimize(). Cosmetic (header only) — use trimmed's count.

evaleev · 2026-06-29T19:54:36Z

+
+  /// Product of index extents -- the number of elements of a tensor with the
+  /// given result modes (1 for a scalar / empty index list).
+  double tensor_size(index_vector const& idxs) const {


tensor_size re-derives footprint_counter's element count by hand (the class \note acknowledges this): it returns 1 for a scalar where footprint_counter returns 0, and has no tensor-of-tensor / proto-index handling, so a CSV/PNO composite result is silently mis-sized in the saving score. Combined with CostModel ignoring volatile_weight/footprint_weight, the multi-term cost can diverge from single-term's. Impact is cost-suboptimality, never wrong output — flagging so the divergence is a conscious sign-off, and the two cost paths now have to be kept in sync by hand.

evaleev · 2026-06-29T19:54:36Z

+  };
+  set_sizes(/*occ=*/10, /*virt=*/20);
+
+  // Multi-term factorization enabled; NoReorder keeps the fold structure


Every section uses ReorderSum::NoReorder, but the default and integration path (calc_info.hpp, OptimizeOptions{}) uses Reorder, which re-binarizes the new nested Product(Sum) fold summands inside clusters() and feeds Product(Sum,Sum) eval trees to the backends for the first time. That fold-then-reorder path has zero coverage — worth one section that both folds and reorders (plus an integration smoke check).

evaleev requested a review from Copilot June 24, 2026 21:30

Copilot started reviewing on behalf of evaleev June 24, 2026 21:31 View session

Copilot AI reviewed Jun 24, 2026

View reviewed changes

Comment thread SeQuant/core/optimize/optimize.cpp Outdated

Comment thread SeQuant/core/optimize/multiterm.hpp Outdated

bimalgaudel added 3 commits June 24, 2026 20:50

optimize: fix logic to treat mulitterm factorization and reorder sum …

7fc3d96

…as separate knobs

optimize: scope multiterm cost-model parity claim to the base counter

808af40

optimize: delete obsolete opt::Fusion and its test

ea5419a

Krzmbrzl reviewed Jun 29, 2026

View reviewed changes

optimize: address multiterm PR review — tidy tests, document scope

a79d6c7

evaleev reviewed Jun 29, 2026

View reviewed changes

optimize: reorder MultiTermFactor enumerators

7117e61

Uh oh!

Conversation

bimalgaudel commented Jun 24, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

ajay-mk commented Jun 29, 2026

Uh oh!

Krzmbrzl left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bimalgaudel commented Jun 29, 2026

Uh oh!

evaleev left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants