Skip to content

Taylor series native exponential implementation#243

Open
Jutho wants to merge 4 commits into
mainfrom
jh/taylorexponentiate
Open

Taylor series native exponential implementation#243
Jutho wants to merge 4 commits into
mainfrom
jh/taylorexponentiate

Conversation

@Jutho

@Jutho Jutho commented Jun 3, 2026

Copy link
Copy Markdown
Member

This is something I started a while ago, but I need to continue working on it. Basically, for higher precision number types, the best approach for computing the exponential is just a Taylor expansion, combined with scaling and squaring (i.e. exp(A) = (exp(A/2^s))^(2^s)). But which s in combination with which order of m of the Taylor expansion is a nontrivial question, that has been studied in the literature, and also involves clever ways of computing higher order polynomials, i.e. polynomials of A of degree m, thus requiring all powers A^k, k in 0:m, without actually having to do m matrix multiplications.

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Your PR no longer requires formatting changes. Thank you for your contribution!

@lkdvos lkdvos force-pushed the jh/taylorexponentiate branch from 84e07c2 to 08d3a71 Compare July 1, 2026 18:00
@lkdvos

lkdvos commented Jul 1, 2026

Copy link
Copy Markdown
Member

@Jutho I hope you don't mind but I took some time to rebase this PR now that @sanderdemeyer's PR is finished.
Doing some benchmarks on my machine, it seems like this is actually faster in most cases, so I even switched out the default to simply use this.
Balancing seems to not cost too much so I've opted to have that on by default to be cautious.

Benchmark: MatrixFunctionViaTaylor vs LinearAlgebra.exp

Timings via BenchmarkTools (@belapsed, best of 50 samples for BLAS floats, 5 for BigFloat) on random randn matrices. Tayl/LA and nobal/LA are the Taylor time (with / without balancing) relative to LinearAlgebra.exp; relerr is ‖got − exp(A)‖ / ‖exp(A)‖.

BLAS floats

Float64

n LA (µs) Taylor (µs) Taylor no-balance (µs) Tayl/LA nobal/LA relerr
8 3.9 5.9 5.6 1.51 1.43 1.3e-15
16 14.6 13.1 11.8 0.90 0.81 5.4e-15
32 57.8 30.1 26.7 0.52 0.46 1.2e-14
64 263.1 136.1 134.6 0.52 0.51 2.1e-14
128 2108.7 1219.7 1188.3 0.58 0.56 3.5e-14
256 6263.0 4161.5 4050.0 0.66 0.65 8.4e-14

ComplexF64

n LA (µs) Taylor (µs) Taylor no-balance (µs) Tayl/LA nobal/LA relerr
8 8.1 10.0 9.0 1.23 1.11 2.2e-15
16 31.1 36.8 30.0 1.19 0.97 1.9e-15
32 152.1 133.3 119.1 0.88 0.78 5.1e-15
64 943.1 801.2 719.9 0.85 0.76 2.5e-14
128 3456.6 2833.4 2591.5 0.82 0.75 3.0e-14
256 13064.8 11885.0 10696.6 0.91 0.82 7.4e-14

Taylor is faster than LinearAlgebra.exp for n ≥ 16 (Float64) and competitive-to-faster for ComplexF64, only paying a penalty at the smallest sizes. Balancing costs ~10–15% on these well-scaled random matrices; it is kept on by default for robustness on badly-scaled inputs.

Arbitrary precision (256-bit)

LinearAlgebra.exp has no method for Matrix{BigFloat}.
Reported below are Taylor timings and the self-consistency residual ‖exp(A)·exp(−A) − I‖₁.

BigFloat

n Taylor (ms) no-balance (ms) ‖E·E(−A) − I‖₁
8 1.46 1.38 1.9e-74
16 9.67 9.62 2.7e-73
32 75.63 76.57 3.0e-71
64 769.48 769.52 1.5e-68

Complex{BigFloat}

n Taylor (ms) no-balance (ms) ‖E·E(−A) − I‖₁
8 5.95 6.00 4.5e-74
16 44.53 43.26 2.9e-72
32 410.79 418.93 6.3e-71
64 3449.70 3432.66 2.0e-69

@lkdvos lkdvos changed the title [WIP] Taylor Exponentiate Taylor series native exponential implementation Jul 1, 2026
@Jutho

Jutho commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

I guess for the BigFloat case, the important timing is how it compares to the current approach in MAK, namely via the generic eigenvalue decomposition. I think that should be very favorable.

@codecov

codecov Bot commented Jul 2, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 83.62069% with 19 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/common/balancing.jl 65.11% 15 Missing ⚠️
src/implementations/exponential.jl 95.83% 3 Missing ⚠️
src/interface/exponential.jl 0.00% 1 Missing ⚠️
Files with missing lines Coverage Δ
src/MatrixAlgebraKit.jl 100.00% <ø> (ø)
src/interface/matrixfunctions.jl 100.00% <ø> (ø)
src/interface/exponential.jl 0.00% <0.00%> (ø)
src/implementations/exponential.jl 97.87% <95.83%> (-2.13%) ⬇️
src/common/balancing.jl 65.11% <65.11%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Jutho and others added 2 commits July 2, 2026 11:51
Register a pure-Julia scaling-and-squaring Taylor algorithm as
`MatrixFunctionViaTaylor` and fit it to the exponential interface: the
`(τ, A)` forms reuse the existing generic method, and it becomes the
default algorithm for all dense matrices (`DiagonalAlgorithm` still
handles `Diagonal`). Being LAPACK-free, it also covers arbitrary
precision, where `LinearAlgebra.exp` has no method.

The truncation order and number of squarings are chosen to minimize the
Paterson-Stockmeyer matrix-multiplication count under the Taylor
remainder bound, the polynomial is evaluated with incrementally built
coefficients, and squaring reuses a ping-pong buffer. `balance!` is
rewritten as a clean, type-generic Parlett-Reinsch scaling and wired in
as an optional preprocessing step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lkdvos lkdvos force-pushed the jh/taylorexponentiate branch from a7f2e42 to 0d07fe9 Compare July 2, 2026 15:51
@lkdvos

lkdvos commented Jul 2, 2026

Copy link
Copy Markdown
Member

It seems like for BigFloat the decompositions are actually faster in some cases and slower in other, in particular for hermitian inputs. I'm not sure what should be the conclusion of what algorithm to choose by default, or what is causing this since that is slightly unexpected to me, but I guess the matmuls just aren't as efficient, so the more complex work needed for eigensolvers is less bad.

[Float64]  timing — general matrix   (min time per call)
      n      Taylor          LA         Eig
      8       4.5µs       3.3µs      13.3µs
     16       8.8µs      10.9µs      43.4µs
     32      24.6µs      36.4µs     217.3µs
     64     157.8µs     205.0µs     882.1µs
    128      2.24ms      2.34ms      9.20ms
    256      3.28ms      7.41ms     33.83ms

[Float64]  timing — hermitian matrix   (min time per call)
      n      Taylor          LA         Eig        Eigh
      8       4.0µs       7.9µs      12.6µs       9.5µs
     16       7.8µs      23.1µs      40.5µs      25.2µs
     32      18.1µs      74.0µs     218.4µs      76.5µs
     64     110.8µs     329.7µs     960.2µs     333.1µs
    128     836.6µs      3.27ms      9.32ms      3.22ms
    256      2.42ms     12.51ms     38.67ms     11.35ms

[ComplexF64]  timing — general matrix   (min time per call)
      n      Taylor          LA         Eig
      8       9.3µs       7.3µs      24.5µs
     16      29.5µs      27.6µs      91.9µs
     32     138.1µs     132.3µs     424.4µs
     64     944.5µs     899.1µs      2.11ms
    128      2.77ms      2.88ms     20.68ms
    256     17.37ms     17.80ms     80.29ms

[ComplexF64]  timing — hermitian matrix   (min time per call)
      n      Taylor          LA         Eig        Eigh
      8       7.4µs     114.4µs      21.7µs     115.2µs
     16      21.6µs     257.0µs      80.8µs     256.6µs
     32      95.6µs     587.6µs     382.1µs     594.0µs
     64     957.5µs     998.9µs      2.02ms      1.04ms
    128      2.06ms      7.69ms     21.79ms      7.75ms
    256      7.51ms     24.30ms     91.26ms     21.90ms

[BigFloat]  timing — general matrix   (min time per call)
      n      Taylor         Eig
      8      1.09ms      1.88ms
     16      8.32ms     11.80ms
     32     67.12ms     93.51ms
     64    695.57ms    731.99ms

[BigFloat]  timing — hermitian matrix   (min time per call)
      n      Taylor         Eig        Eigh
      8     844.7µs      1.28ms      1.70ms
     16      6.41ms      8.52ms      9.51ms
     32     51.65ms     70.95ms     50.23ms
     64    567.32ms    561.44ms    302.70ms

[Complex{BigFloat}]  timing — general matrix   (min time per call)
      n      Taylor         Eig
      8      5.15ms      4.60ms
     16     41.52ms     35.76ms
     32    341.52ms    274.70ms
     64      2.951s      2.114s

[Complex{BigFloat}]  timing — hermitian matrix   (min time per call)
      n      Taylor         Eig        Eigh
      8      3.87ms      3.56ms      2.36ms
     16     31.55ms     28.01ms     13.73ms
     32    247.06ms    197.29ms     98.47ms
     64      2.256s      1.650s    687.86ms

@lkdvos lkdvos force-pushed the jh/taylorexponentiate branch from 357d1e6 to b111776 Compare July 2, 2026 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants