
Optimizing multiprecision LLL in FLINT

August 21, 2025

This post describes some recent improvements to the multiprecision arithmetic in FLINT's LLL lattice basis reduction. LLL has many important applications, including two key tasks where it is used internally by FLINT: factoring polynomials in $\mathbb{Z}[x]$ and finding integer relations between real or complex numbers.

As we shall see, these two applications have gotten significantly faster in the current development version of FLINT in large dimensions, with speedups up to 6x and 60x observed on benchmarks for factoring and integer relations respectively.

Overview of FLINT's LLL

FLINT's LLL incorporates ideas from the following articles:

The idea is essentially the following: we first do a heuristic LLL reduction using floating-point arithmetic and then rigorously certify its correctness using an a posteriori analysis of an approximate QR factorization. The certification itself can use floating-point arithmetic if we make judicious use of directed rounding. This strategy is significantly faster than executing the original LLL algorithm in exact rational arithmetic.

Both the heuristic LLL reduction and the certification come in two versions: a fast machine precision implementation, and a slower multiprecision implementation used as a fallback when machine precision fails. This typically becomes necessary in large enough dimensions. Finally, there is an exact certification (using $\mathbb{Q}$ arithmetic and the naive definition of LLL-reducedness) used as an ultimate fallback when even the multiprecision floating-point QR certification is inconclusive.
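
For reference, the property being certified is the standard one: a basis $b_1, \ldots, b_n$ with Gram-Schmidt vectors $b_i^*$ and coefficients $\mu_{i,j}$ is $(\delta, \eta)$-LLL-reduced when (sketching the usual formulation; if I recall correctly, FLINT's default parameters are $\delta = 0.99$ and $\eta = 0.51$)

$$ |\mu_{i,j}| \le \eta \quad (1 \le j < i \le n), \qquad \delta \, \lVert b_{i-1}^* \rVert^2 \le \lVert b_i^* \rVert^2 + \mu_{i,i-1}^2 \, \lVert b_{i-1}^* \rVert^2 \quad (2 \le i \le n). $$

The floating-point certification checks these inequalities with error bounds derived from the approximate QR factorization, while the exact fallback evaluates them directly in $\mathbb{Q}$.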

(Also worth noting: for lattices with large bit sizes, FLINT first reduces to smaller bit sizes using "ULLL", an unpublished algorithm by Andrew Novocin inspired by the $\tilde{L}^1$ algorithm from Andrew Novocin, Damien Stehlé and Gilles Villard, "An LLL-reduction algorithm with quasi-linear time complexity". However, this high-level wrapper is largely independent of the underlying floating-point LLLs.)
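
For concreteness, here is a minimal sketch of calling this machinery through the public fmpz_lll interface (function names and signatures are quoted from memory, so check them against the current headers before relying on this):

    #include <flint/fmpz.h>
    #include <flint/fmpz_mat.h>
    #include <flint/fmpz_lll.h>

    int main(void)
    {
        slong i, j, n = 4;
        fmpz_mat_t B, U;
        fmpz_lll_t fl;

        /* A skewed but unimodular upper triangular basis (rows are vectors). */
        fmpz_mat_init(B, n, n);
        fmpz_mat_one(B);
        for (i = 0; i < n; i++)
            for (j = i + 1; j < n; j++)
                fmpz_set_si(fmpz_mat_entry(B, i, j), 100 * (i + 1) + j);

        /* U accumulates the unimodular transformation; start from the identity. */
        fmpz_mat_init(U, n, n);
        fmpz_mat_one(U);

        /* Default parameters (delta = 0.99, eta = 0.51, Z-basis representation).
           fmpz_lll dispatches internally between the machine-precision and
           multiprecision implementations and certifies the output. */
        fmpz_lll_context_init_default(fl);
        fmpz_lll(B, U, fl);

        fmpz_mat_print_pretty(B);
        flint_printf("\n");

        fmpz_mat_clear(B);
        fmpz_mat_clear(U);
        return 0;
    }

(Link with -lflint; the reduced rows should come out noticeably shorter than the input rows.)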

Changes

Over the last month, I made the following improvements to FLINT's LLL code:

Polynomial factoring benchmark

Here are the results on FLINT's benchmark suite for factoring in $\mathbb{Z}[x]$ (running build/fmpz_poly_factor/profile/p-factor_hard -hard).

Quick explanation: each input polynomial has the specified degree and coefficients bounded in absolute value by $2^{\text{Bits}}$. "Factors" is the number of factors over $\mathbb{Z}[x]$ and "Local factors" is the number of factors over $\mathbb{Z}_p[x]$ (usually larger than the number of global factors; it is smaller in some entries in the table below due to a deflation hack). A large number of local factors is what necessitates running LLL to find the correct recombination; the dimension of the lattice is roughly equal to the number of local factors.

Polynomial Degree Bits Local factors Factors FLINT 3.3 FLINT 3.4-dev Speedup
P1 156 1407 10 36 0.007 s 0.007 s
P2 196 1391 7 12 0.011 s 0.011 s
P3 336 1982 7 16 0.022 s 0.022 s
P4 462 2511 42 2 0.159 s 0.158 s
P5 64 131 32 1 0.007 s 0.007 s
P6 144 548 24 6 0.010 s 0.010 s
P7 384 187 88 1 0.382 s 0.218 s 1.50x
P8 972 707 54 1 0.120 s 0.114 s
M12_5 792 3623 72 1 0.376 s 0.328 s 1.14x
M12_6 924 4870 84 2 4.920 s 4.851 s
T1 900 915 30 2 0.102 s 0.102 s
T2 900 421 32 2 0.108 s 0.110 s
T3 2401 1541 63 4 1.909 s 1.768 s 1.08x
P7*M12_5 1176 3787 188 2 278 s 51.2 s 5.43x
P7*M12_6 1308 5029 206 3 535 s 96.5 s 5.54x
H1 960 955 131 28 1.927 s 0.935 s 2.06x
H2 4096 2 256 6 444 s 71.4 s 6.21x
C1 1024 478 64 32 0.556 s 0.256 s 2.17x
S7 128 289 64 1 0.134 s 0.088 s 1.52x
S8 256 624 128 1 1.431 s 1.077 s 1.32x
S9 512 1334 256 1 372 s 57.2 s 6.50x
S10 1024 2837 512 1 28548 s 6292 s 4.53x

We observe some very nice speedups on all examples with more than 100 local factors, where the multiprecision code in the LLL typically starts to kick in, and on some smaller examples too. For the largest examples there is a fairly consistent 4x to 6x speedup, with the time for the adversarial Swinnerton-Dyer polynomial S10 dropping all the way from nearly eight hours to less than two hours!

Actually, S9 (and presumably also S10) is still slower than in the FLINT 1.6 of nearly ten years ago (Bill Hart reported this example taking 21 seconds with FLINT 1.6 in 2016), due to some heuristics missing from the FLINT 2 (and FLINT 3) factoring code. These heuristics presumably result in easier lattice problems. Nevertheless, it is interesting that we seem to partly catch up just by improving LLL performance for "hard" lattices.
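
For anyone wanting to reproduce the Swinnerton-Dyer cases outside the profiling harness, something like the following should be roughly equivalent (a sketch only; fmpz_poly_swinnerton_dyer(f, k) constructs the degree-$2^k$ polynomial, so k = 9 and k = 10 correspond to S9 and S10):

    #include <flint/fmpz_poly.h>
    #include <flint/fmpz_poly_factor.h>

    int main(void)
    {
        fmpz_poly_t f;
        fmpz_poly_factor_t fac;

        fmpz_poly_init(f);
        fmpz_poly_factor_init(fac);

        /* S7 (degree 128); change 7 to 9 or 10 for the hard S9/S10 cases. */
        fmpz_poly_swinnerton_dyer(f, 7);

        /* Irreducible over Z, but with many factors modulo every prime,
           which forces a large recombination lattice. */
        fmpz_poly_factor(fac, f);
        flint_printf("irreducible factors over Z: %wd\n", fac->num);

        fmpz_poly_factor_clear(fac);
        fmpz_poly_clear(f);
        return 0;
    }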

Integer relation benchmark

I added a new profile program (build/fmpz_lll/profile/p-lindep) to test performance on integer relation lattices. This program generates a vector of $n$ uniformly random real numbers, inserts $\log(2)$, $\log(3)$ and $\log(6)$ at positions $0.25n$, $0.5n$ and $0.75n$, and runs _qqbar_acb_lindep with precision 64, 128, 256, ... bits until the relation $(\ldots, 0, \pm 1, 0, \ldots, 0, \pm 1, 0, \ldots, 0, \mp 1, 0, \ldots)$ (coming from $\log(2) + \log(3) - \log(6) = 0$) is recovered. The "Precision" column in the table shows at what input precision (= bit size of the lattice) this occurs.
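
A stripped-down version of what the profile program does looks roughly like this (a sketch only: the _qqbar_acb_lindep signature and the helper names are written from memory, and the real p-lindep code certainly differs in details):

    #include <flint/flint.h>
    #include <flint/fmpz_vec.h>
    #include <flint/arb.h>
    #include <flint/acb.h>
    #include <flint/qqbar.h>

    int main(void)
    {
        slong n = 40, prec = 64, i;
        flint_rand_t state;
        acb_ptr vec;
        fmpz * rel;
        arb_t t;

        flint_rand_init(state);
        vec = _acb_vec_init(n);
        rel = _fmpz_vec_init(n);
        arb_init(t);

        while (1)
        {
            /* Random reals in [0, 1); for simplicity the vector is simply
               regenerated from scratch at each precision. */
            for (i = 0; i < n; i++)
            {
                arb_urandom(t, state, prec);
                acb_set_arb(vec + i, t);
            }

            /* Plant the relation log(2) + log(3) - log(6) = 0. */
            arb_log_ui(t, 2, prec); acb_set_arb(vec + n / 4, t);
            arb_log_ui(t, 3, prec); acb_set_arb(vec + n / 2, t);
            arb_log_ui(t, 6, prec); acb_set_arb(vec + 3 * n / 4, t);

            /* The actual profile program additionally checks that the planted
               relation is the one recovered before stopping. */
            if (_qqbar_acb_lindep(rel, vec, n, 1, prec))
                break;

            prec *= 2;
        }

        _fmpz_vec_print(rel, n);
        flint_printf("\nrecovered at prec = %wd\n", prec);

        arb_clear(t);
        _fmpz_vec_clear(rel, n);
        _acb_vec_clear(vec, n);
        flint_rand_clear(state);
        return 0;
    }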

n Precision FLINT 3.3 FLINT 3.4-dev Speedup
10 64 0.000063 s 0.000056 s 1.12x
20 64 0.000424 s 0.000370 s 1.15x
30 64 0.00104 s 0.000921 s 1.13x
40 64 0.00182 s 0.00165 s 1.10x
50 128 0.00975 s 0.00873 s 1.12x
60 128 0.0134 s 0.0122 s 1.10x
70 128 0.0166 s 0.0153 s 1.08x
80 256 0.0598 s 0.0551 s 1.09x
90 256 0.0692 s 0.0640 s 1.08x
100 256 0.0854 s 0.0795 s 1.07x
110 512 0.386 s 0.363 s 1.06x
120 512 0.415 s 0.390 s 1.06x
130 512 0.488 s 0.464 s 1.05x
140 512 0.547 s 0.518 s 1.06x
150 512 0.598 s 0.568 s 1.05x
160 1024 2.457 s 2.355 s 1.04x
170 1024 2.679 s 2.571 s 1.04x
180 1024 2.860 s 2.750 s 1.04x
190 1024 3.163 s 3.054 s 1.04x
200 1024 3.412 s 3.306 s 1.03x
210 2048 33.59 s 18.53 s 1.81x
220 2048 212.3 s 19.78 s 10.73x
230 2048 56.78 s 19.17 s 2.96x
240 2048 245.0 s 20.05 s 12.22x
250 2048 36.11 s 17.63 s 2.05x
260 2048 286.6 s 20.47 s 14.00x
270 2048 316.3 s 22.89 s 13.82x
280 2048 368.7 s 26.20 s 14.08x
290 4096 1522 s 24.35 s 62.51x
300 4096 1532 s 150.0 s 10.21x
310 4096 1687 s 155.0 s 10.88x
320 4096 1856 s 160.4 s 11.57x

One can think of more interesting integer relation problems (this one could be solved by brute force), but it seemed appropriate for a simple benchmark.

One notices significant improvements for $n$ larger than about 200, with a consistent order-of-magnitude speedup for $n$ larger than 250 and a 60x speedup in one case ($n = 290$). It is worth noting that the running time of the new LLL code also increases much more smoothly with $n$: the FLINT 3.3 LLL would randomly take 5-10 times longer for some values of $n$ than for adjacent ones due to hitting the slow fmpq fallback, which is now avoided.

Future

There is much more to optimize about FLINT's LLL.


