FLINT 3.4
November 25, 2025
FLINT 3.4 is out! A full list of changes can be found in the history section of the documentation. Read on for a survey of the major new features and improvements. By my count, 23 authors contributed to this release; apologies, as usual, to anyone whose contribution is not included in this blog post.
Contents
- LLL, integer relations and factoring in $\mathbb{Z}[x]$
- Word-size primality testing and integer factorization
- Generic power series
- Polynomial multiplication
- Directed rounding for nfloat
- Gröbner bases and multivariate rational functions over $\operatorname{GF}(p)$
- Irreducible polynomials over $\operatorname{GF}(p)$
- Matrix permanents
- Example programs
- Generic QR factorization
- Polynomial interpolation
- Power series solutions of ODEs
- Bug fixes
- Housekeeping
- October workshop results
LLL, integer relations and factoring in $\mathbb{Z}[x]$
The multiprecision LLL is now significantly faster mainly as a result of using nfloat, arf and arb instead of mpf, mpfr and fmpq for arithmetic, in combination with various minor tweaks. Here are two sample consequences of the LLL improvements between FLINT 3.3 and 3.4:
- The time to factor the degree-512 Swinnerton-Dyer polynomial $S_9(x)$ dropped from 372 seconds to 57 seconds (a 6.5x speedup).
- The time to find an integer relation in a length-300 vector of real numbers dropped from 1532 seconds to 150 seconds (a 10x speedup).
This work was presented in more detail in my last blog post.
Word-size primality testing and integer factorization
Primality testing with n_is_prime and factorization with n_factor for single-word integers (i.e. up to 64 bits) have been improved significantly in FLINT 3.4. This also benefits fmpz_is_prime and fmpz_factor when the input is a single-word integer. To a lesser extent, factorization of double-word integers with fmpz_factor has also been improved.
Primality testing
The single-word primality test has essentially been rewritten completely. The new algorithm looks as follows:
- Use an $O(1)$ bit-table lookup for inputs up to $2^{15}$ (previously $2^{12}$).
- Do trial division by a small set of primes (this part of the algorithm is essentially unchanged).
- Up to $2^{32}$, do a base-2 strong probable prime test and eliminate the 2314 composites that pass this test using a lookup table (previously another probable prime test with a well-chosen base would be used). The modular exponentiation for the base-2 test is optimized using Shoup reduction.
- Up to $2^{64}$, do a base-2 strong probable prime test followed by a single additional base-$b$ test where $b$ is chosen to guarantee that we detect all 31894014 composites that pass this test. Previously FLINT would use 2-3 strong probable prime tests or a BPSW test in this range. The single sufficient base for the new test is looked up in a precomputed 350 KB hash table, requiring only 2/3 the space of the 512 KB hash table design previously published by Forisek and Jancina.
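For illustration, here is a minimal Python sketch of a strong probable prime test (FLINT's version optimizes the modular exponentiation with Shoup reduction). Note that $2047 = 23 \cdot 89$ passes the base-2 test, which is why the exceptional composites must be handled by a lookup table or a carefully chosen extra base:

```python
def is_strong_probable_prime(n, base=2):
    # strong (Miller-Rabin) probable-prime test to the given base:
    # write n - 1 = d * 2^s with d odd; n passes if base^d == 1 (mod n)
    # or base^(d * 2^r) == -1 (mod n) for some 0 <= r < s
    if n < 3 or n % 2 == 0:
        return n == 2
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(base, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False
```

Up to $2^{32}$ the base-2 test is correct except for the 2314 tabulated composites; it is this cheap single test plus a table (or a second base) that replaces a full probabilistic battery.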
The improvement is illustrated by the following plot.
Figure: For each bit size, we test 10000 random odd integers for primality using n_is_prime. The left plot shows the average time per test and the right plot shows the speedup for the whole batch.
Factoring
For integer factorization, the following improvements have been made:
- Optimize trial division by using the Granlund-Montgomery divisibility test instead of Euclidean division with remainder, and increase the trial prime limit to $2^{16}$.
- Use Bill Hart's "one line factor" algorithm instead of SQUFOF up to 50 bits (previously 39 bits).
- Do a primality test earlier (after some initial rounds of trial division, not at the end of all trial division).
- Improve the square root calculation in one line factor and SQUFOF.
- Use SQUFOF instead of the quadratic sieve in the 65-80 bit range (previously SQUFOF was only used up to 64 bits).
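The Granlund-Montgomery trick tests divisibility by an odd constant $d$ using one multiplication and one comparison instead of a division. Here is a Python sketch for 64-bit words; the two precomputed constants depend only on $d$, so they can be cached per trial prime:

```python
BITS = 64
MASK = (1 << BITS) - 1

def divisible_gm(n, d):
    # for odd d, the map n -> n * d^(-1) (mod 2^64) is a bijection on
    # 64-bit words that sends the multiples of d exactly onto the range
    # [0, floor((2^64 - 1) / d)], so divisibility becomes one comparison
    dinv = pow(d, -1, 1 << BITS)       # d^(-1) mod 2^64 (Python 3.8+)
    threshold = MASK // d
    return (n * dinv) & MASK <= threshold
```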
The improvement is illustrated by the following plot.
Figure: For each bit size, we factor 1000 random integers using n_factor up to 64 bits and fmpz_factor otherwise. The left plot shows the average time per factorization (note that the time axis is logarithmic) and the right plot shows the speedup for the whole batch.
Generic power series
Power series over generic rings were available semi-privately in the previous version of FLINT, but now have a fully public interface: the gr_series module. The implementation has also been improved in various ways. Note that there are two distinct kinds of power series:
- gr_series_mod - truncated power series, i.e. elements of $R[[x]] / x^n$, with exact representation of elements
- gr_series - infinite power series, i.e. elements of $R[[x]]$, represented as finite truncations with a per-element error term $O(x^n)$. Polynomials of small degree can be represented exactly.
One of the improvements to the existing code is that potentially invalid operations are detected more robustly. Let's look at division in $\mathbb{Z}[[x]]$ as an example. Define the exact power series (polynomials) $$A = 7 + 3 x + 5 x^2$$ $$B = 12 + 2 x + 19 x^3 + 3 x^4$$ and form their product with precision 6: $$AB = 84 + 50x + 66x^2 + 143x^3 + 78x^4 + 104x^5 + O(x^6).$$ If we now attempt to evaluate $(AB) / B$, we get an error:
>>> from flint_ctypes import *
>>> x = PowerSeriesRing(ZZ, 6).gen()
>>> A = 7 + 3*x + 5*x**2
>>> B = 12 + 2*x + 19*x**3 + 3*x**4
>>> (A * B)
84 + 50*x + 66*x^2 + 143*x^3 + 78*x^4 + 104*x^5 + O(x^6)
>>> (A * B) / B
Traceback (most recent call last):
...
FlintUnableError: failed to compute x / y in {Power series over Integer ring (fmpz) with precision O(x^6)} for {x = 84 + 50*x + 66*x^2 + 143*x^3 + 78*x^4 + 104*x^5 + O(x^6)}, {y = 12 + 2*x + 19*x^3 + 3*x^4}
The reason is that the finite-precision enclosure of $AB$ contains power series which are not exactly divisible by $B$, e.g. $AB + x^6$.
The division does go through if we use divexact (a division method which, in any ring, assumes that the quotient exists) instead of the standard checked division, if we replace $B$ by a power series $C$ that is invertible, if we increase the precision so that the inputs to the division are exact, or if we work over $\mathbb{Q}$ instead:
>>> (A * B).divexact(B)
7 + 3*x + 5*x^2 + O(x^6)
>>> C = -1 + 2*x + 19*x**3 + 3*x**4
>>> (A * C) / C
7 + 3*x + 5*x^2 + O(x^6)
>>> x = PowerSeriesRing(ZZ, 7).gen()
>>> A = 7 + 3*x + 5*x**2
>>> B = 12 + 2*x + 19*x**3 + 3*x**4
>>> A * B
84 + 50*x + 66*x^2 + 143*x^3 + 78*x^4 + 104*x^5 + 15*x^6
>>> (A * B) / B
7 + 3*x + 5*x^2
>>> x = PowerSeriesRing(QQ, 6).gen()
>>> A = 7 + 3*x + 5*x**2
>>> B = 12 + 2*x + 19*x**3 + 3*x**4
>>> (A * B) / B
7 + 3*x + 5*x^2 + O(x^6)
Another example: the (ordinary) exponential function is undefined for infinite power series in finite characteristic, but makes sense for truncated power series in characteristic $p \ge n$:
>>> PowerSeriesRing(ZZmod(7)).gen().exp()
...
FlintUnableError: failed to compute exp(x) in {Power series over Integers mod 7 (_gr_nmod) with precision O(x^6)} for {x = x}
>>> PowerSeriesModRing(ZZmod(7), 7).gen().exp()
1 + x + 4*x^2 + 6*x^3 + 5*x^4 + x^5 + 6*x^6 (mod x^7)
Polynomial multiplication
Tiny moduli
Multiplication of nmod_poly with tiny moduli $p \le 23$ is up to twice as fast for polynomials of intermediate length (see plot).
This is achieved by packing two coefficients into each double when multiplying using fft_small.
Modular multiplication
The new function nmod_poly_mulmod_precond allows computing $f g \bmod h$ where both $g$ and $h$ are fixed polynomials faster than a modular polynomial multiplication where only $h$ is fixed. This is currently most effective for degrees smaller than about 200, where it can yield more than a factor two speedup (in some cases as much as a factor ten). In the future, it is planned that this function will incorporate FFT caching to more significantly speed up huge degrees where it currently gives at most a few percent improvement.
In addition, modular multiplication with preconditioning only on the modulus (nmod_poly_mulmod_preinv) has been optimized for the case where the modulus is sparse.
Nested univariates
Multiplication in nested polynomial rings like $R[x][y][z]$ represented using the gr_poly type now uses Kronecker substitution when the base ring $R$ supports fast univariate polynomial multiplication (which is the case for almost all base rings in FLINT). This also speeds up multiplying power series with polynomial coefficients.
| $n$ | FLINT 3.3 | FLINT 3.4 | Speedup |
|---|---|---|---|
| 2 | 1.31e-06 s | 1.28e-06 s | 1.02x |
| 4 | 1.61e-05 s | 1.62e-05 s | 0.99x |
| 8 | 0.000227 s | 0.000144 s | 1.58x |
| 16 | 0.00318 s | 0.000906 s | 3.51x |
| 32 | 0.0428 s | 0.00285 s | 15.0x |
| 64 | 0.371 s | 0.0134 s | 27.7x |
| 128 | 3.173 s | 0.0586 s | 54.1x |
| 256 | 28.959 s | 0.415 s | 69.8x |
| 512 | 285.959 s | 1.731 s | 165.2x |
| 1024 | 1708.81 s | 8.093 s | 211.1x |
The table gives example timings multiplying in $\mathbb{R}[x][y]$ when the inputs have degree $n-1$ in both variables, i.e. $n^2$ terms, and the coefficients are uniformly random 512-bit arb balls.
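The substitution $y \mapsto x^D$ behind these timings can be sketched in a few lines of Python, with schoolbook multiplication standing in for the fast base-ring multiplication (function names here are illustrative, not FLINT's):

```python
def mul_bivariate_kronecker(A, B):
    # A, B given as lists of coefficient lists: A[j][i] is the coefficient
    # of x^i * y^j.  Substituting y -> x^D with D > deg_x(A) + deg_x(B)
    # keeps the x-blocks of the product from overlapping, so a single
    # univariate multiplication suffices.
    dxA = max(len(row) for row in A) - 1
    dxB = max(len(row) for row in B) - 1
    D = dxA + dxB + 1
    c = mul_univariate(flatten(A, D), flatten(B, D))
    ny = len(A) + len(B) - 1                 # number of y-coefficients
    # unflatten: block j of length D holds the x-coefficients of y^j
    return [c[j * D:(j + 1) * D] for j in range(ny)]

def flatten(P, D):
    out = [0] * (len(P) * D)
    for j, row in enumerate(P):
        for i, cij in enumerate(row):
            out[j * D + i] = cij
    return out

def mul_univariate(a, b):
    # placeholder for a fast base-ring multiplication (schoolbook here)
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c
```

For example, $((1 + x) + y) \cdot y = (1+x)y + y^2$; the output rows are zero-padded to length $D$.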
Directed rounding for nfloat
It is now possible to set nfloat to round results towards $-\infty$ or towards $+\infty$, as an alternative to the default behavior in which rounding is unspecified (for maximum speed). Directed rounding is supported by a decent subset of methods (not all methods) including matrix multiplication. This feature is currently used in FLINT's floating-point LLL as noted above, replacing the older and slower code based on MPFR.
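As an analogy (using Python's decimal module, not the nfloat interface), running a computation once rounding down and once rounding up brackets the exact result, which is how directed rounding is typically used to obtain rigorous enclosures:

```python
from decimal import Decimal, Context, ROUND_FLOOR, ROUND_CEILING

# Two 10-digit working contexts, one rounding every result towards
# -infinity and one towards +infinity; running a computation in both
# brackets the exact value.
down = Context(prec=10, rounding=ROUND_FLOOR)
up = Context(prec=10, rounding=ROUND_CEILING)

lo = down.divide(Decimal(1), Decimal(3))
hi = up.divide(Decimal(1), Decimal(3))
assert lo < hi  # the exact 1/3 lies in [lo, hi]
```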
Gröbner bases and multivariate rational functions over $\operatorname{GF}(p)$
The new fmpz_mod_mpoly_q module, implemented by Andrii Yanovets, allows working in multivariate rational function fields $\operatorname{GF}(p) (x_1, \ldots, x_n)$ for arbitrary-size primes $p$. Andrii also added fmpz_mod_mpoly_buchberger_naive for computing Gröbner bases in $\operatorname{GF}(p)[x_1, \ldots, x_n]$. In both cases, the functionality mirrors existing features for $\mathbb{Z}$.
As the name suggests, fmpz_mod_mpoly_buchberger_naive is a fairly standard implementation of Buchberger's algorithm and should not be expected to be competitive with state of the art Gröbner bases engines; the main point is that it works out of the box for FLINT users. That being said, when testing this code, we found examples where FLINT seemed to be more robust than alternatives. In SageMath,
sage: R.<x1,x2,x3> = PolynomialRing(GF(4722366482869611659327), order="lex")
sage: F = [4491077222722822274764*x1^3*x2^2*x3^3+1142989093829542840957*x1*x2*x3+161135877041952360283*x2^2*x3^2+1963195568815792903903*x3^2,
....:      473518656873042772582*x1^3*x2^2*x3^2+538964973504606229287*x1^3*x3^3+2996428526938692536564*x1^2*x2^2*x3^2+1389613235437080610847*x1*x3]
sage: %time G = Ideal(F).groebner_basis()
crashes my computer after 30 minutes. It is possible to get a result out of SageMath in about 10 seconds, but this requires manually converting to the more favorable degrevlex term order and transforming back to lex; for some reason this isn't automatic. FLINT manages to brute force out the Gröbner basis directly in the lex term order in 14 seconds. Magma finishes this Gröbner basis computation in a fraction of a second using an automatic conversion, but did not finish in 10 minutes when asked to compute directly in the lex order. It would be interesting to do a more systematic comparison with other systems to see if there are classes of ideals for which FLINT's direct implementation is competitive.
Irreducible polynomials over $\operatorname{GF}(p)$
FLINT 3.4 includes new functionality for deterministically generating minimal-weight monic irreducible polynomials of any specified degree $n$ over any word-size prime-order field $\operatorname{GF}(p)$. Minimal weight means that the chosen polynomial has the minimum possible number of nonzero terms. For example, there are 624 irreducible monic polynomials of degree 5 over $\operatorname{GF}(5)$ of which 16 are trinomials (there are no irreducible binomials of this degree); FLINT returns $x^5 + x^2 + 2$, the first irreducible trinomial with respect to a specific lexicographic order.
Minimal-weight irreducibles are now used by default to construct finite fields of type fq_nmod when a Conway polynomial is not available, in contrast to the previous behavior of generating a random irreducible which in general would not be as sparse as possible. As a result, constructing finite extension fields of large degree is now significantly faster, and arithmetic in such fields will also be faster in some cases due to improved sparsity. Users can also explicitly choose a sparse polynomial even when a Conway polynomial is available.
Here is an example: to construct $\operatorname{GF}(5^{3125})$, FLINT 3.3 would take 309 seconds to generate the rather dense irreducible $$ \begin{aligned} &x^{3125} + 4x^{3084} + 4x^{3050} + 4x^{3047} + 4x^{3028} + x^{3012} + 2x^{3003} + 2x^{2983} + 4x^{2974} + x^{2973} + x^{2964} + x^{2951} + {}\\ &x^{2933} + x^{2914} + x^{2894} + 4x^{2877} + x^{2871} + 2x^{2855} + 2x^{2852} + 3x^{2818} + 2x^{2808} + x^{2799} + 2x^{2798} + {}\\ &4x^{2772} + 3x^{2770} + 3x^{2683} + 4x^{2670} + 4x^{2661} + 4x^{2654} + x^{2641} + x^{2636} + 3x^{2615} + 2x^{2608} + 4x^{2592} + {}\\ &4x^{2571} + 3x^{2544} + 2x^{2537} + x^{2534} + x^{2512} + 4x^{2511} + 2x^{2505} + 4x^{2504} + 4x^{2493} + x^{2486} + 3x^{2480} + {}\\ &3x^{2468} + 2x^{2465} + x^{2459} + 3x^{2439} + 4x^{2431} + 2x^{2428} + x^{2415} + 2x^{2389} + 3x^{2383} + x^{2377} + 4x^{2357} + {}\\ &x^{2350} + 3x^{2345} + x^{2342} + 2x^{2318} + 3x^{2280} + 2x^{2278} + 3x^{2275} + x^{2258} + 4x^{2155} + 4x^{2148} + x^{2147} + {}\\ &4x^{2103} + 2x^{2101} + x^{2083} + 4x^{2066} + 3x^{2063} + x^{2058} + 3x^{2017} + 3x^{1998} + 2x^{1995} + 3x^{1992} + 2x^{1987} + {}\\ &3x^{1975} + 2x^{1972} + 2x^{1954} + x^{1948} + 2x^{1920} + x^{1919} + 2x^{1884} + 2x^{1873} + 2x^{1857} + 3x^{1853} + 4x^{1848} + {}\\ &2x^{1847} + 3x^{1846} + 2x^{1843} + x^{1841} + 3x^{1840} + x^{1822} + x^{1819} + 4x^{1815} + 4x^{1812} + x^{1786} + 3x^{1777} + {}\\ &2x^{1774} + 4x^{1771} + 3x^{1749} + 4x^{1747} + 4x^{1745} + 4x^{1737} + 3x^{1735} + x^{1733} + 2x^{1708} + x^{1706} + 3x^{1689} + {}\\ &3x^{1686} + 2x^{1668} + x^{1666} + x^{1663} + 4x^{1624} + 3x^{1616} + 3x^{1613} + 2x^{1592} + 3x^{1569} + 4x^{1556} + x^{1521} + {}\\ &2x^{1501} + x^{1495} + x^{1487} + 4x^{1457} + 4x^{1454} + 4x^{1450} + 4x^{1446} + 4x^{1422} + 3x^{1421} + 2x^{1402} + 4x^{1388} + {}\\ &4x^{1384} + 2x^{1377} + 2x^{1367} + 4x^{1358} + 4x^{1357} + 2x^{1346} + 2x^{1336} + x^{1314} + x^{1298} + x^{1294} + 4x^{1258} + {}\\ &3x^{1253} + 3x^{1225} + 2x^{1213} + 4x^{1205} + x^{1179} + 4x^{1169} + 
4x^{1155} + 4x^{1139} + x^{1123} + 3x^{1117} + 3x^{1112} + {}\\ &x^{1093} + 3x^{1083} + 4x^{1075} + x^{1073} + 3x^{1063} + x^{1051} + 2x^{1050} + 2x^{1043} + 2x^{1039} + 3x^{1038} + x^{1036} + {}\\ &4x^{1026} + x^{1009} + x^{1006} + 2x^{992} + 4x^{979} + 2x^{973} + 3x^{964} + x^{947} + 4x^{944} + 3x^{928} + 2x^{908} + {}\\ &4x^{906} + 2x^{903} + 2x^{875} + x^{871} + 4x^{803} + 3x^{786} + 4x^{761} + 2x^{741} + 3x^{708} + 2x^{701} + x^{700} + {}\\ &2x^{699} + x^{691} + 3x^{646} + 4x^{637} + x^{624} + 2x^{603} + 4x^{593} + x^{573} + x^{562} + x^{555} + 3x^{554} + {}\\ &2x^{546} + x^{543} + 4x^{507} + 4x^{494} + 4x^{479} + x^{473} + 4x^{440} + 2x^{435} + 3x^{420} + 4x^{418} + 3x^{417} + {}\\ &4x^{393} + 2x^{379} + x^{376} + 4x^{371} + 3x^{369} + 3x^{348} + x^{335} + x^{312} + 2x^{301} + 4x^{296} + 2x^{291} + {}\\ &2x^{284} + 4x^{259} + x^{250} + x^{219} + 3x^{201} + x^{179} + 4x^{176} + 2x^{163} + 2x^{143} + 2x^{133} + x^{112} + {}\\ &2x^{102} + x^{85} + 4x^{73} + x^{59} + 3x^{54} + 3x^{46} + 3x^{38} + x^{16} + 4x^{11} + x^{10} + 2x^{7} + 2. \end{aligned} $$
FLINT 3.4 takes 3.9 seconds to generate the much neater (indeed, minimal) $$x^{3125} + x^{90} + x^{64} + 1.$$ The time to construct $\operatorname{GF}(5^{n})$ for all $1 \le n \le 1000$ dropped from 2238 seconds to 66 seconds; a 34x speedup. Side note: I've started compiling some tables of minimal weight irreducible polynomials at irreducible-polynomials.
The algorithm uses brute force enumeration with various pruning tricks; irreducibility testing and polynomial factorization over $\operatorname{GF}(p)$ has also been optimized in general.
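For prime degree $n$, the irreducibility test needed in such a search is simple: $f$ of degree $n$ is irreducible over $\operatorname{GF}(p)$ if and only if it has no roots in $\operatorname{GF}(p)$ and $x^{p^n} \equiv x \pmod{f}$ (every irreducible factor then has degree dividing $n$, hence equal to $n$). A naive Python sketch, nothing like FLINT's optimized code, that recovers the degree-5 trinomials over $\operatorname{GF}(5)$ mentioned above:

```python
def reduce_mod(a, f, p):
    # reduce the little-endian coefficient list a modulo the monic poly f
    a = [c % p for c in a]
    for k in range(len(a) - len(f), -1, -1):
        c = a[k + len(f) - 1]
        if c:
            for i, fi in enumerate(f):
                a[k + i] = (a[k + i] - c * fi) % p
    while a and a[-1] == 0:
        a.pop()
    return a

def mulmod(a, b, f, p):
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] = (res[i + j] + ai * bj) % p
    return reduce_mod(res, f, p)

def powmod(a, e, f, p):
    r = [1]
    while e:
        if e & 1:
            r = mulmod(r, a, f, p)
        a = mulmod(a, a, f, p)
        e >>= 1
    return r

def is_irreducible_prime_degree(f, p):
    # valid when deg(f) is prime: no linear factors + x^(p^n) == x mod f
    n = len(f) - 1
    if any(sum(c * pow(a, i, p) for i, c in enumerate(f)) % p == 0
           for a in range(p)):
        return False
    return powmod([0, 1], p ** n, f, p) == [0, 1]

# enumerate irreducible trinomials x^5 + a*x^k + b over GF(5)
p, n = 5, 5
tris = [(k, a, b)
        for k in range(1, n) for a in range(1, p) for b in range(1, p)
        if is_irreducible_prime_degree(
            [b] + [0] * (k - 1) + [a] + [0] * (n - k - 1) + [1], p)]
```

Running this, tris contains the 16 irreducible trinomials, among them $(k, a, b) = (2, 1, 2)$, i.e. $x^5 + x^2 + 2$.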
Matrix permanents
The following new functions allow computing matrix permanents:
- fmpz_mat_permanent (integers)
- fmpq_mat_permanent (rationals)
- gr_mat_permanent (generic rings)
FLINT implements cofactor expansion for small $n$ and the Ryser and Glynn formulas with $O(n 2^n)$ complexity for large $n$. We support multithreading (provided that the scalar ring implementation is threadsafe): calling flint_set_num_threads(m) gives a nearly linear speedup with $m$ up to the number of available cores since the problem is embarrassingly parallel.
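For reference, Ryser's formula with Gray-code updates fits in a few lines of Python; this is a plain sequential sketch, not the multithreaded code behind gr_mat_permanent:

```python
def permanent(A):
    # Ryser's formula with Gray-code updates: O(n * 2^n) ring operations.
    # perm(A) = sum over nonempty column subsets S of
    #           (-1)^(n - |S|) * prod_i (sum_{j in S} A[i][j])
    n = len(A)
    if n == 0:
        return 1
    row = [0] * n                  # running row sums over the current subset
    total = 0
    for k in range(1, 1 << n):
        j = (k & -k).bit_length() - 1     # bit flipped by the Gray code
        gray = k ^ (k >> 1)
        if gray & (1 << j):               # column j enters the subset
            for i in range(n):
                row[i] += A[i][j]
        else:                             # column j leaves the subset
            for i in range(n):
                row[i] -= A[i][j]
        prod = 1
        for r in row:
            prod *= r
        total += prod if (n - bin(gray).count("1")) % 2 == 0 else -prod
    return total
```

The same scheme works over any commutative ring whose elements support addition and multiplication (here plain Python ints).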
We illustrate computing the permanent of a $30 \times 30$ matrix over $\mathbb{Z}$:
In [1]: from flint_ctypes import *

In [2]: set_num_threads(8)

In [3]: N = 30; A = MatZZ([[i * N + j for i in range(N)] for j in range(N)])

In [4]: %time A.permanent()
CPU times: user 28.5 s, sys: 619 µs, total: 28.5 s
Wall time: 3.66 s
Out[4]: 1310667391206712912720700743913139240246485608264856718037097975602698687859610090667558882508800000000000000
A permanent of a $20 \times 20$ matrix over $\mathbb{R}[x]$:
In [5]: N = 20; A = Mat(RRx)([[RR.sin(i+j) + RRx.gen() for i in range(N)] for j in range(N)])

In [6]: %time A.permanent()
CPU times: user 6.63 s, sys: 29 µs, total: 6.63 s
Wall time: 848 ms
Out[6]: [-4.350792542e+12 +/- 9.26e+2] + [-1.6319963e+11 +/- 5.56e+3]*x + [5.1067e+9 +/- 6.08e+4]*x^2 + [2.2492e+10 +/- 5.02e+5]*x^3 + [1.0018e+10 +/- 6.68e+5]*x^4 + [3.4961e+11 +/- 2.22e+6]*x^5 + [7.6786e+11 +/- 7.50e+6]*x^6 + [4.7277e+12 +/- 4.57e+7]*x^7 + [1.99339e+13 +/- 6.00e+7]*x^8 + [4.0192e+13 +/- 5.25e+8]*x^9 + [2.79263e+14 +/- 3.74e+8]*x^10 + [2.4522e+14 +/- 1.28e+9]*x^11 + [2.65720e+15 +/- 4.90e+9]*x^12 + [1.15208e+15 +/- 3.42e+9]*x^13 + [1.921106e+16 +/- 7.82e+9]*x^14 + [4.2737e+15 +/- 1.67e+10]*x^15 + [1.127211e+17 +/- 2.48e+10]*x^16 + [1.22577e+16 +/- 6.57e+10]*x^17 + [5.603898e+17 +/- 2.12e+10]*x^18 + [2.34757e+16 +/- 5.93e+10]*x^19 + [2.432902008e+18 +/- 2.66e+8]*x^20
Example programs
Modular form coefficients
The new example program examples/mfcoefs.c by Pascal Molin demonstrates computing Fourier coefficients of classical modular forms. The algorithm is described in Pascal's new paper Computing Euler products and coefficients of classical modular forms for twisted L-functions. The program includes a few hardcoded examples; to show the simplest one, running
$ build/examples/mfcoefs 11 100
[[-2] [-1] [1] [-2] [1] [4] [-2] [0] [-1] [0] [7] [3] [-8] [-6] [8] [-6] [5] [12] [-7] [-3] [4] [-10] [-6] [15] [-7] ]
outputs the coefficients of $$q \prod_{k \ge 1} (1-q^k)^2 (1-q^{11 k})^2 = q - 2q^2 - q^3 + 2q^4 + q^5 + 2q^6 - 2q^7 - \ldots$$ for $q^p$ with prime exponent $p < 100$. The program saves a logarithmic memory factor by only computing the coefficients at prime index; it is described in the paper how to quickly recover the coefficients at non-prime index.
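The coefficients themselves are easy to reproduce with a naive $O(N^2)$ power series product in Python, as a sanity check (nowhere near the speed of the fft_small-based program):

```python
def level11_coeffs(N):
    # coefficients of q * prod_{k>=1} (1-q^k)^2 (1-q^(11k))^2 up to q^N,
    # by repeated in-place multiplication with the sparse factors (1 - q^m)
    c = [0] * (N + 1)
    c[1] = 1                       # start from the leading factor q
    for k in range(1, N + 1):
        for m in (k, 11 * k):
            if m > N:
                continue
            for _ in range(2):     # each factor appears squared
                for i in range(N, m - 1, -1):
                    c[i] -= c[i - m]
    return c
```

The values at prime index reproduce the output above: $a_2 = -2$, $a_3 = -1$, $a_5 = 1$, $a_7 = -2$, $a_{11} = 1$, $a_{13} = 4$, and so on.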
It takes just one second to compute the coefficients for exponents up to $10^7$, and my machine was able to do $10^8$ in 13 seconds, using about 10 GB of RAM:
$ build/examples/mfcoefs --time 11 10000000
euler (x3) cpu = 256 ms wall = 256 ms
prod (x1) cpu = 532 ms wall = 532 ms
total cpu = 1284 ms wall = 1286 ms
$ build/examples/mfcoefs --time 11 100000000
euler (x3) cpu = 3199 ms wall = 3199 ms
prod (x1) cpu = 5230 ms wall = 5230 ms
total cpu = 12851 ms wall = 12852 ms
On a machine with sufficient memory, Pascal reports computing coefficients up to $10^9$ of various modular forms in a few minutes.
As discussed in the paper, this program is significantly faster than the functions currently available for computing coefficients of modular forms in Pari/GP, SageMath or Magma. The implementation relies heavily on FLINT's fft_small module with a specific setup to work with FFT-friendly primes; this is a trick that could save lots of time in many places throughout FLINT but which is currently unused elsewhere (with one exception: the new double-packing nmod_poly multiplication for tiny moduli discussed above).
There is a possibility that the functionality of this example program could move into the FLINT library in the future, ideally as part of a more systematic framework for working with modular forms and $L$-functions. Let us know if this interests you.
Pi and the AGM
The new pi_agm example program demonstrates how one can implement the Salamin-Brent algorithm to calculate π using FLINT's arbitrary-precision real arithmetic. Here is a sample calculation of π to $10^7$ digits:
$ build/examples/pi_agm 10000000 -threads 8
precision = 33219285 bits...
Iteration 1 / 22: error bound 0.00391
Iteration 2 / 22: error bound 2.98e-8
Iteration 3 / 22: error bound 8.67e-19
Iteration 4 / 22: error bound 1.84e-40
Iteration 5 / 22: error bound 8.24e-84
Iteration 6 / 22: error bound 8.28e-171
Iteration 7 / 22: error bound 4.18e-345
Iteration 8 / 22: error bound 5.34e-694
Iteration 9 / 22: error bound 2.18e-1392
Iteration 10 / 22: error bound 3.62e-2789
Iteration 11 / 22: error bound 5.01e-5583
Iteration 12 / 22: error bound 2.39e-11171
Iteration 13 / 22: error bound 5.46e-22348
Iteration 14 / 22: error bound 1.42e-44701
Iteration 15 / 22: error bound 4.82e-89409
Iteration 16 / 22: error bound 1.38e-178824
Iteration 17 / 22: error bound 1.14e-357655
Iteration 18 / 22: error bound 1.95e-715318
Iteration 19 / 22: error bound 2.82e-1430644
Iteration 20 / 22: error bound 2.97e-2861296
Iteration 21 / 22: error bound 1.64e-5722600
Iteration 22 / 22: error bound 2.51e-11445209
cpu/wall(s): 15.546 2.714
virt/peak/res/peak(MB): 625.99 625.99 116.78 116.78
[3.14159265358979323846{...9999959 digits...}63171948173534895590 +/- 3.20e-10000000]
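The Gauss-Legendre (Salamin-Brent) iteration itself can be sketched in a few lines of Python using integer fixed-point arithmetic; this is only a toy analogue of the example program (function names, guard-digit count and iteration count here are rough heuristics of mine):

```python
from math import isqrt

def pi_gauss_legendre(digits):
    # fixed-point arithmetic: every quantity x is stored as floor(x * one)
    one = 10 ** (digits + 10)              # 10 guard digits
    a, b = one, isqrt(one * one // 2)      # a0 = 1, b0 = 1/sqrt(2)
    t, p = one // 4, 1
    for _ in range(digits.bit_length() + 3):
        an = (a + b) // 2                  # arithmetic mean
        b = isqrt(a * b)                   # geometric mean
        t -= p * (a - an) ** 2 // one
        a = an
        p *= 2
    return (a + b) ** 2 // (4 * t)         # ~ pi * one
```

Each iteration roughly doubles the number of correct digits, which is consistent with the 22 iterations needed for $10^7$ digits in the log above.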
FLINT's builtin function to compute π uses the Chudnovsky series, which is about 2-3 times faster:
$ build/examples/pi 10000000 -threads 8
precision = 33219285 bits...
cpu/wall(s): 5.011 0.879
virt/peak/res/peak(MB): 683.74 960.88 243.44 243.44
[3.14159265358979323846{...9999959 digits...}63171948173534895590 +/- 3.20e-10000000]
Generic QR factorization
The functions gr_mat_qr and gr_mat_lq have been added for computing QR factorizations (or transposed, LQ factorizations) of generic matrices. These functions are mainly intended for matrices with floating-point or arb ball coefficients; they also work over exact fields like qqbar but are not currently optimized for that use case.
The implemented algorithm is Gram-Schmidt orthogonalization with block recursive splitting, so that the asymptotic complexity is that of matrix multiplication. As in most other linear algebra in FLINT, the block recursion is very efficient in practice and allows for parallelism. For example, gr_mat_qr takes 85 seconds to QR-factorize a 1000 by 1000 matrix with 2048-bit arb coefficients, and only 24 seconds using 8 threads, whereas naive iterative Gram-Schmidt takes 395 seconds.
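For comparison, the plain iterative modified Gram-Schmidt that the block recursive version builds on can be sketched in Python (floating-point, full column rank assumed; function names are mine):

```python
def qr_mgs(A):
    # modified Gram-Schmidt on the columns of an m x n matrix (m >= n),
    # returning Q (m x n, orthonormal columns) and R (n x n, upper
    # triangular) with A = Q R; no recursive blocking here
    m, n = len(A), len(A[0])
    V = [[A[i][j] for i in range(m)] for j in range(n)]   # columns of A
    cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j):
            r = sum(cols[i][k] * V[j][k] for k in range(m))
            R[i][j] = r
            for k in range(m):
                V[j][k] -= r * cols[i][k]   # orthogonalize against q_i
        norm = sum(x * x for x in V[j]) ** 0.5
        R[j][j] = norm
        cols.append([x / norm for x in V[j]])
    # return Q row-wise
    return [[cols[j][i] for j in range(n)] for i in range(m)], R
```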
There are several open TODOs for improving QR factorization, e.g. implementing the more numerically stable Householder algorithm, supporting complex numbers, allowing rank-deficient matrices, and computing a posteriori enclosures in ball arithmetic.
Related to the QR factorization, we have also added gr_mat_is_orthogonal for checking if a matrix is orthogonal (along with variants of this function for checking row and column orthogonality/orthonormality), along with gr_mat_randtest_orthogonal for generating random orthogonal matrices.
Polynomial interpolation
Integers and rationals
Univariate polynomial interpolation over $\mathbb{Z}$ and $\mathbb{Q}$ now uses asymptotically fast multimodular algorithms. The fmpz_poly case was done by myself and fmpq_poly by Remi Prebet. The improvement can be dramatic: for example, recovering a length-10240 polynomial with random 100-bit coefficients from its values at $f(0), \ldots, f(10239)$ took 328 seconds with FLINT 3.3 and now takes just 6.1 seconds; a 54x speedup.
In addition, the existing interpolation functions have been augmented to allow error handling and separate functions have been added for faster interpolation when the interpolation problem is known to be exact. Interpolation over $\mathbb{Q}$ now also allows rational coordinates as input which was not previously the case.
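The multimodular idea is to interpolate modulo several word-size primes and then recover the integer coefficients by CRT. A Python sketch with a fixed prime list (a real implementation must of course choose enough primes to cover the true coefficient sizes):

```python
def interp_mod(xs, ys, p):
    # Lagrange interpolation of the coefficient vector modulo a prime p
    n = len(xs)
    coeffs = [0] * n
    for i in range(n):
        num, den = [1], 1                  # num = prod_{j != i} (x - xs[j])
        for j in range(n):
            if j == i:
                continue
            num = [0] + num                # multiply num by (x - xs[j])
            for k in range(len(num) - 1):
                num[k] = (num[k] - xs[j] * num[k + 1]) % p
            den = den * (xs[i] - xs[j]) % p
        scale = ys[i] * pow(den, -1, p) % p
        for k in range(n):
            coeffs[k] = (coeffs[k] + scale * num[k]) % p
    return coeffs

def interpolate_ZZ(xs, ys, primes):
    # interpolate over ZZ: solve modulo each prime, CRT-combine the
    # images, and lift symmetrically to (-M/2, M/2]
    res, M = interp_mod(xs, ys, primes[0]), primes[0]
    for p in primes[1:]:
        cp = interp_mod(xs, ys, p)
        inv = pow(M, -1, p)
        res = [(r + M * ((c - r) * inv % p)) % (M * p)
               for r, c in zip(res, cp)]
        M *= p
    return [r - M if 2 * r > M else r for r in res]
```

For example, $f(x) = 3x^2 - 7x + 100$ is recovered exactly from its values at $0, 1, 2$ using the primes 10007 and 10009.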
Geometric progressions
A new feature in FLINT 3.4 is a set of functions for nmod_poly fast multipoint evaluation and interpolation on geometrically spaced points, implemented by Mael Hostettler based on initial code by Vincent Neiger and Éric Schost. The case of geometric points can be done in $O(M(n))$ whereas fast multipoint evaluation and interpolation on a general set of points costs $O(M(n) \log n)$.
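The geometric case reduces to a single convolution via the identity $ij = T_{i+j} - T_i - T_j$ where $T_k = k(k-1)/2$. Here is a naive Python sketch of evaluation (interpolation inverts the same transform; the schoolbook convolution stands in for a fast polynomial multiplication):

```python
def eval_geometric(c, q, p):
    # evaluate f(x) = sum_j c[j] x^j at 1, q, q^2, ..., q^(n-1) mod p.
    # With T(k) = k(k-1)/2 one has i*j = T(i+j) - T(i) - T(j), so
    # f(q^i) = q^(-T(i)) * sum_j (c[j] q^(-T(j))) * q^(T(i+j)),
    # which is a single convolution
    n = len(c)
    qT = [pow(q, k * (k - 1) // 2, p) for k in range(2 * n)]
    qTinv = [pow(x, -1, p) for x in qT[:n]]
    u = [c[j] * qTinv[j] % p for j in range(n)]
    return [qTinv[i] * sum(u[j] * qT[i + j] for j in range(n)) % p
            for i in range(n)]
```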
The table below compares timings for evaluation and interpolation on $n$ points modulo a 64-bit prime, using the following functions.
- _nmod_poly_evaluate_nmod_vec_fast_precomp (general)
- _nmod_poly_interpolate_nmod_vec_fast_precomp (general)
- _nmod_poly_evaluate_geometric_nmod_vec_fast_precomp (geometric)
- _nmod_poly_interpolate_geometric_nmod_vec_fast_precomp (geometric)
| General | Geometric | Speedup | |||||||
|---|---|---|---|---|---|---|---|---|---|
| $n$ | Precomp | Evaluate | Interpolate | Precomp | Evaluate | Interpolate | Precomp | Evaluate | Interpolate |
| 10 | 1.25e-06 s | 5.43e-07 s | 4.58e-07 s | 5.48e-07 s | 2.84e-07 s | 2.84e-07 s | 2.28x | 1.91x | 1.61x |
| 100 | 3.13e-05 s | 1.86e-05 s | 1.57e-05 s | 4.61e-06 s | 9.55e-06 s | 7.1e-06 s | 6.79x | 1.95x | 2.21x |
| 1000 | 0.000893 s | 0.000547 s | 0.000298 s | 4.48e-05 s | 6.84e-05 s | 8.24e-05 s | 19.93x | 8.00x | 3.62x |
| 10000 | 0.0161 s | 0.0113 s | 0.00507 s | 0.000676 s | 0.000814 s | 0.00104 s | 23.82x | 13.88x | 4.87x |
| 100000 | 0.251 s | 0.179 s | 0.0705 s | 0.00721 s | 0.0106 s | 0.012 s | 34.81x | 16.89x | 5.87x |
| 1000000 | 3.358 s | 2.572 s | 0.94 s | 0.0758 s | 0.211 s | 0.167 s | 44.30x | 12.19x | 5.63x |
| 10000000 | 48.848 s | 38.246 s | 13.376 s | 0.799 s | 2.877 s | 2.921 s | 61.14x | 13.29x | 4.58x |
Both the general and geometric functions require a precomputation (denoted by "Precomp" in the table). This needs to be done once for a fixed set of points, after which the same precomputed data can be reused for many evaluations or interpolations. The geometric algorithm is not only significantly faster than the general one for precomputation, evaluation and interpolation alike; it also uses much less memory ($O(n)$ vs $O(n \log n)$).
Generic rings
Polynomial interpolation has also been implemented for gr_poly (quadratic-time using the Newton basis and asymptotically fast using the standard product tree method).
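The Newton-basis method is the classical divided-difference scheme; here is a quadratic-time Python sketch over $\mathbb{Q}$:

```python
from fractions import Fraction

def newton_interpolate(xs, ys):
    # build the divided-difference table in place, giving the Newton
    # coefficients dd[k] = f[x_0, ..., x_k], then expand
    # sum_k dd[k] * prod_{j<k} (x - xs[j]) into the monomial basis
    n = len(xs)
    dd = [Fraction(y) for y in ys]
    for k in range(1, n):
        for i in range(n - 1, k - 1, -1):
            dd[i] = (dd[i] - dd[i - 1]) / (xs[i] - xs[i - k])
    coeffs = [Fraction(0)] * n
    for k in range(n - 1, -1, -1):
        # Horner step: multiply the accumulated polynomial by (x - xs[k])
        # and add dd[k]
        for i in range(n - 1, 0, -1):
            coeffs[i] = coeffs[i - 1] - xs[k] * coeffs[i]
        coeffs[0] = dd[k] - xs[k] * coeffs[0]
    return coeffs
```

For example, the points $(0,1), (1,2), (2,5)$ yield $x^2 + 1$.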
Power series solutions of ODEs
The function gr_mat_gr_poly_solve_lode_newton, implemented by Ricardo Buring, allows computing power series solutions of linear ODE systems $Y'(t) = A(t)Y(t)$ with rational function coefficients. It uses asymptotically fast Newton iteration.
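For a single scalar equation $y'(t) = a(t)\,y(t)$ with polynomial $a$, the underlying computation is just a coefficient recurrence, which the Newton iteration accelerates; a naive Python sketch:

```python
from fractions import Fraction

def ode_series(a, N):
    # power series solution of y'(t) = a(t) y(t) with y(0) = 1, to order N;
    # comparing coefficients of t^k gives (k+1) y_{k+1} = sum_i a_i y_{k-i}
    y = [Fraction(1)]
    for k in range(N - 1):
        s = sum(Fraction(a[i]) * y[k - i] for i in range(min(k + 1, len(a))))
        y.append(s / (k + 1))
    return y
```

With $a(t) = 1$ this recovers $\exp(t) = \sum_k t^k/k!$; FLINT's Newton iteration instead doubles the working precision in each step, giving the asymptotically fast complexity.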
Bug fixes
As usual, there are lots of major and minor bugfixes in this release. One of the more serious bugs, fixed by Claus Fieker, led fmpz_mat_minpoly to return wrong results for some matrices with large coefficients due to a faulty implementation of the ovals of Cassini method to bound the matrix's spectrum. A different problem in the bounds calculation used by fmpz_mat_minpoly was already fixed in the previous release; it's unfortunate that we didn't spot the second bug at the same time!
Housekeeping
Among other maintenance-level changes, FLINT is now compiled with -Wmissing-prototypes by default. This flag catches two problems:
- functions intended to be public but lacking header file declarations,
- functions intended to be private but lacking the static attribute.
October workshop results
A big chunk of the work that went into FLINT 3.4 was done at the October workshop in Palaiseau (our second workshop this year), with 10 on-site participants and a handful of remote contributors. The Wiki page has an overview of all the topics developed during the workshop.
The workshop week ended with several of us attending (in my case as a member of the jury) the PhD defense of Alexandre Goyer on symbolic-numeric algorithms in differential algebra. Dr. Goyer's code for factoring differential operators is available in the ore_algebra package for SageMath; on many examples, this outperforms the long-standing state of the art, Mark van Hoeij's DEtools[DFactor] in Maple.
A couple more features in FLINT 3.4 done at the workshop but not listed above include:
- New random number generation functions by Dimitri Lesnoff.
- Efficient inversion of nmod vectors by Vincent Neiger.
- Differentiation and integration of gr_mpoly and other utility functions for generic polynomials by Ricardo Buring.
A few other projects were started or continued at this workshop, but not finished:
- Machine-precision code for polynomial root-finding by Guillaume Moroz.
- The n_fft module by Vincent Neiger.
- New implementations of $p$-adic arithmetic by Rubén Muñoz-Bertrand and myself.
- Error bounds for ordinary differential equations by Marc Mezzarobba.
We will hopefully see most of these done in the near future.
I'm hopeful that we can organize at least one or two FLINT workshops next year. Proposals are welcome.