fredrikj.net / blog /

High-precision ball arithmetic

April 5, 2012

About a week ago, I started working on a C library implementing arbitrary-precision real balls, creatively titled Arb. It’s mostly based on FLINT types, using MPFR for testing and for fallback code.

Ball arithmetic is intended to allow efficient, rigorous high-precision numerical evaluation. A real ball is implemented as an interval $[x-r, x+r] \times 2^e$ where $x, e, r$ are integers. This representation supports error propagation while avoiding the ~2x speed and memory overhead of the more conventional endpoint interval representation $[a \times 2^e, b \times 2^f]$.

There are two drawbacks: balls are somewhat less flexible, and they have slightly higher overhead at low precision. At the moment multiplying two arb_t numbers is about 3x slower than multiplying two mpfr_t numbers at 100-400 bits (which would make it somewhat less efficient than MPFI arithmetic). Above 1000 bits, the difference is only a few percent.

The current code is completely naive, and with a bit of low-level hacking it should be possible to get similar performance to a single mpfr_t multiplication at 200-1000 bits. Up to perhaps 256 bits, it would be attractive to have faster, fixed-precision interval types as a complement. (There is actually some prototype code for this hidden somewhere in a branch of my FLINT repository on GitHub…)

I will soon add complex numbers, polynomial balls, and matrix balls (based on exact FLINT polynomials and matrices). In fact, the main reason that I created an arb_t scalar type at all is to simplify interfacing with and writing test code for polynomial balls. But the type can clearly be useful on its own. When the code stabilizes enough, I plan to merge all this back into FLINT.

Part of the design of the arb_t type is that it can hold small integers efficiently, only truncating them to approximate numbers when they grow very large. This is useful for performing binary splitting on infinite series, reducing memory usage (and possibly improving performance) compared to using exact integers throughout.

The arb_t type is still very incomplete (and some functions don’t quite work as they should in all cases), but the module is functional enough that I’ve been able to implement binary splitting of a couple of constants. It’s interesting to see how much of an improvement (if any) ball arithmetic gives over using plain integers, so I’m including some benchmarks below. These functions are not the goal of the Arb library; I plan to do more useful things with it later, but they’re fun to look at as a start.

First out are some timings for π using the Chudnovsky algorithm, compared to the MPFR pi function, Mathematica 7, and the slightly modified version of the gmp-chudnovsky program included in FLINT. All times are measured in seconds.

Digits MMA7 MPFR gmp-chudnovsky Arb
105 0.14 0.17 0.05 0.05
106 2.5 3.81 0.92 1.16
107 42.53 67.5 15.9 22.6
108 1380 252 339

Using ball arithmetic turns out to be pretty much exactly as fast as using plain integers (timings not included), so at least the arb_t is not adding much overhead compared to integers. The gmp-chudnovsky code is around 40% faster because it works with partially factored integers, requiring a lot of extra code (it would be interesting to turn this into a general integer type and find other uses for it) and because it uses heuristic numerical code for the final square root and division. MPFR is slower because it uses the AGM algorithm. Mathematica is probably slower because it relies on an old GMP version.

Timings for Euler’s constant (γ) using the Brent-McMillan binary splitting algorithm, compared to Mathematica, MPFR and the integer (fmpz) binary splitting code in the FLINT arith in FLINT:

Digits MMA7 MPFR fmpz Arb
104 0.18 0.20 0.12 0.07
105 6.42 5.00 3.14 2.36
106 138 174 66.5 51.7

Ball arithmetic gives a small speedup. Memory usage is reduced significantly compared to the fmpz code. I was also able to compute 107 digits, but I lost the output of how long it took (roughly half an hour, I think). MPFR is slower because it uses a different algorithm.

The speedup for the binary splitting stage is actually larger than indicated, because 1/3 of the total time is spent computing log(n) with MPFR, where n is a parameter. One can save time by choosing n to be a power of two (log(2) is fast), but this makes the binary splitting slower when n is much smaller than the next power of two. I don’t know a general way to compute log(n) faster. Perhaps one could round n up to $m 2^e$ where $m \le 7$, and try to compute log(2), log(3), log(5), log(7) quickly. It’s worth pointing out that the binary splitting for power-of-two parameter can be optimized a bit by using bit shifting; this is not implemented yet.

Riemann zeta constants $\zeta(n)$ for integer n, compared to MPFR and the fmpz binary splitting code in the FLINT arith module:

n Digits MMA7 MPFR fmpz Arb
13 104 13.7 0.24 0.58 0.19
13 105 27.3 14.0 6.54
13 106 3639 287 148
43 104 13.7 0.36 2.29 0.50
43 105 43.2 51.6 18.2
43 106 1028 417

Here ball arithmetic gives a nice speedup which increases with larger n. Mathematica and MPFR both use algorithms with complexity somewhere around $\tilde O(b^2)$ where $b$ is the number of digits, whereas the algorithm I implemented has complexity $\tilde O(nb)$. Due to its overhead, it only becomes fast for small n and b in the tens of thousands.

Finally, using a separate hypergeometric series for $\zeta(3)$, this particular constant can be computed even faster. Here ball arithmetic gives a very small speed improvement over using fmpzs:

Digits MMA7 fmpz Arb
105 1.68 0.29 0.24
106 38.9 6.52 5.30
107 772 127 107

(I’m omitting MPFR because it doesn’t have a separate algorithm implemented for this constant.)

Later on, I will add faster code for $\zeta(n)$ where n is large or the precision is small, and code for combined evaluation of $\zeta(2), \zeta(3), \ldots, \zeta(n)$ (used for series expansions of various functions).