fredrikj.net / blog /

# Code rewrite, and Fibonacci numbers

*September 6, 2012*

I’ve finally gotten around to rewriting my FLINT-based library for ball interval arithmetic from scratch (working branch here). It now features a cleaner design that allows it to be extended much more easily, and sufficiently many core functions are in place that it’s becoming possible to write interesting code on top of it without stopping every three lines to work around some limitation by accessing the internal representation. That said, it’s still missing a couple of important features (and tests and documentation), so it’s some way from an alpha release.

There is a now single underlying floating-point type, tentatively named fmpr_t, in which both mantissa and exponent are FLINT fmpz:s. A ball (fmprb_t) is just a pair of fmpr_t:s. This representation has several advantages; for example, exact integer values and (small multiples of) powers of two are represented efficiently, as are any numbers which are kept at a low precision (such as the radii of balls). There are some more aspects that I won’t go into here.

Approaches I tried earlier included making a ball a single structure (with the exponent shared between midpoint and radius), and using separate, specialized types for midpoint and radius. Such representations admit some optimizations, and make certain operations particularly easy to implement, but overall I found them to be too inflexible and requiring too much extra code for doing adjustments and conversions.

The drawback of the current code is that, at very low precision, fmprb_t arithmetic can be perhaps 5-10 times slower than mpz or mpfr arithmetic. This is largely a result of addition, multiplication and rounding in the fmpr_t type being implemented completely naively by cobbling together fmpz functions, and my previous experiments indicate that the overhead can be brought down to a factor 1-2 using mpn level code and some inline assembly. Since the fmpr_t type has precise semantics and very little code now accesses the internal representation, this work should be quite easy now (but I’m going to postpone it until the higher-level features are more developed).

As mentioned above, the fmprb_t type has the nice feature that it computes exactly and efficiently with integers that do not overflow the target precision. When all intermediate values in some computation are known to be integers (or rational numbers, if one explicitly separates numerators and denominators), one can set the precision to “infinite” to use it just like an integer type. At a finite precision, integers much smaller than the target precision require less time and memory to compute with than full-precision floating-point numbers, while larger integers automatically get truncated. The error bound in the output is zero if and only if no truncation has occurred. This allows writing rigorous numerical code which automatically scale to have near-optimal time and space complexity at both very low and very high precision (see the previous post on high-precision ball arithmetic for the application of computing constants to high precision).

To illustrate this, I’ve added some code for computing Fibonacci numbers using the binary powering algorithm of D. Takahashi. Here is how it compares to the MPIR builtin (mpz_fib_ui) and an fmprb_t implementation of Binet’s (de Moivre’s) formula, for computing the 10^{9}:th Fibonacci number:

Algorithm | Bits | Value | Time |

Takahashi | 53 | 7.95231787455468e+208987639 +/- 1.2751e+208987624 | 88 μs |

Takahashi | 10^{6} |
7.95231787455468e+208987639 +/- 1.1601e+208686610 | 150 ms |

Takahashi | ∞ | 7.95231787455468e+208987639 +/- 0 | 18 s |

MPIR | ∞ | 7.95231787455468e+208987639 +/- 0 | 18 s |

Binet | 53 | 7.95231787455468e+208987639 +/- 1.2964e+208987624 | 27 μs |

Binet | 10^{6} |
7.95231787455468e+208987639 +/- 2.104e+208686610 | 400 ms |

Binet | log_{2}(F(10^{9})) + 10 |
7.95231787455468e+208987639 +/- 0.00099284 | 690 s |

For the exact computation, we see that the fmprb_t implementation of Takahashi’s algorithm is just as fast as the highly optimized code in MPIR. A rough approximation (53 bits) or a computation of a large number of leading digits (10^{6} bits), together with a rigorous error bound, takes much less time than the exact computation. (Note: the radii given in the table are valid for the internal representation of the result, and ignore the additional error from rounding the midpoint to 15 digits for printing.)

Takahashi’s formula builds the n:th Fibonacci number from smaller integers in a way thtat gives a total complexity of O(M(n)), while Binet’s formula does O(log(n)) multiplications of numbers which all have full precision, giving a complexity of O(M(n) log(n)). Binet’s formula is a bit faster at very low precision, however, due to performing fewer arithmetic operations.

At 53-bit precision, dividing the timings by the operation count shows that the cost of fmprb_t arithmetic is almost half a microsecond per arithmetic operation. This is not particularly impressive (although much faster than e.g. mpmath arithmetic — despite the fact that mpmath does not track errors), and as mentioned above, can be made much better with some low-level optimization.

Note that, with Binet’s formula, we need a bit of extra work to compute Fibonacci numbers exactly. We must first bound the magnitude of the Fibonacci number, and then compute at a slightly higher precision. Since the error with log_{2}(F(10^{9})) + 10 bits of precision turns out to be bounded by 0.00099284, we are sure that rounding the midpoint to the nearest integer gives the correct result.

For reference, here is the code implementing Binet’s formula:

void fmprb_fib_ui_binet(fmprb_t x, ulong n, long prec) { long wp = prec + FLINT_BIT_COUNT(n); fmprb_t phi, sqrt5; fmprb_init(phi); fmprb_init(sqrt5); fmprb_sqrt_ui(sqrt5, 5, wp); fmprb_add_ui(phi, sqrt5, 1, wp); fmprb_mul_2exp_si(phi, phi, -1); fmprb_pow_ui(phi, phi, n, wp); fmprb_div(x, phi, sqrt5, wp); /* (1-phi)^n / sqrt(5) < 2^(-floor(n/2)) */ fmprb_add_error_2exp_si(x, -(long) n / 2); fmprb_clear(phi); fmprb_clear(sqrt5); }