Arb is an experimental C library implementing arbitrary-precision floating-point ball arithmetic, written by Fredrik Johansson <fredrik.johansson@gmail.com>. The git repository is https://github.com/fredrik-johansson/arb/
Ball arithmetic, also known as mid-rad interval arithmetic, is an extension of floating-point arithmetic in which an error bound is attached to each variable. This allows doing rigorous computations over the real numbers, while avoiding the overhead of traditional (inf-sup) interval arithmetic at high precision, and eliminating much of the need for time-consuming and bug-prone manual error analysis.
At the moment, Arb contains: a module (fmpr) for arbitrary-precision floating-point arithmetic, a module (fmprb) for real balls, modules (fmprb_poly, fmprb_mat) for polynomials and matrices over the real balls, and functions for evaluating some elementary and special functions and mathematical constants to high precision.
Planned features include: transcendental functions, more extensive polynomial and matrix functionality, and complex balls (and polynomials and matrices thereof).
Arb uses MPIR and FLINT for the underlying integer arithmetic. The code conventions borrow from FLINT, and the project might get merged back into FLINT when the code stabilizes in the future. Arb also uses MPFR for some fallback code and for testing purposes.
The current version of Arb implements most of its floating-point arithmetic naively using high-level FLINT types. The speed at low precision is far from optimal, and the memory management can sometimes be wasteful. The internals will be rewritten in the future to fix the inefficiencies, which eventually should make Arb ball arithmetic about as fast as mpz or mpfr arithmetic at any precision.
Warning: as this is an early version, any part of the interface is subject to change! Also be aware that there are known and unknown bugs.
    ./configure <options>
    make
    make check
    make install
Arb requires recent versions of MPIR, MPFR and FLINT. Currently a source checkout of FLINT from https://github.com/fredrik-johansson/flint2 is required. If MPIR, MPFR or FLINT is installed in a location other than the default path /usr/local, pass the flags --with-mpir=..., --with-mpfr=... or --with-flint=... with the correct path to configure (type ./configure --help to show more options).
Here is a simple sample program to get started using Arb:
#include "fmprb.h" int main() { fmprb_t x; fmprb_init(x); fmprb_const_pi(x, 50 * 3.33); fmprb_printd(x, 50); printf("\n"); fmprb_clear(x); }The output should be something like the following:
3.1415926535897932384626433832795028841971693993751 +/- 4.2764e-50
A variable of type fmpr_t holds an arbitrary-precision binary floating-point number, i.e. a rational number of the form $x \times 2^y$ where $x, y \in \mathbb{Z}$ and $x$ is odd; or one of the special values zero, plus infinity, minus infinity, or NaN (not-a-number).
The component $x$ is called the mantissa, and $y$ is called the exponent. Note that this is just one among many possible conventions: the mantissa (alternatively significand) is sometimes viewed as a fraction in the interval $[1/2, 1)$, with the exponent pointing to the position above the top bit rather than the position of the bottom bit, and with a separate sign.
The conventions for special values largely follow those of the IEEE floating-point standard. At the moment, there is no support for negative zero, unsigned infinity, or a NaN with a payload, though some of these might be added in the future.
An fmpr number is exact and has no inherent "accuracy". We use the term precision to denote either the target precision of an operation, or the bit size of a mantissa (which in general is unrelated to the "accuracy" of the number: for example, the floating-point value 1 has a precision of 1 bit in this sense and is simultaneously an infinitely accurate approximation of the integer 1 and a 2-bit accurate approximation of $\sqrt 2 = 1.011010100\ldots_2$).
Except where otherwise noted, the output of an operation is the floating-point number obtained by taking the inputs as exact numbers, in principle carrying out the operation exactly, and rounding the resulting real number to the nearest representable floating-point number whose mantissa has at most the specified number of bits, in the specified direction of rounding. Some operations are always or optionally done exactly.
An fmpr_struct holds a mantissa and an exponent. If the mantissa and exponent are sufficiently small, their values are stored as immediate values in the fmpr_struct; large values are represented by pointers to heap-allocated arbitrary-precision integers. Currently, both the mantissa and exponent are implemented using the FLINT fmpz type. Special values are currently encoded by the mantissa being set to zero.
An fmpr_t is defined as an array of length one of type fmpr_struct, permitting an fmpr_t to be passed by reference.
Specifies the rounding mode for the result of an approximate operation.
Specifies that the result of an operation should be rounded to the nearest representable number, rounding to an odd mantissa if there is a tie between two values. Note: the code for this rounding mode is currently not implemented.
Specifies that the result of an operation should be rounded to the nearest representable number in the direction towards zero.
Specifies that the result of an operation should be rounded to the nearest representable number in the direction away from zero.
Specifies that the result of an operation should be rounded to the nearest representable number in the direction towards minus infinity.
Specifies that the result of an operation should be rounded to the nearest representable number in the direction towards plus infinity.
If passed as the precision parameter to a function, indicates that no rounding is to be performed. This must only be used when it is known that the result of the operation can be represented exactly and fits in memory (the typical use case is working with small integer values). Note that, for example, adding two numbers whose exponents are far apart can easily produce an exact result that is far too large to store in memory.
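As a small illustrative sketch, here is how the rounding modes interact with FMPR_PREC_EXACT (fmpr_one and fmpr_mul_2exp_si are assumed spellings, inferred from the setter and scaling operations described in this section; fmpr_add and fmpr_printd are documented below):

    #include <stdio.h>
    #include "fmpr.h"

    int main()
    {
        fmpr_t a, b, z;
        fmpr_init(a); fmpr_init(b); fmpr_init(z);

        fmpr_one(a);                 /* a = 1 */
        fmpr_one(b);
        fmpr_mul_2exp_si(b, b, -4);  /* b = 2^-4; scaling by a power of two is exact */

        /* 1 + 1/16 = 17/16 = 1.0001 in binary needs 5 mantissa bits,
           so an addition at 4 bits of precision must round */
        fmpr_add(z, a, b, 4, FMPR_RND_DOWN);  /* towards zero: 1 */
        fmpr_printd(z, 10); printf("\n");

        fmpr_add(z, a, b, 4, FMPR_RND_UP);    /* away from zero: 9/8 */
        fmpr_printd(z, 10); printf("\n");

        fmpr_add(z, a, b, FMPR_PREC_EXACT, FMPR_RND_DOWN);  /* exactly 17/16 */
        fmpr_printd(z, 10); printf("\n");

        fmpr_clear(a); fmpr_clear(b); fmpr_clear(z);
        return 0;
    }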
Initializes the variable x for use. Its value is set to zero.
Clears the variable x, freeing or recycling its allocated memory.
Sets x respectively to 0, 1, $+\infty$, $-\infty$, NaN.
Returns nonzero iff x respectively equals 0, 1, $+\infty$, $-\infty$, NaN.
Returns nonzero iff x equals either $+\infty$ or $-\infty$.
Returns nonzero iff x is a finite, nonzero floating-point value, i.e. not one of the special values 0, $+\infty$, $-\infty$, NaN.
Returns nonzero iff x is one of the special values 0, $+\infty$, $-\infty$, NaN, i.e. not a finite, nonzero floating-point value.
Rounds the mantissa and exponent in-place.
Sets y to a copy of x.
Sets y to a copy of x rounded in the direction specified by rnd to the number of bits specified by prec.
Given the return value rret and output variable result from a function performing a rounding (e.g. fmpr_set_round or fmpr_add), sets err to a bound for the absolute error.
Like fmpr_set_error_result, but adds err_in to the error.
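For illustration, here is a sketch combining the two steps (the argument order of fmpr_set_error_result is assumed from the description above):

    #include "fmpr.h"

    /* Compute s = x + y rounded to 53 bits, and an fmpr err bounding
       the absolute rounding error |(x + y) - s|, using the value
       returned by fmpr_add. */
    void add_with_error_bound(fmpr_t s, fmpr_t err,
        const fmpr_t x, const fmpr_t y)
    {
        long rret = fmpr_add(s, x, y, 53, FMPR_RND_DOWN);
        fmpr_set_error_result(err, s, rret);
    }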
Returns nonzero iff x and y are exactly equal. This function does not treat NaN specially, i.e. NaN compares as equal to itself.
Returns negative, zero, or positive, depending on whether x is respectively smaller, equal, or greater compared to y. Comparison with NaN is undefined.
Compares the absolute values of x and y.
Returns $-1$, $0$ or $+1$ according to the sign of x. The sign of NaN is undefined.
Generates a finite random number whose mantissa has precision at most bits and whose exponent has at most mag_bits bits. The values are distributed non-uniformly: special bit patterns are generated with high probability in order to allow the test code to exercise corner cases.
Identical to fmpr_randtest, except that zero is never produced as an output.
Identical to fmpr_randtest, except that the output occasionally is set to an infinity or NaN.
Sets the MPFR variable x to the value of y. If the precision of x is too small to allow y to be represented exactly, it is rounded in the specified MPFR rounding mode. The return value indicates the direction of rounding, following the standard convention of the MPFR library.
Sets x to the exact value of the MPFR variable y.
Sets x exactly to the integer c.
Sets y to the exact value of x. The result is undefined if x is not a finite fraction.
Sets x to the value of y, rounded according to prec and rnd.
Sets x to $\mathrm{man} \times 2^{\mathrm{exp}}$.
Sets man and exp to the unique integers such that $x = \mathrm{man} \times 2^{\mathrm{exp}}$ and man is odd, provided that x is a nonzero finite fraction. If x is zero, both man and exp are set to zero. If x is infinite or NaN, the result is undefined.
Converts x to a mantissa with predetermined exponent, i.e. computes an integer y such that $y \times 2^e \approx x$, truncating if necessary. Returns 0 if exact and 1 if truncation occurred.
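A round-trip sketch of these conversions (fmpr_set_fmpz_2exp and fmpr_get_fmpz_2exp are assumed spellings of the two operations just described):

    #include "fmpz.h"
    #include "fmpr.h"

    int main()
    {
        fmpr_t x;
        fmpz_t man, exp;

        fmpr_init(x);
        fmpz_init(man); fmpz_init(exp);

        fmpz_set_si(man, 5);
        fmpz_set_si(exp, -3);
        fmpr_set_fmpz_2exp(x, man, exp);  /* x = 5 * 2^-3 = 0.625 */

        fmpr_get_fmpz_2exp(man, exp, x);  /* recovers man = 5, exp = -3 */

        fmpz_clear(man); fmpz_clear(exp);
        fmpr_clear(x);
        return 0;
    }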
Prints the mantissa and exponent of x as integers, precisely showing the internal representation.
Prints x as a decimal floating-point number, rounding to the specified number of digits. This function is currently implemented using MPFR, and does not support large exponents.
Sets y to the negation of x.
Sets y to the negation of x, rounding the result.
Sets y to the absolute value of x.
Sets $z = x + y$, rounded according to prec and rnd. The precision can be FMPR_PREC_EXACT to perform an exact addition, provided that the result fits in memory.
Sets z to the value that results by adding an infinitesimal quantity of the given sign to x, and rounding. The result is undefined if x is zero.
Sets $z = x - y$, rounded according to prec and rnd. The precision can be FMPR_PREC_EXACT to perform an exact subtraction, provided that the result fits in memory.
Sets $z = x \times y$, rounded according to prec and rnd. The precision can be FMPR_PREC_EXACT to perform an exact multiplication, provided that the result fits in memory.
Sets y to x multiplied by $2^e$ without rounding.
Sets $z = x / y$, rounded according to prec and rnd. If $y$ is zero, $z$ is set to NaN.
Sets $z = z + x \times y$, rounded according to prec and rnd. The intermediate multiplication is always performed without roundoff. The precision can be FMPR_PREC_EXACT to perform an exact addition, provided that the result fits in memory.
Sets $z = z - x \times y$, rounded according to prec and rnd. The intermediate multiplication is always performed without roundoff. The precision can be FMPR_PREC_EXACT to perform an exact subtraction, provided that the result fits in memory.
Sets $z$ to the square root of $x$, rounded according to prec and rnd. The result is NaN if $x$ is negative.
Sets $y = b^e$, computed without guaranteeing correct (optimal) rounding, but guaranteeing that the result is a correct upper or lower bound if the rounding is directional. Currently requires $b \ge 0$.
Sets $z$ to $\log(x)$, rounded according to prec and rnd. The result is NaN if $x$ is negative. This function is currently implemented using MPFR and does not support large exponents.
Sets $z$ to $\log(1+x)$, rounded according to prec and rnd. This function computes an accurate value when $x$ is small. The result is NaN if $1+x$ is negative. This function is currently implemented using MPFR and does not support large exponents.
Sets $z$ to $\exp(x)$, rounded according to prec and rnd. This function is currently implemented using MPFR and does not support large exponents.
Sets $z$ to $\exp(x)-1$, rounded according to prec and rnd. This function computes an accurate value when $x$ is small. This function is currently implemented using MPFR and does not support large exponents.
An fmprb_t represents a ball over the real numbers, that is, an interval $[m-r, m+r]$ where the midpoint $m$ and the radius $r$ are (extended) real numbers and $r$ is nonnegative. The result of an (approximate) operation done on fmprb_t variables is a ball which contains the result of the (mathematically exact) operation applied to any choice of points in the input balls. In general, the output ball is not the smallest possible.
The precision parameter passed to each function roughly indicates the precision to which calculations on the midpoint are carried out (operations on the radius are always done using a fixed, small precision).
For arithmetic operations, the precision parameter currently simply specifies the precision of the corresponding fmpr operation. In the future, the arithmetic might be made faster by incorporating sloppy rounding (typically equivalent to a loss of 1-2 bits of effective working precision) when the result is known to be inexact (while still propagating errors rigorously, of course). Arithmetic operations done on exact input with exactly representable output are always guaranteed to produce exact output.
For more complex operations, the precision parameter indicates a minimum working precision (algorithms might allocate extra internal precision to attempt to produce an output accurate to the requested number of bits, especially when the required precision can be estimated easily, but this is not generally required).
If the precision is increased and the inputs either are exact or are computed with increased accuracy as well, the output should converge proportionally, absent any bugs. The general intended strategy for using ball arithmetic is to add a few guard bits, and then repeat the calculation as necessary with an exponentially increasing number of guard bits (Ziv's strategy) until the result is exact enough for one's purposes (typically the first attempt will be successful). There are some caveats: in general, ball arithmetic only makes sense for (Lipschitz) continuous functions, and trying to approximate functions close to singularities might result in slow convergence, or failure to converge.
Warning: some methods for transcendental functions and constants currently perform the error propagation in a non-rigorous way, due to the implementation being incomplete (in some cases, a rigorous error bound for the algorithm or function might not be known at all). This should be indicated in the documentation for each function.
An fmprb_struct consists of a pair of fmpr_struct:s. An fmprb_t is defined as an array of length one of type fmprb_struct, permitting an fmprb_t to be passed by reference.
The precision used for operations on the radius. This is small enough to fit in a single word, currently 30 bits.
Macro returning a pointer to the midpoint of x as an fmpr_t.
Macro returning a pointer to the radius of x as an fmpr_t.
Initializes the variable x for use. Its midpoint and radius are both set to zero.
Clears the variable x, freeing or recycling its allocated memory.
Returns a pointer to an array of n initialized fmprb_struct:s.
Clears an array of n initialized fmprb_struct:s.
Returns nonzero iff the radius of x is zero.
Returns nonzero iff x and y are equal as balls, i.e. have both the same midpoint and radius.
Sets x to zero.
Returns nonzero iff the midpoint and radius of x are both zero.
Sets y to a copy of x.
Sets y to a copy of x, rounded to prec bits.
Sets y to the negation of x.
Sets y to the absolute value of x. No attempt is made to improve the interval represented by x if it contains zero.
Sets y exactly to x.
Sets y to the rational number x, rounded to prec bits.
Returns nonzero iff x is exactly 1.
Sets x to the exact integer 1.
Prints the internal representation of x.
Prints x in decimal. The printed value of the radius is not adjusted to compensate for the fact that the binary-to-decimal conversion of both the midpoint and the radius introduces additional error.
Generates a random ball. The midpoint and radius will both be finite.
Sets q to a random rational number from the interval represented by x. A denominator is chosen by multiplying the binary denominator of x by a random integer up to size bits.
The outcome is undefined if the midpoint or radius of x is non-finite, or if the exponent of the midpoint or radius is so large or small that representing the endpoints as exact rational numbers would cause overflows.
Adds err, which is assumed to be nonnegative, to the radius of x.
Adds $2^e$ to the radius of x.
Adds the supremum of err, which is assumed to be nonnegative, to the radius of x.
Returns nonzero iff the given number is contained in the interval represented by x.
Computes the exact interval represented by x, in the form of an integer interval multiplied by a power of two, i.e. $x = [a, b] \times 2^{\mathrm{exp}}$.
The outcome is undefined if the midpoint or radius of x is non-finite, or if the difference in magnitude between the midpoint and radius is so large that representing the endpoints exactly would cause overflows.
Returns the effective relative error of x measured in bits, defined as the difference between the position of the top bit in the radius and the top bit in the midpoint, plus one. The result is clamped between plus/minus FMPR_PREC_EXACT.
Returns the effective relative accuracy of x measured in bits, equal to the negative of the return value from fmprb_rel_error_bits.
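A sketch of the Ziv-style strategy described in the introduction, using this accuracy measure (fmprb_rel_accuracy_bits is the assumed spelling of the function just described):

    #include "fmprb.h"

    /* Recompute pi with doubling working precision until the ball is
       accurate to at least `goal` bits. Normally the loop terminates
       on the first iteration. */
    void pi_to_accuracy(fmprb_t x, long goal)
    {
        long prec;

        for (prec = goal + 10; ; prec *= 2)
        {
            fmprb_const_pi(x, prec);
            if (fmprb_rel_accuracy_bits(x) >= goal)
                break;
        }
    }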
Sets $z = x + y$, rounded to prec bits. The precision can be FMPR_PREC_EXACT provided that the result fits in memory.
Sets $z = x - y$, rounded to prec bits. The precision can be FMPR_PREC_EXACT provided that the result fits in memory.
Sets $z = x \times y$, rounded to prec bits. The precision can be FMPR_PREC_EXACT provided that the result fits in memory.
Sets $y$ to $x$ multiplied by $2^e$.
Sets $z = x / y$, rounded to prec bits. If $y$ contains zero, $z$ is set to $0 \pm \infty$. Otherwise, error propagation uses the rule $$\left| \frac{x}{y} - \frac{x+\xi_1 a}{y+\xi_2 b} \right| = \left|\frac{x \xi_2 b - y \xi_1 a}{y (y+\xi_2 b)}\right| \le \frac{|xb|+|ya|}{|y| (|y|-b)}$$ where $-1 \le \xi_1, \xi_2 \le 1$, and where the triangle inequality has been applied to the numerator and the reverse triangle inequality has been applied to the denominator.
Sets $y = x / (2^n - 1)$, rounded to prec bits.
Sets $z = z + x \times y$, rounded to prec bits. The precision can be FMPR_PREC_EXACT provided that the result fits in memory.
Sets $z = z - x \times y$, rounded to prec bits. The precision can be FMPR_PREC_EXACT provided that the result fits in memory.
Sets $z$ to the square root of $x$, rounded to prec bits. Error propagation is done using the following rule: assuming $m > r \ge 0$, the error is largest at $m - r$, and we have $\sqrt{m} - \sqrt{m-r} \le r / (2 \sqrt{m-r})$.
Sets $y = b^e$ using binary exponentiation. Provided that $b$ and $e$ are small enough and the exponent is positive, the exact power can be computed using FMPR_PREC_EXACT.
Sets $z = \log(x)$. Error propagation is done using the following rule: assuming $m > r \ge 0$, the error is largest at $m - r$, and we have $\log(m) - \log(m-r) = \log(1 + r/(m-r))$. The last expression is calculated accurately for small radii via fmpr_log1p. An input containing zero currently raises an exception.
Sets $z = \exp(x)$. Error propagation is done using the following rule: the error is largest at $m + r$, and we have $\exp(m+r) - \exp(m) = \exp(m) (\exp(r)-1)$. The last expression is calculated accurately for small radii via fmpr_expm1.
Sets $s = \sin x$, $c = \cos x$. Error propagation uses the rule $|\sin(m \pm r) - \sin(m)| \le r$ (this could be tightened to $\min(r,2)$).
Sets $z = \tan^{-1} x$. Letting $d = \max(0, |m| - r)$, the propagated error is bounded by $r / (1 + d^2)$ (this could be tightened).
Sets x to n factorial, computed using binary splitting. Provided that $n$ is small enough, the exact factorial can be computed using FMPR_PREC_EXACT.
Sets y to the rising factorial $x (x+1) (x+2) \cdots (x+n-1)$, computed using binary splitting. The basecase processes eight factors at a time using the formula $x(x+1)\cdots(x+7) = (28 + 98x + 63x^2 + 14x^3 + x^4)^2 - 16 (7+2x)^2$, replacing 7 full-precision multiplications with 4 squarings and 1 multiplication [1]. Empirically, this is about twice as fast at high precision. Numerical stability is slightly worse.
[1] R. Crandall and C. Pomerance, "Prime Numbers: A Computational Perspective", second edition, Springer (2005), p. 316.
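The identity is easy to check directly; this standalone snippet verifies it at $x = 1$, where both sides equal $8! = 40320$:

    #include <stdio.h>

    int main()
    {
        long x = 1;
        long lhs = x*(x+1)*(x+2)*(x+3)*(x+4)*(x+5)*(x+6)*(x+7);
        long p = 28 + 98*x + 63*x*x + 14*x*x*x + x*x*x*x;
        long rhs = p*p - 16*(7 + 2*x)*(7 + 2*x);
        printf("%ld %ld\n", lhs, rhs);  /* prints 40320 40320 */
        return 0;
    }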
Sets y to the rising factorial $x (x+1) (x+2) \cdots (x+n-1)$, computed using fast multipoint evaluation. This only requires $O(n^{1/2+\varepsilon})$ multiplications, but has high overhead and poor numerical stability (adding $O(n)$ guard bits to the input might be necessary to achieve full accuracy). It can be expected to be faster than the binary splitting algorithm if the input is a full-precision number, the precision is at least 100000 bits, and $n$ is of the same order of magnitude as (perhaps slightly smaller than) the number of bits.
Sets x to the binomial coefficient ${n \choose k}$, computed using binary splitting. Provided that $n$ and $k$ are small enough, an exact binomial coefficient can be computed using FMPR_PREC_EXACT.
Sets x to the Fibonacci number $F_n$. Uses the binary squaring algorithm described in D. Takahashi, "A fast algorithm for computing large Fibonacci numbers", Information Processing Letters 75 (2000), 243-246. Provided that $n$ is small enough, an exact Fibonacci number can be computed using FMPR_PREC_EXACT.
Sets x to $\pi$, computed using the Chudnovsky algorithm. Letting $A = 13591409$, $B = 545140134$, $C = 640320$, we have $\pi \approx 1 / s_N$ where $$s_N = 12 \sum_{k=0}^N \frac{(-1)^k (6k)! (A+Bk)} {(3k)! (k!)^3 C^{3k+3/2}}$$
The implementation computes an approximation for the algebraic number $1/s_N$ using binary splitting, bounding the rounding error automatically. The hypergeometric term ratio is asymptotically $R = C^3 / (2^6 \times 3^3) \approx 1.5 \times 10^{14}$, and in fact we have $|\pi - 1/s_N| < 1/R^N$ (with a more detailed calculation, the truncation error could be bounded closer to $1/R^{N+1}$).
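Since the error decreases by a factor $R$ per term, each term contributes roughly $\log_2 R \approx 47$ bits, so the number of terms needed for $b$-bit accuracy is approximately $$N \approx \frac{b}{\log_2 R} \approx \frac{b}{47.11},$$ i.e. somewhat more than 14 decimal digits per term.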
Sets x to $\pi$. The value is cached for repeated use.
Sets x to $\log \sqrt{2 \pi}$. The value is cached for repeated use.
Sets x to Euler's constant $\gamma$, computed using the second Bessel function formula of Brent and McMillan. Brent and McMillan conjectured that the error depending on the internal parameter $n$ is of order $O(e^{-8n})$. Brent has recently proved that this bound is correct, but without determining an explicit big-O factor.
[1] R. P. Brent and E. M. McMillan, "Some new algorithms for high-precision computation of Euler's constant", Mathematics of Computation 34 (1980), 305-312.
[2] R. P. Brent, "Ramanujan and Euler's Constant", http://wwwmaths.anu.edu.au/~brent/pd/Euler_CARMA_10.pdf
[3] The MPFR team (2012), "MPFR Algorithms", http://www.mpfr.org/algo.html
Sets x to Apery's constant $\zeta(3)$, computed by applying binary splitting to a hypergeometric series.
Assuming $s \ge 2$, approximates $\zeta(s)$ by $1 + 2^{-s}$ along with a correct error bound. We use the following bounds: for $s > b$, $\zeta(s) - 1 < 2^{-b}$, and generally, $\zeta(s) - (1 + 2^{-s}) < 2^{2-\lfloor 3 s/2 \rfloor}$.
Computes $\zeta(s)$ using the Euler product. This is fast only if s is large compared to the precision.
Writing $P(a,b) = \prod_{a \le p \le b} (1 - p^{-s})$, we have $1/\zeta(s) = P(a,M) P(M+1,\infty)$.
To bound the error caused by truncating the product at $M$, we write $P(M+1,\infty) = 1 - \epsilon(s,M)$. Since $0 < P(a,M) \le 1$, the absolute error for $\zeta(s)$ is bounded by $\epsilon(s,M)$.
According to the analysis in [1], it holds for all $s \ge 6$ and $M \ge 1$ that $1/P(M+1,\infty) - 1 \le f(s,M) \equiv 2 M^{1-s} / (s/2 - 1)$. Thus, we have $1/(1-\epsilon(s,M)) - 1 \le f(s,M)$, and expanding the geometric series allows us to conclude that $\epsilon(s,M) \le f(s,M)$.
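For example, to make the truncation error smaller than $2^{-b}$, it suffices (within the range $s \ge 6$ covered by the bound) to pick the smallest $M$ satisfying $$f(s,M) = \frac{2 M^{1-s}}{s/2 - 1} \le 2^{-b}, \quad \text{that is,} \quad M \ge \left( \frac{2^{b+1}}{s/2 - 1} \right)^{1/(s-1)},$$ which confirms that very few primes are needed when $s$ is large compared to the precision.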
[1] S. Fillebrown, "Faster Computation of Bernoulli Numbers", Journal of Algorithms 13, 431-445 (1992)
Computes $\zeta(n)$ for even $n$ via the corresponding Bernoulli number, which is generated using FLINT.
Evaluates $\zeta(s)$ at $\mathrm{num}$ consecutive integers $s$ beginning with $\mathrm{start}$ and proceeding in increments of $\mathrm{step}$. Uses Borwein's formula [1], implemented to support fast multi-evaluation (but also works well for a single $s$).
Requires $\mathrm{start} \ge 2$. For efficiency, the largest $s$ should be at most about as large as $\mathrm{prec}$, and arguments approaching LONG_MAX will cause overflows; one should therefore only use this function for $s$ up to about prec, and then switch to the Euler product.
The algorithm for single $s$ is basically identical to the one used in MPFR (see [2] for a detailed description). In particular, we evaluate the sum backwards to avoid storing more than one $d_k$ coefficient, and use integer arithmetic throughout since it is convenient and the terms turn out to be slightly larger than $2^\mathrm{prec}$. The only numerical error in the main loop comes from the division by $k^s$, which adds less than 1 unit of error per term. For fast multi-evaluation, we repeatedly divide by $k^{\mathrm{step}}$. Each division reduces the input error and adds at most 1 unit of additional rounding error, so by induction, the error per term is always smaller than 2 units.
References:
[1] P. Borwein, "An Efficient Algorithm for the Riemann Zeta Function", Constructive experimental and nonlinear analysis, CMS Conference Proc. 27 (2000), 29-34 http://www.cecm.sfu.ca/personal/pborwein/PAPERS/P155.pdf
[2] The MPFR team (2012), "MPFR Algorithms", http://www.mpfr.org/algo.html
[3] X. Gourdon and P. Sebah (2003), "Numerical evaluation of the Riemann Zeta-function" http://numbers.computation.free.fr/Constants/Miscellaneous/zetaevaluations.pdf
Computes $\zeta(s)$ for arbitrary $s \ge 2$ using a binary splitting implementation of Borwein's formula. The algorithm has quasilinear complexity with respect to the precision.
Computes $\zeta(s)$ for nonnegative integer $s \ne 1$, automatically choosing an appropriate algorithm.
Computes $\zeta(s)$ at num consecutive integers (respectively num even or num odd integers) beginning with $s = \mathrm{start} \ge 2$, automatically choosing an appropriate algorithm.
Uses Karatsuba's algorithm [1] to compute num coefficients in the Taylor series of $\Gamma(a+x)$ for rational $0 < a \le 1$, i.e. computes $\Gamma(a), \Gamma'(a), \ldots, \Gamma^{(\mathrm{num}-1)}(a) / (\mathrm{num}-1)!$. This algorithm is most efficient at high precision, for num much smaller than the number of bits, and with small denominators of $a$. In particular, with num = 1, this algorithm computes $\Gamma(a)$ efficiently for small rational $a$.
Let $s = \max(2, \mathrm{num}-1)$. With parameters $r$ and $n$ chosen such that $r \ge n$ and $n \ge 2 s \log 2 s$, Karatsuba shows that $$\Gamma^{(j)}(a) = \sum_{k=0}^r \frac{(-1)^k}{k!} \frac{n^{k+a}}{k+a} \sum_{m=0}^j (-1)^m \frac{j! \, \log^{j-m} n}{(j-m)! (k+a)^m} + \theta_j$$
where $$|\theta_j| \le \frac{5}{3} e^{-n} \log^s n + \left(\frac{e}{r+2}\right)^{r+2} (1 + n^{r+2} \log^s n).$$
We choose the parameters $n$ and $r$ heuristically to be nearly optimal, and then evaluate the above formula to bound $\theta_j$ rigorously.
Karatsuba claims that choosing $r \ge 3n$ gives $|\theta_j| \le 2^{-n-1}$. This is, unfortunately, incorrect. Setting $r = n \alpha$ and expanding the error term around $n = \infty$, one finds that $\alpha$ asymptotically should be $1/W(1/e) \approx 3.59112147666862$ where $W(x)$ is the Lambert W-function. We also optimize the selection of $n$ by choosing $n \approx b \log 2$ where $b$ is the desired number of bits, rather than $n \approx b$, and round $n$ so that it has a short binary expansion (this gives smaller numbers in the binary splitting stage).
Finally, if $s$ is small, we perform binary splitting to a working precision of about $2.2$ times the target precision, rather than computing a rigorous bound for the required working precision. This factor was tested to give full accuracy up to at least one million digits when $s \approx 1$. A more careful analysis should be done here so that a working precision is selected which always is sufficient and also nearly optimal.
[1] E. A. Karatsuba, "Fast evaluation of the Hurwitz zeta function and Dirichlet L-series", Problems of Information Transmission, Vol. 34, No. 4, 1998.
Sets $y = \log \Gamma(x)$, assuming that $x > 0$.
For large $x$, uses Stirling's expansion $$\log \Gamma(x) = \left(x-\frac{1}{2}\right)\log x - x + \frac{\log 2 \pi}{2} + \sum_{k=1}^{n-1} t_k + R(n,x)$$ where $$t_k = \frac{B_{2k}}{2k(2k-1)x^{2k-1}}$$ and $|R(n,x)| < t_n$.
If $x$ is too small for the asymptotic expansion to give sufficient accuracy directly, we translate to $x + r$ using the formula $\log \Gamma(x) = \log \Gamma(x+r) - \log(x (x+1) (x+2) \cdots (x+r-1))$. To obtain a remainder smaller than $2^{-b}$, we must choose an $r$ such that $x + r > \beta b$, where $\beta > \log(2) / (2 \pi) \approx 0.11$. We use a slightly larger factor $\beta \approx 0.2$ to more closely balance $n$ and $r$. A much larger $\beta$ (e.g. $\beta = 1$) could be used to reduce the number of Bernoulli numbers that have to be precomputed, at the expense of slower repeated evaluation.
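As a concrete instance of the parameter choice: with a target of $b = 1000$ bits and $x$ of order unity, the factor $\beta \approx 0.2$ requires $x + r > 200$, so roughly $200$ argument-reduction steps precede the evaluation of the asymptotic series.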
An fmprb_poly_t represents a polynomial over the real numbers, implemented as an array of coefficients of type fmprb_struct.
Most functions are provided in two versions: an underscore method which operates directly on pre-allocated arrays of coefficients and generally has some restrictions (such as requiring the lengths to be nonzero and not supporting aliasing of the input and output arrays), and a non-underscore method which performs automatic memory management and handles degenerate cases.
Contains a pointer to an array of coefficients (coeffs), the used length (length), and the allocated size of the array (alloc).
An fmprb_poly_t is defined as an array of length one of type fmprb_poly_struct, permitting an fmprb_poly_t to be passed by reference.
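A usage sketch contrasting the two interfaces (fmprb_set_si, fmprb_poly_fit_length, _fmprb_poly_set_length, fmprb_poly_mullow and fmprb_poly_printd are assumed spellings based on the descriptions in this section): we build $p = 1 + 2x$ through the raw representation, then square it with the managed interface.

    #include "fmprb_poly.h"

    int main()
    {
        fmprb_poly_t p, q;

        fmprb_poly_init(p);
        fmprb_poly_init(q);

        fmprb_poly_fit_length(p, 2);
        fmprb_set_si(p->coeffs, 1);      /* constant term */
        fmprb_set_si(p->coeffs + 1, 2);  /* coefficient of x */
        _fmprb_poly_set_length(p, 2);

        fmprb_poly_mullow(q, p, p, 3, 100);  /* q = 1 + 4x + 4x^2 */
        fmprb_poly_printd(q, 10);

        fmprb_poly_clear(p);
        fmprb_poly_clear(q);
        return 0;
    }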
Initializes the polynomial for use, setting it to the zero polynomial.
Clears the polynomial, deallocating all coefficients and the coefficient array.
Makes sure that the coefficient array of the polynomial contains at least len initialized coefficients.
Directly changes the length of the polynomial, without allocating or deallocating coefficients. The value should not exceed the allocation length.
Strips any trailing coefficients which are identical to zero.
Sets poly to the constant 0 respectively 1.
Sets poly to src.
Prints the polynomial as an array of coefficients, printing each coefficient using fmprb_printd.
Returns nonzero iff poly1 contains poly2.
Returns nonzero iff A and B are equal as polynomial balls, i.e. all coefficients have equal midpoint and radius.
Sets {C, max(lenA, lenB)} to the sum of {A, lenA} and {B, lenB}. Allows aliasing of the input and output operands.
Sets C to the sum of A and B.
Sets {C, n} to the product of {A, lenA} and {B, lenB}, truncated to length n. The output is not allowed to be aliased with either of the inputs. We require lenA ≥ lenB > 0, n > 0, lenA + lenB - 1 ≥ n.
As currently implemented, this function puts each input polynomial on a common exponent, truncates to prec bits, and multiplies exactly over the integers. The output error is computed by cross-multiplying the max norms.
Sets C to the product of A and B, truncated to length n.
Sets {C, n} to the product of {A, lenA} and {B, lenB}, truncated to length n. The output is not allowed to be aliased with either of the inputs. We require lenA ≥ lenB > 0, n > 0. This function currently calls _fmprb_poly_mullow.
Sets C to the product of A and B.
Sets {Qinv, len} to the power series inverse of {Q, len}. Uses Newton iteration.
Sets Qinv to the power series inverse of Q.
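For reference, the classical Newton step underlying this (a standard identity, stated here for orientation): if $Q_k$ satisfies $Q Q_k = 1 + O(x^m)$, then $$Q_{k+1} = Q_k (2 - Q Q_k) \bmod x^{2m}$$ satisfies $Q Q_{k+1} = 1 + O(x^{2m})$, since $1 - Q Q_{k+1} = (1 - Q Q_k)^2$. Each step thus doubles the number of correct coefficients.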
Performs polynomial division with remainder, computing a quotient $Q$ and a remainder $R$ such that $A = BQ + R$. The leading coefficient of $B$ must not contain zero. The implementation reverses the inputs and performs power series division.
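In outline (the standard reduction, stated here for orientation): with $n = \mathrm{lenA}$, $m = \mathrm{lenB}$ and $\operatorname{rev}$ denoting reversal of the coefficient array, $$\operatorname{rev}(Q) = \operatorname{rev}(A) \operatorname{rev}(B)^{-1} \bmod x^{n-m+1}, \qquad R = A - B Q,$$ where the power series inverse is computed as described above.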
Divides $A$ by the polynomial $x - c$, computing the quotient $Q$ as well as the remainder $R = A(c)$.
Generates the polynomial $(x-x_0)(x-x_1)\cdots(x-x_{n-1})$.
Returns an initialized data structure capable of representing a remainder tree (product tree) of the given number of roots.
Deallocates a tree structure as allocated using _fmprb_poly_tree_alloc.
Constructs a product tree from a given array of roots. The tree structure must be pre-allocated to the specified length using _fmprb_poly_tree_alloc.
Sets res to the composition $h(x) = f(g(x))$ where $f$ is given by poly1 and $g$ is given by poly2, respectively using Horner's rule, divide-and-conquer, and an automatic choice between the two algorithms. The underscore methods do not support aliasing of the output with either input polynomial.
Sets res to the power series composition $h(x) = f(g(x))$ truncated to order $O(x^n)$ where $f$ is given by poly1 and $g$ is given by poly2, respectively using Horner's rule, the Brent-Kung baby step-giant step algorithm, and an automatic choice between the two algorithms. We require that the constant term in $g(x)$ is exactly zero. The underscore methods do not support aliasing of the output with either input polynomial.
Sets res to $f(a)$, evaluated using Horner's rule.
Evaluates the polynomial simultaneously at n given points, calling _fmprb_poly_evaluate repeatedly.
Evaluates the polynomial simultaneously at n given points, using fast multipoint evaluation.
Recovers the unique polynomial of length at most n that interpolates the given x and y values. This implementation first interpolates in the Newton basis and then converts back to the monomial basis.
Recovers the unique polynomial of length at most n that interpolates the given x and y values. This implementation uses the barycentric form of Lagrange interpolation.
Recovers the unique polynomial of length at most n that interpolates the given x and y values, using fast Lagrange interpolation. The precomp function takes a precomputed product tree over the x values and a vector of interpolation weights as additional inputs.
Sets {res, len - 1} to the derivative of {poly, len}. Allows aliasing of the input and output.
Sets res to the derivative of poly.
Sets {res, len} to the integral of {poly, len - 1}. Allows aliasing of the input and output.
Sets res to the integral of poly.
Sets $f$ to the power series logarithm of $h$, truncated to length $n$. Uses the formula $\log f = \int f' / f$, adding the logarithm of the constant term in $h$ as the constant of integration. The underscore method does not support aliasing of the input and output arrays.
Sets $f$ to the power series exponential of $h$, truncated to length $n$.
The basecase version uses a simple recurrence for the coefficients, requiring $O(nm)$ operations where $m$ is the length of $h$.
The main implementation uses Newton iteration, starting from a small number of terms given by the basecase algorithm. The complexity is $O(M(n))$. Redundant operations in the Newton iteration are avoided by using the scheme described in [1].
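For orientation, the textbook Newton step for the exponential (which the scheme of [1] reorganizes to avoid redundant work) reads: if $f_k = \exp(h) + O(x^m)$, then $$f_{k+1} = f_k \left( 1 + h - \log f_k \right) \bmod x^{2m},$$ where the logarithm is itself obtained from a power series inverse and an integration, as in the logarithm method above.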
The underscore methods support aliasing and allow the input to be shorter than the output, but require the lengths to be nonzero.
[1] G. Hanrot and P. Zimmermann, "Newton Iteration Revisited", http://www.loria.fr/~zimmerma/papers/fastnewton.ps.gz (2004 preprint)
Sets $f$ to the series expansion of $\log(\Gamma(1-x))$, truncated to length $n$.
An fmprb_mat_t represents a dense matrix over the real numbers, implemented as an array of entries of type fmprb_struct.
The dimension (number of rows and columns) of a matrix is fixed at initialization, and the user must ensure that inputs and outputs to an operation have compatible dimensions. The number of rows or columns in a matrix can be zero.
Contains a pointer to a flat array of the entries (entries), an array of pointers to the start of each row (rows), and the number of rows (r) and columns (c).
An fmprb_mat_t is defined as an array of length one of type fmprb_mat_struct, permitting an fmprb_mat_t to be passed by reference.
Macro giving a pointer to the entry at row i and column j.
Returns the number of rows of the matrix.
Returns the number of columns of the matrix.
Initializes the matrix, setting it to the zero matrix with the given number of rows and columns.
Clears the matrix, deallocating all entries.
Sets dest to src. The operands must have identical dimensions.
Prints each entry in the matrix with the specified number of decimal digits.
Returns nonzero iff the matrices have the same dimensions and identical entries.
Returns nonzero iff the matrices have the same dimensions and each entry in mat2 is contained in the corresponding entry in mat1.
Sets all entries in mat to zero.
Sets the entries on the main diagonal to ones, and all other entries to zero.
Sets dest to the exact negation of src. The operands must have the same dimensions.
Sets res to the sum of mat1 and mat2. The operands must have the same dimensions.
Sets res to the difference of mat1 and mat2. The operands must have the same dimensions.
Sets res to the matrix product of mat1 and mat2. The operands must have compatible dimensions for matrix multiplication.
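A short usage sketch (fmprb_mat_entry, fmprb_set_si, fmprb_mat_mul and fmprb_mat_printd are assumed spellings based on the descriptions above):

    #include "fmprb_mat.h"

    int main()
    {
        fmprb_mat_t A, B;

        fmprb_mat_init(A, 2, 2);
        fmprb_mat_init(B, 2, 2);

        fmprb_set_si(fmprb_mat_entry(A, 0, 0), 1);
        fmprb_set_si(fmprb_mat_entry(A, 0, 1), 2);
        fmprb_set_si(fmprb_mat_entry(A, 1, 0), 3);
        fmprb_set_si(fmprb_mat_entry(A, 1, 1), 4);

        fmprb_mat_mul(B, A, A, 100);  /* B = A^2 at 100-bit precision */
        fmprb_mat_printd(B, 10);

        fmprb_mat_clear(A);
        fmprb_mat_clear(B);
        return 0;
    }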
Given an $n \times n$ matrix $A$, computes an LU decomposition $PLU = A$ using Gaussian elimination with partial pivoting. The input and output matrices can be the same, performing the decomposition in-place.
Entry $i$ in the permutation vector perm is set to the row index in the input matrix corresponding to row $i$ in the output matrix.
The algorithm succeeds and returns nonzero if it can find $n$ invertible (i.e. not containing zero) pivot entries. This guarantees that the matrix is invertible.
The algorithm fails and returns zero, leaving the entries in $P$ and $LU$ undefined, if it cannot find $n$ invertible pivot elements. In this case, either the matrix is singular, the input matrix was computed to insufficient precision, or the LU decomposition was attempted at insufficient precision.
Solves $AX = B$ given the precomputed nonsingular LU decomposition $A = PLU$. The matrices $X$ and $B$ are allowed to be aliased with each other, but $X$ is not allowed to be aliased with $LU$.
Solves $AX = B$ where $A$ is a nonsingular $n \times n$ matrix and $X$ and $B$ are $n \times m$ matrices, using LU decomposition.
If $m > 0$ and $A$ cannot be inverted numerically (indicating either that $A$ is singular or that the precision is insufficient), the values in the output matrix are left undefined and zero is returned. A nonzero return value guarantees that $A$ is invertible and that the exact solution matrix is contained in the output.
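A sketch of the intended calling pattern (fmprb_mat_solve with argument order (X, A, B, prec) is an assumed spelling; note that retrying at higher precision only helps when the entries of A are exact or can be recomputed more accurately):

    #include "fmprb_mat.h"

    /* Try to solve AX = B, doubling the working precision a few times
       if the solver cannot certify that A is invertible. */
    int solve_with_retries(fmprb_mat_t X, const fmprb_mat_t A,
        const fmprb_mat_t B, long prec)
    {
        int i;

        for (i = 0; i < 5; i++)
        {
            if (fmprb_mat_solve(X, A, B, prec))
                return 1;  /* the exact solution is contained in X */
            prec *= 2;
        }

        return 0;  /* possibly singular, or precision still insufficient */
    }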
Sets $X = A^{-1}$ where $A$ is a square matrix, computed by solving the system $AX = I$.
If $A$ cannot be inverted numerically (indicating either that $A$ is singular or that the precision is insufficient), the values in the output matrix are left undefined and zero is returned. A nonzero return value guarantees that the matrix is invertible and that the exact inverse is contained in the output.
Computes the determinant of the matrix, using Gaussian elimination with partial pivoting. If at some point an invertible pivot element cannot be found, the elimination is stopped and the magnitude of the determinant of the remaining submatrix is bounded using Hadamard's inequality.
Arb is licensed under the GNU General Public License, version 2 or any later version.
Fredrik's work on Arb is supported by Austrian Science Fund FWF Grant Y464-N18 (Fast Computer Algebra for Special Functions).
Arb includes code written by Bill Hart and Sebastian Pancratz for FLINT (also licensed GPL 2.0+).
Wishes or bug reports? Send any comments to fredrik.johansson@gmail.com