Arb documentation

Last updated: 2012-09-29 01:27:03 CET

Contents


Introduction

Arb is an experimental C library implementing arbitrary-precision floating-point ball arithmetic, written by Fredrik Johansson <fredrik.johansson@gmail.com>. The git repository is https://github.com/fredrik-johansson/arb/

Ball arithmetic, also known as mid-rad interval arithmetic, is an extension of floating-point arithmetic in which an error bound is attached to each variable. This allows doing rigorous computations over the real numbers, while avoiding the overhead of traditional (inf-sup) interval arithmetic at high precision, and eliminating much of the need for time-consuming and bug-prone manual error analysis.

At the moment, Arb contains:

Planned features include: transcendental functions, more extensive polynomial and matrix functionality, and complex balls (and polynomials and matrices thereof).

Arb uses MPIR and FLINT for the underlying integer arithmetic. The code conventions borrow from FLINT, and the project might get merged back into FLINT when the code stabilizes in the future. Arb also uses MPFR for some fallback code and for testing purposes.

The current version of Arb implements most of its floating-point arithmetic naively using high-level FLINT types. The speed at low precision is far from optimal, and the memory management can sometimes be wasteful. The internals will be rewritten in the future to fix the inefficiencies, which eventually should make Arb ball arithmetic about as fast as mpz or mpfr arithmetic at any precision.

Warning: as this is an early version, any part of the interface is subject to change! Also be aware that there are known and unknown bugs.


Setup

To compile, test and install Arb, do:
./configure <options>
make
make check
make install

Arb requires recent versions of MPIR, MPFR and FLINT. Currently a source checkout of FLINT from https://github.com/fredrik-johansson/flint2 is required. If MPIR, MPFR or FLINT is installed in some other location than the default path /usr/local, pass the flag --with-mpir=... --with-mpfr=... or --with-flint=... with the correct path to configure (type ./configure --help to show more options).

Here is a simple sample program to get started using Arb:

#include "fmprb.h"

int main()
{
    fmprb_t x;
    fmprb_init(x);
    fmprb_const_pi(x, 50 * 3.33);
    fmprb_printd(x, 50); printf("\n");
    fmprb_clear(x);
}
The output should be something like the following:
3.1415926535897932384626433832795028841971693993751 +/- 4.2764e-50

fmpr.h (floating-point arithmetic)

A variable of type fmpr_t holds an arbitrary-precision binary floating-point number, i.e. a rational number of the form $x \times 2^y$ where $x, y \in \mathbb{Z}$ and $x$ is odd; or one of the special values zero, plus infinity, minus infinity, or NaN (not-a-number).

The component $x$ is called the mantissa, and $y$ is called the exponent. Note that this is just one among many possible conventions: the mantissa (alternatively significand) is sometimes viewed as a fraction in the interval $[1/2, 1)$, with the exponent pointing to the position above the top bit rather than the position of the bottom bit, and with a separate sign.

The conventions for special values largely follow those of the IEEE floating-point standard. At the moment, there is no support for negative zero, unsigned infinity, or a NaN with a payload, though some these might be added in the future.

An fmpr number is exact and has no inherent "accuracy". We use the term precision to denote either the target precision of an operation, or the bit size of a mantissa (which in general is unrelated to the "accuracy" of the number: for example, the floating-point value 1 has a precision of 1 bit in this sense and is simultaneously an infinitely accurate approximation of the integer 1 and a 2-bit accurate approximation of $\sqrt 2 = 1.011010100\ldots_2$).

Except where otherwise noted, the output of an operation is the floating-point number obtained by taking the inputs as exact numbers, in principle carrying out the operation exactly, and rounding the resulting real number to the nearest representable floating-point number whose mantissa has at most the specified number of bits, in the specified direction of rounding. Some operations are always or optionally done exactly.

Types, macros and constants

fmpr_struct

An fmpr_struct holds a mantissa and an exponent. If the mantissa and exponent are sufficiently small, their values are stored as immediate values in the fmpr_struct; large values are represented by pointers to heap-allocated arbitrary-precision integers. Currently, both the mantissa and exponent are implemented using the FLINT fmpz type. Special values are currently encoded by the mantissa being set to zero.

fmpr_t

An fmpr_t is defined as an array of length one of type fmpr_struct, permitting an fmpr_t to be passed by reference.

fmpr_rnd_t

Specifies the rounding mode for the result of an approximate operation.

FMPR_RND_NEAREST

Specifies that the result of an operation should be rounded to the nearest representable number, rounding to an odd mantissa if there is a tie between two values. Note: the code for this rounding mode is currently not implemented.

FMPR_RND_DOWN

Specifies that the result of an operation should be rounded to the nearest representable number in the direction towards zero.

FMPR_RND_UP

Specifies that the result of an operation should be rounded to the nearest representable number in the direction away from zero.

FMPR_RND_FLOOR

Specifies that the result of an operation should be rounded to the nearest representable number in the direction towards minus infinity.

FMPR_RND_CEIL

Specifies that the result of an operation should be rounded to the nearest representable number in the direction towards plus infinity.

FMPR_PREC_EXACT

If passed as the precision parameter to a function, indicates that no rounding is to be performed. This must only be used when it is known that the result of the operation can be represented exactly and fits in memory (the typical use case is working with values small integers). Note that, for example, adding two numbers whose exponents are far apart can easily produce an exact result that is far too large to store in memory.

Memory management

void fmpr_init(fmpr_t x)

Initializes the variable x for use. Its value is set to zero.

void fmpr_clear(fmpr_t x)

Clears the variable x, freeing or recycling its allocated memory.

Special values

void fmpr_zero(fmpr_t x)
void fmpr_one(fmpr_t x)
void fmpr_pos_inf(fmpr_t x)
void fmpr_neg_inf(fmpr_t x)
void fmpr_nan(fmpr_t x)

Sets x respectively to 0, 1, $+\infty$, $-\infty$, NaN.

int fmpr_is_zero(const fmpr_t x)
int fmpr_is_one(const fmpr_t x)
int fmpr_is_pos_inf(const fmpr_t x)
int fmpr_is_neg_inf(const fmpr_t x)
int fmpr_is_nan(const fmpr_t x)

Returns nonzero iff x respectively equals 0, 1, $+\infty$, $-\infty$, NaN.

int fmpr_is_inf(const fmpr_t x)

Returns nonzero iff x equals either $+\infty$ or $-\infty$.

int fmpr_is_normal(const fmpr_t x)

Returns nonzero iff x is a finite, nonzero floating-point value, i.e. not one of the special values 0, $+\infty$, $-\infty$, NaN.

int fmpr_is_special(const fmpr_t x)

Returns nonzero iff x is one of the special values 0, $+\infty$, $-\infty$, NaN, i.e. not a finite, nonzero floating-point value.

Assignment and rounding

long _fmpr_normalise(fmpz_t man, fmpz_t exp, long prec, fmpr_rnd_t rnd)

Rounds the mantissa and exponent in-place.

void fmpr_set(fmpr_t y, const fmpr_t x)

Sets y to a copy of x.

long fmpr_set_round(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

Sets y to a copy of x rounded in the direction specified by rnd to the number of bits specified by prec.

void fmpr_set_error_result(fmpr_t err, const fmpr_t result, long rret)

Given the return value rret and output variable result from a function performing a rounding (e.g. fmpr_set_round or fmpr_add), sets err to a bound for the absolute error.

void fmpr_add_error_result(fmpr_t err, const fmpr_t err_in, const fmpr_t result, long rret, long prec, fmpr_rnd_t rnd)

Like fmpr_set_error_result, but adds err_in to the error.

Comparisons

int fmpr_equal(const fmpr_t x, const fmpr_t y)

Returns nonzero iff x and y are exactly equal. This function does not treat NaN specially, i.e. NaN compares as equal to itself.

int fmpr_cmp(const fmpr_t x, const fmpr_t y)

Returns negative, zero, or positive, depending on whether x is respectively smaller, equal, or greater compared to y. Comparison with NaN is undefined.

int fmpr_cmpabs(const fmpr_t x, const fmpr_t y)

Compares the absolute values of x and y.

int fmpr_sgn(const fmpr_t x)

Returns $-1$, $0$ or $+1$ according to the sign of x. The sign of NaN is undefined.

Random number generation

void fmpr_randtest(fmpr_t x, flint_rand_t state, long bits, long mag_bits)

Generates a finite random number whose mantissa has precision at most bits and whose exponent has at most mag_bits bits. The values are distributed non-uniformly: special bit patterns are generated with high probability in order to allow the test code to exercise corner cases.

void fmpr_randtest_not_zero(fmpr_t x, flint_rand_t state, long bits, long mag_bits)

Identical to fmpr_randtest, except that zero is never produced as an output.

void fmpr_randtest_special(fmpr_t x, flint_rand_t state, long bits, long mag_bits)

Indentical to fmpr_randtest, except that the output occasionally is set to an infinity or NaN.

Conversions

int fmpr_get_mpfr(mpfr_t x, const fmpr_t y, mpfr_rnd_t rnd)

Sets the MPFR variable x to the value of y. If the precision of x is too small to allow y to be represented exactly, it is rounded in the specified MPFR rounding mode. The return value indicates the direction of rounding, following the standard convention of the MPFR library.

void fmpr_set_mpfr(fmpr_t x, const mpfr_t y)

Sets x to the exact value of the MPFR variable y.

void fmpr_set_ui(fmpr_t x, ulong c)
void fmpr_set_si(fmpr_t x, long c)
void fmpr_set_fmpz(fmpr_t x, const fmpz_t c)

Sets x exactly to the integer c.

void fmpr_get_fmpq(fmpq_t y, const fmpr_t x);

Sets y to the exact value of x. The result is undefined if x is not a finite fraction.

long fmpr_set_fmpq(fmpr_t x, const fmpq_t y, long prec, fmpr_rnd_t rnd);

Sets x to the value of y, rounded according to prec and rnd.

void fmpr_set_fmpz_2exp(fmpr_t x, const fmpz_t man, const fmpz_t exp);
void fmpr_set_si_2exp_si(fmpr_t x, long man, long exp)
void fmpr_set_ui_2exp_si(fmpr_t x, ulong man, long exp)

Sets x to $\mathrm{man} \times 2^{\mathrm{exp}}$.

void fmpr_get_fmpz_2exp(fmpz_t man, fmpz_t exp, const fmpr_t x);

Sets man and exp to the unique integers such that $x = \mathrm{man} \times 2^{\mathrm{exp}}$ and man is odd, provided that x is a nonzero finite fraction. If x is zero, both man and exp are set to zero. If x is infinite or NaN, the result is undefined.

int fmpr_get_fmpz_fixed_fmpz(fmpz_t y, const fmpr_t x, const fmpz_t e)
int fmpr_get_fmpz_fixed_si(fmpz_t y, const fmpr_t x, long e)

Converts x to a mantissa with predetermined exponent, i.e. computes an integer y such that $y \times 2^e \approx x$, truncating if necessary. Returns 0 if exact and 1 if truncation occurred.

Input and output

void fmpr_print(const fmpr_t x)

Prints the mantissa and exponent of x as integers, precisely showing the internal representation.

void fmpr_printd(const fmpr_t x, long digits)

Prints x as a decimal floating-point number, rounding to the specified number of digits. This function is currently implemented using MPFR, and does not support large exponents.

Arithmetic

void fmpr_neg(fmpr_t y, const fmpr_t x)

Sets y to the negation of x.

long fmpr_neg_round(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

Sets y to the negation of x, rounding the result.

void fmpr_abs(fmpr_t y, const fmpr_t x)

Sets y to the absolute value of x.

long fmpr_add(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)
long fmpr_add_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)
long fmpr_add_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)
long fmpr_add_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

Sets $z = x + y$, rounded according to prec and rnd. The precision can be FMPR_PREC_EXACT to perform an exact addition, provided that the result fits in memory.

long _fmpr_add_eps(fmpr_t z, const fmpr_t x, int sign, long prec, fmpr_rnd_t rnd)

Sets z to the value that results by adding an infinitesimal quantity of the given sign to x, and rounding. The result is undefined if x is zero.

long fmpr_sub(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)
long fmpr_sub_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)
long fmpr_sub_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)
long fmpr_sub_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

Sets $z = x - y$, rounded according to prec and rnd. The precision can be FMPR_PREC_EXACT to perform an exact addition, provided that the result fits in memory.

long fmpr_mul(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)
long fmpr_mul_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)
long fmpr_mul_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)
long fmpr_mul_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

Sets $z = x \times y$, rounded according to prec and rnd. The precision can be FMPR_PREC_EXACT to perform an exact multiplication, provided that the result fits in memory.

void fmpr_mul_2exp_si(fmpr_t y, const fmpr_t x, long e)
void fmpr_mul_2exp_fmpz(fmpr_t y, const fmpr_t x, const fmpz_t e)

Sets y to x multiplied by $2^e$ without rounding.

long fmpr_div(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)
long fmpr_div_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)
long fmpr_ui_div(fmpr_t z, ulong x, const fmpr_t y, long prec, fmpr_rnd_t rnd)
long fmpr_div_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)
long fmpr_si_div(fmpr_t z, long x, const fmpr_t y, long prec, fmpr_rnd_t rnd)
long fmpr_div_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)
long fmpr_fmpz_div(fmpr_t z, const fmpz_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)
long fmpr_fmpz_div_fmpz(fmpr_t z, const fmpz_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

Sets $z = x / y$, rounded according to prec and rnd. If $y$ is zero, $z$ is set to NaN.

long fmpr_addmul(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)
long fmpr_addmul_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)
long fmpr_addmul_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)
long fmpr_addmul_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

Sets $z = z + x \times y$, rounded according to prec and rnd. The intermediate multiplication is always performed without roundoff. The precision can be FMPR_PREC_EXACT to perform an exact addition, provided that the result fits in memory.

long fmpr_submul(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)
long fmpr_submul_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)
long fmpr_submul_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)
long fmpr_submul_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

Sets $z = z - x \times y$, rounded according to prec and rnd. The intermediate multiplication is always performed without roundoff. The precision can be FMPR_PREC_EXACT to perform an exact subtraction, provided that the result fits in memory.

long fmpr_sqrt(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)
long fmpr_sqrt_ui(fmpr_t z, ulong x, long prec, fmpr_rnd_t rnd)
long fmpr_sqrt_fmpz(fmpr_t z, const fmpz_t x, long prec, fmpr_rnd_t rnd)

Sets $z$ to the square root of $x$, rounded according to prec and rnd. The result is NaN if $x$ is negative.

void fmpr_pow_sloppy_fmpz(fmpr_t y, const fmpr_t b, const fmpz_t e, long prec, fmpr_rnd_t rnd)
void fmpr_pow_sloppy_ui(fmpr_t y, const fmpr_t b, ulong e, long prec, fmpr_rnd_t rnd)
void fmpr_pow_sloppy_si(fmpr_t y, const fmpr_t b, long e, long prec, fmpr_rnd_t rnd)

Sets $y = b^e$, computed using without guaranteeing correct (optimal) rounding, but guaranteeing that the result is a correct upper or lower bound if the rounding is directional. Currently requires $b \ge 0$.

Special functions

long fmpr_log(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

Sets $z$ to $\log(x)$, rounded according to prec and rnd. The result is NaN if $x$ is negative. This function is currently implemented using MPFR and does not support large exponents.

long fmpr_log1p(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

Sets $z$ to $\log(1+x)$, rounded according to prec and rnd. This function computes an accurate value when $x$ is small. The result is NaN if $1+x$ is negative. This function is currently implemented using MPFR and does not support large exponents.

long fmpr_exp(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

Sets $z$ to $\exp(x)$, rounded according to prec and rnd. This function is currently implemented using MPFR and does not support large exponents.

long fmpr_expm1(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

Sets $z$ to $\exp(x)-1$, rounded according to prec and rnd. This function computes an accurate value when $x$ is small. This function is currently implemented using MPFR and does not support large exponents.


fmprb.h (real ball arithmetic)

An fmprb_t represents a ball over the real numbers, that is, an interval $[m-r, m+r]$ where the midpoint $m$ and the radius $r$ are (extended) real numbers and $r$ is nonnegative. The result of an (approximate) operation done on fmprb_t variables is a ball which contains the result of the (mathematically exact) operation applied to any choice of points in the input balls. In general, the output ball is not the smallest possible.

The precision parameter passed to each function roughly indicates the precision to which calculations on the midpoint are carried out (operations on the radius are always done using a fixed, small precision.)

For arithmetic operations, the precision parameter currently simply specifies the precision of the corresponding fmpr operation. In the future, the arithmetic might be made faster by incorporating sloppy rounding (typically equivalent to a loss of 1-2 bits of effective working precision) when the result is known to be inexact (while still propagating errors rigorously, of course). Arithmetic operations done on exact input with exactly representable output are always guaranteed to produce exact output.

For more complex operations, the precision parameter indicates a minimum working precision (algorithms might allocate extra internal precision to attempt to produce an output accurate to the requested number of bits, especially when the required precision can be estimated easily, but this is not generally required).

If the precision is increased and the inputs either are exact or are computed with increased accuracy as well, the output should converge proportionally, absent any bugs. The general intended strategy for using ball arithmetic is to add a few guard bits, and then repeat the calculation as necessary with an exponentially increasing number of guard bits (Ziv's strategy) until the result is exact enough for one's purposes (typically the first attempt will be successful). There are some caveats: in general, ball arithmetic only makes sense for (Lipschitz) continuous functions, and trying to approximate functions close to singularities might result in slow convergence, or failure to converge.

Warning: some methods for transcendental functions and constants currently perform the error propagation in a non-rigorous way, due to the implementation being incomplete (in some cases, a rigorous error bound for the algorithm or function might not be known at all). This should be indicated in the documentation for each function.

Types, macros and constants

fmprb_struct
fmprb_t

An fmprb_struct consists of a pair of fmpr_struct:s. An fmprb_t is defined as an array of length one of type fmprb_struct, permitting an fmprb_t to be passed by reference.

FMPRB_RAD_PREC

The precision used for operations on the radius. This is small enough to fit in a single word, currently 30 bits.

fmprb_midref(x)

Macro returning a pointer to the midpoint of x as an fmpr_t.

fmprb_radref(x)

Macro returning a pointer to the radius of x as an fmpr_t.

Memory management

void fmprb_init(fmprb_t x)

Initializes the variable x for use. Its midpoint and radius are both set to zero.

void fmprb_clear(fmprb_t x)

Clears the variable x, freeing or recycling its allocated memory.

fmprb_struct * _fmprb_vec_init(long n)

Returns a pointer to an array of n initialized fmprb_struct:s.

void _fmprb_vec_clear(fmprb_struct * v, long n)

Clears an array of n initialized fmprb_struct:s.

Basic manipulation

int fmprb_is_exact(const fmprb_t x)

Returns nonzero iff the radius of x is zero.

int fmprb_equal(const fmprb_t x, const fmprb_t y)

Returns nonzero iff x and y are equal as balls, i.e. have both the same midpoint and radius.

void fmprb_zero(fmprb_t x)

Sets x to zero.

fmprb_is_zero(const fmprb_t x)

Returns nonzero iff the midpoint and radius of x are both zero.

void fmprb_set(fmprb_t y, const fmprb_t x)

Sets y to a copy of x.

void fmprb_set_round(fmprb_t y, const fmprb_t x, long prec)

Sets y to a copy of x, rounded to prec bits.

fmprb_neg(fmprb_t y, const fmprb_t x)

Sets y to the negation of x.

fmprb_abs(fmprb_t y, const fmprb_t x)

Sets y to the absolute value of x. No attempt is made to improve the interval represented by x if it contains zero.

void fmprb_set_fmpr(fmprb_t y, const fmpr_t x)
void fmprb_set_si(fmprb_t y, long x)
void fmprb_set_ui(fmprb_t y, ulong x)
void fmprb_set_fmpz(fmprb_t y, const fmpz_t x)

Sets y exactly to x.

void fmprb_set_fmpq(fmprb_t y, const fmpq_t x, long prec)

Sets y to the rational number x, rounded to prec bits.

int fmprb_is_one(const fmprb_t x)

Returns nonzero iff x is exactly 1.

void fmprb_one(fmprb_t x)

Sets x to the exact integer 1.

Input and output

void fmprb_print(const fmprb_t x)

Prints the internal representation of x.

void fmprb_printd(const fmprb_t x, long digits)

Prints x in decimal. The printed value of the radius is not adjusted to compensate for the fact that the binary-to-decimal conversion of both the midpoint and the radius introduces additional error.

Random number generation

void fmprb_randtest(fmprb_t x, flint_rand_t state, long prec, long mag_bits)

Generates a random ball. The midpoint and radius will both be finite.

void fmprb_get_rand_fmpq(fmpq_t q, flint_rand_t state, const fmprb_t x, long bits)

Sets q to a random rational number from the interval represented by x. A denominator is chosen by multiplying the binary denominator of x by a random integer up to size bits.

The outcome is undefined if the midpoint or radius of x is non-finite, or if the exponent of the midpoint or radius is so large or small that representing the endpoints as exact rational numbers would cause overflows.

Precision and comparisons

void fmprb_add_error_fmpr(fmprb_t x, const fmpr_t err)

Adds err, which is assumed to be nonnegative, to the radius of x.

void fmprb_add_error_2exp_si(fmprb_t x, long e)

Adds $2^e$ to the radius of x.

void fmprb_add_error(fmprb_t x, const fmprb_t err)

Adds the supremum of err, which is assumed to be nonnegative, to the radius of x.

int fmprb_contains_fmpr(const fmprb_t x, const fmpr_t y)
int fmprb_contains_fmpq(const fmprb_t x, const fmpq_t y)
int fmprb_contains_fmpz(const fmprb_t x, const fmpz_t y)
int fmprb_contains_mpfr(const fmprb_t x, const mpfr_t y)
int fmprb_contains_zero(const fmprb_t x)

Returns nonzero iff the given number is contained in the interval represented by x.

void fmprb_get_interval_fmpz_2exp(fmpz_t a, fmpz_t b, fmpz_t exp, const fmprb_t x)

Computes the exact interval represented by x, in the form of an integer interval multiplied by a power of two, i.e. $x = [a, b] \times 2^{\mathrm{exp}}$.

The outcome is undefined if the midpoint or radius of x is non-finite, or if the difference in magnitude between the midpoint and radius is so large that representing the endpoints exactly would cause overflows.

long fmprb_rel_error_bits(const fmprb_t x)

Returns the effective relative error of self measured in bits, defined as the difference between the position of the top bit in the radius and the top bit in the midpoint, plus one. The result is clamped between plus/minus FMPR_PREC_EXACT.

long fmprb_rel_accuracy_bits(const fmprb_t x)

Returns the effective relative accuracy of x measured in bits, equal to the negative of the return value from fmprb_rel_error_bits.

Arithmetic

void fmprb_add(fmprb_t z, const fmprb_t x, const fmprb_t y, long prec)
void fmprb_add_ui(fmprb_t z, const fmprb_t x, ulong y, long prec)
void fmprb_add_si(fmprb_t z, const fmprb_t x, long y, long prec)
void fmprb_add_fmpz(fmprb_t z, const fmprb_t x, const fmpz_t y, long prec)
void fmprb_add_fmpr(fmprb_t z, const fmprb_t x, const fmpr_t y, long prec)

Sets $z = x + y$, rounded to prec bits. The precision can be FMPR_PREC_EXACT provided that the result fits in memory.

void fmprb_sub(fmprb_t z, const fmprb_t x, const fmprb_t y, long prec)
void fmprb_sub_ui(fmprb_t z, const fmprb_t x, ulong y, long prec)
void fmprb_sub_si(fmprb_t z, const fmprb_t x, long y, long prec)
void fmprb_sub_fmpz(fmprb_t z, const fmprb_t x, const fmpz_t y, long prec)

Sets $z = x - y$, rounded to prec bits. The precision can be FMPR_PREC_EXACT provided that the result fits in memory.

void fmprb_mul(fmprb_t z, const fmprb_t x, const fmprb_t y, long prec)
void fmprb_mul_ui(fmprb_t z, const fmprb_t x, ulong y, long prec)
void fmprb_mul_si(fmprb_t z, const fmprb_t x, long y, long prec)
void fmprb_mul_fmpz(fmprb_t z, const fmprb_t x, const fmpz_t y, long prec)

Sets $z = x \times y$, rounded to prec bits. The precision can be FMPR_PREC_EXACT provided that the result fits in memory.

void fmprb_mul_2exp_si(fmprb_t y, const fmprb_t x, long e)

Sets $y$ to $x$ multiplied by $2^e$.

void fmprb_div(fmprb_t z, const fmprb_t x, const fmprb_t y, long prec)
void fmprb_div_ui(fmprb_t z, const fmprb_t x, ulong y, long prec)
void fmprb_div_si(fmprb_t z, const fmprb_t x, long y, long prec)
void fmprb_div_fmpz(fmprb_t z, const fmprb_t x, const fmpz_t y, long prec)
void fmprb_fmpz_div_fmpz(fmprb_t y, const fmpz_t num, const fmpz_t den, long prec)
void fmprb_ui_div(fmprb_t z, ulong x, const fmprb_t y, long prec);

Sets $z = x / y$, rounded to prec bits. If $y$ contains zero, $z$ is set to $0 \pm \infty$. Otherwise, error propagation uses the rule $$\left| \frac{x}{y} - \frac{x+\xi_1 a}{y+\xi_2 b} \right| = \left|\frac{x \xi_2 b - y \xi_1 a}{y (y+\xi_2 b)}\right| \le \frac{|xb|+|ya|}{|y| (|y|-b)}$$ where $-1 \le \xi_1, \xi_2 \le 1$, and where the triangle inequality has been applied to the numerator and the reverse triangle inequality has been applied to the denominator.

void fmprb_div_2expm1_ui(fmprb_t y, const fmprb_t x, ulong n, long prec);

Sets $y = x / (2^n - 1)$, rounded to prec bits.

void fmprb_addmul(fmprb_t z, const fmprb_t x, const fmprb_t y, long prec)
void fmprb_addmul_ui(fmprb_t z, const fmprb_t x, ulong y, long prec)
void fmprb_addmul_si(fmprb_t z, const fmprb_t x, long y, long prec)
void fmprb_addmul_fmpz(fmprb_t z, const fmprb_t x, const fmpz_t y, long prec)

Sets $z = z + x \times y$, rounded to prec bits. The precision can be FMPR_PREC_EXACT provided that the result fits in memory.

void fmprb_submul(fmprb_t z, const fmprb_t x, const fmprb_t y, long prec)
void fmprb_submul_ui(fmprb_t z, const fmprb_t x, ulong y, long prec)
void fmprb_submul_si(fmprb_t z, const fmprb_t x, long y, long prec)
void fmprb_submul_fmpz(fmprb_t z, const fmprb_t x, const fmpz_t y, long prec)

Sets $z = z - x \times y$, rounded to prec bits. The precision can be FMPR_PREC_EXACT provided that the result fits in memory.

void fmprb_sqrt(fmprb_t z, const fmprb_t x, long prec)
void fmprb_sqrt_ui(fmprb_t z, ulong x, long prec)
void fmprb_sqrt_fmpz(fmprb_t z, const fmpz_t x, long prec)

Sets $z$ to the square root of $x$, rounded to prec bits. Error propagation is done using the following rule: assuming $m > r \ge 0$, the error is largest at $m - r$, and we have $\sqrt{m} - \sqrt{m-r} \le r / (2 \sqrt{m-r})$.

void fmprb_pow_fmpz(fmprb_t y, const fmprb_t b, const fmpz_t e, long prec)
void fmprb_pow_ui(fmprb_t y, const fmprb_t b, ulong e, long prec)
void fmprb_ui_pow_ui(fmprb_t y, ulong b, ulong e, long prec)
void fmprb_si_pow_ui(fmprb_t y, long b, ulong e, long prec)

Sets $y = b^e$ using binary exponentiation. Provided that $b$ and $e$ are small enough and the exponent is positive, the exact power can be computed using FMPR_PREC_EXACT.

Special functions

void fmprb_log(fmprb_t z, const fmprb_t x, long prec)
void fmprb_log_ui(fmprb_t z, ulong x, long prec)
void fmprb_log_fmpz(fmprb_t z, const fmpz_t x, long prec)

Sets $z = \log(x)$. Error propagation is done using the following rule: assuming $m > r \ge 0$, the error is largest at $m - r$, and we have $\log(m) - \log(m-r) = \log(1 + r/(m-r))$. The last expression is calculated accurately for small radii via fmpr_log1p. An input containing zero currently raises an exception.

void fmprb_exp(fmprb_t z, const fmprb_t x, long prec)

Sets $z = \exp(x)$. Error propagation is done using the following rule: the error is largest at $m + r$, and we have $\exp(m+r) - \exp(m) = \exp(m) (\exp(r)-1)$. The last expression is calculated accurately for small radii via fmpr_expm1.

void fmprb_sin(fmprb_t s, const fmprb_t x, long prec)
void fmprb_cos(fmprb_t c, const fmprb_t x, long prec)
void fmprb_sin_cos(fmprb_t s, fmprb_t c, const fmprb_t x, long prec)

Sets $s = \sin x$, $c = \cos x$. Error propagation uses the rule $|\sin(m \pm r) - \sin(m)| \le r$ (this could be tightened to $\min(r,2)$).

void fmprb_atan(fmprb_t z, const fmprb_t x, long prec)

Sets $z = \tan^{-1} x$. Letting $d = \max(0, |m| - r)$, the propagated error is bounded by $r / (1 + d^2)$ (this could be tightened).

void fmprb_fac_ui(fmprb_t x, ulong n, long prec)

Sets x to n factorial, computed using binary splitting. Provided that $n$ is small enough, the exact factorial can be computed using FMPR_PREC_EXACT.

void fmprb_rfac_ui_bsplit(fmprb_t y, const fmprb_t x, ulong n, long prec)

Sets x to the rising factorial $x (x+1) (x+2) \cdots (x+n-1)$, computed using binary splitting. The basecase processes eight factors at a time using the formula $x(x+1)\cdots(x+7) = (28 + 98x + 63x^2 + 14x^3 + x^4)^2 - 16 (7+2x)^2$, replacing 7 full-precision multiplications with 4 squarings and 1 multiplication [1]. Empirically, this is about twice as fast at high precision. Numerical stability is slightly worse.

[1] R. Crandall and C. Pomerance, "Prime Numbers: A Computational Perspective", second edition, Springer (2005), p. 316.

void fmprb_rfac_ui_multipoint(fmprb_t y, const fmprb_t x, ulong n, long prec)

Sets x to the rising factorial $x (x+1) (x+2) \cdots (x+n-1)$, computed using fast multipoint evaluation. This only requires $O(n^{1/2+\varepsilon})$ multiplications, but has high overhead and poor numerical stability (adding $O(n)$ guard bits to the input might be necessary to achieve full accuracy). It can be expected to be faster than the binary splitting algorithm if the input is a full-precision number, the precision is at least 100000 bits, and $n$ is of the same order of magnitude as (perhaps slightly smaller than) the number of bits.

void fmprb_bin_ui(fmprb_t x, const fmprb_t n, ulong k, long prec)
void fmprb_bin_uiui(fmprb_t x, ulong n, ulong k, long prec)

Sets x to the binomial coefficient ${n \choose k}$, computed using binary splitting. Provided that $n$ and $k$ are small enough, an exact binomial coefficient can be computed using FMPR_PREC_EXACT.

void fmprb_fib_fmpz(fmprb_t f, const fmpz_t n, long prec)
void fmprb_fib_ui(fmprb_t f, ulong n, long prec)

Sets x to the Fibonacci number $F_n$. Uses the binary squaring algorithm described in D. Takahashi, "A fast algorithm for computing large Fibonacci numbers", Information Processing Letters 75 (2000) 243-246 Provided that $n$ is small enough, an exact Fibonacci number can be computed using FMPR_PREC_EXACT.

void fmprb_const_pi_chudnovsky(fmprb_t x, long prec)

Sets x to $\pi$, computed using the Chudnovsky algorithm. Letting $A = 13591409$, $B = 545140134$, $C = 640320$, we have $\pi \approx 1 / s_N$ where $$s_N = 12 \sum_{k=0}^N \frac{(-1)^k (6k)! (A+Bk)} {(3k)! (k!)^3 C^{3k+3/2}}$$

The implementation computes an approximation for the algebraic number $1/s_N$ using binary splitting, bounding the rounding error automatically. The hypergeometric term ratio is asymptotically $R = C^3 / (2^6 \times 3^3) \approx 1.5 \times 10^{14}$, and in fact we have $|\pi - 1/s_N| < 1/R^N$ (with a more detailed calculation, the truncation error could be bounded closer to $1/R^{N+1}$).

void fmprb_const_pi(fmprb_t x, long prec)

Sets x to $\pi$. The value is cached for repeated use.

void fmprb_const_log_sqrt2pi(fmprb_t x, long prec)

Sets x to $\log \sqrt{2 \pi}$. The value is cached for repeated use.

void fmprb_const_euler_brent_mcmillan(fmprb_t res, long prec)

Sets x to Euler's constant $\gamma$, computed using the second Bessel function formula of Brent and McMillan. Brent and McMillan conjectured that the error depending on the internal parameter $n$ is of order $O(e^{-8n})$. Brent has recently proved that this bound is correct, but without determining an explicit big-O factor.

[1] R. P. Brent and E. M. McMillan, "Some new algorithms for high-precision computation of Euler's constant", Mathematics of Computation 34 (1980), 305-312.

[2] R. P. Brent, "Ramanujan and Euler's Constant", http://wwwmaths.anu.edu.au/~brent/pd/Euler_CARMA_10.pdf

[3] The MPFR team (2012), "MPFR Algorithms", http://www.mpfr.org/algo.html

void fmprb_const_zeta3_bsplit(fmprb_t x, long prec)

Sets x to Apery's constant $\zeta(3)$, computed by applying binary splitting to a hypergeometric series.

void fmprb_zeta_ui_asymp(fmprb_t z, ulong s, long prec)

Assuming $s \ge 2$, approximates $\zeta(s)$ by $1 + 2^{-s}$ along with a correct error bound. We use the following bounds: for $s > b$, $\zeta(s) - 1 < 2^{-b}$, and generally, $\zeta(s) - (1 + 2^{-s}) < 2^{2-\lfloor 3 s/2 \rfloor}$.

void fmprb_zeta_ui_euler_product(fmprb_t z, ulong s, long prec)

Computes $\zeta(s)$ using the Euler product. This is fast only if s is large compared to the precision.

Writing $P(a,b) = \prod_{a \le p \le b} (1 - p^{-s})$, we have $1/\zeta(s) = P(a,M) P(M+1,\infty)$.

To bound the error caused by truncating the product at $M$, we write $P(M+1,\infty) = 1 - \epsilon(s,M)$. Since $0 < P(a,M) \le 1$, the absolute error for $\zeta(s)$ is bounded by $\epsilon(s,M)$.

According to the analysis in [1], it holds for all $s \ge 6$ and $M \ge 1$ that $1/P(M+1,\infty) - 1 \le f(s,M) \equiv 2 M^{1-s} / (s/2 - 1)$. Thus, we have $1/(1-\epsilon(s,M)) - 1 \le f(s,M)$, and expanding the geometric series allows us to conclude that $\epsilon(M) \le f(s,M)$.

[1] S. Fillebrown, "Faster Computation of Bernoulli Numbers", Journal of Algorithms 13, 431-445 (1992)

void fmprb_zeta_ui_bernoulli(fmprb_t x, ulong n, long prec)

Computes $\zeta(n)$ for even $n$ via the corresponding Bernoulli number, which is generated using FLINT.

void fmprb_zeta_ui_vec_borwein(fmprb_struct * z, ulong start, long num, ulong step, long prec)

Evaluates $\zeta(s)$ at $\mathrm{num}$ consecutive integers $s$ beginning with $\mathrm{start}$ and proceeding in increments of $\mathrm{step}$. Uses Borwein's formula [1], implemented to support fast multi-evaluation (but also works well for a single $s$).

Requires $\mathrm{start} \ge 2$. For efficiency, the largest $s$ should be at most about as large as $\mathrm{prec}$. Arguments approaching LONG_MAX will cause overflows. One should therefore only use this function for s up to about prec, and then switch to the Euler product.

The algorithm for single $s$ is basically identical to the one used in MPFR (see [2] for a detailed description). In particular, we evaluate the sum backwards to avoid storing more than one $d_k$ coefficient, and use integer arithmetic throughout since it is convenient and the terms turn out to be slightly larger than $2^\mathrm{prec}$. The only numerical error in the main loop comes from the division by $k^s$, which adds less than 1 unit of error per term. For fast multi-evaluation, we repeatedly divide by $k^{\mathrm{step}}$. Each division reduces the input error and adds at most 1 unit of additional rounding error, so by induction, the error per term is always smaller than 2 units.

References:

[1] P. Borwein, "An Efficient Algorithm for the Riemann Zeta Function", Constructive experimental and nonlinear analysis, CMS Conference Proc. 27 (2000), 29-34 http://www.cecm.sfu.ca/personal/pborwein/PAPERS/P155.pdf

[2] The MPFR team (2012), "MPFR Algorithms", http://www.mpfr.org/algo.html

[3] X. Gourdon and P. Sebah (2003), "Numerical evaluation of the Riemann Zeta-function" http://numbers.computation.free.fr/Constants/Miscellaneous/zetaevaluations.pdf

void fmprb_zeta_ui_bsplit(fmprb_t x, ulong s, long prec)

Computes $\zeta(s)$ for arbitrary $s \ge 2$ using a binary splitting implementation of Borwein's formula. The algorithm has quasilinear complexity with respect to the precision.

void fmprb_zeta_ui(fmprb_t x, ulong s, long prec)

Computes $\zeta(s)$ for nonnegative integer $s \ne 1$, automatically choosing an appropriate algorithm.

void fmprb_zeta_ui_vec(fmprb_struct * x, ulong start, long num, long prec)
void fmprb_zeta_ui_vec_even(fmprb_struct * x, ulong start, long num, long prec)
void fmprb_zeta_ui_vec_odd(fmprb_struct * x, ulong start, long num, long prec)

Computes $\zeta(s)$ at num consecutive integers (respectively num even or num odd integers) beginning with $s = \mathrm{start} \ge 2$, automatically choosing an appropriate algorithm.

void fmprb_gamma_fmpq_karatsuba(fmprb_struct * v, const fmpq_t a, long num, long prec)

Uses Karatsuba's algorithm [1] to compute num coefficients in the Taylor series of $\Gamma(a+x)$ for rational $0 < a ≤ 1$, i.e. computes $\Gamma(a), \Gamma'(a) ... \Gamma^{(\mathrm{num}-1)}(a) / (\mathrm{num}-1)!$ This algorithm is most efficient at high precision, for num much smaller than the number of bits, and with small denominators of $a$. In particular, with num = 1, this algorithm computes $\Gamma(a)$ efficiently for small rational $a$.

Let $s = \max(2, \mathrm{num}-1)$. With parameters $r$ and $n$ chosen such that $r \ge n$ and $n \ge 2 s \log 2 s$, Karatsuba shows that $$\Gamma^{(j)}(a) = \sum_{k=0}^r \frac{(-1)^k}{k!} \frac{n^{k+a}}{k+a} \sum_{m=0}^j (-1)^m \frac{j! \, \log^{j-m} n}{(j-m)! (k+a)^m} + \theta_j$$

where $$|\theta_j| \le \frac{5}{3} e^{-n} \log^s n + \left(\frac{e}{r+2}\right)^{r+2} (1 + n^{r+2} \log^s n).$$

We choose the parameters $n$ and $r$ heuristically to be nearly optimal, and then evaluate the above formula to bound $\theta_j$ rigorously.

Karatsuba claims that choosing $r \ge 3n$ gives $|\theta_j| \le 2^{-n-1}$. This is, unfortunately, incorrect. Setting $r = n \alpha$ and expanding the error term around $n = \infty$, one finds that $\alpha$ asymptotically should be $1/W(1/e) \approx 3.59112147666862$ where $W(x)$ is the Lambert W-function. We also optimize the selection of $n$ by choosing $n \approx b \log 2$ where $b$ is the desired number of bits, rather than $n \approx b$, and round $n$ so that it has a short binary expansion (this gives smaller numbers in the binary splitting stage).

Finally, if $s$ is small, we perform binary splitting to a working precision of about $2.2$ times the target precision rather than exactly. This factor was tested to give full accuracy up to at least one million digits when $s \approx 1$. A more careful analysis should be done here so that a working precision is selected which always is sufficient and also nearly optimal.

[1] E. A. Karatsuba, "Fast evaluation of the Hurwitz zeta function and Dirichlet L-series", Problems of Information Transmission, Vol. 34, No. 4, 1998.

void fmprb_gamma_log(fmprb_t y, const fmprb_t x, long prec)

Sets $y = \log \Gamma(x)$, assuming that $x > 0$.

For large $x$, uses Stirling's expansion $$\log \Gamma(x) = \left(x-\frac{1}{2}\right)\log x - x + \frac{\ln {2 \pi}}{2} + \sum_{k=1}^{n-1} t_k + R(n,x) $$ where $$t_k = \frac{B_{2k}}{2k(2k-1)x^{2k-1}}$$ and $|R(n,x)| < t_n$.

If $x$ is too small for the asymptotic expansion to give sufficient accuracy directly, we translate to $x + r$ using the formula $\log \Gamma(x) = \log \Gamma(x+r) - \log(x (x+1) (x+2) \cdots (x+r-1))$. To obtain a remainder smaller than $2^{-b}$, we must choose an $r$ such that $x + r > \beta b$, where $\beta > \log(2) / (2 \pi) \approx 0.11$. We use a slightly larger factor $\beta \approx 0.2$ to more closely balance $n$ and $r$. A much larger $\beta$ (e.g. $\beta = 1$) could be used to reduce the number of Bernoulli numbers that have to be precomputed, at the expense of slower repeated evaluation.


fmprb_poly.h (polynomials of real balls)

An fmprb_poly_t represents a polynomial over the real numbers, implemented as an array of coefficients of type fmprb_struct.

Most functions are provided in two versions: an underscore method which operates directly on pre-allocated arrays of coefficients and generally has some restrictions (such as requiring the lengths to be nonzero and not supporting aliasing of the input and output arrays), and a non-underscore method which performs automatic memory management and handles degenerate cases.

Types, macros and constants

fmprb_poly_struct
fmprb_poly_t

Contains a pointer to an array of coefficients (coeffs), the used length (length), and the allocated size of the array (alloc).

An fmprb_poly_t is defined as an array of length one of type fmprb_poly_struct, permitting an fmprb_poly_t to be passed by reference.

Memory management

void fmprb_poly_init(fmprb_poly_t poly)

Initializes the polynomial for use, setting it to the zero polynomial.

void fmprb_poly_clear(fmprb_poly_t poly)

Clears the polynomial, deallocating all coefficients and the coefficient array.

void fmprb_poly_fit_length(fmprb_poly_t poly, long len)

Makes sures that the coefficient array of the polynomial contains at least len initialized coefficients.

void _fmprb_poly_set_length(fmprb_poly_t poly, long len)

Directly changes the length of the polynomial, without allocating or deallocating coefficients. The value shold not exceed the allocation length.

void _fmprb_poly_normalise(fmprb_poly_t poly)

Strips any trailing coefficients which are identical to zero. The output

void fmprb_poly_zero(fmprb_poly_t poly)
void fmprb_poly_one(fmprb_poly_t poly)

Sets poly to the constant 0 respectively 1.

Conversions

void fmprb_poly_set_fmpz_poly(fmprb_poly_t poly, const fmpz_poly_t src, long prec)
void fmprb_poly_set_fmpq_poly(fmprb_poly_t poly, const fmpq_poly_t src, long prec)

Sets poly to src.

Input and output

void fmprb_poly_printd(const fmprb_poly_t poly, long digits)

Prints the polynomial as an array of coefficients, printing each coefficient using fmprb_printd.

Comparisons

int fmprb_poly_contains_fmpq_poly(const fmprb_poly_t poly1, const fmpq_poly_t poly2)

Returns nonzero iff poly1 contains poly2.

int fmprb_poly_equal(const fmprb_t A, const fmprb_t B)

Returns nonzero iff A and B are equal as polynomial balls, i.e. all coefficients have equal midpoint and radius.

Arithmetic

void _fmprb_poly_add(fmprb_struct * C, const fmprb_struct * A, long lenA, const fmprb_struct * B, long lenB, long prec)

Sets {C, max(lenA, lenB)} to the sum of {A, lenA} and {B, lenB}. Allows aliasing of the input and output operands.

void fmprb_poly_add(fmprb_poly_t C, const fmprb_poly_t A, const fmprb_poly_t B, long prec)

Sets C to the sum of A and B.

void _fmprb_poly_mullow(fmprb_struct * C, const fmprb_struct * A, long lenA, const fmprb_struct * B, long lenB, long n, long prec)

Sets {C, n} to the product of {A, lenA} and {B, lenB}, truncated to length n. The output is not allowed to be aliased with either of the inputs. We require lenA ≥ lenB > 0, n > 0, lenA + lenB - 1 ≥ n.

As currently implemented, this function puts each input polynomial on a common exponent, truncates to prec bits, and multiplies exactly over the integers. The output error is computed by cross-multiplying the max norms.

void fmprb_poly_mullow(fmprb_poly_t C, const fmprb_poly_t A, const fmprb_poly_t B, long n, long prec)

Sets C to the product of A and B, truncated to length n.

void _fmprb_poly_mul(fmprb_struct * C, const fmprb_struct * A, long lenA, const fmprb_struct * B, long lenB, long prec)

Sets {C, n} to the product of {A, lenA} and {B, lenB}, truncated to length n. The output is not allowed to be aliased with either of the inputs. We require lenA ≥ lenB > 0, n > 0. This function currently calls _fmprb_poly_mullow.

void fmprb_poly_mul(fmprb_poly_t C, const fmprb_poly_t A, const fmprb_poly_t B, long prec)

Sets C to the product of A and B.

void _fmprb_poly_inv_series(fmprb_struct * Qinv, const fmprb_struct * Q, long len, long prec)

Sets {Qinv, len} to the power series inverse of {Q, len}. Uses Newton iteration.

void fmprb_poly_inv_series(fmprb_poly_t Qinv, const fmprb_poly_t Q, long n, long prec)

Sets Qinv to the power series inverse of Q.

void _fmprb_poly_div(fmprb_struct * Q, const fmprb_struct * A, long lenA, const fmprb_struct * B, long lenB, long prec)
void _fmprb_poly_rem(fmprb_struct * R, const fmprb_struct * A, long lenA, const fmprb_struct * B, long lenB, long prec)
void _fmprb_poly_divrem(fmprb_struct * Q, fmprb_struct * R, const fmprb_struct * A, long lenA, const fmprb_struct * B, long lenB, long prec)
void fmprb_poly_divrem(fmprb_poly_t Q, fmprb_poly_t R, const fmprb_poly_t A, const fmprb_poly_t B, long prec)

Performs polynomial division with remainder, computing a quotient $Q$ and a remainder $R$ such that $A = BQ + R$. The leading coefficient of $B$ must not contain zero. The implementation reverses the inputs and performs power series division.

void _fmprb_poly_div_root(fmprb_struct * Q, fmprb_t R, const fmprb_struct * A, long len, const fmprb_t c, long prec)

Divides $A$ by the polynomial $x - c$, computing the quotient $Q$ as well as the remainder $R = f(c)$.

Product trees

void _fmprb_poly_product_roots(fmprb_struct * poly, const fmprb_struct * xs, long n, long prec)
void fmprb_poly_product_roots(fmprb_poly_t poly, fmprb_struct * xs, long n, long prec)

Generates the polynomial $(x-x_0)(x-x_1)\cdots(x-x_{n-1})$.

fmprb_struct ** _fmprb_poly_tree_alloc(long len)

Returns an initialized data structured capable of representing a remainder tree (product tree) of the given number of roots.

void _fmprb_poly_tree_free(fmprb_struct ** tree, long len)

Deallocates a tree structure as allocated using _fmprb_poly_tree_alloc.

void _fmprb_poly_tree_build(fmprb_struct ** tree, const fmprb_struct * roots, long len, long prec)

Constructs a product tree from a given array of roots. The tree structure must be pre-allocated to the specified length using _fmprb_poly_tree_alloc.

Composition

void _fmprb_poly_compose_horner(fmprb_struct * res, const fmprb_struct * poly1, long len1, const fmprb_struct * poly2, long len2, long prec)
void fmprb_poly_compose_horner(fmprb_poly_t res, const fmprb_poly_t poly1, const fmprb_poly_t poly2, long prec)
void _fmprb_poly_compose_divconquer(fmprb_struct * res, const fmprb_struct * poly1, long len1, const fmprb_struct * poly2, long len2, long prec);
void fmprb_poly_compose_divconquer(fmprb_poly_t res, const fmprb_poly_t poly1, const fmprb_poly_t poly2, long prec)
void _fmprb_poly_compose(fmprb_struct * res, const fmprb_struct * poly1, long len1, const fmprb_struct * poly2, long len2, long prec)
void fmprb_poly_compose(fmprb_poly_t res, const fmprb_poly_t poly1, const fmprb_poly_t poly2, long prec)

Sets res to the composition $h(x) = f(g(x))$ where $f$ is given by poly1 and $g$ is given by poly2, respectively using Horner's rule, divide-and-conquer, and an automatic choice between the two algorithms. The underscore methods do not support aliasing of the output with either input polynomial.

void _fmprb_poly_compose_series_horner(fmprb_struct * res, const fmprb_struct * poly1, long len1, const fmprb_struct * poly2, long len2, long n, long prec)
void fmprb_poly_compose_series_horner(fmprb_poly_t res, const fmprb_poly_t poly1, const fmprb_poly_t poly2, long n, long prec)
void _fmprb_poly_compose_series_brent_kung(fmprb_struct * res, const fmprb_struct * poly1, long len1, const fmprb_struct * poly2, long len2, long n, long prec)
void fmprb_poly_compose_series_brent_kung(fmprb_poly_t res, const fmprb_poly_t poly1, const fmprb_poly_t poly2, long n, long prec)
void _fmprb_poly_compose_series(fmprb_struct * res, const fmprb_struct * poly1, long len1, const fmprb_struct * poly2, long len2, long n, long prec)
void fmprb_poly_compose_series(fmprb_poly_t res, const fmprb_poly_t poly1, const fmprb_poly_t poly2, long n, long prec)

Sets res to the power series composition $h(x) = f(g(x))$ truncated to order $O(x^n)$ where $f$ is given by poly1 and $g$ is given by poly2, respectively using Horner's rule, the Brent-Kung baby step-giant step algorithm, and an automatic choice between the two algorithms. We require that the constant term in $g(x)$ is exactly zero. The underscore methods do not support aliasing of the output with either input polynomial.

Evaluation and interpolation

void _fmprb_poly_evaluate(fmprb_t res, const fmprb_struct * f, long len, const fmprb_t a, long prec)
void fmprb_poly_evaluate(fmprb_t res, const fmprb_poly_t f, const fmprb_t a, long prec)

Sets res to $f(a)$, evaluated using Horner's rule.

void _fmprb_poly_evaluate_vec_iter(fmprb_struct * ys, const fmprb_struct * poly, long plen, const fmprb_struct * xs, long n, long prec)
void fmprb_poly_evaluate_vec_iter(fmprb_struct * ys, const fmprb_poly_t poly, const fmprb_struct * xs, long n, long prec)

Evaluates the polynomial simultaneously at n given points, calling _fmprb_poly_evaluate repeatedly.

void _fmprb_poly_evaluate_vec_fast_precomp(fmprb_struct * vs, const fmprb_struct * poly, long plen, fmprb_struct ** tree, long len, long prec)
void _fmprb_poly_evaluate_vec_fast(fmprb_struct * ys, const fmprb_struct * poly, long plen, const fmprb_struct * xs, long n, long prec)
void fmprb_poly_evaluate_vec_fast(fmprb_struct * ys, const fmprb_poly_t poly, const fmprb_struct * xs, long n, long prec)

Evaluates the polynomial simultaneously at n given points, using fast multipoint evaluation.

void _fmprb_poly_interpolate_newton(fmprb_struct * poly, const fmprb_struct * xs, const fmprb_struct * ys, long n, long prec)
void fmprb_poly_interpolate_newton(fmprb_poly_t poly, const fmprb_struct * xs, const fmprb_struct * ys, long n, long prec)

Recovers the unique polynomial of length at most n that interpolates the given x and y values. This implementation first interpolates in the Newton basis and then converts back to the monomial basis.

void _fmprb_poly_interpolate_barycentric(fmprb_struct * poly, const fmprb_struct * xs, const fmprb_struct * ys, long n, long prec)
void fmprb_poly_interpolate_barycentric(fmprb_poly_t poly, const fmprb_struct * xs, const fmprb_struct * ys, long n, long prec)

Recovers the unique polynomial of length at most n that interpolates the given x and y values. This implementation uses the barycentric form of Lagrange interpolation.

void _fmprb_poly_interpolation_weights(fmprb_struct * w, fmprb_struct ** tree, long len, long prec)
void _fmprb_poly_interpolate_fast_precomp(fmprb_struct * poly, const fmprb_struct * ys, fmprb_struct ** tree, const fmprb_struct * weights, long len, long prec)
void _fmprb_poly_interpolate_fast(fmprb_struct * poly, const fmprb_struct * xs, const fmprb_struct * ys, long len, long prec)
void fmprb_poly_interpolate_fast(fmprb_poly_t poly, const fmprb_struct * xs, const fmprb_struct * ys, long n, long prec)

Recovers the unique polynomial of length at most n that interpolates the given x and y values, using fast Lagrange interpolation. The precomp function takes a precomputed product tree over the x values and a vector of interpolation weights as additional inputs.

Differentiation

void _fmprb_poly_derivative(fmprb_struct * res, const fmprb_struct * poly, long len, long prec)

Sets {res, len - 1} to the derivative of {poly, len}. Allows aliasing of the input and output.

void fmprb_poly_derivative(fmprb_poly_t res, const fmprb_poly_t poly, long prec)

Sets res to the derivative of poly.

void _fmprb_poly_integral(fmprb_struct * res, const fmprb_struct * poly, long len, long prec)

Sets {res, len} to the integral of {poly, len - 1}. Allows aliasing of the input and output.

void fmprb_poly_integral(fmprb_poly_t res, const fmprb_poly_t poly, long prec)

Sets res to the integral of poly.

Special functions

void _fmprb_poly_log_series(fmprb_struct * f, fmprb_struct * h, long n, long prec)
void fmprb_poly_log_series(fmprb_poly_t f, const fmprb_poly_t h, long n, long prec)

Sets $f$ to the power series logarithm of $h$, truncated to length $n$. Uses the formula $\log f = \int f' / f$, adding the logarithm of the constant term in $h$ as the constant of integration. The underscore method does not support aliasing of the input and output arrays.

void _fmprb_poly_exp_series_basecase(fmprb_struct * f, const fmprb_struct * h, long hlen, long n, long prec)
void fmprb_poly_exp_series_basecase(fmprb_poly_t f, const fmprb_poly_t h, long n, long prec)
void _fmprb_poly_exp_series(fmprb_struct * f, const fmprb_struct * h, long hlen, long n, long prec)
void fmprb_poly_exp_series(fmprb_poly_t f, const fmprb_poly_t h, long n, long prec)

Sets $f$ to the power series exponential of $h$, truncated to length $n$.

The basecase version uses a simple recurrence for the coefficients, requiring $O(nm)$ operations where $m$ is the length of $h$.

The main implementation uses Newton iteration, starting from a small number of terms given by the basecase algorithm. The complexity is $O(M(n))$. Redundant operations in the Newton iteration are avoided by using the scheme described in [1].

The underscore methods support aliasing and allow the input to be shorter than the output, but require the lengths to be nonzero.

[1] G. Hanrot and P. Zimmermann, "Newton Iteration Revisited", http://www.loria.fr/~zimmerma/papers/fastnewton.ps.gz (2004 preprint)

void fmprb_poly_log_gamma_series(fmprb_poly_t f, long n, long prec)

Sets $f$ to the series expansion of $\log(\Gamma(1-x))$, truncated to length $n$.


fmprb_mat.h (matrices of real balls)

An fmprb_mat_t represents a dense matrix over the real numbers, implemented as an array of entries of type fmprb_struct.

The dimension (number of rows and columns) of a matrix is fixed at initialization, and the user must ensure that inputs and outputs to an operation have compatible dimensions. The number of rows or columns in a matrix can be zero.

Types, macros and constants

fmprb_mat_struct
fmprb_mat_t

Contains a pointer to a flat array of the entries (entries), an array of pointers to the start of each row (rows), and the number of rows (r) and columns (c).

An fmprb_mat_t is defined as an array of length one of type fmprb_mat_struct, permitting an fmprb_mat_t to be passed by reference.

fmprb_mat_entry(mat, i, j)

Macro giving a pointer to the entry at row i and column j.

fmprb_mat_nrows(mat)

Returns the number of rows of the matrix.

fmprb_mat_ncols(mat)

Returns the number of columns of the matrix.

Memory management

void fmprb_mat_init(fmprb_mat_t mat, long r, long c)

Initializes the matrix, setting it to the zero matrix with the given number of rows and columns.

void fmprb_mat_clear(fmprb_mat_t mat)

Clears the matrix, deallocating all entries.

Conversions

void fmprb_mat_set(fmprb_mat_t dest, const fmprb_mat_t src)
void fmprb_mat_set_fmpz_mat(fmprb_mat_t dest, const fmpz_mat_t src)
void fmprb_mat_set_fmpq_mat(fmprb_mat_t dest, const fmpq_mat_t src, long prec)

Sets dest to src. The operands must have identical dimensions.

Input and output

void fmprb_mat_printd(const fmprb_mat_t mat, long digits)

Prints each entry in the matrix with the specified number of decimal digits.

Comparisons

int fmprb_mat_equal(const fmprb_mat_t mat1, const fmprb_mat_t mat2)

Returns nonzero iff the matrices have the same dimensions and identical entries.

int fmprb_mat_contains_fmpq_mat(const fmprb_mat_t mat1, const fmpq_mat_t mat2)

Returns nonzero iff the matrices have the same dimensions and each entry in mat2 is contained in the corresponding entry in mat1.

Special matrices

void fmprb_mat_zero(fmprb_mat_t mat)

Sets all entries in mat to zero.

void fmprb_mat_one(fmprb_mat_t mat)

Sets the entries on the main diagonal to ones, and all other entries to zero.

Arithmetic

void fmprb_mat_neg(fmprb_mat_t dest, const fmprb_mat_t src)

Sets dest to the exact negation of src. The operands must have the same dimensions.

void fmprb_mat_add(fmprb_mat_t res, const fmprb_mat_t mat1, const fmprb_mat_t mat2, long prec)

Sets res to the sum of mat1 and mat2. The operands must have the same dimensions.

void fmprb_mat_sub(fmprb_mat_t res, const fmprb_mat_t mat1, const fmprb_mat_t mat2, long prec)

Sets res to the difference of mat1 and mat2. The operands must have the same dimensions.

void fmprb_mat_mul(fmprb_mat_t res, const fmprb_mat_t mat1, const fmprb_mat_t mat2, long prec)

Sets res to the matrix product of mat1 and mat2. The operands must have compatible dimensions for matrix multiplication.

Solving

int fmprb_mat_lu(long * perm, fmprb_mat_t LU, const fmprb_mat_t A, long prec)

Given an $n \times n$ matrix $A$, computes an LU decomposition $PLU = A$ using Gaussian elimination with partial pivoting. The input and output matrices can be the same, performing the decomposition in-place.

Entry $i$ in the permutation vector perm is set to the row index in the input matrix corresponding to row $i$ in the output matrix.

The algorithm succeeds and returns nonzero if it can find $n$ invertible (i.e. not containing zero) pivot entries. This guarantees that the matrix is invertible.

The algorithm fails and returns zero, leaving the entries in $P$ and $LU$ undefined, if it cannot find $n$ invertible pivot elements. In this case, either the matrix is singular, the input matrix was computed to insufficient precision, or the LU decomposition was attempted at insufficient precision.

void fmprb_mat_solve_lu_precomp(fmprb_mat_t X, const long * perm, const fmprb_mat_t LU, const fmprb_mat_t B, long prec)

Solves $AX = B$ given the precomputed nonsingular LU decomposition $A = PLU$. The matrices $X$ and $B$ are allowed to be aliased with each other, but $X$ is not allowed to be aliased with $LU$.

int fmprb_mat_solve(fmprb_mat_t X, const fmprb_mat_t A, const fmprb_mat_t B, long prec)

Solves $AX = B$ where $A$ is a nonsingular $n \times n$ matrix and $X$ and $B$ are $n \times m$ matrices, using LU decomposition.

If $m > 0$ and $A$ cannot be inverted numerically (indicating either that $A$ is singular or that the precision is insufficient), the values in the output matrix are left undefined and zero is returned. A nonzero return value guarantees that $A$ is invertible and that the exact solution matrix is contained in the output.

int fmprb_mat_inv(fmprb_mat_t X, const fmprb_mat_t A, long prec)

Sets $X = A^{-1}$ where $A$ is a square matrix, computed by solving the system $AX = I$.

If $A$ cannot be inverted numerically (indicating either that $A$ is singular or that the precision is insufficient), the values in the output matrix are left undefined and zero is returned. A nonzero return value guarantees that the matrix is invertible and that the exact inverse is contained in the output.

void fmprb_mat_det(fmprb_t det, const fmprb_mat_t A, long prec)

Computes the determinant of the matrix, using Gaussian elimination with partial pivoting. If at some point an invertible pivot element cannot be found, the elimination is stopped and the magnitude of the determinant of the remaining submatrix is bounded using Hadamard's inequality.


History


Credits

Arb is licensed GNU General Public License version 2, or any later version.

Fredrik's work on Arb is supported by Austrian Science Fund FWF Grant Y464-N18 (Fast Computer Algebra for Special Functions).

Arb includes code written by Bill Hart and Sebastian Pancratz for FLINT (also licensed GPL 2.0+).


Wishes or bug reports? Send any comments to fredrik.johansson@gmail.com