arb.h – real numbers¶
An arb_t
represents a ball over the real numbers,
that is, an interval \([m \pm r] \equiv [m-r, m+r]\) where the midpoint \(m\) and the
radius \(r\) are (extended) real numbers and \(r\) is nonnegative (possibly infinite).
The result of an (approximate) operation done on arb_t
variables
is a ball which contains the result of the (mathematically exact) operation
applied to any choice of points in the input balls.
In general, the output ball is not the smallest possible.
The precision parameter passed to each function roughly indicates the precision to which calculations on the midpoint are carried out (operations on the radius are always done using a fixed, small precision.)
For arithmetic operations, the precision parameter currently simply
specifies the precision of the corresponding arf_t
operation.
In the future, the arithmetic might be made faster by incorporating
sloppy rounding (typically equivalent to a loss of 1-2 bits of effective
working precision) when the result is known to be inexact (while still
propagating errors rigorously, of course).
Arithmetic operations done on exact input with exactly
representable output are always guaranteed to produce exact output.
For more complex operations, the precision parameter indicates a minimum working precision (algorithms might allocate extra internal precision to attempt to produce an output accurate to the requested number of bits, especially when the required precision can be estimated easily, but this is not generally required).
If the precision is increased and the inputs either are exact or are computed with increased accuracy as well, the output should converge proportionally, absent any bugs. The general intended strategy for using ball arithmetic is to add a few guard bits, and then repeat the calculation as necessary with an exponentially increasing number of guard bits (Ziv’s strategy) until the result is exact enough for one’s purposes (typically the first attempt will be successful).
The following balls with an infinite or NaN component are permitted, and may be returned as output from functions.
- The ball \([+\infty \pm c]\), where \(c\) is finite, represents the point at positive infinity. Such a ball can always be replaced by \([+\infty \pm 0]\) while preserving mathematical correctness (this is currently not done automatically by the library).
- The ball \([-\infty \pm c]\), where \(c\) is finite, represents the point at negative infinity. Such a ball can always be replaced by \([-\infty \pm 0]\) while preserving mathematical correctness (this is currently not done automatically by the library).
- The ball \([c \pm \infty]\), where \(c\) is finite or infinite, represents the whole extended real line \([-\infty,+\infty]\). Such a ball can always be replaced by \([0 \pm \infty]\) while preserving mathematical correctness (this is currently not done automatically by the library). Note that there is no way to represent a half-infinite interval such as \([0,\infty]\).
- The ball \([\operatorname{NaN} \pm c]\), where \(c\) is finite or infinite, represents an indeterminate value (the value could be any extended real number, or it could represent a function being evaluated outside its domain of definition, for example where the result would be complex). Such an indeterminate ball can always be replaced by \([\operatorname{NaN} \pm \infty]\) while preserving mathematical correctness (this is currently not done automatically by the library).
Types, macros and constants¶
-
arb_struct
¶
-
arb_t
¶ An
arb_struct
consists of anarf_struct
(the midpoint) and amag_struct
(the radius). Anarb_t
is defined as an array of length one of typearb_struct
, permitting anarb_t
to be passed by reference.
-
arb_ptr
¶ Alias for
arb_struct *
, used for vectors of numbers.
-
arb_srcptr
¶ Alias for
const arb_struct *
, used for vectors of numbers when passed as constant input to functions.
Memory management¶
-
void
arb_init
(arb_t x)¶ Initializes the variable x for use. Its midpoint and radius are both set to zero.
-
arb_ptr
_arb_vec_init
(slong n)¶ Returns a pointer to an array of n initialized
arb_struct
entries.
-
void
_arb_vec_clear
(arb_ptr v, slong n)¶ Clears an array of n initialized
arb_struct
entries.
Assignment and rounding¶
-
void
arb_set_round_fmpz
(arb_t y, const fmpz_t x, slong prec)¶ Sets y to the value of x, rounded to prec bits.
-
void
arb_set_round_fmpz_2exp
(arb_t y, const fmpz_t x, const fmpz_t e, slong prec)¶ Sets y to \(x \cdot 2^e\), rounded to prec bits.
-
void
arb_set_fmpq
(arb_t y, const fmpq_t x, slong prec)¶ Sets y to the rational number x, rounded to prec bits.
-
int
arb_set_str
(arb_t res, const char * inp, slong prec)¶ Sets res to the value specified by the human-readable string inp. The input may be a decimal floating-point literal, such as “25”, “0.001”, “7e+141” or “-31.4159e-1”, and may also consist of two such literals separated by the symbol “+/-” and optionally enclosed in brackets, e.g. “[3.25 +/- 0.0001]”, or simply “[+/- 10]” with an implicit zero midpoint. The output is rounded to prec bits, and if the binary-to-decimal conversion is inexact, the resulting error is added to the radius.
The symbols “inf” and “nan” are recognized (a nan midpoint results in an indeterminate interval, with infinite radius).
Returns 0 if successful and nonzero if unsuccessful. If unsuccessful, the result is set to an indeterminate interval.
-
char *
arb_get_str
(const arb_t x, slong n, ulong flags)¶ Returns a nice human-readable representation of x, with at most n digits of the midpoint printed.
With default flags, the output can be parsed back with
arb_set_str()
, and this is guaranteed to produce an interval containing the original interval x.By default, the output is rounded so that the value given for the midpoint is correct up to 1 ulp (unit in the last decimal place).
If ARB_STR_MORE is added to flags, more (possibly incorrect) digits may be printed.
If ARB_STR_NO_RADIUS is added to flags, the radius is not included in the output if at least 1 digit of the midpoint can be printed.
By adding a multiple m of ARB_STR_CONDENSE to flags, strings of more than three times m consecutive digits are condensed, only printing the leading and trailing m digits along with brackets indicating the number of digits omitted (useful when computing values to extremely high precision).
Assignment of special values¶
Input and output¶
-
void
arb_printd
(const arb_t x, slong digits)¶ Prints x in decimal. The printed value of the radius is not adjusted to compensate for the fact that the binary-to-decimal conversion of both the midpoint and the radius introduces additional error.
-
void
arb_printn
(const arb_t x, slong digits, ulong flags)¶ Prints a nice decimal representation of x. By default, the output is guaranteed to be correct to within one unit in the last digit. An error bound is also printed explicitly. See
arb_get_str()
for details.
-
void
arb_fprint
(FILE * file, const arb_t x)¶ Prints the internal representation of x to the stream file.
-
void
arb_fprintd
(FILE * file, const arb_t x, slong digits)¶ Prints x in decimal to the stream file. The printed value of the radius is not adjusted to compensate for the fact that the binary-to-decimal conversion of both the midpoint and the radius introduces additional error.
-
void
arb_fprintn
(FILE * file, const arb_t x, slong digits, ulong flags)¶ Prints a nice decimal representation of x to the stream file. By default, the output is guaranteed to be correct to within one unit in the last digit. An error bound is also printed explicitly. See
arb_get_str()
for details.
Random number generation¶
-
void
arb_randtest
(arb_t x, flint_rand_t state, slong prec, slong mag_bits)¶ Generates a random ball. The midpoint and radius will both be finite.
-
void
arb_randtest_exact
(arb_t x, flint_rand_t state, slong prec, slong mag_bits)¶ Generates a random number with zero radius.
-
void
arb_randtest_precise
(arb_t x, flint_rand_t state, slong prec, slong mag_bits)¶ Generates a random number with radius around \(2^{-\text{prec}}\) the magnitude of the midpoint.
-
void
arb_randtest_wide
(arb_t x, flint_rand_t state, slong prec, slong mag_bits)¶ Generates a random number with midpoint and radius chosen independently, possibly giving a very large interval.
-
void
arb_randtest_special
(arb_t x, flint_rand_t state, slong prec, slong mag_bits)¶ Generates a random interval, possibly having NaN or an infinity as the midpoint and possibly having an infinite radius.
-
void
arb_get_rand_fmpq
(fmpq_t q, flint_rand_t state, const arb_t x, slong bits)¶ Sets q to a random rational number from the interval represented by x. A denominator is chosen by multiplying the binary denominator of x by a random integer up to bits bits.
The outcome is undefined if the midpoint or radius of x is non-finite, or if the exponent of the midpoint or radius is so large or small that representing the endpoints as exact rational numbers would cause overflows.
Radius and interval operations¶
-
void
arb_add_error
(arb_t x, const arb_t err)¶ Adds the absolute value of err to the radius of x (the operation is done in-place).
-
void
arb_union
(arb_t z, const arb_t x, const arb_t y, slong prec)¶ Sets z to a ball containing both x and y.
-
int
arb_intersection
(arb_t z, const arb_t x, const arb_t y, slong prec)¶ If x and y overlap according to
arb_overlaps()
, then z is set to a ball containing the intersection of x and y and a nonzero value is returned. Otherwise zero is returned and the value of z is undefined. If x or y contains NaN, the result is NaN.
-
void
arb_get_abs_ubound_arf
(arf_t u, const arb_t x, slong prec)¶ Sets u to the upper bound for the absolute value of x, rounded up to prec bits. If x contains NaN, the result is NaN.
-
void
arb_get_abs_lbound_arf
(arf_t u, const arb_t x, slong prec)¶ Sets u to the lower bound for the absolute value of x, rounded down to prec bits. If x contains NaN, the result is NaN.
-
void
arb_get_ubound_arf
(arf_t u, const arb_t x, long prec)¶ Sets u to the upper bound for the value of x, rounded up to prec bits. If x contains NaN, the result is NaN.
-
void
arb_get_lbound_arf
(arf_t u, const arb_t x, long prec)¶ Sets u to the lower bound for the value of x, rounded down to prec bits. If x contains NaN, the result is NaN.
-
void
arb_get_mag
(mag_t z, const arb_t x)¶ Sets z to an upper bound for the absolute value of x. If x contains NaN, the result is positive infinity.
-
void
arb_get_mag_lower
(mag_t z, const arb_t x)¶ Sets z to a lower bound for the absolute value of x. If x contains NaN, the result is zero.
-
void
arb_get_mag_lower_nonnegative
(mag_t z, const arb_t x)¶ Sets z to a lower bound for the signed value of x, or zero if x overlaps with the negative half-axis. If x contains NaN, the result is zero.
-
void
arb_get_interval_fmpz_2exp
(fmpz_t a, fmpz_t b, fmpz_t exp, const arb_t x)¶ Computes the exact interval represented by x, in the form of an integer interval multiplied by a power of two, i.e. \(x = [a, b] \times 2^{\text{exp}}\). The result is normalized by removing common trailing zeros from a and b.
This method aborts if x is infinite or NaN, or if the difference between the exponents of the midpoint and the radius is so large that allocating memory for the result fails.
Warning: this method will allocate a huge amount of memory to store the result if the exponent difference is huge. Memory allocation could succeed even if the required space is far larger than the physical memory available on the machine, resulting in swapping. It is recommended to check that the midpoint and radius of x both are within a reasonable range before calling this method.
-
void
arb_set_interval_mpfr
(arb_t x, const mpfr_t a, const mpfr_t b, slong prec)¶ Sets x to a ball containing the interval \([a, b]\). We require that \(a \le b\).
-
void
arb_get_interval_mpfr
(mpfr_t a, mpfr_t b, const arb_t x)¶ Constructs an interval \([a, b]\) containing the ball x. The MPFR version uses the precision of the output variables.
-
slong
arb_rel_error_bits
(const arb_t x)¶ Returns the effective relative error of x measured in bits, defined as the difference between the position of the top bit in the radius and the top bit in the midpoint, plus one. The result is clamped between plus/minus ARF_PREC_EXACT.
-
slong
arb_rel_accuracy_bits
(const arb_t x)¶ Returns the effective relative accuracy of x measured in bits, equal to the negative of the return value from
arb_rel_error_bits()
.
-
slong
arb_bits
(const arb_t x)¶ Returns the number of bits needed to represent the absolute value of the mantissa of the midpoint of x, i.e. the minimum precision sufficient to represent x exactly. Returns 0 if the midpoint of x is a special value.
-
void
arb_trim
(arb_t y, const arb_t x)¶ Sets y to a trimmed copy of x: rounds x to a number of bits equal to the accuracy of x (as indicated by its radius), plus a few guard bits. The resulting ball is guaranteed to contain x, but is more economical if x has less than full accuracy.
-
int
arb_get_unique_fmpz
(fmpz_t z, const arb_t x)¶ If x contains a unique integer, sets z to that value and returns nonzero. Otherwise (if x represents no integers or more than one integer), returns zero.
This method aborts if there is a unique integer but that integer is so large that allocating memory for the result fails.
Warning: this method will allocate a huge amount of memory to store the result if there is a unique integer and that integer is huge. Memory allocation could succeed even if the required space is far larger than the physical memory available on the machine, resulting in swapping. It is recommended to check that the midpoint of x is within a reasonable range before calling this method.
-
void
arb_ceil
(arb_t y, const arb_t x, slong prec)¶ Sets y to a ball containing \(\lfloor x \rfloor\) and \(\lceil x \rceil\) respectively, with the midpoint of y rounded to at most prec bits.
-
void
arb_get_fmpz_mid_rad_10exp
(fmpz_t mid, fmpz_t rad, fmpz_t exp, const arb_t x, slong n)¶ Assuming that x is finite and not exactly zero, computes integers mid, rad, exp such that \(x \in [m-r, m+r] \times 10^e\) and such that the larger out of mid and rad has at least n digits plus a few guard digits. If x is infinite or exactly zero, the outputs are all set to zero.
-
int
arb_can_round_mpfr
(const arb_t x, slong prec, mpfr_rnd_t rnd)¶ Returns nonzero if rounding the midpoint of x to prec bits in the direction rnd is guaranteed to give the unique correctly rounded floating-point approximation for the real number represented by x.
In other words, if this function returns nonzero, applying
arf_set_round()
, orarf_get_mpfr()
, orarf_get_d()
to the midpoint of x is guaranteed to return a correctly rounded arf_t, mpfr_t (provided that prec is the precision of the output variable), or double (provided that prec is 53). Moreover,arf_get_mpfr()
is guaranteed to return the correct ternary value according to MPFR semantics.Note that the mpfr version of this function takes an MPFR rounding mode symbol as input, while the arf version takes an arf rounding mode symbol. Otherwise, the functions are identical.
This function may perform a fast, inexact test; that is, it may return zero in some cases even when correct rounding actually is possible.
To be conservative, zero is returned when x is non-finite, even if it is an “exact” infinity.
Comparisons¶
-
int
arb_is_nonzero
(const arb_t x)¶ Returns nonzero iff zero is not contained in the interval represented by x.
-
int
arb_is_finite
(const arb_t x)¶ Returns nonzero iff the midpoint and radius of x are both finite floating-point numbers, i.e. not infinities or NaN.
-
int
arb_equal
(const arb_t x, const arb_t y)¶ Returns nonzero iff x and y are equal as balls, i.e. have both the same midpoint and radius.
Note that this is not the same thing as testing whether both x and y certainly represent the same real number, unless either x or y is exact (and neither contains NaN). To test whether both operands might represent the same mathematical quantity, use
arb_overlaps()
orarb_contains()
, depending on the circumstance.
-
int
arb_is_nonpositive
(const arb_t x)¶ Returns nonzero iff all points p in the interval represented by x satisfy, respectively, \(p > 0\), \(p \ge 0\), \(p < 0\), \(p \le 0\). If x contains NaN, returns zero.
-
int
arb_overlaps
(const arb_t x, const arb_t y)¶ Returns nonzero iff x and y have some point in common. If either x or y contains NaN, this function always returns nonzero (as a NaN could be anything, it could in particular contain any number that is included in the other operand).
-
int
arb_contains
(const arb_t x, const arb_t y)¶ Returns nonzero iff the given number (or ball) y is contained in the interval represented by x.
If x is contains NaN, this function always returns nonzero (as it could represent anything, and in particular could represent all the points included in y). If y contains NaN and x does not, it always returns zero.
-
int
arb_contains_int
(const arb_t x)¶ Returns nonzero iff the interval represented by x contains an integer.
-
int
arb_contains_nonnegative
(const arb_t x)¶ Returns nonzero iff there is any point p in the interval represented by x satisfying, respectively, \(p = 0\), \(p < 0\), \(p \le 0\), \(p > 0\), \(p \ge 0\). If x contains NaN, returns nonzero.
-
int
arb_ge
(const arb_t x, const arb_t y)¶ Respectively performs the comparison \(x = y\), \(x \ne y\), \(x < y\), \(x \le y\), \(x > y\), \(x \ge y\) in a mathematically meaningful way. If the comparison \(t \, (\operatorname{op}) \, u\) holds for all \(t \in x\) and all \(u \in y\), returns 1. Otherwise, returns 0.
The balls x and y are viewed as subintervals of the extended real line. Note that balls that are formally different can compare as equal under this definition: for example, \([-\infty \pm 3] = [-\infty \pm 0]\). Also \([-\infty] \le [\infty \pm \infty]\).
The output is always 0 if either input has NaN as midpoint.
Arithmetic¶
-
void
arb_abs
(arb_t y, const arb_t x)¶ Sets y to the absolute value of x. No attempt is made to improve the interval represented by x if it contains zero.
-
void
arb_sgn
(arb_t y, const arb_t x)¶ Sets y to the sign function of x. The result is \([0 \pm 1]\) if x contains both zero and nonzero numbers.
-
void
arb_max
(arb_t z, const arb_t x, const arb_t y, slong prec)¶ Sets z respectively to the minimum and the maximum of x and y.
-
void
arb_add_fmpz
(arb_t z, const arb_t x, const fmpz_t y, slong prec)¶ Sets \(z = x + y\), rounded to prec bits. The precision can be ARF_PREC_EXACT provided that the result fits in memory.
-
void
arb_add_fmpz_2exp
(arb_t z, const arb_t x, const fmpz_t m, const fmpz_t e, slong prec)¶ Sets \(z = x + m \cdot 2^e\), rounded to prec bits. The precision can be ARF_PREC_EXACT provided that the result fits in memory.
-
void
arb_sub_fmpz
(arb_t z, const arb_t x, const fmpz_t y, slong prec)¶ Sets \(z = x - y\), rounded to prec bits. The precision can be ARF_PREC_EXACT provided that the result fits in memory.
-
void
arb_mul_fmpz
(arb_t z, const arb_t x, const fmpz_t y, slong prec)¶ Sets \(z = x \cdot y\), rounded to prec bits. The precision can be ARF_PREC_EXACT provided that the result fits in memory.
-
void
arb_addmul_fmpz
(arb_t z, const arb_t x, const fmpz_t y, slong prec)¶ Sets \(z = z + x \cdot y\), rounded to prec bits. The precision can be ARF_PREC_EXACT provided that the result fits in memory.
-
void
arb_submul_fmpz
(arb_t z, const arb_t x, const fmpz_t y, slong prec)¶ Sets \(z = z - x \cdot y\), rounded to prec bits. The precision can be ARF_PREC_EXACT provided that the result fits in memory.
-
void
arb_ui_div
(arb_t z, ulong x, const arb_t y, slong prec)¶ Sets \(z = x / y\), rounded to prec bits. If y contains zero, z is set to \(0 \pm \infty\). Otherwise, error propagation uses the rule
\[\left| \frac{x}{y} - \frac{x+\xi_1 a}{y+\xi_2 b} \right| = \left|\frac{x \xi_2 b - y \xi_1 a}{y (y+\xi_2 b)}\right| \le \frac{|xb|+|ya|}{|y| (|y|-b)}\]where \(-1 \le \xi_1, \xi_2 \le 1\), and where the triangle inequality has been applied to the numerator and the reverse triangle inequality has been applied to the denominator.
Powers and roots¶
-
void
arb_sqrt_ui
(arb_t z, ulong x, slong prec)¶ Sets z to the square root of x, rounded to prec bits.
If \(x = m \pm x\) where \(m \ge r \ge 0\), the propagated error is bounded by \(\sqrt{m} - \sqrt{m-r} = \sqrt{m} (1 - \sqrt{1 - r/m}) \le \sqrt{m} (r/m + (r/m)^2)/2\).
-
void
arb_sqrtpos
(arb_t z, const arb_t x, slong prec)¶ Sets z to the square root of x, assuming that x represents a nonnegative number (i.e. discarding any negative numbers in the input interval).
-
void
arb_rsqrt_ui
(arb_t z, ulong x, slong prec)¶ Sets z to the reciprocal square root of x, rounded to prec bits. At high precision, this is faster than computing a square root.
-
void
arb_sqrt1pm1
(arb_t z, const arb_t x, slong prec)¶ Sets \(z = \sqrt{1+x}-1\), computed accurately when \(x \approx 0\).
-
void
arb_root_ui
(arb_t z, const arb_t x, ulong k, slong prec)¶ Sets z to the k-th root of x, rounded to prec bits. This function selects between different algorithms. For large k, it evaluates \(\exp(\log(x)/k)\). For small k, it uses
arf_root()
at the midpoint and computes a propagated error bound as follows: if input interval is \([m-r, m+r]\) with \(r \le m\), the error is largest at \(m-r\) where it satisfies\[ \begin{align}\begin{aligned}m^{1/k} - (m-r)^{1/k} = m^{1/k} [1 - (1-r/m)^{1/k}]\\= m^{1/k} [1 - \exp(\log(1-r/m)/k)]\\\le m^{1/k} \min(1, -\log(1-r/m)/k)\\= m^{1/k} \min(1, \log(1+r/(m-r))/k).\end{aligned}\end{align} \]This is evaluated using
mag_log1p()
.
-
void
arb_root
(arb_t z, const arb_t x, ulong k, slong prec)¶ Alias for
arb_root_ui()
, provided for backwards compatibility.
-
void
arb_si_pow_ui
(arb_t y, slong b, ulong e, slong prec)¶ Sets \(y = b^e\) using binary exponentiation (with an initial division if \(e < 0\)). Provided that b and e are small enough and the exponent is positive, the exact power can be computed by setting the precision to ARF_PREC_EXACT.
Note that these functions can get slow if the exponent is extremely large (in such cases
arb_pow()
may be superior).
-
void
arb_pow_fmpq
(arb_t y, const arb_t x, const fmpq_t a, slong prec)¶ Sets \(y = b^e\), computed as \(y = (b^{1/q})^p\) if the denominator of \(e = p/q\) is small, and generally as \(y = \exp(e \log b)\).
Note that this function can get slow if the exponent is extremely large (in such cases
arb_pow()
may be superior).
Exponentials and logarithms¶
-
void
arb_log
(arb_t z, const arb_t x, slong prec)¶ Sets \(z = \log(x)\).
At low to medium precision (up to about 4096 bits),
arb_log_arf()
uses table-based argument reduction and fast Taylor series evaluation via_arb_atan_taylor_rs()
. At high precision, it falls back to MPFR. The functionarb_log()
simply callsarb_log_arf()
with the midpoint as input, and separately adds the propagated error.
-
void
arb_log_ui_from_prev
(arb_t log_k1, ulong k1, arb_t log_k0, ulong k0, slong prec)¶ Computes \(\log(k_1)\), given \(\log(k_0)\) where \(k_0 < k_1\). At high precision, this function uses the formula \(\log(k_1) = \log(k_0) + 2 \operatorname{atanh}((k_1-k_0)/(k_1+k_0))\), evaluating the inverse hyperbolic tangent using binary splitting (for best efficiency, \(k_0\) should be large and \(k_1 - k_0\) should be small). Otherwise, it ignores \(\log(k_0)\) and evaluates the logarithm the usual way.
-
void
arb_log1p
(arb_t z, const arb_t x, slong prec)¶ Sets \(z = \log(1+x)\), computed accurately when \(x \approx 0\).
-
void
arb_log_base_ui
(arb_t res, const arb_t x, ulong b, slong prec)¶ Sets res to \(\log_b(x)\). The result is computed exactly when possible.
-
void
arb_exp
(arb_t z, const arb_t x, slong prec)¶ Sets \(z = \exp(x)\). Error propagation is done using the following rule: assuming \(x = m \pm r\), the error is largest at \(m + r\), and we have \(\exp(m+r) - \exp(m) = \exp(m) (\exp(r)-1) \le r \exp(m+r)\).
Trigonometric functions¶
-
void
arb_sin_cos
(arb_t s, arb_t c, const arb_t x, slong prec)¶ Sets \(s = \sin(x)\), \(c = \cos(x)\). Error propagation uses the rule \(|\sin(m \pm r) - \sin(m)| \le \min(r,2)\).
-
void
arb_sin_cos_pi
(arb_t s, arb_t c, const arb_t x, slong prec)¶ Sets \(s = \sin(\pi x)\), \(c = \cos(\pi x)\).
-
void
arb_cos_pi_fmpq
(arb_t c, const fmpq_t x, slong prec)¶ Sets \(s = \sin(\pi x)\), \(c = \cos(\pi x)\) where \(x\) is a rational number (whose numerator and denominator are assumed to be reduced). We first use trigonometric symmetries to reduce the argument to the octant \([0, 1/4]\). Then we either multiply by a numerical approximation of \(\pi\) and evaluate the trigonometric function the usual way, or we use algebraic methods, depending on which is estimated to be faster. Since the argument has been reduced to the first octant, the first of these two methods gives full accuracy even if the original argument is close to some root other the origin.
Inverse trigonometric functions¶
-
void
arb_atan
(arb_t z, const arb_t x, slong prec)¶ Sets \(z = \operatorname{atan}(x)\).
At low to medium precision (up to about 4096 bits),
arb_atan_arf()
uses table-based argument reduction and fast Taylor series evaluation via_arb_atan_taylor_rs()
. At high precision, it falls back to MPFR. The functionarb_atan()
simply callsarb_atan_arf()
with the midpoint as input, and separately adds the propagated error.The function
arb_atan_arf()
uses lookup tables if possible, and otherwise falls back toarb_atan_arf_bb()
.
-
void
arb_atan2
(arb_t z, const arb_t b, const arb_t a, slong prec)¶ Sets r to an the argument (phase) of the complex number \(a + bi\), with the branch cut discontinuity on \((-\infty,0]\). We define \(\operatorname{atan2}(0,0) = 0\), and for \(a < 0\), \(\operatorname{atan2}(0,a) = \pi\).
Hyperbolic functions¶
-
void
arb_sinh_cosh
(arb_t s, arb_t c, const arb_t x, slong prec)¶ Sets \(s = \sinh(x)\), \(c = \cosh(x)\). If the midpoint of \(x\) is close to zero and the hyperbolic sine is to be computed, evaluates \((e^{2x}\pm1) / (2e^x)\) via
arb_expm1()
to avoid loss of accuracy. Otherwise evaluates \((e^x \pm e^{-x}) / 2\).
-
void
arb_tanh
(arb_t y, const arb_t x, slong prec)¶ Sets \(y = \tanh(x) = \sinh(x) / \cosh(x)\), evaluated via
arb_expm1()
as \(\tanh(x) = (e^{2x} - 1) / (e^{2x} + 1)\) if \(|x|\) is small, and as \(\tanh(\pm x) = 1 - 2 e^{\mp 2x} / (1 + e^{\mp 2x})\) if \(|x|\) is large.
-
void
arb_coth
(arb_t y, const arb_t x, slong prec)¶ Sets \(y = \coth(x) = \cosh(x) / \sinh(x)\), evaluated using the same strategy as
arb_tanh()
.
Inverse hyperbolic functions¶
Constants¶
The following functions cache the computed values to speed up repeated calls at the same or lower precision. For further implementation details, see Algorithms for mathematical constants.
-
void
arb_const_euler
(arb_t z, slong prec)¶ Computes Euler’s constant \(\gamma = \lim_{k \rightarrow \infty} (H_k - \log k)\) where \(H_k = 1 + 1/2 + \ldots + 1/k\).
-
void
arb_const_catalan
(arb_t z, slong prec)¶ Computes Catalan’s constant \(C = \sum_{n=0}^{\infty} (-1)^n / (2n+1)^2\).
Gamma function and factorials¶
-
void
arb_rising
(arb_t z, const arb_t x, const arb_t n, slong prec)¶ Computes the rising factorial \(z = x (x+1) (x+2) \cdots (x+n-1)\).
The bs version uses binary splitting. The rs version uses rectangular splitting. The rec version uses either bs or rs depending on the input. The default version uses the gamma function unless n is a small integer.
The rs version takes an optional step parameter for tuning purposes (to use the default step length, pass zero).
-
void
arb_rising_fmpq_ui
(arb_t z, const fmpq_t x, ulong n, slong prec)¶ Computes the rising factorial \(z = x (x+1) (x+2) \cdots (x+n-1)\) using binary splitting. If the denominator or numerator of x is large compared to prec, it is more efficient to convert x to an approximation and use
arb_rising_ui()
.
-
void
arb_rising2_ui
(arb_t u, arb_t v, const arb_t x, ulong n, slong prec)¶ Letting \(u(x) = x (x+1) (x+2) \cdots (x+n-1)\), simultaneously compute \(u(x)\) and \(v(x) = u'(x)\), respectively using binary splitting, rectangular splitting (with optional nonzero step length step to override the default choice), and an automatic algorithm choice.
-
void
arb_fac_ui
(arb_t z, ulong n, slong prec)¶ Computes the factorial \(z = n!\) via the gamma function.
-
void
arb_doublefac_ui
(arb_t z, ulong n, slong prec)¶ Computes the double factorial \(z = n!!\) via the gamma function.
-
void
arb_bin_uiui
(arb_t z, ulong n, ulong k, slong prec)¶ Computes the binomial coefficient \(z = {n \choose k}\), via the rising factorial as \({n \choose k} = (n-k+1)_k / k!\).
-
void
arb_gamma_fmpz
(arb_t z, const fmpz_t x, slong prec)¶ Computes the gamma function \(z = \Gamma(x)\).
-
void
arb_lgamma
(arb_t z, const arb_t x, slong prec)¶ Computes the logarithmic gamma function \(z = \log \Gamma(x)\). The complex branch structure is assumed, so if \(x \le 0\), the result is an indeterminate interval.
Zeta function¶
-
void
arb_zeta_ui_vec_borwein
(arb_ptr z, ulong start, slong num, ulong step, slong prec)¶ Evaluates \(\zeta(s)\) at \(\mathrm{num}\) consecutive integers s beginning with start and proceeding in increments of step. Uses Borwein’s formula ([Bor2000], [GS2003]), implemented to support fast multi-evaluation (but also works well for a single s).
Requires \(\mathrm{start} \ge 2\). For efficiency, the largest s should be at most about as large as prec. Arguments approaching LONG_MAX will cause overflows. One should therefore only use this function for s up to about prec, and then switch to the Euler product.
The algorithm for single s is basically identical to the one used in MPFR (see [MPFR2012] for a detailed description). In particular, we evaluate the sum backwards to avoid storing more than one \(d_k\) coefficient, and use integer arithmetic throughout since it is convenient and the terms turn out to be slightly larger than \(2^\mathrm{prec}\). The only numerical error in the main loop comes from the division by \(k^s\), which adds less than 1 unit of error per term. For fast multi-evaluation, we repeatedly divide by \(k^{\mathrm{step}}\). Each division reduces the input error and adds at most 1 unit of additional rounding error, so by induction, the error per term is always smaller than 2 units.
-
void
arb_zeta_ui_euler_product
(arb_t z, ulong s, slong prec)¶ Computes \(\zeta(s)\) using the Euler product. This is fast only if s is large compared to the precision. Both methods are trivial wrappers for
_acb_dirichlet_euler_product_real_ui()
.
-
void
arb_zeta_ui_bernoulli
(arb_t x, ulong s, slong prec)¶ Computes \(\zeta(s)\) for even s via the corresponding Bernoulli number.
-
void
arb_zeta_ui_borwein_bsplit
(arb_t x, ulong s, slong prec)¶ Computes \(\zeta(s)\) for arbitrary \(s \ge 2\) using a binary splitting implementation of Borwein’s algorithm. This has quasilinear complexity with respect to the precision (assuming that \(s\) is fixed).
-
void
arb_zeta_ui_vec_odd
(arb_ptr x, ulong start, slong num, slong prec)¶ Computes \(\zeta(s)\) at num consecutive integers (respectively num even or num odd integers) beginning with \(s = \mathrm{start} \ge 2\), automatically choosing an appropriate algorithm.
-
void
arb_zeta_ui
(arb_t x, ulong s, slong prec)¶ Computes \(\zeta(s)\) for nonnegative integer \(s \ne 1\), automatically choosing an appropriate algorithm. This function is intended for numerical evaluation of isolated zeta values; for multi-evaluation, the vector versions are more efficient.
-
void
arb_zeta
(arb_t z, const arb_t s, slong prec)¶ Sets z to the value of the Riemann zeta function \(\zeta(s)\).
For computing derivatives with respect to \(s\), use
arb_poly_zeta_series()
.
Bernoulli numbers and polynomials¶
-
void
arb_bernoulli_fmpz
(arb_t b, const fmpz_t n, slong prec)¶ Sets \(b\) to the numerical value of the Bernoulli number \(B_n\) approximated to prec bits.
The internal precision is increased automatically to give an accurate result. Note that, with huge fmpz input, the output will have a huge exponent and evaluation will accordingly be slower.
A single division from the exact fraction of \(B_n\) is used if this value is in the global cache or the exact numerator roughly is larger than prec bits. Otherwise, the Riemann zeta function is used (see
arb_bernoulli_ui_zeta()
).This function reads \(B_n\) from the global cache if the number is already cached, but does not automatically extend the cache by itself.
-
void
arb_bernoulli_ui_zeta
(arb_t b, ulong n, slong prec)¶ Sets \(b\) to the numerical value of \(B_n\) accurate to prec bits, computed using the formula \(B_{2n} = (-1)^{n+1} 2 (2n)! \zeta(2n) / (2 \pi)^n\).
To avoid potential infinite recursion, we explicitly call the Euler product implementation of the zeta function. This method will only give high accuracy if the precision is small enough compared to \(n\) for the Euler product to converge rapidly.
-
void
arb_bernoulli_poly_ui
(arb_t res, ulong n, const arb_t x, slong prec)¶ Sets res to the value of the Bernoulli polynomial \(B_n(x)\).
Warning: this function is only fast if either n or x is a small integer.
This function reads Bernoulli numbers from the global cache if they are already cached, but does not automatically extend the cache by itself.
-
void
arb_power_sum_vec
(arb_ptr res, const arb_t a, const arb_t b, slong len, slong prec)¶ For n from 0 to len - 1, sets entry n in the output vector res to
\[S_n(a,b) = \frac{1}{n+1}\left(B_{n+1}(b) - B_{n+1}(a)\right)\]where \(B_n(x)\) is a Bernoulli polynomial. If a and b are integers and \(b \ge a\), this is equivalent to
\[S_n(a,b) = \sum_{k=a}^{b-1} k^n.\]The computation uses the generating function for Bernoulli polynomials.
Polylogarithms¶
Other special functions¶
-
void
arb_fib_ui
(arb_t z, ulong n, slong prec)¶ Computes the Fibonacci number \(F_n\). Uses the binary squaring algorithm described in [Tak2000]. Provided that n is small enough, an exact Fibonacci number can be computed by setting the precision to ARF_PREC_EXACT.
-
void
arb_agm
(arb_t z, const arb_t x, const arb_t y, slong prec)¶ Sets z to the arithmetic-geometric mean of x and y.
-
void
arb_chebyshev_u_ui
(arb_t a, ulong n, const arb_t x, slong prec)¶ Evaluates the Chebyshev polynomial of the first kind \(a = T_n(x)\) or the Chebyshev polynomial of the second kind \(a = U_n(x)\).
-
void
arb_chebyshev_u2_ui
(arb_t a, arb_t b, ulong n, const arb_t x, slong prec)¶ Simultaneously evaluates \(a = T_n(x), b = T_{n-1}(x)\) or \(a = U_n(x), b = U_{n-1}(x)\). Aliasing between a, b and x is not permitted.
-
void
arb_bell_sum_bsplit
(arb_t res, const fmpz_t n, const fmpz_t a, const fmpz_t b, const fmpz_t mmag, slong prec)¶
-
void
arb_bell_sum_taylor
(arb_t res, const fmpz_t n, const fmpz_t a, const fmpz_t b, const fmpz_t mmag, slong prec)¶ Helper functions for Bell numbers, evaluating the sum \(\sum_{k=a}^{b-1} k^n / k!\). If mmag is non-NULL, it may be used to indicate that the target error tolerance should be \(2^{mmag - prec}\).
-
void
arb_bell_ui
(arb_t res, ulong n, slong prec)¶ Sets res to the Bell number \(B_n\). If the number is too large to fit exactly in prec bits, a numerical approximation is computed efficiently.
The algorithm to compute Bell numbers, including error analysis, is described in detail in [Joh2015].
-
void
arb_euler_number_ui
(arb_t res, ulong n, slong prec)¶ Sets res to the Euler number \(E_n\), which is defined by having the exponential generating function \(1 / \cosh(x)\).
The Euler product for the Dirichlet beta function (
_acb_dirichlet_euler_product_real_ui()
) is used to compute a numerical approximation. If prec is more than enough to represent the result exactly, the exact value is automatically determined from a lower-precision approximation.
-
void
arb_partitions_ui
(arb_t res, ulong n, slong prec)¶ Sets res to the partition function \(p(n)\). When n is large and \(\log_2 p(n)\) is more than twice prec, the leading term in the Hardy-Ramanujan asymptotic series is used together with an error bound. Otherwise, the exact value is computed and rounded.
Internals for computing elementary functions¶
-
void
_arb_atan_taylor_naive
(mp_ptr y, mp_limb_t * error, mp_srcptr x, mp_size_t xn, ulong N, int alternating)¶
-
void
_arb_atan_taylor_rs
(mp_ptr y, mp_limb_t * error, mp_srcptr x, mp_size_t xn, ulong N, int alternating)¶ Computes an approximation of \(y = \sum_{k=0}^{N-1} x^{2k+1} / (2k+1)\) (if alternating is 0) or \(y = \sum_{k=0}^{N-1} (-1)^k x^{2k+1} / (2k+1)\) (if alternating is 1). Used internally for computing arctangents and logarithms. The naive version uses the forward recurrence, and the rs version uses a division-avoiding rectangular splitting scheme.
Requires \(N \le 255\), \(0 \le x \le 1/16\), and xn positive. The input x and output y are fixed-point numbers with xn fractional limbs. A bound for the ulp error is written to error.
-
void
_arb_exp_taylor_rs
(mp_ptr y, mp_limb_t * error, mp_srcptr x, mp_size_t xn, ulong N)¶ Computes an approximation of \(y = \sum_{k=0}^{N-1} x^k / k!\). Used internally for computing exponentials. The naive version uses the forward recurrence, and the rs version uses a division-avoiding rectangular splitting scheme.
Requires \(N \le 287\), \(0 \le x \le 1/16\), and xn positive. The input x is a fixed-point number with xn fractional limbs, and the output y is a fixed-point number with xn fractional limbs plus one extra limb for the integer part of the result.
A bound for the ulp error is written to error.
-
void
_arb_sin_cos_taylor_naive
(mp_ptr ysin, mp_ptr ycos, mp_limb_t * error, mp_srcptr x, mp_size_t xn, ulong N)¶
-
void
_arb_sin_cos_taylor_rs
(mp_ptr ysin, mp_ptr ycos, mp_limb_t * error, mp_srcptr x, mp_size_t xn, ulong N, int sinonly, int alternating)¶ Computes approximations of \(y_s = \sum_{k=0}^{N-1} (-1)^k x^{2k+1} / (2k+1)!\) and \(y_c = \sum_{k=0}^{N-1} (-1)^k x^{2k} / (2k)!\). Used internally for computing sines and cosines. The naive version uses the forward recurrence, and the rs version uses a division-avoiding rectangular splitting scheme.
Requires \(N \le 143\), \(0 \le x \le 1/16\), and xn positive. The input x and outputs ysin, ycos are fixed-point numbers with xn fractional limbs. A bound for the ulp error is written to error.
If sinonly is 1, only the sine is computed; if sinonly is 0 both the sine and cosine are computed. To compute sin and cos, alternating should be 1. If alternating is 0, the hyperbolic sine is computed (this is currently only intended to be used together with sinonly).
-
int
_arb_get_mpn_fixed_mod_log2
(mp_ptr w, fmpz_t q, mp_limb_t * error, const arf_t x, mp_size_t wn)¶ Attempts to write \(w = x - q \log(2)\) with \(0 \le w < \log(2)\), where w is a fixed-point number with wn limbs and ulp error error. Returns success.
-
int
_arb_get_mpn_fixed_mod_pi4
(mp_ptr w, fmpz_t q, int * octant, mp_limb_t * error, const arf_t x, mp_size_t wn)¶ Attempts to write \(w = |x| - q \pi/4\) with \(0 \le w < \pi/4\), where w is a fixed-point number with wn limbs and ulp error error. Returns success.
The value of q mod 8 is written to octant. The output variable q can be NULL, in which case the full value of q is not stored.
-
slong
_arb_exp_taylor_bound
(slong mag, slong prec)¶ Returns n such that \(\left|\sum_{k=n}^{\infty} x^k / k!\right| \le 2^{-\mathrm{prec}}\), assuming \(|x| \le 2^{\mathrm{mag}} \le 1/4\).
-
void
arb_exp_arf_bb
(arb_t z, const arf_t x, slong prec, int m1)¶ Computes the exponential function using the bit-burst algorithm. If m1 is nonzero, the exponential function minus one is computed accurately.
Aborts if x is extremely small or large (where another algorithm should be used).
For large x, repeated halving is used. In fact, we always do argument reduction until \(|x|\) is smaller than about \(2^{-d}\) where \(d \approx 16\) to speed up convergence. If \(|x| \approx 2^m\), we thus need about \(m+d\) squarings.
Computing \(\log(2)\) costs roughly 100-200 multiplications, so is not usually worth the effort at very high precision. However, this function could be improved by using \(\log(2)\) based reduction at precision low enough that the value can be assumed to be cached.
-
void
_arb_exp_sum_bs_simple
(fmpz_t T, fmpz_t Q, mp_bitcnt_t * Qexp, const fmpz_t x, mp_bitcnt_t r, slong N)¶
-
void
_arb_exp_sum_bs_powtab
(fmpz_t T, fmpz_t Q, mp_bitcnt_t * Qexp, const fmpz_t x, mp_bitcnt_t r, slong N)¶ Computes T, Q and Qexp such that \(T / (Q 2^{\text{Qexp}}) = \sum_{k=1}^N (x/2^r)^k/k!\) using binary splitting. Note that the sum is taken to N inclusive and omits the constant term.
The powtab version precomputes a table of powers of x, resulting in slightly higher memory usage but better speed. For best efficiency, N should have many trailing zero bits.
-
void
_arb_atan_sum_bs_simple
(fmpz_t T, fmpz_t Q, mp_bitcnt_t * Qexp, const fmpz_t x, mp_bitcnt_t r, slong N)¶
-
void
_arb_atan_sum_bs_powtab
(fmpz_t T, fmpz_t Q, mp_bitcnt_t * Qexp, const fmpz_t x, mp_bitcnt_t r, slong N)¶ Computes T, Q and Qexp such that \(T / (Q 2^{\text{Qexp}}) = \sum_{k=1}^N (-1)^k (x/2^r)^{2k} / (2k+1)\) using binary splitting. Note that the sum is taken to N inclusive, omits the linear term, and requires a final multiplication by \((x/2^r)\) to give the true series for atan.
The powtab version precomputes a table of powers of x, resulting in slightly higher memory usage but better speed. For best efficiency, N should have many trailing zero bits.
-
void
arb_atan_arf_bb
(arb_t z, const arf_t x, slong prec)¶ Computes the arctangent of x. Initially, the argument-halving formula
\[\operatorname{atan}(x) = 2 \operatorname{atan}\left(\frac{x}{1+\sqrt{1+x^2}}\right)\]is applied up to 8 times to get a small argument. Then a version of the bit-burst algorithm is used. The functional equation
\[\operatorname{atan}(x) = \operatorname{atan}(p/q) + \operatorname{atan}(w), \quad w = \frac{qx-p}{px+q}, \quad p = \lfloor qx \rfloor\]is applied repeatedly instead of integrating a differential equation for the arctangent, as this appears to be more efficient.
Vector functions¶
-
int
_arb_vec_is_zero
(arb_srcptr vec, slong len)¶ Returns nonzero iff all entries in x are zero.
-
int
_arb_vec_is_finite
(arb_srcptr x, slong len)¶ Returns nonzero iff all entries in x certainly are finite.
-
void
_arb_vec_set
(arb_ptr res, arb_srcptr vec, slong len)¶ Sets res to a copy of vec.
-
void
_arb_vec_set_round
(arb_ptr res, arb_srcptr vec, slong len, slong prec)¶ Sets res to a copy of vec, rounding each entry to prec bits.
-
void
_arb_vec_neg
(arb_ptr B, arb_srcptr A, slong n)¶
-
void
_arb_vec_sub
(arb_ptr C, arb_srcptr A, arb_srcptr B, slong n, slong prec)¶
-
void
_arb_vec_add
(arb_ptr C, arb_srcptr A, arb_srcptr B, slong n, slong prec)¶
-
void
_arb_vec_scalar_mul_2exp_si
(arb_ptr res, arb_srcptr src, slong len, slong c)¶
-
void
_arb_vec_scalar_addmul
(arb_ptr res, arb_srcptr vec, slong len, const arb_t c, slong prec)¶ Performs the respective scalar operation elementwise.
-
void
_arb_vec_dot
(arb_t res, arb_srcptr vec1, arb_srcptr vec2, slong len2, slong prec)¶ Sets res to the dot product of vec1 and vec2.
-
void
_arb_vec_get_mag
(mag_t bound, arb_srcptr vec, slong len, slong prec)¶ Sets bound to an upper bound for the entries in vec.
-
slong
_arb_vec_bits
(arb_srcptr x, slong len)¶ Returns the maximum of
arb_bits()
for all entries in vec.
-
void
_arb_vec_set_powers
(arb_ptr xs, const arb_t x, slong len, slong prec)¶ Sets xs to the powers \(1, x, x^2, \ldots, x^{len-1}\).
-
void
_arb_vec_add_error_mag_vec
(arb_ptr res, mag_srcptr err, slong len)¶ Adds the magnitude of each entry in err to the radius of the corresponding entry in res.
-
void
_arb_vec_indeterminate
(arb_ptr vec, slong len)¶ Applies
arb_indeterminate()
elementwise.
-
void
_arb_vec_trim
(arb_ptr res, arb_srcptr vec, slong len)¶ Applies
arb_trim()
elementwise.
-
int
_arb_vec_get_unique_fmpz_vec
(fmpz * res, arb_srcptr vec, slong len)¶ Calls
arb_get_unique_fmpz()
elementwise and returns nonzero if all entries can be rounded uniquely to integers. If any entry in vec cannot be rounded uniquely to an integer, returns zero.