arf.h – arbitrary-precision floating-point numbers
A variable of type arf_t
holds an arbitrary-precision binary
floating-point number, i.e. a rational number of the form
\(x \times 2^y\) where \(x, y \in \mathbb{Z}\) and \(x\) is odd;
or one of the special values zero, plus infinity, minus infinity,
or NaN (not-a-number).
The exponent of a finite and nonzero floating-point number can be
defined in different
ways: for example, as the component y above, or as the unique
integer e such that
\(x \times 2^y = m \times 2^e\) where \(1/2 \le |m| < 1\).
The internal representation of an arf_t
stores the
exponent in the latter format.
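The two exponent conventions can be compared on ordinary machine integers. The following is a minimal sketch in plain C, not code from the library: the helper names `bit_length`, `odd_form` and `top_exponent` are hypothetical, and the input is restricted to nonzero 64-bit integers.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to write v in binary (0 for v == 0). */
static int bit_length(uint64_t v)
{
    int bits = 0;
    while (v != 0) { bits++; v >>= 1; }
    return bits;
}

/* "Bottom" convention: write nonzero n as x * 2^y with x odd. */
static void odd_form(int64_t n, int64_t *x, int *y)
{
    assert(n != 0);
    *y = 0;
    while (n % 2 == 0) { n /= 2; (*y)++; }
    *x = n;
}

/* "Top" convention: the unique e with n = m * 2^e and 1/2 <= |m| < 1.
   For an integer this is the bit length of |n|. */
static int top_exponent(int64_t n)
{
    assert(n != 0);
    uint64_t a = (n < 0) ? (uint64_t)0 - (uint64_t)n : (uint64_t)n;
    return bit_length(a);
}
```

For n = 6 this gives x = 3, y = 1 in the first convention and e = 3 in the second (6 = 0.75 × 2^3). In general e equals y plus the bit length of |x|, which is why the text describes the internal exponent as pointing to the top of the mantissa.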
The conventions for special values largely follow those of the IEEE floating-point standard. At the moment, there is no support for negative zero, unsigned infinity, or a NaN with a payload, though some of these might be added in the future.
Except where otherwise noted, the output of an operation is the floating-point number obtained by taking the inputs as exact numbers, in principle carrying out the operation exactly, and rounding the resulting real number to the nearest representable floating-point number whose mantissa has at most the specified number of bits, in the specified direction of rounding. Some operations are always or optionally done exactly.
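To make "round the exact result to at most prec bits in a given direction" concrete, here is a toy model in plain C that rounds a 64-bit integer to a given number of significant bits. It is only a sketch under stated assumptions: the enum `rnd_mode` and the function `round_to_bits` are hypothetical stand-ins, not the library's types or implementation, and the input must satisfy |v| < 2^62.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the five rounding directions. */
typedef enum { RND_DOWN, RND_UP, RND_FLOOR, RND_CEIL, RND_NEAR } rnd_mode;

/* Round v to at most prec significant bits.
   Sets *inexact to 1 if the value had to change, else 0.
   Assumes prec >= 1 and |v| < 2^62 so the result fits in int64_t. */
static int64_t round_to_bits(int64_t v, int prec, rnd_mode rnd, int *inexact)
{
    assert(prec >= 1);
    int negative = (v < 0);
    uint64_t a = negative ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;
    assert(a < ((uint64_t)1 << 62));

    int bits = 0;
    for (uint64_t t = a; t != 0; t >>= 1) bits++;

    *inexact = 0;
    if (bits <= prec) return v;            /* already representable */

    int shift = bits - prec;               /* number of low bits to discard */
    uint64_t q = a >> shift;               /* the prec bits that are kept */
    uint64_t rem = a & (((uint64_t)1 << shift) - 1);
    uint64_t half = (uint64_t)1 << (shift - 1);
    int up = 0;                            /* increase the magnitude? */

    if (rem != 0) {
        *inexact = 1;
        switch (rnd) {
        case RND_DOWN:  up = 0; break;          /* towards zero */
        case RND_UP:    up = 1; break;          /* away from zero */
        case RND_FLOOR: up = negative; break;   /* towards minus infinity */
        case RND_CEIL:  up = !negative; break;  /* towards plus infinity */
        case RND_NEAR:  up = (rem > half) || (rem == half && (q & 1)); break;
        }
    }
    uint64_t r = (q + (uint64_t)up) << shift;
    return negative ? -(int64_t)r : (int64_t)r;
}
```

With prec = 2, the value 13 (binary 1101) rounds to 12 towards zero and to 16 away from zero, while the ties 10 and 14 round to 8 and 16 respectively under round-to-even.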
The arf_t
type is almost identical semantically to
the legacy fmpr_t
type, but uses a more efficient
internal representation.
The most significant differences that the user
has to be aware of are:
- The mantissa is no longer represented as a FLINT fmpz, and the internal exponent points to the top of the binary expansion of the mantissa instead of the bottom. Code designed to manipulate components of an fmpr_t directly can be ported to the arf_t type by making use of arf_get_fmpz_2exp() and arf_set_fmpz_2exp().
- Some arf_t functions return an int indicating whether a result is inexact, whereas the corresponding fmpr_t functions return an slong encoding the relative exponent of the error.
Types, macros and constants
- arf_struct
- arf_t
  An arf_struct contains four words: an fmpz exponent (exp), a size field tracking the number of limbs used (one bit of this field is also used for the sign of the number), and two more words. The last two words hold the value directly if there are at most two limbs, and otherwise contain one alloc field (tracking the total number of allocated limbs, not all of which might be used) and a pointer to the actual limbs. Thus, up to 128 bits on a 64-bit machine and 64 bits on a 32-bit machine, no space outside of the arf_struct is used.
  An arf_t is defined as an array of length one of type arf_struct, permitting an arf_t to be passed by reference.
- arf_rnd_t
  Specifies the rounding mode for the result of an approximate operation.
- ARF_RND_DOWN
  Specifies that the result of an operation should be rounded to the nearest representable number in the direction towards zero.
- ARF_RND_UP
  Specifies that the result of an operation should be rounded to the nearest representable number in the direction away from zero.
- ARF_RND_FLOOR
  Specifies that the result of an operation should be rounded to the nearest representable number in the direction towards minus infinity.
- ARF_RND_CEIL
  Specifies that the result of an operation should be rounded to the nearest representable number in the direction towards plus infinity.
- ARF_RND_NEAR
  Specifies that the result of an operation should be rounded to the nearest representable number, rounding to even if there is a tie between two values.
- ARF_PREC_EXACT
  If passed as the precision parameter to a function, indicates that no rounding is to be performed. Warning: use of this value is unsafe in general. It must only be passed as input under the following two conditions:
  - The operation in question can inherently be viewed as an exact operation in \(\mathbb{Z}[\tfrac{1}{2}]\) for all possible inputs, provided that the precision is large enough. Examples include addition, multiplication, conversion from integer types to arbitrary-precision floating-point types, and evaluation of some integer-valued functions.
  - The exact result of the operation will certainly fit in memory. Note that, for example, adding two numbers whose exponents are far apart can easily produce an exact result that is far too large to store in memory.
  The typical use case is to work with small integer values, double precision constants, and the like. It is also useful when writing test code. If in doubt, simply try with some convenient high precision instead of using this special value, and check that the result is exact.
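The memory warning for exact arithmetic can be quantified with a small calculation. The sketch below is plain C with a hypothetical helper `exact_sum_bits`; it counts the mantissa bits needed to hold 2^e1 + 2^e2 exactly and assumes the exponent difference does not overflow a long.

```c
/* Mantissa bits needed to store 2^e1 + 2^e2 exactly.
   Equal exponents give 2^(e1+1), which needs a single bit; otherwise the
   two set bits lie gap positions apart, so gap + 1 bits are required. */
static long exact_sum_bits(long e1, long e2)
{
    long gap = (e1 > e2) ? e1 - e2 : e2 - e1;
    return (gap == 0) ? 1 : gap + 1;
}
```

For example, 1 + 2^(-1000000000) needs 1000000001 mantissa bits, roughly 125 MB, even though each operand is a single bit wide. This is the situation the second condition above rules out.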
Memory management
Special values
- int arf_is_nan(const arf_t x)
  Returns nonzero iff x is NaN. The companion predicates arf_is_zero(), arf_is_one(), arf_is_pos_inf() and arf_is_neg_inf() test for 0, 1, \(+\infty\) and \(-\infty\) respectively.
- int arf_is_normal(const arf_t x)
  Returns nonzero iff x is a finite, nonzero floating-point value, i.e. not one of the special values 0, \(+\infty\), \(-\infty\), NaN.
- int arf_is_special(const arf_t x)
  Returns nonzero iff x is one of the special values 0, \(+\infty\), \(-\infty\), NaN, i.e. not a finite, nonzero floating-point value.
- int arf_is_finite(arf_t x)
  Returns nonzero iff x is a finite floating-point value, i.e. not one of the values \(+\infty\), \(-\infty\), NaN. (Note that this is not equivalent to the negation of arf_is_inf().)
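The relationships between these predicates can be checked on a toy five-way classification. This is a sketch in plain C, not the library's representation: the enum `kind` and the `toy_` functions are hypothetical and only mirror the definitions given above.

```c
/* Hypothetical five-way classification of a value; not the arf_t layout. */
typedef enum { K_ZERO, K_NORMAL, K_POS_INF, K_NEG_INF, K_NAN } kind;

static int toy_is_inf(kind k)     { return k == K_POS_INF || k == K_NEG_INF; }
static int toy_is_finite(kind k)  { return k == K_ZERO || k == K_NORMAL; }
static int toy_is_normal(kind k)  { return k == K_NORMAL; }
static int toy_is_special(kind k) { return !toy_is_normal(k); }
```

Zero is both finite and special, and NaN is neither finite nor infinite, which is why testing for finiteness is not the same as negating the infinity test.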
Assignment, rounding and conversions
- int arf_set_round_fmpz(arf_t y, const fmpz_t x, slong prec, arf_rnd_t rnd)
  Sets y to x, rounded to prec bits in the direction specified by rnd.
- int arf_set_round_fmpz_2exp(arf_t y, const fmpz_t x, const fmpz_t e, slong prec, arf_rnd_t rnd)
  Sets y to \(x \times 2^e\), rounded to prec bits in the direction specified by rnd.
- void arf_get_fmpz_2exp(fmpz_t m, fmpz_t e, const arf_t x)
  Sets m and e to the unique integers such that \(x = m \times 2^e\) and m is odd, provided that x is a nonzero finite fraction. If x is zero, both m and e are set to zero. If x is infinite or NaN, the result is undefined.
- void arf_frexp(arf_t m, fmpz_t e, const arf_t x)
  Writes x as \(m \times 2^e\), where \(1/2 \le |m| < 1\) if x is a normal value. If x is a special value, copies this to m and sets e to zero. Note: for the inverse operation (ldexp), use arf_mul_2exp_fmpz().
- double arf_get_d(const arf_t x, arf_rnd_t rnd)
  Returns x rounded to a double in the direction specified by rnd. This method rounds correctly when overflowing or underflowing the double exponent range (this was not the case in an earlier version).
- int arf_get_mpfr(mpfr_t y, const arf_t x, mpfr_rnd_t rnd)
  Sets the MPFR variable y to the value of x. If the precision of y is too small to represent x exactly, the value is rounded in the specified MPFR rounding mode. The return value (-1, 0 or 1) indicates the direction of rounding, following the convention of the MPFR library.
- int arf_get_fmpz(fmpz_t z, const arf_t x, arf_rnd_t rnd)
  Sets z to x rounded to the nearest integer in the direction specified by rnd. If rnd is ARF_RND_NEAR, rounds to the nearest even integer in case of a tie. Returns nonzero iff the result is inexact, i.e. iff x is not an integer.
  This method aborts if x is infinite or NaN, or if the exponent of x is so large that allocating memory for the result fails.
  Warning: this method will allocate a huge amount of memory to store the result if the exponent of x is huge. Memory allocation could succeed even if the required space is far larger than the physical memory available on the machine, resulting in swapping. It is recommended to check that x is within a reasonable range before calling this method.
- slong arf_get_si(const arf_t x, arf_rnd_t rnd)
  Returns x rounded to the nearest integer in the direction specified by rnd. If rnd is ARF_RND_NEAR, rounds to the nearest even integer in case of a tie. Aborts if x is infinite, NaN, or the value is too large to fit in a slong.
- int arf_get_fmpz_fixed_si(fmpz_t y, const arf_t x, slong e)
  Converts x to a mantissa with predetermined exponent, i.e. computes an integer y such that \(y \times 2^e \approx x\), truncating if necessary. Returns 0 if exact and 1 if truncation occurred.
  The warnings for arf_get_fmpz() apply.
- void arf_ceil(arf_t y, const arf_t x)
  Sets y to \(\lceil x \rceil\). (The companion function arf_floor() sets y to \(\lfloor x \rfloor\).) The result is always represented exactly, requiring no more bits to store than the input. To round the result to a floating-point number with a lower precision, call arf_set_round() afterwards.
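The fixed-exponent conversion described for arf_get_fmpz_fixed_si() can be modeled on machine integers. The following plain C sketch uses a hypothetical helper `to_fixed` and makes two assumptions that the text does not spell out: "truncating" is interpreted as rounding toward zero, and all shifts and results fit comfortably in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Given x = m * 2^k, compute y with y * 2^e ~= x, truncating toward zero.
   Returns 0 if the conversion is exact and 1 if bits were discarded.
   Assumes shifts below 62 bits and a result that fits in int64_t. */
static int to_fixed(int64_t *y, int64_t m, int k, int e)
{
    if (k >= e) {
        int shift = k - e;
        assert(shift < 62);
        *y = m * ((int64_t)1 << shift);   /* exact: only appends zero bits */
        return 0;
    } else {
        int shift = e - k;
        assert(shift < 62);
        int64_t d = (int64_t)1 << shift;
        *y = m / d;                       /* C division truncates toward zero */
        return (m % d) != 0;
    }
}
```

For x = 13 × 2^(-2) = 3.25, the target exponent e = 0 gives y = 3 with return value 1, while e = -3 gives y = 26 with return value 0, since 26 × 2^(-3) = 3.25 exactly.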
Comparisons and bounds
- int arf_equal_si(const arf_t x, slong y)
  Returns nonzero iff x and y are exactly equal. This function does not treat NaN specially, i.e. NaN compares as equal to itself.
- int arf_cmp(const arf_t x, const arf_t y)
  Returns negative, zero, or positive, depending on whether x is respectively smaller, equal, or greater compared to y. Comparison with NaN is undefined.
- int arf_cmpabs_2exp_si(const arf_t x, slong e)
  Compares the absolute value of x with \(2^e\). (The companion function arf_cmp_2exp_si() compares x itself with \(2^e\).)
- int arf_sgn(const arf_t x)
  Returns \(-1\), \(0\) or \(+1\) according to the sign of x. The sign of NaN is undefined.
- void arf_max(arf_t z, const arf_t a, const arf_t b)
  Sets z to the maximum of a and b. (The companion function arf_min() sets z to the minimum.)
- slong arf_bits(const arf_t x)
  Returns the number of bits needed to represent the absolute value of the mantissa of x, i.e. the minimum precision sufficient to represent x exactly. Returns 0 if x is a special value.
- int arf_is_int_2exp_si(const arf_t x, slong e)
  Returns nonzero iff x equals \(n 2^e\) for some integer n.
- void arf_abs_bound_lt_2exp_fmpz(fmpz_t b, const arf_t x)
  Sets b to the smallest integer such that \(|x| < 2^b\). If x is zero, infinity or NaN, the result is undefined.
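The last three quantities are easy to confuse, so here is a toy illustration on 64-bit integers in plain C. The helpers `mantissa_bits`, `bound_lt_2exp` and `is_int_2exp` are hypothetical analogues written for this sketch only; they follow the definitions stated above but do not call the library.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t magnitude(int64_t n)
{
    return (n < 0) ? (uint64_t)0 - (uint64_t)n : (uint64_t)n;
}

static int bit_length(uint64_t v)
{
    int bits = 0;
    while (v != 0) { bits++; v >>= 1; }
    return bits;
}

/* Analogue of arf_bits(): width of the odd mantissa, 0 for zero. */
static int mantissa_bits(int64_t n)
{
    uint64_t a = magnitude(n);
    if (a == 0) return 0;
    while ((a & 1) == 0) a >>= 1;      /* strip trailing zero bits */
    return bit_length(a);
}

/* Analogue of arf_abs_bound_lt_2exp_fmpz(): smallest b with |n| < 2^b. */
static int bound_lt_2exp(int64_t n)
{
    assert(n != 0);
    return bit_length(magnitude(n));
}

/* Analogue of arf_is_int_2exp_si(): is n = k * 2^e for some integer k? */
static int is_int_2exp(int64_t n, int e)
{
    if (n == 0 || e <= 0) return 1;    /* every integer is a multiple of 2^e for e <= 0 */
    assert(e < 63);
    return (magnitude(n) & (((uint64_t)1 << e) - 1)) == 0;
}
```

For 40 = 5 × 2^3 the mantissa width is 3 bits, the bound is b = 6 because 32 ≤ 40 < 64, and 40 is an integer multiple of 2^3 but not of 2^4. Note that a power of two such as 8 has bound 4, since the inequality |x| < 2^b is strict.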
Magnitude functions
- void arf_get_mag_lower(mag_t y, const arf_t x)
  Sets y to a lower bound for the absolute value of x.
- void mag_fast_init_set_arf(mag_t y, const arf_t x)
  Initializes y and sets it to an upper bound for x. Assumes that the exponent of y is small.
- void arf_mag_set_ulp(mag_t z, const arf_t y, slong prec)
  Sets z to the magnitude of the unit in the last place (ulp) of y at precision prec.
Shallow assignment
- void arf_init_set_mag_shallow(arf_t z, const mag_t x)
  Initializes z to a shallow copy of x. A shallow copy just involves copying struct data (no heap allocation is performed).
  The target variable z may not be cleared or modified in any way (it can only be used as constant input to functions), and may not be used after x has been cleared. Moreover, after x has been assigned shallowly to z, no modification of x is permitted as long as z is in use.
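The restrictions on shallow copies follow from the two variables sharing the same underlying storage. The sketch below demonstrates this with a generic struct in plain C; the type `big_val` and its functions are hypothetical and are not the layout of arf_struct or mag_struct.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical value type that owns a heap buffer. */
typedef struct { long *limbs; int n; } big_val;

static void big_init(big_val *v, int n)
{
    v->limbs = calloc((size_t)n, sizeof(long));
    assert(v->limbs != NULL);
    v->n = n;
}

static void big_clear(big_val *v)
{
    free(v->limbs);
    v->limbs = NULL;
    v->n = 0;
}

/* Shallow copy: duplicates the struct fields only, so z and x share limbs.
   z must never be cleared or modified, and must not outlive x. */
static void big_init_set_shallow(big_val *z, const big_val *x)
{
    *z = *x;
}
```

After the shallow copy, z.limbs and x.limbs are the same pointer. Clearing z would free memory that x still owns, and clearing or reallocating x would leave z dangling, which matches the rules stated above.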
Random number generation
- void arf_randtest(arf_t x, flint_rand_t state, slong bits, slong mag_bits)
  Generates a finite random number whose mantissa has precision at most bits and whose exponent has at most mag_bits bits. The values are distributed non-uniformly: special bit patterns are generated with high probability in order to allow the test code to exercise corner cases.
- void arf_randtest_not_zero(arf_t x, flint_rand_t state, slong bits, slong mag_bits)
  Identical to arf_randtest(), except that zero is never produced as an output.
- void arf_randtest_special(arf_t x, flint_rand_t state, slong bits, slong mag_bits)
  Identical to arf_randtest(), except that the output occasionally is set to an infinity or NaN.
Input and output
- void arf_printd(const arf_t y, slong d)
  Prints y as a decimal floating-point number, rounding to d digits. This function is currently implemented using MPFR, and does not support large exponents.
Addition and multiplication
- int arf_neg_round(arf_t y, const arf_t x, slong prec, arf_rnd_t rnd)
  Sets \(y = -x\), rounded to prec bits in the direction specified by rnd, returning nonzero iff the operation is inexact.
- int arf_mul_fmpz(arf_t z, const arf_t x, const fmpz_t y, slong prec, arf_rnd_t rnd)
  Sets \(z = x \times y\), rounded to prec bits in the direction specified by rnd, returning nonzero iff the operation is inexact.
- int arf_add_fmpz(arf_t z, const arf_t x, const fmpz_t y, slong prec, arf_rnd_t rnd)
  Sets \(z = x + y\), rounded to prec bits in the direction specified by rnd, returning nonzero iff the operation is inexact.
- int arf_add_fmpz_2exp(arf_t z, const arf_t x, const fmpz_t y, const fmpz_t e, slong prec, arf_rnd_t rnd)
  Sets \(z = x + y 2^e\), rounded to prec bits in the direction specified by rnd, returning nonzero iff the operation is inexact.
- int arf_sub_fmpz(arf_t z, const arf_t x, const fmpz_t y, slong prec, arf_rnd_t rnd)
  Sets \(z = x - y\), rounded to prec bits in the direction specified by rnd, returning nonzero iff the operation is inexact.
Summation
- int arf_sum(arf_t s, arf_srcptr terms, slong len, slong prec, arf_rnd_t rnd)
  Sets s to the sum of the array terms of length len, rounded to prec bits in the direction specified by rnd. The sum is computed as if done without any intermediate rounding error, with only a single rounding applied to the final result. Unlike repeated calls to arf_add() with infinite precision, this function does not overflow if the magnitudes of the terms are far apart. Warning: this function is implemented naively, and the running time is quadratic with respect to len in the worst case.
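The difference between a single final rounding and rounding after every addition can be seen with ordinary doubles. The sketch below is plain C and does not use the library; the function names are hypothetical, the terms are 64-bit integers, and the exact sum is assumed to fit in an int64_t.

```c
#include <stdint.h>

/* Left-to-right summation in double: every partial sum is rounded to 53 bits.
   The volatile accumulator forces each partial sum to be stored as a double. */
static double sum_rounded_each_step(const int64_t *terms, int len)
{
    volatile double acc = 0.0;
    for (int i = 0; i < len; i++)
        acc = acc + (double)terms[i];
    return acc;
}

/* Exact integer summation followed by one conversion, i.e. a single rounding.
   Assumes every partial sum fits in int64_t. */
static double sum_rounded_once(const int64_t *terms, int len)
{
    int64_t acc = 0;
    for (int i = 0; i < len; i++)
        acc += terms[i];
    return (double)acc;
}
```

For the terms 2^60, 1, -2^60 the step-by-step version returns 0, because 2^60 + 1 is not representable in 53 bits and the 1 is lost before the cancellation. The single-rounding version returns 1, which is the behavior the description of arf_sum() promises.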
Division
Square roots
- int arf_sqrt_fmpz(arf_t z, const fmpz_t x, slong prec, arf_rnd_t rnd)
  Sets \(z = \sqrt{x}\), rounded to prec bits in the direction specified by rnd, returning nonzero iff the operation is inexact. The result is NaN if x is negative.
- int arf_rsqrt(arf_t z, const arf_t x, slong prec, arf_rnd_t rnd)
  Sets \(z = 1/\sqrt{x}\), rounded to prec bits in the direction specified by rnd, returning nonzero iff the operation is inexact. The result is NaN if x is negative, and \(+\infty\) if x is zero.
- int arf_root(arf_t z, const arf_t x, ulong k, slong prec, arf_rnd_t rnd)
  Sets \(z = x^{1/k}\), rounded to prec bits in the direction specified by rnd, returning nonzero iff the operation is inexact. The result is NaN if x is negative. Warning: this function is a wrapper around the MPFR root function. It gets slow and uses much memory for large k.
Complex arithmetic
- int arf_complex_mul(arf_t e, arf_t f, const arf_t a, const arf_t b, const arf_t c, const arf_t d, slong prec, arf_rnd_t rnd)
- int arf_complex_mul_fallback(arf_t e, arf_t f, const arf_t a, const arf_t b, const arf_t c, const arf_t d, slong prec, arf_rnd_t rnd)
  Computes the complex product \(e + fi = (a + bi)(c + di)\), rounding both \(e\) and \(f\) correctly to prec bits in the direction specified by rnd. The first bit in the return code indicates inexactness of \(e\), and the second bit indicates inexactness of \(f\).
  If any of the components a, b, c, d is zero, two real multiplications and no additions are done. This convention is used even if any other part contains an infinity or NaN, and the behavior with infinite/NaN input is defined accordingly.
  The fallback version is implemented naively, for testing purposes. No squaring optimization is implemented.
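The product formula and the two-bit return code can be sketched on machine integers. This is plain C with hypothetical helper names, not the library's implementation. It assumes that "first bit" means the least significant bit of the return code, which the text does not state explicitly, and that the integer products do not overflow.

```c
/* Exact complex product e + f i = (a + b i)(c + d i) over the integers.
   Assumes the intermediate products do not overflow a long. */
static void complex_mul_exact(long *e, long *f, long a, long b, long c, long d)
{
    *e = a * c - b * d;   /* real part */
    *f = a * d + b * c;   /* imaginary part */
}

/* Decode a two-bit inexactness code.
   Assumption: bit 0 refers to the real part e, bit 1 to the imaginary part f. */
static int real_part_inexact(int code) { return (code & 1) != 0; }
static int imag_part_inexact(int code) { return (code & 2) != 0; }
```

For (1 + 2i)(3 + 4i) the exact product is -5 + 10i. Under the stated bit assumption, a return code of 2 would mean that only the imaginary part was rounded.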
Low-level methods
- int _arf_get_integer_mpn(mp_ptr y, mp_srcptr xp, mp_size_t xn, slong exp)
  Given a floating-point number x represented by xn limbs at xp and an exponent exp, writes the integer part of x to y, returning whether the result is inexact. The correct number of limbs is written (no limbs are written if the integer part of x is zero). Assumes that xp[0] is nonzero and that the top bit of xp[xn-1] is set.
- int _arf_set_mpn_fixed(arf_t z, mp_srcptr xp, mp_size_t xn, mp_size_t fixn, int negative, slong prec, arf_rnd_t rnd)
  Sets z to the fixed-point number having xn total limbs and fixn fractional limbs, negated if negative is set, rounding z to prec bits in the direction rnd and returning whether the result is inexact. Both xn and fixn must be nonnegative and not so large that the bit shift would overflow an slong, but otherwise no assumptions are made about the input.
- int _arf_set_round_ui(arf_t z, ulong x, int sgnbit, slong prec, arf_rnd_t rnd)
  Sets z to the integer x, negated if sgnbit is 1, rounded to prec bits in the direction specified by rnd. There are no assumptions on x.
- int _arf_set_round_uiui(arf_t z, slong * fix, mp_limb_t hi, mp_limb_t lo, int sgnbit, slong prec, arf_rnd_t rnd)
  Sets the mantissa of z to the two-limb mantissa given by hi and lo, negated if sgnbit is 1, rounded to prec bits in the direction specified by rnd. Requires that not both hi and lo are zero. Writes the exponent shift to fix without writing the exponent of z directly.
- int _arf_set_round_mpn(arf_t z, slong * exp_shift, mp_srcptr x, mp_size_t xn, int sgnbit, slong prec, arf_rnd_t rnd)
  Sets the mantissa of z to the mantissa given by the xn limbs in x, negated if sgnbit is 1, rounded to prec bits in the direction specified by rnd. Returns the inexact flag. Requires that xn is positive and that the top limb of x is nonzero. If x has leading zero bits, writes the shift to exp_shift. This method does not write the exponent of z directly. Requires that x does not point to the limbs of z.