fredrikj.net / blog /

Computing special functions using numerical integration

December 12, 2021

In this post, I will discuss progress to improve large-parameter special functions in Arb by using direct numerical integration in some cases. For example, Arb can now evaluate the modified Bessel function $K_{\nu}(z)$ at $\nu = z = 10^{15}$ to 10 digits in about 0.01 seconds and to 1000 digits in 1.5 seconds; this would previously have been infeasible.

Contents

The problem

One of the most common bug reports (or requests for improvements) for both Arb and mpmath is that hypergeometric-type functions are slow or fail to convergence when the parameters are large. For example, a bug report from earlier this year points out that Arb's regularized incomplete beta function $I_x(a,b) = \frac{1}{B(a,b)} \int_0^x t^{a-1} (1-t)^{b-1}$ (a special case of the hypergeometric function ${}_2F_1(a,b,c,z)$) struggles when given not-too-unreasonable parameters $x = 4999/10000$, $a = b = 10^5$. A simple test program:

#include "arb_hypgeom.h"

int main()
{
    slong prec;
    arb_t z, a, b, res;

    arb_init(z);
    arb_init(a);
    arb_init(b);
    arb_init(res);

    for (prec = 64; ; prec *= 2)
    {
        arb_set_d(a, 1e5);
        arb_set_d(b, 1e5);
        arb_set_ui(z, 4999);
        arb_div_ui(z, z, 10000, prec);
        arb_hypgeom_beta_lower(res, a, b, z, 1, prec);
        printf("prec = %ld:  ", prec); arb_printn(res, 50, 0); printf("\n");
        if (arb_rel_accuracy_bits(res) >= 53)
            break;
    }
}

This does eventually converge, but only at 262144 bits of precision (the running time is 8 seconds):

prec = 64:  nan
prec = 128:  nan
prec = 256:  nan
prec = 512:  nan
prec = 1024:  nan
prec = 2048:  nan
prec = 4096:  nan
prec = 8192:  nan
prec = 16384:  nan
prec = 32768:  nan
prec = 65536:  nan
prec = 131072:  nan
prec = 262144:  [0.46436508135202051998898147610643724173619051317049 +/- 4.52e-51]

mpmath has the same problem (I do not know how high the precision needs to be set to force a result here):

>>> betainc(1e5, 1e5, 0, 4999/10000., regularized=True)
Traceback (most recent call last):
  ...
mpmath.libmp.libhyper.NoConvergence: Hypergeometric series converges too slowly. Try increasing maxterms.
Similarly, a couple of weeks ago, someone also asked me about the confluent hypergeometric function ${}_1F_1(a,b,z)$ with parameters $a, b$ and argument $z$ all of order $10^5$. It is possible to get a result with Arb, but it is not very efficient.

The problem is that both Arb and mpmath mainly compute such functions using hypergeometric series $${}_pF_q(\textbf{a},\textbf{b},z) = \sum_{n=0}^{\infty} \frac{(a_1)_n \cdots (a_p)_n}{(b_1)_n \cdots (b_q)_n} \frac{z^n}{n!}$$ and the number of terms needed for convergence typically increases linearly with the size of the parameters. If there are cancellation problems, the required working precision also increases linearly. In addition, both Arb and mpmath implement some precision-dependent "work limits", meaning that series evaluations typically will be halted after $O(\text{prec})$ terms (this can be suboptimal, as seen here, but avoids pathological behavior in other cases). Parameters of order $10^{2}$ are not much a problem, parameters of order $10^{5}$ lead to quite noticeable slowdown, and parameters of order $10^{10}$ will be quite a challenge. Asymptotic expansions and transformations allow the main argument $z$ to be arbitrarily large, but this only helps when the parameters are comparatively small.

Possible cures

It would be nice to have a master algorithm for the hypergeometric function ${}_pF_q(\textbf{a}, \textbf{b}, z)$ (as well as transformations and linear combinations of this function) that works efficiently for any combination of small or huge complex parameters and arguments. This is an unsolved problem; different cases are best handled by different methods which include the following:

  1. Convergent and asymptotic series expansions at singular points.
  2. Connection formulas.
  3. Limit calculations.
  4. Asymptotic expansions with respect to parameters.
  5. Exponentially-improved (uniform) asymptotic expansions.
  6. Expansions in terms of series of special functions (Laguerre polynomials, etc.)
  7. Continued fractions.
  8. Resummation.
  9. Numerical solution of differential equations.
  10. Numerical integration of integral representations.
  11. Euler-Maclaurin summation and other convergence acceleration techniques.

Arb mainly uses methods 1-3; mpmath uses methods 1-3 and in some special cases 8-11. It would be a lot of work to implement all the remaining methods, let alone figure out reasonable heuristics to choose between them automatically (as well as fine-tuning the existing methods). With some of the alternative methods, it is not even clear how to obtain reasonable error bounds which would be required for an implementation in Arb.

The prudent question to ask, then, is not "which methods are optimal?" but rather "which methods will yield the best return for a modest implementation effort?" I believe that numerical integration of integral representations is a good candidate.

Direct numerical integration is a time-honored approach to special function calculation. It is rarely the optimal approach, but it often works well enough, even with tricky parameter values, so it is at least a good fallback when other methods break down. And fortunately, Arb already has a very powerful arbitrary-precision numerical integration engine (see my blog post from 2017 and 2018 ICMS paper) combining Gauss-Legendre quadrature, adaptive subdivision and near-optimal rigorous error bounds based on complex analysis. All that needs to be done is putting this engine to good use.

Handling simple cases in Arb

The most commonly used hypergeometric functions have the following integral representations, valid with some restrictions on the parameters:

$${}_1F_1(a,b,z) = \frac{\Gamma(b)}{\Gamma(a) \Gamma(b-a)} \int_0^1 e^{zt} t^{a-1} (1-t)^{b-a-1} dt$$ $$U(a,b,z) = \frac{1}{\Gamma(a)} \int_0^{\infty} e^{-zt} t^{a-1} (1+t)^{b-a-1} dt$$ $${}_2F_1(a,b,c,z) = \frac{\Gamma(a)}{\Gamma(b) \Gamma(c-b)} \int_0^1 t^{b-1} (1-t)^{c-b-1} (1-zt)^{-a} dt$$

In a recent series of commits, I have implemented these formulas in Arb as fallbacks for various special functions, including ${}_1F_1$, $U$ and ${}_2F_1$ themselves as well as derived functions like the incomplete gamma and beta functions, exponential integrals, orthogonal polynomials and modified Bessel functions. Integeration is literally used as a fallback: the code first tries the usual series expansions and then attempts numerical integration if those methods do not converge to reasonable accuracy. (It does not make much sense to try numerical integration first since it may be 100 times slower than series expansions at the same level of precision.)

I have so far only implemented numerical integration for real parameters and argument, and with some restrictions on the parameters. I always like when I can do something general that works in the complex domain, but the real case is tricky enough that I think it makes sense to try to get it right before generalizing.

The code currently assumes that the argument $z$ is real (subject to $z > 0$ for $U$ and $z < 1$ for ${}_2F_1$) and that the parameters are such that the exponents of $t^{\alpha}$ or $(1-t)^{\beta}$ satisfy $\alpha, \beta \ge 0$. The last condition could be relaxed to allow exponents $\alpha, \beta > -1$, but this requires some more implementation work to truncate or transform away the blow-up singularities (Gauss-Jacobi quadrature would also be really nice here, but Arb does not yet have an implementation). These restrictions can in some cases be bypassed by applying contiguous relations or variable transformations. A trivial example is to exchange $a$ and $b$ in ${}_2F_1$; the code does this particular transformation automatically. The integral for $U$ needs to be truncated at a finite upper limit; the code also does this automatically (it is really easy to bound the truncation error here, and the bound does not even need to be tight since the adaptive numerical integration algorithm easily takes care of subintervals where the integrand is negligible).

Magnitude estimates and enclosures

What makes the numerical integration a bit tricky is that we specifically want to deal with large parameters, and some precautions are needed to ensure fast convergence. In essence, the integrand can vary by millions of orders of magnitude, but efficient Gauss-Legendre quadrature requires subintervals and surrounding complex rectangles (for error bounds) where the logarithmic variation is of the same order of magnitude as the precision, so it is crucial to compute accurate tolerances and bounds.

All the integrands here can be written as $\exp(g(t))$ where $g(t)$ is a "well-scaled" function. For example, $$g(t) = z t + (a-1) \log(t) + (b-a-1) \log(1-t)$$ in the ${}_1F_1$ integral, and $$g'(t) = t + \frac{a-1}{t} - \frac{b-a-1}{1-t}$$ is a simple rational function of $t$. If $x$ is the location of the maximum of $g$, then $e^{g(x)}$ is a good order-of-magnitude estimate of the integral, so $e^{g(x)} / 2^{p}$ is a good error tolerance for the numerical integration at $p$-bit precision. Finding the maximum is easy: we just need to check the endpoints of the integration interval and the solutions of the quadratic equation $g'(t) = 0$ (if they are inside the integration interval), and it is enough to do these calculations with approximate arithmetic.

For error bounds, recall that the integration algorithms in Arb as follows: given a subinterval $[a,b]$, check that the integrand is holomorphic inside a surrounding ellipse $E$ and bound its magnitude on this ellipse. Since Arb uses rectangular intervals, the integrand will be called with a surrounding rectangle $R$ as input:

I found that direct evaluation in ball arithmetic on $R$ does not work well here, even when using a first-order or second-order Taylor bound $g(m) + g'(R) (R-m)$ or $g(m) + g'(m) (R-m) + \tfrac{1}{2} g''(m) (R-m)^2$. I am not sure whether this is because a short Taylor expansion does not properly mitigate the dependency problem here or because of the fixed 30-bit radius precision of an arb_t (or something else). In any case, I found a simple solution: I evaluate the absolute value of the integrand on the edge $C$ using first-order Taylor bounds (half the boundary of $R$ is sufficient by assumption that the parameters are real so that we have conjugate symmetry). Writing $t = u + vi$, we have $$\log |\exp(g(t))| = \operatorname{Re}(g(t)) = h(u,v) = z u + \tfrac{1}{2} \left((a-1) \log(u^2+v^2) + (b-a-1) \log((u-1)^2+v^2)\right)$$ and $$\frac{d}{du} h(u,v) = z + \frac{u (a-1)}{u^2+v^2} + \frac{(u-1)(b-a-1)}{v^2+(1-u)^2},$$ $$\frac{d}{dv} h(u,v) = v \left[\frac{a-1}{u^2+v^2} + \frac{b-a-1}{v^2+(1-u)^2} \right].$$ which can be used on the horizontal and vertical segments respectively. I simply implemented these formulas using machine-precision interval arithmetic with a few subdivisions and it seems to do the job. We could actually find the exact maxima here instead of doing "brute force" interval arithmetic; it requires finding the roots of degree-4 polynomials. Arb's polynomial root-finding is not well optimized for finding real roots of low-degree polynomials, so I did not attempt to implement this method.

Evaluation examples

Incomplete beta function

With the new integration code implemented, let us try the incomplete beta test program again:

prec = 64:  [0.4643650813520 +/- 5.17e-14]
prec = 128:  [0.46436508135202051998898147610644 +/- 3.87e-33]

It works perfectly: the computations are instantaneous, and the result is nearly accurate to the working precision. Here is the output to compute $I_x(a,b)$ with $x = 4999/10000$ and $a = b = 10^{10}$:

prec = 64:  [2.6979112e-176 +/- 4.05e-184]
prec = 128:  [2.69791122252793228610950163e-176 +/- 2.47e-203]

Again with $x = 4999/10000$ and $a = b = 10^{15}$:

prec = 64:  [1.06e-17371784 +/- 1.98e-17371787]
prec = 128:  [1.0612052723416047758478e-17371784 +/- 7.21e-17371807]

There is some precision loss which is to expected since the function is sensitive to perturbations of the parameters.

The evaluation will still fail at $a = b = 10^{20}$ due to the fact that some computations are done with machine-precision arithmetic meaning parameters larger than about $10^{16}$ will cause magnitude variations that are too large to track accurately; fixing this will be a future project.

Let us modify the test program to do some benchmarketing, checking the time to evaluate the function to 10, 100 or 1000 digits with the above test parameters:

#include "arb_hypgeom.h"
#include "flint/profiler.h"

int main()
{
    slong prec, digits;
    double aa;
    arb_t z, a, b, res;

    arb_init(z);
    arb_init(a);
    arb_init(b);
    arb_init(res);

    for (aa = 1e5; aa <= 1e15; aa *= 1e5)
    {
        for (digits = 10; digits <= 1000; digits *= 10)
        {
            prec = digits * 3.333 + 2 * log(aa);   /* heuristic extra precision */

            arb_set_d(a, aa);
            arb_set_d(b, aa);
            arb_set_ui(z, 4999);
            arb_div_ui(z, z, 10000, prec);

            TIMEIT_START
            arb_hypgeom_beta_lower(res, a, b, z, 1, prec);
            TIMEIT_STOP
            TIMEIT_START
            arb_hypgeom_beta_lower(res, a, b, z, 1, prec);
            TIMEIT_STOP
            printf("digits = %ld:  ", digits); arb_printn(res, digits, ARB_STR_CONDENSE * 20); printf("\n");
        }

        printf("\n");
    }
}

I run the timing twice because at 1000 digits, the first call may be dominated by computing Gauss-Legendre nodes for the numerical integration. These nodes will be cached so that it will be fast to do repeated evaluations with similar parameters at the same level of precision.

This is the output:

cpu/wall(s): 0.000965 0.000966
cpu/wall(s): 0.000967 0.000967
digits = 10:  [0.4643650814 +/- 5.56e-11]
cpu/wall(s): 0.00282 0.00282
cpu/wall(s): 0.00285 0.00285
digits = 100:  [0.46436508135202051998{...60 digits...}18093974247328441056 +/- 4.20e-101]
cpu/wall(s): 3.217 3.217
cpu/wall(s): 0.687 0.687
digits = 1000:  [0.46436508135202051998{...960 digits...}80329562285284616944 +/- 1.60e-1001]

cpu/wall(s): 0.00335 0.00335
cpu/wall(s): 0.00334 0.00334
digits = 10:  [2.697911223e-176 +/- 4.73e-186]
cpu/wall(s): 0.00598 0.00598
cpu/wall(s): 0.006 0.006
digits = 100:  [2.69791122252793228610{...59 digits...}64310431721946381164e-176 +/- 3.79e-276]
cpu/wall(s): 6.393 6.394
cpu/wall(s): 0.869 0.868
digits = 1000:  [2.69791122252793228610{...959 digits...}45295504215933903150e-176 +/- 1.47e-1176]

cpu/wall(s): 0.00676 0.00676
cpu/wall(s): 0.00674 0.00674
digits = 10:  [1.061205272e-17371784 +/- 3.42e-17371794]
cpu/wall(s): 0.00952 0.00951
cpu/wall(s): 0.00953 0.00953
digits = 100:  [1.06120527234160477584{...59 digits...}41648509710952917020e-17371784 +/- 4.05e-17371884]
cpu/wall(s): 1.436 1.436
cpu/wall(s): 0.593 0.592
digits = 1000:  [1.06120527234160477584{...959 digits...}75550013805956125590e-17371784 +/- 4.76e-17372784]

It takes a few milliseconds to get 10 digits, a few more milliseconds to get 100 digits, and roughly a second (plus a few seconds of precomputation) to get 1000 digits. Surely not as fast as a specialized algorithm, but should be usable.

Modified Bessel function

To test another function, here is the analogous benchmark for the modified Bessel function $K_{\nu}(z)$ with $\nu = z = 10^5$, $\nu = z = 10^{10}$ and $\nu = z = 10^{15}$ (which internally will be computed via the $U$-integral):

#include "arb_hypgeom.h"
#include "flint/profiler.h"

int main()
{
    slong prec, digits;
    double aa;
    arb_t nu, z, res;

    arb_init(nu);
    arb_init(z);
    arb_init(res);

    for (aa = 1e5; aa <= 1e15; aa *= 1e5)
    {
        for (digits = 10; digits <= 1000; digits *= 10)
        {
            prec = digits * 3.333 + 2 * log(aa);

            arb_set_d(nu, aa);
            arb_set_d(z, aa);

            TIMEIT_START
            arb_hypgeom_bessel_k(res, nu, z, prec);
            TIMEIT_STOP
            TIMEIT_START
            arb_hypgeom_bessel_k(res, nu, z, prec);
            TIMEIT_STOP
            printf("digits = %ld:  ", digits); arb_printn(res, digits, ARB_STR_CONDENSE * 20); printf("\n");
        }

        printf("\n");
    }
}

Output:

cpu/wall(s): 0.00176 0.00176
cpu/wall(s): 0.00175 0.00175
digits = 10:  [3.773106158e-23144 +/- 5.25e-23154]
cpu/wall(s): 0.0065 0.00651
cpu/wall(s): 0.00651 0.0065
digits = 100:  [3.77310615753534768001{...59 digits...}56526989615914708462e-23144 +/- 7.13e-23245]
cpu/wall(s): 6.36 6.361
cpu/wall(s): 1.555 1.555
digits = 1000:  [3.77310615753534768001{...959 digits...}57764403313503624242e-23144 +/- 2.17e-24144]

cpu/wall(s): 0.00594 0.00595
cpu/wall(s): 0.00593 0.00594
digits = 10:  [4.871682180e-2314094616 +/- 1.70e-2314094627]
cpu/wall(s): 0.011 0.011
cpu/wall(s): 0.011 0.011
digits = 100:  [4.87168217998391362236{...59 digits...}79665114288262829528e-2314094616 +/- 5.52e-2314094717]
cpu/wall(s): 8.707 8.707
cpu/wall(s): 1.265 1.266
digits = 1000:  [4.87168217998391362236{...959 digits...}89118722232949816809e-2314094616 +/- 3.17e-2314095616]

cpu/wall(s): 0.0109 0.011
cpu/wall(s): 0.0109 0.0109
digits = 10:  [1.491544889e-231409461033520 +/- 4.31e-231409461033530]
cpu/wall(s): 0.0162 0.0162
cpu/wall(s): 0.0166 0.0166
digits = 100:  [1.49154488856962001460{...59 digits...}07313013687578758983e-231409461033520 +/- 3.44e-231409461033620]
cpu/wall(s): 1.615 1.615
cpu/wall(s): 1.436 1.435
digits = 1000:  [1.49154488856962001460{...959 digits...}87974235707595478542e-231409461033520 +/- 1.89e-231409461034520]

Extending to complex or super-huge parameters

The implementation so far just deals with easy cases; it would be nice to support the following:

I believe the first task just requires implementing the tolerance and bounds calculations with higher-precision arithmetic, though there could potentially be some more subtle issues that turn up when you consider parameters of size $10^{100}$ or $10^{1000}$. Supporting general complex parameters is much harder; to begin with, the integral representations used above are not even valid for all parameter values, so it is necessary to work with different representations (typically complex contour integrals). Furthermore, with large complex parameters it is absolutely necessary to deform the integration contour to follow paths of steepest descent.

For example, Arb already uses numerical integration to compute Stieltjes constants $\gamma_n(a)$ via the representation $$\gamma_n(a) = -\frac{\pi}{2(n+1)} \int_{-\infty}^{\infty} \frac{\log^{n+1}(a-\tfrac12 + ix)}{\cosh^2(\pi x)}\,dx, \quad \operatorname{Re}(a) > \tfrac{1}{2}$$ as discussed in my joint paper with Iaroslav Blagouchine. For large $n$, the integrand starts to oscillate along the real line, leading to catastrophic cancellation. To avoid this, we deform the path to pass through a saddle point in the upper plane. The following plot shows the integrand for $n = 500$; on the left, following the real line; on the right, passing through the saddle point.

Here, we have 5 digits of cancellation on the real line, this gets exponentially worse with larger $n$. With the deformed path, there is no cancellation whatsoever; we can easily evaluate $\gamma_n(1)$ at $n = 10^{100}$, for example.

Finding good integration contours to compute special functions is a classical problem in numerical analysis; there are a lot of papers on this. For integrands depending on several parameters, it seems to be difficult to design robust algorithms; many of the algorithms in the literature are really just heuristic recipes that work until they don't. For example, the paper Numerical calculation of Bessel, Hankel and Airy functions (2011) by Jentschura and Lötstedt discusses paths for computing Bessel functions with large parameters; their algorithms work well enough on some examples, but when I played with it a few years ago I found cases where they seemed to be wrong (emphasis because it is possible that I just messed up somewhere). Quoting from an email I sent to the authors in 2017:

Take, for example, nu = 200+100i, z = 50-20i.

Then t_plus = 2.12 + 0.86i and t_minus = -2.12 - 0.86i.

In this case, since Im(t_minus) < 0, Case 2 applies, where you claim
that one has to pass through both saddle points and obtain the
contribution from both to compute J_nu(z).

However, in this case

|exp(g(t_plus))| = 6.47e+60
|exp(g(t_minus))| = 1.55e-61

and J_nu(z) = 1.33e-63 + 3.89e-63i.

So one should clearly not be passing through t_plus.

I wonder if in this case t_plus should actually be changed t_plus +
2pi i in the path for J_nu(z)? You write that the imaginary parts of
t_plus and t_minus should be in the range [-pi,pi] (Smink, de Hon, and
Tijhuis make the same claim) but perhaps this is not always correct.
Note that the Hankel functions are both large in this case so the
problem might just affect J_nu(z) (I have not attempted to check this
carefully).

I did not get a reply, and also have not had time to look at this again. Figuring out something reliable just for Bessel functions would be really nice since these are widely-used functions, and this could perhaps be a starting point for systematic algorithms covering more general functions.



fredrikj.net  |  Blog index  |  RSS feed  |  Follow me on Twitter  |  Become a sponsor