# Fast combinatorial and number-theoretic functions with FLINT 2

September 5, 2010

Time for a development update! Recently, I’ve done only a limited amount of work on mpmath (I have a some almost-finished Cython code for sage.libs.mpmath and new code for numerical integration in mpmath, both to be committed fairly soon — within a couple of weeks, hopefully).

The last few weeks, I’ve mostly been contributing to FLINT 2. For those unfamiliar with it, FLINT is a fast C library for computational number theory developed by Bill Hart and others (the other active developers right now are Sebastian Pancratz and Andy Novocin). In particular, FLINT implements ridiculously fast multiprecision integer vectors and polynomials. It also provides very fast primality testing and factorization for word-size integers (32 or 64 bits), among other things. FLINT 2 is an in-progress rewrite of FLINT 1.x, a current standard component in Sage.

What does this have to do with numerical evaluation of special functions (the usual theme of this blog)? In short, my goal is to add code to FLINT 2 for exact special function computations — combinatorial and number-theoretic functions, special polynomials and the like. Such functions benefit tremendously from the fast integer and polynomial arithmetic available in FLINT 2.

All my code can be found in my public GitHub repository (the most recent commits as of this writing are in the ‘factor’ branch).

Functions I’ve implemented so far include:

• Möbius μ and Euler φ (totient) functions for word-size and arbitrary-size integers
• Divisor sum function σk for arbitrary-size integers
• Ramanujan τ function (Δ-function q-expansion)
• Harmonic numbers 1 + 1/2 + 1/3 + … + 1/n
• Primorials 2 · 3 · 5 · … · pn
• Stirling numbers (1st and 2nd kind)

The versions in FLINT 2 of these functions should now be faster than all other implementations I’ve tried (GAP, Pari, Mathematica, the Sage library) for all ranges of arguments, except for those requiring factorization of large integers.

Some of these functions depend fundamentally on the ability to factorize integers efficiently. So far I’ve only implemented trial division for large integers in FLINT 2, with some clever code to extract large powers of small factors quickly. Sufficiently small cofactors are handled by calling Bill Hart’s single-word factoring routines. The resulting code is very fast for “artificial” numbers like factorials, and will eventually be complemented with prime and perfect power detection code, plus fast implementations of Brent’s algorithm and other methods. Later on the quadratic sieve from FLINT 1 will probably be ported to FLINT 2, so that FLINT 2 will be able to factor any reasonable number reasonably quickly.

Below, I’ve posted some benchmark results. A word of caution: all Mathematica timings were done on a different system, which is faster than my own laptop (typically by 30% or so). So in reality, Mathematica performs slightly worse relatively than indicated below. Everything else is timed on my laptop. I have not included test code for the FLINT2 functions (but it’s just straightforward C code — a function call or two between timeit_start and timeit_stop using FLINT 2′s profiler module).

Möbius function (the following is basically a raw exercise of the small-integer factoring code):

Pari:
sage: %time pari('sum(n=1,10^6,moebius(n))');
CPU times: user 1.04 s, sys: 0.00 s, total: 1.04 s
Wall time: 1.04 s

Mathematica:
In[1]:= Timing[Sum[MoebiusMu[n], {n,1,10^6}];]
Out[1]= {0.71, Null}

flint2:650 ms


Divisor sum:

Sage (uses Cython code):
sage: %time sigma(factorial(1000),1000);
CPU times: user 0.47 s, sys: 0.00 s, total: 0.47 s
Wall time: 0.46 s

Mathematica:
In[1]:= Timing[DivisorSigma[1000,1000!];]Out[1]= {3.01, Null}

flint2:
350 ms


Ramanujan τ function:

Sage (uses FLINT 1):
sage: %time delta_qexp(100000);
CPU times: user 0.42 s, sys: 0.01 s, total: 0.43 s
Wall time: 0.42 s
sage: %time delta_qexp(1000000);
CPU times: user 6.02 s, sys: 0.37 s, total: 6.39 s
Wall time: 6.40 s

flint2:
100000: 230 ms
1000000: 4500 ms


An isolated value (Mathematica seems to be the only other software that knows how to compute this):

Mathematica:
In[1]:= Timing[RamanujanTau[10000!];]
Out[1]= {8.74, Null}

flint2:
280 ms


Harmonic numbers (again, only Mathematica seems to implement these). See also my old blog post How (not) to compute harmonic numbers. I’ve included the fastest version from there, harmonic5:

Mathematica:
In[1]:= Timing[HarmonicNumber[100000];]
Out[1]= {0.22, Null}
In[2]:= Timing[HarmonicNumber[1000000];]
Out[2]= {6.25, Null}
In[3]:= Timing[HarmonicNumber[10000000];]
Out[3]= {129.13, Null}

harmonic5:
100000: 0.471 s
1000000: 8.259 s
10000000: 143.639 s

flint2:
100000: 100 ms
1000000: 2560 ms
10000000: 49400 ms


The FLINT 2 function benefits from an improved algorithm that eliminates terms and reduces the size of the temporary numerators and denominators, as well as low-level optimization (the basecase summation directly uses the MPIR mpn interface).

Isolated Stirling numbers of the first kind:

Mathematica:
In[1]:= Timing[StirlingS1[1000,500];]
Out[1]= {0.24, Null}
In[2]:= Timing[StirlingS1[2000,1000];]
Out[2]= {1.79, Null}
In[3]:= Timing[StirlingS1[3000,1500];]
Out[3]= {5.13, Null}

flint 2:
100,500: 100 ms
2000,1000: 740 ms
3000,1500: 1520 ms


Isolated Stirling numbers of the second kind:

Mathematica:
In[1]:= Timing[StirlingS2[1000,500];]
Out[1]= {0.21, Null}
In[2]:= Timing[StirlingS2[2000,1000];]
Out[2]= {1.54, Null}
In[3]:= Timing[StirlingS2[3000,1500];]
Out[3]= {4.55, Null}
In[4]:= Timing[StirlingS2[5000,2500];]
Out[4]= {29.25, Null}

flint2:
1000,500: 2 ms
2000,1000: 17 ms
3000,1500: 50 ms
5000,2500: 240 ms


In addition, fast functions are provided for computing a whole row or matrix of Stirling numbers. For example, computing the triangular matrix of ~1.1 million Stirling numbers of the first kind up to S(1500,1500) takes only 1.3 seconds. In Mathematica (again, on the faster system):

In[1]:= Timing[Table[StirlingS1[n,k], {n,0,1500}, {k,0,n}];]
Out[1]= {2.13, Null}


The benchmarks above mostly demonstrate performance for large inputs. Another nice aspect of the FLINT 2 functions is that there typically is very little overhead for small inputs. The high performance is due to a combination of algorithms, low-level optimization, and (most importantly) the fast underlying arithmetic in FLINT 2. I will perhaps write some more about the algorithms (for e.g. Stirling numbers) in a later post.