# Short addition sequences for theta functions

*August 26, 2016*

The preprint *Short addition sequences for theta functions* (arXiv:1608.06810) by Andreas Enge, Bill Hart and myself is now available.
We prove some nice new theorems about integer sequences
of a very simple kind, namely successive
values of quadratic polynomials such as the ordinary squares ($f(k) =
k^2$).
The purpose is actually to use the structure
present in these sequences to speed up numerical evaluation of classical
modular forms and functions such as the Dedekind eta function
$\eta(\tau)$
and the *j*-invariant $j(\tau)$.
The latest versions of
Arb
and CM
implement the methods in the paper, resulting in
a 4x speedup in practice over the classical method for evaluating these
functions.

## The eta function

Consider the series expansion of the eta function:
$$\eta(\tau) = q^{1/24} \sum_{k=-\infty}^{\infty} (-1)^k q^{k(3k-1)/2} =
q^{1/24} \left[ 1 - q - q^2 + q^5 + q^7 - q^{12} - q^{15} + \ldots
\right]$$
The exponents $0, 1, 2, 5, 7, 12, 15, 22, 26, 35, 40, \ldots$ are the
*generalized pentagonal numbers*, of Euler fame.
Our goal is to compute the occurring powers of $q$ using as few complex
multiplications
as possible ($q = e^{2\pi i \tau}$ is a given complex number).
Equivalently, we want to generate the occurring exponents using as few
additions
as possible: if $c = a + b$, then we get $q^c$ by computing $q^a \cdot
q^b$.
With different terminology, we want to find a short *addition
sequence*
that includes all the generalized pentagonal numbers as a subsequence.
This is a variation of the well-known problem of finding a short
*addition chain* to compute a single power $q^k$, but
instead of considering a single exponent, we consider
a sequence of exponents together.

The classical solution for the eta series is to introduce auxiliary
sequences
of finite differences:
$$ 5 - 1 = 4, \quad 12 - 5 = 7, \quad 22 - 12 = 10, \ldots$$
$$ 7 - 2 = 5, \quad 15 - 7 = 8, \quad 26 - 15 = 11, \ldots$$
Since these sequences can be generated using one addition
per term (they have a constant step size 3), we need two additions per
term to construct
the generalized pentagonal number sequence.
In other words, we need
$2N$ multiplications to compute the first $N$ terms of the eta
function's $q$-series.
However, inspection shows that some of the generalized pentagonal
numbers
can be written as the sum of two smaller generalized pentagonal numbers
without
relying on the auxiliary sequences.
For example,
$2 = 1 + 1, 7 = 2 + 5, 12 = 5 + 7, 22 = 7 + 15$.
This doesn't work for all entries, but if it works for *most*
entries,
and if the remaining entries can be computed quickly enough, then
asymptotically we
should be able to get away with just $N + o(N)$ multiplications.
We show that this, essentially, is the case:

**Theorem**: *Assuming a standard conjecture about primes
(Bateman-Horn),
almost all generalized
pentagonal numbers $c$ can be written as $c = a+b$ where $a,b$ are
smaller
generalized pentagonal numbers; the probability that no such
representation
exists is heuristically $O(1/\log c)$.*

**Theorem**: *Every generalized pentagonal number $c \ge 2$ can
be written as
$c = 2a+b$ where $a, b$ are smaller generalized pentagonal
numbers.*

In other words, we can compute each new term $q^c$ in the eta function's $q$-series as $q^c = (q^a)^2 \cdot q^b$, where $q^a$ and $q^b$ are previously computed terms. Moreover, if we use this only as a fallback and try $q^c = q^a \cdot q^b$ first, then (assuming Bateman-Horn) computing the first $N$ terms takes only $N + O(N / \log N)$ multiplications, compared to $2N$ with the classical method using finite differences.
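To make the bookkeeping concrete, here is a small Python sketch of this strategy (my own illustration, not the paper's optimized implementation): each new power is built from previously computed ones, trying $c = a + b$ first and falling back to $c = 2a + b$, while counting the multiplications used.

```python
from itertools import islice

def gen_pentagonal():
    # generalized pentagonal numbers 0, 1, 2, 5, 7, 12, 15, ...
    a, b, c, d = 0, 1, 2, 4
    while True:
        yield a; a += c; c += 3
        yield b; b += d; d += 3

def eta_powers(q, N):
    # sketch of the a+b / 2a+b strategy: one multiplication per term
    # when c = a+b works, two when we fall back to c = 2a+b
    exps = list(islice(gen_pentagonal(), N))
    pw = {0: 1, 1: q}
    mults = 0
    for c in exps[2:]:
        prev = [a for a in pw if 0 < a < c]
        pair = next(((a, c - a) for a in prev if c - a in pw), None)
        if pair is not None:                      # c = a + b
            a, b = pair
            pw[c] = pw[a] * pw[b]
            mults += 1
        else:                                     # fallback: c = 2a + b
            a, b = next((a, c - 2 * a) for a in prev if c - 2 * a in pw)
            pw[c] = pw[a] * pw[a] * pw[b]
            mults += 2
    return pw, mults
```

Running this with an exact integer in place of $q$ makes it easy to verify that each computed power really equals $q^c$, and that the total multiplication count stays below the classical $2N$.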

You can easily check that solutions of $c=2a+b$ exist for the first few $c$. Here is some Python code and the output for $c < 1000$:

```python
from itertools import takewhile

def pentagonal():
    a = 0; b = 1; c = 2; d = 4
    while 1:
        yield a; a += c; c += 3
        yield b; b += d; d += 3

# naive code; the search can easily be done much faster
def solutions(c):
    return [(a, b) for a in takewhile(lambda a: a < c, pentagonal())
                   for b in takewhile(lambda b: b < c, pentagonal())
                   if 2*a + b == c]

for c in takewhile(lambda c: c < 1000, pentagonal()):
    print(c, solutions(c))
```

```
0 []
1 []
2 [(1, 0)]
5 [(2, 1)]
7 [(1, 5)]
12 [(5, 2)]
15 [(5, 5), (7, 1)]
22 [(5, 12)]
26 [(2, 22), (7, 12), (12, 2)]
35 [(15, 5)]
40 [(7, 26)]
51 [(22, 7)]
57 [(26, 5)]
70 [(15, 40), (22, 26), (35, 0)]
77 [(35, 7)]
92 [(26, 40), (35, 22), (40, 12)]
100 [(15, 70)]
117 [(51, 15)]
126 [(57, 12)]
145 [(70, 5)]
155 [(5, 145), (70, 15), (77, 1)]
176 [(77, 22)]
187 [(35, 117)]
210 [(70, 70), (92, 26)]
222 [(100, 22)]
247 [(51, 145)]
260 [(117, 26)]
287 [(126, 35)]
301 [(7, 287), (57, 187), (92, 117)]
330 [(35, 260), (77, 176), (145, 40)]
345 [(22, 301), (100, 145), (155, 35)]
376 [(77, 222), (100, 176), (187, 2)]
392 [(176, 40)]
425 [(40, 345), (187, 51), (210, 5)]
442 [(210, 22)]
477 [(26, 425), (145, 187), (210, 57)]
495 [(35, 425), (222, 51), (247, 1)]
532 [(70, 392), (155, 222), (260, 12)]
551 [(247, 57)]
590 [(260, 70)]
610 [(117, 376)]
651 [(287, 77)]
672 [(70, 532), (301, 70), (330, 12)]
715 [(145, 425)]
737 [(330, 77)]
782 [(345, 92)]
805 [(77, 651), (155, 495), (330, 145)]
852 [(35, 782), (376, 100), (425, 2)]
876 [(12, 852), (392, 92), (425, 26)]
925 [(187, 551)]
950 [(287, 376), (345, 260), (425, 100)]
```

It's a bit surprising that this *always* works.
Many $c$ have just a single solution; there is no asymptotic trend
to suggest that the first few cases are anything but coincidences.
Meanwhile, some $c$ do have many solutions,
so there is no obvious bijection at work either.
We discovered the result empirically:
while tracing some code that tried several
different term combinations for constructing addition sequences,
we noticed that the $2a+b$ branch alone
worked for the first several thousand $c$.
Proving the conjecture then took a bit of work (the proof
uses mostly basic algebraic number theory, but it's not trivial).

## The same for theta

We also prove similar results relevant for the
computation of Jacobi theta functions.
Here we are mainly interested in the so-called theta constants
which are given by the three series
$$\sum_{k=1}^{\infty} q^{k^2}, \quad
\sum_{k=1}^{\infty} (-1)^k q^{k^2}, \quad
\sum_{k=1}^{\infty} (-1)^k q^{k(k+1)}.$$
These theta series are not the most general possible,
but they are sufficient
for computation of $\operatorname{SL}_2(\mathbb{Z})$ modular forms and
functions,
and for special values of elliptic functions.
The exponents $k^2 = 1,4,9,\ldots$ are of course the *squares*,
and we call $k(k+1) = 2,6,12,\ldots$ the *trigonal numbers* (these
are twice the more well-known *triangular numbers*).

Just like with the pentagonal numbers, the classical algorithm to evaluate any one of the theta series uses the differences between successive exponents. This requires $2N$ multiplications for $N$ terms. We prove (again subject to Bateman-Horn) that $N + O(N / \log N)$ multiplications suffice. Our counterparts of the "$2a+b$" theorem for pentagonal numbers are the following:

**Theorem**: *Every trigonal number $c \ge 6$ is the sum of at
most three smaller ones.*

**Theorem**: *Every almost-square $c \ge 24$ is the sum of at
most three smaller almost-squares (an almost-square is an integer $c =
k^2 - 1$).*
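Both statements are easy to spot-check numerically. Here is a small brute-force sanity check (my own, not from the paper); note that repeated summands are allowed, as small cases like $6 = 2 + 2 + 2$ require:

```python
def is_sum_of_at_most_three(c, smaller):
    # can c be written as a sum of at most three elements of `smaller`
    # (repetitions allowed)? a single term never works, since all
    # elements of `smaller` are strictly less than c
    s = set(smaller)
    for a in smaller:
        if c - a in s:                    # two terms: c = a + b
            return True
        for b in smaller:
            if c - a - b in s:            # three terms: c = a + b + e
                return True
    return False

trigonal = [k * (k + 1) for k in range(1, 60)]
assert all(is_sum_of_at_most_three(c, trigonal[:i])
           for i, c in enumerate(trigonal) if c >= 6)

almost_squares = [k * k - 1 for k in range(1, 60)]
assert all(is_sum_of_at_most_three(c, almost_squares[:i])
           for i, c in enumerate(almost_squares) if c >= 24)
print("both theorems verified for the first 59 terms")
```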

In practice, the three theta series are usually needed together,
so it's useful to consider
the *quarter-squares* $1, 2, 4, 6, 9, 12, 16, \ldots$
formed by interleaving squares
and trigonal numbers. We show the following:

**Theorem**: *Every quarter-square $c > 1$ is the sum of a
smaller one and twice a smaller one.*
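Again this is easy to check numerically for small $c$ (a quick sanity check of mine, not part of the paper):

```python
# quarter-squares floor(k^2/4): 0, 1, 2, 4, 6, 9, 12, 16, ...
qs = sorted({k * k // 4 for k in range(80)})
for i, c in enumerate(qs):
    if c > 1:
        smaller = set(qs[:i])
        # c = a + 2b with a, b smaller quarter-squares (b = 0 is
        # automatically excluded, since it would force a = c)
        assert any(c - 2 * b in smaller for b in qs[:i]), c
print("verified for all quarter-squares up to", qs[-1])
```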

We can do a little better yet if we interleave the exponents $k(k+1)$ with $(2k+1)^2-1$ and $(2k)^2$, where we have split the squares into the odd and even terms and peeled off one factor $q$ from the powers of $q$ with an odd exponent so that all exponents are even. The interleaved sequence $2, 4, 6, 8, 12, 16, 20, 24, 30, 36, \ldots$ can be written in closed form as $2 \lfloor k^2 / 8 \rfloor$. We show the following:

**Theorem**: *Every element $c \ge 4$ in the sequence $2 \lfloor
k^2 / 8 \rfloor$ is the sum of two smaller ones.*

The last theorem shows that we can compute all three theta series to $N$ terms each using $2N$ multiplications altogether, in contrast to the $4N$ multiplications required with the classical finite difference method. Note that this complexity result does not depend on any conjectures.
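A small Python sketch (mine, using a brute-force search for the decomposition rather than the paper's explicit construction) shows how the last theorem yields one multiplication per power over the interleaved exponent sequence:

```python
def interleaved_powers(q, kmax):
    # powers q^c over the exponents c = 2*floor(k^2/8), k = 0..kmax,
    # each new power built with a single multiplication
    exps = sorted({2 * (k * k // 8) for k in range(kmax + 1)})
    pw = {0: 1}
    mults = 0
    for c in exps[1:]:
        if c == 2:
            pw[2] = q * q               # base case: q^2 = q * q
        else:
            # theorem: every c >= 4 is a sum of two smaller elements
            a = next(a for a in pw if 0 < a < c and c - a in pw)
            pw[c] = pw[a] * pw[c - a]
        mults += 1
    return pw, mults
```

With an exact integer in place of $q$, one can check directly that every power is correct and that exactly one multiplication is spent per new exponent.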

## Baby steps

There is one more trick before we get to the end of the paper.
We clearly need at least $N$ multiplications to compute $N$ distinct
powers of $q$, but we might get away with fewer multiplications
to compute a
*sum* of powers if we can avoid computing every single power
explicitly.

The idea of the baby-step giant-step algorithm for polynomial evaluation, also called rectangular splitting, is to write
$$ \begin{align*} \sum_{k=0}^T c_k q^k & = [c_0 + c_1 q + \ldots + c_{m-1} q^{m-1}] \\ & + [c_m + c_{m+1} q + \ldots + c_{2m-1} q^{m-1}] q^m \\ & + [c_{2m} + c_{2m+1} q + \ldots + c_{3m-1} q^{m-1}] q^{2m} \\ & \quad \vdots \end{align*} $$
for some tuning parameter $m$. The coefficients $c_k$ are assumed to be cheap to multiply with, in our case $c_k \in \{-1,0,+1\}$. We need about $m + T / m$ complex multiplications with this scheme: $m$ to precompute the powers $q^2, q^3, \ldots, q^{m-1}$, which allows evaluating all the bracketed polynomials quickly, and $T / m$ complex multiplications to evaluate the outer polynomial in $q^m$. In the dense case, that is when most $c_k \ne 0$, it is optimal to take $m \approx T^{1/2}$, giving $2 T^{1/2}$ total complex multiplications. The truncated eta and theta series are very sparse, however: we have only $N \sim T^{1/2}$ nonzero $c_k$ for $k \le T$, and it turns out that this choice of $m$ gives no improvement over the classical approach of computing all powers explicitly.
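For concreteness, here is a minimal Python sketch of the dense scheme (names and structure are mine; only multiplications between powers of $q$ are counted, since multiplying by $c_k \in \{-1,0,+1\}$ is considered free):

```python
def rectangular_splitting(q, coeffs, m):
    # evaluate sum(coeffs[k] * q**k) by baby-step giant-step splitting;
    # counts only multiplications between q-powers
    mults = 0
    pows = [1]
    for _ in range(m - 1):                 # baby steps: q, q^2, ..., q^(m-1)
        pows.append(pows[-1] * q)
        mults += 1
    qm = pows[-1] * q                      # q^m
    mults += 1
    total = None
    for i in reversed(range(0, len(coeffs), m)):   # giant steps: Horner in q^m
        inner = sum(c * p for c, p in zip(coeffs[i:i + m], pows))
        if total is None:
            total = inner
        else:
            total = total * qm + inner     # one multiplication per block
            mults += 1
    return total, mults
```

For a dense coefficient list of length $T$, choosing `m` near $T^{1/2}$ balances the two terms of the roughly $m + T/m$ multiplication count.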

In the sparse case, we can do better by choosing $m$ in such a way that only a subset of $q^2, q^3, \ldots, q^{m-1}$ appear with a nonzero coefficient in the inner polynomials. This means that we want to find parameters $m$ such that the squares, trigonal numbers or generalized pentagonal numbers respectively take few distinct values modulo $m$. For example, if $s(m)$ is the number of distinct square residues modulo $m$, we need about $s(m) + N^2 / m$ complex multiplications to evaluate $\sum_{k=1}^N q^{k^2}$ rather than $m + N^2 / m$. It now makes sense to look for $m$ such that $s(m) / m$ is small. The list of optimal $m$ values for the sequence of squares is OEIS A085635, also listed in Table 4 in the paper. For example, $s(3600) = 176$, so if we pick $m = 3600$, only $176 / 3600 \approx 5\%$ of the powers $q^2, q^3, \ldots, q^{m-1}$ need to be computed.
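The residue counts are easy to tabulate directly; the snippet below checks the figure quoted above (the sample moduli are arbitrary choices of mine, not the paper's Table 4):

```python
def s(m):
    # number of distinct residues k^2 mod m
    return len({k * k % m for k in range(m)})

# s(3600) = 176 is the value quoted in the text
for m in [16, 144, 720, 3600]:
    print(m, s(m), "%.1f%%" % (100.0 * s(m) / m))
```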

In the paper, we determine how to choose $m$ nearly optimally for an arbitrary quadratic exponent sequence $F(k) = Ak^2 + Bk + C \in \mathbb{Q}[k]$, giving squares, trigonal numbers and pentagonal numbers as a special case, and we show that $\sum_{k=1}^N q^{F(k)}$ can be computed using only $O(N / \log N)$ multiplications by this approach. Actually, the asymptotic complexity is slightly better than this: it is $N^{1 - c / \log \log N}$ for a certain constant $c > 0$, and this is better than $O(N / \log^r N)$ for any $r > 0$.

## The conclusion

To summarize, we have gone from $2N$ (the classical method) to $N + o(N)$ (short addition sequences) to $O(N / \log N)$ (baby-step giant-step) multiplications for summing the first $N$ terms in the $q$-series of the Dedekind eta function or the Jacobi theta function constants.

Asymptotically, arithmetic-geometric mean (AGM) iteration is better than evaluation of the $q$-series, as it only costs $O(\log N)$ multiplications. However, the AGM method has a large implied big-O constant, so the $q$-series are usually faster in practice, particularly when optimized.

The end of the paper presents some benchmark results. Here is a copy of the final table, which shows the time in seconds to compute one value of the Dedekind eta function to varying levels of precision:

| Bits | Pari/GP | CM-0.2.1 | AGM | New CM-0.3 | New Arb |
|------|---------|----------|-----|------------|---------|
| $10^4$ | 0.00869 | 0.00457 | 0.00789 | 0.00297 | 0.00272 |
| $10^5$ | 0.654 | 0.284 | 0.245 | 0.164 | 0.147 |
| $10^6$ | 29.9 | 10.9 | 6.78 | 4.77 | 4.85 |
| $10^7$ | 1310 | 440 | 124 | 150 | 151 |

Both CM-0.3 and Arb use the new baby-step giant-step algorithm, and have nearly identical timings. The old CM-0.2.1 uses a short addition sequence to evaluate the $q$-series using approximately one multiplication per term. Pari/GP uses the classical method. In practice, about a factor two is saved both when going from Pari/GP to CM-0.2.1 and from CM-0.2.1 to CM-0.3.

An implementation of the AGM method is also timed for comparison. Note that the crossover for the AGM method is less than $10^4$ bits with the unoptimized $q$-series evaluation in Pari/GP but close to $10^7$ bits with the optimized $q$-series evaluation in the latest CM and Arb.

In other settings, baby-step giant-step techniques can easily save a factor 100. The reason why the improvement is not so dramatic here is that the $q$-series are so sparse that it takes very little work to evaluate them even with naive methods. There just isn't much work left to remove by being clever! Nonetheless, in applications where we need tens of thousands of function values, saving a small factor can make a difference.

This paper generated a few interesting "new" integer sequences. We should probably submit them to OEIS...