fredrikj.net / blog /

Faster high-precision arctangents

December 7, 2014

A quick Arb development update here. Recently, I wrote a fast implementation of the arctangent function for precision up to roughly 4096 bits. At higher precision, the code would fall back to MPFR. I have now written a custom implementation for the greater-than-4096-bit range. For an argument close to unity, the new implementation is 1.5 to 1.8 times faster than MPFR and asymptotically uses 1/3 as much memory. Some timing measurements (with $x = \exp(1)/3$):

Precision (bits) Old (s) New (s) Speedup
4096 0.000191 0.000189 1.01
8192 0.00147 0.000871 1.69
16384 0.00451 0.00274 1.65
32768 0.0151 0.00861 1.75
65536 0.0554 0.0308 1.80
131072 0.162 0.0928 1.75
262144 0.492 0.278 1.77
524288 1.31 0.757 1.73
1048576 3.2 2.01 1.59
2097152 7.54 4.97 1.52
4194304 17.8 11.9 1.49
8388608 43.2 28.6 1.51
16777216 102 68.6 1.48
33554432 246 165 1.49

How is $\operatorname{atan}(x)$ computed? The first step is to replace $x$ by $1/x$ if $|x| > 1$. Secondly, we apply the argument-halving formula $$\operatorname{atan}(x) = 2 \operatorname{atan}\left(\frac{x}{1+\sqrt{1+x^2}}\right)$$ up to 8 times, that is until $|x| \le 2^{-8}$. The number 8 is a tuning parameter: it is sufficient to do a single step, but doing up to 8 steps turns out to be optimal at any precision between a few thousand bits and ten million bits. Next, we use the "bit-burst algorithm", repeatedly writing $$\operatorname{atan}(x) = \operatorname{atan}(u/2^r) + \operatorname{atan}(x')$$ where $$x' = \frac{x 2^r-u}{2^r+u x}$$ for $r = 16, 32, 64, \ldots$ to reduce the problem to evaluating $\operatorname{atan}(u/2^r)$ where $u$ is an integer with at most $r / 2$ bits. Each such arctangent is computed using binary splitting, and the power-of-two steps guarantee that this scheme has softly optimal asymptotic complexity.

The speedup is largely due to the following three differences compared to MPFR: