fredrikj.net / blog /
Faster high-precision arctangents
December 7, 2014
A quick Arb development update here. Recently, I wrote a fast implementation of the arctangent function for precision up to roughly 4096 bits. At higher precision, the code would fall back to MPFR. I have now written a custom implementation for the greater-than-4096-bit range. For an argument close to unity, the new implementation is 1.5 to 1.8 times faster than MPFR and asymptotically uses 1/3 as much memory. Some timing measurements (with $x = \exp(1)/3$):
Precision (bits) | Old (s) | New (s) | Speedup |
4096 | 0.000191 | 0.000189 | 1.01 |
8192 | 0.00147 | 0.000871 | 1.69 |
16384 | 0.00451 | 0.00274 | 1.65 |
32768 | 0.0151 | 0.00861 | 1.75 |
65536 | 0.0554 | 0.0308 | 1.80 |
131072 | 0.162 | 0.0928 | 1.75 |
262144 | 0.492 | 0.278 | 1.77 |
524288 | 1.31 | 0.757 | 1.73 |
1048576 | 3.2 | 2.01 | 1.59 |
2097152 | 7.54 | 4.97 | 1.52 |
4194304 | 17.8 | 11.9 | 1.49 |
8388608 | 43.2 | 28.6 | 1.51 |
16777216 | 102 | 68.6 | 1.48 |
33554432 | 246 | 165 | 1.49 |
How is $\operatorname{atan}(x)$ computed? The first step is to replace $x$ by $1/x$ if $|x| > 1$. Secondly, we apply the argument-halving formula $$\operatorname{atan}(x) = 2 \operatorname{atan}\left(\frac{x}{1+\sqrt{1+x^2}}\right)$$ up to 8 times, that is until $|x| \le 2^{-8}$. The number 8 is a tuning parameter: it is sufficient to do a single step, but doing up to 8 steps turns out to be optimal at any precision between a few thousand bits and ten million bits. Next, we use the "bit-burst algorithm", repeatedly writing $$\operatorname{atan}(x) = \operatorname{atan}(u/2^r) + \operatorname{atan}(x')$$ where $$x' = \frac{x 2^r-u}{2^r+u x}$$ for $r = 16, 32, 64, \ldots$ to reduce the problem to evaluating $\operatorname{atan}(u/2^r)$ where $u$ is an integer with at most $r / 2$ bits. Each such arctangent is computed using binary splitting, and the power-of-two steps guarantee that this scheme has softly optimal asymptotic complexity.
The speedup is largely due to the following three differences compared to MPFR:
- In the argument-halving stage, the argument is kept as a numerator-denominator pair until the end, which saves a division in each iteration. This gives a 15% speedup.
- MPFR uses a different form of the bit-burst algorithm, which amounts to integrating a differential equation (the algorithm is described in Modern Computer Arithmetic) instead of using a functional equation for the arctangent. It appears that the functional equation approach is faster. I got the idea to use the functional equation from these presentation slides by Richard Brent.
- The binary splitting is done recursively, rather than iteratively.
fredrikj.net | Blog index | RSS feed | Follow me on Mastodon | Become a sponsor