fredrikj.net / blog /

# Python 3.1 twice as fast as 3.0

*February 19, 2009*

For large-integer arithmetic, that is. Mark Dickinson has beeen working on a patch that changes the internal representation of long integers from 15-bit to 30-bit digits, and there is an additional patch that adds a few algorithmic optimizations, written by Mark and Mario Pernici. Likely both patches will pass review and be included in Python 3.1. I asked Mark to run the mpmath unit tests to find out how big a difference the patches make, which he kindly did. Here are the results:

32-bit build:

unpatched (15-bit digits): 42.91 seconds

patched (30-bit digits): 40.57 seconds

patched+optimized: 30.15 seconds

64-bit build:

unpatched: 38.39 seconds

patched: 22.36 seconds

patched+optimized: 21.55 seconds

patched+optimized+bitlength: 20.20 seconds

The last result includes the use of the new int.bit_length() method (which I had a small part in implementing) instead of the pure-Python version in mpmath.

It’s not surprising that the patches make high-precision arithmetic much faster, but most of the mpmath tests actually work at low precision and depend mostly on general speed of the Python interpreter. There the speedup is perhaps of order 0-30%. For the tests working at very high precision, the improvement is a factor 4 or 5 with the 64-bit build. Then there are a number of tests in between. With some imagination, all unit tests together provide a plausible estimate of actual performance for a wide range of applications.

An excerpt of before/after results for some particular tests, comparing 64-bit unpatched and 64-bit patched+optimized+bitlength:

# Only tests 53-bit precision

double_compatibility ok 1.3149240 s

double_compatibility ok 1.1979949 s

# Logarithms up to 10,000 digits

log_hp ok 4.0845711 s

log_hp ok 0.7967579 s

# Large Bernoulli numbers

bernoulli ok 6.8625491 s

bernoulli ok 1.4261260 s

# Gamma function, high precision

gamma_huge_2 ok 2.4907949 s

gamma_huge_2 ok 0.5781031 s

# Numerical quadrature, 15 to 100 digit precision

basic_integrals ok 3.0117619 s

basic_integrals ok 1.8687689 s

Mpmath does not yet run on Python 3.x; the benchmarking was made with a version of mpmath hacked slightly for the tests to pass. It would be nice to provide a 3.x compatible version of mpmath soon. The 2to3 tool fortunately handles almost all necessary patching; a handful of fairly simple additional changes need to be made. The most annoying problem is the lack of `cmp` (and particularly the `cmp` argument for `sorted`), which has no trivial workaround, but still should not be too hard to fix. In any case, it seems likely that the 30-bit patch will also be backported to Python 2.7, so most users should be able to take advantage of it.

It would be interesting to compare these benchmarks with the GMPY mode in mpmath. GMPY too provides a ~2x speedup for the unit tests. (This factor would be much larger if tests at even higher precision were included. At a million digits, GMPY is orders of magnitude faster than pure Python.) Originally GMPY only sped up the very high precision tests, and was otherwise slower than pure Python, probably due to Python’s internal optimizations for `int` and `long` instances. This disadvantage was eliminated by implementing some helper functions for mpmath in C in GMPY. Mario Pernici has recently worked on further optimizations along the same lines, which should substantially improve low-precision performance.