Python 3.1 twice as fast as 3.0
February 19, 2009
For large-integer arithmetic, that is. Mark Dickinson has been working on a patch that changes the internal representation of long integers from 15-bit to 30-bit digits, and there is an additional patch, written by Mark and Mario Pernici, that adds a few algorithmic optimizations. Both patches will likely pass review and be included in Python 3.1. I asked Mark to run the mpmath unit tests to find out how big a difference the patches make, which he kindly did. Here are the results:
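To illustrate why the digit size matters, here is a rough sketch (not CPython's actual C implementation) of the representation: a long integer is stored as a little-endian sequence of fixed-size "digits". Doubling the digit size from 15 to 30 bits roughly halves the number of digits, so schoolbook multiplication does about a quarter as many digit-level multiplies.

```python
def to_digits(n, bits):
    """Split a nonnegative int into base-2**bits digits, least significant first.
    Purely illustrative: CPython does this in C, not in Python."""
    mask = (1 << bits) - 1
    digits = []
    while n:
        digits.append(n & mask)
        n >>= bits
    return digits or [0]

n = 2**100 + 12345  # a 101-bit number
assert len(to_digits(n, 15)) == 7  # ceil(101/15) digits with 15-bit digits
assert len(to_digits(n, 30)) == 4  # ceil(101/30) digits with 30-bit digits
```

On a 64-bit machine, 30-bit digits also let the multiplication of two digits use a native 64-bit product, which is where much of the speedup comes from.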
32-bit build:

unpatched (15-bit digits): 42.91 seconds
patched (30-bit digits): 40.57 seconds
patched+optimized: 30.15 seconds

64-bit build:

unpatched: 38.39 seconds
patched: 22.36 seconds
patched+optimized: 21.55 seconds
patched+optimized+bitlength: 20.20 seconds
The last result includes the use of the new int.bit_length() method (which I had a small part in implementing) instead of the pure-Python version in mpmath.
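For comparison, a pure-Python bit length has to loop (or use tricks with hex strings), while `int.bit_length()` runs in C. The sketch below is illustrative, not mpmath's actual implementation:

```python
def python_bit_length(n):
    """Number of bits needed to represent abs(n); 0 for n == 0.
    A simple pure-Python stand-in for int.bit_length()."""
    n = abs(n)
    bits = 0
    while n:
        n >>= 1
        bits += 1
    return bits

assert python_bit_length(1023) == 10
assert python_bit_length(1024) == 11
assert python_bit_length(10**100) == (10**100).bit_length()
```

Since mpmath calls this operation constantly (every normalization of a mantissa needs it), replacing the Python loop with the built-in method shaves off a measurable amount of time.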
It’s not surprising that the patches make high-precision arithmetic much faster, but most of the mpmath tests actually work at low precision and depend mostly on the general speed of the Python interpreter. There the speedup is perhaps of the order of 0-30%. For the tests working at very high precision, the improvement is a factor of 4 or 5 with the 64-bit build. A number of tests fall in between. With some imagination, the unit tests taken together provide a plausible estimate of actual performance for a wide range of applications.
An excerpt of before/after results for some particular tests, comparing 64-bit unpatched and 64-bit patched+optimized+bitlength:
# Only tests 53-bit precision
double_compatibility ok 1.3149240 s
double_compatibility ok 1.1979949 s
# Logarithms up to 10,000 digits
log_hp ok 4.0845711 s
log_hp ok 0.7967579 s
# Large Bernoulli numbers
bernoulli ok 6.8625491 s
bernoulli ok 1.4261260 s
# Gamma function, high precision
gamma_huge_2 ok 2.4907949 s
gamma_huge_2 ok 0.5781031 s
# Numerical quadrature, 15 to 100 digit precision
basic_integrals ok 3.0117619 s
basic_integrals ok 1.8687689 s
Mpmath does not yet run on Python 3.x; the benchmarking was done with a version of mpmath hacked slightly so that the tests would pass. It would be nice to provide a 3.x-compatible version of mpmath soon. The 2to3 tool fortunately handles almost all of the necessary patching; only a handful of fairly simple additional changes need to be made. The most annoying problem is the removal of cmp (and particularly of the cmp argument to sorted), which has no trivial workaround, but it still should not be too hard to fix. In any case, it seems likely that the 30-bit patch will also be backported to Python 2.7, so most users should be able to take advantage of it.
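The standard workaround for a missing cmp argument is to wrap the old three-way comparison function in a key class whose `__lt__` delegates to it. (The standard library later gained exactly this helper as `functools.cmp_to_key` in Python 2.7 and 3.2; the hand-rolled sketch below is what one would write before that.)

```python
def cmp_to_key(cmp):
    """Wrap an old-style three-way comparison function as a sort key."""
    class K:
        def __init__(self, obj):
            self.obj = obj
        def __lt__(self, other):
            # sorted() only needs __lt__ to order elements
            return cmp(self.obj, other.obj) < 0
        def __eq__(self, other):
            return cmp(self.obj, other.obj) == 0
    return K

# Sort descending by absolute value using a cmp-style comparator.
data = [3, -5, 1, -2]
result = sorted(data, key=cmp_to_key(lambda a, b: abs(b) - abs(a)))
assert result == [-5, 3, -2, 1]
```

This drops into any `sorted(seq, cmp=f)` call as `sorted(seq, key=cmp_to_key(f))`, so the porting change is mechanical once the helper exists.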
It would be interesting to compare these benchmarks with the GMPY mode in mpmath. GMPY, too, provides a ~2x speedup for the unit tests. (This factor would be much larger if tests at even higher precision were included; at a million digits, GMPY is orders of magnitude faster than pure Python.) Originally GMPY only sped up the very high precision tests, and was otherwise slower than pure Python, probably due to Python’s internal optimizations for int and long instances. This disadvantage was eliminated by implementing some helper functions for mpmath in C inside GMPY. Mario Pernici has recently worked on further optimizations along the same lines, which should substantially improve low-precision performance.