# Division: the sequel (with bonus material)

July 27, 2008

I have neglected to write a GSoC update for two weeks (where “two” was computed using round-to-floor), so let me first apologize for that.

In my last post, I wrote about how Newton’s method (coupled with a fast multiplication algorithm, such as Karatsuba multiplication) can be used for asymptotically fast division of long integers. I have now fixed up the code to compute a correctly rounded quotient. In fact, it now performs the full divmod operation, returning both the quotient and a correct remainder.

The trick is to perform an extra exact multiplication at the end to determine the remainder. By definition, the quotient r of p/q is correct if the remainder m = pr·q satisfies 0 ≤ m < q. If this inequality does not hold, one need only perform an additional divmod operation (which can be performed using standard long division, since the error will almost certainly fit in a single limb) to correct both the quotient and the remainder.

The extra multiplication causes some slowdown, but it’s not too bad. The new idivmod function still breaks even with the builtin divmod somewhere around 1000-2000 digits and is 10 times faster at half a million digits (i.e. when dividing a million digit number by a half-million digit number):

>>> time_division(int, 2.0, 20)      size     old time     new time       faster  correct------------------------------------------------------------        16     0.000008     0.000052     0.155080     True        32     0.000005     0.000052     0.102151     True        64     0.000008     0.000059     0.132075     True       128     0.000013     0.000070     0.190476     True       256     0.000035     0.000107     0.325521     True       512     0.000126     0.000215     0.583658     True      1024     0.000431     0.000532     0.810399     True      2048     0.001855     0.001552     1.195104     True      4096     0.007154     0.005050     1.416708     True      8192     0.028505     0.015449     1.845033     True     16384     0.111193     0.046938     2.368925     True     32768     0.443435     0.142551     3.110706     True     65536     1.778292     0.432412     4.112497     True    131072     7.110184     1.305771     5.445200     True    262144    28.596926     3.919399     7.296253     True    524288   116.069764    11.804032     9.833061     True

As a bonus, a fast divmod also provides a fast way to convert long integers to decimal strings, by recursively splitting n in half using L, R = divmod(n, 10b/2) where b is the number of digits in n:

>>> time_str(int, 20)      size     old time     new time       faster  correct------------------------------------------------------------        16     0.000005     0.000013     0.413043     True        32     0.000006     0.000009     0.588235     True        64     0.000008     0.000012     0.674419     True       128     0.000020     0.000023     0.865854     True       256     0.000059     0.000133     0.442105     True       512     0.000204     0.000333     0.613255     True      1024     0.000895     0.001194     0.749708     True      2048     0.003505     0.002252     1.556824     True      4096     0.013645     0.006600     2.067429     True      8192     0.052386     0.018334     2.857358     True     16384     0.209164     0.052233     4.004412     True     32768     0.834201     0.153238     5.443827     True     65536     3.339629     0.450897     7.406639     True    131072    13.372223     1.339044     9.986392     True    262144    53.547894     3.998352    13.392491     True    524288   214.847486    11.966933    17.953429     True

So printing an integer with half a million digits takes 12 seconds instead of 3.5 minutes. This can be quite a usability improvement.

The code can be found in this file: div.py.

I have now submitted a request for these algorithms to be implemented in the Python core. Since the pure Python implementation is very simple, I don’t think porting it to C would be all that difficult. By “request”, I mean that I might eventually do it myself, but there are many others who are much more familiar with both the Python codebase and the C language, and if any of those persons happens to have a few free hours, they could certainly do it both faster and better. If you are interested in helping out with this, please post to the issue tracker.

The division code is just one of several small projects I’ve been working on lately. Basically, I’ve found that there are some arithmetic functions that are needed very frequently in all kinds of mathematical code. These include:

• Pure convenience functions like sign, product, …
• Bit counting