# New partition function record: p(1020) computed

March 2, 2014

The partition function p(n) counts the number of partitions of n, i.e. the number of ways n can be written as a sum of positive integers when disregarding the order of the terms. For example, 4 has the five distinct partitions 4, 3+1, 2+2, 2+1+1, 1+1+1+1, so p(4) = 5. This week, I set a new record by computing the exact number of partitions of n = 1020 = 100,000,000,000,000,000,000 (100 quintillion). The number p(1020) has more than 10 billion digits (11,140,086,260 digits to be precise), starting with

1838176508344882643646057515196394970366128860187133818794921830680916179355851922605087258953579721...

and ending with
...9597661250174602479861524302262001955970770703287582462984472325700899198905833521126231756788091448


The computation was done using git versions of Arb (slightly tweaked) and FLINT, built against MPIR 2.6.0 and MPFR 3.1.1. It took four and a half days on a system with a Xeon E5-2650 CPU and 256 GB of RAM (the "popeye" server at RISC).

## Why?

In 1946, banknotes of 1020 Hungarian pengő were issued as a result of hyperinflation. Now, the world finally has the answer to the pressing question: in how many different ways can you make change for a 1020-pengő banknote using stacks of 1-pengő coins? = +   +   + 100000000000000000000 1 3 3 99999999999999999993

Illustration: one of the approximately 1.8381765 ×1011,140,086,259 partitions of 1020. Image credits: 1, 2 (CC-BY-SA).

The skeptical scientist might be tempted to verify the result experimentally, by explicitly counting all the possible ways to make change. The pre-war 1-pengő coin has a diameter of 23 mm and a thickness of 1.5 mm. A single vertical stack of 1020 coins requires 16 light-years of headroom, so you would have to choose a location for the experiment that stays clear of various celestial bodies such as Proxima Centauri (4.24 light-years away). Even if you split the stacks up, the total coin volume of 68720 cubic kilometers (assuming a circle-packing ratio of 0.9069) is sufficient to fill Lake Superior five times over and still have change left. The silver content of the coins alone would make the budget for such a project very difficult to justify to many funding agencies. Besides, even using your favorite cryptocurrency instead of physical coins, you would not have time to go through a significant fraction of the (more than) 1011,140,086,259 partitions before the presumed heat death of the universe sets in 10150 years from now.

More seriously, a motivation for computing the partition function is to study congruences. Ramanujan famously discovered the identities $$p(5k+4)\equiv 0 \pmod {5}$$ $$p(7k+5)\equiv 0 \pmod {7}$$ $$p(11k+6)\equiv 0 \pmod {11}$$ when inspecting a table of p(n) for n up to 200, which MacMahon had computed by hand! Today an extensive body of knowledge exists about partition congruences (they are closely related to modular forms), but several important problems are open, and experimental investigations could potentially lead to new insights. With current technology, the most efficient way to determine p(n) modulo a small integer is to compute the full value p(n) and then reduce it.

Calculating the partition function is also a quite nice benchmark problem for number theory software. The basic idea is to compute a sufficiently precise numerical approximation of p(n) from an infinite series (the Hardy-Ramanujan-Rademacher formula), and rounding to the nearest integer. The Hardy-Ramanujan-Rademacher formula states that $$p(n)=\frac{1}{\pi \sqrt{2}} \sum_{k=1}^\infty \sqrt{k}\, A_k(n)\, \frac{d}{dn} \left( \frac {1} {\sqrt{n-\frac{1}{24}}} \sinh \left[ \frac{\pi}{k} \sqrt{\frac{2}{3}\left(n-\frac{1}{24}\right)}\right] \right)$$ where $A_k(n)$ is a certain sum over roots of unity with a complicated definition.

This requires arbitrary-precision arithmetic, including the ability to compute $\pi$ and the exponential function $\exp(x)$ to high precision (in this case over 10 billion digits). In fact, the computation requires a small number of arithmetic operations at very high precision and a large number of operations at decreasing precision, with roughly equal time spent doing arithmetic with numbers of all sizes. It also requires doing a lot of arithmetic with word-size integers (modular arithmetic, factoring, GCDs, modular square roots, etc.). The same core operations are used for all kinds of tasks in computational number theory, so optimizing and stress-testing them is useful in a general sense.

## Correctness

Since numerical approximations are used, it's a challenge to guarantee that the result actually is correct. For example, past versions of Maple gave the wrong value of p(n) for n = 11269, 11566, 12376, 12430..., presumably due to floating-point roundoff error and/or too few terms in the Hardy-Ramanujan-Rademacher series being used.

To avoid such problems, my implementation uses ball (mid-rad interval) arithmetic to bound the contribution of all rounding errors. The error that results from truncating the infinite series after a finite number of terms is also accounted for (using Rademacher's rigorous error bound). The final numerical result is a ball $[m-\varepsilon, m+\varepsilon]$ guaranteed to contain p(n), and the code checks that this interval contains exactly one integer, which then provably is the correct value. Of course, "provably correct" comes with the usual caveat that there could be bugs in the code (or even hardware errors). In this computation, the distance between the approximation $m$ and the nearest integer was smaller than 10−10, which provides a strong consistency check.

With computations of this magnitude, all kinds of things risk breaking. For example, after dumping p(1020) to a file, reading it back gave a different value! It turns out that the size information written by the GMP/MPIR mpz_out_raw function overflows when the number takes more than 232 bytes. Luckily, mpz_out_raw still writes all the extra bytes, so the correct value could be recovered (just not directly with mpz_inp_raw).

## Software improvements

The computation was done with the same general algorithm as the previous record of p(1019), described in my 2012 paper, but with completely rewritten code for numerical evaluation. The implementation used for the 2011 record (still available in FLINT and easily accessed via Sage) uses floating-point arithmetic along with (incomplete) a priori error bounds. Deriving error bounds this way was cumbersome, and interfered with code and algorithm optimizations (changing the algorithm required doing a new analysis). This was one of my main motivations for developing Arb and reimplementing the partition function there.

The algorithm has time and space complexity O(n0.5+ε), which is softly optimal since p(n) has Θ(n0.5) bits. In theory, p(1020) is therefore 100.5 ≈ 3.16 times more expensive to compute than p(1019). If you account for the nε factor, the time increases by a factor closer to 4 in practice when n increases by a factor 10. Memory increases quite precisely by a factor 100.5. In fact, the computation of p(1020) only required 110 hours and 130 GB of memory, comparable to the 100 hours and 150 GB used for the computation of p(1019) done in 2011. Thus the efficiency (both time and space) has improved by more than a factor three. Computing p(1019) is now about as expensive as computing p(1018) used to be, and so on. Some of the speedup (about 30%) is simply due to using faster hardware, but most of it is due to the following improvements:

• I wrote a new high-precision implementation of the exponential function (commit 1, 2), requiring 25% less time and 75% less memory than the old implementation (which called MPFR). This change reduced the memory requirements for the partition function by two thirds, making p(1020) achievable on a system with 256 GB memory without swapping. There is also slightly faster code in Arb for low-precision exponentials.
• Another speedup came from using a more efficient algorithm to compute roots of unity.
• Since ball arithmetic is used instead of a priori error estimates to check that the final numerical approximation is close enough to an integer, the number N of terms in the Hardy-Ramanujan-Rademacher series is chosen slightly less conservatively (further optimization is possible here).
• I also used the latest version (2.6.0) of MPIR, benefiting from Bill Hart's new Schönhage-Strassen FFT.
• Finally, I cut the wall time nearly in half by using two threads (one thread for the first 16 terms in the Hardy-Ramanujan-Rademacher series, and one for all the remaining terms). I kept randomly getting "NaN"s as output when I tried writing multithreaded code 2011, but this time everything seems to work!

Not everything has gotten faster. Computing the error bounds automatically at runtime adds some overhead. The original FLINT implementation also uses hardware floating-point arithmetic and libm transcendental functions for low-precision calculations, which Arb avoids in order to guarantee the error bounds (actually, the libm log is used in one place right now, but in a safe way). These de-optimizations only turn out to make a significant difference for small n, up to 108 or so.

Since 1020 exceeds the maximum value of a 64-bit unsigned integer (264−1 ≈ 1.8 × 1019), I had to change the implementation to take a multiprecision integer as input. This is a trivial edit and doesn't cause any slowdown, because all "index" variables in the algorithm are smaller than some fixed multiple of n0.5 and thus comfortably fit in a single word on a 64-bit system. As a byproduct, the implementation now works for input greater than 232−1 ≈ 4.3 × 109 on 32-bit systems. I tested computing p(1010), p(1011), ..., p(1016) on a 32-bit system, and all came out with the correct value. The fact that nothing broke when n crossed the word size boundary on a 32-bit system provided some reassurance that the word size boundary wouldn't pose a problem on a 64-bit system either. I haven't tested what p(1017) does to a 32-bit system yet; presumably, malloc will fail at some point since the computation requires more than 4 GB of memory.

I will push the latest modifications to the partition function code to the Arb git repository in the near future.

## Going further

At the moment, my implementation can't be used to compute p(1021). The limitation is the GMP/MPIR mpz type, which overflows around 40 billion digits. The output would have 35 billion digits, and the internal calculations involve numbers up to twice as large.

To compute bigger values, you would also want better parallelization (the edit-run-debug cycle gets cumbersome when the "run" step starts taking weeks). The "tail" in the Hardy-Ramanujan-Rademacher series parallelizes embarrassingly well, but the first term takes nearly 1/2 of the total CPU time, so by Amdahl's law, it's only possible to gain roughly a factor two by distributing the terms. There is an "easy" way to parallelize the computation of the first term, but it would use much more memory. What you really want is parallel (and memory-efficient) FFT code to split individual bignum multiplications across several cores. You also need multithreaded code for the leaf computations in the binary splitting algorithms for $\pi$ and exp, but that's trivial to implement.

With a more sophisticated implementation, present-day hardware should allow computing numbers at least as large as p(1024) (1.1 trillion digits). The $\pi$ record is 12 trillion digits (set by Shigeru Kondo and Alexander Yee over 94 days in 2013). Computing the partition function to a similar number of digits as $\pi$ requires slightly more memory (or swap space), and 10-20 times more computation time.

## Computation statistics

The partition function was computed for a range of powers of ten to get an accurate picture of the efficiency, and to compare results with the previous computation up to n = 1019 (see the paper for the old timings).

n Digits N Mem (MB) Wall (s) CPU (s) T1 (s) T2 (s) Pi (s) Exp (s)
108 11132 3328 2.8 0.037 0.07 0.031 0.032 0.004 0.016
109 35219 9721 3.05 0.237 0.29 0.061 0.227 0.008 0.035
1010 111391 28601 4.89 0.441 0.75 0.307 0.429 0.039 0.184
1011 352269 84632 9.82 1.834 3.29 1.449 1.805 0.168 0.841
1012 1113996 251600 19.82 6.247 10.39 6.169 4.159 1.004 3.475
1013 3522791 750925 61.31 25.637 43.06 25.436 17.435 3.282 15.834
1014 11140072 2248842 202.81 104.07 188.33 103.57 84.346 12.389 66.326
1015 35228031 6754864 553.77 441.603 845.42 404.175 440.417 47.035 263.662
1016 111400846 20343453 1469.73 1614.84 2858.31 1611.65 1244.68 194.188 1099.58
1017 352280442 61413562 4593.42 7398.2 13643.7 6251.35 7389.06 762.067 4167.94
1018 1114008610 185795691 12937.41 23354.6 44330.3 23327.1 20997 2913.12 16171.8
1019 3522804578 563188404 38881.84 87825.3 170934 87742.6 83184.9 10860.9 61796.3
1020 11140086260 1710193158 131071 399206 738425 339610 398959 42394.2 239099

Digits: number of decimal digits in p(n)
N: number of terms used in the Hardy-Ramanujan-Rademacher series
Mem: peak memory usage (resident set) during the computation
Wall: total wall time of the computation
CPU: total CPU time of the computation
T1: time for thread 1 (terms 1 to 16)
T2: time for thread 2 (terms 17 to N)
Pi: time to compute $\pi$ to full precision (T1)
Exp: time to compute $\exp(x)$ to full precision (T1)

Edit: Ondrej Certik has put up some plots of the timing data.

## Data about p(1020)

The full number is not currently available for download, as it is 4.6 GB in packed binary format. If you really want a copy, contact me.

First (most significant) 1000 digits:

18381765083448826436460575151963949703661288601871
33818794921830680916179355851922605087258953579721
40223563322445693766015017321098501101538956018970
56750704881757794605360897959220562692957211960097
79165931101694124929410280437321791578706853243973
22343094930548700253436959560776678370296907881368
37733272436267730567242440339288196974095792769740
01925654844091988401959507036970411771619993123017
55657669949970363935278042362879420060693386042760
13262928827958224606797122816605218568004925911126
29907354977685310316372900346545431298736807328236
50133058574692900439957866102150361523427469350908
59399440163148946448629438752730130574754978538031
93701800260362511405139309272932783176699960093172
44722089087499304505113827984945896266787952281116
32461474616064076042784673307459319183309808217053
62780971602345567755068535553357149769564868882234
06599266703753282060975654469606040406400277593325
73615936642247999221376534428162036372745061861599
90865952198639022438268000053417742912214985581755


Last 1000 digits:

78197224010162778696625071481567773251610572738944
98214946475632569813210181967142533195900338936713
00729210224771807814522142198236195554260002554854
55685193690894876043353240892759544284884999526365
17052321124780138610958503741195160905409982613603
99012436018076726035176887245814122589275259511505
76750908721608880594390177630345942812168691609194
42278684582644873256506162684010823277784461258669
33049338620374335600858723383185679297378002523565
35242780532986925919014811158854151260764076718769
12107459394370915408731078173696015862413625559934
64357221260733772591049732717331101157274865785996
97365738341459496183914647648616383023616407003926
60829384832774130551031058595191226473452491491856
86443513570173392572278853185935691354917765496681
36185969884296385252170324208992786041520483060786
38273697959512945079613267334079249909549771893343
24588609360545342374268332247926111640460856581625
95976612501746024798615243022620019559707707032875
82462984472325700899198905833521126231756788091448


Digit distribution (fun fact: radix conversion takes over 3 hours):

Digit Occurrences
0 1114003393
1 1114004671
2 1114044302
3 1113995927
4 1114027762
5 1114020496
6 1113993310
7 1113982144
8 1114004097
9 1114010158

First 1000 hexadecimal digits (total hex length: 9251641382):

116045344298e737ba5da19152a5285e35ca8d47406081d91a
0289604dbb4c5d1ebe513da006ddeaa116a22b76903822c9db
1797a70dcc6a5a6976ce89a7fd4bb8c2db5192b5e7ff04fcc5
a571dd6250eae6be39c2f0c1b22616e171300d16b362ac4fc7
051c6e5a1115c075316c51425c666a84467557ab501e85d193
d1c23b23d6a483cc8d1413a99bb4e4f066a0cd803bed7ab832
a6f2eaa2fbd0fdbf867967b4cd57d4346705746c4a179788d7
1190dfc06ef7ef7a2b6bf747ff68e6296225e450bb97db4748
32d3eb270c028857f511b197e7b25ae0ab946eaf8aa5b08ce2
68d4a5c633d699ace4f38b9b12ecbdd7be643323f3532a863a
e99bfe37416761902a6a5aebc62b9e9e310d87fb3436cf4b1d
0c37c67f7aa6daf052214d6768a0e9ab93f2dab3d98d6e7954
d66e161a5b0eb3834d24db0d8a7dc9d02a861337ebec308b41
ab8704d940850892f55fddbe15676981f70d4ddc2c713e37c5
cb1417f15322ea5993f268227251fc43f5de135a876fd42fdc
028bc371ec5164bd01b8be057822692c8ca5bafbd08f7f4910
04e294464ec7f4f55a8a38e3656997765b5e19ae774b6cdc9d


Last 1000 hexadecimal digits:

73ba2f6029363349801fc09a4b0273afd1a29296b76867b997
c28461ae194cd690d2f32b73d4e4798f42c27549927d2a4d72
4616b066c4d681f89275601d4a2a74054ca372bf994d590142
a3a8ca2fde857e286fbdca0636f22b315406da61738d94ef9b
758be0447d67e032b7901867143cf9041b63986faab0cec017
ed9c8c26157be06d8eeb44572402db6f4681e9158fea0e97e7
1e760ecbf494c6399b6bb74b8c49872a855f3dc2207a3f1426
5ba77121fb3cac8dc1346d37dd8ea1a6b2c1c499965621f4fa
b8e98e94c0faf557860c74ce133dcc22859363920fb1bbe05c
5993c92f0d65192b126ba993d596b7c1360c4b38338b1f9092
8f5e62ede86157db408e0264c3bf27f4724c76b1ab1b80f58b
94cd970e119be5eaf6d4a4c2b88f2846e81294e8b98c5cf7fe
924213c507a14ac7f23a86ac3e7c953d5a9e931e28ea50c1ae
77499d3afd1e43653b29fd563045da4f08ff0a47462ec262cc