New partition function record: p(10²⁰) computed

March 2, 2014

The partition function p(n) counts the number of partitions of n, i.e. the number of ways n can be written as a sum of positive integers when disregarding the order of the terms. For example, 4 has the five distinct partitions 4, 3+1, 2+2, 2+1+1, 1+1+1+1, so p(4) = 5. This week, I set a new record by computing the exact number of partitions of n = 10²⁰ = 100,000,000,000,000,000,000 (100 quintillion). The number p(10²⁰) has more than 10 billion digits (11,140,086,260 digits to be precise), starting with

1838176508344882643646057515196394970366128860187133818794921830680916179355851922605087258953579721...

and ending with

...9597661250174602479861524302262001955970770703287582462984472325700899198905833521126231756788091448

The computation was done using git versions of Arb (slightly tweaked) and FLINT, built against MPIR 2.6.0 and MPFR 3.1.1. It took four and a half days on a system with a Xeon E5-2650 CPU and 256 GB of RAM (the "popeye" server at RISC).

Why?

In 1946, banknotes of 10²⁰ Hungarian pengő were issued as a result of hyperinflation. Now, the world finally has the answer to the pressing question: in how many different ways can you make change for a 10²⁰-pengő banknote using stacks of 1-pengő coins?

	=		+		+		+
100000000000000000000	=	1	+	3	+	3	+	99999999999999999993

Illustration: one of the approximately 1.8381765 ×10^{11,140,086,259} partitions of 10²⁰. Image credits: 1, 2 (CC-BY-SA).

The skeptical scientist might be tempted to verify the result experimentally, by explicitly counting all the possible ways to make change. The pre-war 1-pengő coin has a diameter of 23 mm and a thickness of 1.5 mm. A single vertical stack of 10²⁰ coins requires 16 light-years of headroom, so you would have to choose a location for the experiment that stays clear of various celestial bodies such as Proxima Centauri (4.24 light-years away). Even if you split the stacks up, the total coin volume of 68720 cubic kilometers (assuming a circle-packing ratio of 0.9069) is sufficient to fill Lake Superior five times over and still have change left. The silver content of the coins alone would make the budget for such a project very difficult to justify to many funding agencies. Besides, even using your favorite cryptocurrency instead of physical coins, you would not have time to go through a significant fraction of the (more than) 10^{11,140,086,259} partitions before the presumed heat death of the universe sets in 10¹⁵⁰ years from now.

More seriously, a motivation for computing the partition function is to study congruences. Ramanujan famously discovered the identities $$p(5k+4)\equiv 0 \pmod {5}$$ $$p(7k+5)\equiv 0 \pmod {7}$$ $$p(11k+6)\equiv 0 \pmod {11}$$ when inspecting a table of p(n) for n up to 200, which MacMahon had computed by hand! Today an extensive body of knowledge exists about partition congruences (they are closely related to modular forms), but several important problems are open, and experimental investigations could potentially lead to new insights. With current technology, the most efficient way to determine p(n) modulo a small integer is to compute the full value p(n) and then reduce it.

Calculating the partition function is also a quite nice benchmark problem for number theory software. The basic idea is to compute a sufficiently precise numerical approximation of p(n) from an infinite series (the Hardy-Ramanujan-Rademacher formula), and rounding to the nearest integer. The Hardy-Ramanujan-Rademacher formula states that $$p(n)=\frac{1}{\pi \sqrt{2}} \sum_{k=1}^\infty \sqrt{k}\, A_k(n)\,
\frac{d}{dn} \left(
\frac {1} {\sqrt{n-\frac{1}{24}}}
\sinh \left[ \frac{\pi}{k}
\sqrt{\frac{2}{3}\left(n-\frac{1}{24}\right)}\right]
\right)$$ where $A_k(n)$ is a certain sum over roots of unity with a complicated definition.

This requires arbitrary-precision arithmetic, including the ability to compute $\pi$ and the exponential function $\exp(x)$ to high precision (in this case over 10 billion digits). In fact, the computation requires a small number of arithmetic operations at very high precision and a large number of operations at decreasing precision, with roughly equal time spent doing arithmetic with numbers of all sizes. It also requires doing a lot of arithmetic with word-size integers (modular arithmetic, factoring, GCDs, modular square roots, etc.). The same core operations are used for all kinds of tasks in computational number theory, so optimizing and stress-testing them is useful in a general sense.

Correctness

Since numerical approximations are used, it's a challenge to guarantee that the result actually is correct. For example, past versions of Maple gave the wrong value of p(n) for n = 11269, 11566, 12376, 12430..., presumably due to floating-point roundoff error and/or too few terms in the Hardy-Ramanujan-Rademacher series being used.

To avoid such problems, my implementation uses ball (mid-rad interval) arithmetic to bound the contribution of all rounding errors. The error that results from truncating the infinite series after a finite number of terms is also accounted for (using Rademacher's rigorous error bound). The final numerical result is a ball $[m-\varepsilon, m+\varepsilon]$ guaranteed to contain p(n), and the code checks that this interval contains exactly one integer, which then provably is the correct value. Of course, "provably correct" comes with the usual caveat that there could be bugs in the code (or even hardware errors). In this computation, the distance between the approximation $m$ and the nearest integer was smaller than 10⁻¹⁰, which provides a strong consistency check.

With computations of this magnitude, all kinds of things risk breaking. For example, after dumping p(10²⁰) to a file, reading it back gave a different value! It turns out that the size information written by the GMP/MPIR mpz_out_raw function overflows when the number takes more than 2³² bytes. Luckily, mpz_out_raw still writes all the extra bytes, so the correct value could be recovered (just not directly with mpz_inp_raw).

Software improvements

The computation was done with the same general algorithm as the previous record of p(10¹⁹), described in my 2012 paper, but with completely rewritten code for numerical evaluation. The implementation used for the 2011 record (still available in FLINT and easily accessed via Sage) uses floating-point arithmetic along with (incomplete) a priori error bounds. Deriving error bounds this way was cumbersome, and interfered with code and algorithm optimizations (changing the algorithm required doing a new analysis). This was one of my main motivations for developing Arb and reimplementing the partition function there.

The algorithm has time and space complexity O(n^0.5+ε), which is softly optimal since p(n) has Θ(n^0.5) bits. In theory, p(10²⁰) is therefore 10^0.5 ≈ 3.16 times more expensive to compute than p(10¹⁹). If you account for the n^ε factor, the time increases by a factor closer to 4 in practice when n increases by a factor 10. Memory increases quite precisely by a factor 10^0.5. In fact, the computation of p(10²⁰) only required 110 hours and 130 GB of memory, comparable to the 100 hours and 150 GB used for the computation of p(10¹⁹) done in 2011. Thus the efficiency (both time and space) has improved by more than a factor three. Computing p(10¹⁹) is now about as expensive as computing p(10¹⁸) used to be, and so on. Some of the speedup (about 30%) is simply due to using faster hardware, but most of it is due to the following improvements:

I wrote a new high-precision implementation of the exponential function (commit 1, 2), requiring 25% less time and 75% less memory than the old implementation (which called MPFR). This change reduced the memory requirements for the partition function by two thirds, making p(10²⁰) achievable on a system with 256 GB memory without swapping. There is also slightly faster code in Arb for low-precision exponentials.
Another speedup came from using a more efficient algorithm to compute roots of unity.
Since ball arithmetic is used instead of a priori error estimates to check that the final numerical approximation is close enough to an integer, the number N of terms in the Hardy-Ramanujan-Rademacher series is chosen slightly less conservatively (further optimization is possible here).
I also used the latest version (2.6.0) of MPIR, benefiting from Bill Hart's new Schönhage-Strassen FFT.
Finally, I cut the wall time nearly in half by using two threads (one thread for the first 16 terms in the Hardy-Ramanujan-Rademacher series, and one for all the remaining terms). I kept randomly getting "NaN"s as output when I tried writing multithreaded code 2011, but this time everything seems to work!

Not everything has gotten faster. Computing the error bounds automatically at runtime adds some overhead. The original FLINT implementation also uses hardware floating-point arithmetic and libm transcendental functions for low-precision calculations, which Arb avoids in order to guarantee the error bounds (actually, the libm log is used in one place right now, but in a safe way). These de-optimizations only turn out to make a significant difference for small n, up to 10⁸ or so.

Since 10²⁰ exceeds the maximum value of a 64-bit unsigned integer (2⁶⁴−1 ≈ 1.8 × 10¹⁹), I had to change the implementation to take a multiprecision integer as input. This is a trivial edit and doesn't cause any slowdown, because all "index" variables in the algorithm are smaller than some fixed multiple of n^0.5 and thus comfortably fit in a single word on a 64-bit system. As a byproduct, the implementation now works for input greater than 2³²−1 ≈ 4.3 × 10⁹ on 32-bit systems. I tested computing p(10¹⁰), p(10¹¹), ..., p(10¹⁶) on a 32-bit system, and all came out with the correct value. The fact that nothing broke when n crossed the word size boundary on a 32-bit system provided some reassurance that the word size boundary wouldn't pose a problem on a 64-bit system either. I haven't tested what p(10¹⁷) does to a 32-bit system yet; presumably, malloc will fail at some point since the computation requires more than 4 GB of memory.

I will push the latest modifications to the partition function code to the Arb git repository in the near future.

Going further

At the moment, my implementation can't be used to compute p(10²¹). The limitation is the GMP/MPIR mpz type, which overflows around 40 billion digits. The output would have 35 billion digits, and the internal calculations involve numbers up to twice as large.

To compute bigger values, you would also want better parallelization (the edit-run-debug cycle gets cumbersome when the "run" step starts taking weeks). The "tail" in the Hardy-Ramanujan-Rademacher series parallelizes embarrassingly well, but the first term takes nearly 1/2 of the total CPU time, so by Amdahl's law, it's only possible to gain roughly a factor two by distributing the terms. There is an "easy" way to parallelize the computation of the first term, but it would use much more memory. What you really want is parallel (and memory-efficient) FFT code to split individual bignum multiplications across several cores. You also need multithreaded code for the leaf computations in the binary splitting algorithms for $\pi$ and exp, but that's trivial to implement.

With a more sophisticated implementation, present-day hardware should allow computing numbers at least as large as p(10²⁴) (1.1 trillion digits). The $\pi$ record is 12 trillion digits (set by Shigeru Kondo and Alexander Yee over 94 days in 2013). Computing the partition function to a similar number of digits as $\pi$ requires slightly more memory (or swap space), and 10-20 times more computation time.

Computation statistics

The partition function was computed for a range of powers of ten to get an accurate picture of the efficiency, and to compare results with the previous computation up to n = 10¹⁹ (see the paper for the old timings).

n	Digits	N	Mem (MB)	Wall (s)	CPU (s)	T1 (s)	T2 (s)	Pi (s)	Exp (s)
10⁸	11132	3328	2.8	0.037	0.07	0.031	0.032	0.004	0.016
10⁹	35219	9721	3.05	0.237	0.29	0.061	0.227	0.008	0.035
10¹⁰	111391	28601	4.89	0.441	0.75	0.307	0.429	0.039	0.184
10¹¹	352269	84632	9.82	1.834	3.29	1.449	1.805	0.168	0.841
10¹²	1113996	251600	19.82	6.247	10.39	6.169	4.159	1.004	3.475
10¹³	3522791	750925	61.31	25.637	43.06	25.436	17.435	3.282	15.834
10¹⁴	11140072	2248842	202.81	104.07	188.33	103.57	84.346	12.389	66.326
10¹⁵	35228031	6754864	553.77	441.603	845.42	404.175	440.417	47.035	263.662
10¹⁶	111400846	20343453	1469.73	1614.84	2858.31	1611.65	1244.68	194.188	1099.58
10¹⁷	352280442	61413562	4593.42	7398.2	13643.7	6251.35	7389.06	762.067	4167.94
10¹⁸	1114008610	185795691	12937.41	23354.6	44330.3	23327.1	20997	2913.12	16171.8
10¹⁹	3522804578	563188404	38881.84	87825.3	170934	87742.6	83184.9	10860.9	61796.3
10²⁰	11140086260	1710193158	131071	399206	738425	339610	398959	42394.2	239099

Digits: number of decimal digits in p(n)
N: number of terms used in the Hardy-Ramanujan-Rademacher series
Mem: peak memory usage (resident set) during the computation
Wall: total wall time of the computation
CPU: total CPU time of the computation
T1: time for thread 1 (terms 1 to 16)
T2: time for thread 2 (terms 17 to N)
Pi: time to compute $\pi$ to full precision (T1)
Exp: time to compute $\exp(x)$ to full precision (T1)

Edit: Ondrej Certik has put up some plots of the timing data.

Data about p(10²⁰)

The full number is not currently available for download, as it is 4.6 GB in packed binary format. If you really want a copy, contact me.

First (most significant) 1000 digits:

18381765083448826436460575151963949703661288601871
33818794921830680916179355851922605087258953579721
40223563322445693766015017321098501101538956018970
56750704881757794605360897959220562692957211960097
79165931101694124929410280437321791578706853243973
22343094930548700253436959560776678370296907881368
37733272436267730567242440339288196974095792769740
01925654844091988401959507036970411771619993123017
55657669949970363935278042362879420060693386042760
13262928827958224606797122816605218568004925911126
29907354977685310316372900346545431298736807328236
50133058574692900439957866102150361523427469350908
59399440163148946448629438752730130574754978538031
93701800260362511405139309272932783176699960093172
44722089087499304505113827984945896266787952281116
32461474616064076042784673307459319183309808217053
62780971602345567755068535553357149769564868882234
06599266703753282060975654469606040406400277593325
73615936642247999221376534428162036372745061861599
90865952198639022438268000053417742912214985581755

Last 1000 digits:

78197224010162778696625071481567773251610572738944
98214946475632569813210181967142533195900338936713
00729210224771807814522142198236195554260002554854
55685193690894876043353240892759544284884999526365
17052321124780138610958503741195160905409982613603
99012436018076726035176887245814122589275259511505
76750908721608880594390177630345942812168691609194
42278684582644873256506162684010823277784461258669
33049338620374335600858723383185679297378002523565
35242780532986925919014811158854151260764076718769
12107459394370915408731078173696015862413625559934
64357221260733772591049732717331101157274865785996
97365738341459496183914647648616383023616407003926
60829384832774130551031058595191226473452491491856
86443513570173392572278853185935691354917765496681
36185969884296385252170324208992786041520483060786
38273697959512945079613267334079249909549771893343
24588609360545342374268332247926111640460856581625
95976612501746024798615243022620019559707707032875
82462984472325700899198905833521126231756788091448

Digit distribution (fun fact: radix conversion takes over 3 hours):

Digit	Occurrences
0	1114003393
1	1114004671
2	1114044302
3	1113995927
4	1114027762
5	1114020496
6	1113993310
7	1113982144
8	1114004097
9	1114010158

First 1000 hexadecimal digits (total hex length: 9251641382):

116045344298e737ba5da19152a5285e35ca8d47406081d91a
0289604dbb4c5d1ebe513da006ddeaa116a22b76903822c9db
1797a70dcc6a5a6976ce89a7fd4bb8c2db5192b5e7ff04fcc5
a571dd6250eae6be39c2f0c1b22616e171300d16b362ac4fc7
051c6e5a1115c075316c51425c666a84467557ab501e85d193
1499a5f2ba1df7d3f1ad05b6a124275ab9a8ae68770190e559
d1c23b23d6a483cc8d1413a99bb4e4f066a0cd803bed7ab832
a6f2eaa2fbd0fdbf867967b4cd57d4346705746c4a179788d7
4a9ccafac78be12016fd5855afade01dfb9945bdc746f8ee7c
1190dfc06ef7ef7a2b6bf747ff68e6296225e450bb97db4748
32d3eb270c028857f511b197e7b25ae0ab946eaf8aa5b08ce2
68d4a5c633d699ace4f38b9b12ecbdd7be643323f3532a863a
e99bfe37416761902a6a5aebc62b9e9e310d87fb3436cf4b1d
0c37c67f7aa6daf052214d6768a0e9ab93f2dab3d98d6e7954
f39475e11b9ca54cad2fac43a4b44353422654a8cf70805bda
d66e161a5b0eb3834d24db0d8a7dc9d02a861337ebec308b41
ab8704d940850892f55fddbe15676981f70d4ddc2c713e37c5
cb1417f15322ea5993f268227251fc43f5de135a876fd42fdc
028bc371ec5164bd01b8be057822692c8ca5bafbd08f7f4910
04e294464ec7f4f55a8a38e3656997765b5e19ae774b6cdc9d

Last 1000 hexadecimal digits:

73ba2f6029363349801fc09a4b0273afd1a29296b76867b997
c28461ae194cd690d2f32b73d4e4798f42c27549927d2a4d72
4616b066c4d681f89275601d4a2a74054ca372bf994d590142
a3a8ca2fde857e286fbdca0636f22b315406da61738d94ef9b
758be0447d67e032b7901867143cf9041b63986faab0cec017
ed9c8c26157be06d8eeb44572402db6f4681e9158fea0e97e7
1e760ecbf494c6399b6bb74b8c49872a855f3dc2207a3f1426
ad2c5b0629104e594a4c87810473a9774d592e0981e1a22109
5ba77121fb3cac8dc1346d37dd8ea1a6b2c1c499965621f4fa
b8e98e94c0faf557860c74ce133dcc22859363920fb1bbe05c
2d31ae71cf4c7874c1c8dc39b32ad6e077b7805c1dd500f659
5993c92f0d65192b126ba993d596b7c1360c4b38338b1f9092
da283348e016e98b8c50874967dadb892f1d35ce4bd7a1bd95
8f5e62ede86157db408e0264c3bf27f4724c76b1ab1b80f58b
94cd970e119be5eaf6d4a4c2b88f2846e81294e8b98c5cf7fe
924213c507a14ac7f23a86ac3e7c953d5a9e931e28ea50c1ae
77499d3afd1e43653b29fd563045da4f08ff0a47462ec262cc
8a1492b9b6ac7d04d59d5c30c5d8ed60c43882e1662f3ad862
29ee775302d477370b97e7297fbb498915f8e9e2661f839cf1
bc2a572e10f9d1908f4d9f0c219892c3bc6274570698de4a38

Edit: replaced table of limbs with hex digits to avoid confusion.

Value modulo small primes:

2	0	3	0	5	3	7	3	11	10	13	7	17	1	19	9
23	22	29	25	31	26	37	14	41	19	43	38	47	7	53	10
59	6	61	36	67	62	71	53	73	29	79	23	83	63	89	51
97	79	101	52	103	9	107	64	109	43	113	81	127	9	131	96
137	21	139	132	149	132	151	11	157	123	163	129	167	42	173	8
179	161	181	173	191	162	193	106	197	66	199	40	211	24	223	181
227	20	229	111	233	85	239	8	241	7	251	25	257	33	263	202
269	222	271	100	277	264	281	176	283	109	293	190	307	277	311	308
313	217	317	248	331	16	337	233	347	56	349	342	353	79	359	146
367	6	373	294	379	44	383	362	389	103	397	49	401	80	409	63
419	65	421	320	431	318	433	295	439	401	443	54	449	394	457	99
461	281	463	43	467	18	479	273	487	372	491	392	499	453	503	362
509	483	521	30	523	219	541	527	547	254	557	213	563	499	569	420
571	358	577	174	587	332	593	228	599	573	601	334	607	533	613	352
617	263	619	226	631	88	641	361	643	201	647	126	653	520	659	614
661	178	673	171	677	274	683	532	691	58	701	286	709	601	719	716
727	489	733	546	739	610	743	707	751	690	757	495	761	71	769	385
773	237	787	602	797	590	809	293	811	668	821	258	823	394	827	801
829	642	839	288	853	750	857	51	859	730	863	312	877	801	881	335
883	120	887	232	907	355	911	728	919	145	929	15	937	154	941	859
947	186	953	311	967	455	971	872	977	963	983	93	991	572	997	152

Numberwang: yes

fredrikj.net | Blog index | RSS feed | Follow me on Mastodon | Become a sponsor

New partition function record: p(1020) computed