fredrikj.net / blog /

# Factorials mod n and Wilson’s theorem

*March 19, 2012*

Wilson’s theorem states that an integer $n > 1$ is prime if and only if $(n-1)! \equiv -1 \bmod n$. This immediately gives a simple algorithm to test primality of an integer: just multiply out $1 \times 2 \times \cdots \times (n-1)$, reducing each intermediate product modulo $n$, and check that the final result equals $n-1$.
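In Python, the test is a few lines (the function name is mine, for illustration):

```python
def is_prime_wilson(n):
    """Wilson's theorem test: n > 1 is prime iff (n-1)! == -1 (mod n)."""
    if n < 2:
        return False
    f = 1
    for k in range(2, n):
        f = f * k % n  # reduce each intermediate product modulo n
    return f == n - 1

print([k for k in range(2, 30) if is_prime_wilson(k)])
# → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```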

The running time is obviously $O(n)$ (assuming that $n$ fits in a single machine word so that arithmetic has constant cost). While elegant, this algorithm is useless in practice because even the most basic non-stupid primality test, trial division, only costs $O(n^{1/2})$ in the worst case.

Perhaps surprisingly, the complexity of primality testing using Wilson’s theorem can be brought down to about $O(n^{1/2})$ as well. The idea is to use fast polynomial arithmetic to compute the factorial faster than by the naive method. Assuming for simplicity that $n-1$ is a perfect square, let $m = (n-1)^{1/2}$. We form the polynomial $P(x) = (x+1)(x+2) \cdots (x+m)$ and evaluate $P(x)$ simultaneously at the points $x = 0, m, \ldots, m(m-1)$. Then $P(0) P(m) \cdots P(m(m-1)) = (n-1)!$. If $n-1$ is not a perfect square, we choose $m = \lfloor n^{1/2} \rfloor$ and fill in the missing factors by repeated naive multiplication.

All the required operations can be done in roughly $O(n^{1/2})$ time using FFT-based polynomial multiplication and balanced subproduct trees. The complexity is actually slightly higher, about $O(n^{1/2} \log^3 n)$, and the constant overhead is huge, so the method is still considerably slower than trial division. But it is interesting to see how it performs in practice.
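A balanced subproduct tree simply multiplies the linear factors pairwise, then the results pairwise, and so on, so that the big multiplications happen between polynomials of equal size. A minimal sketch (schoolbook multiplication here; the real algorithm substitutes FFT-based multiplication at each level):

```python
def polymul_mod(A, B, n):
    """Schoolbook product of two polynomials with coefficients mod n."""
    C = [0] * (len(A) + len(B) - 1)
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            C[i + j] = (C[i + j] + a * b) % n
    return C

def product_tree(polys, n):
    """Multiply a list of polynomials in a balanced (pairwise) fashion."""
    while len(polys) > 1:
        polys = [polymul_mod(polys[i], polys[i + 1], n)
                 if i + 1 < len(polys) else polys[i]
                 for i in range(0, len(polys), 2)]
    return polys[0]

# P(x) = (x+1)(x+2)(x+3)(x+4) mod 101, coefficients lowest degree first
P = product_tree([[i, 1] for i in range(1, 5)], 101)
# → [24, 50, 35, 10, 1], i.e. x^4 + 10x^3 + 35x^2 + 50x + 24
```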

Since I recently implemented fast multipoint evaluation in FLINT, the fast factorial algorithm became easy to implement as well. In my repository, it is now enabled by default for computing factorials modulo an integer (`n_factorial_mod2_preinv`) when the input is large enough; the code is here. Here is how it compares to the naive algorithm for computing $(n-1)!$ modulo $n$:

| $n$ | Naive factorial | Fast factorial |
|---|---|---|
| $10$ | 12 ns | 1.2 μs |
| $10^{2}$ | 0.46 μs | 4.4 μs |
| $10^{3}$ | 7 μs | 22 μs |
| $10^{4}$ | 78 μs | 100 μs |
| $10^{5}$ | 0.89 ms | 0.52 ms |
| $10^{6}$ | 9.8 ms | 3.1 ms |
| $10^{7}$ | 110 ms | 18 ms |
| $10^{8}$ | 1.2 s | 0.12 s |
| $10^{9}$ | 12 s | 0.71 s |
| $10^{10}$ | 151 s | 3.5 s |
| $10^{11}$ | 1709 s | 15 s |
| $10^{12}$ | 5 h (est.) | 70 s |
| $10^{13}$ | 50 h (est.) | 307 s |
| $10^{14}$ | 500 h (est.) | 1282 s |

The numbers agree reasonably well with theory. The naive algorithm does not take exactly a constant multiple of $n$ time because the arithmetic runs slightly faster when several factors can be accumulated in a single word, and the ratio per order of magnitude for the fast algorithm is a bit larger than 4 rather than $10^{1/2} \approx 3.16$, which is reasonable considering the extra log factors in the complexity. Around $10^{15}$, my laptop runs out of memory. To compute larger factorials, one could choose $m$ smaller than the square root of the input and perform several multipoint evaluations. Choosing a fixed large $m$ (say $10^{7}$) gives an algorithm with $O(n)$ complexity, but with a much smaller constant than the naive factorial algorithm.

Although we have achieved a factor 1000 speedup over the naive factorial algorithm and made Wilson’s theorem a *feasible* primality test for numbers as large as 15 digits without requiring special hardware or patience, it remains completely useless for practical purposes. Trial division is around 10,000 times faster (dividing by all integers up to $(10^{14})^{1/2}$ at 10 ns per iteration takes 0.1 seconds), and sophisticated algorithms are much faster still (the FLINT function `n_is_prime` tests primality of a number this size in about 7 μs, and `n_factor` factors it in about 500 μs).
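For comparison, trial division fits in a few lines (plain Python here; the 10 ns per iteration quoted above is for optimized C):

```python
from math import isqrt

def is_prime_trial(n):
    """Trial division: O(sqrt(n)) divisions in the worst case."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True
```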