# Benchmarking exact DFT computation

September 14, 2020

The discrete Fourier transform (DFT) is a standard numerical benchmark. What about the DFT with exact numbers? This post will compare the performance of Calcium, Sage's QQbar (exact field of algebraic numbers), Sage's SymbolicRing, SymPy, and Mathematica. The benchmark problem will be to evaluate $\textbf{x} - \operatorname{DFT}^{-1}(\operatorname{DFT}(\textbf{x}))$ where $$\operatorname{DFT}(\textbf{x})_n = \sum_{k=0}^{N-1} \omega^{-kn} x_k, \quad \operatorname{DFT}^{-1}(\textbf{x})_n = \frac{1}{N} \sum_{k=0}^{N-1} \omega^{kn} x_k, \quad \omega = e^{2 \pi i / N}$$

for a few different input sequences $\mathbf{x} = (x_n)_{n=0}^{N-1}$. To make the benchmark meaningful, we require that $\textbf{x} - \operatorname{DFT}^{-1}(\operatorname{DFT}(\textbf{x}))$ simplifies to the zero vector. We thereby test the ability to contract (simplify) expressions and not just to expand. If a system fails to prove that some entry in the output vector is zero, it gets a (FAIL).

We test the following inputs: $x_n = n+2$, $x_n = \sqrt{n+2}$, $x_n = \log(n+2)$, $x_n = e^{2 \pi i / (n+2)}$, $x_n = \frac{1}{1+(n+2) \pi}$, and $x_n = \frac{1}{1+\sqrt{n+2} \pi}$, for lengths $N = 2,\ldots,20$ and $N = 30, 50, 100$ where reasonable.

## Calcium example

This benchmark is implemented in the dft.c example program for Calcium. Here is a sample run with $N = 4$ and the sequence $x_n = \log(n+2)$:

## Appendix: benchmarking code snippets

Thanks to Guillaume Moroz for the following Maple implementation:

idftdft := proc(coefficients)
local N, unityroots, dft, idft, zeroes, result;
N := nops(coefficients);
unityroots := [seq(exp(I*2*Pi*k/N), k in 0..N-1)];
dft := [ seq( add(coefficients[1+k]*unityroots[1+(k*n mod N)],
k in 0..N-1),
n in 0..N-1) ];
idft := [ seq( add(dft[1+k]*unityroots[1+(-k*n mod N)],
k in 0..N-1)/N,
n in 0..N-1) ];
zeroes := map(simplify, coefficients - idft):
result := andmap(=, zeroes, 0);
printnonzero(zeroes);
return result;
end proc:

printnonzero := proc(L)
local k;
if not andmap(=, L, 0) then
for k from 1 to nops(L) do
if not evalb(L[k] = 0) then
print(k, L[k]);
end if;
end do;
end if;
end proc:

bench := proc(f, N)
local coefficients, st, check, timing;
forget(..); # Does not clear enough cache data
lprint(f, N);
coefficients := [seq(f(n), n in 0..N-1)];
st := time():
check := idftdft(coefficients);
timing := time() - st;
lprint(check);
printf("%.3f\n\n",timing);
end proc:

bench(n -> n+2, 6):
bench(n -> n+2, 8):
bench(n -> n+2, 16):
bench(n -> n+2, 20):
bench(n -> n+2, 100):

#bench(n -> sqrt(n+2), 6):
#bench(n -> sqrt(n+2), 8):
#bench(n -> sqrt(n+2), 16):
#bench(n -> sqrt(n+2), 20):
#bench(n -> sqrt(n+2), 100): # long

#bench(n -> log(n+2), 6):
#bench(n -> log(n+2), 8):
#bench(n -> log(n+2), 16):
#bench(n -> log(n+2), 20):
#bench(n -> log(n+2), 100): # long

#bench(n -> exp(2*Pi*I/(n+2)), 6):
#bench(n -> exp(2*Pi*I/(n+2)), 8):
#bench(n -> exp(2*Pi*I/(n+2)), 16):
#bench(n -> exp(2*Pi*I/(n+2)), 20):
#bench(n -> exp(2*Pi*I/(n+2)), 100): # long

#bench(n -> 1/(1-(n+2)*Pi), 6):
#bench(n -> 1/(1-(n+2)*Pi), 8):
#bench(n -> 1/(1-(n+2)*Pi), 16):
#bench(n -> 1/(1-(n+2)*Pi), 20):
#bench(n -> 1/(1-(n+2)*Pi), 100): # long

#bench(n -> 1/(1+sqrt(n+2)*Pi), 6):
#bench(n -> 1/(1+sqrt(n+2)*Pi), 8):
#bench(n -> 1/(1+sqrt(n+2)*Pi), 16): # long
#bench(n -> 1/(1+sqrt(n+2)*Pi), 20):
#bench(n -> 1/(1+sqrt(n+2)*Pi), 100):


Mathematica:

check[NNN_] := Module[{NN = NNN},
ClearSystemCache[];
xs = Table[i+2, {i, 0, NN-1}];
ws = Table[Exp[2 Pi I k / NN], {k, 0, 2 NN - 1}];
ys = Table[Sum[xs[[k+1]]* ws[[1+Mod[-k*n, 2 NN]]], {k, 0, NN-1}], {n, 0, NN - 1}];
xs2 = Table[FullSimplify[Sum[ys[[k+1]]* ws[[1+Mod[k*n, 2 NN]]], {k, 0, NN-1}] / NN == xs[[n+1]]], {n, 0, NN - 1}]];
n=11; reps=1; {n, Timing[Do[check[n],{i,1,reps}]][]/reps}


Comment: Daniel Schultz suggested using omega = ToNumberField[(-1)^(2/NN)]; ws = omega^Range[0, 2*NN-1]; in Mathematica. This seems to help on test A (expectedly) while it slows down the others.

SymPy and SageMath:

def benchSymPy(N, which=0, verbose=False):
if which == 0:
x = [S(i+2) for i in range(N)]
elif which == 1:
x = [sqrt(i+2) for i in range(N)]
elif which == 2:
x = [log(i+2) for i in range(N)]
elif which == 3:
x = [exp(2*pi*I/(i+2)) for i in range(N)]
elif which == 4:
x = [1/(1+(i+1)*pi) for i in range(N)]
elif which == 5:
x = [1/(1+sqrt(i+1)*pi) for i in range(N)]
w1 = exp(2*pi*I/N)
w = [w1**i for i in range(2*N)]
y = [S(0) for i in range(N)]
z = [S(0) for i in range(N)]
for k in range(N):
for n in range(N):
y[k] += x[n] * w[(-k * n) % (2*N)]
for k in range(N):
for n in range(N):
z[k] += y[n] * w[(k * n) % (2*N)]
z[k] /= N
eq = simplify(z[k] - x[k])
if eq != 0:
print(eq)
raise ValueError

def benchQQbar(N, which=0):
if which == 0:
x = [QQbar(i+2) for i in range(N)]
elif which == 1:
x = [QQbar(i+2).sqrt() for i in range(N)]
elif which == 3:
x = [QQbar.zeta(i+2) for i in range(N)]
w1 = QQbar.zeta(N)
w = [w1**i for i in range(2*N)]
y = [QQbar(0) for i in range(N)]
z = [QQbar(0) for i in range(N)]
for k in range(N):
for n in range(N):
y[k] += x[n] * w[(-k * n) % (2*N)]
for k in range(N):
for n in range(N):
z[k] += y[n] * w[(k * n) % (2*N)]
z[k] /= N
if z[k] - x[k] != 0:
raise ValueError

def benchSR(N, which=0, verbose=False):
if which == 0:
x = [SR(i+2) for i in range(N)]
elif which == 1:
x = [SR(i+2).sqrt() for i in range(N)]
elif which == 2:
x = [SR(i+2).log() for i in range(N)]
elif which == 3:
x = [SR(exp(2*pi*I/(i+2))) for i in range(N)]
elif which == 4:
x = [SR(1/(1+(i+1)*pi)) for i in range(N)]
elif which == 5:
x = [SR(1/(1+sqrt(i+1)*pi)) for i in range(N)]
w1 = SR(exp(2*pi*I/N))
w = [w1**i for i in range(2*N)]
y = [SR(0) for i in range(N)]
z = [SR(0) for i in range(N)]
for k in range(N):
for n in range(N):
y[k] += x[n] * w[(-k * n) % (2*N)]
for k in range(N):
for n in range(N):
z[k] += y[n] * w[(k * n) % (2*N)]
z[k] /= N
eq = (z[k] - x[k]).full_simplify()
if eq != 0:
raise ValueError