# The Arb matrix revolutions

July 3, 2018

The previous post on linear algebra in Arb ended with the cliffhanger that the improvements for multiplication and linear solving had yet to be adapted for computing determinants and for complex matrices. That is now done.

## Complex matrix multiplication

Except for very small matrices, acb_mat_mul now performs a complex matrix multiplication by reordering the data and doing four real matrix multiplications via arb_mat_mul. This benefits from the faster real matrix multiplication algorithm described in the previous post.
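Concretely, splitting each factor into its real and imaginary parts turns one complex matrix product into four real ones plus two additions:

$$(A + Bi)(C + Di) = (AC - BD) + (AD + BC)i,$$

where $A, B, C, D$ are real matrices, so each of the four products $AC$, $BD$, $AD$, $BC$ can be dispatched to arb_mat_mul.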

As a benchmark of acb_mat_mul, we multiply the size-$n$ complex DFT matrix $$A_{j,k} = \frac{\omega^{jk}}{\sqrt{n}}, \quad \omega = e^{-2\pi i/n}$$ with a copy of itself. (The utility method acb_mat_dft that constructs this matrix has been added to Arb.)

At 53-bit precision, we observe the following speedup:

| $n$  | Time, Arb 2.13 (s) | Time, Arb 2.14-git (s) | Speedup |
|-----:|-------------------:|-----------------------:|--------:|
| 10   | 0.000228           | 0.000143               | 1.59    |
| 30   | 0.00762            | 0.00175                | 4.35    |
| 100  | 0.305              | 0.0416                 | 7.33    |
| 300  | 9.375              | 0.668                  | 14.0    |
| 1000 | 408.6              | 17.7                   | 23.1    |

The new algorithm is not only faster but also more accurate. The following shows entry $(n/2,n/2)$ of the output matrix:

n      Arb 2.13                            Arb 2.14-git
==============================================================================
10     [1.0000000000000 +/- 2.01e-15]      [1.00000000000000 +/- 7.96e-16]
30     [1.0000000000000 +/- 2.79e-15]      [1.00000000000000 +/- 6.38e-16]
100    [1.000000000000 +/- 1.43e-14]       [1.00000000000000 +/- 6.11e-16]
300    [1.000000000000 +/- 3.06e-14]       [1.00000000000000 +/- 4.63e-16]
1000   [1.00000000000 +/- 1.48e-13]        [1.00000000000000 +/- 7.72e-16]


The performance improvements for different matrices and levels of precision are similar to those for real matrices, described in the previous post.

## Complex solving and inverse

Linear solving uses the same preconditioning algorithm as in the real case, and the new code for complex matrices likewise benefits from fast matrix multiplication through the use of block recursive LU factorization.

As a benchmark of acb_mat_inv and acb_mat_solve (the former just calls the latter to solve $AX = I$), we compute the inverse of the size-$n$ complex DFT matrix, starting at 53-bit precision and doubling the precision until the matrix is proved to be invertible.

| $n$  | Time, Arb 2.13 (s) | Time, Arb 2.14-git (s) | Speedup |
|-----:|-------------------:|-----------------------:|--------:|
| 10   | 0.0004             | 0.0006                 | 0.67    |
| 30   | 0.0104             | 0.0097                 | 1.07    |
| 100  | 0.896              | 0.22                   | 4.07    |
| 300  | 54.4               | 3.76                   | 14.5    |
| 1000 | 3996               | 83.8                   | 47.7    |

For sizes 10, 30, 100, 300 and 1000, the old algorithm terminates at 53, 53, 212, 848 and 1696 bits of precision respectively. The new algorithm terminates at 53 bits every time for this well-conditioned matrix. The following shows the entry at position $(n/2,n/2)$ in the output matrix at 53-bit precision:

n      Arb 2.13                                             Arb 2.14-git
===============================================================================================================
10     [-0.3162277660 +/- 5.04e-11] + [+/- 3.35e-11]*I      [-0.31622776601684 +/- 4.22e-15] + [+/- 2.47e-15]*I
30     [+/- 10.7] + [+/- 10.5]*I                            [-0.18257418583506 +/- 8.22e-15] + [+/- 3.70e-15]*I
100                                                         [0.10000000000000 +/- 6.72e-15] + [+/- 6.63e-15]*I
300                                                         [0.0577350269190 +/- 4.91e-14] + [+/- 1.19e-14]*I
1000                                                        [0.0316227766017 +/- 4.34e-14] + [+/- 2.71e-14]*I


The new algorithm loses precision very slowly. Although the old algorithm succeeds at 53-bit precision for both $n = 10$ and $n = 30$, the accuracy with the new algorithm is far superior.

## Real and complex determinants

Determinants of large matrices now use a preconditioning algorithm described in a 2010 paper by Siegfried Rump (a different preconditioning algorithm for determinants is given in the Hansen-Smith paper from 1967, but I tested this first and it does not seem to work well). The idea is simple: we first compute an approximate LU factorization $A \approx PLU$, then compute approximate inverses $L' \approx L^{-1}$ and $U' \approx U^{-1}$, and build the matrix $B = L' P^{-1} A U'$ using ball arithmetic. $B$ is close to the identity matrix, so we can determine its determinant accurately using the Gershgorin circle theorem. The determinants of the triangular matrices $L'$ and $U'$ and the permutation matrix $P$ are of course trivial to compute, and this gives us the determinant of $A$.

There is actually a subtle issue with the algorithm as Rump describes it: Rump takes the product of the Gershgorin circles to enclose the determinant, but it does not follow from the Gershgorin circle theorem that this is correct since there is not a one-to-one correspondence between Gershgorin circles and eigenvalues (unless the circles are disjoint). I have been informed by private communication that a proof of correctness exists for real matrices (by a more elaborate argument), but the question for complex matrices is apparently an open theoretical problem! Regardless, this gap in the algorithm is easily fixed without such a proof: one just picks a single circle enclosing all the Gershgorin circles and then raises this to the $n$-th power. This gives bounds that are slightly more pessimistic but still acceptable.
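In symbols: if $D(m, r)$ is a single disc containing all $n$ Gershgorin discs of $B$, then every eigenvalue $\lambda_k$ of $B$ lies in $D(m, r)$, and therefore

$$\det(B) = \prod_{k=1}^{n} \lambda_k \in D(m, r)^n,$$

where the $n$-th power is evaluated in complex ball arithmetic. Since $B$ is close to the identity matrix, the discs are centered near $1$ with small radii, so the bound stays tight.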

As a benchmark of acb_mat_det, we compute the determinant of the size-$n$ complex DFT matrix, starting at 53-bit precision and doubling the precision until the determinant is proved to be nonzero.

| $n$  | Time, Arb 2.13 (s) | Time, Arb 2.14-git (s) | Speedup |
|-----:|-------------------:|-----------------------:|--------:|
| 10   | 0.000109           | 0.000109               | 1.00    |
| 30   | 0.0033             | 0.0095                 | 0.35    |
| 100  | 0.429              | 0.254                  | 1.69    |
| 300  | 27.4               | 3.88                   | 7.06    |
| 1000 | 1712               | 94.5                   | 18.1    |

For sizes 10, 30, 100, 300 and 1000, the old algorithm terminates at 53, 53, 212, 848 and 1696 bits of precision respectively. The new algorithm terminates at 53 bits every time for this well-conditioned matrix. The result at 53-bit precision is as follows:

n      Arb 2.13                                             Arb 2.14-git
===============================================================================================================
10      [-1.0000000000 +/- 5.68e-12] + [+/- 5.67e-12]*I     [-1.0000000000 +/- 8.70e-12] + [+/- 8.69e-12]*I
30      [1.0 +/- 7.08e-3] + [+/- 7.08e-3]*I                 [1.000000000 +/- 1.98e-11] + [+/- 1.98e-11]*I
100                                                         [+/- 9.43e-10] + [1.0000000 +/- 9.43e-10]*I
300                                                         [+/- 2.90e-8] + [1.000000 +/- 2.90e-8]*I
1000                                                        [+/- 1.73e-6] + [-1.0000 +/- 1.73e-6]*I


Determinants seem to lose precision slightly faster than the entries in the inverse matrix. Nevertheless, only 10 digits are lost at $n = 1000$.

With $n = 30$, the new algorithm is three times slower than the old one but much more accurate. The old code is very slightly more accurate at $n = 10$, but this is essentially just numerical noise.

For completeness, we also give benchmark results for computing the determinant of the real DCT matrix using arb_mat_det:

| $n$  | Time, Arb 2.13 (s) | Time, Arb 2.14-git (s) | Speedup |
|-----:|-------------------:|-----------------------:|--------:|
| 10   | 0.000046           | 0.000048               | 0.96    |
| 30   | 0.00105            | 0.0024                 | 0.44    |
| 100  | 0.124              | 0.052                  | 2.38    |
| 300  | 3.761              | 0.783                  | 4.80    |
| 1000 | 498                | 17.8                   | 23.8    |

The 53-bit output:

n      Arb 2.13                             Arb 2.14-git
===============================================================================
10      [-1.00000000000 +/- 7.98e-13]       [-1.00000000000 +/- 7.98e-13]
30      [-1.0000 +/- 1.99e-6]               [-1.000000000 +/- 1.29e-11]
100                                         [1.00000000 +/- 5.53e-10]
300                                         [1.000000 +/- 1.65e-8]
1000                                        [1.000000 +/- 5.24e-7]