Computing with metavalues

February 24, 2021

This post will explore the distinction between values and metavalues (representations) as the objects of computation, particularly in a mathematical software context. A clear separation between values and metavalues is an idea that appears in symbolic computation systems, compilers, code analysis tools, and interval arithmetic software (among other settings), but it has arguably never been pursued to its full potential, especially at the level of programming language design.

The case can be made that many bugs and footguns in computer algebra systems and programming languages are the result of a confused treatment of values and metavalues (see Figure 1), and the title of this post could just as well have been "Why is my mathematical software full of bugs?". A satisfactory answer to that question would require considering other aspects; for example, infinities and singularities are a largely orthogonal issue (though there are points of overlap with the value-metavalue problem), and then of course there are all the usual categories of bugs that are not specific to mathematical software. Nevertheless, I believe that metavalue mismanagement is particularly fundamental.

Figure 1: A chimera, Jacopo Ligozzi (1547-1627).

This topic is closely related to the problem of defining equality/equivalence/isomorphism (see for example When is one thing equal to some other thing? by Barry Mazur) and other problems in metamathematics, though my concern in this post will be more with concrete software implementation issues than with foundations of mathematics. Complex isomorphisms are a challenge when formalizing modern abstract mathematics, but misuse of metavalues leads to incorrect present-day implementations of mathematical objects that had correct foundations 2000 years ago.

Contents

Values and metavalues
- Examples
Uncertainty propagation
Metavalue fallacies: where implementations get it wrong
Ideas for better software

Values and metavalues

I will use value to refer to any mathematically definable object and metavalue to refer to a description of a value used for computations. An isomorphic representation of a value qualifies a metavalue, but a metavalue need not unambiguously identify a specific value; it suffices for a metavalue to describe a property of a value (possibly shared by several possible values), or to provide an approximate or heuristic description. It may also carry extra information (metadata) that has nothing to do with the value itself.

This notion of metavalue is informal, and the terminology is not necessarily standard, with apologies to any professional logicians who happen to read this. I want to avoid using a term that either sounds too general or has a preexisting technical meaning (model, abstraction, representation, approximation, image, etc.). In software, metavalues may appear explicitly (as distinct entities from values), implicitly (in the semantics of operations), or informally (in usage patterns). The distinction is mainly one of intent: a metavalue is meant to be used as a substitute for a (true) value.

Values are the objects we would ideally like to represent and manipulate in software. Here are some examples of interesting values, all of which can defined formally within standard foundations of mathematics (I will ignore issues such as the meaning of definability and whether foundations are consistent):

The tuple $(1, 2, 3)$.
The boolean $\operatorname{True}$.
The real number $2 \pi$.
The set $\mathbb{Q}$.
The sequence $(0, 1, 1, 2, 3, 5, 8, 13, \ldots)$.
The function $\exp : \mathbb{C} \to \mathbb{C} \setminus \{0\}$.
The category $\operatorname{Top}$ of topological spaces.
The integer $\operatorname{TREE}(3)$.
The truth value of the proposition "white has a winning sequence of moves from the starting position in chess".
The set of nonreal zeros of $\zeta(s)$ with real part not equal to $1/2$.

Why do we need metavalues? Simple values such as the tuple $(1, 2, 3)$ correspond quite well to the values of native types in programming languages. More formally, it is easy to encode such simple values on a Turing machine in a way that permits carrying out natural operations (addition of integers, concatenation of tuples, testing equality, etc.) algorithmically. However, the vast majority of specific mathematical objects are impractical or impossible to represent as such. Indeed, most individual real numbers, sets, etc. cannot be represented in any conceivable form on an actual computer since they would require an infinite amount of data: a number like $2 \pi$ or a set like $\mathbb{Q}$ is an exception which happens to admit a simple symbolic description using standard formalisms. Even if we acknowledge that objects we cannot even describe are beyond scope for any form of computation, there are plenty of difficulties left when working with finitely definable objects:

Most objects with finite descriptions are too large to represent explicitly on a physical computer (for example, $\operatorname{TREE}(3)$ is a tad too large to store in a GMP mpz_t).
Values that are clearly representable need not be computable in an explicit, meaningful way. For example, if a predicate is undecidable, we cannot explicitly reduce it to True or False (although either is perfectly representable). If it is undecidable whether a given real number $x$ represents zero, we cannot evaluate $\operatorname{sgn}(x)$ to an explicit number $-1, 0, +1$.
Even explicitly representable and computable values may be too costly or inconvenient to actually implement.
In some cases, we deliberately want to avoid representing specific values, in order to model a range of possible outcomes.

For these reasons, it is often convenient (and sometimes necessary) to work with metavalues—indirect, partial or approximate descriptions of values—instead of (perfectly faithful representations of) the values themselves.

Computing with metavalues, in a narrow sense, means taking a (mathematically specified) program and running it with metavalues instead of values, either by explicitly executing the steps of the program with metavalues instead of values or by performing an abstract or symbolic analysis to deduce a metavalue for the return value of the program (for simplicity, I will only consider pure functional programs). More generally, it is often useful to move between levels of abstraction and treat metavalues themselves as values; unfortunately, this comes with the risk of conflating value and metavalue semantics (an extremely common error in actual software).

Using metavalues, we can simulate programs running on abstract mathematical machines manipulating perfect mathematical objects. Indeed, computing with metavalues even allows us to simulate nonstandard models of computation which cannot be realized on ideal Turing machines, let alone on physical computers with limited resources. For example, floating-point arithmetic can be viewed as a method of using approximate metavalues to simulate (imperfectly!) the operation of Blum-Shub-Smale machines (abstract machines capable of storing exact real numbers in registers).

Examples

Here are some examples of abstractions, computational paradigms and specific techniques that may be considered use of metavalues:

Abstract interpretation is a special case of computing with metavalues. In the terminology of abstract interpretation, values are called concrete values and metavalues are called abstract values. (This terminology assumes that concrete values are actually representable and computable, whereas the view I'm taking here is that values are mathematical abstractions while metavalues are concretely representable, so the use of concrete/abstract is in some sense reversed.)
Type inference: the type of a value (or a typeclass of a value) is an example of a metavalue, and type inference is essentially abstract interpretation using types as the metavalues (abstract values).
Floating-point arithmetic as a model of real arithmetic, with floating-point numbers as metavalues approximating ideal real numbers. More generally, any form of truncation or approximation can be seen as metavalue computation; this includes truncated power series as an approximation of infinite power series, Chebyshev series as a model of functions on $\mathbb{R}$ (see Chebfun), matrices as an approximation of infinite-dimensional linear operators, and so on.
Word-size integers as an approximate model of integers (or more generally, residue class rings as an approximation of various rings).
Interval arithmetic (as well as ball arithmetic) is a nonexact but rigorous model of real arithmetic, where real numbers are represented by enclosing sets. More generally, interval arithmetic can be generalized to computing enclosures for arbitrary sets of values (this is closely related to abstract interpretation and type inference).
Symbolic computation with free variables. A symbolic expression such as $3x+4y$ can be viewed as a metavalue representing the possible values resulting from substituting specific values for $x$ and $y$, and if $x$ and $y$ are specified with constraints (for example, $0 \le x, y \le 1$), this can be viewed as a kind of enclosure, which additionally allows tracking dependencies between the free variables.
Significance arithmetic is an extension of floating-point arithmetic where each approximation carries an estimate of the error.
Fuzzy logic, fuzzy sets, fuzzy intervals are generalizations of interval arithmetic that allow assigning varying probabilities to different elements of the enclosing sets.
Conjectural results, and outright guessing. Results computed using Monte Carlo algorithms or assuming unproved conjectures (such as the Riemann hypothesis) can be considered metavalues; even when the output value is represented exactly, it implicitly carries with it some degree of uncertainty.
Results carrying proof certificates or software information. More on this later.
Error values and exceptions can in some cases be viewed as metavalues representing loss of information.

Most of these examples relate to numerical computing (my own field of interest); there are probably many others in proof theory, combinatorics, etc. that I'm unfamiliar with. With so many disparate concepts under one umbrella, it might seem as if the notion of metavalue is virtually synonymous with abstraction and so broad as to be useless. However, if we look carefully, it appears that many of these concepts have very similar usage patterns and lead to similar bugs. This suggests that it might be worthwhile to think about metavalues systematically.

I must again stress that the distinction between values and metavalues is context-dependent:

Types are perfectly fine values in their own right, corresponding to mathematical sets or classes. A type only becomes a metavalue when it is used to describe an instance of itself.
Floating-point numbers are perfectly fine values in their own right; they can be viewed either as the standard dyadic or decimal rational numbers or as a distinct type of entities, either way equipped with nonstandard (rounding) binary operators $+$, $-$, $\times$, $/$, which do not satisfy the usual laws of arithmetic such as associativity. It is only when floating-point arithmetic is used to model real number arithmetic that floating-point numbers become metavalues.
An interval such as $[3.14, 3.15]$ is a perfectly fine value (a subset of $\mathbb{R}$) in its own right; it becomes a metavalue the moment we use it to describe $\pi$ or some nearby number.
A symbolic expression such as $3x+4y$ is a perfectly fine value in its own right (we can view it as a tree or string over an alphabet containing the symbols $3, 4, x, y, +, \times$); it becomes a metavalue when used to describe a set of possible real numbers. It can also be viewed as (a description of) a specific value, such as an element of the formal ring $\mathbb{Z}[x,y]$.
A more subtle example is replacing numbers by dual numbers or jets and doing forward mode automatic differentiation. This in itself rather represents a transformation of the value-level problem, with automatic differentiation providing an exact solution to the problem of calculating derivatives. However, the use of a finite number of derivatives specifically to approximate a function or to propagate uncertainty about a number qualifies as a use of metavalues. This can be seen either as a form of significance arithmetic (when used approximately) or as a form of generalized interval arithmetic (when used with rigorous error enclosures).

Although it can be useful to think strictly in terms of values or strictly in terms of metavalues, we often need both directions of abstraction (replacing values by metavalues and re-interpreting metavalues as values), explicitly in code as well as when reasoning about programs.

Uncertainty propagation

Metavalues come in many shapes and flavors. I will not attempt to give a formal definition of a metavalue, but it may be convenient to think of a metavalue $X$ for $x$ as an object $E : T$ where $E$ is a formal expression (or a concretely representable value) representing $x$ in some way and $T$ is a metatype specifying the interpretation of $E$. To illustrate, consider the following metavalues for the real number $2 \pi$:

$2 \pi : \operatorname{Exact}\;\;\;$ ($E$ is an exact symbolic expression for $x$)
$[6.28 \pm 0.01] : \operatorname{Enclosure}\;\;\;$ ($E$ describes an enclosure for $x$, namely a set $S$ such that $x \in S$)
$x \ge 0 : \operatorname{Predicate}\;\;\;$ ($E$ describes a predicate $P(x)$ such that $P(x)$ is True)
$\mathbb{R} : \operatorname{Type}\;\;\;$ ($E$ describes a type (or set) $T$ such that $x \in T$)
$6.28 : \operatorname{Approximation}\;\;\;$ ($E$ describes an approximation $\hat x$ such that $x \approx \hat x$)

These are just some examples, and many other possibilities exist for describing real numbers. The expression-metatype formalization is also simplistic, because metavalues can contain fine-grained combinations of metavalues with different metatypes: consider the tuple $((1 : \operatorname{Exact}), ([2 \pm 0.01] : \operatorname{Enclosure}), (3.00 : \operatorname{Approximation}))$, which can be considered a metavalue for $(1,2,3)$. In any case, an important general observation is that metavalues can express values with different levels of certainty or fidelity. Here is a possible classification of levels of fidelity for metavalues, ranked on a decreasing scale:

Exact and canonical
Exact
Enclosure
Guess/approximation
Unknown/error

This scale can be made more detailed, and it can also be made more coarse (for example, we might just distinguish between exact/inexact metavalues). The important points are: (1) we can place metavalues (or their metatypes) on such a linear scale, at least approximately; (2) metavalues that fall in the same place on the scale have similar characteristics even when they describe vastly different mathematical objects, and (3) understanding this scale can inform sound design of mathematical software and programming languages.

A generally useful principle seems to be: operations on metavalues should result in metavalues that have either the same fidelity or lower fidelity; moving down the scale is always correct, but moving up the scale is correct only in exceptional cases. In other words, uncertainty should propagate automatically. This principle is not a miracle cure for bugs, but I believe that following it will lead to more easily composable software that is more likely to be correct by default (interval arithmetic is a good specific example). Computer algebra systems and programming languages tend to follow this principle to some extent; for example, adding an integer (exact, canonical) to a floating-point number (approximation) usually results in a floating-point number (approximation). However, most systems are also quick to violate this principle, notably by lacking metavalue semantics for predicates, control flow and collection types. More on this later; to begin with, I will elaborate on the classification given above and describe what I believe are reasonable general semantics for metavalues.

Canonical and exact metavalues

An exact metavalue represents a specific value unambiguously. A canonical metavalue (more commonly: canonical form, or normal form) is a distinguished unique representation of a specific value. For example, the formal expressions $1+2+3$ and $1 \cdot 2 \cdot 3$ are both exact representations of the integer $6$. The integer literal $6$ (suitably encoded) is usually taken as the canonical representation; other typical examples of canonical forms include the symbols True/False for boolean values, rational numbers in normal form $p/q$ with $\gcd(p,q) = 1$ and $q \ge 1$, and polynomial ring ideal elements reduced with respect to a Gröbner basis. Metatypes that guarantee a canonical form are usually preferable, but not always: computing canonical forms can be expensive, or even theoretically very hard. Indeed, any mathematical problem can be considered a problem of finding a suitably defined canonical form; consider reducing "the truth of the Riemann hypothesis" to True/False.

Assume that $A$ and $B$ are canonical metavalues with formal expressions $E_A$ and $E_B$, representing values $a$ and $b$. Then $(E_A = E_B) \iff (a = b)$. This is why canonical forms are useful: they reduce the hard problem of comparing mathematical values to the mechanical problem of comparing formal expressions. If $A$ and $B$ are exact, but not necessarily canonical, then $(E_A = E_B) \implies (a = b)$. The reverse implication need not hold: $E_A = 1+2+3$ and $E_B = 6$ are different formal expressions although they represent the same integer.

Enclosures

An enclosure metavalue represents a range of possible values. Intervals, types and constraint predicates are examples of enclosures. In the case of a constraint predicate such as $x \ge 0$, it might be assumed implicitly that $x$ belongs to some fixed base set or type, or the predicate might be understood as defining a class among objects of unspecified type in the universe of discourse. Homomorphic images can also be enclosures (for example, $0 \pmod 2$ can be a metavalue representing any even integer). Symbolic expressions with free variables are also examples of enclosures, in which the range depends on the domain of the variables (which may be implicitly or explicitly defined). Finally, an important kind of enclosure is a finite listing of possible values: for example, booleans can be represented by enclosures $\{\operatorname{True}\}, \{\operatorname{False}\}, \{\operatorname{True}, \operatorname{False}\}$, where the last metavalue represents an unknown truth value and the metavalue operations follow triple-valued logic.

The fundamental rule for computing with enclosures is the inclusion principle: if $f : A \to B$ is a function (on values) and $X \subseteq A$ is an enclosure (understood to represent an unknown element $x \in X$), then $F(X)$ is a valid enclosure for $f(X)$ if $f(x) \in F(X)$ for all $x \in X$. Enclosures thus rigorously include all possible values, and this property is conserved under function composition or evaluation. In abstract interpretation, the corresponding principle is formulated in terms of monotonic functions on lattices.

Exact metavalues can be viewed as the special case of singleton enclosures, and when mixing exact metavalues with enclosures, the results can be defined by demoting exact metavalues to enclosures. Enclosures are thus "contagious". Conversely, an enclosure can be promoted to an exact metavalue when it is a singleton (for example, the boolean enclosure $\{\operatorname{True}\}$ can be promoted to the exact boolean $\operatorname{True}$); this might be called exactification.

Enclosures are not exact (in general), but they permit exact semidecidability for value predicates such as equality. If $A$ and $B$ are metavalues representing values $a$ and $b$ by enclosing sets $S_A$ and $S_B$ respectively, then $(S_1 \cap S_2 = \{ \}) \ \implies (a \ne b)$ (contrapositively, $(a = b) \implies (S_1 \cap S_2 \ne \{ \})$). The inverse implication need not hold: for example, if $S_A = S_B = [0,1]$, we do not necessarily have $a = b$ (the values could be different elements of the unit interval, say $a = 1/3$ and $b = 2/3$).

Guesses and approximations

A guess or approximation gives some estimate or approximation of a value, but does not rigorously guarantee anything: the true value could be anything in the universe (or implicitly any value of the type used in a given context). We may distinguish between numerical approximations such as floating-point numbers which are likely to be close to the true value in some metric but unlikely to be exact, and discrete guesses ("the predicate is probably True") which are clearly either right or wrong. A guess or approximation may be equipped with metadata indicating the degree of confidence in the result: for example, significance arithmetic indicates the likely number of correct digits in a numerical approximation.

I believe that sound semantics for guesses and approximations is to make them contagious: adding an exact metavalue or enclosure to a guess or approximation should normally result in a guess or approximation. This does not mean that approximations within a program must lead to approximate output: an approximation may be promoted to an enclosure or an exact metavalue as a result of a posteriori certification. For example, an approximate root of a function might be certified using a sign check or by applying the interval Newton method. This is a situation where it is especially useful to move explicitly between the value and metavalue levels of abstraction in a program.

Errors and unknowns

Error values such as NaN in floating-point arithmetic can be viewed either as values or as metavalues. I believe that these notions should be separate, with Undefined as a value representing a domain or type error (for example, division by zero) and Unknown as a metavalue representing a loss of computational information. Such an Unknown can be viewed as an enclosure of the whole universe (or at least of the implied type) where not even a guess is available. Undefined values and Unknown metavalues can be viewed as elements of a bottom type (for values and metavalues respectively) and should normally be maximally contagious. In floating-point arithmetic, NaN acts both as a value and as a metavalue (or several different values and metavalues), and this overloaded role can lead to some truly annoying semantics. I already discussed this in a previous post, so I will not repeat the details here.

Exceptions and error codes can be viewed as metavalues. It depends on the application whether it makes sense to interrupt execution or let error information propagate through the normal execution via metavalues ("nonstop computing").

Certificates

Metavalues can indicate the degree of belief in a result ("this is exact", "this is a guess"). A refinement of this idea is to propagate specific information about assumptions made in the computation, or even to propagate evidence to back up the claim.

For example, if an algorithm needs to guess that $X > 0$ holds for some value $X$ encountered during its execution (being unable to prove this), the assumption $X > 0$ could be attached to the output. The user then has a chance to verify the assumption by external means. Such output assumptions can be statements about choices of algorithms ("$Y$ was computed using a Monte Carlo algorithm" or "$N$ was determined to be prime using a pseudoprime test") or even general conjectures ("this result assumes the Riemann hypothesis" or "this result assumes Schanuel's conjecture"). Such output assumptions are an alternative or complement to input assumptions where the user specifies assumptions ($X > 0$ or "we may assume the Riemann hypothesis") ahead of time.

Even supposedly exact results computed without relying on unproved conjectures may be incorrect due to any number of implementation bugs or even hardware errors. Most modern programming languages and computer algebra systems depend on a huge number of libraries and components, meaning that there are many places where things can go wrong. Metavalues could contain tags about the components used in the computation of each value: "part of this result was computed using GMP", or "part of this result was computed using Pari/GP" or "part of this result was computed by mining Stack Overflow". Such a tag can be either be viewed as a proof by authority or as a reason to be skeptical. It would also provide a partial solution to the problem of giving component authors credit. SageMath has a dependency usage tracking system for citations that achieves something similar, though it based on runtime profiling instead of fine-grained dependency tracking.

Finally, instead of simply making assertions, metavalues can carry with them checkable certificates. For example, if a primality test is required during the execution of an algorithm, then the output metavalue might include a copy of an elliptic curve primality certificate. Algorithms that produce such output are known as certifying algorithms; this paradigm is reportedly being used with great success in the LEDA system.

Metavalue fallacies: where implementations get it wrong

Many bugs in mathematical software are expressions of some or all of the following fallacies:

Confusing metavalues with values.
Treating noncanonical forms as canonical.
Violating the inclusion principle.
Losing track of uncertainty.

To illustrate these fallacies (and how they appear in the wild!), I will consider the test problem of computing the rank of the following matrix:

$$\begin{pmatrix} 0 & X \\ 0 & 0 \end{pmatrix}$$

The result should be obvious: the rank is 0 if $X$ is zero, and 1 if $X$ is nonzero, assuming that $X$ is an element of a commutative ring with unit. A third possibility exists: if $X$ is not an element of a commutative ring with unit, asking for the rank of this matrix (by the standard definition of rank) is a type error and ought to give Undefined or an equivalent exception. It would be far too much work to do a systematic review of all the interesting ways widely-used mathematical software systems manage to fail at this simple task, so to keep things brief I will only make a partial exhibit of Sage, Maple and Mathematica.

Approximate and enclosure input

The low-hanging fruit is floating-point arithmetic. Sage (along with 99% of all numerical software in the world) will happily prove that $10^{-20} = 0$:

sage: X = 0.1 + 1e-20 - 0.1
sage: Matrix([[0, X], [0, 0]]).rank()
0

This is an example of losing track of uncertainty: the result of an approximate calculation is presented as the exact integer 0. The result is not surprising to anyone who has taken an introductory course in numerical analysis; the problem is that the uncertainty is invisible in the output. This would be less of a concern in purely numerical software where all results can be expected to be approximate, but in a computer algebra system that has the ability to manipulate exact objects, the possibility of approximate input silently giving exact output is problematic. It means that users must have significant knowledge of implementation details in order to compose functions correctly; a user can very easily introduce an approximation into an exact calculation by accident and get no warning that the seemingly exact result is incorrect. An alternative could be to return a metavalue reflecting the nature of the result. For example, an output 0.0 (of a floating-point type) might serve, but some kind of tagged integer type ("0, but this is just a guess") could be a more sophisticated alternative.

The same problem occurs in Mathematica, which uses significance arithmetic to try to track accuracy of floating-point numbers automatically. Significance arithmetic works great until it doesn't:

In[1]:= MatrixRank[{{0, (0.100000000000000000000000000 + 10^-50) - 0.100000000000000000000000000}, {0, 0}}]
Out[1]= 0

A more interesting bug manifests if we use ball arithmetic instead of floating-point arithmetic. Ball arithmetic (courtesy of Arb) should provide rigorous enclosures for numerical values, and by extension rigorous mathematical results. Yet, using ball arithmetic in Sage, we can prove the dubious theorem $\tfrac{1}{10} - \tfrac{1}{10} \ne 0$:

sage: X = RBF("0.1") - RBF("0.1")
sage: Matrix([[0, X], [0, 0]]).rank()
1

The problem can be seen as an instance of either losing track of uncertainty, violating the inclusion principle, or confusing metavalues with values. Comparisons in ball arithmetic in Sage return True if True is the only possible value, and False both when False is the only possible value and when the value could be either True or False:

sage: RBF("0.1") - RBF("0.1")
[+/- 2.78e-17]
sage: RBF("0.1") - RBF("0.1") == 0
False
sage: RBF("0.1") - RBF("0.1") != 0
False
sage: RBF("0.1") - RBF("0.1") != 1
True

True thus acts as a value (or an exact metavalue) while False acts as a metavalue (representing two possibilities). Alternatively, we can view the comparison as an operation acting on the representations of the metavalues (treating the balls as values and defining $x = y$ and $x \ne y$ in a nonstandard way) rather than reflecting the operations $x = y$ and $x \ne y$ on real number values. This reflects a deliberate interface choice made in the C-level API of Arb, and poses no problem when algorithms are written explicitly with this behavior in mind, but it easily leads to bugs when crossing abstraction boundaries. The generic rank algorithm in Sage, presumably, tests whether elements are zero unaware of this behavior, so the uncertainty information is lost. More appropriate behavior for the rank method would be to recognize nonexact input and return an enclosure of the possible values 0 and 1, or to throw an exception.

Sage's rank calculation is also broken for $p$-adic numbers (represented as enclosures of the form $a + O(p^n)$ with a default precision of $n = 20$, supposedly providing rigorous error tracking):

sage: X = Qp(7)(1) - 7^19 - 1
sage: Matrix([[0, X], [0, 0]]).rank()
1
sage: X = Qp(7)(1) - 7^20 - 1
sage: Matrix([[0, X], [0, 0]]).rank()
0

The culprit is the following loss of uncertainty (alternatively, this constitutes the value-metavalue confusion fallacy of coercing 0 into a $p$-adic enclosure and comparing the enclosing sets as values):

sage: 1 + Qp(7)(7^20) - 1
O(7^20)
sage: 1 + Qp(7)(7^20) - 1 == 0
True

When given a power series enclosure, Sage appropriately throws an exception, however:

sage: R. = PowerSeriesRing(ZZ, x); X = O(x^20)
sage: Matrix([[0, X], [0, 0]]).rank()
    ...
NotImplementedError: Echelon form over generic non-exact rings not implemented at present

Exact input

Problems are in no way restricted to approximate or enclosure types. Even if we give exact symbolic expressions as input, Sage will be happy to output nonsense:

sage: X = 2*log(sqrt(2)+sqrt(3)) - log(5+2*sqrt(6)) + exp(-1000)
sage: Matrix([[0, X], [0, 0]]).rank()
0

In this case, we have proved that $e^{-1000} = 0$ (the logarithmic terms are precisely equivalent). This is an instance of losing track of uncertainty that is completely hidden from the user: symbolic expressions in Sage use an approximate numerical test internally to determine equality. This would perhaps be acceptable if the information was conveyed in the output ("0, but this is just a guess"), but as it stands this is a pure bug.

Mathematica curiously suffers from the same problem, despite its supposedly superior symbolic simpliciation facilities and internal use of significance arithmetic meant to be more reliable than floating-point arithmetic:

In[1] := MatrixRank[{{0, 2*Log[Sqrt[2]+Sqrt[3]]-Log[5+2*Sqrt[6]] + Exp[-1000]}, {0, 0}}]
Out[1]= 0

What is extra curious is that Mathematica can determine the correct sign of $X$ itself:

In[2] := FullSimplify[2*Log[Sqrt[2]+Sqrt[3]]-Log[5+2*Sqrt[6]] == 0]
Out[2]= True

In[3] := FullSimplify[2*Log[Sqrt[2]+Sqrt[3]]-Log[5+2*Sqrt[6]] + Exp[-1000] == 0]
Out[3]= False

This suggests that the problem is a loss of metavalue information in Mathematica's matrix rank routine.

Maple2020, meanwhile, errs in the opposite direction of Sage and Mathematica, proving the false theorem that $2 \log(\sqrt{2}+\sqrt{3}) \ne \log(5+2 \sqrt{6})$:

> Rank(Matrix([[0, 2*log(sqrt(2)+sqrt(3))-log(5+2*sqrt(6))], [0, 0]]));                              
                                                 1

This is an instance of treating noncanonical forms as canonical, as well as losing track of uncertainty: Maple tries to check if $X = 0$, and will assume that $X \ne 0$ when it fails to do so. In fact, Maple's normal simplification routines can handle this example:

> simplify(2*log(sqrt(2)+sqrt(3))-log(5+2*sqrt(6)));                                               
                                                 0
> testeq(2*log(sqrt(2)+sqrt(3))-log(5+2*sqrt(6)));                                                 
                                               true

The bug is in Maple's linear algebra code, which tries to select a method for putting matrix entries in canonical form, but by an oversight chooses a weaker method than the default simplification function. (Thanks to Jacques Carette for providing a correct explanation. The first version of this text erroneously referred to Maple's evalb method, which does have the ability to output boolean metavalues but assumes that numerical inputs are already in canonical form, making it irrelevant for the present example.)

In contrast, Calcium handles this problem correctly. The rank is computed correctly in the above examples (using a combination of exact methods to prove equalities and enclosure methods to prove inequalities):

>>> X = 2*log(sqrt(2)+sqrt(3)) - log(5+2*sqrt(6))
>>> ca_mat([[0, X], [0, 0]]).rank()
0
>>> X = 2*log(sqrt(2)+sqrt(3)) - log(5+2*sqrt(6)) + exp(-1000)
>>> ca_mat([[0, X], [0, 0]]).rank()
1

When the rank cannot be determined exactly, Calcium is able to signal that the result is unknown, here (in the Python wrapper) reflected by throwing an exception:

>>> X = 1 - exp(ca(2)**-10000)
>>> ca_mat([[0, X], [0, 0]]).rank()
  ...
NotImplementedError: failed to compute rank

Symbolic input with variables

Let us finally look at the example of symbolic expressions with free variables. If $X$ is a symbolic variable, then Sage, Maple and Mathematica all claim that the rank of $\scriptstyle \begin{pmatrix} 0 & X \\ 0 & 0 \end{pmatrix}$ is 1:

sage: X = var("X")      # create a symbolic variable
sage: Matrix([[0, X], [0, 0]]).rank()
1

> Rank(Matrix([[0, X], [0, 0]]));                                                                  
                                                 1

In[1]:= MatrixRank[{{0, X}, {0, 0}}]
Out[1]= 1

This is of course nonsense, since substituting 0 for $X$ should give a matrix with rank 0. In other words, substitution does not commute with evaluation, fundamentally invalidating the symbolic expression system as a tool for automated logical reasoning:

In[2]:= {MatrixRank[{{0, 0}, {X /. X -> 0, 0}}], MatrixRank[{{0, X}, {0, 0}}] /. X -> 0}
Out[2]= {0, 1}

Depending on our point of view, the underlying problem is either losing track of uncertainty (as well as violating the inclusion principle), or confusing metavalues with values (in a specific sense).

The first interpretation of the results from Sage, Maple and Mathematica is that symbolic simplifications are done "modulo exceptional cases": when an algorithm needs to branch on a condition involving a symbolic variable, it assumes a generic condition and discards the possibility of a special case (for example, assuming $X \ne 0$). Making such an assumption is not a bug in itself—it corresponds to a demotion from an enclosure to a guess—but the absence of information in the output is arguably a bug (a violation of the inclusion principle, or loss of uncertainty information). A more appropriate answer would be to return a metavalue indicating the the rank belongs to the set $\{0, 1\}$, a symbolic conditional result $$ \begin{cases} 0 & \text{ if } X = 0, \\ 1 & \text{ if } X \ne 0, \end{cases} $$ a metavalue giving the rank 1 but indicating that this is just a guess, or just the input expression unevaluated. Mathematica is sometimes able to generate such information (see GenerateConditions), but not here.

The second interpretation is more charitable. The rank should indeed be 1 if we interpret the symbol $X$ as a specific kind of indeterminate: the generator of the polynomial ring $\mathbb{Z}[X]$, say. This $X$ is a specific nonzero value, and not a metavalue representing a range of possible values. In Sage, we can construct this specific value explicitly; the following is undoubtedly correct:

sage: X = polygen(ZZ, "X")
sage: Matrix([[0, X], [0, 0]]).parent()
Full MatrixSpace of 2 by 2 dense matrices over Univariate Polynomial Ring in X over Integer Ring
sage: Matrix([[0, X], [0, 0]]).rank()
1

Unfortunately, it is very common practice in computer algebra systems to use undefined symbols both as metavalues (symbolic variables) and as formal indeterminates (generators of extension rings), employing generator value semantics or symbol metavalue semantics on a case by case basis for such symbols. This kind of implicit mixing of values and metavalues easily leads to bugs.

Word-size integers

Factoid: C# exposes more than 10 integer-like data types, none of which are those defined by (Pythagoras, 500BC). - Tim Sweeney (POPL talk, 2006)

Just for fun, I will also mention the highly sophisticated modern scientific programming language Julia, whose idea of an integer or rational number departs radically from the antiquated propositions in Euclid's Elements:

julia> rank(Matrix([0 2^64; 0 0]))
0

To write mathematically correct Julia code, the user must understand the difference between true integers and Julia's metavalues for integers (word-size integers without overflow checking). The problem is easy to spot in a one-line example, but woe when an overflow is buried deep in the code (being a C programmer myself, I know).

In fairness to Julia, it is possible to get a more reasonable answer by explicitly constructing "big" integers, though with this change the rank function no longer works in my copy of Julia, and if it did, the computation would presumably take several hundred nanoseconds longer than with word-size integers. The Julia designers decided that making the obvious syntax fast was more important than making the obvious syntax mathematically correct. One can argue about whether this tradeoff is the right one. Curiously, I don't think I've ever seen a high-level language that distinguishes between "strings" and "big strings", where "strings" silently truncate after 8 or 16 characters, despite the obvious performance benefits such strings would entail.

Ideas for better software

As I have already alluded to, a possible path toward more reliable mathematical software is to design programming languages with a clear distinction between the semantics of values and metavalues and in which uncertainty about values propagates rigorously by default. Type safety helps, but it is not sufficient in itself; implementation-level types often don't match up perfectly with mathematical types, and metavalue fallacies often occur at the interface between these abstractions or at type boundaries.

In most mainstream programming languages, the builtin objects and type system are too simplistic for doing serious mathematics, so users have to build their own object systems on top of the abstractions provided by the language. Sage, for example, implements its own highly sophisticated mathematical type system on top of Python objects. Sage's system supports advanced automatic coercions based on category theory, and allows distinguishing between different notions of metavalues (rings can be exact or approximate, and so on). Unfortunately, in practice, Sage is also full of bugs—in part but not only because of its vast scope.

One reason why Sage is full of bugs is that it is hard to implement complex abstractions in a dynamically typed object-oriented language with virtually no static correctness guarantees. Another reason is that mathematically desirable semantics are in direct opposition to the core semantics of the language. For example, Python does not allow overloading the not, and and or operators to work with custom boolean types, so it is highly inconvenient to work with boolean metavalues—virtually any predicate will break the value-metavalue abstraction barrier. Indeed, all the control flow and basic collection types such as hash tables in Python are designed with value semantics (or exact metavalue semantics) in mind. AbstractAlgebra.jl in Julia is facing many of the same problems, despite the fact that Julia's type system offers much more structure than Python.

In a programming language with builtin metavalue awareness, one would expect

if condition:
    foo()
else:
    bar()

to evaluate foo() if condition is True, to evaluate bar() if condition is False, and to evaluate both foo() and bar() and return the union of these results (or a symbolic representation of the if statement as a whole) when condition cannot be decided — alternatively, to pick either branch and return a metavalue with a tag indicating that this is just a guess. Throwing an exception is also reasonable, but puts more burden on the user to handle exceptional cases when program termination is undesirable.

In general, such semantics can be implemented by representing as a program as a symbolic expression or abstract syntax tree and performing abstract interpretation or symbolic evaluation. Curiously, many programming languages do support such metavalue computation natively — but only separately from normal execution, for example in type inference or optimization during compilation (or when running external code analysis tools).

A better paradigm for very high-level programming might be to allow writing programs with exact mathematical value specifications, making configurable metavalue computation the main form of program execution, allowing the user to choose whether they want a formal proof certificate, an exact (but not formally certified) result, a rigorous enclosure, or an approximation or heuristic estimate. "Configurability" could be coarse-grained (the user selects whether to run the program in its entirety using certificates, canonical representations, exact but possibly non-canonical representations, enclosures, or guesses/approximations) or much more fine-grained (including the ability to make conditional choices of metavalue representation at critical points in the program). Parts of programs that need to deal explicitly with internal representation (for example, numerical algorithms designed explicitly around floating-point rounding behavior) should presumably be compartmentalized.

The question is not how to make this possible—it is already possible in any language with reflection capabilities (with much more effort, in any Turing-complete language)—the question is how to provide optimal language support (both in terms of syntax, semantics and efficiency) and how to build good libraries around this paradigm. Fully symbolic systems such as Wolfram Mathematica support working in this paradigm relatively well, but this effort is undermined by dubious semantics for numbers and symbolic variables.

Of course, mathematically correct semantics in the way I've advocated here (essentially, in the form of contagious uncertainty) are not the only characteristics of software quality and correctness. Termination and resource usage during execution are other important aspects. More complicated metavalue systems require more computational resources, and loop termination for nonexact metavalues can be highly nontrivial (this problem can to some extent be mitigated using high-level functional and symbolic constructs instead of procedural code). The purpose of a sophisticated metavalue system is also defeated if the system is too hard to understand. Low-level programming languages use fixed-size integers and floating-point numbers because they have $O(1)$ complexity and are easy to reason about locally; programs using such types are easy to analyze in terms of their operational semantics. There are also potentially many other ways users may want to propagate information within a program; metavalues in the sense I have discussed here only reflect one possibility.

Whole-program formal verification is the gold standard of software correctness. This is essentially a question of treating whole programs, and not just the values they process, as the objects of computation (a view advocated by Bruno Buchberger). Strong semantics for values and metavalues alone is more of a traditional crutch, but could also be part of formally certified solutions by solving subproblems automatically and overall making software easier to reason about formally (interval arithmetic is a good example of this).

Updated 2020-02-25. Thanks for Aaron Meurer for correcting a remark about Python's equality method and Jacques Carette for providing a correct explanation of why Maple gets the rank wrong.

fredrikj.net | Blog index | RSS feed | Follow me on Mastodon | Become a sponsor