#### Supplement to Chance versus Randomness

## B. Further Details Concerning Algorithmic Randomness

### B.1 Further Details Concerning Martin-Löf Randomness

#### B.1.1 Statistical Tests in Martin-Löf’s Framework

Martin-Löf makes use of the notion of a *statistical
test*. In statistical hypothesis testing, we will reject a
hypothesis if an outcome is observed that is improbable (supposing the
hypothesis were true) at some set significance level; in that case,
the outcome is significant. Correspondingly, we may define a sequence
to be *significant* if it is a member of an effective measure
zero set. A random sequence, then, is not a significant sequence.

How are these recursive significance tests defined? We illustrate the idea using Martin-Löf’s (1966: 604) example of the property of large numbers (having a digit frequency equal to \(\frac{1}{2}\)). He restricts attention to significance levels of the form \(2^{-m}\), and suggests that we would reject a finite initial string \(x_1 \ldots x_n\) at the level \(2^{-m}\) if the observed relative frequency in that string differed ‘too much’ from \(\frac{1}{2}\). By ‘too much’ Martin-Löf means that fewer than \(2^{n-m}\) strings of length \(n\) would have a relative frequency differing by that much from \(\frac{1}{2}\). (By the definitions in supplement B.2, this means that the string is rejected if it is a member of a set which has measure less than \(2^{n-m}/2^n = 2^{-m}\).)
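For small string lengths, Martin-Löf’s rejection criterion can be checked directly by brute force. The following sketch is illustrative only (the function names `deviation` and `rejected` are ours, and the exhaustive enumeration is feasible only for short strings):

```python
from itertools import product

def deviation(s):
    """Absolute difference between the relative frequency of 1s and 1/2."""
    return abs(sum(s) / len(s) - 0.5)

def rejected(x, m):
    """Reject string x at level 2^-m iff fewer than 2^(n-m) strings of
    length n = len(x) deviate from 1/2 at least as much as x does."""
    n = len(x)
    count = sum(1 for s in product((0, 1), repeat=n)
                if deviation(s) >= deviation(x))
    return count < 2 ** (n - m)

print(rejected([0] * 12, m=4))    # True: only 2 of the 4096 strings deviate that much
print(rejected([0, 1] * 6, m=4))  # False: every string deviates at least 0
```

A maximally biased string is rejected at the level \(2^{-4}\), while a perfectly balanced string is not rejected at any level.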

An infinite sequence \(x\) is rejected at level
\(2^{-m}\) iff we would reject some initial string
at that level. Finally, an infinite string is rejected *tout
court* if, for every level of significance, there is an initial
string of that sequence that we would reject at that level
(Martin-Löf 1966: 609). A Martin-Löf random sequence is one
we would not reject.

#### B.1.2 Relations between Martin-Löf Randomness, von Mises Randomness, and other Measure Theoretic Accounts, including Schnorr Randomness

If \(\phi\) is an effectively computable (recursive) place selection, the compound test of first applying \(\phi\) and then some recursive significance test is also a recursive significance test (since the class of recursive functions is closed under functional composition). Therefore the class of ML-random sequences is closed under effective place selections, and contains only sequences with the right limit relative frequencies. Given the characterisation of vM-random sequences in §2.1.1, it is clear that the ML-random sequences are a subset of the vM-random sequences. Ville’s result shows that this inclusion is strict, a result strengthened by van Lambalgen (1987b: §4). It also follows immediately from the above claims that the class of ML-random sequences is such that every admissible subsequence of a random sequence is itself Borel normal. (It is not known whether the requirement that all admissible subsequences be themselves Borel normal is sufficient for ML-randomness.)
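The idea of applying a recursive place selection before testing frequencies can be made concrete. In this sketch (the helper `select` and the particular selection rule are our own illustrative choices), a place selection inspects the outcomes seen so far and decides whether to select the next one:

```python
def select(phi, x):
    """Apply the place selection phi: it inspects the outcomes seen so
    far and decides whether the next outcome joins the subsequence."""
    chosen = []
    for i in range(len(x)):
        if phi(x[:i]):
            chosen.append(x[i])
    return chosen

# A computable place selection: pick the next outcome whenever the
# last observed outcome was a 1 (a 'bet after heads' strategy).
after_one = lambda seen: len(seen) > 0 and seen[-1] == 1

x = [1, 0, 1, 1, 0, 0, 1, 0]
print(select(after_one, x))  # outcomes following a 1: [0, 1, 0, 0]
```

Since `select` composes two computable operations, checking frequencies on the selected subsequence is itself a computable test, which is the closure property appealed to above.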

Various refinements of ML-randomness have been offered, usually by
varying the conditions on what should count as an effective
statistical test. Schnorr (1971), for example, restricts the effective
tests to a narrower class than Martin-Löf permits, and thus has a
correspondingly broader set of random sequences. For Schnorr, the
effective uniform measure zero sets should be defined as the
intersection of a sequence of sets whose measures are not merely
bounded by computable reals (as in section
B.2),
but are themselves computable reals. Schnorr
offers an argument that this requirement—that the significance
tests should be *total recursive*, not just recursive as
Martin-Löf requires—is needed for genuinely effective
significance tests. Schnorr contends that only when the test is total
recursive can we ‘visualize’ the property of stochasticity
that is being tested for, and this is the only constraint on whether
we can genuinely grasp the property (van Lambalgen 1987a: 70).

Schnorr also provides an interesting connection with the gambling
motivation behind von Mises’ view, as Schnorr is able to characterise
both his and Martin-Löf’s conception of randomness in terms of
the failure of a generalised kind of betting strategy called
a *martingale*, which generalises the place selection
characterisation of betting strategies (van Lambalgen, 1987a:
§3.4; Martin-Löf, 1969b: §9). A martingale determines a
betting strategy that depends not only on the outcomes observed, but
also on the accumulated winnings of the gambler employing it. (In more
detail: a martingale is a function \(d\) from finite binary
sequences into the non-negative reals such that \(d(\omega) =
(d(\omega 0) + d(\omega 1))/2\). The
function \(d\) is the expected winnings of a gambler following
the strategy on the initial subsequence of a given sequence, subject
to the condition that the expected winnings on a string are the
average of the expected winnings on its two one-bit extensions—in effect,
the extended sequence that becomes actual pays double the wager placed
on it. The values of \(d\) on \(\omega\), \(\omega 0\), and \(\omega 1\)
determine the entire betting strategy.) A martingale *succeeds*
on a sequence if the gambler’s expected winnings are arbitrarily great
(i.e., the limit of \(d\) as the length of the sequence increases
is unbounded). A *constructive* martingale is one for which a
computable function (of \(\omega\) and some natural number \(t)\)
monotonically approaches \(d(\omega)\) from below as \(t\)
increases. Schnorr proves that a sequence is ML-random iff no
constructive martingale succeeds on it. A *computable*
martingale is one where \(d\) itself is computable. He defines
a *Schnorr-random* sequence as one on which no computable
martingale succeeds sufficiently quickly to be effectively
detected. It is clear that every ML-random sequence is Schnorr random,
but not vice versa.
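The fairness condition on martingales, and the sense in which one can succeed on a highly ordered sequence, can be illustrated with a simple computable betting strategy (the proportional-stake strategy below is our illustrative example, not one discussed by Schnorr):

```python
def martingale(sigma, capital=1.0, stake=0.5):
    """A simple computable martingale: at each step, bet the fraction
    `stake` of current capital on the next outcome being 0.  A correct
    bet doubles the amount wagered; a wrong bet loses it."""
    d = capital
    for bit in sigma:
        bet = stake * d
        d = d - bet + (2 * bet if bit == 0 else 0)
    return d

# Fairness: d(w) is the average of d(w0) and d(w1).
w = [0, 1, 0]
assert martingale(w) == (martingale(w + [0]) + martingale(w + [1])) / 2

# The martingale succeeds on the all-zeros sequence: capital grows without bound.
print([martingale([0] * n) for n in range(5)])  # [1.0, 1.5, 2.25, 3.375, 5.0625]
```

On the all-zeros sequence the capital grows by a factor of 3/2 at every step, so the limit of \(d\) is unbounded and the martingale succeeds, witnessing the non-randomness of that sequence.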

Schnorr motivates his ‘sufficiently quickly’ condition by observing that the idea of Martin-Löf tests is to represent statistical tests, but it ends up including many statistical tests that do not ‘have a physical meaning’ (Schnorr 1971: 255). He continues, observing that the statistical properties with a physical meaning

are intuitively those properties that can be established by statistical experience. This means that a sequence fails to be random in this sense if and only if there is an effective process in which this failure becomes evident. On the other hand, it is clear that if there is no effective process in which the failure of the sequence to be random appears, then the sequence behaves like a random sequence. Therefore the definition of random sequence has to be made in such a way that this sequence is random by definition. (Schnorr 1971: 255)

Schnorr 1971 actually discriminates a number of different randomness properties using martingales, and further definitions give rise to other natural classes of random sequences: see also Li and Vitányi 2008: § 4.5.8, Dasgupta, 2011: §11, Downey and Griffiths 2004, and Downey and Hirschfeldt, 2010: ch. 7. But Schnorr proposes that, for the reasons given, his account of random sequences is more conceptually adequate; and suggests a rival to the Martin-Löf-Chaitin thesis:

Thesis: A sequence behaves within all effective procedures like a \(p\)-random sequence iff it is Schnorr \(p\)-random. (Schnorr 1977: 198)

However, the philosophical (as opposed to mathematical) rationale
behind these notions is not yet clear. Schnorr’s notion of
visualisability, whatever its merits, is not clear enough to be
decisive, especially because even arbitrary total recursive tests are
still far from being graspable in any ordinary everyday sense. And
Schnorr’s restriction has other drawbacks; for one, there is no
universal test of Schnorr randomness (Downey and Hirschfeldt 2010:
§7.1). This is because every Schnorr test is passed by some
computable sequence (Schnorr’s visualisability requirement demands
that the tested property have computable instances). If there were
a universal test for Schnorr randomness, there would thus be a computable
sequence that is not rejected by it—but that would be a
computable random sequence, which cannot be. So of course it is an
intended feature of Schnorr’s account that there be no universal test;
but that does make the space of randomness properties a little more
unmanageable on Schnorr’s view. In general, one might be worried (for
good reason) about Schnorr’s *operationalist* idea that a
non-random sequence must be one that can be effectively determined to
be non-random: we don’t normally take the existence of good evidence
for a property to be equivalent to the presence of the property.

Another potential difficulty with Schnorr randomness is that it is not quite as robust as ML-randomness. The class of ML-random sequences has many apparently very different mathematical definitions (in terms of effective tests, in terms of martingales, and—as we will see in §2.3—in terms of complexity) and is thus apparently a mathematically robust notion; it is not clear that Schnorr randomness has the same robustness. There is a complexity-based account of Schnorr randomness in terms of computable prefix-free machines (Downey and Griffiths 2004), as well as one in terms of computable martingales (Downey and Hirschfeldt 2010: §7.1). But ML-randomness satisfies van Lambalgen’s theorem on relative randomness (van Lambalgen 1987a; Dasgupta 2011: §10): if \(x \oplus y\) is a random sequence which results from interleaving two other sequences \(x\) and \(y\), then those two sequences will be random relative to each other (that is, even complete knowledge of the outcomes in \(x\)—i.e., using \(x\) as an oracle—won’t improve predictability of the outcomes in \(y\)). Schnorr random sequences do not satisfy van Lambalgen’s theorem (Downey and Hirschfeldt 2010: §5.9)—in fact, there is a Schnorr random sequence such that the even outcomes compute the odd outcomes and vice versa. However, the strongest reason for suspecting that Schnorr randomness is not the correct account of random sequences is that there is a Schnorr random sequence that is not von Mises-random (Wang 1996: Theorem 3.3.5). So there is a Schnorr random sequence that does not have invariant subsequence frequencies under admissible place selections, and is therefore susceptible to exploitation by a gambling system. This seems a compelling reason to think that Schnorr randomness is not conceptually adequate.
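The interleaving operation \(x \oplus y\) appealed to in van Lambalgen’s theorem is straightforward to define on finite initial segments; here is a minimal sketch (the function names are ours):

```python
def interleave(x, y):
    """Form x ⊕ y by alternating outcomes: x supplies the even
    positions and y the odd positions."""
    out = []
    for a, b in zip(x, y):
        out.extend([a, b])
    return out

def deinterleave(z):
    """Recover the two interleaved sequences."""
    return z[0::2], z[1::2]

x, y = [1, 1, 0, 1], [0, 0, 1, 0]
z = interleave(x, y)
assert deinterleave(z) == (x, y)
print(z)  # [1, 0, 1, 0, 0, 1, 1, 0]
```

Van Lambalgen’s theorem concerns the infinite analogue of this operation: for ML-randomness, randomness of \(x \oplus y\) guarantees the mutual relative randomness of \(x\) and \(y\); for Schnorr randomness it does not.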

Having said this, these objections to Schnorr randomness open the
door to similar objections to ML-randomness, though one might wonder
whether the purported insufficiencies of ML-randomness are nearly as
worrisome as the insufficiencies of Schnorr randomness, no matter how much
Schnorr might embrace as intended some of these apparent
counterexamples. But a more nuanced response, opting for neither
ML-randomness nor Schnorr randomness as the ‘correct’
account of our intuitive notion of randomness, might be attractive in light
of some recent results. (This *pluralist* approach to
randomness is reminiscent of the pluralist position with respect to
the informal notion of *logical consequence* defended by Beall
and Restall (2006).)

The basic idea is the familiar one that random sequences
are *typical*, in the sense of most expected
(§2.1). This notion of typicality is
connected to a mathematically important style of quantification, where
a given result is said to hold for *almost all* members of a
given domain. For example, consider this theorem:

Theorem 9 (Lebesgue, via Jordan). Every real-valued function of bounded variation is differentiable almost everywhere. (Brattka, Miller, and Nies 2011: §1.1)

This theorem applies to arbitrary functions. If we restrict our
attention to *computable* functions, we can obtain this
result:

Theorem 10 (Demuth; Brattka, Miller, and Nies 2011: §3.2). Every computable real-valued function of bounded variation is differentiable at exactly those reals which are ML-random.

So, for computable functions, the Martin-Löf random reals (those represented by a ML-random sequence) provide a precise measure one set that corresponds to the ‘almost everywhere’ of the non-effective Theorem 9. The ML-random reals are exactly those which are typical with respect to this property of computable functions.

The interesting thing is that other ‘almost everywhere’
results of classical analysis *do not* end up holding at only
the ML-random reals when we move to computable functions. For
example:

Theorem 11 (Rute 2011; Pathak 2009; Pathak, Rojas, and Simpson). Every variation computable absolutely continuous function is differentiable at exactly those reals which are Schnorr random.

Thus another almost everywhere property of classical analysis, when effectivised, holds at just the Schnorr reals. In general, it is possible to characterise randomness notions (such as ML-randomness, Schnorr randomness, computable randomness, etc.) in terms of the differentiability of some appropriate class of computable functions (the ‘main thesis’ of Nies 2011). Although the different notions of randomness end up corresponding to the differentiability of slightly different classes of appropriate computable functions, in a sense there is a common core notion of typicality in play here. But unlike the case of Church’s thesis, the existence of many overlapping but distinct precisifications of ‘typical with respect to differentiability of (some sort of) computable function’ seems to undermine the idea that there is any one precisification that perfectly deserves the name. On the other hand, the thesis that randomness is intimately linked to typical differentiability is not one that has much intuitive support in the first place, and a kind of pluralism with respect to the different realisers of this description may not appeal. The more direct considerations for ML-randomness, and against Schnorr randomness, in particular that ML-random sequences are strictly a subset of the Schnorr random sequences, might justify us in continuing to give some support to the Martin-Löf-Chaitin Thesis.

#### B.1.3 Some Subtleties Involving Computable Measures

Standard approaches to algorithmic randomness, those summarised in the state-of-the-art volumes of Downey and Hirschfeldt (2010) and Nies (2009), are based on randomness with respect to the standard Lebesgue measure over the space of sequences (discussed in supplement B.2). The limitation this poses is that we might also, ordinarily, want to attribute randomness to kinds of objects other than equifrequent sequences. As discussed in §4.3, connecting chance with randomness will in particular involve consideration of other Bernoulli processes which produce sequences with unequal outcome frequencies. Also in §4.3, a generalisation of Schnorr’s theorem to arbitrary computable measures was introduced. While this allows biased sequences to count as both Kolmogorov and ML-random, and looks to be a natural way to maintain a univocal notion of randomness in the face of the compressibility of biased sequences, computable measures can have some interesting features.

The feature most of interest here is that some computable
measures assign positive measure to some individual infinite binary
sequences. (The uniform Lebesgue measure cannot assign positive
measure to any point of the Cantor space, on pain of violating the
axioms of measure theory.) Any sequence which receives positive
measure is called an *atom* of the measure under
discussion. Examples are easily constructed: consider the computable
measure which assigns measure 1/2 to the individual sequence
consisting only of 0s, and divides the remaining measure uniformly
over the other sequences. It turns out that the only individual
sequences which can get such positive probability are the computable
sequences: a sequence is an atom of a computable measure if and only
if the sequence is computable (Kautz 1991). But this seems to pose a
problem: an individual sequence can be random with respect to a
measure \(\lambda\) because it has positive probability (and is therefore a
member of every \(\lambda\)-measure 1 set of sequences), even though it is
computable. Atoms are thus sometimes referred to as
‘pseudorandom’ sequences, not because they mimic a
Lebesgue-random sequence, but because they are functionally like
random sequences with respect to their preferred measure, but are in
some sense not ‘really’ random because of their
computability. Such pseudorandom sequences might pose a problem for
the generalisation of Kolmogorov complexity to biased sequences,
because it seems that the generalisation ends up classifying as random
some atoms, which are apparently not random because they are
computable.
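The example atomic measure just described can be computed explicitly on basic sets. A sketch with exact arithmetic (the function name `mu` is ours), assuming the measure gives weight 1/2 to the all-zeros sequence and distributes the remaining 1/2 uniformly over the rest:

```python
from fractions import Fraction

def mu(sigma):
    """Measure of the basic set N(sigma) under the atomic measure that
    gives probability 1/2 to the all-zeros sequence and spreads the
    remaining 1/2 uniformly (Lebesgue-style) over the other sequences."""
    half = Fraction(1, 2)
    lebesgue = Fraction(1, 2 ** len(sigma))
    atom_part = half if all(b == 0 for b in sigma) else Fraction(0)
    return atom_part + half * lebesgue

# Every finite prefix of the all-zeros sequence keeps measure > 1/2,
# so that sequence is an atom of weight 1/2.
print([str(mu([0] * n)) for n in range(4)])  # ['1', '3/4', '5/8', '9/16']
print(mu([1]))  # 1/4
```

Note that additivity is respected: \(\mu(N(0)) + \mu(N(1)) = 3/4 + 1/4 = 1\), and the measures of \(N(0^n)\) converge to 1/2, the weight of the atom.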

There is an interesting result that pushes back against this
unwelcome conclusion (Downey and Hirschfeldt 2010: §6.12). It turns
out that for every computable measure \(\lambda\), if \(\sigma\) is random
with respect to that measure and is *not* an atom of \(\lambda\)
(i.e., is not itself a computable sequence), then there is an
ML-random sequence \(\sigma'\) that is in the same *Turing
degree* as \(\sigma\) (a result due to Levin and Zvonkin (1970) and
Kautz (1991)). A Turing degree, for our purposes, is a set of
sequences such that for any pair of sequences \(\sigma_1\) and
\(\sigma_2\) in the set, there exists a Turing machine equipped
with an *oracle* concerning \(\sigma_1\) that computes
\(\sigma_2\). (An *oracle* concerning a sequence is a
special state in a Turing machine that allows that machine, having
computed a string \(x\) to obtain a ‘yes/no’ answer
concerning whether the \(x\)-th member of the sequence has value
1; equivalently, conceiving of the sequence as coding a set of natural
numbers, whether or not \(x\) is a member of the set—see Li
and Vitányi 2008: 38.) So if a given \(\lambda\)-random sequence
is of the same Turing degree as a ML-random sequence, it is possible
to *effectively* recover an unbiased random sequence (one that
is ML-random) from an uncomputable biased sequence (one that is
\(\lambda\)-random). Arguably, this shows that it is ‘at least as
“hard”’ (Li and Vitányi 2008: 44) to determine the
content of the \(\lambda\)-random sequence as the genuine ML-random
sequence (and vice versa). In that sense, the non-computable
\(\lambda\)-random sequences are at least as disorderly as the ML-random
sequences, and having admitted that the latter capture the notion of
randomness for a sequence tolerably well, we should also admit that
the former do so as well.
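The notion of computing one sequence from an oracle for another can be illustrated by a toy case in which the two sequences are bitwise complements, and so sit in the same Turing degree (the setup below, representing a sequence as a function from indices to bits, is purely illustrative):

```python
def oracle_machine(oracle, i):
    """Compute the i-th bit of a target sequence by querying an oracle
    for a source sequence; here the target is the bitwise complement of
    the source, so each is computable from an oracle for the other."""
    return 1 - oracle(i)

# A stand-in 'sequence', presented as a function from indices to bits.
s1 = lambda i: 1 if i % 3 == 0 else 0

print([oracle_machine(s1, i) for i in range(6)])  # [0, 1, 1, 0, 1, 1]
```

The Levin–Zvonkin–Kautz result involves a far less trivial reduction, of course, but the structure is the same: a machine with query access to one sequence effectively produces the other.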

One way of thinking about this result is that random sequences are those non-computable sequences that are random with respect to a computable measure, for those sequences are in some sense equivalent to the sequences that are random with respect to the Lebesgue measure. The non-redundancy of the clause requiring the non-computability of the sequence is due to the existence of atoms in measures other than the Lebesgue measure; that those atoms receive non-zero measure may be regarded as an unintended side-effect.

Another way of thinking about it would involve excluding atomic computable measures from the domain of legitimate probability functions over the space of sequences. One way of doing this would be to argue that proper chance functions are such that sequences with the same frequencies should have the same measure. Since any atomic infinite sequence shares a frequency with the countably many sequences that differ from it in only finitely many places, imposing that condition would exclude atomic measures. We could then classify the random sequences as those that are random with respect to some proper measure (where the propriety of a measure is justified in some way from the idea that genuine chances underlying some particular sequences should be proper in the sense just introduced). This approach secures a notion of randomness for biased sequences, but at the cost—noted in the main text—of putting a connection with chance in by hand. If that connection with chance is not presupposed, the exclusion of atoms looks unmotivated, except by the result connecting ML-random sequences with the uncomputable \(\lambda\)-random sequences. And there is reason to think that this approach won’t work in any case: some recent work suggests that every point in the Cantor space, with the exception of those points falling in a poorly understood countable set known as NCR\(_1\), is random with respect to some atomless measure (Reimann and Slaman 2008). So not every atomless measure will be proper; what the precise role of atomlessness is with respect to \(\lambda\)-randomness is yet to be made entirely clear.

### B.2 Lebesgue Measure and Effective Measure

Let \(X\) be the set of all sequences of outcomes of a binary
process. Let \(X^{\lt \omega}\) be the set of finite
sequences of outcomes of a binary process, or *strings*.
If \(x \in X\), let
\(s_x (i)\) be the \(i\)-th member
of sequence \(x\). If \(\sigma\) is a finite sequence of outcomes,
let \(N (\sigma)\) be the set of all (finite or
infinite) \(x \in X\) which begin with
\(\sigma\).

We now define a uniform Lebesgue measure \(\mu\) on subsets of the
Cantor space (Dasgupta 2011: § 2; Earman 1986:
138–9; Gowers 2008: III.55). We begin by assigning measures to
the sets \(N(\sigma)\), called the *basic sets*.
Note that half of the possible outcome sequences begin with 1, and half
with 0; and of those that begin with 1, half begin 11, the other half
10; and so on. In general, where \(\lvert\sigma\rvert\) denotes the length
of the string \(\sigma\), the proportion of all possible outcome
sequences that begin with \(\sigma\)—the measure of
\(N(\sigma)\), denoted
\(\mu(N(\sigma))\)—is
\(1/2^{\lvert\sigma\rvert}\). We now extend this
measure to the full space. An *open set* is defined to be an
arbitrary union of basic sets. A basic set is
*maximally contained* in \(A\) if
\(N(\sigma) \subseteq A\)
and for every proper initial segment \(\tau\) of
\(\sigma\), it is *not* the case that
\(N(\tau) \subseteq A\). It turns out that every
open set \(G\) is a union of non-overlapping
(‘disjoint’) basic sets that are maximally contained in
\(G\). The measure of a union of disjoint sets is the sum of the
measures of the sets, so if the open set \(G\) decomposes into
disjoint basic sets \(N(\sigma_1),
N(\sigma_2),\ldots\), then
\(\mu(G) = \sum_i \mu(N(\sigma_i))\).
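The measures of basic sets, and the additivity used for disjoint decompositions, are simple to compute. A minimal sketch with exact arithmetic (the function name `mu` is ours):

```python
from fractions import Fraction

def mu(sigma):
    """Lebesgue measure of the basic set N(sigma): 1 / 2^|sigma|."""
    return Fraction(1, 2 ** len(sigma))

# The open set of sequences beginning with 1 or with 01 decomposes into
# the disjoint basic sets N([1]) and N([0, 1]); its measure is the sum.
g = mu([1]) + mu([0, 1])
print(g)  # 3/4
```

This mirrors the definition above: \(\mu(G) = \mu(N(1)) + \mu(N(01)) = 1/2 + 1/4 = 3/4\).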

A set \(E\) is said to have *measure zero* if there is
an infinite sequence of open sets \(G_1,
G_2,\ldots\) such that (i) \(E\) is contained
in each \(G_i\) (equivalently, \(E \subseteq \cap_{i}G_{i}\)); and
(ii) \(\mu(G_i) \le 2^{-i}\). (The idea is that a small set should be
coverable by an open set which is also small. Conditions which do the
same job as (ii) can be formulated in various ways, with various
exponents; it is convenient to have the measures of the open sets
diminish quickly.) A set \(E\) is
*measurable* iff it can be approximated arbitrarily closely by
the union of finitely many basic sets. (More precisely, if for any
\(\varepsilon \gt 0\), there is a finite union of basic sets
\(F\) such that the symmetric difference between \(E\) and
\(F\) can be covered by a set of measure less than
\(\varepsilon\). The symmetric difference is the set of elements which
are in exactly one of \(E\) and \(F\); it is the
‘mismatch’ between the two sets.)

The measurable sets form a \(\sigma\)-algebra, which is a collection with the
nice properties of being closed under complementation and under countable
union and intersection. Moreover, it can be shown that the measure \(\mu\)
can be naturally and uniquely extended to the set of all measurable
sets, where the measure defined has corresponding nice properties (so
that, e.g., the measure of the complement of \(E\) is \(1 - \mu(E)\), if \(E \subseteq F\) then
\(\mu(E) \le \mu(F)\), etc.). The
measure so-defined is the *Lebesgue (uniform) probability*
*measure* on the Cantor space.

Each individual \(x \in X\) forms a singleton of measure zero, since if \(\sigma_n\) is the initial \(n\) elements of \(x\), then \(x \in N(\sigma_n)\) for all \(n\). Each \(N(\sigma_n)\) is an open set, because it is a basic set. Moreover, there are infinitely many such sets, and for each \(n\), \(\mu(N(\sigma_n)) = 2^{-n}\). So this singleton set satisfies the conditions for a measure zero set. (A countable union of measure zero sets also has measure zero, so all countable sets of sequences are also of measure zero.)

We now apply to measure theory some ideas from the theory of
computability. A set is *recursively enumerable* iff it is the
range of a (total or partial) recursive function on the natural
numbers (Boolos *et al.* 2003: §8.3). Since the recursive
functions are just those which are effectively computable, by the
Church-Turing thesis, we may also call these sets *computably
enumerable*. We defined an open set as an arbitrary union of basic
sets; we now define an *effective open set* as a union of basic
sets which are members of a recursively enumerable set. There are only
countably many recursive functions, so there are only countably many
recursively enumerable sets of basic sets. Since there are countably
many basic sets, there are uncountably many sets of basic sets. So
while there are uncountably many open sets, there are only countably
many effective open sets; the latter notion is much more restrictive. A
sequence of sets \(G_1, G_2,\ldots\) is *uniformly effective
open* if there is a recursive function \(f\) from pairs of natural
numbers \((m, n)\) to basic sets such that each \(G_{m} =
\cup^{\infty}_{n=1} N(f(m, n))\). That is, the sequence is uniformly
effective open if not only is each member effective open, but there is
a single recursive function that enumerates, for each member of the
sequence, the set of basic sets of which it is the union.

Adapting the earlier definition in the obvious way, a set \(E\) is said to
have *effective measure zero* if there is an infinite sequence of
uniformly effective open sets \(G_1,
G_2,\ldots\) such that (i) \(E\) is covered by
each \(G_i\); and (ii)
\(\mu(G_i) \le 2^{-i}\).
(The idea is that an effectively small set should be coverable by an
effective open set which is also small.) A set is *effective measure
one* iff its complement is effective measure zero.
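As a concrete instance of these definitions, the singleton containing the all-zeros sequence has effective measure zero: a single recursive function enumerates a uniformly effective open cover whose measures shrink as required. A sketch, with illustrative function names:

```python
from fractions import Fraction

def f(m, n):
    """Recursive enumerator for a uniformly effective open cover: every
    G_m is the single basic set N(0^m), listed repeatedly for each n."""
    return [0] * m

def mu_basic(sigma):
    """Lebesgue measure of the basic set N(sigma)."""
    return Fraction(1, 2 ** len(sigma))

# mu(G_m) = mu(N(0^m)) = 2^-m <= 2^-m, so the all-zeros sequence lies
# in a set of effective measure zero.
print([str(mu_basic(f(m, 0))) for m in range(1, 5)])  # ['1/2', '1/4', '1/8', '1/16']
```

Since the all-zeros sequence belongs to an effective measure zero set, it fails Martin-Löf’s test for randomness, as one would expect of a computable sequence.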