Supplement to Chance versus Randomness
A. Some Basic Principles About Chance
- A.1 The Principal Principle
- A.2 The Basic Chance Principle
- A.3 Frequencies, Reductionism, and the Stable Trial Principle
The most prominent constraint has been the idea that chances, when known, should guide rational credence, at least when other things are equal. Reasonable people who know the chance of some outcome, and know nothing else of relevance, should set their personal confidence in the outcome eventuating to the same value as the chance. This commonsensical claim was made precise and elevated to the status of a principle in Lewis (1980), who called it the ‘Principal Principle’: ‘principal’ because ‘it seems… to capture all we know about chance’ (Lewis 1980: 86). Lewis' more precise formulation goes as follows. Let p be some proposition about the outcome of some chance process (say, a coin toss, or the decay of an atom of some radioactive element), C be a reasonable initial credence function, and E be the background evidence at t, where E crucially doesn't include information that pertains to the truth of p except through its chance (Lewis 1980: 92; see also Hoefer 2007: 553). Evidence meeting this condition is called admissible. It is standardly assumed that, at least, historical information prior to t and information about how possible histories and possible laws bear on possible chances are admissible. Then the Principal Principle is (assuming C(⌜Ch(p) = x⌝ ∧ E) > 0):
(PP) C(p|⌜Ch(p) = x⌝ ∧ E) = x.
Assuming one updates credences by conditionalising, this form of the Principal Principle then entails that when rational agents come to know the chances, their credences equal the chances, provided they have no inadmissible information about the outcome. Chance thus plays the role of an ‘expert’ probability function, a norm for the credences of rational agents (Gaifman, 1988; Hall, 2004). The expertise of the chance function in the original Principal Principle is unconditional: rational initial credences should simply adopt values equal to the chances. When the evidence E is ordinary, this is unproblematic; but if there are ever cases where the evidence trumps the chances, a more nuanced principle is required, namely the New Principle, of which more below.
Lewis showed that, from the Principal Principle, much of what we know about chance follows. For example, if it is accepted, we needn't add as a separate constraint that chances are probabilities. Suppose one knew all the propositions stating the chances at some particular time of all future outcomes, and had no inadmissible evidence. Suppose one began with rational credence before learning the chances. Then in accordance with the Principal Principle, one assigns the same value to conditional credence (conditional on this evidence about the chances) in each future proposition as the value of its chance. Since rational conditional credences are probabilities, so too must chances be (Lewis 1980: 98). Furthermore, the chance of past events is always 1. Suppose A has already happened; since historical information is admissible, the Principal Principle implies that the chance of A, at t, is 1. It does not imply that the chance of A was always 1 (as the evidence admissible at one time may well have been inadmissible at some earlier time); so chances change over time, in accordance with the common belief that only the future is chancy, and the past is fixed. And finally, since an agent's credences guide their actions—that's partly what makes them that agent's credences—if the agent updates their credences rationally and in line with the Principal Principle, then their beliefs about chances will guide their credences. Chance, then, is the kind of probability which ‘is the very guide of life’ (Butler 1736: Introduction).
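Both of Lewis' derivations can be set out as a short worked sketch. The notation here is illustrative and partly an assumption: T stands for the complete theory of chances at t (so that T entails ⌜Chₜ(p) = x⌝ for the relevant values), E for admissible evidence that includes the history up to t, T is itself treated as admissible (as in Lewis 1980), and C(T ∧ E) > 0 throughout.

```latex
\begin{align*}
&\text{PP, for every proposition } p: && C(p \mid T \wedge E) = Ch_t(p)\\
&\text{Non-negativity:} && Ch_t(p) = C(p \mid T \wedge E) \geq 0\\
&\text{Normalisation:} && Ch_t(\top) = C(\top \mid T \wedge E) = 1\\
&\text{Additivity } (p, q \text{ incompatible}): && Ch_t(p \vee q) = C(p \vee q \mid T \wedge E) = C(p \mid T \wedge E) + C(q \mid T \wedge E) = Ch_t(p) + Ch_t(q)\\
&\text{Past events } (E \text{ entails } A): && Ch_t(A) = C(A \mid T \wedge E) = 1
\end{align*}
```

Each line relies only on the fact that conditionalising a probability function on T ∧ E yields a probability function; no assumption about chance beyond PP is used.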
Of course someone who didn't believe in credences wouldn't accept the Principal Principle (Kyburg 1978), and there have been modifications and amendments proposed to respond to various problems some have perceived with the Principal Principle (see below). But the former group is vanishingly small in number, and even those who propose modifications agree that the Principal Principle is an extremely good approximation to the correct principle. Even if PP turns out not to be exactly right, the commonsense belief to which it gives precise form would remain a guiding constraint on any theory that could reasonably be considered a theory of chance:
A feature of Reality deserves the name of chance to the extent that it occupies the definitive role of chance; and occupying the role means obeying the old Principle [PP], applied as if information about present chances, and the complete theory of chance, were perfectly admissible. Because of undermining, nothing perfectly occupies the role, so nothing perfectly deserves the name. But near enough is good enough. If nature is kind to us, the chances ascribed by the probabilistic laws of the best system will obey the old Principle to a very good approximation in commonplace applications. They will thereby occupy the chance-role well enough to deserve the name. To deny that they are really chances would be just silly. (Lewis 1994: 489)
There is thus widespread agreement that the Principal Principle, or something close to it, captures a basic truth about chance.
As Lewis states, it is the problem of ‘undermining’ which has come to be seen as most problematic for the Principal Principle. This is the problem that knowing the chances itself seems to provide evidence that trumps the chances! According to reductionist theories of chance, discussed further below (§3), the values of the chances are fixed by the total history of occurring events. A simple view of this sort says that the chances are the occurring frequencies, rounded off to give simple values. So if a coin is repeatedly tossed and lands heads about half the time, round the frequency to exactly one-half: that is the chance of heads. But if the chance of heads is now ½, the chance of one million consecutive heads is 1/2^(10^6) > 0. Supposing the coin has been tossed only a reasonably small number of times, a million further heads will swamp the currently observed frequencies; in other words, if that very surprising but nevertheless possible event were to occur, the chance of heads would be 1, or very close to it, not ½. Such a possible future, which has some chance of coming about and which would make the current chances false, is called an undermining future. The problem arises because the current chance of an undermining future is positive, yet if the undermining future came to pass, the current chances would not be what they are. So we know that if the current chances are as we think, the undermining future will not come to pass. So we can know a priori that if the chances are as we think they are, the undermining future is impossible, and we should assign no credence to the undermining future conditional on the chances being as they are. But the PP entails that we should place some positive credence in the undermining future conditional on the chances being as they are. Contradiction (Lewis, 1994; Hall, 1994; Thau, 1994).
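The arithmetic of the coin example can be checked directly. The sketch below is illustrative only: it assumes a hypothetical history of 100 past tosses with 50 heads, and it implements the crude rounded-frequency recipe described above by rounding to one decimal place (the function name `rounded_frequency` and the rounding rule are assumptions, not anyone's official theory). It shows that the chance of a million further heads is an exact, strictly positive rational, too small even to survive as a floating-point number, and that if those heads occurred the same recipe would assign chance 1.

```python
from fractions import Fraction
import math

# Hypothetical 'reductionist' recipe: the chance of heads is the observed
# frequency of heads, rounded to one decimal place (an illustrative choice).
def rounded_frequency(heads: int, tosses: int, places: int = 1) -> float:
    return round(heads / tosses, places)

# Hypothetical history so far: 100 tosses, 50 of them heads.
past_heads, past_tosses = 50, 100
current_chance = rounded_frequency(past_heads, past_tosses)

# Chance of a million further heads, if each toss independently has chance 1/2.
n_future = 10**6
p_undermine = Fraction(1, 2) ** n_future   # exactly 1 / 2**1_000_000
log10_p = -n_future * math.log10(2)        # its base-10 logarithm

# If that undermining future occurred, the same recipe would give a new chance.
new_chance = rounded_frequency(past_heads + n_future, past_tosses + n_future)

print(current_chance)            # 0.5
print(p_undermine > 0)           # True: tiny, but strictly positive
print(2.0 ** -n_future)          # 0.0: underflows as a floating-point number
print(round(log10_p))            # -301030
print(new_chance)                # 1.0
```

Exact rational arithmetic is used deliberately: a floating-point calculation would report the chance of the undermining future as 0, hiding exactly the feature that generates the puzzle.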
In response, Lewis, Hall and Thau advocate moving to another principle. Hall diagnoses the problem as arising because the present chances involve, on this reductionist picture, information which interacts problematically with the chances assigned to undermining futures—they aren't independent of one another. So the chances aren't an unconditional expert. But the chances are still expert for you, not in the sense that you should slavishly adopt the chances as your credences, but in the sense that the chance, given your current information, of some outcome is still a better guide to what you should set your credence to be than any alternative. That is, Hall (2004) argues, chance is an analyst-expert, which ‘earns [its] epistemic status because [it] is particularly good at evaluating the relevance of one proposition to another’—in this case, evaluating the relevance of your evidence (even evidence about the chances) to future outcomes. In that case, we should adopt only something like this principle connecting chance and credence:
(Chance-Analyst) C(p|⌜Ch(p|E) = x⌝ ∧ E) = x.
If we adopt this principle instead of PP (modulo some important qualifications that Hall (2004: 102–5) discusses), we can avoid the problem of undermining futures. Consider an undermining future F. It remains true that F has some chance of coming to pass, so that there is some chance that the chances are otherwise. Earlier we used the PP to show that rational agents should therefore have some positive credence in F conditional on the chances, even though F and the chances are a priori inconsistent. But the above principle only tells us that
C(F|⌜Ch(F|E) = x⌝ ∧ E) = x.
If E includes facts about the present chances, it follows immediately that E and F are inconsistent; so the chance of F conditional on E is zero, and therefore x = 0. The problematic claim cannot be derived from the analyst-expert role of chance. This is just to say that if the present evidence suffices to establish the chances, it also suffices to rule out undermining futures which would falsify what E says about the present chances (though, since undermining futures are possible, evidence about chances is not perfectly admissible).
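A small finite model makes the contrast between PP and the analyst-expert principle above concrete. Everything in it is a hypothetical toy: a world with exactly ten coin tosses, a Humean recipe on which the chance of heads just is the overall frequency of heads, an observed history H, T, T, H, and an actual chance of ½. The undermining future F (six further heads) gets positive unconditional chance, yet its chance conditional on E ("the chance of heads is ½") is zero, because F would make the overall frequency 4/5.

```python
from fractions import Fraction
from itertools import product

# Toy Humean world (all details hypothetical): N tosses in the whole history,
# and the chance of heads is *defined* as the overall frequency of heads.
N = 10
history = ("H", "T", "T", "H")           # tosses up to the present time t
remaining = N - len(history)             # 6 future tosses

def humean_chance(full_history):
    """Chance of heads fixed by the total pattern of outcomes."""
    return Fraction(full_history.count("H"), len(full_history))

actual_chance = Fraction(1, 2)           # suppose the actual total has 5 heads

def ch(event):
    """Chance at t of a set of futures, treating future tosses as
    independent, each with chance `actual_chance` of heads."""
    total = Fraction(0)
    for future in product("HT", repeat=remaining):
        if event(future):
            h = future.count("H")
            total += actual_chance**h * (1 - actual_chance)**(remaining - h)
    return total

def F(future):                           # undermining future: all heads
    return all(toss == "H" for toss in future)

def E(future):                           # 'the chance of heads is 1/2'
    return humean_chance(history + future) == actual_chance

ch_F = ch(F)                                        # positive
ch_F_given_E = ch(lambda f: F(f) and E(f)) / ch(E)  # zero

print(ch_F, ch_F_given_E)  # 1/64 0
```

Plugging x = 1/64 into PP would demand positive credence in F given E, although F and E are jointly impossible in this model; the conditional chance Ch(F|E) = 0 is what the analyst-expert principle uses instead.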
For various reasons, Chance-Analyst hasn't been thought to capture perfectly the full sense in which we should epistemically defer to chance. For one thing, that principle conditionalises credence on a proposition about chance, but the chance function is not itself conditionalised on that proposition. It seems perhaps odd that one would count as failing to defer to chance merely because the chance function does not assign chance 1 to the proposition about chance of which one is conditionally certain, and yet that is a consequence of Chance-Analyst (Reaber 2010, Other Internet Resources). In this sense, an overall better formulation of the New Principle that Lewis and others invoke to respond to the problem of undermining futures might be the version proposed by Joyce (2007). This is a ‘global’ principle, because it involves conditioning on a claim about an entire probability function at once. Let chance be a non-rigid designator of the actual chance function (varying from world to world), and let P rigidly designate some particular probability function. Then, Joyce proposes, this principle captures what it means for someone to defer their credence to the chances:
(NP) Let C be the credence function for someone whose evidence is limited to the past and present. Then, if the chances are given by probability function P, then C(p|⌜chance = P⌝) = P(p|⌜chance = P⌝). (Joyce, 2007: 198)
Joyce notes that an aspect of his principle NP is a commitment to the earlier principle Chance-Analyst, so adopting NP as one's conception of deference to chances allows the reductionist to block the problem of undermining in the way sketched above.
It is an issue for Humeans whether our evidence E really does include the present chances. If not, it will be difficult to apply either of these principles, NP or Chance-Analyst, in a way that generates unconditional credal judgements. Lewis and others worry about how, and to what extent, we do know about the present chances; the answer, that we can know them to the extent that they are independent of possible undermining futures, is not very helpful. More helpful is the further observation that most future events, even if they make some small contribution to bringing about an undermining future, can be treated as if they are independent of the present chances without risk of significant error. It shows that, for most particular localised future events, we can treat chance as governed by the PP (as what Hall calls a ‘database expert’), since Chance-Analyst then simply reduces to the original PP. Indeed reductionists and non-reductionists alike can accept the New Principle NP: non-reductionists knowing that the chances are independent of the future, so that the original PP is fine, and reductionists knowing that for all practical purposes the original PP is fine. (More on Lewis' own theory of chance, and its connection with laws and his broader metaphysical concerns, can be found in the entry on Lewis, Weatherson 2010: §5.1.)
Others have argued that the original puzzle only arises because even the original PP inappropriately conditionalises the credence on evidence E which includes information about the chances. Ismael (forthcoming) argues that the real principle to adopt is the following, where hₜ is just the history up to t:
(UPP) C(p|hₜ) = Chₜ(p).
This principle is also not susceptible to undermining, because one never conditionalises on the theory of chance (assuming that the past history itself does not fix the chances). In general one won't know the right-hand side of this equation; but, by the theorem of total probability and general principles about current estimates of unknown quantities, it can be estimated as the weighted sum of the chances assigned by various future histories, weighted by your credences in those histories. Ismael's final recommendation is ‘that you should adjust credence in A to your best estimate of the chances’.
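The estimate just described is a credence-weighted average, which a few lines can make explicit. This is a minimal sketch under stated assumptions: the three candidate hypotheses, their labels `h1` to `h3`, the credences, and the chances they fix are all hypothetical numbers chosen for illustration, and the helper `estimate_chance` is not drawn from Ismael.

```python
from fractions import Fraction

def estimate_chance(credences, chances_of_A):
    """Best estimate of Ch_t(A): the credence-weighted average of the values
    that each candidate hypothesis about the unknown chances assigns to A.

    `credences` maps each hypothesis label to your credence in it (summing
    to 1); `chances_of_A` maps the same labels to the chance of A that the
    hypothesis would fix.
    """
    if sum(credences.values()) != 1:
        raise ValueError("credences over the hypotheses must sum to 1")
    return sum(credences[h] * chances_of_A[h] for h in credences)

# Hypothetical example: three candidate ways the total history might go,
# each fixing a different chance of heads on the next toss.
credences = {"h1": Fraction(1, 2), "h2": Fraction(3, 10), "h3": Fraction(1, 5)}
chances_of_A = {"h1": Fraction(1, 2), "h2": Fraction(2, 5), "h3": Fraction(3, 5)}

print(estimate_chance(credences, chances_of_A))  # 49/100
```

On this reading, UPP recommends a credence in A of 49/100, even though no single hypothesis fixes that value as the chance.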
Contrary to Lewis' contention, however, the Principal Principle (or its sophisticated variant) is not the only truth about chance. As Arntzenius and Hall (2003) have pointed out (in connection with the problems for reductionism about chance), some probability functions which obey the Principal Principle perfectly are very unlike chances. They conclude that we know more about chance than is captured by the Principal Principle alone, because we know that these functions, constructed simply to meet the Principal Principle but with no independent claim to be classified as chances, are not chances. This conclusion has been widely accepted. So while the Principal Principle captures a lot of what we know about chance, there are other truths about chance that help to narrow the field of probability functions which could be chances still further. The problem with the existence of additional principles is that perhaps nothing meets all of them perfectly. Indeed, this problem seems to be real, for every function which has been proposed to meet most of the platitudes about chance has turned out to violate some others (Schaffer 2003). But most have agreed that we may adapt Lewis' remarks above, maintaining that the function which is near enough to meeting all or most of what we take ourselves to know about chance is near enough to chance to deserve the name.
Prominent among these other principles is the Basic Chance Principle (BCP), connecting chance and possibility. It was named by Bigelow, Collins and Pargetter (1993), who give this informal argument for the existence of such a connection:
In general, if the chance of A is positive there must be a possible future in which A is true. Let us say that any such possible future grounds the positive chance of A. But what kinds of worlds can have futures that ground the fact that there is a positive present chance of A in the actual world? Not just any old worlds. … [T]he positive present chance of A in this world must be grounded by the future course of events in some A-world sharing the history of our world and in which the present chance of A has the same value as it has in our world. That is precisely the content of the BCP. (Bigelow et al. 1993: 459)
In other words, if the chance of A is non-zero in some world w at some time t, then A does in fact happen at some possible world which shares its history up to t and its chances with w: if not w itself, then a world very like w. If Chₜʷ is the chance distribution at t in world w, their formulation of the BCP is this:
Suppose x > 0 and Chₜʷ(A) = x. Then A is true in at least one of those worlds w′ that matches w up to time t and for which Chₜ(A) = x. (Bigelow et al. 1993: 459)
Again, in accepting the connection between chance and possibility expressed by the BCP, we needn't endorse this precise formulation. Schaffer, for example, though he endorses a principle stronger than the BCP, motivates it by this informal gloss: ‘if there is a non-zero chance of p, this should entail that p is possible, and indeed that p is compossible with the circumstances’ (Schaffer 2007: 124). Mellor (2000) endorses this basic connection between chance and possibility, arguing that there is a ‘necessity condition’ on chance, ensuring that chances behave just like modalities. In particular, on Mellor's view, chance one behaves like necessity, and chance zero like impossibility, so p's having an intermediate chance entails that it is possible. Finally, Eagle (2011) agrees with Mellor on the formal features of chance ascriptions, but argues that the connection is better taken to hold between chance and the modal ‘can’ of ability ascriptions. Evidence for this thesis comes from the widespread endorsement by ordinary speakers of the claim that there is a non-zero chance that p iff p can happen, where ‘can’ expresses the dynamic, ability-attributing modal. This is not the BCP, however, because the best semantic accounts of ‘can’ are very unlike the conditions on ‘possibly’ imposed in the BCP (Kratzer, 1977; Lewis, 1979b). So one could endorse the broad thrust of the BCP even if the details of the chance-possibility connection turn out very differently.
The BCP is not a trivial truth, and is not universally accepted. One objection is that the BCP is inconsistent with the existence of undermining futures, those futures which have a present chance of coming to pass, but which if they did come to pass would entail that the present chances (or laws) are otherwise than they are. As before, the present chance that a fair coin tossed one million times lands heads every time is small but non-zero. If this event were to occur in some possibility, the chance of heads for that coin in that world would not (or so the story goes) be ½; so there is no world with the same chances and history as ours in which this event occurs, contrary to the BCP. The key to the existence of undermining futures is the broadly Humean (or reductionist) principle that whatever the chances are, they should supervene on the total arrangement or pattern of occurrent events (see §A.3). So if some counterfactual pattern of events could undermine the actual chances, the BCP will fail. Thus commitment to the BCP prima facie involves a commitment to a non-Humean conception of chance, one on which undermining is impossible. On the other hand, such a view avoids the problems that undermining generates for the original PP, so defenders of the BCP can also retain the original version of that Principle.
This objection only has force if one accepts Humean reductionism about chance, and while that has a strong pull for many broadly empiricist metaphysicians and philosophers of science, it cannot be thought to have as much direct intuitive support as the BCP itself (indeed, Bigelow et al. are explicit in using BCP to argue against Humeanism). Moreover, some versions of the BCP, like Eagle's, are not as obviously inconsistent with Humeanism about chance as the original Bigelow et al. version. Finally, Schaffer (2003: §4) has argued that, despite appearances, there is a conception of chance according to which the BCP and a broadly Humean account of chance are together consistent (and moreover consistent with the PP). So this objection isn't conclusive. But the fact that there is an objection to be found in this direction at all relies on a further view about chance, discussed in the next section.
The third constraint is that the chance of some outcome should be approximately equal to the actual frequencies of similar outcomes in all similar circumstances. This vague constraint has metaphysical and epistemological readings. It may be understood as proposing that the value of chances be systematically related to the values of the frequencies, or merely as proposing that evidence about the values of the chances is provided by evidence about the value of the frequencies (and vice versa).
On the metaphysical side, it is easy to come to share the belief that chance and frequency should be close. The attraction of the view is obvious, for it proposes in effect to reduce chance to occurrent categorical facts like the values of actual frequencies and other Humean magnitudes. But it is difficult to formulate the reductionist claim precisely. It could be formulated as a supervenience thesis, so that no two worlds could differ in the chance they assign to some outcome unless they also differed in the actual frequencies of similar outcomes. But such a principle seems open to the objection that two different but close chance functions could easily result in the same pattern of outcomes, particularly if there were relatively few relevant outcomes. (The converse supervenience thesis is subject to similar worries, since two worlds could easily differ in their outcomes while sharing their chances.) And this supervenience thesis is relatively weak—any stronger connection between chances and frequencies, such as that proposed by frequentists of various sorts (von Mises, 1957; Reichenbach, 1949), is subject to these objections (among others: Jeffrey 1977; Hájek 1997). Perhaps the most promising reductionist view about chance is Lewis' best system analysis of probabilistic laws (Lewis, 1994; Loewer, 2004). Lewis suggests that the axioms of the best systematisation of the actual facts, including the frequency facts, deserve to be called the laws of nature. In some cases, the best system will include probabilistic axioms. In this case, Lewis (1994: 480) concludes: ‘So now we can analyse chance: the chances are what the probabilistic laws of the best system say they are.’ (A related account is offered by Hoefer 2007.) Yet on this view, the relationship between chance and frequency is not at all straightforward. 
While Lewis does offer a suggestive and persuasive vision for giving a reductive account of chance, it nevertheless remains true that ‘no reductionist has in fact ever provided an exact recipe that would show how categorical facts fix the facts about objective chance’ (Hall 2004: 111).
But we may endorse the epistemological connection between chance and frequency without making any decision about the prospects of reductionism concerning the metaphysics of chance. For whether chances depend on frequencies or not, it is a fundamental principle of scientific and statistical inference that frequencies are good evidence for chances, and that chances are good evidence for frequencies. The latter claim follows quickly from the fact that chances, if they exist, are probabilities, and the fact that probabilities in many common cases with independent trials obey the Law of Large Numbers (Sinai 1992: 21). But the inference from frequencies to chances is more subtle. It is apparently required: if frequencies don't constrain the chances, then any current opinion about the present chances may be rationally maintained in the light of any future evidence whatsoever, and yet this is something we would find quite irrational. A view about chance which does not include this connection will have a very difficult time explaining why the principles of direct and inverse inference (i.e., inference from chances to the actual statistics of outcomes, and from those statistics back to chances) in statistical hypothesis testing work at all (Hacking, 1965; Ismael, 1996; Eells, 1983). And yet they work very well, as the spectacular empirical success of statistically confirmed theories like quantum physics shows.
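The inference from chances to frequencies can be illustrated by simulation. This is a sketch under explicit assumptions: Python's pseudorandom generator stands in for a genuinely chancy process, the chance of 0.3 is a hypothetical value, and the helper `relative_frequency` is introduced only for illustration. As the number of independent trials grows, the observed relative frequency tends to settle near the chance, which is what the Law of Large Numbers says happens with high probability (not with certainty).

```python
import random

def relative_frequency(chance, n_trials, rng):
    """Relative frequency of success in n independent trials,
    each with the same chance of success."""
    successes = sum(rng.random() < chance for _ in range(n_trials))
    return successes / n_trials

rng = random.Random(0)    # fixed seed so the run is reproducible
chance = 0.3              # hypothetical chance of success per trial
for n in (10, 1_000, 100_000):
    freq = relative_frequency(chance, n, rng)
    # The gap between frequency and chance typically shrinks as n grows.
    print(n, freq, abs(freq - chance))
```

The exact figures depend on the seed; what matters is the trend, since a short run can easily stray far from the chance while a long run very rarely does.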
The most promising way of implementing the epistemic constraint from frequency to chance is to ground it in the Principal Principle, as Lewis suggests (1980: 106–8; see also Levi 1980: ch. 12 and Howson and Urbach 1993: 342–7). Lewis argues that hypotheses about chances are as amenable to differential confirmation by evidence, including frequency evidence, as any other propositions about which we have credences. If the hypotheses about chance make predictions about the observed frequencies, in line with the law of large numbers, then the observed frequencies will constrain our posterior credence in the chance hypotheses, in line with the tenets of Bayesian confirmation theory (see the entry on Bayesian epistemology, Talbott 2008). If the PP has this fundamental role, justifying this apparently independent constraint, then it is particularly pressing for a metaphysical account of chance to make a connection between chance and frequency tenable. And here reductionists like Lewis, Hoefer, and Loewer make their stand:
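The Bayesian route from frequencies to chances can be sketched as follows, under stated assumptions: a hypothetical prior spreading equal credence over three chance hypotheses (¼, ½, ¾), hypothetical data of 70 heads in 100 independent tosses, and a helper `posterior` introduced only for illustration. Following the PP, the credence in the data conditional on each chance hypothesis is set equal to the binomial chance of that data; Bayes' theorem then shifts credence toward the hypothesis that best predicts the observed frequency.

```python
from fractions import Fraction
from math import comb

def posterior(prior, heads, tosses):
    """Posterior credence over chance hypotheses after observing `heads`
    in `tosses` independent trials.

    `prior` maps each hypothesised chance of heads to a prior credence.
    The likelihood of the data under each hypothesis is taken to be the
    binomial chance of that outcome, as the Principal Principle directs.
    """
    unnormalised = {
        x: p * comb(tosses, heads) * x**heads * (1 - x)**(tosses - heads)
        for x, p in prior.items()
    }
    total = sum(unnormalised.values())
    return {x: w / total for x, w in unnormalised.items()}

# Hypothetical prior: equal credence in three chance hypotheses.
prior = {Fraction(1, 4): Fraction(1, 3),
         Fraction(1, 2): Fraction(1, 3),
         Fraction(3, 4): Fraction(1, 3)}

post = posterior(prior, heads=70, tosses=100)
for x, c in post.items():
    print(x, float(c))
```

With these numbers nearly all of the credence ends up on the chance-¾ hypothesis, whose predicted frequency lies closest to the observed 0.7; no separate frequency-to-chance rule is needed beyond PP and conditionalisation.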
Be my guest—posit all the primitive unHumean whatnots you like. … But play fair in naming your whatnots. Don't call any alleged feature of reality ‘chance’ unless you've already shown that you have something, knowledge of which could constrain rational credence. I think I see, dimly but well enough, how knowledge of frequencies and symmetries and best systems could constrain rational credence. I don't begin to see, for instance, how knowledge that two universals stand in a certain special relation N* could constrain rational credence about the future coinstantiation of those universals. (Lewis 1994: 484)
But Hall (2004) argues that anti-reductionists, who propose that chances are independent fundamental features of reality, can equally well explain the PP by taking it to be an analytic truth.
Reductionists and anti-reductionists about chance alike will admit that, while frequencies can be evidence for chances, not all frequencies are up to the job. Frequentists acknowledged this: only collections of outcomes with similar generating conditions provide frequencies which are useful for calculating the chance of a future similar outcome. Frequencies would be worse than useless if ‘we couldn't distinguish natural from gerrymandered kinds; again, we could get the analysis to yield almost any answer we liked. But we can distinguish. (If we could not, puzzles about chance would be the least of our worries.)’ (Lewis 1994: 477). Giving a precise answer to the question of which classes of events are appropriately natural and non-gerrymandered is difficult: it is the famous reference class problem for frequentism (see also the discussion in the main entry at §4.2). But as Hájek (2007) has argued, and as the quote from Lewis implicitly makes clear, this problem will face any view about chance whatever; it is not peculiarly a problem for frequentism, or even about chance, but rather the old problem of the existence of natural classes. Early frequentists simply assumed, as is now also widely believed, that it makes sense to invoke such uniform classes of events. Von Mises is explicit:
In games of chance, in the problems of insurance, and in the molecular processes we find events repeating themselves again and again. They are mass phenomena or repetitive events. … The rational concept of probability, which is the only basis of probability calculus, applies only to problems in which either the same event repeats itself again and again, or a great number of uniform elements are involved at the same time. … It is essential for the theory of probability that experience has shown that in the game of dice, as in all the other mass phenomena which we have mentioned, the relative frequencies of certain attributes become more and more stable as the number of observations is increased. (von Mises 1957: 10–2)
Set aside the dubious contention that probability requires the existence of ‘mass phenomena’ (this would rule out as conceptually incoherent the perfectly legitimate idea that there might be a chance of an event which just happens to possess no similar counterpart events). The key insight is that usable (stable) frequencies are only found in mass phenomena, where the mass phenomena are explicitly defined so as to require ‘repetitive’ events, and hence cannot be an arbitrary and gerrymandered collection of outcomes.
This requirement on usable frequencies, that they come from repeated trials of the same experiment, points us towards another constraint on theories of chance, that chances should depend on the properties of the chance setup. The chance of a single outcome might well be measured by the frequencies in similar trials, but the connection between the trial of interest and the other trials is not merely incidental—what makes those trials evidentially relevant is the fact that they share underlying physical similarities.
One way of capturing at least part of this idea is the requirement that duplicate trials, precisely similar in all respects, in the same world (and thus subject to the same laws of nature) should have the same outcome chances. This is roughly the ‘stable trial principle’ as defended by Schaffer (2003: 37) (later slightly varied as the ‘intrinsicness requirement’ in Schaffer 2007: 125). Pretty much all conceptions of chance, reductionist and non-reductionist alike, respect this constraint, it has been argued:
[any reductionist] recipe for how total history determines chances should be sensitive to basic symmetries of time and space—so that if, for example, two processes going on in different regions of spacetime are exactly alike, your recipe assigns to their outcomes the same single-case chances. (It is not that a non-reductionist will have no place for such a constraint: it is just that she will likely not view it as a substantive metaphysical thesis about chance, but as a substantive methodological thesis about how we should, in doing science, theorize about chance.) (Arntzenius and Hall 2003: 178)
Even quite exotic views about chance respect the stable trial principle. Consider the view that robust objective chances are unnecessary, and can be replaced entirely by credences with certain formal properties (a view most closely associated with de Finetti's famous proof (1964) that credences about the outcomes of many individual trials which are exchangeable, that is, invariant under permutations of the order of outcomes, will behave as if there were a real unknown chance guiding the agent's overall credence in the pattern of those outcomes). This theory of ‘chance’, minimal as it is, nevertheless seems to respect something like the stable trial principle, because exchangeable credences demand the same credences in cases that are (credal) duplicates (Skyrms, 1980: 158–60). (Though see Howson and Urbach (1993: 349–51) for some doubts about whether de Finetti's argument really makes genuine chance dispensable.) And certainly views according to which chance is a real objective phenomenon, grounded in the chance setup of a mass phenomenon, will endorse the stable trial principle. Indeed, many such views will even endorse stronger principles, such as that the chances supervene on the physical properties of the trial device alone, or that the chance depends on some particular dispositional property of the chance setup (as in propensity theories). But the controversies that surround propensities (Eagle 2004), and envelop these stronger claims, don't significantly undermine the original stable trial principle, or the original intuition that the underlying physical process that generates a chancy outcome is of primary importance for grounding the value of a chance, under a given set of laws.
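Exchangeability can be seen in a small computation. This sketch illustrates only the easy direction of de Finetti's result: a credence built as a mixture of independent, identically distributed trials is exchangeable. The representation theorem itself, which runs the other way for infinite exchangeable sequences, is not shown. The mixture (equal credence in chances ⅓ and ⅔) and the helper `credence` are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical credence: a 50/50 mixture over two chance hypotheses.
mixture = {Fraction(1, 3): Fraction(1, 2), Fraction(2, 3): Fraction(1, 2)}

def credence(sequence):
    """Credence in an exact sequence of outcomes (1 = heads, 0 = tails),
    computed as a mixture of independent, identically distributed trials."""
    heads = sum(sequence)
    tails = len(sequence) - heads
    return sum(w * x**heads * (1 - x)**tails for x, w in mixture.items())

seq = (1, 1, 0, 1, 0)
values = {credence(p) for p in permutations(seq)}
print(values)  # a single value: the credence ignores the order of outcomes

# The trials are not independent in credence: seeing heads raises
# credence in heads on the next trial (from 1/2 to 5/9 here).
print(credence((1,)), credence((1, 1)) / credence((1,)))
```

Every reordering of the same outcomes receives the same credence, which is the sense in which credal duplicates are treated alike; yet the agent still learns from frequencies, as if updating on an unknown chance.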