This is a file in the archives of the Stanford Encyclopedia of Philosophy.
Stanford Encyclopedia of Philosophy
Bayesian epistemology did not emerge as a philosophical program until the first formal axiomatizations of probability theory in the first half of the 20th century. One important application of Bayesian epistemology has been to the analysis of scientific practice in Bayesian Confirmation Theory. In addition, a major branch of statistics, Bayesian statistics, is based on Bayesian principles. In psychology, an important branch of learning theory, Bayesian learning theory, is also based on Bayesian principles. Finally, the idea of analyzing rational degrees of belief in terms of rational betting behavior led to the 20th century development of a new kind of decision theory, Bayesian decision theory, which is now the dominant theoretical model for both the descriptive and normative analysis of decisions. The combination of its precise formal apparatus and its novel pragmatic self-defeat test for justification makes Bayesian epistemology one of the most important developments in epistemology in the 20th century, and one of the most promising avenues for further progress in epistemology in the 21st century.
Bayesians propose additional standards of synchronic coherence -- standards of probabilistic coherence -- and additional rules of inference -- probabilistic rules of inference -- in both cases, to apply not to beliefs, but degrees of belief (degrees of confidence). For Bayesians, the most important standards of probabilistic coherence are the laws of probability. For more on the laws of probability, see the following supplementary article:
Supplement on Probability Laws
For Bayesians, the most important probabilistic rule of inference is given by a principle of conditionalization.
Conditional Probability:
P(S/T) = P(S&T)/P(T).
By itself, the definition of conditional probability is of little epistemological significance. It acquires epistemological significance only in conjunction with a further epistemological assumption:
Simple Principle of Conditionalization:
If one begins with initial or prior probabilities Pi, and one acquires new evidence which can be represented as becoming certain of an evidentiary statement E (assumed to state the totality of one's new evidence and to have initial probability greater than zero), then rationality requires that one systematically transform one's initial probabilities to generate final or posterior probabilities Pf by conditionalizing on E -- that is: Where S is any statement, Pf(S) = Pi(S/E).
In epistemological terms, this Simple Principle of Conditionalization requires that the effects of evidence on rational degrees of belief be analyzed in two stages: The first is non-inferential. It is the change in the probability of the evidence statement E from Pi(E), assumed to be greater than zero and less than one, to Pf(E) = 1. The second is a probabilistic inference of conditionalizing on E from initial probabilities (e.g., Pi(S)) to final probabilities (e.g., Pf(S) = Pi(S/E)).
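The two stages can be made concrete with a small finite model. The following is only a sketch under stated assumptions: the four "possible worlds", their prior weights, and the helper names `conditionalize` and `prob` are hypothetical illustrations, not part of the article's formal apparatus.

```python
from fractions import Fraction


def prob(dist, statement):
    """Probability of a statement, modelled as the set of worlds where it is true."""
    return sum(p for world, p in dist.items() if world in statement)


def conditionalize(prior, evidence):
    """Simple conditionalization: Pf(w) = Pi(w)/Pi(E) inside E, and 0 outside E."""
    p_e = prob(prior, evidence)
    if p_e == 0:
        raise ValueError("E must have initial probability greater than zero")
    return {
        world: (p / p_e if world in evidence else Fraction(0))
        for world, p in prior.items()
    }


# Hypothetical prior over four worlds, labelled by whether S and E are true.
prior = {
    ("S", "E"): Fraction(3, 10),
    ("S", "not-E"): Fraction(1, 10),
    ("not-S", "E"): Fraction(2, 10),
    ("not-S", "not-E"): Fraction(4, 10),
}
S = {world for world in prior if world[0] == "S"}
E = {world for world in prior if world[1] == "E"}

posterior = conditionalize(prior, E)
print(prob(prior, S))      # Pi(S) = 2/5
print(prob(posterior, E))  # Pf(E) = 1 (first, non-inferential stage)
print(prob(posterior, S))  # Pf(S) = Pi(S/E) = 3/5 (second, inferential stage)
```

In this toy model the final probability of S equals Pi(S&E)/Pi(E) = (3/10)/(1/2), which is what the Simple Principle requires.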
Problems with the Simple Principle (to be discussed below) have led many Bayesians to qualify the Simple Principle by limiting its scope. In addition, some Bayesians follow Jeffrey in generalizing the Simple Principle to apply to cases in which one's new evidence is less than certain (also discussed below). What unifies Bayesian epistemology is a conviction that conditionalizing (perhaps of a generalized sort) is rationally required in some important contexts -- that is, that some sort of conditionalization principle is an important principle governing rational changes in degrees of belief.
A Dutch Book Argument relies on some descriptive or normative assumptions to connect degrees of belief with willingness to wager -- for example, a person with degree of belief p in sentence S is assumed to be willing to pay up to and including $p for a unit wager on S (i.e., a wager that pays $1 if S is true) and is willing to sell such a wager for any price equal to or greater than $p (one is assumed to be equally willing to buy or sell such a wager when the price is exactly $p). A Dutch Book is a combination of wagers which, on the basis of deductive logic alone, can be shown to entail a sure loss. A synchronic Dutch Book is a Dutch Book combination of wagers that one would accept all at the same time. A diachronic Dutch Book is a Dutch Book combination of wagers that one will be motivated to enter into at different times.
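A minimal sketch of a synchronic Dutch Book, under the betting assumptions just described, might look as follows. The particular degrees of belief (0.6 in S and 0.6 in not-S, which together exceed 1) and the function names are hypothetical choices made for illustration.

```python
from fractions import Fraction


def net_payoff(price, wins):
    """Net gain to the buyer of a unit wager: pay `price`, receive $1 if it wins."""
    return (Fraction(1) if wins else Fraction(0)) - price


# Hypothetical incoherent agent: degrees of belief in S and in not-S sum to 6/5.
belief_S = Fraction(6, 10)
belief_not_S = Fraction(6, 10)


def total_outcome(s_is_true):
    """Agent buys a unit wager on S and a unit wager on not-S at its own prices."""
    on_S = net_payoff(belief_S, s_is_true)
    on_not_S = net_payoff(belief_not_S, not s_is_true)
    return on_S + on_not_S


outcomes = {s_is_true: total_outcome(s_is_true) for s_is_true in (True, False)}
print(outcomes)  # the agent loses 1/5 whether S is true or false
```

Exactly one of the two wagers pays $1 in each case, while the agent has paid $1.20 for the pair, so the loss follows from deductive logic alone.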
Ramsey and de Finetti first employed synchronic Dutch Book Arguments in support of the probability laws as standards of synchronic coherence for degrees of belief. The first diachronic Dutch Book Argument in support of a principle of conditionalization was reported by Teller, who credited David Lewis. The Lewis/Teller argument depends on a further descriptive or normative assumption about conditional probabilities due to de Finetti: An agent with conditional probability P(S/T) = p is assumed to be willing to pay any price up to and including $p for a unit wager on S conditional on T. (A unit wager on S conditional on T is one that is called off, with the purchase price returned to the purchaser, if T is not true. If T is true, the wager is not called off and the wager pays $1 if S is also true.) On this interpretation of conditional probabilities, Lewis, as reported by Teller, was able to show how to construct a diachronic Dutch Book against anyone who, on learning only that T, would predictably change his/her degree of belief in S to Pf(S) > Pi(S/T); and how to construct a diachronic Dutch Book against anyone who, on learning only that T, would predictably change his/her degree of belief in S to Pf(S) < Pi(S/T). For illustrations of the strategy of the Ramsey/de Finetti and the Lewis/Teller arguments, see the following supplementary article:
Supplement on Dutch Book Arguments
There has been much discussion of exactly what it is that Dutch Book Arguments are supposed to show. On the literal-minded interpretation, their significance is that they show that those whose degrees of belief violate the probability laws or those whose probabilistic inferences predictably violate a principle of conditionalization are liable to enter into wagers on which they are sure to lose. There is very little to be said for the literal-minded interpretation, because there is no basis for claiming that rationality requires that one be willing to wager in accordance with the behavioral assumptions described above. An agent could simply refuse to accept Dutch Book combinations of wagers.
A more plausible interpretation of Dutch Book Arguments is that they are to be understood hypothetically, as symptomatic of what has been termed pragmatic self-defeat. On this interpretation, Dutch Book Arguments are a kind of heuristic for determining when one's degrees of belief have the potential to be pragmatically self-defeating. The problem is not that one who violates the Bayesian constraints is likely to enter into a combination of wagers that constitute a Dutch Book, but that, on any reasonable way of translating one's degrees of belief into action, there is a potential for one's degrees of belief to motivate one to act in ways that make things worse than they might have been, when, as a matter of logic alone, it can be determined that alternative actions would have made things better (on one's own evaluations of better and worse).
Another way of understanding the problem of susceptibility to a Dutch Book is due to Ramsey: Someone who is susceptible to a Dutch Book evaluates identical bets differently based on how they are described. Putting it this way makes susceptibility to Dutch Books sound irrational. But this standard of rationality would make it irrational not to recognize all the logical consequences of what one believes. This is the assumption of logical omniscience (discussed below).
If successful, Dutch Book Arguments would reduce the justification of the principles of Bayesian epistemology to two elements: (1) an account of the appropriate relationship between degrees of belief and choice; and (2) the laws of deductive logic. Because it would seem that the truth about the appropriate relationship between the degrees of belief and choice is independent of epistemology, Dutch Book Arguments hold out the potential of justifying the principles of Bayesian epistemology in a way that requires no other epistemological resources than the laws of deductive logic. For this reason, it makes sense to think of Dutch Book Arguments as indirect, pragmatic arguments for according the principles of Bayesian epistemology much the same epistemological status as the laws of deductive logic. Dutch Book Arguments are a truly distinctive contribution made by Bayesians to the methodology of epistemology.
It should also be mentioned that some Bayesians have defended their principles more directly, with non-pragmatic arguments. In addition to reporting Lewis's Dutch Book Argument, Teller offers a non-pragmatic defense of Conditionalization. There have been many proposed non-pragmatic defenses of the probability laws, the most compelling of which is due to Joyce. All such defenses, whether pragmatic or non-pragmatic, produce a puzzle for Bayesian epistemology: The principles of Bayesian epistemology are typically proposed as principles of inductive reasoning. But if the principles of Bayesian epistemology depend ultimately for their justification solely on the laws of deductive logic, what reason is there to think that they have any inductive content? That is to say, what reason is there to believe that they do anything more than extend the laws of deductive logic from beliefs to degrees of belief? It should be mentioned, however, that even if Bayesian epistemology only extended the laws of deductive logic to degrees of belief, that alone would represent an extremely important advance in epistemology.
Bayes' Theorem:
P(S/T) = P(T/S) × P(S)/P(T) [where P(T) is assumed to be greater than zero]
The epistemological significance of Bayes' Theorem is that it provides a straightforward corollary to the Simple Principle of Conditionalization. Where the final probability of a hypothesis H is generated by conditionalizing on evidence E, Bayes' Theorem provides a formula for the final probability of H in terms of the prior or initial likelihood of H on E (Pi(E/H)) and the prior or initial probabilities of H and E:
Corollary of the Simple Principle of Conditionalization:
Pf(H) = Pi(H/E) = Pi(E/H) × Pi(H)/Pi(E).
Due to the influence of Bayesianism, likelihood is now a technical term of art in confirmation theory. As used in this technical sense, likelihoods can be very useful. Often, when the conditional probability of H on E is in doubt, the likelihood of H on E can be computed from the theoretical assumptions of H.
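As a worked illustration of the corollary, the sketch below computes a final probability from a likelihood and priors. The numbers are hypothetical, and Pi(E) is obtained from the law of total probability over H and ~H, which is an added assumption about how the prior of E is supplied.

```python
from fractions import Fraction


def final_probability(prior_h, like_e_given_h, like_e_given_not_h):
    """Pf(H) = Pi(E/H) * Pi(H) / Pi(E), with Pi(E) from total probability."""
    prior_e = like_e_given_h * prior_h + like_e_given_not_h * (1 - prior_h)
    return like_e_given_h * prior_h / prior_e


# Hypothetical values: Pi(H) = 1/100, Pi(E/H) = 9/10, Pi(E/~H) = 1/10.
pf_h = final_probability(Fraction(1, 100), Fraction(9, 10), Fraction(1, 10))
print(pf_h)  # 1/12
```

Here the likelihood Pi(E/H) is high, yet the final probability of H stays modest because its prior was low.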
B. Confirmation and disconfirmation by entailment. Whenever a hypothesis H logically entails evidence E, E confirms H. This follows from the fact that to determine the truth of E is to rule out a possibility assumed to have non-zero prior probability that is incompatible with H -- the possibility that ~E. A corollary is that, where H entails E, ~E would disconfirm H, by reducing its probability to zero. The most influential model of explanation in science is the hypothetico-deductive model (e.g., Hempel). Thus, one of the most important sources of support for Bayesian Confirmation Theory is that it can explain the role of hypothetico-deductive explanation in confirmation.
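The entailment result can be spelled out from the corollary of the Simple Principle of Conditionalization. The derivation below is a sketch that assumes Pi(H) > 0 and 0 < Pi(E) < 1; the first line covers learning E, and the second covers learning ~E instead.

```latex
\begin{align*}
P_f(H) &= P_i(H/E) = \frac{P_i(E/H)\, P_i(H)}{P_i(E)} = \frac{P_i(H)}{P_i(E)} > P_i(H)
  && \text{since } H \text{ entails } E \text{ gives } P_i(E/H) = 1, \\
P_f(H) &= P_i(H/{\sim}E) = \frac{P_i(H \,\&\, {\sim}E)}{P_i({\sim}E)} = 0
  && \text{since } H \,\&\, {\sim}E \text{ is logically impossible.}
\end{align*}
```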
C. Confirmation of logical equivalents. If two hypotheses H1 and H2 are logically equivalent, then evidence E will confirm both equally. This follows from the fact that logically equivalent statements always are assigned the same probability.
D. The confirmatory effect of surprising or diverse evidence. From the corollary above, it follows that whether E confirms (or disconfirms) H depends on whether E is more probable (or less probable) conditional on H than it is unconditionally -- that is, on whether:
(b1) P(E/H)/P(E) > 1.
An intuitive way of understanding (b1) is to say that it states that E would be more expected (or less surprising) if it were known that H were true. So if E is surprising, but would not be surprising if we knew H were true, then E will significantly confirm H. Thus, Bayesians explain the tendency of surprising evidence to confirm hypotheses on which the evidence would be expected.
Similarly, because it is reasonable to think that evidence E1 makes other evidence of the same kind much more probable, after E1 has been determined to be true, other evidence of the same kind E2 will generally not confirm hypothesis H as much as other diverse evidence E3, even if H is equally likely on both E2 and E3. The explanation is that where E1 makes E2 much more probable than E3 (Pi(E2/E1) >> Pi(E3/E1)), there is less potential for the discovery that E2 is true to raise the probability of H than there is for the discovery that E3 is true to do so.
E. Relative confirmation and likelihood ratios. Often it is important to be able to compare the effect of evidence E on two competing hypotheses, Hj and Hk, without having also to consider its effect on other hypotheses that may not be so easy to formulate or to compare with Hj and Hk. From the first corollary above, the ratio of the final probabilities of Hj and Hk would be given by:
Ratio Formula:
Pf(Hj)/Pf(Hk) = [Pi(E/Hj) × Pi(Hj)]/[Pi(E/Hk) × Pi(Hk)]
If the odds of Hj relative to Hk are defined as the ratio of their probabilities, then from the Ratio Formula it follows that, in a case in which change in degrees of belief results from conditionalizing on E, the final odds (Pf(Hj)/Pf(Hk)) result from multiplying the initial odds (Pi(Hj)/Pi(Hk)) by the likelihood ratio (Pi(E/Hj)/Pi(E/Hk)). Thus, in pairwise comparisons of the odds of hypotheses, the likelihood ratio is the crucial determinant of the effect of the evidence on the odds.
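The Ratio Formula can be checked numerically. The sketch below uses hypothetical priors and likelihoods for Hj, Hk, and a catch-all "other" hypothesis, and compares the final odds computed two ways: directly from the conditionalized probabilities, and as initial odds times the likelihood ratio.

```python
from fractions import Fraction

# Hypothetical initial probabilities and likelihoods Pi(E/H) for three hypotheses.
priors = {"Hj": Fraction(3, 10), "Hk": Fraction(1, 10), "other": Fraction(6, 10)}
likelihoods = {"Hj": Fraction(1, 5), "Hk": Fraction(4, 5), "other": Fraction(1, 2)}


def conditionalize_on_e(priors, likelihoods):
    """Final probabilities Pf(H) = Pi(E/H) * Pi(H) / Pi(E) for every hypothesis."""
    prior_e = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / prior_e for h in priors}


def final_odds(priors, likelihoods, hj, hk):
    """Final odds of hj relative to hk as initial odds times the likelihood ratio."""
    initial_odds = priors[hj] / priors[hk]
    likelihood_ratio = likelihoods[hj] / likelihoods[hk]
    return initial_odds * likelihood_ratio


posteriors = conditionalize_on_e(priors, likelihoods)
direct = posteriors["Hj"] / posteriors["Hk"]
via_ratio = final_odds(priors, likelihoods, "Hj", "Hk")
print(direct, via_ratio)  # 3/4 3/4
```

The second computation never consults the catch-all hypothesis, which illustrates why pairwise comparison is convenient when the alternatives to Hj and Hk are hard to formulate.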
F. The typical differential effect of positive evidence and negative evidence. Hempel first pointed out that we typically expect the hypothesis that all ravens are black to be confirmed to some degree by the observation of a black raven, but not by the observation of a non-black, non-raven. Let H be the hypothesis that all ravens are black. Let E1 describe the observation of a non-black, non-raven. Let E2 describe the observation of a black raven. Bayesian Confirmation Theory actually holds that both E1 and E2 may provide some confirmation for H. Recall that E1 supports H just in case Pi(E1/H)/Pi(E1) > 1. It is plausible to think that this ratio is ever so slightly greater than one. On the other hand, E2 would seem to provide much greater confirmation to H, because, in this example, it would be expected that Pi(E2/H)/Pi(E2) >> Pi(E1/H)/Pi(E1).
These are only a sample of the results that have provided support for Bayesian Confirmation Theory as a theory of rational inference for science. For further examples, see Howson and Urbach. It should also be mentioned that an important branch of statistics, Bayesian statistics, is based on the principles of Bayesian epistemology.
B. The problem of the priors. Are there constraints on prior probabilities other than the probability laws? Consider Goodman's "new riddle of induction": In the past all observed emeralds have been green. Do those observations provide any more support for the generalization that all emeralds are green than they do for the generalization that all emeralds are grue (green if observed before now; blue if observed later); or do they provide any more support for the prediction that the next emerald observed will be green than for the prediction that the next emerald observed will be grue (i.e., blue)? This question divides Bayesians into two categories:
(a) Objective Bayesians (e.g., Rosenkrantz) hold that there are rational constraints on prior probabilities that require that observations support the green-generalization and the green-prediction much more strongly than the grue-generalization and the grue-prediction. Objective Bayesians are the intellectual heirs of the advocates of a Principle of Indifference for probability. Rosenkrantz builds his account on the maximum entropy rule proposed by E.T. Jaynes. The difficulties in formulating an acceptable Principle of Indifference have led most Bayesians to abandon Objective Bayesianism.
(b) Subjective Bayesians (e.g., de Finetti) do not believe that rationality alone places enough constraints on one's prior probabilities to make them objective. For Subjective Bayesians, it is up to our own free choice or to evolution or to socialization or some other non-rational process to determine one's prior probabilities. Rationality only requires that the prior probabilities satisfy relatively modest synchronic coherence conditions.
Subjective Bayesians believe that their position is not objectionably subjective, because of results (e.g., Doob or Gaifman and Snir) proving that even subjects beginning with very different prior probabilities will tend to converge in their final probabilities, given a suitably long series of shared observations. These convergence results are not completely reassuring, however, because they only apply to agents who already have significant agreement in their priors and they do not assure convergence in any reasonable amount of time. Also, they typically only guarantee convergence on the probability of predictions, not on the probability of theoretical hypotheses. For example, Carnap favored prior probabilities that would never raise above zero the probability of a generalization over a potentially infinite number of instances (e.g., that all crows are black), no matter how many observations of positive instances (e.g., black crows) one might make without finding any negative instances (i.e., non-black crows). In addition, the convergence results depend on the assumption that the only changes in probabilities that occur are those that are the non-inferential results of observation on evidential statements and those that result from conditionalization on such evidential statements.
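The kind of convergence that Subjective Bayesians appeal to can be illustrated with a toy model. This sketch is not one of the cited theorems: it assumes two agents who both treat a coin's bias as unknown, start from hypothetical Beta(1, 9) and Beta(9, 1) priors, and conditionalize on the same tosses. For such priors the probability that the next toss lands heads is (a + heads)/(a + b + tosses).

```python
from fractions import Fraction


def predictive_heads(a, b, heads, tosses):
    """Probability of heads on the next toss for a Beta(a, b) prior,
    after conditionalizing on `heads` heads in `tosses` shared tosses."""
    return Fraction(a + heads, a + b + tosses)


def gap(heads, tosses):
    """Disagreement between the two hypothetical agents about the next toss."""
    sceptic = predictive_heads(1, 9, heads, tosses)    # expects mostly tails
    believer = predictive_heads(9, 1, heads, tosses)   # expects mostly heads
    return abs(sceptic - believer)


for tosses in (0, 10, 100, 1000):
    heads = 7 * tosses // 10  # hypothetical shared data: 70% heads
    print(tosses, float(gap(heads, tosses)))
```

The disagreement equals 8/(10 + tosses), so it shrinks only gradually, and the two agents already agree on which outcomes are possible. The sketch also concerns only a prediction about the next toss, not a theoretical hypothesis, which matches the limitations noted for the convergence results.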
Because of the problem of the priors, it is an open question whether Bayesian Confirmation Theory has inductive content, or whether it merely translates the framework for rational belief provided by deductive logic into a corresponding framework for rational degrees of belief.
Principle of Jeffrey Conditionalization:
Pf(H) = Pi(H/E) × Pf(E) + Pi(H/~E) × Pf(~E) [where E and H are both assumed to have prior probabilities between zero and one]
Counting in favor of Jeffrey's Principle is its theoretical elegance. Counting against it is the practical problem that it requires that one be able to completely specify the direct non-inferential effects of an observation, something it is doubtful that anyone has ever done. Skyrms has given it a Dutch Book defense.
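A short numerical sketch of Jeffrey's Principle follows. The conditional probabilities and the shifted probability of E are hypothetical values, and `jeffrey_update` is an illustrative helper name.

```python
from fractions import Fraction


def jeffrey_update(p_h_given_e, p_h_given_not_e, pf_e):
    """Pf(H) = Pi(H/E) * Pf(E) + Pi(H/~E) * Pf(~E), where Pf(~E) = 1 - Pf(E)."""
    return p_h_given_e * pf_e + p_h_given_not_e * (1 - pf_e)


# Hypothetical initial conditional probabilities.
p_h_given_e = Fraction(4, 5)
p_h_given_not_e = Fraction(1, 5)

# An observation raises the probability of E to 7/10 without making it certain.
uncertain = jeffrey_update(p_h_given_e, p_h_given_not_e, Fraction(7, 10))
# If E becomes certain, the rule reduces to simple conditionalization.
certain = jeffrey_update(p_h_given_e, p_h_given_not_e, Fraction(1))

print(uncertain)  # 31/50
print(certain)    # 4/5, i.e. Pi(H/E)
```

The second call shows the sense in which Jeffrey's Principle generalizes the Simple Principle: setting Pf(E) = 1 recovers Pf(H) = Pi(H/E).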
B. The problem of old evidence. On a Bayesian account, the effect of evidence E in confirming (or disconfirming) a hypothesis is solely a function of the increase in probability that accrues to E when it is first determined to be true. This raises the following puzzle for Bayesian Confirmation Theory discussed extensively by Glymour: Suppose that E is an evidentiary statement that has been known for some time -- that is, that it is old evidence; and suppose that H is a scientific theory that has been under consideration for some time. One day it is discovered that H implies E. In scientific practice, the discovery that H implied E would typically be taken to provide some degree of confirmatory support for H. But Bayesian Confirmation Theory seems unable to explain how a previously known evidentiary statement E could provide any new support for H. For conditionalization to come into play, there must be a change in the probability of the evidence statement E. Where E is old evidence, there is no change in its probability. Some Bayesians (e.g., Garber) have tried to solve this problem by weakening the logical omniscience assumption to allow for the possibility of discovering logical relations (e.g., that H and suitable auxiliary assumptions imply E). As mentioned above, relaxing the logical omniscience assumption threatens to block the derivation of almost all of the important results in Bayesian epistemology, so there is no general agreement among Bayesians on how to solve this problem. Other Bayesians (e.g., Lange) employ the Bayesian formalism as a tool in the rational reconstruction of the evidentiary support for a scientific hypothesis, where it is irrelevant to the rational reconstruction whether the evidence was discovered before or after the theory was initially formulated.
C. The problem of rigid conditional probabilities. When one conditionalizes, one applies the initial conditional probabilities to determine final unconditional probabilities. Throughout, the conditional probabilities themselves do not change; they remain rigid. Examples of the Problem of Old Evidence are but one of a variety of cases in which it seems that it can be rational to change one's initial conditional probabilities. Thus, many Bayesians reject the Simple Principle of Conditionalization in favor of a qualified principle, limited to situations in which one does not change one's initial conditional probabilities. There is no generally accepted account of when it is rational to maintain rigid initial conditional probabilities and when it is not.
D. The problem of prediction vs. accommodation. Related to the problem of Old Evidence is the following potential problem: Consider two different scenarios. In the first, theory H was developed in part to accommodate (i.e., to imply) some previously known evidence E. In the second, theory H was developed at a time when E was not known. It was because E was derived as a prediction from H that a test was performed and E was found to be true. It seems that E's being true would provide a greater degree of confirmation for H if the truth of E had been predicted by H than if H had been developed to accommodate the truth of E. There is no general agreement among Bayesians about how to resolve this problem. Some (e.g., Horwich) argue that Bayesianism implies that there is no important difference between prediction and accommodation, and try to defend that implication. Others (e.g., Maher) argue that there is a way to understand Bayesianism so as to explain why there is an important difference between prediction and accommodation.
E. The problem of new theories. Suppose that there is one theory H1 that is generally regarded as highly confirmed by the available evidence E. It is possible that simply the introduction of an alternative theory H2 can lead to an erosion of H1's support. It is plausible to think that Copernicus' introduction of the heliocentric hypothesis had this effect on the previously unchallenged Ptolemaic earth-centered astronomy. This sort of change cannot be explained by conditionalization. It is for this reason that many Bayesians prefer to focus on probability ratios of hypotheses (see the Ratio Formula above), rather than their absolute probability; but it is clear that the introduction of a new theory could also alter the probability ratio of two hypotheses -- for example, if it implied one of them as a special case.
A. Other principles of synchronic coherence. Are the probability laws the only standards of synchronic coherence for degrees of belief? Van Fraassen has proposed an additional principle (Reflection or Special Reflection), which he now regards as a special case of an even more general principle (General Reflection).
B. Other probabilistic rules of inference. There seem to be at least two different concepts of probability: the probability that is involved in degrees of belief (epistemic or subjective probability) and the probability that is involved in random events, such as the tossing of a coin (chance). De Finetti thought this was a mistake and that there was only one kind of probability, subjective probability. For Bayesians who believe in both kinds of probability, an important question is: What is (or should be) the relation between them? The answer can be found in the various proposals for principles of direct inference in the literature. Typically, principles of direct inference are proposed as principles for inferring subjective or epistemic probabilities from beliefs about objective chance (e.g., Pollock). Lewis reverses the direction of inference, and proposes to infer beliefs about objective chance from subjective or epistemic probabilities, via his (Reformulated) Principal Principle.
C. Principles of rational acceptance. What is the relation between beliefs and degrees of belief? Jeffrey proposes to give up the notion of belief (at least for empirical statements) and make do with only degrees of belief. Other authors (e.g., Levi, Maher, Kaplan) propose principles of rational acceptance as part of accounts of when it is rational to accept a statement as true, not merely to regard it as probable.
First published: July 12, 2001
Content last modified: July 12, 2001