Stanford Encyclopedia of Philosophy
This is a file in the archives of the Stanford Encyclopedia of Philosophy.

The Problem of Induction

First published Wed Nov 15, 2006; substantive revision Wed Oct 24, 2012

The original problem of induction can be simply put. It concerns the support or justification of inductive methods; methods that predict or infer, in Hume's words, that “instances of which we have had no experience resemble those of which we have had experience” (THN, 89). Such methods are clearly essential in scientific reasoning as well as in the conduct of our everyday affairs. The problem is how to support or justify them, and it leads to a dilemma: the principle that the unobserved resembles the observed cannot be proved deductively, for it is contingent, and only necessary truths can be proved deductively. Nor can it be supported inductively, by arguing that it has always or usually been reliable in the past, for that would beg the question by assuming just what is to be proved.

A century after Hume first put the problem, and argued that it is insoluble, J. S. Mill gave a more specific formulation of an important class of inductive problems: “Why,” he wrote, “is a single instance, in some cases, sufficient for a complete induction, while in others myriads of concurring instances, without a single exception known or presumed, go such a little way towards establishing an universal proposition?” (Mill 1843, Bk III, Ch. III). (Compare: (i) Everyone seated on the bus is moving northward. (ii) Everyone seated on the bus was born on a prime numbered day of the month.)

In recent times inductive methods have fissioned and multiplied, to an extent that attempting to define induction would be more difficult than rewarding. It is, however, instructive to contrast induction with deduction: Deductive logic, at least as concerns first-order logic, is demonstrably sound and complete. The premises of an argument constructed according to the rules of this logic imply the argument's conclusion. Not so for induction: There is no comprehensive theory of sound induction, no set of agreed-upon rules that license good or sound inductive inference, nor is there a serious prospect of such a theory. Further, induction differs from deductive proof or demonstration (in first-order logic, at least) not only in its failure to preserve truth (true premises may lead inductively to false conclusions) but also in its failure of monotonicity: adding true premises to a sound induction may make it unsound.

The characterization of good or sound inductions might be called the characterization problem: What distinguishes good from bad inductions? The question seems to have no rewarding general answer, but there are nevertheless interesting partial characterizations, some of which are explored in this entry.

1. The contemporary notion of induction

The Oxford English Dictionary (OED Online, accessed October 20, 2012) defines “induction,” in the sense relevant here, as

7. The process of inferring a general law or principle from the observation of particular instances (opposed to deduction n., q.v.)

That induction is opposed to deduction is not quite right, and the rest of the definition is outdated and too narrow: much of what contemporary epistemology, logic, and the philosophy of science count as induction infers neither from observation nor from particulars and does not lead to general laws or principles. This is not to denigrate the leading authority on English vocabulary: until the middle of the previous century induction was understood to be what we now know as enumerative induction or universal inference, that is, inference from particular instances:

$a_1, a_2, \ldots, a_n$ are all Fs that are also G,

to a general law or principle

All Fs are G

The problem of induction was, until recently, taken to be to justify this form of inference; to show that the truth of the premise supported, if it did not entail, the truth of the conclusion. The evolution and generalization of this question—the traditional problem has become a special case—is discussed in some detail below.

A few simple counterexamples to the OED definition may suggest the increased breadth of the contemporary notion:

  1. There are (good) inductions with general premises and particular conclusions:
    All observed emeralds have been green.
    Therefore, the next emerald to be observed will be green.
  2. There are valid deductions with particular premises and general conclusions:
    New York is east of the Mississippi.
    Delaware is east of the Mississippi.
    Therefore, everything that is either New York or Delaware is east of the Mississippi.

Further, on at least one serious view, due in differing variations to Mill and Carnap, induction has not to do with generality at all; its primary form is the singular predictive inference—the second form of enumerative induction mentioned above—which leads from particular premises to a particular conclusion. The inference to generality is a dispensable middle step.

Although inductive inference is not easily characterized, we do have a clear mark of induction: inductive inferences are contingent; deductive inferences are necessary. Deductive inference can never support contingent judgments such as meteorological forecasts, nor can deduction alone explain the breakdown of one's car, discover the genotype of a new virus, or reconstruct fourteenth-century trade routes. Inductive inference can do these things more or less successfully because, in Peirce's phrase, inductions are ampliative. Induction can amplify and generalize our experience, broaden and deepen our empirical knowledge. Deduction, on the other hand, is explicative: it orders and rearranges our knowledge without adding to its content.

Of course, the contingent power of induction brings with it the risk of error. Even the best inductive methods applied to all available evidence may get it wrong; good inductions may lead from true premises to false conclusions. (A competent but erroneous diagnosis of a rare disease, a sound but false forecast of summer sunshine in the desert.) An appreciation of this principle is a signal feature of the shift from the traditional to the contemporary problem of induction.

How can we tell good inductions from bad ones? That question is a simple formulation of the problem of induction. In its general form it clearly has no substantive answer, but its instances can yield modest and useful questions. Some of these questions, and proposed answers to them, are surveyed in what follows.

Some authorities take inductive inference to include all non-deductive inference; Carnap, in the opening paragraph of The Continuum of Inductive Methods (1952), is an example. That may be a bit too inclusive: perception and memory are clearly ampliative, but their exercise seems not to be congruent with what we know of induction, and the present article is not concerned with them. The scope of the contemporary concept is charted in the taxonomy in section 3.2 below.

Testimony is another matter. Although testimony is not a form of induction, induction would be all but paralyzed were it not nourished by testimony. Scientific inductions depend upon data transmitted and supported by testimony and even our everyday inductive inferences typically rest upon premises that come to us indirectly.

2. Hume on induction

The source for the problem of induction as we know it is Hume's brief argument in Book I, Part III, section VI of the Treatise (THN). The great historical importance of this argument, not to speak of its intrinsic power, recommends that reflection on the problem begin with a rehearsal of it.

First a note on vocabulary. The term ‘induction’ does not appear in Hume's argument, nor anywhere in the Treatise or the first Inquiry, for that matter. Hume's concern is with inferences concerning causal connections, which, on his account, are the only connections “which can lead us beyond the immediate impressions of our memory and senses” (THN, 89). But the difference between such inferences and what we know today as induction, allowing for the increased complexity of the contemporary notion, is largely a matter of terminology.

Secondly, Hume divides all reasoning into demonstrative, by which he means deductive, and probabilistic, by which he means the generalization of causal reasoning. The deductive system that Hume had at hand was just the weak and complex theory of ideas in force at the time, augmented by syllogistic logic (THN, Book I, Part III, Section I, for example). His ‘demonstrations’, rather than structured deductions, are often founded on the principle that conceivable connections are possible, inconceivable connections impossible, and necessary connections those the denials of which are impossible or inconceivable. That said, and though we should today allow contingent connections that are neither probabilistic nor causal, there are few points at which the distinction is not clear.

It should also be remarked that Hume's argument applies just to what is known today as enumerative induction, based on instances, and primarily to singular predictive inference (including ‘predictions’ about the present or past; see section 3.2 below for a taxonomy of inductive inference) but, again, its generalization to other forms of inductive reasoning is straightforward. In what follows we paraphrase and interpolate freely so as to ease the application of the argument in contemporary contexts.

The argument should be seen against the background of Hume's project as he announces it in the introduction to the Treatise: This project is the development of the empirical science of human nature. The epistemological sector of this science involves describing the operations of the mind, the interactions of impressions and ideas and the function of the liveliness that constitutes belief. But this cannot be a merely descriptive endeavor; accurate description of these operations entails also a considerable normative component, for, as Hume puts it,

[o]ur reason [to be taken here quite generally, to include the imagination] must be consider'd as a kind of cause, of which truth is the natural effect; but such-a-one as by the irruption of other causes, and by the inconstancy of our mental powers, may frequently be prevented. (Hume THN, 180)

The account must thus not merely describe what goes on in the mind, it must also do this in such a way as to show that and how these mental activities lead naturally, if with frequent exceptions, to true belief (see Loeb 2006 for further discussion of these questions).

Now as concerns the argument, its conclusion is that in induction (causal inference) experience does not produce the idea of an effect from an impression of its cause by means of the understanding or reason, but by the imagination, by “a certain association and relation of perceptions.” The center of the argument is a dilemma: If inductive conclusions were produced by the understanding, inductive reasoning would be based upon the premise that nature is uniform;

that instances of which we have had no experience, must resemble those of which we have had experience, and that the course of nature continues always uniformly the same. (THN, 89)

And were this premise to be established by reasoning, that reasoning would be either deductive or probabilistic (i.e., causal). The principle can't be proved deductively, for whatever can be proved deductively is a necessary truth, and the principle is not necessary; its antecedent is consistent with the denial of its consequent. Nor can the principle be proved by causal reasoning, for it is presupposed by all such reasoning and any such proof would be a petitio principii.

The normative component of Hume's project is striking here: That the principle of uniformity of nature cannot be proved deductively or inductively shows that it is not the principle that drives our causal reasoning only if our causal reasoning is sound and leads to true conclusions as a “natural effect” of belief in true premises. This is what licenses the capsule description of the argument as showing that induction cannot be justified or licensed either deductively or inductively; not deductively because (non-trivial) inductions do not express logically necessary connections, not inductively because that would be circular. If, however, causal reasoning were fallacious, the principle of the uniformity of nature might well be among its principles.

The negative argument is an essential first step in Hume's general account of induction. It rules out accounts of induction that view it as the work of reason. Hume's positive account begins from another dilemma, a constructive dilemma this time: Inductive inference must be the work either of reason or of imagination. Since the negative argument shows that it cannot be a species of reasoning, it must be imaginative.

Hume's positive account of causal inference can be simply described: It amounts to embedding the singular form of enumerative induction in the nature of human, and at least some bestial, thought. The several definitions offered in Enquiries concerning Human Understanding and concerning the Principles of Morals (EHU, 60) make this explicit:

[W]e may define a cause to be an object, followed by another, and where all objects similar to the first are followed by objects similar to the second. Or, in other words, where, if the first object had not been, the second never had existed.

Another definition defines a cause to be:

an object followed by another, and whose appearance always conveys the thought to that other.

If we have observed many Fs to be followed by Gs, and no contrary instances, then observing a new F will lead us to anticipate that it will also be a G. That is causal inference.

It is clear, says Hume, that we do make inductive, or, in his terms, causal, inferences; that having observed many Fs to be Gs, observation of a new instance of an F leads us to believe that the newly observed F is also a G. It is equally clear that the epistemic force of this inference, what Hume calls the necessary connection between the premises and the conclusion, does not reside in the premises alone:

All observed Fs have also been Gs,
a is an F,

do not imply

a is a G.

It is false that “instances of which we have had no experience must resemble those of which we have had experience” (EHU, 89).

Hume's positive view is that the experience of constant conjunction fosters a “habit of the mind” that leads us to anticipate the conclusion on the occasion of a new instance of the second premise. The force of induction, the force that drives the inference, is thus not an objective feature of the world, but a subjective power; the mind's capacity to form inductive habits. The objectivity of causality, the objective support of inductive inference, is thus an illusion, an instance of what Hume calls the mind's “great propensity to spread itself on external objects” (THN, 167).

Hume's account of causal inference raises the problem of induction in an acute form: One would like to say that good and reliable inductions are those that follow the lines of causal necessity; that when

All observed Fs have also been Gs,

is the manifestation in experience of a causal connection between F and G, then the inference

All observed Fs have also been Gs,
a is an F,
Therefore, a, not yet observed, is also a G,

is a good induction. But if causality is not an objective feature of the world this is not an option. The Humean problem of induction is then the problem of distinguishing good from bad inductive habits in the absence of any corresponding objective distinction.

Two sides or facets of the problem of induction should be distinguished: The epistemological problem is to find a method for distinguishing good or reliable inductive habits from bad or unreliable habits. The second and deeper problem is metaphysical. This is the problem of distinguishing reliable from unreliable inductions. This is the problem that Whitehead called “the despair of philosophy” (1925, 35). The distinction can be illustrated in the parallel case of arithmetic. The by now classic incompleteness results of the last century show that the epistemological problem for first-order arithmetic is insoluble; that there can be no method, in a quite clear sense of that term, for distinguishing the truths from the falsehoods of first-order arithmetic. But the metaphysical problem for arithmetic has a clear and correct solution: the truths of first-order arithmetic are precisely the sentences that are true in the standard model of arithmetic, the natural numbers with their usual operations. Our understanding of the distinction between arithmetic truths and falsehoods is just as clear as our understanding of the simple recursive definition of truth in arithmetic, though any method for applying the distinction must remain forever out of our reach.

Now as concerns inductive inference, it is hardly surprising to be told that the epistemological problem is insoluble; that there can be no formula or recipe, however complex, for ruling out unreliable inductions. But Hume's arguments, if they are correct, have apparently a much more radical consequence than this: They seem to show that the metaphysical problem for induction is insoluble; that there is no objective difference between reliable and unreliable inductions. This is counterintuitive. Good inductions are supported by causal connections and we think of causality as an objective matter: The laws of nature express objective causal connections. Ramsey writes in his Humean account of the matter:

Causal laws form the system with which the speaker meets the future; they are not, therefore, subjective in the sense that if you and I enunciate different ones we are each saying something about ourselves which pass by one another like “I went to Grantchester,” “I didn't”. (Ramsey 1931a, 137)

A satisfactory resolution of the problem of induction would account for this objectivity in the distinction between good and bad inductions.

It might seem that Hume's argument succeeds only because he has made the criteria for a solution to the problem too strict. Enumerative induction does not realistically lead from premises

All observed Fs have also been Gs,
a is an F,

to the simple assertion

Therefore, a, not yet observed, is also a G.

Induction is contingent inference and as such can yield a conclusion only with a certain probability. The appropriate conclusion is

It is therefore probable that a, not yet observed, is also a G.

Hume's response to this (THN, 89) is to insist that probabilistic connections, no less than simple causal connections, depend upon habits of the mind and are not to be found in our experience of the world. Weakening the inferential force between premises and conclusion may divide and complicate inductive habits; it does not eliminate them. The laws of probability alone have no more empirical content than does deductive logic. If I infer from observing clouds followed by rain that today's clouds will probably be followed by rain this can only be in virtue of an imperfect habit of associating rain with clouds.

2.1 The justification of induction

Hume's argument is often credited with raising the problem of induction in its modern form. For Hume himself the conclusion of the argument is not so much a problem as a principle of his account of induction: Inductive inference is not and could not be reasoning, either deductive or probabilistic, from premises to conclusion, so we must look elsewhere to understand it. Hume's positive account does much to alleviate the epistemological problem—how to distinguish good inductions from bad ones—without treating the metaphysical problem. His account is based on the principle that inductive inference is the work of association which forms a “habit of the mind” to anticipate the consequence, or effect, upon witnessing the premise, or cause. He provides illuminating examples of such inferential habits in sections I.III.XI and I.III.XII of the Treatise (THN). The latter accounts for frequency-to-probability inferences in a comprehensive way. It shows that and how inductive inference is “a kind of cause, of which truth is the natural effect.”

Although Hume is the progenitor of modern work on induction, induction presents a problem, indeed a multitude of problems, quite in its own right. The by now traditional problem is the matter of justification: How is induction to be justified? There are in fact several questions here, corresponding to different modes of justification. One very simple mode is to take Hume's dilemma as a challenge: to justify (enumerative) induction, one should show that it leads to true or probable conclusions from true premises. It is safe to say that in the absence of further assumptions this problem is and should be insoluble. The realization of this dead end and the proliferation of other forms of induction have led to more specialized projects involving various strengthened premises and assumptions. The several approaches treated below exemplify this.

2.2 Karl Popper's views on induction

One of the most influential and controversial views on the problem of induction has been that of Karl Popper, announced and argued in The Logic of Scientific Discovery (LSD). Popper held that induction has no place in the logic of science. Science in his view is a deductive process in which scientists formulate hypotheses and theories that they test by deriving particular observable consequences. Theories are not confirmed or verified. They may be falsified and rejected or tentatively accepted if corroborated in the absence of falsification by the proper kinds of tests:

[A] theory of induction is superfluous. It has no function in a logic of science.

The best we can say of a hypothesis is that up to now it has been able to show its worth, and that it has been more successful than other hypotheses although, in principle, it can never be justified, verified, or even shown to be probable. This appraisal of the hypothesis relies solely upon deductive consequences (predictions) which may be drawn from the hypothesis: There is no need even to mention “induction”. (LSD, 315)

Popper gave two formulations of the problem of induction; the first is the establishment of the truth of a theory by empirical evidence; the second, slightly weaker, is the justification of a preference for one theory over another as better supported by empirical evidence. Both of these he declared insoluble, on the grounds, roughly put, that scientific theories have infinite scope and no finite evidence can ever adjudicate among them (LSD, 253–254; Grattan-Guinness 2004). He did, however, hold that theories could be falsified, and that falsifiability, or the liability of a theory to counterexample, was a virtue. Falsifiability corresponds roughly to the proportion of models in which a (consistent) theory is false. Highly falsifiable theories thus make stronger assertions and are in general more informative. Though theories cannot in Popper's view be supported, they can be corroborated: a better corroborated theory is one that has been subjected to more and more rigorous tests without having been falsified. Falsifiable and corroborated theories are thus to be preferred, though, as the insolubility of the second problem of induction makes evident, these are not to be confused with support by evidence.

Popper's epistemology is almost exclusively the epistemology of scientific knowledge. This is not because he thinks that there is a sharp division between ordinary knowledge and scientific knowledge, but rather because he thinks that to study the growth of knowledge one must study scientific knowledge:

[M]ost problems connected with the growth of our knowledge must necessarily transcend any study which is confined to common-sense knowledge as opposed to scientific knowledge. For the most important way in which common-sense knowledge grows is, precisely, by turning into scientific knowledge. (Popper LSD, 18)

3. Probability and induction

3.1 Elementary probability

A probability on a first-order language is a function that assigns a number between zero and one inclusive to each sentence in the language. The laws of probability require that if A is any sentence of the language then

P1. $0 \le P(A) \le 1$
P2. If A and B are logically incompatible then $P(A \lor B) = P(A) + P(B)$
P3. If A is logically necessary then $P(A) = 1$

The probability P is said to be regular iff the condition of P3 is also necessary, i.e., iff no contingent sentence has probability one.

Given a probability P on a language L the conditional probability P(B | A) is defined for pairs A, B of sentences when P(A) is positive:

If $P(A) > 0$ then $P(B \mid A) = P(A \land B) / P(A)$
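For a finite language these definitions can be checked by brute force. The sketch below is only an illustration under stated assumptions: it anticipates the language L(k) described later in this section (one predicate R, k constants), represents a sentence extensionally as a Python predicate on state descriptions (tuples of truth values for R(a_1), …, R(a_k)), and fixes a probability by non-negative weights on state descriptions that sum to one. The names `prob` and `cond_prob` are hypothetical helpers, not part of any library.

```python
from fractions import Fraction
from itertools import product

K = 3  # number of individual constants a_1, ..., a_K (an assumption for the example)

# A state description is a tuple of booleans: w[i] is True iff R(a_{i+1}) holds.
WORLDS = list(product([True, False], repeat=K))

# Uniform weights over the 2**K state descriptions; any non-negative weights summing to 1 would do.
UNIFORM = {w: Fraction(1, len(WORLDS)) for w in WORLDS}


def prob(measure, sentence):
    """P(A): total weight of the state descriptions in which sentence A is true."""
    return sum((p for w, p in measure.items() if sentence(w)), Fraction(0))


def cond_prob(measure, b, a):
    """P(B | A) = P(A and B) / P(A), defined only when P(A) > 0."""
    p_a = prob(measure, a)
    if p_a == 0:
        raise ValueError("P(B | A) is undefined when P(A) = 0")
    return prob(measure, lambda w: a(w) and b(w)) / p_a


def r1(w):
    return w[0]  # R(a_1)


def r1_or_r2(w):
    return w[0] or w[1]  # R(a_1) v R(a_2)


print(prob(UNIFORM, r1_or_r2))           # 3/4
print(cond_prob(UNIFORM, r1, r1_or_r2))  # 2/3
```

Exact fractions are used so that equalities such as P(A) + P(¬A) = 1 hold without floating-point rounding.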

Conditional probability may also be taken as fundamental and simple probability defined in terms of it as, for example, probability conditioned on a tautology (see, for example, Hájek 2003).

Sentences A, B are said to be independent in P if

$P(A \land B) = P(A)\,P(B)$.

The set $\{A_1, \ldots, A_k\}$ is thoroughly independent in P iff for each non-null subset $\{B_1, \ldots, B_n\}$ of $\{A_1, \ldots, A_k\}$

$P(B_1 \land \cdots \land B_n) = P(B_1)\,P(B_2) \cdots P(B_n)$
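Thorough independence quantifies over every non-null subset, which is easy to check exhaustively in a small case. This is a hedged sketch, not a standard routine: sentences are modeled as Python predicates on the state descriptions of a hypothetical three-constant language, and `thoroughly_independent` is an illustrative name. The uniform measure makes R(a_1), R(a_2), R(a_3) thoroughly independent; a measure that splits its weight between “all R” and “no R” does not.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

K = 3
WORLDS = list(product([True, False], repeat=K))  # w[i] is True iff R(a_{i+1})


def prob(measure, sentence):
    """Total weight of the state descriptions in which the sentence is true."""
    return sum((p for w, p in measure.items() if sentence(w)), Fraction(0))


def thoroughly_independent(measure, sentences):
    """True iff P(B_1 & ... & B_n) = P(B_1)...P(B_n) for every non-null subset."""
    for n in range(2, len(sentences) + 1):  # singletons satisfy the condition trivially
        for subset in combinations(sentences, n):
            joint = prob(measure, lambda w: all(s(w) for s in subset))
            if joint != prod(prob(measure, s) for s in subset):
                return False
    return True


# One atomic sentence R(a_i) per constant; i=i binds the index at definition time.
ATOMS = [lambda w, i=i: w[i] for i in range(K)]

uniform = {w: Fraction(1, len(WORLDS)) for w in WORLDS}
all_or_nothing = {w: (Fraction(1, 2) if len(set(w)) == 1 else Fraction(0)) for w in WORLDS}

print(thoroughly_independent(uniform, ATOMS))         # True
print(thoroughly_independent(all_or_nothing, ATOMS))  # False
```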

From the above laws and definitions it follows that:

C1. If A is logically inconsistent then $P(A) = 0$.
C2. $P(A) + P(\lnot A) = 1$

(So if P is regular, every consistent sentence has positive probability.)

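If A and B are logically equivalent then

$[(A \land B) \lor \lnot A]$

and

$[(A \land B) \lor \lnot B]$

are both logically necessary, and the two disjuncts of each are logically incompatible. Hence by P3 and P2, together with $P(A) + P(\lnot A) = 1$, if A and B are logically equivalent

$P(A) = P(A \land B) = P(B)$

C3. Logically equivalent sentences are always equiprobable.
C4. If P(A), P(B) are both positive then A and B are independent in P iff $P(B \mid A) = P(B)$.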

If A and B are independent in P, then

$$
\begin{aligned}
P(A \land \lnot B) &= P(A) - P(A \land B) \\
&= P(A) - P(A)\,P(B) \\
&= P(A)\,(1 - P(B)) \\
&= P(A)\,P(\lnot B)
\end{aligned}
$$

C5. If A and B are independent in P, A and ¬B are independent in P.

One simple and important special case concerns a language L(k), the vocabulary of which includes just one monadic predicate R and k individual constants $a_1, \ldots, a_k$. A k-sequence in L(k) is a conjunction that includes for each constant $a_i$ either $R(a_i)$ or $\lnot R(a_i)$ (not both). In a standard interpretation k-sequences represent samples from a larger population of individuals; then R and ¬R represent presence and absence of a trait of interest.

We state without proof the generalization of C5:

C6. Given a language L(k) and a probability P on L(k), if any k-sequence of L(k) is thoroughly independent in P then every k-sequence of L(k) is thoroughly independent in P. Hence if any k-sequence of L(k) is thoroughly independent in P, and A and B are any sentences of L(k) that have no individual constants in common, A and B are independent in P.
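C6 can be spot-checked on a particular measure. The sketch below assumes a product measure on a hypothetical three-constant language with deliberately unequal probabilities for R(a_1), R(a_2), R(a_3) (the values 3/10, 3/5 and 1/2 are arbitrary). By construction the k-sequence R(a_1) ∧ R(a_2) ∧ R(a_3) is thoroughly independent; the code confirms that the other seven k-sequences are as well, and that two sentences sharing no constants come out independent. A numerical check of one measure illustrates the claim; it does not prove it.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

K = 3
WORLDS = list(product([True, False], repeat=K))  # w[i] is True iff R(a_{i+1})
P_R = [Fraction(3, 10), Fraction(3, 5), Fraction(1, 2)]  # arbitrary, unequal

# Product measure: multiply P(R(a_i)) or 1 - P(R(a_i)) for each constant.
product_measure = {w: prod(p if v else 1 - p for v, p in zip(w, P_R)) for w in WORLDS}


def prob(measure, sentence):
    return sum((p for w, p in measure.items() if sentence(w)), Fraction(0))


def thoroughly_independent(measure, sentences):
    for n in range(2, len(sentences) + 1):
        for subset in combinations(sentences, n):
            joint = prob(measure, lambda w: all(s(w) for s in subset))
            if joint != prod(prob(measure, s) for s in subset):
                return False
    return True


def literal(i, positive):
    """R(a_{i+1}) if positive, otherwise its negation."""
    return lambda w: w[i] == positive


# Every choice of signs yields a k-sequence; each should be thoroughly independent.
every_sequence_independent = all(
    thoroughly_independent(product_measure, [literal(i, s) for i, s in enumerate(signs)])
    for signs in product([True, False], repeat=K)
)


def a_sentence(w):
    return w[0] or not w[1]  # R(a_1) v ¬R(a_2)


def b_sentence(w):
    return not w[2]  # ¬R(a_3): shares no constant with a_sentence


joint = prob(product_measure, lambda w: a_sentence(w) and b_sentence(w))
print(every_sequence_independent)  # True
print(joint == prob(product_measure, a_sentence) * prob(product_measure, b_sentence))  # True
```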

P is symmetrical on a language L(k) iff it is invariant for the permutation of individual constants, i.e., iff

$P[A(a_1, \ldots, a_n)] = P[A(b_1, \ldots, b_n)]$

for each formula A and any n-tuples $(a_1, \ldots, a_n)$, $(b_1, \ldots, b_n)$ of distinct individual constants.

Independence is sufficient for symmetry in the following precise sense:

C7. Let P be a probability on a language L(k) and let $A = B_1 \land \cdots \land B_k$ be any k-sequence in L(k). Then if A is thoroughly independent in P and $P(R(a_i)) = P(R(a_j))$ for all i, j, P is symmetrical on L(k).

The condition of C7 is not necessary; symmetry does not imply independence, i.e., there are languages L(k), k-sequences A in L(k) and symmetrical probabilities P on L(k) such that A is not thoroughly independent in P. (A simple example in section 3.2 below illustrates this.)
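Symmetry can likewise be tested by brute force, since a probability on L(k) is invariant under permutations of the constants exactly when permuting the coordinates of each state description leaves its weight unchanged. The sketch below is illustrative only: the function names are hypothetical, and the “all or nothing” measure (half its weight on every ball being R, half on none being R) is an example chosen for this sketch, not the one from section 3.2. It is symmetrical, yet R(a_1) and R(a_2) are not independent in it.

```python
from fractions import Fraction
from itertools import permutations, product

K = 3
WORLDS = list(product([True, False], repeat=K))  # w[i] is True iff R(a_{i+1})


def is_symmetrical(measure):
    """True iff every permutation of a_1..a_K leaves each state description's weight unchanged.

    Every sentence is a disjunction of state descriptions, so invariance on
    state descriptions gives invariance for all sentences.
    """
    return all(
        measure[w] == measure[tuple(w[i] for i in perm)]
        for perm in permutations(range(K))
        for w in WORLDS
    )


def prob(measure, sentence):
    return sum((p for w, p in measure.items() if sentence(w)), Fraction(0))


all_or_nothing = {w: (Fraction(1, 2) if len(set(w)) == 1 else Fraction(0)) for w in WORLDS}
lopsided = {w: (Fraction(1) if w == (True, False, False) else Fraction(0)) for w in WORLDS}

p1 = prob(all_or_nothing, lambda w: w[0])
p12 = prob(all_or_nothing, lambda w: w[0] and w[1])
print(is_symmetrical(all_or_nothing), p12, p1 * p1)  # True 1/2 1/4
print(is_symmetrical(lopsided))                      # False
```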

If $X = \{x_1, \ldots, x_n\}$ is a finite set of individuals and $Y \subseteq X$, then the relative frequency of Y in X is the proportion of members of X that are also members of Y:

$R(Y \mid X) = (1/n)\,C\{X \cap Y\}$, where $C\{\cdot\}$ is the number of members of a set.
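The definition translates directly into code. In this minimal sketch the function name and the five-member population are hypothetical; C{X ∩ Y} is computed as the size of a set intersection, and an exact fraction is returned.

```python
from fractions import Fraction


def relative_frequency(y, x):
    """R(Y | X) = (1/n) C{X ∩ Y}, where n is the number of members of X."""
    x, y = set(x), set(y)
    if not x:
        raise ValueError("relative frequency is undefined for an empty set X")
    return Fraction(len(x & y), len(x))


population = {"x1", "x2", "x3", "x4", "x5"}
has_trait = {"x1", "x4"}
print(relative_frequency(has_trait, population))  # 2/5
```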

One relation between probability and relative frequency is easily expressed in terms of symmetry. We state this without proof (see Carnap LFP, 495 for a proof):

C8. (The proportional syllogism) If P is a symmetrical probability defined on a finite population then the probability that an individual in that population has a trait R is equal to the relative frequency of R in the population.
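A small case of C8 can be computed exactly. The sketch assumes, for illustration only, a population of n = 5 individuals of which exactly m = 2 have R, and a probability concentrated on the state descriptions with that composition. Because permutations of the individuals carry any such state description to any other, the only symmetrical probability on them is the uniform one, and each individual then has probability m/n of having R. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def symmetric_measure_with_frequency(n, m):
    """Uniform weights on the state descriptions of n individuals in which exactly m have R."""
    worlds = [tuple(i in reds for i in range(n)) for reds in combinations(range(n), m)]
    return {w: Fraction(1, len(worlds)) for w in worlds}


n, m = 5, 2
measure = symmetric_measure_with_frequency(n, m)

# Probability that individual i has R, for each i in the population.
p_each = [sum((p for w, p in measure.items() if w[i]), Fraction(0)) for i in range(n)]
print(all(p == Fraction(m, n) for p in p_each))  # True
```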

C8 can be understood from a Kantian-Critical point of view to express that relative frequency is the schema of (symmetrical) probability; the manifestation of probability in experience.

Bayes' Theorem (to be distinguished from Bayes' Postulate, to be treated in section 4) is a truth of probability useful in evaluating support for probabilistic hypotheses. It is a direct consequence of the definition of conditional probability.

C9. (Bayes' Theorem) If $P(E) > 0$ and $P(H) > 0$, then
$P(H \mid E) = P(E \mid H)\,P(H) / P(E)$

A second important principle, often used in conjunction with C9 is:

C10. If E is a consistent sentence and $H_1, \ldots, H_n$ are a logical partition (i.e., equiprobable, pairwise incompatible and jointly exhaustive) then
$P(E) = \sum_i P(E \mid H_i)\,P(H_i)$
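Bayes' Theorem and the partition formula are typically used together: the partition supplies the denominator P(E). The numbers below are hypothetical, chosen only to make the arithmetic visible: an urn is either all Red (H1) or half Red (H2), the two hypotheses are equiprobable, and E is the evidence that a single draw is Red.

```python
from fractions import Fraction

# Hypothetical partition: H1 = "every ball in the urn is Red", H2 = "half the balls are Red".
prior = {"H1": Fraction(1, 2), "H2": Fraction(1, 2)}
# Assumed likelihoods of E = "the ball drawn is Red" under each hypothesis.
likelihood = {"H1": Fraction(1), "H2": Fraction(1, 2)}


def total_probability(prior, likelihood):
    """P(E) = sum_i P(E | H_i) P(H_i) over a partition H_1, ..., H_n."""
    return sum((likelihood[h] * prior[h] for h in prior), Fraction(0))


def posterior(prior, likelihood):
    """Bayes' Theorem: P(H | E) = P(E | H) P(H) / P(E) for each H in the partition."""
    p_e = total_probability(prior, likelihood)
    return {h: likelihood[h] * prior[h] / p_e for h in prior}


print(total_probability(prior, likelihood))  # 3/4
print(posterior(prior, likelihood)["H1"])    # 2/3
```

Observing a single Red draw thus shifts the probability of "all Red" from 1/2 to 2/3.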

The simple probabilities defined in section 3.1 above can serve to illustrate and compare approaches to probabilistic induction; Carnap's logicism, Reichenbach's frequentism, and Bayesian subjectivism. These sponsor different formulations and proposed solutions of the problem of induction.

Perhaps the most evident difference among the three theories is just that of the bearers or objects of probability. Probability applies to sentences in Carnap's logicism, to event-types for Reichenbach, and to beliefs in subjectivism.

3.2 Carnap's inductive logic

Carnap's taxonomy of the varieties of inductive inference (LFP 207f) may help to appreciate the complexity of the contemporary concept.

Probability in Carnap's theory is a metalinguistic operator, as it is in the exposition of section 3.1 above. In this context the problem of induction is to choose or to design a language appropriate to a given situation and to define a probability on this language that properly codifies inductive inference.

Carnap writes m(s) for the probability of the sentence s and

$c(h, e) = m(h \land e) / m(e)$

when m(e) > 0, for the degree of confirmation of the hypothesis h on evidence e. Degree of confirmation satisfies the laws of probability and in addition symmetry. In standard cases c and m are also regular.

K-sequences (state descriptions in Carnap's terminology) are the most specific sentences in a language L(k): every consistent sentence is logically equivalent to a disjunction of these pairwise incompatible conjunctions, so fixing the probabilities of all state descriptions, which must always sum to one, fixes the probability of every consistent sentence in the language. (The principle C1 of section 3.1 fixes the probability of inconsistent sentences at zero.) State descriptions are isomorphic if they include the same number of negations. A structure description is a maximal disjunction of isomorphic state descriptions; all and only the state descriptions with the same number of negations. Symmetry entails that state descriptions in the same structure description are equiprobable.
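The grouping of state descriptions into structure descriptions can be enumerated mechanically. The sketch assumes three constants, as in the urn example that follows, and encodes a state description as a tuple of truth values for R(a_1), R(a_2), R(a_3); it prints each structure description as a disjunction of its isomorphic state descriptions.

```python
from collections import defaultdict
from itertools import product

K = 3
# Each state description fixes R(a_i) (True) or ¬R(a_i) (False) for every constant.
state_descriptions = list(product([True, False], repeat=K))

# Isomorphic state descriptions (same number of negations) form one structure description.
structure_descriptions = defaultdict(list)
for sd in state_descriptions:
    structure_descriptions[sd.count(False)].append(sd)


def show(sd):
    """Render a state description as a conjunction of literals."""
    return " ∧ ".join(("" if v else "¬") + f"R(a{i + 1})" for i, v in enumerate(sd))


for negations, members in sorted(structure_descriptions.items()):
    print(f"{negations} negation(s):", " ∨ ".join(f"({show(sd)})" for sd in members))

print(len(state_descriptions), len(structure_descriptions))  # 8 4
```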

To fix ideas we consider L(3), which we take to represent three draws with replacement from an urn including an indefinite number of balls, each either Red (R) or Black (¬R). There are then eight state descriptions (eight possible sequences of draws) and four structure descriptions: a state description says which balls drawn have which color. A structure description says just how many balls there are of each color in a sequence of draws without respect to order.

From a deductive-logical point of view, the set of logical consequences of a given state description is a maximal consistent set of sentences of L(3): The set is consistent (consisting as it does of the logical consequences of a consistent sentence) and maximal; no sentence of L(3) not implied by the set is consistent with it. The state descriptions correspond to models, to points in a logical space. A (symmetrical) probability on L(3) thus induces a normal measure on sets of models: Any assignment of non-negative numbers summing to one to the state descriptions or models fixes probabilities. In this finite case, the extent to which evidence e supports a hypothesis h is the proportion of models for e in which h is true, each model weighted by its probability. Deductively, e logically implies h if h is true in every model for e. Degree of confirmation is thus a metrical generalization of first-order logical implication.
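The remark that degree of confirmation generalizes logical implication can be illustrated on the eight models of L(3). In the sketch below the weights 1/36, 2/36, …, 8/36 are arbitrary (any non-negative weights summing to one would do), models are truth-value tuples for R(a_1), R(a_2), R(a_3), and the helper names are hypothetical. When e implies h, every model of e is a model of h, so c(h, e) = 1; otherwise c(h, e) is the weighted share of e's models in which h holds.

```python
from fractions import Fraction
from itertools import product

MODELS = list(product([True, False], repeat=3))  # truth values of R(a_1), R(a_2), R(a_3)
WEIGHTS = {w: Fraction(i + 1, 36) for i, w in enumerate(MODELS)}  # 1 + 2 + ... + 8 = 36


def m(sentence):
    """Measure of the set of models in which the sentence is true."""
    return sum((WEIGHTS[w] for w in MODELS if sentence(w)), Fraction(0))


def c(h, e):
    """Degree of confirmation c(h, e) = m(h ∧ e) / m(e), for m(e) > 0."""
    return m(lambda w: h(w) and e(w)) / m(e)


def implies(e, h):
    """Deductive implication: h is true in every model of e."""
    return all(h(w) for w in MODELS if e(w))


def e(w):
    return w[0] and w[1]  # R(a_1) ∧ R(a_2)


def h_implied(w):
    return w[0] or w[2]  # R(a_1) ∨ R(a_3): follows from e


def h_open(w):
    return w[2]  # R(a_3): not settled by e


print(implies(e, h_implied), c(h_implied, e))  # True 1
print(implies(e, h_open), c(h_open, e))        # False 1/3
```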

There are two probabilities that support contrasting logical-probabilistic relations among the sentences of L(3). The simpler of these, m† (with its confirmation function c†), is uniform or equiprobable over state descriptions; each state description has probability 1/8. From the point of view of induction it is significant that every 3-sequence (every sequence of three draws) is thoroughly independent in m†. This means that drawing and replacing a Red ball provides no evidence about the constitution of the urn or the color of the next ball to be drawn. Carnap took this to be a strong argument against the use of m† in induction, since it seemed to prohibit learning from experience.
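The independence claim can be checked directly. Under the same boolean-triple encoding (an assumption of these sketches, with hypothetical helper names), every possible record of the first two draws leaves the m†-probability of Red on the third draw at 1/2, its unconditional value.

```python
from fractions import Fraction
from itertools import product

MODELS = list(product([True, False], repeat=3))  # the 8 state descriptions of L(3)
m_dagger = {w: Fraction(1, 8) for w in MODELS}   # m†: uniform on state descriptions

def prob(m, sentence):
    return sum(m[w] for w in MODELS if sentence(w))

def cond(m, h, e):
    # Conditional probability m(h ∧ e) / m(e).
    return prob(m, lambda w: h(w) and e(w)) / prob(m, e)

def red_third(w):  # the sentence R(3)
    return w[2]

results = {}
for first, second in product([True, False], repeat=2):
    def evidence(w, a=first, b=second):  # the colors of the first two draws
        return w[0] == a and w[1] == b
    results[(first, second)] = cond(m_dagger, red_third, evidence)

for record, value in results.items():
    print(record, value)  # 1/2 for every record
```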

Although m† may not serve well inductively, it is one of a class of very important probabilities. These are probabilities in which R has the same probability on every draw and in which, for each k, every k-sequence is thoroughly independent. Such measures are known as Bernoullian probabilities; they satisfy the weak law of large numbers, first proved by Jacob Bernoulli in 1713. This law states that in the Bernoullian case of thorough independence and constant probability, as the number of trials increases without bound, the probability that the relative frequency of R differs from the probability of R by more than any fixed amount, however small, approaches zero.
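The weak law can be illustrated numerically. The sketch below computes exactly, using Python's fractions, the probability that the relative frequency of R in n independent trials lies within a tolerance eps of a constant probability p. The values p = 1/2 and eps = 1/20 are arbitrary illustrative choices. The probability grows toward 1 as n increases, which is what the law asserts, although no finite computation establishes the limit itself.

```python
from fractions import Fraction
from math import comb

def prob_within(n, p, eps):
    """Exact probability, under a Bernoullian measure with constant probability
    p of R on each trial, that the relative frequency of R in n trials lies
    within eps of p."""
    return sum(
        comb(n, j) * p**j * (1 - p) ** (n - j)
        for j in range(n + 1)
        if abs(Fraction(j, n) - p) <= eps
    )

p, eps = Fraction(1, 2), Fraction(1, 20)
results = {n: prob_within(n, p, eps) for n in (10, 100, 1000)}
for n, value in results.items():
    print(n, round(float(value), 4))
```

Exact fractions are used so that boundary cases such as j/n = p ± eps are compared without floating-point error.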

The second probability in question is m* (with its confirmation function c*). m* is uniform (equiprobable) not on state descriptions but on structure descriptions. This can be thought of as enforcing a division of labor between cause and chance: The domain of cause includes the structure of the urn and the balls, the number and colors of the balls, the way the balls are shuffled between draws, and so on. Given these causal factors, the order in which the balls are drawn is a matter of chance; this order is not determined by the mechanics of the physical setup just described. Of course the draws themselves are also causally determined, but not by the mechanics of that setup.

In the present example a simple calculation shows that:

m*(R(1)) = m*(R(2)) = m*(R(3)) = 1/2

c*(R(2), R(1)) = 2/3

c*(R(3), R(1) ∧ R(2)) = 3/4

c*(R(3), R(1) ∧ ¬R(2)) = 1/2

c*(R(2), ¬R(1)) = 1/3

c*(R(3), ¬R(1) ∧ ¬R(2)) = 1/4
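The calculation can be reproduced mechanically. This sketch uses the same illustrative boolean encoding and hypothetical helper names. It builds m* by giving each of the four structure descriptions weight 1/4 and splitting that weight equally among its state descriptions, then computes the values of c*. As a cross-check, it compares every value with Laplace's rule of succession, (n + 1)/(j + 2) for n Reds in j draws, which m* is known to reproduce for a single binary predicate.

```python
from fractions import Fraction
from itertools import product

K = 3  # three draws: the language L(3)

def build_m_star(k):
    """m*: weight 1/(k+1) on each structure description (each possible number
    of Reds), shared equally by the state descriptions it contains."""
    models = list(product([True, False], repeat=k))
    count = {}
    for w in models:
        count[sum(w)] = count.get(sum(w), 0) + 1
    return {w: Fraction(1, k + 1) / count[sum(w)] for w in models}

m_star = build_m_star(K)

def prob(sentence):
    return sum(weight for w, weight in m_star.items() if sentence(w))

def c_star(h, e):
    return prob(lambda w: h(w) and e(w)) / prob(e)

def R(i):        # the sentence R(i): the i-th ball drawn is Red
    return lambda w: w[i - 1]

def notR(i):     # the sentence ¬R(i)
    return lambda w: not w[i - 1]

def both(a, b):  # conjunction of two sentences
    return lambda w: a(w) and b(w)

print(prob(R(1)), prob(R(2)), prob(R(3)))    # 1/2 1/2 1/2
print(c_star(R(2), R(1)))                    # 2/3
print(c_star(R(3), both(R(1), R(2))))        # 3/4
print(c_star(R(3), both(R(1), notR(2))))     # 1/2
print(c_star(R(2), notR(1)))                 # 1/3
print(c_star(R(3), both(notR(1), notR(2))))  # 1/4
# R(1) and R(2) are not independent in m*: 1/3 versus 1/4.
print(prob(both(R(1), R(2))), prob(R(1)) * prob(R(2)))

# Cross-check: the rule of succession (n + 1) / (j + 2), where n is the
# number of Reds among the first j draws.
for j in range(K):
    for past in product([True, False], repeat=j):
        def evidence(w, past=past):
            return w[:len(past)] == past
        assert c_star(R(j + 1), evidence) == Fraction(sum(past) + 1, j + 2)
```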

m* (c*) is thus affected by evidence, positively and negatively, as m† is not. R(1), R(2) and R(3) are not independent in m*. This establishes, as promised above in section 3.2, a symmetrical probability in which k-sequences are not thoroughly independent. Symmetry is a demonstrably weaker constraint on probability than independence.

In later work Carnap introduced systems (the λ systems) in which different predicates could be more or less sensitive to evidence.
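For a single family of κ mutually exclusive and jointly exhaustive predicates, Carnap's λ-continuum is commonly written c_λ = (n + λ/κ)/(k + λ), where n of the k individuals observed so far have the predicate in question. The sketch below is only an illustration of that formula, with a hypothetical function name, and not a reconstruction of Carnap's full systems. With κ = 2 (Red and not Red), λ = 2 reproduces c*, very large λ approaches c† (probability 1/2 whatever the evidence), and λ = 0 gives the straight rule n/k.

```python
from fractions import Fraction

def c_lambda(n, k, lam, kappa=2):
    """Carnap's λ-continuum: probability that the next individual has a given
    predicate, after n of k observed individuals had it, for a family of
    kappa mutually exclusive predicates. lam = 0 requires k > 0."""
    return (Fraction(n) + Fraction(lam) / kappa) / (k + lam)

# Two Reds in two draws, as in c*(R(3), R(1) ∧ R(2)).
print(c_lambda(2, 2, 2))             # 3/4: λ = κ = 2 gives c*
print(float(c_lambda(2, 2, 10**6)))  # close to 0.5: large λ approaches c†
print(c_lambda(2, 2, 0))             # 1: λ = 0 gives the straight rule n/k
```

Smaller λ makes the function more sensitive to the observed frequency; larger λ makes it cling to the a priori value 1/κ.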

3.3 Reichenbach's frequentism

Carnap's logical probability generalized the metalinguistic relation of logical implication to a numerical function, c(h, e), that expresses the extent to which an evidence sentence e confirms a hypothesis h. Reichenbach's probability implication is also a generalization of a deductive concept, but the concept generalized belongs first to an object language of events and their properties.

This generalization extends classical first-order logic to include probability implications. These are formulas (Reichenbach TOP, 45)

x ∈ A ⊃ₚ x ∈ B

where p is some quantity between zero and one inclusive.

In a more conventional notation this probability implication between properties or classes may be written

P(B | A) = p

(Reichenbach writes P(A, B) rather than P(B | A). The latter is written here to maintain consistency with the notations of other sections.)

Reichenbach's probability logic is a conservative extension of classical first-order logic to include rules for probability implications. The individual variables (x, y) are taken to range over events (“The gun was fired,” “The shot hit the target”) and, as the notation makes evident, the variables A and B range over classes of events (“the class of firings by an expert marksman,” “the class of hits within a given range of the bullseye”) (Reichenbach TOP, 47). The formal rules of probability logic assure that probability implications conform to the laws of conditional probability and allow inferences integrating probability implications into deductive logic, including higher-order quantifiers over the subscripted variables.

Reichenbach's rules of interpretation of probability implications require, first, that the classes A and B be infinite and in one-one correspondence, so that their order is established. It is also required that the limiting relative frequency

lim_{n→∞} N(Aₙ ∩ Bₙ) / n

exists, where Aₙ and Bₙ are the first n members of A and B respectively, and N gives the cardinality of its argument. When this limit does exist, it defines the probability of B given A (Reichenbach 1971, 68):

P(B | A) =df lim_{n→∞} N(Aₙ ∩ Bₙ) / n, when the limit exists.
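The proviso "when the limit exists" does real work. As an illustration (the sequence is a hypothetical construction, not Reichenbach's example), the sketch below builds outcomes from alternating blocks of B and non-B whose lengths double. The relative frequency N(Aₙ ∩ Bₙ)/n keeps swinging between values near 1/3 and near 2/3, so no limit exists and, on this definition, P(B | A) is undefined.

```python
def oscillating_sequence(num_blocks):
    """Hypothetical outcomes (True = the member of A is also a B) arranged in
    blocks of doubling length: 1 B, 2 non-B, 4 B, 8 non-B, and so on."""
    seq = []
    for i in range(num_blocks):
        seq.extend([i % 2 == 0] * 2**i)
    return seq

def relative_frequency(seq, n):
    # N(A_n ∩ B_n) / n: the proportion of the first n members that are B.
    return sum(seq[:n]) / n

blocks = 16
seq = oscillating_sequence(blocks)
block_ends = [2 ** (i + 1) - 1 for i in range(blocks)]
freqs = [relative_frequency(seq, n) for n in block_ends]
print([round(f, 3) for f in freqs[-4:]])  # alternates near 2/3 and 1/3
```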

The complete system also includes higher-order or, as Reichenbach calls them, concatenated probabilities. First-level probabilities involve infinite sequences: the ordered sets referred to by the predicates of probability implications. Second-order probabilities are determined by lattices, or sequences of sequences (Reichenbach 1971, chapter 8 and ¶41).

3.3.1 Reichenbachian induction

On Reichenbach's view, the problem of induction is just the problem of ascertaining probability on the basis of evidence (TOP, 429). The conclusions of inductions are not asserted; they are posited. “A posit is a statement with which we deal as true, though the truth value is unknown” (TOP, 373).

If the relative frequency of B in A, N(Aₙ ∩ Bₙ) / n, is known for the first n members of the sequence A, and nothing is known about this sequence beyond n, then we posit that the limit

lim_{m→∞} N(Aₘ ∩ Bₘ) / m

will be within a small increment δ of the observed frequency N(Aₙ ∩ Bₙ) / n.
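Reichenbach's rule can be sketched as a function from an observed initial segment to a posited interval. The clamping to [0, 1], the function name and the example data are illustrative assumptions. The output is a posit in Reichenbach's sense, not an assertion.

```python
def straight_rule_posit(observed, delta):
    """Sketch of Reichenbach's rule of induction: from an observed initial
    segment (True = the member of A is also a B), posit that the limiting
    relative frequency lies within delta of the observed relative frequency.
    Returns the posited interval, clamped to [0, 1]."""
    if not observed:
        raise ValueError("the rule needs at least one observation")
    freq = sum(observed) / len(observed)
    return max(0.0, freq - delta), min(1.0, freq + delta)

# Hypothetical record: 7 of the first 10 members of A were B.
low, high = straight_rule_posit([True] * 7 + [False] * 3, 0.05)
print(round(low, 2), round(high, 2))  # 0.65 0.75
```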

(This corresponds to the Carnapian λ-function c0 (λ(κ) = 0) which gives total weight to the empirical factor and no weight to the logical factor.)

It is significant that finite relative frequencies are symmetrical, independent of order, but limiting relative frequencies are not: whether a limit exists, and if it exists its value, depend upon the order of the sequence. The definition of probability as limiting relative frequency thus entails that probability, and hence inductive inference, so defined, is not symmetrical.
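The order dependence shows up in a short sketch. Two hypothetical infinite sequences, one repeating "B, non-B" and the other "B, non-B, non-B", both contain infinitely many B's and infinitely many non-B's, so each is a rearrangement of the other. Yet their relative frequencies settle at 1/2 and 1/3 respectively; the long prefixes computed here illustrate those limits but do not prove them. A finite sample, by contrast, keeps its relative frequency under any reordering.

```python
from itertools import islice

def repeating(ones, zeros):
    """Hypothetical infinite sequence: `ones` B's, then `zeros` non-B's, repeated."""
    while True:
        yield from [True] * ones
        yield from [False] * zeros

def freq(prefix):
    return sum(prefix) / len(prefix)

n = 30000
half = freq(list(islice(repeating(1, 1), n)))
third = freq(list(islice(repeating(1, 2), n)))
print(half, third)  # 0.5 and about 0.333

# A finite sample keeps its relative frequency under any reordering.
sample = list(islice(repeating(1, 2), 12))
assert freq(sample) == freq(sorted(sample)) == freq(sample[::-1])
```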

Reichenbach's justification of induction by enumeration is known as a pragmatic justification (see also Salmon 1967, 52–54). It is first important to keep in mind that the conclusion of inductive inference is not an assertion but a posit. Reichenbach does not argue that induction is a sound method; his account is rather what Wesley Salmon (1963) and others have referred to as vindication: if any rule will lead to positing the correct probability, the inductive rule will do this, and it is, furthermore, the simplest rule that is successful in this sense.

What is now the standard difficulty with Reichenbach's rule of induction was noticed by Reichenbach himself and later strengthened by Salmon (1963). It is that for any observed relative frequency in an initial segment of any finite length, and for any arbitrarily selected quantity between zero and one inclusive, there exists a rule that, like the inductive rule, converges to the limit whenever there is one, yet on the basis of that observed frequency posits the selected quantity as the limit. Salmon goes on to propose additional conditions on adequate rules that uniquely determine the rule of induction. More recently Cory Juhl (1994) has examined the rule with respect to the speed with which it approaches a limit.
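The difficulty can be illustrated with one rival rule of the kind the objection has in view. The rule below is a hypothetical construction, not Salmon's own formulation. On an initial segment of up to n0 members it posits an arbitrary target, and after that its correction term shrinks like n0/n, so whenever the limit exists its posits converge to the same limit as those of the straight rule. Both rules are therefore vindicated in Reichenbach's sense, yet they disagree on the finite evidence actually in hand.

```python
def straight_rule(observed):
    # Posit the observed relative frequency as the limit.
    return sum(observed) / len(observed)

def asymptotic_rule(observed, target, n0):
    """Hypothetical rival rule: with at most n0 observations it posits `target`;
    afterwards it mixes target and observed frequency with weight n0 / n on
    the target, so its posits differ from the straight rule's by an amount
    that goes to 0 as n grows."""
    n = len(observed)
    f = straight_rule(observed)
    w = 1.0 if n <= n0 else n0 / n
    return w * target + (1 - w) * f

# Hypothetical data: a sequence whose limiting relative frequency is 1/3.
seq = [i % 3 == 0 for i in range(300000)]
for n in (30, 3000, 300000):
    prefix = seq[:n]
    # Same evidence, different posits; the gap closes as n grows.
    print(n, round(straight_rule(prefix), 4), round(asymptotic_rule(prefix, 0.9, 30), 4))
```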

4. Bayesianism and subjectivism

Bayesian induction incorporates a subjectivistic view of probability, according to which probability is identified with strength of belief. Objective Bayesianism also incorporates normative epistemic constraints. (“Logical Foundations of Evidential Support,” (Fitelson 2006a) is a good example of the genre.) Contemporary Bayesianism is not only a doctrine, or family of positions, about probability. It applies generally in epistemology and the philosophy of science as well. “Bayesian statistical inference for psychological research” (Edwards et al. 1963) gave a general Bayesian account of statistical inference. Savage (1954), Jeffrey (1983) and Skyrms (1980) give extensive Bayesian accounts of decision making in situations of uncertainty. More recently objective Bayesianism has taken on the traditional problem of the justification of universal inference. The matter is briefly discussed in section 5 below.

The Bayesian approach to induction can be illustrated in the languages L(k) of section 3.1: Recall that an urn contains three balls, each either Red or Black (= not Red). It is not known how many balls of each color there are. Balls are to be drawn, their colors recorded, and replaced. On the basis of this evidence, the outcomes of the successive draws, we are to support beliefs about the constitution of the urn. There are four possible constitutions, determined by the numbers of Red (and Black) balls in the urn. We can list these as alternative hypotheses stating the number of Reds:

H0: no Red balls (three Black)
H1: one Red ball (two Black)
H2: two Red balls (one Black)
H3: three Red balls (no Black)

It is useful to consider what our beliefs would be if we knew which hypothesis was true. If the probability P on the language L(k) gives our beliefs about this setup, then P is, as remarked in section 3.1, symmetric. Further, if, for example, we knew that there were two Red balls and one Black ball in the urn, the sequences of draws would be symmetric and (thoroughly) independent, with constant probability (= 2/3) of Red on each draw. To what extent a given sequence of draws supports the different hypotheses is, on the other hand, not at all clear. If σ(k) is a k-sequence, we want to find the probabilities P(Hi | σ(k)), for i = 0, 1, 2 and 3. We do know that after the first draw we shall reject either H0 or H3, but little else is evident. Notice, however, that although the probabilities P(Hi | σ(k)), the extent to which given sequences support the different hypotheses, are not readily available, their converses P(σ(k) | Hi) (the likelihoods of the hypotheses given the evidence σ(k)) are easily and directly calculated. If, for example, the k-sequence σ(k) includes n Reds and (k − n) Blacks, then

P(σ(k) | H2) = (2/3)^n (1/3)^(k − n)

Each k-sequence is thus thoroughly independent in each conditional probability P(_ | Hi), with a constant probability of Red on each draw. These conditional probabilities are thus Bernoullian.

Bayes' theorem (C9 of section 3.1) expresses the probability P(H | E) in terms of the likelihood P(E | H).

P(H | E) = P(E | H)P(H) / P(E)

Bayes' postulate says in this case that if we have no reason to believe that any of the four hypotheses is more likely than the others, then we may consider them to be equiprobable. Since the hypotheses are pairwise incompatible and jointly exhaustive, it follows from C9.1 of section 3.1 that

P(E) = ∑i P(E | Hi)P(Hi)

And hence that for each hypothesis Hj,

P(Hj | E) = P(E | Hj)P(Hj) / ∑i P(E | Hi)P(Hi)

Thus, for example, we have that

P(H1 | R1) = P(R1 | H1)P(H1) / ∑i P(R1 | Hi)P(Hi) = (1/3)(1/4) / [(0 + 1/3 + 2/3 + 1)(1/4)] = (1/3) / 2 = 1/6

Similarly, P(H2 | R1) = 1/3, P(H3 | R1) = 1/2.
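These posterior probabilities can be checked with a short exact computation. The sketch assumes, as in the text, the four hypotheses H0 to H3, equiprobable priors by Bayes' postulate, and draws with replacement. The helper names are illustrative.

```python
from fractions import Fraction

# H_i: the urn holds i Red balls out of 3 (i = 0, 1, 2, 3).
HYPOTHESES = range(4)
prior = {i: Fraction(1, 4) for i in HYPOTHESES}  # Bayes' postulate: equiprobable

def likelihood(sequence, i):
    """P(sequence | H_i) for draws with replacement: (i/3)^n (1 - i/3)^(k - n)."""
    p_red = Fraction(i, 3)
    n = sum(sequence)
    return p_red**n * (1 - p_red) ** (len(sequence) - n)

def posterior(sequence, prior):
    # Bayes' theorem, with P(E) = Σ_i P(E | H_i) P(H_i) as the normalizer.
    joint = {i: likelihood(sequence, i) * prior[i] for i in prior}
    evidence = sum(joint.values())
    return {i: joint[i] / evidence for i in joint}

post = posterior([True], prior)  # E: the first draw is Red, R1
print({i: str(v) for i, v in post.items()})
# {0: '0', 1: '1/6', 2: '1/3', 3: '1/2'}
```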

The simple, and obvious, criticism of the Bayesian method is that the prior (before knowledge of any evidence) probabilities fixed by Bayes' postulate are arbitrary. The Bayesian response is that the Bayesian method of updating probabilities with successive outcomes progressively diminishes the effect of the initial priors. This updating uses the posterior probabilities resulting from the first draw as the “prior” probabilities for the second draw, and so on. Further, as the number of trials increases without bound, the updated probability is virtually certain to approach one of the conditional probabilities P(_ | Hi) (de Finetti 1937). (See Zabell 2005 for a precise formulation and exposition of the de Finetti theorem and Jeffrey 1983, Section 12.6, for a briefer and more accessible account.)
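The Bayesian reply can be illustrated by running the same updating from two different priors on the same evidence. Both the lopsided prior and the record of draws (two Reds in every three, as H2 would lead one to expect) are hypothetical choices made for illustration. After a few hundred draws the two posteriors are practically indistinguishable, both concentrating on H2.

```python
from fractions import Fraction

HYPOTHESES = range(4)  # H_i: i Red balls among the three in the urn

def update(prior, red):
    """One step of Bayesian updating on a single draw (with replacement).
    The posterior returned here serves as the prior for the next draw."""
    like = {i: Fraction(i, 3) if red else 1 - Fraction(i, 3) for i in HYPOTHESES}
    joint = {i: like[i] * prior[i] for i in HYPOTHESES}
    total = sum(joint.values())
    return {i: joint[i] / total for i in HYPOTHESES}

# Hypothetical record: Red, Red, Black repeated, 300 draws in all.
draws = [i % 3 != 2 for i in range(300)]

uniform = {i: Fraction(1, 4) for i in HYPOTHESES}      # Bayes' postulate
lopsided = {0: Fraction(1, 100), 1: Fraction(96, 100),
            2: Fraction(1, 100), 3: Fraction(2, 100)}  # hypothetical prior

post_u, post_l = uniform, lopsided
for red in draws:
    post_u, post_l = update(post_u, red), update(post_l, red)

print(float(post_u[2]), float(post_l[2]))  # both extremely close to 1
```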

4.1 Induction and deduction

Our deep and extensive understanding of deductive logic, in particular of the first-order logic of quantifiers and truth functions, is predicated on two metatheorems: the semantical completeness of this logic and the decidability of proofs and deductions. The decidability result provides an algorithm which, when applied to a (finite) sequence of sentences, decides in finitely many steps whether the sequence is a valid proof of its last member or a valid deduction of a given conclusion from given premises. Semantical completeness enables easy and enlightening movement between syntactical, proof-theoretic operations and reasoning in terms of models. In combination these metatheorems resolve both the metaphysical and the epistemological problems for proofs and demonstrations in first-order logic: Namely, what distinguishes valid from invalid logical demonstration? And what are reliable methods for deductive inference? (It should, however, be kept in mind that neither logical validity nor logical implication is decidable in first-order logic.) Neither of these metatheorems is possible for induction. Indeed, if Hume's arguments are conclusive, then the metaphysical problem, to distinguish good from bad inductions, is insoluble.

But this is not to say that no advance can be made on the epistemological problem, the task of finding or designing good inductive methods: methods that will lead to true conclusions or predictions, if not inevitably, then at least in an important proportion of the cases in which they are applied. Hume himself, in fact, made significant advances in this direction: first in the section of the Treatise (I.III.XIII) on inductive fallacies, in which he gives an account of how it is that “we learn to distinguish the accidental circumstances from the efficacious causes” (THN 149), and later in the section “Rules by which to judge of causes and effects” (THN I.III.XV), whose rules are clear predecessors of Mill's Four Methods (Mill 1843, Bk III, Ch. VIII).

As concerns differences between induction and deduction, one of these is dramatically illustrated in the problems with Williams' thesis discussed in section 4.2 below: This is that inductive conditional probability is not monotonic with respect to conditions: Adding conditions may increase or decrease the value of a conditional probability. The same holds for non-probabilistic induction: Adding premises to a good induction may weaken its strength: That the patient presents flu-like symptoms supports the hypothesis that he has the flu. When to this evidence is added that he has been immunized against flu, that support is undermined. A second difference concerns relativity to context, to which induction but not deduction is susceptible. We return to this question in section 5 below.

4.2 A demonstrative argument to show the soundness of induction

Among those not convinced by Hume's arguments stated in section 2.1 above are D.C. Williams, supported and corrected by D.C. Stove, and David Armstrong. Williams argued in The Ground of Induction (1947) that it is logically true that one form of probabilistic inductive inference is sound and that this is logically demonstrable in the theory of probability. Stove reiterated the argument with a few reformulations and corrections four decades later. Williams held that induction is a reasonable method. By this he intended not only that it is characterized by ordinary sagacity; indeed, he says that an aptitude for induction is just what we mean by ‘ordinary sagacity’. He claims that induction, or one important species of it, is reasonable in the (not quite standard) sense of being “logical or according to logic”.

Hume, on the other hand, according to Williams held that:

[A]lthough our nervous tissue is so composed that when we have encountered a succession on M's which are P we naturally expect the rest of M's to be P, and although this expectation has been borne out by the event in the past, the series of observations never provided a jot of logical reason for the expectation, and the fact that the inductive habit succeeded in the past is itself only a gigantic coincidence, giving no reason for supposing it will succeed in the future. (Williams 1947, 15)

Williams and Stove maintain that while there may be, in Hume's phrase, no “demonstrative arguments to prove” the uniformity of nature, there are good deductive arguments that prove that certain inductive methods yield their conclusions with high probability.

The specific form of induction favored by Williams and Stove is now known as inverse inference: inference to a characteristic of a population based on premises about a sample from that population (see the taxonomy in section 3.2 above). Williams and Stove focus on inverse inferences about relative frequency, in particular on inferences of the form:

(i) The relative frequency of the trait R in the sufficiently large sample S from the finite population X is r: f(R | S) = r

Therefore,

(ii) The relative frequency of R in X is close to r: f(R | X) ≈ r

(Williams 1947, 12; Stove 1986, 71–75). This includes, of course, the special case in which r = 1.

Williams, followed by Stove, sets out to show that it is necessarily true that the inference from (i) to (ii) has high probability:

Given a fair sized sample, then, from any [large, finite] population, with no further material information, we know logically that it very probably is one of those which [approximately] match the population, and hence that very probably the population has a composition similar to that which we discern in the sample. This is the logical justification of induction. (Williams 1947, 97)

Williams and Stove recognize that induction may depend upon context and also upon the nature of the traits and properties to which it is applied. And Stove, at least, does not propose to justify all inductions: “That all inductive inferences are justified is false in any case” (Stove 1986, 77).

Williams' initial argument was simple and persuasive. It turns out, however, to have subtle and revealing difficulties. In response to these difficulties, Stove modified and weakened the argument, but this response may not be sufficient. There is in addition the further problem that the sense of necessity that founds the inferences is not made precise and becomes increasingly stressed as the argument plays out.

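There are two principles on which the (i) to (ii) inference depends: First is the proportional syllogism (C8 of section 3.1). Second is a rule relating the frequency of a trait in a population to its frequency in samples from that population:

Frequency Principle: It is a combinatorial fact that the relative frequency of a trait in a large finite population is close to its relative frequency in most large samples from that population.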

The proof of the Frequency Principle depends upon the fact that the relative frequencies of a trait in samples from such a population are (approximately) normally distributed, with mean equal to the frequency of the trait in the population and with dispersion (standard deviation) that diminishes as the size of the samples increases.

Williams' ingenious argument from (i) to (ii) begins with an induction on a ‘hyperpopulation’ (Williams 1947, 94–96) of all samples of size k (‘k-samples’—like the k-sequences of the previous section) drawn from a large finite population X of individuals. The ‘individuals’ of the hyperpopulation are themselves k-samples from the population X.
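Williams' hyperpopulation can be explored numerically for a small hypothetical case. The sketch counts, among all k-subsets of a population of N individuals of which M have R, those whose relative frequency of R lies within a tolerance eps of M/N. This is one simple way, assumed here rather than taken from Williams, of making "resembles" precise. The proportion of resembling samples rises toward 1 as k grows, which is the content of the Frequency Principle and of the first step of Williams' argument.

```python
from fractions import Fraction
from math import comb

def fraction_resembling(N, M, k, eps):
    """Proportion of all k-samples (k-subsets) of a hypothetical population of
    N individuals, M of which have trait R, whose relative frequency of R is
    within eps of the population frequency M / N."""
    population_freq = Fraction(M, N)
    matching = sum(
        comb(M, j) * comb(N - M, k - j)  # k-samples with exactly j members having R
        for j in range(k + 1)
        if abs(Fraction(j, k) - population_freq) <= eps
    )
    return Fraction(matching, comb(N, k))

# Hypothetical population: 1000 individuals, 300 of whom have R.
sizes = (10, 100, 500)
ratios = [fraction_resembling(1000, 300, k, Fraction(1, 20)) for k in sizes]
for k, ratio in zip(sizes, ratios):
    print(k, round(float(ratio), 3))
```

Counting k-subsets treats every sample as equally likely, which is exactly the symmetry Williams assumes without explicit notice.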

Now let P be a symmetrical probability. (Williams assumes symmetry without explicit notice.) Given a large population X and a k-sample S of appropriate size from X, in which the relative frequency of the trait R is r, the content of the premise (i) above can be expressed in two premises:

Premise A. S is a k-sample from X.
Premise B. The relative frequency of R in S is r, i.e., f(R | S) = r

To show that (i) implies (ii), Williams argued from A and B as follows: It follows from the Frequency Principle that

(1) The relative frequency of k-samples from the population X that resemble X is high.

It follows from Premise A, (1), and the Proportional Syllogism (C8) that

(2) P(S resembles X) is high.

By Premise B and the definition of resemblance

(3) P[f(R | X) ≈ r | S resembles X] is high.

It follows from (2) and (3) by the laws of probability that

(4) P[f(R | X) ≈ r] is high.

Hence, goes the argument, (i) above implies (ii).

We might like to reason in this way, and Williams did reason in this way, but as Stove pointed out in The Rationality of Induction (1986, 65), it ignores the failure of monotonicity. Inductive inference in general and inductive conditional probabilities in particular are not monotonic: adding premises may change a good induction to a bad one, and adding conditions to a conditional probability may change, and sometimes reduce, its value. Here, (3) depends on Premise B but suppresses mention of it, thus failing to respect the requirement to take account of all available and relevant evidence. In stating (3) Williams neglected the critical distinction between the probability that f(R | X) ≈ r conditioned on resemblance alone:

(3a) P[f(R | X) ≈ r | S resembles X]

and the result of adding Premise B, which states the relative frequency of R in S0, to the condition of (3a):

(3b) P[f(R | X) ≈ r | S resembles X ∧ f(R | S0) = r]

When, however, the conditions of (3) are expanded to take account of premise B,

(3*) P[f(R | X) ≈ r | S resembles X ∧ f(R | S0) = r] is high,

the result does not follow from the premises; (3*) is true for some values of r and not for others.

As Maher describes this effect (and as Williams himself (1947, 89) had pointed out):

Sample proportions near 0 or 1 increase the probability that the population is nearly homogeneous which, ceteris paribus, increases the probability that the sample matches the population; conversely, sample proportions around 1/2 will ceteris paribus, decrease the probability of matching. (Maher 1996, 426)

Thus the addition of premise B to the condition of (3) might decrease the probability that S0 resembles the population X: (3b) might be low while (3a) is high.
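Maher's point can be illustrated under a simple hypothetical model that is not Maher's own: a population of N = 200 individuals, a uniform prior over the number M of them that have R, and a random 20-sample drawn without replacement. The sketch computes the posterior probability that the sample resembles the population, taken here (as an assumption) to mean |M/N − s/k| ≤ 1/20, given each observed sample count s. Under these assumptions a match is noticeably more probable for a sample with no R's than for a sample split evenly, so the value of (3b) depends on r.

```python
from fractions import Fraction
from math import comb

def prob_sample_matches(N, k, s, eps):
    """Hypothetical model: uniform prior over M, the number of members of a
    population of N that have R (M = 0, ..., N). Given that a random k-sample
    (without replacement) contains s members with R, return the posterior
    probability that the sample resembles the population: |M/N - s/k| <= eps."""
    # Uniform prior times hypergeometric likelihood; the factor 1 / C(N, k)
    # is the same for every M and cancels in the ratio.
    weights = {M: comb(M, s) * comb(N - M, k - s) for M in range(N + 1)}
    near = sum(w for M, w in weights.items()
               if abs(Fraction(M, N) - Fraction(s, k)) <= eps)
    return Fraction(near, sum(weights.values()))

N, k, eps = 200, 20, Fraction(1, 20)
results = {s: prob_sample_matches(N, k, s, eps) for s in (0, 2, 10)}
for s, value in results.items():
    print(f"sample proportion {s / k:.2f}: P(match) = {float(value):.3f}")
```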

Conditional probability contrasts with the deductive logic of the material conditional in this respect:

(A → B) implies [(A ∧ C) → B]

whereas

P(B | A) = p does not imply P(B | A ∧ C) = p
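The contrast can be checked mechanically. The first part of this sketch verifies by truth table that (A → B) → [(A ∧ C) → B] holds under every valuation. The second part uses the measure m* on three draws discussed earlier in this entry, with A = R(1), B = R(3) and C = ¬R(2), chosen here for illustration: P(B | A) = 2/3, yet P(B | A ∧ C) = 1/2.

```python
from fractions import Fraction
from itertools import product

def implies(p, q):
    # Material conditional.
    return (not p) or q

# Deduction: if A → B is true, so is (A ∧ C) → B, under every valuation.
assert all(
    implies(implies(a, b), implies(a and c, b))
    for a, b, c in product([True, False], repeat=3)
)

# Probability: the measure m* on three draws, weight 1/4 for each number of
# Reds, shared equally by the orderings with that number.
worlds = list(product([True, False], repeat=3))
orderings = {r: sum(1 for w in worlds if sum(w) == r) for r in range(4)}
m_star = {w: Fraction(1, 4) / orderings[sum(w)] for w in worlds}

def P(h, e):
    # Conditional probability P(h | e) = m*(h ∧ e) / m*(e).
    joint = sum(m_star[w] for w in worlds if h(w) and e(w))
    return joint / sum(m_star[w] for w in worlds if e(w))

def A(w):  # R(1)
    return w[0]

def B(w):  # R(3)
    return w[2]

def C(w):  # ¬R(2)
    return not w[1]

p_b_given_a = P(B, A)
p_b_given_ac = P(B, lambda w: A(w) and C(w))
print(p_b_given_a, p_b_given_ac)  # 2/3 1/2: adding C lowers the value
```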

Stove's response to this difficulty was to point out that neither he nor Williams had ever claimed that every inductive inference, nor even every instance of the (i) to (ii) inference, was necessarily highly probable. All that was needed on Stove's view to establish Williams' thesis was to provide one case of values for r, X, S and R for which the inference holds necessarily. This would, Stove claimed, show that at least one inductive inference was necessarily rational.

Stove (1986, chapter VII) provided examples of specific values for the above parameters that he argued do license the inference. Maher pointed out that the argument depends upon tacit assumptions about the prior probabilities of different populations and their constitutions, and that when this is taken into account, the conclusions no longer follow deductively. Scott Campbell (2001) continued the discussion.

Williams' original argument, when expressed in general terms, is simple and seductive: It is a combinatorial fact that the relative frequency of a trait in a large population is close to its relative frequency in most large samples from that population. The proportional syllogism is a truth of probability theory: In the symmetrical case, relative frequency equals probability. From these it looks to be a necessary truth that the relative frequency of a trait in a large population is very probably close to its relative frequency in a large sample from that population. We have seen that and why the consequence does not follow. Various efforts at weakening the original Williams thesis have been more or less successful. It is in any event plausible that there are at least some examples of inductions for which some form of the Williams thesis is true, but the thesis emerges from this dialectic considerably weakened.

4.3 Rationalistic criticism of Hume

D.M. Armstrong, like Williams and Stove, is a rationalist about induction.

About one-third of What is a Law of Nature (Armstrong 1983) is devoted to stating and supporting three rationalistic criticisms of what Armstrong calls the regularity theory of law. Put very generally, the various forms of the regularity theory all count laws, if they count them at all, as contingent generalizations or mere descriptions of the events to which they apply: “All there is in the world is a vast mosaic of local matters of particular fact, just one little thing and then another,” as David Lewis put this view (1986, ix). Armstrong argues against all forms of the regularity theory; laws, on his view, are necessary connections of universals that neither depend nor supervene on the course of worldly events but determine, restrict, and govern those events. The law statement, a linguistic assertion, must in his view be distinguished from the law itself. The law itself is not linguistic; it is a state of affairs, “that state of affairs in the world which makes the law statement true” (Armstrong 1991, 505). A law of nature is represented as ‘N(F, G)’, where F and G are universals and N indicates necessitation. Necessitation is inexplicable: it is “a primitive, which we are forced to postulate” (Armstrong 1983, 92). “That each F is a G, however, does not entail that F-ness [the universal F] has N to G-ness” (Armstrong 1983, 85). That is to say that the extensional inclusion ‘all F's are G's’ may be an accidental generalization and does not imply a lawlike connection between F's and G's. In a “first formulation” of the theory of laws of nature (Armstrong 1983, 85), if N(F, G) is a law, it “does not obtain of logical necessity, if it does obtain then it entails the corresponding Humean or cosmic uniformity: (x)(Fx ⊃ Gx).” In later reconsideration (Armstrong 1983, 149), however, this claim is withdrawn: N(F, G) does not entail that all F's are G's, for some F's may be “interfered with,” preventing the law's power from doing its work.

Armstrong's rationalism does not lead him, as it did Williams and Stove, to see the resolution of the problem of induction as a matter of demonstrating that induction is necessarily a rational procedure:

[O]rdinary inductive inference, ordinary inference from the observed to the unobserved, is, although invalid, nevertheless a rational form of inference. I add that not merely is it the case that induction is rational, but it is a necessary truth that it is so. (Armstrong 1983, 52)

Armstrong does not argue for this principle; it is a premise of an argument to the conclusion that regularity views imply the inevitability of inductive skepticism: the view, attributed to Hume, that inferences from the observed to the unobserved are not rational (Armstrong 1983, 52). Armstrong seems to understand ‘rational’ not in Williams' stronger sense of entailing deductive proofs, but in the more standard sense of (as the OED defines it) “Exercising (or able to exercise) one's reason in a proper manner; having sound judgement; sensible, sane” (Williams' “ordinary sagacity,” near enough).

The problem of induction for Armstrong is to explain why the rationality of induction is a necessary truth (Armstrong 1983, 52). Or, in a later formulation, to “lay out a structure of reasoning which will more fully reconcile us (the philosophers) to the rationality of induction” (Armstrong 1991, 505). His resolution of this problem has two “pillars” or fundamental principles. One of these is that laws of nature are objective natural necessities and, in particular, that they are necessary connections of universals. The second principle is that induction is a species of inference to the best explanation (IBE), what Peirce called ‘abduction’.

[T]he core idea is very simple: observed regularities are best explained by hypotheses of strong laws of nature [i.e., objective natural necessities], hypotheses which in turn entail conclusions about the unobserved. (Armstrong 2001, 503)

IBE, as its name suggests, is an informal and non-metric form of likelihood methods. Gilbert Harman coined the term in “The Inference to the Best Explanation,” (Harman 1965, see also Harman 1968) where he argued that enumerative induction was best viewed as a form of IBE: The explanandum is a collection of statements asserting that a number of F's are G's and the absence of contrary instances, and the explanans, the best explanation, is the universal generalization, all F's are G's. IBE is clearly more general than simple enumerative induction, can compare and evaluate competing inductions, and can fill in supportive hypotheses not themselves instances of enumerative induction. (Armstrong's affinity for IBE should not lead one to think that he shares other parts of Harman's views on induction.)

An instantiation of a law is of the form

N(F, G)(a's being F, a's being G)

where a is an individual. Such instantiations are states of affairs in their own right.

As concerns the problem of induction, the need to explain why inductive inferences are necessarily rational, one part of Armstrong's resolution of the problem can be seen as a response to the challenge put sharply by Goodman: Which universal generalizations are supported by their instances? Armstrong holds that necessary connections of universals, like N(F, G), are lawlike, supported by their instances, and, if true, laws of nature. It remains to show how and why we come to believe these laws. Armstrong's proposal is that having observed many F's that are G, and no contrary instances, IBE should lead us to accept the law N(F, G).

[T]he argument goes from the observed constant conjunction of characteristics to the existence of a strong law, and thence to a testable prediction that the conjunction will extend to all cases. (Armstrong 1991, 507)

5. Paradoxes, the new riddle of induction and objectivity

The traditional problem of induction as Hume formulated it concerned what we now know as universal inference (see the taxonomy in section 3.2 above), in which the premise

One or several A's have been observed to be B's, and no A's are known to be not B's.

inductively supports

All A's are B's.

It also concerned singular predictive inference, in which the premises

One or several A's have been observed to be B's, and no A's are known to be not-B.
a, heretofore unobserved, is known to be A, and not known to be not-B.

inductively support

a is B.

The first of these forms, universal inference, can be codified or schematized by means of two definitions and two principles:

A simple example, due to C. G. Hempel, shows that all is not as simple as it might at first appear: By Nicod's principle

(Aa ∧ Ba) supports ∀x(Ax → Bx).

This last is logically equivalent to

∀x[(Ax ∧ ¬Bx) → (Ax ∧ ¬Ax)].

But nothing can be a positive instance of the latter, and hence (Aa ∧ Ba) cannot support it. Thus if the equivalence principle is to obtain, Nicod's principle cannot be a necessary condition of inductive support, though it may be sufficient. The difficulty is endemic; the structure of logical equivalents may differ, but that of instances cannot.
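Because the matrices of the two universal statements agree on every object, their equivalence and the absence of positive instances can be checked by a truth table over the values of Ax and Bx. This sketch treats a positive instance, as an assumption for illustration, as an object satisfying both the antecedent and the consequent of the conditional matrix.

```python
from itertools import product

def implies(p, q):
    # Material conditional.
    return (not p) or q

def original(a, b):    # matrix of ∀x(Ax → Bx), with a = Ax, b = Bx
    return implies(a, b)

def equivalent(a, b):  # matrix of ∀x[(Ax ∧ ¬Bx) → (Ax ∧ ¬Ax)]
    return implies(a and not b, a and not a)

cases = list(product([True, False], repeat=2))
# The matrices agree for every object, so the universal statements are equivalent.
assert all(original(a, b) == equivalent(a, b) for a, b in cases)

# Positive instances: objects satisfying both antecedent and consequent.
positive_original = [(a, b) for a, b in cases if a and b]
positive_equivalent = [(a, b) for a, b in cases if (a and not b) and (a and not a)]
print(positive_original, positive_equivalent)  # [(True, True)] []
```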

5.1 The paradox of the ravens

The paradox of the ravens shows that even when suitably restricted, the Nicod principle is not without problems: By instance confirmation, ‘a is not black and not a raven’ confirms ‘all non-black things are non-ravens.’ Since this is logically equivalent to ‘all ravens are black,’ by the equivalence principle:

‘a is not black and not a raven’ confirms ‘all ravens are black.’

And this is, or at least seems, paradoxical; that a non-raven lends support to a hypothesis about the color of ravens is highly implausible.

The paradox resides in the conflict of this counterintuitive result with our strong intuitive attachment to enumerative induction, both in everyday life and in the methodology of science.

The initial resolution of this dilemma was proposed by C. G. Hempel (1945), who credits discussion with Nelson Goodman. Assume first that we ignore all the background knowledge we bring to the question, such as that there are very many things that are neither ravens nor black, and that we look strictly at the truth conditions of the premise (this is neither a raven nor black) and the supported hypothesis (all ravens are black). The hypothesis says (is equivalent to)

Everything is either a black raven or is not a raven.

This hypothesis partitions the world into three exclusive and exhaustive classes of things: non-black ravens, black ravens, and non-ravens. Any member of the first class falsifies the hypothesis. Each member of the other two classes confirms it. A non-black non-raven is a member of the third class and is thus a confirming instance.
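Hempel's partition amounts to a three-way classification of objects. The sketch below, with a hypothetical function and made-up observations, labels each object by the class it falls in and by whether it falsifies or confirms the hypothesis in this purely truth-conditional sense, with background knowledge deliberately ignored.

```python
def classify(is_raven, is_black):
    """Hempel's three-way partition for 'Everything is either a black raven
    or is not a raven' (equivalently, 'all ravens are black')."""
    if is_raven and not is_black:
        return "non-black raven: falsifies"
    if is_raven:
        return "black raven: confirms"
    return "non-raven: confirms"

# Hypothetical observations: (is_raven, is_black).
observations = {
    "raven #1": (True, True),
    "white shoe": (False, False),
    "lump of coal": (False, True),
}
for name, (raven, black) in observations.items():
    print(name, "->", classify(raven, black))
```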

If this seems implausible it is because we in fact do not, as assumed, ignore the context in which the question is raised. We know before considering the inference that there are some black ravens and that there are many more non-ravens, many of which are not black. Observing, for example, a white shoe thus tells us nothing about the colors of ravens that we don't already know, and since (sound) induction is ampliative, good inductions should increase our knowledge. If we didn't know that many non-ravens are not black, the observation of a non-black, non-raven would increase our knowledge.

On the other hand, we don't know whether any of the unobserved ravens are not black, i.e., whether the first and falsifying class of things has any members. Observing a raven that is black tells us that this object at least is not a falsifying instance of the hypothesis, and this we did not know before the observation.

As Goodman puts it, the paradox depends upon “tacit and illicit evidence” not stated in its formulation:

Taken by itself, the statement that the given object is neither black nor a raven confirms the hypothesis that everything that is not a raven is not black as well as the hypothesis that everything that is not black is not a raven. We tend to ignore the former hypothesis because we know it to be false from abundant other evidence—from all the familiar things that are not ravens but are black. (Goodman 1955, 72)

The important lesson of the paradox of the ravens and the Hempel-Goodman resolution of it is that inductive inference is sensitive to background information and context. What looks to be a good induction when considered out of context and in isolation turns out not to be so when the context, including background knowledge, is taken into account. The inductive inference from

a is a white shoe

to

all ravens are black

is not so much unsound as it is uninteresting and uninformative.

Recent discussion of the paradox continues and improves on the Hempel-Goodman account by making explicit, and thus licit, the suppressed evidence. Further development, along generally Bayesian lines, generalizes the earlier approach by defining comparative and quantitative concepts of support capable of differentiating support for the two hypotheses in question. We return to the matter in discussing objective Bayesian approaches to induction below.

5.2 The grue paradox and the new riddle of induction

Suppose that at time t we have observed many emeralds to be green and no emeralds to be any other color. We thus have evidence statements

Emerald a is green, emerald b is green, etc.

and these statements support the generalization:

All emeralds are green.

Now define the predicate “grue” to apply to all things observed before time t just in case they are green and to other things just in case they are blue. Then we have also the evidence statements

Emerald a is grue, emerald b is grue, etc.

Hence the same observations support incompatible hypotheses about emeralds to be observed after t: that they are green and that they are blue.
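Goodman's definitions can be stated as predicates over emeralds. In this Python sketch the time parameter is simplified, by assumption, to a boolean recording whether an emerald was examined before t; the class and function names are hypothetical. The evidence satisfies both "green" and "grue", yet the two generalizations demand different colors of an emerald first examined after t. The sketch also shows the symmetry taken up in the remarks that follow: with grue and bleen as primitives, "green" is the predicate whose definition mentions t.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Emerald:
    color: str               # "green" or "blue"
    observed_before_t: bool  # whether it was examined before time t

def is_green(e: Emerald) -> bool:
    return e.color == "green"

def is_blue(e: Emerald) -> bool:
    return e.color == "blue"

def is_grue(e: Emerald) -> bool:
    # Green if observed before t; otherwise blue.
    return is_green(e) if e.observed_before_t else is_blue(e)

def is_bleen(e: Emerald) -> bool:
    # Blue if observed before t; otherwise green.
    return is_blue(e) if e.observed_before_t else is_green(e)

def green_from_grue_bleen(e: Emerald) -> bool:
    # With grue and bleen taken as primitive, "green" needs the time parameter.
    return is_grue(e) if e.observed_before_t else is_bleen(e)

def predicted_color(predicate) -> str:
    """Color an emerald first examined after t must have to satisfy the predicate."""
    for color in ("green", "blue"):
        if predicate(Emerald(color, observed_before_t=False)):
            return color
    raise ValueError("predicate fits neither color")

evidence = [Emerald("green", observed_before_t=True) for _ in range(5)]
print(all(is_green(e) and is_grue(e) for e in evidence))  # True: the evidence fits both
print(predicted_color(is_green))  # green
print(predicted_color(is_grue))   # blue
```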

A few cautionary remarks about this frequently misunderstood paradox:

That the definition of grue includes a time parameter is sometimes advanced as a criticism of the definition. But, as Goodman points out, were we to take “grue” and its obverse “bleen” (“blue up to t, green thereafter”) instead of “green” and “blue” as primitive terms, definitions of the latter, standard English, terms would include time parameters. The question here is whether inductive inference should be relative to the language in which it is formulated. Deductive inference is relative in this way as is Carnapian inductive logic.

The grue paradox raises and illustrates problems of a different nature from those raised by the paradox of the ravens: “All ravens are black,” whether true or false, is a clear example of a solid generalization. The questions raised by the ravens paradox concern the nature of its support and the role of context; the nature of the hypothesis itself is not called into question. The grue paradox, on the other hand, presents us with a quite different question: here is a generalization of appropriate form that is clearly not, and indeed apparently cannot be, supported by its instances. What is the difference between healthy generalizations, which, like “All ravens are black,” are supported by their instances, and grue-type generalizations that cannot be so supported? That is Goodman's new riddle of induction. (But see Mill's remark cited at the beginning of this article, where the riddle is anticipated.)

The old, or traditional, problem of induction was to justify induction: to show that induction, typically universal and singular predictive inference, leads always, or in an important proportion of cases, from true premises to true conclusions. This problem, says Goodman, is, as Hume demonstrated, insoluble, and efforts to solve it are at best a waste of time. We have been looking at the wrong problem; it is only a careless reading of Hume that prevented us from seeing this. Once we see the difficulty, more homely examples than the grue hypothesis are easy to come by:

Albert is in this room and safe from freezing.

supports

Everyone in this room is safe from freezing.

For the same reasons

Everyone in this room is safe from freezing.

supports the counterfactual

If Nanook of the North were in this room he would be safe from freezing.

But

Albert is in this room and is a third son.

does not support

Everyone in this room is a third son.

Nor does

Everyone in this room is a third son.

support the counterfactual

If my only son were in this room he would be a third son.

It is not the least of Goodman's accomplishments to have shown that three questions all issue from the same new riddle:

Goodman's own response to the new riddle was that those generalizations that are supported by their instances involve predicates that have a history of use in prediction. Such predicates Goodman called projectible.

5.3 Return of the ravens

The project of the Carnapian logic of confirmation was to put inductive reasoning on the sure path of a science; to give a unified and objective account of inductive inference including clear rules of procedure, in close analogy to deductive inference (see Carnap LFP, section 43). The Hempel-Goodman resolution of the ravens paradox, according to which reference to context may be essential to induction, threatens to undermine this enterprise before it properly begins: If Hempel and Goodman have it right, a is a black raven may confirm all ravens are black in one context and not in another.

This relativity produces yet another problem of induction: How can the objectivity of inductive inference be assured given that it depends upon context? Context dependence must in this regard be distinguished from non-monotonicity: Monotonicity and its contrary concern relations among inductive arguments or inferences; a sound inductive argument can be converted into an unsound argument by adding premises. Context dependence, on the other hand, means that one and the same argument may be sound in one context and not in another. Context dependence, but not non-monotonicity, entails relativity and loss of objectivity.

It is useful to distinguish discursive context, such as that there are many more black things than ravens, from non-discursive context. Hume gives us a famous and striking example of the latter:

[A] man, who being hung out from a high tower in a cage of iron cannot forbear trembling, when he surveys the precipice below him, tho he knows himself to be perfectly safe from falling. (THN 149)

Hume's view is that such contextual factors can be neutralized by general rules, which rules will ascribe “one inference to our judgment, and the other to our imagination” (THN 149). Something like this must be the right account.

As concerns discursive context, objective Bayesianism seeks to eliminate or ameliorate the relativity illustrated by the ravens paradox, first by supplementing the Nicod principle with a definition, or necessary and sufficient condition, of inductive support. This is done in terms of one of several appropriate objective probabilities, governed by normative principles. (Carnap's measures discussed in section 4.2 are examples of these.) Given such a probability, P, there are a number of alternative definitions of support. One widely accepted definition states that evidence E supports a hypothesis H if and only if E raises the probability of H (Carnap LFP, 464; Maher 1996; Fitelson and Hawthorne 2006, 10).

Support principle. Evidence E supports hypothesis H if and only if P(H | E) > P(H)
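The support principle can be applied to a toy model of the ravens case. Everything in this Python sketch is a hypothetical assumption chosen for illustration, not one of Carnap's or Maher's probabilities: one object is drawn at random from 100 things, 10 of them ravens and 50 non-black; under H all ravens are black, and under not-H exactly one raven is non-black; H has prior 1/2.

```python
from fractions import Fraction

# Hypothetical toy population: one object is drawn at random from 100 things,
# 10 of which are ravens and 50 of which are non-black.
N, RAVENS, NON_BLACK = 100, 10, 50
PRIOR_H = Fraction(1, 2)  # prior that H ("all ravens are black") is true

def likelihood(evidence: str, h_true: bool) -> Fraction:
    """P(E | H) or P(E | ¬H); under ¬H exactly one raven is non-black."""
    non_black_ravens = 0 if h_true else 1
    counts = {
        "black raven": RAVENS - non_black_ravens,
        "non-black raven": non_black_ravens,
        "non-black non-raven": NON_BLACK - non_black_ravens,
        "black non-raven": (N - NON_BLACK) - (RAVENS - non_black_ravens),
    }
    return Fraction(counts[evidence], N)

def posterior(evidence: str) -> Fraction:
    """P(H | E) by Bayes' theorem."""
    joint_h = PRIOR_H * likelihood(evidence, True)
    joint_not_h = (1 - PRIOR_H) * likelihood(evidence, False)
    return joint_h / (joint_h + joint_not_h)

def supports(evidence: str) -> bool:
    """Support principle: E supports H iff P(H | E) > P(H)."""
    return posterior(evidence) > PRIOR_H

for e in ("black raven", "non-black non-raven", "black non-raven", "non-black raven"):
    print(e, posterior(e), supports(e))
```

In this particular model both a black raven (posterior 10/19) and a non-black non-raven (posterior 50/99) raise the probability of H above 1/2, the latter only slightly, so both support H in the sense of the principle; a black non-raven (40/81) does not.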

We look briefly at two objective Bayesian approaches to the problem of relativity in induction.

5.3.1 Logical Bayesianism

Patrick Maher, in “Inductive Logic and the Ravens Paradox” (1999), argues that the conclusion of the Ravens paradox is in fact not paradoxical, and that its appearance as such is deceptive. Maher claims that for certain objective logical probabilities (defined by Carnap in the λ system of 1980) a is neither black nor a raven raises the probability of all ravens are black, thus supports the latter, and that there is hence no paradox. The probabilities in question, like the probability P* described in section 3.2 above, weight homogeneous structure descriptions more heavily.

Maher also accounts for our initial (and mistaken) rejection of the paradoxical conclusion

(PC) a is neither black nor a raven supports all ravens are black

by citing a false principle that is easily confused with PC. This is

Given that a is not a raven, a is not black supports all ravens are black

(See also Hempel 1945, 19.)

As Maher puts it, “the observation of non-ravens tells us nothing about the color of [unobserved] ravens.”

5.3.2 Moving context into content

Branden Fitelson and James Hawthorne (2006) counter the relativity entailed by context dependence by making conditional probability a three-place function, in which probability is conditioned on both evidence and context, the latter also including background knowledge.

They write

$P_K(H, E)$

to indicate relativity to background knowledge, including perhaps knowledge of (discursive) context. This permits the distinction of support relative to (i) some background knowledge, (ii) our actual, present background knowledge, (iii) tautological or necessary background knowledge and (iv) any background knowledge whatever.

In the signal example of the ravens paradox, background knowledge including a few remarkably weak modal assumptions (such as “Finding a to be a black raven neither absolutely proves nor absolutely falsifies ‘All ravens are black.’”) entails that a non-black non-raven supports that all ravens are black.
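Relativity to background knowledge can be illustrated with a three-place function in the spirit of $P_K(H, E)$. This Python sketch is not Fitelson and Hawthorne's formal apparatus. By assumption here, a background K is a list of candidate worlds, each with a prior, a flag saying whether H ("all ravens are black") holds there, and counts of kinds of objects; E is the kind of object obtained by one random draw. The backgrounds K1 and K2 and all their numbers are hypothetical.

```python
from fractions import Fraction

# A background K is a list of candidate worlds: (prior, H true?, object counts).
# E is the kind of object obtained by drawing one object at random.
def p_k(background, evidence):
    """P_K(H, E): probability of H given E relative to background K."""
    joint = [(prior * Fraction(counts.get(evidence, 0), sum(counts.values())), h)
             for prior, h, counts in background]
    total = sum(weight for weight, _ in joint)
    return sum(weight for weight, h in joint if h) / total

def prior_k(background):
    """P_K(H) before the evidence."""
    return sum(prior for prior, h, _ in background if h)

half = Fraction(1, 2)

# Hypothetical K1: a small population with only a few ravens.
K1 = [
    (half, True,  {"black raven": 10, "non-black non-raven": 50, "black non-raven": 40}),
    (half, False, {"black raven": 9, "non-black raven": 1,
                   "non-black non-raven": 49, "black non-raven": 41}),
]

# Hypothetical K2: if any raven is non-black, ravens are far more numerous.
K2 = [
    (half, True,  {"black raven": 100, "other": 1_000_000}),
    (half, False, {"black raven": 1_000, "non-black raven": 1, "other": 1_000_000}),
]

for name, k in (("K1", K1), ("K2", K2)):
    print(name, float(prior_k(k)), float(p_k(k, "black raven")))
```

The same evidence, a black raven, raises the probability of H relative to K1 and lowers it relative to K2: whether one and the same observation supports the hypothesis depends on the background against which it is assessed.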

6. Knowledge, values and evaluation

6.1 Pragmatism: induction as practical reason

In 1953 Richard Rudner published “The Scientist qua Scientist Makes Value Judgments,” in which he argued for the thesis expressed in its title. Rudner's argument was simple:

[S]ince no hypothesis is ever completely verified, in accepting a hypothesis the scientist must make the decision that the evidence is sufficiently strong or that the probability is sufficiently high to warrant the acceptance of the hypothesis. (Rudner 1953, 2)

Sufficiency in such a decision will and should depend upon the importance of getting it right or wrong.

The argument has a precise point in the case of tests or experiments with known error probability (the probability of rejecting a true hypothesis or of accepting a false hypothesis) but it applies quite generally: Tests of hypotheses about drug toxicity may and should have less chance of going wrong than those about the quality of a “lot of machine-stamped belt buckles”. The argument is not restricted to scientific inductions; it shows as well that our everyday inferences depend inevitably upon value judgments; how much evidence one collects depends upon the importance of the consequences of the decision.
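Rudner's point can be given a standard decision-theoretic form; this is a reconstruction for illustration, not Rudner's own formulation. Suppose accepting a false hypothesis costs c1, rejecting a true one costs c2, and correct decisions cost nothing. Accepting H then has lower expected loss than rejecting it exactly when (1 − p)·c1 < p·c2, that is, when P(H) = p exceeds c1/(c1 + c2). In the Python sketch below the losses and the probability 0.95 are hypothetical.

```python
from fractions import Fraction

def acceptance_threshold(loss_accept_false, loss_reject_true) -> Fraction:
    """Probability of H above which accepting H has lower expected loss than
    rejecting it, assuming correct decisions cost nothing:
    accept iff (1 - p) * loss_accept_false < p * loss_reject_true."""
    return Fraction(loss_accept_false, loss_accept_false + loss_reject_true)

def accept(p_h, loss_accept_false, loss_reject_true) -> bool:
    return p_h > acceptance_threshold(loss_accept_false, loss_reject_true)

p = Fraction(95, 100)  # the same hypothetical strength of evidence in both cases

# Belt buckles: both errors assumed equally costly, so the threshold is 1/2.
print(accept(p, 1, 1))   # True
# "The drug is not toxic": wrongly accepting assumed 99 times as costly.
print(accept(p, 99, 1))  # False (threshold 99/100)
```

The evidence is held fixed; only the assumed costs of error change, and with them the verdict.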

Isaac Levi, in responding to Rudner's claim, and to later formulations of it, distinguished cognitive values from other sorts of values: moral, aesthetic, and so on (Levi 1986, 43–46). Of course the scientist qua scientist, that is to say in his scientific activity, makes judgments and commitments of cognitive value, but he need not, and in many instances should not, allow other sorts of values (fame, riches) to weigh upon his scientific inductions. (See also Jeffrey 1956 for a related response to Rudner's argument.)

What is in question is the separation of practical reason from theoretical reason. Rudner denies the distinction; Levi does too, but distinguishes practical reason with cognitive ends from other sorts. Recent pragmatic accounts of inductive reasoning are even more radical. Following Ramsey (1931a) and Savage (1954), they subsume inductive reasoning under practical reason: reason that aims at and ends in action. These and their successors, such as Jeffrey (1983), define partial belief on the basis of preferences: preferences among possible worlds for Ramsey, among acts for Savage, and among propositions for Jeffrey. Preferences are in each case highly structured. In all cases beliefs as such are theoretical entities, implicitly defined by more elaborate versions of the pragmatic principle that agents (or reasonable agents) act (or should act) in ways they believe will satisfy their desires: If we observe the actions and know the desires (preferences) we can then interpolate the beliefs. In any given case the actions and desires will fit distinct, even radically distinct, beliefs, but knowing more desires and observing more actions should, by clever design, let us narrow the candidates.

In all these theories the problem of induction is a problem of decision, in which the question is which action to take, or which wager to accept. The pragmatic principle is given a precise formulation in the injunction to act so as to maximize expected utility, to perform that action, $A_i$, among the possible alternatives, that maximizes

$$U(A_i) = \sum_j P(S_j \mid A_i)\, U(S_j \wedge A_i)$$

where the $S_j$ are the possible consequences of the acts $A_i$, and $U$ gives the utility of its argument.
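The expected-utility rule translates directly into code. In this Python sketch the wager (win 4 if H, lose 2 otherwise) and the function names are hypothetical, and the probability of each state is assumed not to depend on the act, although the `prob(state, act)` signature leaves room for that dependence as in $P(S_j \mid A_i)$.

```python
from fractions import Fraction

def expected_utility(act, states, prob, utility):
    """U(A_i) = sum over j of P(S_j | A_i) * U(S_j ∧ A_i)."""
    return sum(prob(s, act) * utility(s, act) for s in states)

def best_act(acts, states, prob, utility):
    """The pragmatic principle: choose the act of maximal expected utility."""
    return max(acts, key=lambda a: expected_utility(a, states, prob, utility))

# Hypothetical wager on a hypothesis H: the bet wins 4 if H, loses 2 otherwise.
STATES = ("H", "not H")
ACTS = ("bet", "decline")

def make_prob(p_h):
    # Here the acts are assumed not to influence whether H holds.
    return lambda s, act: p_h if s == "H" else 1 - p_h

def utility(s, act):
    if act == "decline":
        return 0
    return 4 if s == "H" else -2

for p_h in (Fraction(3, 10), Fraction(1, 2)):
    print(p_h, best_act(ACTS, STATES, make_prob(p_h), utility))
```

With these payoffs the bet has positive expected utility just when P(H) > 1/3, so the agent declines at 3/10 and bets at 1/2.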

6.2 On the value of evidence

One significant advantage of treating induction as a matter of utility maximization is that the cost of gathering more information, of adding to the evidence for an inductive inference, can be factored into the decision. Put very roughly, the leading idea is to look at gathering evidence as an action on its own. Suppose that you are facing a decision among acts $A_i$, and that you are concerned only about the occurrence or non-occurrence of a consequence $S$. The principle of utility maximization directs you to choose that act $A_i$ that maximizes

$$U(A_i) = P(S \mid A_i)\, U(S \wedge A_i) + P(\neg S \mid A_i)\, U(\neg S \wedge A_i)$$

Suppose further that you have the possibility of investigating to see if evidence $E$, for or against $S$, obtains. Assume further that this investigation is cost-free. Then should you investigate and find $E$ to be true, utility maximization would direct you to choose that act $A_i$ that maximizes utility when your beliefs are conditioned on $E$:

$$U_E(A_i) = P(S \mid E \wedge A_i)\, U(S \wedge E \wedge A_i) + P(\neg S \mid E \wedge A_i)\, U(\neg S \wedge E \wedge A_i)$$

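And if you investigate and find $E$ to be false, the same principle directs you to choose $A_i$ to maximize utility when your beliefs are conditioned on $\neg E$:

$$U_{\neg E}(A_i) = P(S \mid \neg E \wedge A_i)\, U(S \wedge \neg E \wedge A_i) + P(\neg S \mid \neg E \wedge A_i)\, U(\neg S \wedge \neg E \wedge A_i)$$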

Hence if your prior strength of belief in the evidence $E$ is $P(E)$, the expected utility of investigating, given that you will then choose the best act in light of what you find, is the weighted average

$$P(E)\, \max_i U_E(A_i) + P(\neg E)\, \max_i U_{\neg E}(A_i)$$

and if this weighted average exceeds the maximum of $U(A_i)$ then you should investigate. About this, several brief remarks:

Notice that the utility of investigation depends upon your beliefs about your future beliefs and desires, namely that you believe now that following the investigation you will maximize utility and update your beliefs.

Investigation in the actual world is normally not cost-free. It may take time, trouble and money, and is sometimes dangerous. A general theory of epistemic utility should consider these factors.

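I. J. Good (1967) proved that in the cost-free case the maximum of $U(A_i)$ can never exceed the expected utility of investigating first and then choosing, and that it falls strictly short of it whenever there is a positive probability that what the investigation reveals would lead you to a different choice (Skyrms 1990, chapter 4).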

The question of bad evidence is critical. The evidence gathered might take you further from the truth. (Think of drawing a succession of red balls from an urn containing predominantly blacks.)
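The weighted-average calculation and Good's result can be checked on a small example. This Python sketch assumes, for simplicity, that the acts do not affect the probabilities of S or E and that utilities depend only on the act and on whether S holds; the acts, payoffs, and likelihoods are hypothetical.

```python
from fractions import Fraction as F

ACTS = ("bet_on_S", "abstain")

def utility(s_holds: bool, act: str) -> int:
    # Hypothetical payoffs, depending only on the act and on whether S holds.
    if act == "bet_on_S":
        return 10 if s_holds else -10
    return 0

def eu(act: str, p_s: F) -> F:
    """Expected utility of an act when the probability of S is p_s."""
    return p_s * utility(True, act) + (1 - p_s) * utility(False, act)

def value_now(p_s: F) -> F:
    """max_i U(A_i): choose now, without investigating."""
    return max(eu(a, p_s) for a in ACTS)

def value_of_investigating(p_s: F, p_e_given_s: F, p_e_given_not_s: F) -> F:
    """P(E) max_i U_E(A_i) + P(¬E) max_i U_¬E(A_i) for a cost-free look at E."""
    p_e = p_s * p_e_given_s + (1 - p_s) * p_e_given_not_s
    total = F(0)
    for e_obtains, p_outcome in ((True, p_e), (False, 1 - p_e)):
        if p_outcome == 0:
            continue  # an outcome that cannot occur contributes nothing
        like_s = p_e_given_s if e_obtains else 1 - p_e_given_s
        p_s_updated = p_s * like_s / p_outcome  # Bayes: P(S | E) or P(S | ¬E)
        total += p_outcome * value_now(p_s_updated)
    return total

print(value_now(F(1, 2)))                                 # 0
print(value_of_investigating(F(1, 2), F(4, 5), F(1, 5)))  # 3
```

Taking the maximum inside each branch is what gives investigation its value: for the single act bet_on_S, P(E)·U_E + P(¬E)·U_¬E = ½·6 + ½·(−6) = 0, which is just U(bet_on_S) again. Good's inequality concerns expectations only; after a particular misleading outcome the act chosen can still turn out worse, which is the worry about bad evidence raised above.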




Related Entries

actualism | Bayes' Theorem | Carnap, Rudolf | conditionals | confirmation | epistemology: Bayesian | epistemology: evolutionary | epistemology: naturalized | epistemology: social | fictionalism: modal | Frege, Gottlob: logic, theorem, and foundations for arithmetic | Goodman, Nelson | Hempel, Carl | Hume, David | induction: new problem of | logic: inductive | logic: non-monotonic | memory | Mill, John Stuart | perception: epistemological problems of | Popper, Karl | probability, interpretations of | Ramsey, Frank | Reichenbach, Hans | testimony: epistemological problems of | Vienna Circle


The author would like to thank Patrick Maher for helpful comments, and the editors would like to thank Wolfgang Swarz for his suggestions for improvement.