This is a file in the archives of the Stanford Encyclopedia of Philosophy. 

Suggested Readings: Hume (1748), especially section VII.
The problem of imperfect regularities does not tell decisively against the regularity approach to causation. Successors of Hume, especially John Stuart Mill and John Mackie, have attempted to offer more refined accounts of the regularities that underwrite causal relations. Mackie introduced the notion of an inus condition: an inus condition for some effect is an insufficient but nonredundant part of an unnecessary but sufficient condition. Suppose, for example, that a lit match causes a forest fire. The lighting of the match, by itself, is not sufficient; many matches are lit without ensuing forest fires. The lit match is, however, a part of some constellation of conditions that are jointly sufficient for the fire. Moreover, given that this set of conditions occurred, rather than some other set sufficient for fire, the lighting of the match was necessary: fires do not occur in such circumstances when lit matches are not present.
There are, however, disadvantages to this type of approach. The regularities upon which a causal claim rests now turn out to be much more complicated than we had previously realized. In particular, this complexity raises problems for the epistemology of causation. One appeal of Hume's regularity theory is that it seems to provide a straightforward account of how we come to know what causes what: we learn that A causes B by observing that As are invariably followed by Bs. Consider again the case of smoking and lung cancer: on the basis of what evidence do we believe that the one is a cause of the other? It is not that all smokers develop lung cancer, for we do not observe this to be true. But neither have we observed some constellation of conditions C, such that smoking is invariably followed by lung cancer in the presence of C, while lung cancer never occurs in nonsmokers meeting condition C. Rather, what we observe is that smokers develop lung cancer at much higher rates than nonsmokers; this is the prima facie evidence that leads us to think that smoking causes lung cancer. This fits very nicely with the probabilistic approach to causation.
As we shall see in Section 3.2 below, however, the basic idea that causes raise the probability of their effects has to be qualified in a number of ways. By the time these qualifications are added, it appears that probabilistic theories of causation have to make a move that is quite analogous to Mackie's appeal to constellations of background conditions. Thus it is not clear that the problem of imperfect regularities, by itself, offers any real reason to prefer probabilistic approaches to causation over regularity approaches.
Suggested Readings: Refined versions of the regularity analysis are found in Mill (1843), Volume I, chapter V, and in Mackie (1974), chapter 3. The introduction of Suppes (1970) presses the problem of imperfect regularities.
Many philosophers find the idea of indeterministic causation counterintuitive. Indeed, the word “causality” is sometimes used as a synonym for determinism. A strong case for indeterministic causation can be made by considering the epistemic warrant for causal claims. There is now very strong empirical evidence that smoking causes lung cancer. Yet the question of whether there is a deterministic relationship between smoking and lung cancer is wide open. The formation of cancer cells depends upon mutation, which is a strong candidate for being an indeterministic process. Moreover, whether an individual smoker develops lung cancer or not depends upon a host of additional factors, such as whether or not she is hit by a bus before cancer cells begin to form. Thus the price of preserving the intuition that causation presupposes determinism is agnosticism about even our best supported causal claims.
Since probabilistic theories of causation require only that a cause raise the probability of its effect, these theories are compatible with indeterminism. This seems to be a potential advantage over regularity theories. It is unclear, however, to what extent this potential advantage is actual. In the realm of microphysics, where we have strong (but still contestable) evidence of indeterminism, our ordinary causal notions do not easily apply. This is brought out especially clearly in the famous Einstein, Podolsky and Rosen thought experiment. On the other hand, it is unclear to what extent quantum indeterminism ‘percolates up’ to the macroworld of smokers and cancer victims, where we do seem to have some clear causal intuitions.
Suggested Readings: Humphreys (1989) contains a sensitive treatment of issues involving indeterminism and causation; see especially sections 10 and 11. Earman (1986) is a thorough treatment of issues of determinism in physics.
Some proponents of probabilistic theories of causation follow Hume in identifying causal direction with temporal direction. Others have attempted to use the resources of probability theory to articulate a substantive account of the asymmetry of causation, with mixed success. We will discuss these proposals at greater length in Section 3.3 below.
Suggested Readings: Hausman (1998) contains a detailed discussion of issues involving the asymmetry of causation. Mackie (1974), chapter 3, shows how the problem of asymmetry can arise for his inus condition theory. Lewis (1986) contains a very brief but clear statement of the problem of asymmetry.
Figure 1
The ability to handle such spurious correlations is probably the greatest success of probabilistic theories of causation, and remains a major source of attraction for such theories. We will discuss this issue in greater detail in Section 3.2 below.
Suggested Readings: Mackie (1974), chapter 3, shows how the problem of spurious regularities can arise for his inus condition theory. Lewis (1986) contains a very brief but clear statement of the problem of spurious regularities.
Suggested Readings: Mill (1843) contains the classic discussion of “the cause” and “a cause.” Bennett (1988) is an excellent discussion of facts and events.
P(B | A) = P(A & B)/P(A).
As an illustration, suppose that we toss a fair die. Let A represent the die's landing with an even number (2, 4 or 6) showing on the topmost face. Then P(A) is one-half. Let B represent the die's landing with a prime number (2, 3 or 5) showing on the topmost face (on that same roll). Then P(B) is also one-half. Now the conditional probability P(B | A) is one-third. It is the probability that the number on the die is both even and prime, i.e., that the number is 2, divided by the probability that the number is even. The numerator is one-sixth, and the denominator is one-half; hence that conditional probability is one-third. The concept of conditional probability does not have any notion of temporal or causal order built into it. Suppose, for example, that the die is rolled twice. It makes sense to ask about the probability that the first roll is a prime number, given that the first roll is even; the probability that the second roll is a prime number, given that the first roll is even; and the probability that the first roll is a prime number, given that the second roll is even.
If P(A) is 0, then the ratio in the definition of conditional probability is undefined. There are, however, other technical developments that will allow us to define P(B | A) when P(A) is 0. The simplest is to take conditional probability as a primitive, and to define unconditional probability as probability conditional on a tautology.
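The arithmetic of the die example can be checked by brute enumeration. The following Python sketch is offered only as an illustration; the names prob, cond_prob, is_even and is_prime are our own, and the ratio definition is used as stated, so the function simply refuses to answer when P(A) is 0.

```python
from fractions import Fraction

# Sample space for one roll of a fair die; each outcome has probability 1/6.
OUTCOMES = range(1, 7)

def prob(event):
    """Probability of an event, given as a predicate on a single outcome."""
    favourable = sum(1 for o in OUTCOMES if event(o))
    return Fraction(favourable, len(OUTCOMES))

def cond_prob(b, a):
    """P(B | A) = P(A & B) / P(A); the ratio is undefined when P(A) is 0."""
    p_a = prob(a)
    if p_a == 0:
        raise ZeroDivisionError("P(A) is 0, so the ratio is undefined")
    return prob(lambda o: a(o) and b(o)) / p_a

def is_even(o):
    return o % 2 == 0

def is_prime(o):
    return o in (2, 3, 5)

print(prob(is_even))                 # 1/2
print(prob(is_prime))                # 1/2
print(cond_prob(is_prime, is_even))  # 1/3
```

Exact fractions are used so that the results can be compared with the values given in the text without rounding.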
One natural way of understanding the idea that A raises the probability of B is that P(B | A) > P(B | not-A). Thus a first attempt at a probabilistic theory of causation would be:
PR: A causes B if and only if P(B | A) > P(B | not-A).
This formulation is labeled PR for “Probability-Raising.” When P(A) is strictly between 0 and 1, the inequality in PR turns out to be equivalent to P(B | A) > P(B) and also to P(A & B) > P(A)P(B). When this last relation holds, A and B are said to be positively correlated. If the inequality is reversed, they are negatively correlated. If A and B are either positively or negatively correlated, they are said to be probabilistically dependent. If equality holds, then A and B are probabilistically independent or uncorrelated.
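The equivalence of the three ways of stating probability-raising, namely P(B | A) > P(B | not-A), P(B | A) > P(B), and P(A & B) > P(A)P(B), can be spot-checked numerically. The sketch below is an illustration rather than a proof: it runs through every joint distribution over A and B whose four cells are multiples of one-tenth, keeps those with P(A) strictly between 0 and 1, and confirms that the three inequalities never disagree. All function names and the sample distribution are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

def marginals(joint):
    """Return P(A), P(B), P(A & B) from a joint over (a, b) truth values."""
    p_a = joint[(True, True)] + joint[(True, False)]
    p_b = joint[(True, True)] + joint[(False, True)]
    return p_a, p_b, joint[(True, True)]

def three_forms(joint):
    """Truth values of the three inequalities; requires 0 < P(A) < 1."""
    p_a, p_b, p_ab = marginals(joint)
    b_given_a = p_ab / p_a
    b_given_not_a = joint[(False, True)] / (1 - p_a)
    return (b_given_a > b_given_not_a, b_given_a > p_b, p_ab > p_a * p_b)

def correlation(joint):
    """Classify A and B as positively correlated, negatively correlated, or independent."""
    p_a, p_b, p_ab = marginals(joint)
    if p_ab > p_a * p_b:
        return "positively correlated"
    if p_ab < p_a * p_b:
        return "negatively correlated"
    return "independent"

def grid(n=10):
    """Every joint distribution whose four cells are multiples of 1/n."""
    for tt, tf, ft in product(range(n + 1), repeat=3):
        ff = n - tt - tf - ft
        if ff >= 0:
            yield {(True, True): F(tt, n), (True, False): F(tf, n),
                   (False, True): F(ft, n), (False, False): F(ff, n)}

# On this grid the three inequalities always agree when 0 < P(A) < 1.
agree = all(
    len(set(three_forms(j))) == 1
    for j in grid()
    if 0 < marginals(j)[0] < 1
)

# A hypothetical distribution with P(A) = 1/2, P(B) = 2/5, P(A & B) = 3/10.
example = {(True, True): F(3, 10), (True, False): F(2, 10),
           (False, True): F(1, 10), (False, False): F(4, 10)}
print(agree, three_forms(example), correlation(example))
```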
PR addresses the problems of imperfect regularities and indeterminism, discussed above. But it does not address the other two problems discussed in section 1 above. First, probability-raising is symmetric: if P(B | A) > P(B | not-A), then P(A | B) > P(A | not-B). The causal relation, however, is typically asymmetric.
Figure 2
Second, PR has trouble with spurious correlations. If A and B are both caused by some third factor, C, then it may be that P(B | A) > P(B | not-A) even though A does not cause B. This situation is shown schematically in Figure 2. For example, let A be an individual's having yellow-stained fingers, and B that individual's having lung cancer. Then we would expect that P(B | A) > P(B | not-A). The reason that those with yellow-stained fingers are more likely to suffer from lung cancer is that smoking tends to produce both effects. Because individuals with yellow-stained fingers are more likely to be smokers, they are also more likely to suffer from lung cancer. Intuitively, the way to address this problem is to require that causes raise the probabilities of their effects ceteris paribus. The history of probabilistic causation is to a large extent a history of attempts to resolve these two central problems.
Suggested Readings: For a primer on basic probability theory, see the entry for “probability calculus: interpretations of.” This entry also contains a discussion of the interpretation of probability claims.
NSO: Factor A, occurring at time t, is a cause of the later factor B if and only if:
1. P(B | A) > P(B | not-A)
2. There is no factor C, occurring earlier than or simultaneously with A, that screens A off from B.
We will call this the NSO, or ‘No Screening Off’ formulation. Suppose, as in our example above, that smoking (C) causes both yellow-stained fingers (A) and lung cancer (B). Then smoking will screen yellow-stained fingers off from lung cancer: given that an individual smokes, his yellow-stained fingers have no impact upon his probability of developing lung cancer.
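The smoking example can be made concrete with a toy model. The sketch below assumes that C screens A off from B just in case P(B | A & C) = P(B | C), which is the usual reading of the phrase. Every number in it is hypothetical and chosen only for illustration, and A and B are built to be independent once C is fixed. It shows the first NSO condition being satisfied by the stained fingers while the second condition fails.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers: C = smoking, A = yellow-stained fingers, B = lung cancer.
P_C = F(3, 10)
P_A_GIVEN = {True: F(4, 5), False: F(1, 10)}   # P(A | C), P(A | not-C)
P_B_GIVEN = {True: F(1, 5), False: F(1, 50)}   # P(B | C), P(B | not-C)

def weight(p, value):
    """Probability of a factor being present (value True) or absent."""
    return p if value else 1 - p

# Joint distribution over (c, a, b); A and B depend on C only.
joint = {
    (c, a, b): weight(P_C, c) * weight(P_A_GIVEN[c], a) * weight(P_B_GIVEN[c], b)
    for c, a, b in product([True, False], repeat=3)
}

def prob(event):
    return sum(p for world, p in joint.items() if event(*world))

def cond(event, given):
    """P(event | given), by the ratio definition."""
    return prob(lambda c, a, b: event(c, a, b) and given(c, a, b)) / prob(given)

def cancer(c, a, b):
    return b

# Condition 1 of NSO holds: stained fingers raise the probability of cancer.
p_b_given_a = cond(cancer, lambda c, a, b: a)
p_b_given_not_a = cond(cancer, lambda c, a, b: not a)

# Condition 2 fails: smoking screens the fingers off from cancer.
p_b_given_a_and_c = cond(cancer, lambda c, a, b: a and c)
p_b_given_c = cond(cancer, lambda c, a, b: c)

print(p_b_given_a > p_b_given_not_a)      # True
print(p_b_given_a_and_c == p_b_given_c)   # True
```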
The second condition of NSO does not suffice to resolve the problem of spurious correlations, however. This condition was added to eliminate cases where spurious correlations give rise to factors that raise the probability of other factors without causing them. Spurious correlations can also give rise to cases where a cause does not raise the probability of its effect. So genuine causes need not satisfy the first condition of NSO. Suppose, for example, that smoking is highly correlated with exercise: those who smoke are much more likely to exercise as well. Smoking is a cause of heart disease, but suppose that exercise is an even stronger preventative of heart disease. Then it may be that smokers are, overall, less likely to suffer from heart disease than nonsmokers. That is, letting A represent smoking, C exercise, and B heart disease, P(B | A) < P(B | not-A). Note, however, that if we conditionalize on whether one exercises or not, this inequality is reversed: P(B | A & C) > P(B | not-A & C), and P(B | A & not-C) > P(B | not-A & not-C). Such reversals of probabilistic inequalities are instances of “Simpson's Paradox.”
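A small numerical model makes the reversal easy to see. In the sketch below all probabilities are hypothetical: smokers are assumed to exercise far more often than nonsmokers, and smoking is assumed to raise the risk of heart disease both among exercisers and among non-exercisers. The overall risk for each group is then obtained by averaging over exercise.

```python
from fractions import Fraction as F

# Hypothetical numbers: A = smoking, C = exercise, B = heart disease.
P_C_GIVEN_A = {True: F(9, 10), False: F(1, 10)}   # P(C | A), P(C | not-A)
P_B_GIVEN = {                                     # P(B | A-value, C-value)
    (True, True): F(1, 10), (False, True): F(1, 20),
    (True, False): F(1, 2), (False, False): F(2, 5),
}

def p_b_given_a(a):
    """P(B | A = a), averaging over exercise with weights P(C | A = a)."""
    p_c = P_C_GIVEN_A[a]
    return p_c * P_B_GIVEN[(a, True)] + (1 - p_c) * P_B_GIVEN[(a, False)]

# Within each exercise group, smoking raises the risk of heart disease.
raises_within_groups = all(
    P_B_GIVEN[(True, c)] > P_B_GIVEN[(False, c)] for c in (True, False)
)

print(raises_within_groups)                    # True
print(p_b_given_a(True), p_b_given_a(False))   # 7/50 73/200
```

With these numbers smokers have an overall risk of 7/50, against 73/200 for nonsmokers, even though smoking raises the risk in both subgroups.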
The next step is to replace conditions 1 and 2 with the requirement that causes must raise the probability of their effects in test situations:
TS: A causes B if P(B | A & T) > P(B | not-A & T) for every test situation T.
A test situation is a conjunction of factors. When such a conjunction of factors is conditioned on, those factors are said to be “held fixed.” To specify what the test situations will be, then, we must specify what factors are to be held fixed. In the previous example, we saw that the true causal relevance of smoking for heart disease was revealed when we held exercise fixed, either positively (conditioning on C) or negatively (conditioning on not-C). This suggests that in evaluating the causal relevance of A for B, we need to hold fixed other causes of B, either positively or negatively. This suggestion is not entirely correct, however. Let A and B be smoking and lung cancer, respectively. Suppose C is a causal intermediary, say the presence of tar in the lungs. If A causes B exclusively via C, then C will screen A off from B: given the presence (absence) of carcinogens in the lungs, the probability of lung cancer is not affected by whether those carcinogens got there by smoking (are absent despite smoking). Thus we will not want to hold fixed any causes of B that are themselves caused by A. Let us call the set of all factors that are causes of B, but are not caused by A, the set of independent causes of B. A test situation for A and B will then be a maximal conjunction, each of whose conjuncts is either an independent cause of B, or the negation of an independent cause of B.
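The construction of test situations is mechanical once the independent causes of B are given. The sketch below enumerates the maximal conjunctions and checks the TS inequality in each one. The names enumerate_test_situations and satisfies_ts are hypothetical, as is the probability table. The function that supplies P(B | A & T) is assumed to be provided by the user, since the theory itself does not say where those probabilities come from.

```python
from itertools import product

def enumerate_test_situations(independent_causes):
    """Yield every maximal conjunction: each independent cause or its negation."""
    for values in product([True, False], repeat=len(independent_causes)):
        yield dict(zip(independent_causes, values))

def satisfies_ts(p_b_given, independent_causes):
    """True if P(B | A & T) > P(B | not-A & T) for every test situation T.

    p_b_given(a, t) must return P(B | A = a & T = t).
    """
    return all(
        p_b_given(True, t) > p_b_given(False, t)
        for t in enumerate_test_situations(independent_causes)
    )

# Hypothetical numbers for smoking (A), exercise (C) and heart disease (B).
TABLE = {
    (True, True): 0.10, (False, True): 0.05,
    (True, False): 0.50, (False, False): 0.40,
}

def p_heart_disease(a, t):
    return TABLE[(a, t["exercise"])]

print(list(enumerate_test_situations(["exercise"])))
# [{'exercise': True}, {'exercise': False}]
print(satisfies_ts(p_heart_disease, ["exercise"]))   # True
```

With n independent causes there are 2^n test situations, which gives some sense of how demanding the requirement in TS is.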
Note that the specification of factors that need to be held fixed appeals to causal relations. This appears to rob the theory of its status as a reductive analysis of causation. We will see in Section 6.4 below, however, that the issue is substantially more complex than that. In any event, even if there is no reduction of causation to probability, a theory detailing the systematic connections between causation and probability would be of great philosophical interest.
The move from the basic idea of PR to the complex formulation of TS is rather like the move from Hume's original regularity theory to Mackie's theory of inus conditions. In both cases, the move substantially complicates the epistemology of causation. In order to know whether A is a cause of B, we need to know what happens in the presence and absence of A, while holding fixed a complicated conjunction of further factors. The hope that a probabilistic theory of causation would enable us to handle the problem of imperfect regularities without appealing to such constellations of background conditions seems not to have been borne out. Nonetheless, TS does seem to provide us with a theory that is compatible with indeterminism and that can distinguish causation from spurious correlation.
TS can be generalized in at least two important ways. First, we can define a ‘negative cause’ or ‘preventer’ or ‘inhibitor’ as a factor that lowers the probability of its ‘effect’ in all test situations, and a ‘mixed’ or ‘interacting’ cause as one that affects the probability of its ‘effect’ in different ways in different test situations. It should be apparent that when constructing test situations for A and B one should also hold fixed preventers and mixed causes of B that are independent of A. Generalizing even further, one could define causal relationships between variables that are nonbinary, such as caloric intake and blood pressure. In evaluating the causal relevance of X for Y, we will need to hold fixed the values of variables that are independently causally relevant to Y. In principle, there are infinitely many ways in which one variable might depend probabilistically on another, even holding fixed some particular test situation. Thus, once the theory is generalized to include nonbinary variables, it will not be possible to provide any neat classification of causal factors into causes and preventers.
These two generalizations bring out an important distinction. It is one thing to ask whether A is causally relevant to B in some way; it is another to ask in which way A is causally relevant to B. To say that A causes B is then potentially ambiguous: it might mean that A is causally relevant to B in some way or other; or it could mean that A is causally relevant for B in a particular way, that A promotes B or is a positive factor for the occurrence of B. For example, if A prevents B, then A will count as a cause of B in the first sense, but not in the second. Probabilistic theories of causation can be used to answer both types of question. A is causally relevant to B if A makes some difference for the probability of B in some test situation; whereas A is a positive or promoting cause of B if A raises the probability of B in all test situations.
The problem of spurious correlations also plagues certain versions of decision theory. This can happen when one's choice of action is symptomatic of certain good or bad outcomes, without causing those outcomes. (The best-known example of this sort is Newcomb's Problem.) In cases like this, some versions of decision theory appear to recommend that one act so as to receive good news about events beyond one's control, rather than act so as to bring about desirable events that are within one's control. In response, many decision theorists have advocated versions of causal decision theory. Some versions closely resemble TS.
Suggested Readings: This section more or less follows the main developments in the history of probabilistic theories of causation. Versions of the NSO theory are found in Reichenbach (1956, section 23), and Suppes (1970, chapter 2). Good (1961, 1962) is an early essay on probabilistic causation that is rich in insights, but has had surprisingly little influence on the formulation of later theories. Salmon (1980) is an influential critique of these theories. The first versions of TS were presented in Cartwright (1979) and Skyrms (1980). Eells (1991, chapters 2, 3, and 4) and Hitchcock (1993) carry out the two generalizations of TS described. Skyrms (1980) presents a version of causal decision theory that is very similar to TS. See also the entry for “decision theory: causal.”
Some defenders of manipulability or agency theories of causation have argued that the necessary asymmetry is provided by our perspective as agents. In assessing whether A is a cause of B, we must ask whether A increases the probability of B, where the relevant conditional probabilities are agent probabilities: the probabilities that B would have were A (or notA) to be realized by the choice of a free agent. Critics have wondered just what these agent probabilities are.
Other approaches attempt to locate the asymmetry between cause and effect within the structure of the probabilities themselves. One very simple proposal would be to refine the way in which the test situations are constructed. (See the previous section for discussion of test situations.) In evaluating whether A is a cause of B, we should hold fixed not only the independent causes of B, but also the causes of A. Thus if B is a cause of A, rather than vice versa, A will not raise the probability of B in the appropriate test situation, since the presence or absence of B will already be held fixed. This idea is built into the Causal Markov Condition discussed in Section 5 below. Proponents of traditional probabilistic theories of causation have not adopted this strategy. This may be because they feel that this refinement would take the theory too close to vicious circularity: in order to assess whether A causes B, we would need to know already whether B causes A.
A more ambitious approach to the problem of causal asymmetry is due to Hans Reichenbach. Suppose that factors A and B are positively correlated:
1. P(A & B) > P(A)P(B)
It is easy to see that this will hold exactly when A raises the probability of B and vice versa. Suppose, moreover, that there is some factor C having the following properties:
2. P(A & B | C) = P(A | C)P(B | C)
3. P(A & B | not-C) = P(A | not-C)P(B | not-C)
4. P(A | C) > P(A | not-C)
5. P(B | C) > P(B | not-C).
In this case, the trio ACB is said to form a conjunctive fork. Conditions 2 and 3 stipulate that C and not-C screen off A from B. As we have seen, this sometimes occurs when C is a common cause of A and B. Conditions 2 through 5 entail 1, so in some sense C explains the correlation between A and B. If C occurs earlier than A and B, and there is no event satisfying 2 through 5 that occurs later than A and B, then ACB is said to form a conjunctive fork open to the future. Analogously, if there is a future factor satisfying 2 through 5, but no past factor, we have a conjunctive fork open to the past. If a past factor C and a future factor D both satisfy 2 through 5, then ACBD forms a closed fork. Reichenbach's proposal was that the direction from cause to effect is the direction in which open forks predominate. In our world, there are many forks open to the future, few or none open to the past. This proposal is closely related to Reichenbach's Common Cause Principle, which says that if A and B are positively correlated (i.e., they satisfy condition 1), then there exists a C, which is a cause of both A and B, and which screens them off from each other. (By contrast, common effects do not in general screen off their causes.)
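The claim that conditions 2 through 5 entail condition 1 rests on a simple identity. When C and not-C both screen A off from B, P(A & B) − P(A)P(B) equals P(C)P(not-C)[P(A | C) − P(A | not-C)][P(B | C) − P(B | not-C)]. Given conditions 4 and 5, and assuming that P(C) lies strictly between 0 and 1 so that the conditional probabilities are defined, the right-hand side is positive. The sketch below checks the identity on one set of hypothetical numbers. The function name fork_quantities is our own.

```python
from fractions import Fraction as F

def fork_quantities(p_c, a_c, a_not_c, b_c, b_not_c):
    """Return (P(A & B), P(A) * P(B)) when C and not-C both screen A off from B.

    Arguments are P(C), P(A | C), P(A | not-C), P(B | C), P(B | not-C).
    Conditions 2 and 3 are built in: within C and within not-C,
    the probability of A & B is the product of the two conditionals.
    """
    p_ab = p_c * a_c * b_c + (1 - p_c) * a_not_c * b_not_c
    p_a = p_c * a_c + (1 - p_c) * a_not_c
    p_b = p_c * b_c + (1 - p_c) * b_not_c
    return p_ab, p_a * p_b

# Hypothetical values satisfying conditions 4 and 5.
p_c, a_c, a_not_c, b_c, b_not_c = F(1, 4), F(3, 5), F(1, 5), F(1, 2), F(1, 10)

p_ab, product_of_marginals = fork_quantities(p_c, a_c, a_not_c, b_c, b_not_c)
gap = p_c * (1 - p_c) * (a_c - a_not_c) * (b_c - b_not_c)

print(p_ab, product_of_marginals)          # 9/100 3/50
print(p_ab - product_of_marginals == gap)  # True
```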
It is not clear, however, that this asymmetry between forks open to the past and forks open to the future will be as pervasive as this proposal seems to presuppose. In quantum mechanics, there are correlated effects that are believed to have no common cause that screens them off. Moreover, if ACB forms a conjunctive fork in which C precedes A and B, but C has a deterministic effect D which occurs after A and B, then ACBD will form a closed fork. A further difficulty with this proposal is that since it provides a global ordering of causes and effects, it seems to rule out a priori the possibility that some effects might precede their causes. More complex attempts to derive the direction of causation from probabilities have been offered; the issues here intersect with the problem of reduction, discussed in Section 6.4 below.
Suggested Readings: Suppes (1970, chapter 2) and Eells (1991, chapter 5) define causal asymmetry in terms of temporal asymmetry. Price (1991) defends an account of causal asymmetry in terms of agent probabilities; see also the entry for “causation and manipulation.” Reichenbach's proposal is presented in his (1956, chapter IV). Some difficulties with this proposal are discussed in Arntzenius (1993); see also his entry to this encyclopedia under “physics: Reichenbach's common cause principle.” Papineau (1993) is a good overall discussion of the problem of causal asymmetry within probabilistic theories. Hausman (1998) is a detailed study of the problem of causal asymmetry.
Causal dependence, as defined in the previous paragraph, is sufficient, but not necessary, for causation. Causation is defined to be the ancestral of causal dependence; that is, A causes B just in case there is a sequence of events C_{1}, C_{2}, …, C_{n}, such that C_{1} causally depends upon A, C_{2} causally depends upon C_{1}, …, B causally depends upon C_{n}. This modification guarantees that causation will be transitive: if A causes C, and C causes B, then A causes B. This modification is also useful in addressing certain problems discussed in Section 6.2 below.
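Taking the ancestral of a relation is the same operation as taking its transitive closure. The sketch below computes it for a finite set of causal-dependence facts. It is illustrative only: the function name causes and the representation of dependence as a set of pairs are our own choices, and the sketch says nothing about how the dependence facts themselves are established.

```python
def causes(dependence):
    """Return the ancestral (transitive closure) of a causal-dependence relation.

    `dependence` is a set of (x, y) pairs read as "y causally depends upon x".
    The result contains (x, y) whenever a chain of dependence leads from x to y.
    """
    closure = set(dependence)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

# C1 depends upon A, C2 depends upon C1, and B depends upon C2.
dependence = {("A", "C1"), ("C1", "C2"), ("C2", "B")}

print(("A", "B") in causes(dependence))   # True
print(("B", "A") in causes(dependence))   # False
```

Because every pair in the original relation is retained, causal dependence remains sufficient for causation, and the closure step is what secures transitivity.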
Proponents of counterfactual theories of causation attempt to derive the asymmetry of causation from a corresponding asymmetry in the truth values of counterfactuals. For instance, it may be true that if Mary had not smoked, she would have been less likely to develop lung cancer, but we would not normally agree that if Mary had not developed lung cancer, she would have been less likely to smoke. Ordinary counterfactuals do not ‘backtrack’ from effects to causes. This proscription against backtracking also solves the problem of spurious correlations: we would not say that if the column of mercury had not risen, then the drop in atmospheric pressure would have been less likely, and so the storm would have been less likely as well.
One important question is whether the counterfactuals that appear in the analysis of causation can be characterized without reference to causation. In order to do this, one would have to say what makes some worlds closer than others without making reference to any causal notions. Despite some interesting attempts, it is not clear whether this can be done. If not, then it will not be possible to provide a reductive PC analysis of causation, although it may still be possible to articulate interesting interconnections between causation, probability and counterfactuals.
The philosopher Igal Kvart has been a persistent critic of the claim that it is possible to analyze counterfactuals without using causation. He has developed a probabilistic theory of singular causation that does not use counterfactuals. Nonetheless, his theory has a number of features in common with counterfactual theories: it is an attempt to analyze singular causation among events; it elaborates on the basic probability-raising idea in an attempt to avoid some of the problems raised in Section 6.2 below; and it aspires to be a reductive analysis of causation, making no reference to causal relations in the analysans.
Suggested Readings: Lewis (1986a) is the locus classicus for PC. Lewis (1986b) is an attempt to explicate the notion of proximity among possible worlds. Recent attempts to analyze causation in terms of probabilistic counterfactuals have become quite intricate; see for example Noordhof (1999). For further discussion of counterfactual theories of causation, see the entry under “causation, counterfactual theories.” For Kvart's theory, see for example Kvart (1997).
Our concern here will not be with the efficacy of these methods of causal inference, but rather with their philosophical underpinnings. We will here follow the developments of SGS, as these bear a stronger resemblance to the probabilistic theories of causation described in Section 3 above. (Pearl's approach, at least in its more recent development, bears a stronger connection to counterfactual approaches.)
Suggested Readings: Pearl (2000) and Spirtes, Glymour and Scheines (2000) are the most detailed presentations of the two research programs discussed. Both works are quite technical, although the epilogue of Pearl (2000) provides a very readable historical introduction to Pearl's work. Pearl (1999) also contains a reasonably accessible introduction to some of Pearl's more recent developments. Scheines (1997) is a nontechnical introduction to some of the ideas in SGS (2000). McKim and Turner (1997) is a collection of papers on causal modeling, including some important critiques of SGS.
The directed acyclic graph G over V may be related to the probability distribution in a number of ways. One important condition that the two might satisfy is the so-called Markov Condition:
MC: For every X in V, and every set Y of variables in V \ DE(X), P(X | PA(X) & Y) = P(X | PA(X)); where DE(X) is the set of descendants of X, and PA(X) is the set of parents of X.
The notation needs a little clarification. Consider, for example, the first term in the equality. Since X is a variable, it doesn't really make sense to talk about the probability of X, or of the conditional probability of X. It makes sense to talk about the probability of having an income of $40,000 per year (at least if we are talking about members of some well-defined population), but it makes no sense to talk about the probability of "income". (Note that we do not mean here the probability of having some income or other. That probability is one, assuming we allow zero to count as a value of income.) This formulation of MC uses a common notational convention. Whenever a variable, or set of variables appears, there is a tacit universal quantifier ranging over values of the variable(s) in question. Thus MC should be understood as asserting an equality between two conditional probabilities that holds for all values of the variable X, and for all values of the variables in Y and PA(X). In words, the Markov condition says that the parents of X screen X off from all other variables, except for the descendants of X. Given the values of the variables that are parents of X, the values of the variables in Y (which includes no descendants of X), make no further difference to the probability that X will take on any given value.
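The tacit universal quantifier can be made explicit in code. The sketch below uses a hypothetical three-variable chain in which X is the only parent of Z and Z is the only parent of Y, with binary variables and made-up probabilities. For the variable Y, the Markov Condition requires that P(Y = y | Z = z & X = x) = P(Y = y | Z = z) for every choice of values, and the check loops over all of them. The function names are our own.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical chain X -> Z -> Y over binary variables.
P_X = F(2, 5)                             # P(X = 1)
P_Z_GIVEN_X = {1: F(7, 10), 0: F(1, 5)}   # P(Z = 1 | X = x)
P_Y_GIVEN_Z = {1: F(3, 5), 0: F(1, 10)}   # P(Y = 1 | Z = z)

def bern(p, v):
    """Probability that a binary variable with P(1) = p takes value v."""
    return p if v == 1 else 1 - p

joint = {
    (x, z, y): bern(P_X, x) * bern(P_Z_GIVEN_X[x], z) * bern(P_Y_GIVEN_Z[z], y)
    for x, z, y in product([0, 1], repeat=3)
}

def prob(**fixed):
    """Probability that the named variables (x, z, y) take the given values."""
    names = ("x", "z", "y")
    return sum(
        p for world, p in joint.items()
        if all(world[names.index(k)] == v for k, v in fixed.items())
    )

def markov_holds_for_y():
    """Check P(Y = y | Z = z & X = x) = P(Y = y | Z = z) for all values."""
    return all(
        prob(x=x, z=z, y=y) / prob(x=x, z=z) == prob(z=z, y=y) / prob(z=z)
        for x, z, y in product([0, 1], repeat=3)
    )

print(markov_holds_for_y())                              # True
print(prob(x=1, y=1) / prob(x=1) == prob(y=1))           # False
```

The second line of output shows that X and Y are nonetheless dependent when Z is not held fixed, as one would expect if X bears on Y only through Z.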
As stated, the Markov Condition describes a purely formal relation between abstract entities. Suppose, however, that we give the graph and probability distribution empirical interpretations. The graph will represent the causal relationships among the variables in a population, and the probability distribution will represent the empirical probability that an individual in the population will possess certain values of the relevant variables. When the directed graph is given a causal interpretation, it is called a causal graph. We will return shortly to the question of what, exactly, the arrows in a causal graph represent.
The Causal Markov Condition (CMC) asserts that MC holds of a population when the directed graph and probability distribution are given these interpretations. CMC does not hold in general, but only when certain further conditions are satisfied. For instance, V must include all common causes of variables that are included in V. Suppose, for example, that V = {X, Y}, that neither variable is a cause of the other, and that Z is a common cause of X and Y (the true causal structure is shown in Figure 3 below). The correct causal graph on V will include no arrows, since neither X nor Y causes the other. But X and Y will be probabilistically correlated, because of the underlying common cause. This is a violation of CMC. Since the correct causal graph on {X, Y} has no arrows, X has no parents or descendants; thus CMC entails that P(X | Y) = P(X). This equality is false, since X and Y are in fact correlated. CMC can also fail for certain types of heterogeneous populations composed of subpopulations with different causal structures. And CMC will fail for certain quantum systems. One area of controversy concerns the extent to which actual populations satisfy CMC with respect to the sorts of variable sets that are typically employed in empirical investigations. For purposes of further discussion, we will assume that CMC holds.
Figure 3
The Causal Markov Condition is a generalization of Reichenbach's Common Cause Principle, discussed in Section 3.3 above. Here are a few illustrations of how it works.
Figure 4
In Figures 3 and 4, CMC entails that the values of Z screen off the values of X from the values of Y.
Figure 5
Figure 6
In Figures 5 and 6, CMC again entails that the values of Z screen off the values of X from the values of Y. However, CMC does not entail that the values of W screen off the values of X from the values of Y in Figure 5, whereas it does entail that the values of W screen off the values of X from the values of Y in Figure 6. This shows that being a common cause of X and Y is neither necessary nor sufficient for screening off the values of those variables.
Figure 7
In Figure 7, both Z and W are common causes of X and Y, yet CMC does not entail that either one of them, by itself, suffices to screen off the values of X and Y. This seems reasonable: if we hold fixed the value of Z, we should expect X and Y to remain correlated due to the action of W. CMC does entail that Z and W jointly screen off X and Y; that is, when we condition on the values of Z and W, there will be no residual correlation between X and Y.
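The situation described for Figure 7 can be reproduced in a toy model. The sketch below assumes, purely for illustration, that Z and W are independent binary variables, that each of X and Y depends on both of them, and that X and Y are independent once Z and W are both fixed. All numbers are hypothetical. Since the variables are binary, it is enough to look at the single quantity P(X = 1 & Y = 1 | ...) − P(X = 1 | ...)P(Y = 1 | ...) to detect residual correlation.

```python
from fractions import Fraction as F
from itertools import product

P_Z, P_W = F(1, 2), F(1, 2)
# Hypothetical P(X = 1 | z, w) and P(Y = 1 | z, w): each common cause adds to the chance.
P_X_GIVEN = {(z, w): F(1 + 3 * z + 3 * w, 10) for z, w in product([0, 1], repeat=2)}
P_Y_GIVEN = {(z, w): F(2 + 2 * z + 4 * w, 10) for z, w in product([0, 1], repeat=2)}

def bern(p, v):
    """Probability that a binary variable with P(1) = p takes value v."""
    return p if v == 1 else 1 - p

joint = {
    (z, w, x, y): bern(P_Z, z) * bern(P_W, w)
    * bern(P_X_GIVEN[(z, w)], x) * bern(P_Y_GIVEN[(z, w)], y)
    for z, w, x, y in product([0, 1], repeat=4)
}

def residual(given):
    """P(X=1 & Y=1 | given) - P(X=1 | given) * P(Y=1 | given)."""
    total = sum(p for s, p in joint.items() if given(*s))
    p_x = sum(p for s, p in joint.items() if given(*s) and s[2] == 1) / total
    p_y = sum(p for s, p in joint.items() if given(*s) and s[3] == 1) / total
    p_xy = sum(
        p for s, p in joint.items() if given(*s) and s[2] == 1 and s[3] == 1
    ) / total
    return p_xy - p_x * p_y

# Holding only Z fixed leaves X and Y correlated, through W.
print(residual(lambda z, w, x, y: z == 1))              # 3/100
# Holding both Z and W fixed removes the correlation.
print(residual(lambda z, w, x, y: z == 1 and w == 1))   # 0
```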
A second important relation between a directed graph and probability distribution is the Minimality Condition. Suppose that the directed graph G on variable set V satisfies the Markov condition with respect to the probability distribution P. The Minimality Condition asserts that no subgraph of G over V also satisfies the Markov Condition with respect to P. The Causal Minimality Condition asserts that the Minimality Condition holds when G and P are given their empirical interpretations. As an illustration, consider the variable set {X, Y}, let there be an arrow from X to Y, and suppose that X and Y are probabilistically independent of each other in P. This graph would satisfy the Markov Condition with respect to P: none of the independence relations mandated by MC are absent (in fact, MC mandates no independence relations). But this graph would violate the Minimality Condition with respect to P, since the subgraph that omits the arrow from X to Y would also satisfy the Markov Condition.
Suggested Readings: Spirtes, Glymour and Scheines (2000) and Scheines (1997). Hausman and Woodward (1999) provide a detailed discussion of the Causal Markov Condition.
P(Y = y | X = x) ≠ P(Y = y | X = x′). This says nothing about how X bears on Y. Suppose, for example, that we have a three-variable model, including the variables smoking, exercise, and heart disease. The causal graph would (presumably) include an arrow from smoking to heart disease, and an arrow from exercise to heart disease. Nothing in the graph indicates that increased levels of smoking increase the risk and severity of heart disease, whereas increased levels of exercise (up to a point, anyway) decrease the risk and severity of heart disease. Thus an arrow in a causal graph indicates only that one variable is causally relevant to another, and says nothing about the way in which it is relevant (whether it is a promoting, inhibiting, or interacting cause, or stands in some more complex relation).
Figure 8
Consider Figure 8. Note that it differs from Figure 4 in that there is an additional arrow running directly from X to Y. What does this arrow from X to Y indicate? It does not merely indicate that X is causally relevant to Y; in Figure 4, it is natural to expect that X will be relevant to Y via its effect on Z. Applying the Causal Markov and Minimality Conditions, the arrow from X to Y indicates that Y is probabilistically dependent on X, even when we hold fixed the value of Z. That is, X makes a probabilistic difference for Y, over and above the difference it makes in virtue of its effect on Z. Figure 8 thus indicates that X has an effect on Y via two different routes: one route that runs through the variable Z and another that is direct, i.e., unmediated by any other variable in V. As an illustration, consider a well-known example due to Germund Hesslow. Consumption of birth control pills (X) is a risk factor for thrombosis (Y). On the other hand, birth control pills are an effective preventer of pregnancy (Z), which is in turn a powerful risk factor for thrombosis. The use of birth control pills may thus affect one's chances of suffering from thrombosis in two different ways, one 'direct', and one via the effect of pills on one's chances of becoming pregnant. Whether birth control pills raise or lower the probability of thrombosis overall will depend upon the relative strengths of these two routes. The probabilistic theories of causation described in Section 3 above are suited to analyzing the total or net effect of one factor or variable on another, whereas the causal modeling techniques discussed in this section are primarily geared toward decomposing a causal system into individual routes of causal influence.
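The difference between the net effect and the effect along the direct route can be illustrated with a small calculation. The numbers below are invented for the purpose and are not drawn from Hesslow or from any medical source; the function names are likewise hypothetical. In this sketch, pills (X = 1) raise the probability of thrombosis (Y = 1) whether or not the woman is pregnant (Z), yet lower it overall, because they make pregnancy much less likely.

```python
from fractions import Fraction as F

# Hypothetical probabilities, for illustration only.
# P(Z = 1 | X = x): chance of pregnancy without (x = 0) and with (x = 1) pills.
P_Z1_GIVEN_X = {0: F(1, 5), 1: F(1, 50)}

# P(Y = 1 | X = x, Z = z): chance of thrombosis, keyed by (x, z).
P_Y1_GIVEN_XZ = {
    (0, 0): F(1, 100),
    (1, 0): F(2, 100),
    (0, 1): F(10, 100),
    (1, 1): F(12, 100),
}


def p_y1_given_x(x):
    """P(Y = 1 | X = x), averaging over Z."""
    p_z1 = P_Z1_GIVEN_X[x]
    return (1 - p_z1) * P_Y1_GIVEN_XZ[(x, 0)] + p_z1 * P_Y1_GIVEN_XZ[(x, 1)]


def total_effect():
    """Net difference that X makes to the probability of Y."""
    return p_y1_given_x(1) - p_y1_given_x(0)


def direct_effect(z):
    """Difference that X makes to Y when Z is held fixed at z."""
    return P_Y1_GIVEN_XZ[(1, z)] - P_Y1_GIVEN_XZ[(0, z)]


if __name__ == "__main__":
    print("total effect:", total_effect())
    print("direct effect, not pregnant:", direct_effect(0))
    print("direct effect, pregnant:", direct_effect(1))
```

Here the direct effect is positive in both strata of Z, while the total effect is negative. Other choices of parameters would reverse the sign of the total effect, in keeping with the point that the overall effect depends on the relative strengths of the two routes.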
Suggested Readings: The birth control pill example was originally presented in Hesslow (1976). Hitchcock (2001a) discusses the distinction between total or net effect, and causal influence along individual routes.
Figure 9
The Faithfulness Condition implies that the causal influences of one variable on another along multiple causal routes do not ‘cancel’. For example, suppose that Figure 8 correctly represents the underlying causal structure. Then the Faithfulness Condition implies that X and Y cannot be unconditionally independent of one another in the empirical distribution. In Hesslow's example, this means that the tendency of birth control pills to cause thrombosis along the direct route cannot be exactly canceled by the tendency of birth control pills to prevent thrombosis by preventing pregnancy. This ‘no canceling’ condition seems implausible as a metaphysical or conceptual constraint upon the connection between causation and probabilities. Why can't competing causal paths cancel one another out? Indeed, Newtonian physics provides us with an example: the downward force on my body due to gravity triggers an equal and opposite upward force on my body from the floor. My body responds as if neither force were acting upon it. The Faithfulness Condition seems rather to be a methodological principle. Given a distribution on {X, Y, Z} in which X and Y are independent, we should infer that the causal structure is that depicted in Figure 9, rather than Figure 8. This is not because Figure 8 is conclusively ruled out by the distribution, but rather because it is gratuitously complex: it postulates causal connections that are not necessary to explain the underlying pattern of probabilistic dependencies. The Faithfulness Condition is thus a formal version of Ockham's razor.
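To see what exact cancellation would require, consider a sketch of the Figure 8 structure with binary variables. All of the numbers and names below are hypothetical. Three of the four values of P(Y = 1 | X, Z) are fixed, and the fourth is solved for so that the two routes from X to Y cancel precisely.

```python
from fractions import Fraction as F

# Hypothetical parameters for the structure of Figure 8.
# P(Z = 1 | X = x):
P_Z1_GIVEN_X = {0: F(1, 5), 1: F(1, 50)}

# P(Y = 1 | X = x, Z = z), keyed by (x, z). The entry for (1, 0) is
# deliberately left out: it is solved for below.
P_Y1_FIXED = {(0, 0): F(1, 100), (0, 1): F(1, 10), (1, 1): F(3, 25)}


def p_y1_given_x(x, table):
    """P(Y = 1 | X = x), averaging over Z."""
    p_z1 = P_Z1_GIVEN_X[x]
    return (1 - p_z1) * table[(x, 0)] + p_z1 * table[(x, 1)]


def cancelling_value():
    """The value of P(Y = 1 | X = 1, Z = 0) that makes X and Y independent."""
    target = p_y1_given_x(0, P_Y1_FIXED)
    p_z1 = P_Z1_GIVEN_X[1]
    return (target - p_z1 * P_Y1_FIXED[(1, 1)]) / (1 - p_z1)


def unfaithful_table():
    """Full table of P(Y = 1 | X, Z) in which the two routes cancel exactly."""
    return {**P_Y1_FIXED, (1, 0): cancelling_value()}


if __name__ == "__main__":
    table = unfaithful_table()
    print("P(Y = 1 | X = 1, Z = 0) needed:", table[(1, 0)])
    print("P(Y = 1 | X = 0):", p_y1_given_x(0, table))
    print("P(Y = 1 | X = 1):", p_y1_given_x(1, table))
```

In the resulting distribution, X raises the probability of Y within each stratum of Z, yet X and Y are unconditionally independent. The independence holds for exactly one value of the free parameter; any other value restores the dependence. This sensitivity to precise parameter values is one way of seeing why such a distribution is treated as a poor guide to the structure that produced it.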
SGS use the Causal Markov, Minimality, and Faithfulness Conditions to prove a variety of statistical indistinguishability theorems. These theorems tell us when two distinct causal structures can or cannot be distinguished on the basis of the probability distributions to which they give rise. We will return to this issue in Section 6.4 below.
Suggested Readings: Spirtes, Glymour and Scheines (2000) and Scheines (1997).
This line of objection is surely right about our ordinary use of causal language. It is nonetheless open to the defender of context-unanimity to respond that she is interested in supplying a precise concept to replace the vague notion of causation that corresponds to our everyday usage. In a population consisting of individuals lacking the gene, smoking causes lung cancer. In a population consisting entirely of individuals who possess the gene, smoking prevents lung cancer.
Note that this dispute only arises in the context of a heterogeneous population. Restricting ourselves to one particular test situation, both parties can agree that smoking causes lung cancer in that test population just in case it increases the probability of lung cancer in that test situation.
One's position in this debate will depend, in part, on how one wants to use general causal claims such as “smoking causes lung cancer”. If one conceives of them as causal laws, then the contextual-unanimity requirement may seem attractive. If “smoking causes lung cancer” is a kind of law, then its truth should not be contingent upon the scarcity of the gene that reverses the effects of smoking. By contrast, one may understand the causal claim in a more practical way, by treating it as a kind of policy-guiding principle. Since the gene in question is very rare, it would still be rational for public health organizations to promote policies that would reduce the incidence of smoking.
Suggested Readings: Dupré (1984) presents this challenge to the context-unanimity requirement, and offers an alternative. Eells (1991, chapters 1 and 2) defends context-unanimity using the idea that causal claims are made relative to a population. Hitchcock (2001b) contains further discussion and develops the idea of treating general causal claims as policy-guiding principles.
A different sort of counterexample involves causal preemption. Suppose that an assassin puts a weak poison in the king's drink, resulting in a 30% chance of death. The king drinks the poison and dies. If the assassin had not poisoned the drink, her associate would have spiked the drink with an even deadlier elixir (70% chance of death). In the example, the assassin caused the king to die by poisoning his drink, even though she lowered his chance of death (from 70% to 30%). Here the cause lowered the probability of death, because it preempted an even stronger cause.
One approach to this problem, built into the counterfactual approach described in Section 4 above, is to invoke the principle of the transitivity of causation. The assassin's action increased the probability of, and hence caused, the presence of weak poison in the king's drink. The presence of weak poison in the king's drink raised the probability of, and hence caused, the king's death. (By this time, it is already determined that the associate will not poison the drink.) By transitivity, the assassin's action caused the king's death. The claim that causation is transitive is highly controversial, however, and is subject to many persuasive counterexamples.
Another approach would be to invoke a distinction introduced in Section 5.3 above. The assassin's action affects the king's chances of death in two distinct ways: first, it introduces the weak poison into the king's drink; second, it prevents the introduction of a stronger poison. The net effect is to reduce the king's chance of death. Nonetheless, we can isolate the first of these effects (which would be indicated by an arrow in a causal graph). We do this by holding fixed the inaction of the associate: given that the associate did not in fact poison the drink, the assassin's action increased the king's chance of death (from near zero to .3). We count the assassin's action as a cause of death because it increased the chance of death along one of the routes connecting the two events.
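The arithmetic behind this decomposition can be set out explicitly. The sketch below takes the 30% and 70% figures from the example and assumes, for simplicity, that the king's chance of death with no poison at all is exactly zero (the text says only that it is near zero). It also assumes that the associate acts just in case the assassin does not. The function names are hypothetical.

```python
from fractions import Fraction as F

P_DEATH_WEAK = F(3, 10)    # weak poison only, from the example
P_DEATH_STRONG = F(7, 10)  # associate's stronger poison, from the example
P_DEATH_NONE = F(0)        # assumption: "near zero" idealized as zero


def associate_poisons(assassin_poisons):
    """The associate spikes the drink only if the assassin does not."""
    return not assassin_poisons


def p_death(assassin_poisons, associate_acts):
    """Chance of death given who poisoned the drink.

    The case in which both act does not arise in the example; it is
    treated here (by assumption) like the strong poison alone.
    """
    if associate_acts:
        return P_DEATH_STRONG
    if assassin_poisons:
        return P_DEATH_WEAK
    return P_DEATH_NONE


def net_effect():
    """Change in the chance of death, letting the associate respond."""
    return (p_death(True, associate_poisons(True))
            - p_death(False, associate_poisons(False)))


def route_effect():
    """Change in the chance of death, holding the associate's inaction fixed."""
    actual = associate_poisons(True)  # the associate in fact did nothing
    return p_death(True, actual) - p_death(False, actual)


if __name__ == "__main__":
    print("net effect:", net_effect())
    print("effect along the weak-poison route:", route_effect())
```

The net effect is negative, while the effect along the route that holds the associate's inaction fixed is positive. It is the latter quantity that this approach appeals to in counting the assassin's action as a cause of death.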
For a counterexample of the second type, suppose that two gunmen shoot at a target. Each has a certain probability of hitting, and a certain probability of missing. Assume that none of the probabilities are one or zero. As a matter of fact, the first gunman hits, and the second gunman misses. Nonetheless, the second gunman did fire, and by firing, increased the probability that the target would be hit, which it was. While it is obviously wrong to say that the second gunman's shot caused the target to be hit, it would seem that a probabilistic theory of causation is committed to this consequence. A natural approach to this problem would be to try to combine the probabilistic theory of causation with a requirement of spatiotemporal connection between cause and effect, although it is not at all clear how this hybrid theory would work.
Suggested Readings: The example of the golf ball, due to Deborah Rosen, is first presented in Suppes (1970). Salmon (1980) presents several examples of probability-lowering causes. Hitchcock (1995) presents a response. Lewis (1986a) discusses cases of preemption; see also the entry for “causation: counterfactual theories.” Hitchcock (2001a) presents the solution in terms of decomposition into component causal routes. Woodward (1990) describes the structure that is instantiated in the example of the two gunmen. Humphreys (1989, section 14) responds. Menzies (1989, 1996) discusses examples involving causal preemption where non-causes raise the probabilities of non-effects. Hitchcock (2002) provides a general discussion of these counterexamples. For a discussion of attempts to analyze cause and effect in terms of contiguous processes, see the entry for “causation: causal processes.”
Suggested Readings: The need for distinct theories of singular and general causation is defended in Good (1961, 1962), Sober (1985), and Eells (1991, introduction and chapter 6). Eells (1991, chapter 6) offers a distinct probabilistic theory of singular causation in terms of the temporal evolution of probabilities. Carroll (1991) and Hitchcock (1995) offer two quite different lines of response. Hitchcock (2001b) argues that there are really (at least) two different distinctions at work here.
The most important work along these lines has been carried out by Spirtes, Glymour and Scheines. Rather than report on the details of their results, we present here a more generalized discussion. Suppose that a set of factors and a system of causal relations among those factors are given: call this the causal structure CS. Let T be a theory connecting causal relations among factors with probabilistic relations among factors. Then the causal structure CS will be probabilistically distinguishable relative to T if, for every assignment of probabilities to the factors in CS that is compatible with CS and T, CS is the unique causal structure compatible with T and those probabilities. (One could formulate a weaker sense of distinguishability by requiring that only some assignment of probabilities uniquely determines CS.) Intuitively, T allows you to infer that the causal structure is in fact CS given the probability relations between factors. Given a probabilistic theory of causation T, it is possible to imagine many different properties it might have. Here are some possibilities:
Suggested Readings: The most detailed treatment of probabilistic distinguishability is given in Spirtes, Glymour and Scheines (2000); see especially chapter 4. Spirtes, Glymour and Scheines prove (theorem 4.6) a result along the lines of 3 for a theory that they propose. This work is very technical. An accessible presentation is contained in Papineau (1993), which defends a position along the lines of 4.
Christopher Hitchcock cricky@caltech.edu 