## The Markov Condition

### 1. Factorization

When the probability distribution P over the variable set V satisfies the MC, the joint distribution factorizes in a very simple way. Let V = {X1, X2, …, Xn}. Then

P(X1, X2, …, Xn) = Πi P(Xi | PA(Xi)).

This is easily seen in the following way. Since we are assuming that the graph over V is acyclic, we may re-label the subscripts on the variables so that they are ordered from ‘earlier’ to ‘later’, with only earlier variables causing later ones. It follows from the probability calculus that P(X1, X2, …, Xn) = P(X1) × P(X2 | X1) × … × P(Xn | X1, X2, …, Xn−1). For each term P(Xi | X1, X2, …, Xi−1), our ordering ensures that all of the parents of Xi will be included on the right hand side, and none of its descendants will. The MC tells us that Xi is independent of its non-descendants conditional on its parents, so we can eliminate all of the variables from the right hand side except for the parents of Xi.
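The factorization can be illustrated with a small numerical sketch. The graph used here (X1 → X2, X1 → X3) and all of the probability values are assumptions chosen purely for illustration, as are the helper names `cond_prob`, `joint`, and `prob`. The code builds the joint distribution as the product of the terms P(Xi | PA(Xi)) and then checks one conditional independence that the MC requires in this graph.

```python
from itertools import product

# Hypothetical DAG: X1 -> X2 and X1 -> X3 (X1 is a common cause).
parents = {"X1": (), "X2": ("X1",), "X3": ("X1",)}

# Made-up values of P(Xi = 1 | PA(Xi)), keyed by the parents' values.
p_one = {
    "X1": {(): 0.3},
    "X2": {(0,): 0.2, (1,): 0.9},
    "X3": {(0,): 0.5, (1,): 0.1},
}

def cond_prob(var, value, assignment):
    """P(var = value | PA(var)), reading the parents' values from assignment."""
    pa_values = tuple(assignment[p] for p in parents[var])
    p1 = p_one[var][pa_values]
    return p1 if value == 1 else 1.0 - p1

def joint(assignment):
    """Joint probability computed as the product of the P(Xi | PA(Xi))."""
    result = 1.0
    for var in parents:
        result *= cond_prob(var, assignment[var], assignment)
    return result

variables = list(parents)
worlds = [dict(zip(variables, vals))
          for vals in product((0, 1), repeat=len(variables))]

def prob(**fixed):
    """Marginal probability of the event described by the keyword arguments."""
    return sum(joint(w) for w in worlds
               if all(w[k] == v for k, v in fixed.items()))

# The factorized joint sums to one over all assignments.
total = sum(joint(w) for w in worlds)

# X2 and X3 are non-descendants of one another, so the MC requires
# P(X2, X3 | X1) = P(X2 | X1) * P(X3 | X1).
lhs = prob(X1=1, X2=1, X3=1) / prob(X1=1)
rhs = (prob(X1=1, X2=1) / prob(X1=1)) * (prob(X1=1, X3=1) / prob(X1=1))
```

Here `lhs` and `rhs` agree up to floating-point rounding, which is the screening-off of X2 and X3 by their common cause X1.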

### 2. D-separation

The (MC) immediately implies that certain variables are conditionally independent of others. Further conditional independence relations will follow from the (MC) together with the probability calculus. In a complex graph, it will not always be easy to determine which conditional independence relations do and do not follow from the (MC). Geiger (1987) and Verma and Pearl (1988) have developed a purely graphical criterion that is both necessary and sufficient for conditional independence to be a consequence of the (MC). (However, a probability measure that violates the Faithfulness Condition, discussed in Section 3.3, with respect to a given graph may include conditional independence relations that are not consequences of the (MC).)

Let G be a directed acyclic graph over V. A path in G is a sequence of variables in V, ⟨X1, …, Xk⟩, such that for any two consecutive variables in the sequence Xi, Xi+1, there is either an arrow from Xi to Xi+1 or an arrow from Xi+1 to Xi. Such a path is said to be a path from X1 to Xk. A variable Xi on this path is said to be a collider just in case i ≠ 1, i ≠ k, and there are arrows from both Xi−1 and Xi+1 into Xi. Intuitively, Xi is a collider just in case the arrows converge on Xi in the path. For any two variables X, Y in V and any subset Z of V, we define the relation of d-separation as follows:

(d-sep) Z d-separates X and Y just in case every path ⟨X = X1, …, Xk = Y⟩ from X to Y contains at least one variable Xi such that either:

1. Xi is a collider, and no descendant of Xi (including Xi itself) is in Z; or
2. Xi is not a collider, and Xi is in Z.

Then:

(MC) entails that X and Y are probabilistically independent conditional upon Z just in case Z d-separates X and Y in G.
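The definition of d-separation can be checked mechanically on a small graph. The following is a minimal sketch that applies (d-sep) directly by enumerating paths; it is not an efficient algorithm. The example graph and the function names (`descendants`, `paths`, `blocked`, `d_separated`) are assumptions made for illustration. The sketch also assumes that it suffices to consider paths that visit no variable twice, and that X and Y are distinct and not in Z.

```python
# Hypothetical DAG, given as a set of (cause, effect) arrows.
edges = {("A", "C"), ("B", "C"), ("C", "D"), ("A", "E")}

def descendants(node):
    """Variables reachable from node along arrows, including node itself."""
    found, stack = {node}, [node]
    while stack:
        current = stack.pop()
        for src, dst in edges:
            if src == current and dst not in found:
                found.add(dst)
                stack.append(dst)
    return found

def neighbours(node):
    """Variables joined to node by an arrow in either direction."""
    return ({dst for src, dst in edges if src == node}
            | {src for src, dst in edges if dst == node})

def paths(x, y, visited=None):
    """Yield every path from x to y that visits no variable twice."""
    visited = visited or [x]
    if visited[-1] == y:
        yield visited
        return
    for nxt in neighbours(visited[-1]):
        if nxt not in visited:
            yield from paths(x, y, visited + [nxt])

def blocked(path, z):
    """True if some interior variable on the path satisfies clause 1 or 2."""
    for i in range(1, len(path) - 1):
        prev, node, nxt = path[i - 1], path[i], path[i + 1]
        collider = (prev, node) in edges and (nxt, node) in edges
        if collider and not (descendants(node) & z):
            return True   # clause 1: collider with no descendant in Z
        if not collider and node in z:
            return True   # clause 2: non-collider that is in Z
    return False

def d_separated(x, y, z):
    """Z d-separates x and y just in case every path between them is blocked."""
    z = set(z)
    return all(blocked(p, z) for p in paths(x, y))

print(d_separated("A", "B", set()))   # True: the collider C blocks A-C-B
print(d_separated("A", "B", {"D"}))   # False: D is a descendant of C
print(d_separated("E", "C", {"A"}))   # True: the non-collider A is in Z
```

In this graph, conditioning on the collider C or on its descendant D opens the path between A and B, whereas conditioning on the common cause A closes the path between E and C.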