Proof of the Non-Falsifying Refutation Theorem

Here again we explicitly treat the case where only condition-independence is assumed. If result-independence holds as well, all occurrences of ‘$(c^{k-1}\cdot e^{k-1})$’ may be dropped, which gives the results stated in the text. If neither independence condition holds, all occurrences of ‘$c_k\cdot(c^{k-1}\cdot e^{k-1})$’ are replaced by ‘$c^n\cdot e^{k-1}$’ and occurrences of ‘$b\cdot c^{k-1}$’ are replaced by ‘$b\cdot c^n$’.

The proof of Convergence Theorem 2 requires the introduction of one more concept: the variance in the quality of information for a sequence of experiments or observations, $\mathrm{VQI}[c^n \mid h_i/h_j \mid b]$. The quality of the information, QI, from a specific outcome sequence $e^n$ will vary somewhat from the expected quality of information for conditions $c^n$. A common statistical measure of how widely individual values tend to vary from an expected value is the expected squared distance from that expected value, a quantity called the variance.

Definition: VQI — the Variance in the Quality of Information.
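For $h_j$ outcome-compatible with $h_i$ on $c_k$, define
$$\mathrm{VQI}[c_k \mid h_i/h_j \mid b\cdot(c^{k-1}\cdot e^{k-1})] = \sum_u \Big(\mathrm{QI}[o_{ku} \mid h_i/h_j \mid b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})] - \mathrm{EQI}[c_k \mid h_i/h_j \mid b\cdot(c^{k-1}\cdot e^{k-1})]\Big)^2 \cdot P[o_{ku} \mid h_i\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})].$$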

Next define
$$\mathrm{VQI}[c_k \mid h_i/h_j \mid b\cdot c^{k-1}] = \sum_{\{e^{k-1}\}} \mathrm{VQI}[c_k \mid h_i/h_j \mid b\cdot(c^{k-1}\cdot e^{k-1})] \cdot P[e^{k-1} \mid h_i\cdot b\cdot c^{k-1}].$$

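For a sequence $c^n$ of observations on which $h_j$ is outcome-compatible with $h_i$, define
$$\mathrm{VQI}[c^n \mid h_i/h_j \mid b] = \sum_{\{e^n\}} \big(\mathrm{QI}[e^n \mid h_i/h_j \mid b\cdot c^n] - \mathrm{EQI}[c^n \mid h_i/h_j \mid b]\big)^2 \cdot P[e^n \mid h_i\cdot b\cdot c^n].$$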

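VQI is never negative. If $h_i$ and $h_j$ agree on the likelihoods of all possible outcome sequences in the evidence stream, then both $\mathrm{EQI}[c^n \mid h_i/h_j \mid b]$ and $\mathrm{VQI}[c^n \mid h_i/h_j \mid b]$ equal 0. More generally, VQI is positive exactly when QI varies across the outcome sequences to which $h_i$ assigns positive probability.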
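
To make the single-experiment definitions concrete, here is a minimal Python sketch for the independent case, where the conditioning on earlier conditions and outcomes $(c^{k-1}\cdot e^{k-1})$ can be dropped. It takes the likelihoods $P[o_{ku} \mid h_i\cdot b\cdot c_k]$ and $P[o_{ku} \mid h_j\cdot b\cdot c_k]$ as two lists indexed by outcome. The function names `qi` and `eqi_vqi`, the list format, and the use of natural logarithms are assumptions made for illustration.

```python
import math


def qi(p_i: float, p_j: float) -> float:
    """Quality of information of one outcome: log(P[o | h_i] / P[o | h_j])."""
    return math.log(p_i / p_j)


def eqi_vqi(lik_i: list[float], lik_j: list[float]) -> tuple[float, float]:
    """Return (EQI, VQI) for a single observation condition c_k.

    lik_i[u] and lik_j[u] are the likelihoods of outcome o_ku under h_i and h_j.
    Outcomes that h_i rules out (probability 0) carry no weight and are skipped.
    Outcome-compatibility is assumed: lik_j[u] > 0 whenever lik_i[u] > 0.
    """
    support = [(pi, pj) for pi, pj in zip(lik_i, lik_j) if pi > 0]
    eqi = sum(qi(pi, pj) * pi for pi, pj in support)
    # VQI: expected squared distance of QI from EQI, weighted by P[o | h_i].
    vqi = sum((qi(pi, pj) - eqi) ** 2 * pi for pi, pj in support)
    return eqi, vqi


# Agreeing hypotheses give EQI = VQI = 0; disagreeing ones give VQI > 0.
print(eqi_vqi([0.5, 0.5], [0.5, 0.5]))
print(eqi_vqi([0.8, 0.2], [0.4, 0.6]))
```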

$\mathrm{VQI}[c^n \mid h_i/h_j \mid b]$ does not generally decompose into the sum of the VQIs for the individual experiments or observations $c_k$. However, when both independence conditions hold, the decomposition into the sum does follow.

Theorem: The VQI Decomposition Theorem for Independent Evidence:
Suppose both condition-independence and result-independence hold. Then
$$\mathrm{VQI}[c^n \mid h_i/h_j \mid b] = \sum_{k=1}^{n} \mathrm{VQI}[c_k \mid h_i/h_j \mid b].$$
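For the proof, we employ the following abbreviations:
$$
\begin{aligned}
Q[e_k] &= \mathrm{QI}[e_k \mid h_i/h_j \mid b\cdot c_k], & Q[e^k] &= \mathrm{QI}[e^k \mid h_i/h_j \mid b\cdot c^k],\\
E[c_k] &= \mathrm{EQI}[c_k \mid h_i/h_j \mid b], & E[c^k] &= \mathrm{EQI}[c^k \mid h_i/h_j \mid b],\\
V[c_k] &= \mathrm{VQI}[c_k \mid h_i/h_j \mid b], & V[c^k] &= \mathrm{VQI}[c^k \mid h_i/h_j \mid b].
\end{aligned}
$$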

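The equation stated by the theorem may be derived as follows. Under both independence conditions, $Q[e^n] = Q[e_n] + Q[e^{n-1}]$, $E[c^n] = E[c_n] + E[c^{n-1}]$, and $P[e^n \mid h_i\cdot b\cdot c^n]$ factors into $P[e_n \mid h_i\cdot b\cdot c_n]\cdot P[e^{n-1} \mid h_i\cdot b\cdot c^{n-1}]$. The cross term then vanishes, because summing each $Q$ against its probabilities yields the corresponding $E$, and the probabilities themselves sum to 1.

$$
\begin{aligned}
V[c^n] &= \sum_{e^n} \big(Q[e^n] - E[c^n]\big)^2 \cdot P[e^n \mid h_i\cdot b\cdot c^n]\\
&= \sum_{e^n} \big((Q[e_n] + Q[e^{n-1}]) - (E[c_n] + E[c^{n-1}])\big)^2 \cdot P[e_n \mid h_i\cdot b\cdot c_n]\cdot P[e^{n-1} \mid h_i\cdot b\cdot c^{n-1}]\\
&= \sum_{e^{n-1}}\sum_{e_n} \big((Q[e_n]-E[c_n]) + (Q[e^{n-1}]-E[c^{n-1}])\big)^2 \cdot P[e_n \mid h_i\cdot b\cdot c_n]\cdot P[e^{n-1} \mid h_i\cdot b\cdot c^{n-1}]\\
&= \sum_{e^{n-1}}\sum_{e_n} \Big((Q[e_n]-E[c_n])^2 + (Q[e^{n-1}]-E[c^{n-1}])^2 + 2\,(Q[e_n]-E[c_n])\,(Q[e^{n-1}]-E[c^{n-1}])\Big) \cdot P[e_n \mid h_i\cdot b\cdot c_n]\cdot P[e^{n-1} \mid h_i\cdot b\cdot c^{n-1}]\\
&= V[c_n] + V[c^{n-1}] + 2\sum_{e^{n-1}}\sum_{e_n} \big(Q[e_n]\,Q[e^{n-1}] - Q[e_n]\,E[c^{n-1}] - E[c_n]\,Q[e^{n-1}] + E[c_n]\,E[c^{n-1}]\big) \cdot P[e_n \mid h_i\cdot b\cdot c_n]\cdot P[e^{n-1} \mid h_i\cdot b\cdot c^{n-1}]\\
&= V[c_n] + V[c^{n-1}] + 2\,\big(E[c_n]\,E[c^{n-1}] - E[c_n]\,E[c^{n-1}] - E[c_n]\,E[c^{n-1}] + E[c_n]\,E[c^{n-1}]\big)\\
&= V[c_n] + V[c^{n-1}]\\
&= \cdots = \sum_{k=1}^{n} \mathrm{VQI}[c_k \mid h_i/h_j \mid b].
\end{aligned}
$$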
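
The decomposition can also be checked numerically. The sketch below assumes independent experiments, each given as a pair of outcome-likelihood lists (a hypothetical input format), and computes $\mathrm{VQI}[c^n \mid h_i/h_j \mid b]$ directly from its definition by enumerating every outcome sequence $e^n$, with sequence likelihoods formed as products. It then compares the result with the sum of the single-experiment VQIs, and divides by $n$ to get the average VQI. The helper name `vqi_of_sequence` and the sample likelihoods are illustrative only.

```python
import itertools
import math


def vqi_of_sequence(experiments):
    """VQI[c^n | h_i/h_j | b] computed straight from the definition.

    experiments: list of (lik_i, lik_j) pairs, one per condition c_k.
    Independence is assumed, so P[e^n | h] is the product of the
    per-experiment likelihoods of the outcomes in e^n.
    """
    seqs = []  # (P[e^n | h_i], QI[e^n]) for sequences h_i deems possible
    index_ranges = [range(len(lik_i)) for lik_i, _ in experiments]
    for outcome in itertools.product(*index_ranges):
        p_i = math.prod(li[u] for (li, _), u in zip(experiments, outcome))
        p_j = math.prod(lj[u] for (_, lj), u in zip(experiments, outcome))
        if p_i > 0:
            seqs.append((p_i, math.log(p_i / p_j)))
    eqi = sum(p * q for p, q in seqs)
    return sum((q - eqi) ** 2 * p for p, q in seqs)


experiments = [
    ([0.8, 0.2], [0.4, 0.6]),
    ([0.5, 0.3, 0.2], [0.2, 0.3, 0.5]),
    ([0.9, 0.1], [0.7, 0.3]),
]
whole = vqi_of_sequence(experiments)
parts = sum(vqi_of_sequence([exp]) for exp in experiments)
average = whole / len(experiments)  # the average VQI over n experiments
print(whole, parts, average)
```

Because the experiments are independent, `whole` and `parts` agree up to floating-point rounding; with dependent likelihoods the two would generally differ.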

By averaging the values of $\mathrm{VQI}[c^n \mid h_i/h_j \mid b]$ over the number of observations $n$ we obtain a measure of the average variance in the quality of the information due to $c^n$. We represent this average by underlining ‘VQI’.

Definition: The Average Variance in the Quality of Information
$$\underline{\mathrm{VQI}}[c^n \mid h_i/h_j \mid b] = \mathrm{VQI}[c^n \mid h_i/h_j \mid b] \div n.$$

$\underline{\mathrm{VQI}}$ is a true average (a sum of $n$ terms divided by $n$) only when the independent evidence conditions hold. But the definition does not presuppose independence, and the notion of “averaging” VQI by dividing by the number of experiments and observations turns out to be useful even when the evidence is not independent.

We are now in a position to state a very general version of the second part of the Likelihood Ratio Convergence Theorem. It applies to all evidence streams not containing possibly falsifying outcomes for $h_j$; that is, it applies to all evidence streams for which $h_j$ is outcome-compatible with $h_i$ on each $c_k$ in the stream. This theorem is essentially a specialized version of Chebyshev's Theorem, which is a so-called Weak Law of Large Numbers. This version of the theorem presupposes neither of the independence conditions.

Theorem 2*: Non-falsifying Likelihood Ratio Convergence Theorem
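Choose positive $\varepsilon < 1$, as small as you like, but large enough that (for the number of observations $n$ being contemplated) the value of $\underline{\mathrm{EQI}}[c^n \mid h_i/h_j \mid b] > -(\log\varepsilon)/n$. Then
$$P\big[\{e^n:\; P[e^n \mid h_j\cdot b\cdot c^n]/P[e^n \mid h_i\cdot b\cdot c^n] < \varepsilon\} \mid h_i\cdot b\cdot c^n\big] \ \ge\ 1 - \frac{1}{n}\cdot\frac{\underline{\mathrm{VQI}}[c^n \mid h_i/h_j \mid b]}{\big(\underline{\mathrm{EQI}}[c^n \mid h_i/h_j \mid b] + (\log\varepsilon)/n\big)^2}$$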

Thus, provided that the average expected quality of the information, $\underline{\mathrm{EQI}}[c^n \mid h_i/h_j \mid b]$, for the stream of experiments and observations $c^n$ does not get too small as $n$ increases, and provided that the average variance, $\underline{\mathrm{VQI}}[c^n \mid h_i/h_j \mid b]$, does not blow up (e.g., it is bounded above), hypothesis $h_i$ says it is highly likely that the outcomes of $c^n$ will make the likelihood ratio against $h_j$ as compared to $h_i$ as small as you like, as $n$ increases.
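
The bound in Theorem 2* is easy to evaluate once the two averages are known. Here is a minimal Python sketch; the function name `refutation_lower_bound` and its argument names are hypothetical, and natural logarithms are assumed. It returns the lower bound on the probability, according to $h_i$, that the likelihood ratio of $h_j$ to $h_i$ falls below $\varepsilon$, and it rejects inputs that violate the theorem's requirements on $\varepsilon$.

```python
import math


def refutation_lower_bound(avg_vqi: float, avg_eqi: float, eps: float, n: int) -> float:
    """Theorem 2* lower bound on P[LR(h_j/h_i) < eps | h_i.b.c^n].

    avg_vqi, avg_eqi: the average VQI and average EQI (totals divided by n).
    Requires 0 < eps < 1 and avg_eqi > -log(eps)/n, as the theorem does.
    """
    if not 0 < eps < 1:
        raise ValueError("eps must lie strictly between 0 and 1")
    margin = avg_eqi + math.log(eps) / n
    if margin <= 0:
        raise ValueError("theorem needs avg_eqi > -log(eps)/n")
    return 1 - (avg_vqi / n) / margin ** 2


# With the averages held fixed, the bound climbs toward 1 as n grows.
for n in (100, 1000):
    print(n, refutation_lower_bound(avg_vqi=1.0, avg_eqi=0.3, eps=0.01, n=n))
```

For small $n$ the returned value can be negative, in which case the bound says nothing; it becomes informative only once $n$ is large relative to the average VQI.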

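Proof: Let
$$
\begin{aligned}
V &= \mathrm{VQI}[c^n \mid h_i/h_j \mid b],\\
E &= \mathrm{EQI}[c^n \mid h_i/h_j \mid b],\\
Q[e^n] &= \mathrm{QI}[e^n \mid h_i/h_j \mid b\cdot c^n] = \log\big(P[e^n \mid h_i\cdot b\cdot c^n]/P[e^n \mid h_j\cdot b\cdot c^n]\big),
\end{aligned}
$$
so that $V/n$ and $E/n$ are the averages $\underline{\mathrm{VQI}}[c^n \mid h_i/h_j \mid b]$ and $\underline{\mathrm{EQI}}[c^n \mid h_i/h_j \mid b]$.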

Choose any $\varepsilon$ with $0 < \varepsilon < 1$, and suppose (for $n$ large enough) that $E/n > -(\log\varepsilon)/n$, that is, $E > -\log\varepsilon$. Then we have

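$$
\begin{aligned}
V &= \sum_{\{e^n:\; P[e^n \mid h_j\cdot b\cdot c^n] > 0\}} \big(E - Q[e^n]\big)^2 \cdot P[e^n \mid h_i\cdot b\cdot c^n]\\
&\ge \sum_{\{e^n:\; P[e^n \mid h_j\cdot b\cdot c^n] > 0 \;\&\; Q[e^n] \le -\log\varepsilon\}} \big(E - Q[e^n]\big)^2 \cdot P[e^n \mid h_i\cdot b\cdot c^n]\\
&\ge (E + \log\varepsilon)^2 \cdot \sum_{\{e^n:\; P[e^n \mid h_j\cdot b\cdot c^n] > 0 \;\&\; Q[e^n] \le -\log\varepsilon\}} P[e^n \mid h_i\cdot b\cdot c^n]\\
&= (E + \log\varepsilon)^2 \cdot P\big[\{e^n:\; P[e^n \mid h_j\cdot b\cdot c^n] > 0 \;\&\; Q[e^n] \le \log(1/\varepsilon)\} \mid h_i\cdot b\cdot c^n\big]\\
&= (E + \log\varepsilon)^2 \cdot P\big[\{e^n:\; P[e^n \mid h_j\cdot b\cdot c^n]/P[e^n \mid h_i\cdot b\cdot c^n] \ge \varepsilon\} \mid h_i\cdot b\cdot c^n\big].
\end{aligned}
$$
The second inequality holds because every $e^n$ in the restricted sum has $Q[e^n] \le -\log\varepsilon$, so $E - Q[e^n] \ge E + \log\varepsilon > 0$ by the supposition on $E$ above. The last step uses the fact that, for sequences with positive probability on $h_i$, $Q[e^n] \le \log(1/\varepsilon)$ exactly when $P[e^n \mid h_j\cdot b\cdot c^n]/P[e^n \mid h_i\cdot b\cdot c^n] \ge \varepsilon$.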
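So,
$$\frac{1}{n}\cdot\frac{V/n}{\big(E/n + (\log\varepsilon)/n\big)^2} = \frac{V}{(E + \log\varepsilon)^2} \ \ge\ P\big[\{e^n:\; P[e^n \mid h_j\cdot b\cdot c^n]/P[e^n \mid h_i\cdot b\cdot c^n] \ge \varepsilon\} \mid h_i\cdot b\cdot c^n\big] = 1 - P\big[\{e^n:\; P[e^n \mid h_j\cdot b\cdot c^n]/P[e^n \mid h_i\cdot b\cdot c^n] < \varepsilon\} \mid h_i\cdot b\cdot c^n\big].$$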

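Thus, for any such $\varepsilon$,

$$P\big[\{e^n:\; P[e^n \mid h_j\cdot b\cdot c^n]/P[e^n \mid h_i\cdot b\cdot c^n] < \varepsilon\} \mid h_i\cdot b\cdot c^n\big] \ \ge\ 1 - \frac{1}{n}\cdot\frac{V/n}{\big(E/n + (\log\varepsilon)/n\big)^2},$$

which is the inequality of Theorem 2*, since $V/n = \underline{\mathrm{VQI}}[c^n \mid h_i/h_j \mid b]$ and $E/n = \underline{\mathrm{EQI}}[c^n \mid h_i/h_j \mid b]$.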

(End of Proof)
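
Because the proof is a Chebyshev-style argument, the bound is usually conservative. One way to see this is to compute the exact probability in a simple case. The sketch below assumes a hypothetical evidence stream of $n$ independent repetitions of one two-outcome experiment, with probability $p$ of outcome 1 according to $h_i$ and $q$ according to $h_j$. Since QI then depends only on how many trials yield outcome 1, the exact probability is a binomial sum. The function name and parameters are illustrative, and natural logarithms are assumed.

```python
import math


def exact_and_bound(p: float, q: float, n: int, eps: float) -> tuple[float, float]:
    """Exact P[LR(h_j/h_i) < eps | h_i] and the Theorem 2* lower bound.

    Hypothetical setup: n independent trials of a two-outcome experiment with
    P[outcome 1 | h_i] = p and P[outcome 1 | h_j] = q, where 0 < p, q < 1.
    """
    qi1 = math.log(p / q)                # QI of outcome 1
    qi0 = math.log((1 - p) / (1 - q))    # QI of outcome 0
    eqi = p * qi1 + (1 - p) * qi0        # per-trial EQI (= average EQI here)
    vqi = p * (qi1 - eqi) ** 2 + (1 - p) * (qi0 - eqi) ** 2  # per-trial VQI
    margin = eqi + math.log(eps) / n
    if margin <= 0:
        raise ValueError("theorem needs average EQI > -log(eps)/n")
    bound = 1 - (vqi / n) / margin ** 2

    exact = 0.0
    for k in range(n + 1):               # k = number of trials with outcome 1
        qi_seq = k * qi1 + (n - k) * qi0   # QI[e^n] = log(P_i / P_j)
        if qi_seq > -math.log(eps):        # equivalent to P_j / P_i < eps
            exact += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return exact, bound


exact, bound = exact_and_bound(0.7, 0.5, 200, 0.01)
print(exact, bound)  # the exact probability exceeds the bound
```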

The previous theorem shows that when $\underline{\mathrm{VQI}}$ is bounded above (and $\underline{\mathrm{EQI}}$ does not get too small), a sufficiently long stream of evidence will very likely result in the refutation of false competitors of a true hypothesis. This claim holds regardless of whether the evidence can be chunked into independent pieces. However, we can use the independence conditions to describe a very simple provision under which $\underline{\mathrm{VQI}}$ is indeed bounded above. This gives us the theorem stated in the main text.

Likelihood Ratio Convergence Theorem 2 — The Non-falsifying Refutation Theorem.
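Suppose that the independent evidence conditions hold. And suppose $\gamma > 0$ is a number smaller than $1/e^2$ ($\approx 0.135$, where $e$ is the base of the natural logarithm). And suppose that for each possible outcome $o_{ku}$ of each observation condition $c_k$ in $c^n$, either $P[o_{ku} \mid h_i\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})] = 0$ or $P[o_{ku} \mid h_j\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})] / P[o_{ku} \mid h_i\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})] \ge \gamma$. Choose positive $\varepsilon < 1$, as small as you like, but large enough (for the number of observations $n$ being contemplated) that $\underline{\mathrm{EQI}}[c^n \mid h_i/h_j \mid b] > -(\log\varepsilon)/n$. Then

$$P\big[\{e^n:\; P[e^n \mid h_j\cdot b\cdot c^n] / P[e^n \mid h_i\cdot b\cdot c^n] < \varepsilon\} \mid h_i\cdot b\cdot c^n\big] \ >\ 1 - \frac{1}{n}\cdot\frac{(\log\gamma)^2}{\big(\underline{\mathrm{EQI}}[c^n \mid h_i/h_j \mid b] + (\log\varepsilon)/n\big)^2}$$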
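
Theorem 2 replaces the average VQI with the fixed quantity $(\log\gamma)^2$, so the bound depends only on $\gamma$, $\varepsilon$, $n$, and the average EQI. Below is a minimal sketch under the independent evidence conditions, with each experiment given (hypothetically) as a pair of outcome-likelihood lists; it checks the theorem's hypotheses on $\gamma$ before evaluating the bound. The function name, the input format, and the sample likelihoods are illustrative assumptions, and natural logarithms are assumed.

```python
import math


def theorem2_bound(experiments, gamma: float, eps: float) -> float:
    """Theorem 2 lower bound on P[LR(h_j/h_i) < eps | h_i.b.c^n].

    experiments: list of (lik_i, lik_j) likelihood lists, one per c_k,
    assumed independent; n is the number of experiments.
    """
    n = len(experiments)
    if not 0 < gamma < math.exp(-2):
        raise ValueError("Theorem 2 requires 0 < gamma < 1/e^2")
    total_eqi = 0.0
    for lik_i, lik_j in experiments:
        for p_i, p_j in zip(lik_i, lik_j):
            # Each outcome must have P_i = 0 or P_j / P_i >= gamma.
            if p_i > 0 and p_j / p_i < gamma:
                raise ValueError("an outcome has P_j/P_i below gamma")
        # Under independence, EQI[c^n] is the sum of the per-experiment EQIs.
        total_eqi += sum(p_i * math.log(p_i / p_j)
                         for p_i, p_j in zip(lik_i, lik_j) if p_i > 0)
    margin = total_eqi / n + math.log(eps) / n
    if margin <= 0:
        raise ValueError("theorem needs average EQI > -log(eps)/n")
    return 1 - (math.log(gamma) ** 2 / n) / margin ** 2


experiment = ([0.5, 0.45, 0.05], [0.06, 0.44, 0.5])  # smallest P_j/P_i is 0.12
b100 = theorem2_bound([experiment] * 100, gamma=0.1, eps=0.001)
b400 = theorem2_bound([experiment] * 400, gamma=0.1, eps=0.001)
print(b100, b400)
```

Any $\gamma$ below both $1/e^2$ and the smallest ratio $P_j/P_i$ satisfies the hypotheses; a larger admissible $\gamma$ gives a smaller $(\log\gamma)^2$ and hence a tighter bound.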

Proof: This follows from Theorem 2* together with the following observation, which holds given the independence conditions:

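If for each $c_k$ in $c^n$, for each of its possible outcomes $o_{ku}$, either $P[o_{ku} \mid h_i\cdot b\cdot c_k] = 0$ or $P[o_{ku} \mid h_j\cdot b\cdot c_k]/P[o_{ku} \mid h_i\cdot b\cdot c_k] \ge \gamma > 0$, where $\gamma < 1/e^2$, then $\underline{\mathrm{VQI}}[c^n \mid h_i/h_j \mid b] \le (\log\gamma)^2$.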

To see that this observation holds, assume its antecedent.

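1. First notice that when $0 < P[e_k \mid h_j\cdot b\cdot c_k] < P[e_k \mid h_i\cdot b\cdot c_k]$ we have
$$\big(\log(P[e_k \mid h_i\cdot b\cdot c_k]/P[e_k \mid h_j\cdot b\cdot c_k])\big)^2 \cdot P[e_k \mid h_i\cdot b\cdot c_k] \le (\log\gamma)^2 \cdot P[e_k \mid h_i\cdot b\cdot c_k],$$
since $\gamma \le P[e_k \mid h_j\cdot b\cdot c_k]/P[e_k \mid h_i\cdot b\cdot c_k] < 1$ gives $0 < \log(P[e_k \mid h_i\cdot b\cdot c_k]/P[e_k \mid h_j\cdot b\cdot c_k]) \le -\log\gamma$.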

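So we need only establish that when $P[e_k \mid h_j\cdot b\cdot c_k] > P[e_k \mid h_i\cdot b\cdot c_k] > 0$, we will also have this relationship, i.e., we will also have

$$\big(\log(P[e_k \mid h_i\cdot b\cdot c_k]/P[e_k \mid h_j\cdot b\cdot c_k])\big)^2 \cdot P[e_k \mid h_i\cdot b\cdot c_k] \le (\log\gamma)^2 \cdot P[e_k \mid h_i\cdot b\cdot c_k].$$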

(Then it will follow easily that $\underline{\mathrm{VQI}}[c^n \mid h_i/h_j \mid b] \le (\log\gamma)^2$, and we will be done.)

2. To establish the needed relationship, suppose that $P[e_k \mid h_j\cdot b\cdot c_k] > P[e_k \mid h_i\cdot b\cdot c_k] > 0$. Notice that for fixed $q$ with $0 < q \le 1$, the function $g(p) = (\log(p/q))^2 \cdot p$ on $0 < p \le q$ has a minimum at $p = q$, where $g(q) = 0$, and (for $p < q$) has its maximum at $p = q/e^2$, i.e., at $p/q = 1/e^2$. (To get this, take the derivative of $g$ with respect to $p$, $g'(p) = \log(p/q)\cdot(\log(p/q) + 2)$, and set it equal to 0; for $p < q$ the only root is $p = q/e^2$, where $g$ attains its maximum value $4q/e^2$.)

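So, for $0 < P[e_k \mid h_i\cdot b\cdot c_k] < P[e_k \mid h_j\cdot b\cdot c_k]$ we have

$$\big(\log(P[e_k \mid h_i\cdot b\cdot c_k]/P[e_k \mid h_j\cdot b\cdot c_k])\big)^2 \cdot P[e_k \mid h_i\cdot b\cdot c_k] \le (\log(1/e^2))^2 \cdot P[e_k \mid h_j\cdot b\cdot c_k] \le (\log\gamma)^2 \cdot P[e_k \mid h_j\cdot b\cdot c_k]$$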

(since for $\gamma \le 1/e^2$ we have $\log\gamma \le \log(1/e^2) < 0$, so $(\log\gamma)^2 \ge (\log(1/e^2))^2 > 0$).
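
The calculus in step 2 can be spot-checked numerically. Since $g$ peaks at $p = q/e^2$ with value $4q/e^2$, and $4q/e^2 \le 4q = (\log(1/e^2))^2\, q$, the first inequality of step 2 follows. The grid search below is only an illustration; the value of $q$ and the grid resolution are arbitrary choices, and natural logarithms are assumed.

```python
import math


def g(p: float, q: float) -> float:
    """g(p) = (log(p/q))**2 * p, for 0 < p <= q (natural log)."""
    return math.log(p / q) ** 2 * p


q = 0.6                                              # arbitrary fixed value of P[e_k | h_j.b.c_k]
steps = 100_000
grid = [q * t / steps for t in range(1, steps + 1)]  # points in (0, q]
p_star = max(grid, key=lambda p: g(p, q))

print(p_star, q / math.e ** 2)                       # numerical vs analytic maximiser
print(g(p_star, q), 4 * q / math.e ** 2)             # numerical vs analytic maximum
```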

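3. Now (assuming the antecedent of the observation), for each $c_k$, writing $\mathrm{QI}[o_{ku}]$ for $\mathrm{QI}[o_{ku} \mid h_i/h_j \mid b\cdot c_k]$ and $\mathrm{EQI}[c_k]$ for $\mathrm{EQI}[c_k \mid h_i/h_j \mid b]$,
$$
\begin{aligned}
\mathrm{VQI}[c_k \mid h_i/h_j \mid b] &= \sum_{\{o_{ku}:\; P[o_{ku} \mid h_j\cdot b\cdot c_k] > 0\}} \big(\mathrm{EQI}[c_k] - \mathrm{QI}[o_{ku}]\big)^2 \cdot P[o_{ku} \mid h_i\cdot b\cdot c_k]\\
&= \sum_{\{o_{ku}:\; P[o_{ku} \mid h_j\cdot b\cdot c_k] > 0\}} \big(\mathrm{EQI}[c_k]^2 - 2\,\mathrm{QI}[o_{ku}]\,\mathrm{EQI}[c_k] + \mathrm{QI}[o_{ku}]^2\big) \cdot P[o_{ku} \mid h_i\cdot b\cdot c_k]\\
&= \sum_{\{o_{ku}:\; P[o_{ku} \mid h_j\cdot b\cdot c_k] > 0\}} \mathrm{QI}[o_{ku}]^2 \cdot P[o_{ku} \mid h_i\cdot b\cdot c_k] - \mathrm{EQI}[c_k]^2\\
&\le \sum_{\{o_{ku}:\; P[o_{ku} \mid h_j\cdot b\cdot c_k] > 0\}} \mathrm{QI}[o_{ku}]^2 \cdot P[o_{ku} \mid h_i\cdot b\cdot c_k]\\
&\le (\log\gamma)^2.
\end{aligned}
$$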

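So, given independence, the VQI Decomposition Theorem yields

$$\underline{\mathrm{VQI}}[c^n \mid h_i/h_j \mid b] = \frac{1}{n}\sum_{k=1}^{n} \mathrm{VQI}[c_k \mid h_i/h_j \mid b] \le (\log\gamma)^2.$$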
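
The algebra in step 3 rests on the identity $\mathrm{VQI} = \sum \mathrm{QI}^2 \cdot P - \mathrm{EQI}^2$. The sketch below checks that identity for one hypothetical three-outcome experiment whose likelihoods satisfy the antecedent with $\gamma = 0.1$, and confirms that in this example both quantities at the end of the chain stay below $(\log\gamma)^2$. It is a spot check of a single assumed case, not a proof; the likelihood values are made up for illustration, and natural logarithms are assumed.

```python
import math

# Hypothetical likelihoods for one condition c_k (every P_j/P_i is at least 0.1).
lik_i = [0.5, 0.45, 0.05]
lik_j = [0.06, 0.44, 0.5]
gamma = 0.1

qis = [math.log(pi / pj) for pi, pj in zip(lik_i, lik_j)]
eqi = sum(q * pi for q, pi in zip(qis, lik_i))
vqi = sum((eqi - q) ** 2 * pi for q, pi in zip(qis, lik_i))    # the definition
second_moment = sum(q ** 2 * pi for q, pi in zip(qis, lik_i))  # sum of QI^2 * P

# Identity used in step 3: VQI = sum(QI^2 * P) - EQI^2.
print(vqi, second_moment - eqi ** 2)
# In this example both ends of the chain lie below (log gamma)^2.
print(vqi, second_moment, math.log(gamma) ** 2)
```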