#### Supplement to Inductive Logic

## The Effect on EQI of Partitioning the Outcome Space More Finely — and Proof of the Nonnegativity of EQI Theorem

Here again we will only explicitly treat the case where
*condition-independence* is assumed. If
*result-independence* holds as well, all occurrences of
‘(*c*^{k−1}·*e*^{k−1})’
may be dropped, which gives the theorem stated in the text. If neither
*independence condition* holds, all occurrences of
‘*c*_{k}·(*c*^{k−1}·*e*^{k−1})’
here are replaced by
‘*c*^{n}·*e*^{k−1}’,
and occurrences of
‘*b*·*c*^{k−1}’ are
replaced by
‘*b*·*c*^{n}’.

Given some experiment or observation (or series of them) *c*,
is there any special advantage to parsing the space of possible
outcomes *O* into more, rather than fewer alternatives?
Couldn't we do as well at confirming hypotheses by parsing the space
of outcomes into only two or three alternatives, e.g., one
possible outcome that *h*_{i} says is very
likely and *h*_{j} says is rather unlikely
(e.g., describing a *rejection region* for
*h*_{j}), one that
*h*_{i} says is rather unlikely and
*h*_{j} says is very likely (e.g., describing
a *rejection region* for *h*_{i}), and
perhaps a third outcome on which *h*_{i} and
*h*_{j} pretty much agree? The answer is
*no*, we cannot generally do as well at confirming hypotheses
this way. In general, parsing the space of outcomes into more
empirically distinct alternatives results in a better measure of
confirmation. To see this intuitively, suppose some outcome
description *q* can be parsed into two distinct outcome
descriptions, *q*_{1} and *q*_{2} (where
*q* is equivalent to
(*q*_{1}∨
*q*_{2})), and suppose that
*h*_{i} differs from
*h*_{j} much more on the likelihood of
*q*_{1} than on the likelihood of
*q*_{2}. Then, intuitively, when *q* is found to
be true, whichever of the more precise descriptions,
*q*_{1} or *q*_{2}, is true should make
a difference in how strongly the hypotheses are supported. So
reporting whichever of *q*_{1} or
*q*_{2} occurs will be more informative than simply
reporting *q*. If the outcome of the experiment is only
described as *q*, relevant information is lost.

It turns out that EQI measures how well possible outcomes can distinguish between hypotheses in a way that reflects the intuition that a finer partition of outcomes is more informative. The numerical value of EQI is always made larger by parsing the outcome space more finely, provided that the likelihoods for outcomes in the finer parsing differ at least a bit from the likelihoods for outcomes of a less refined parsing. This is important for our main convergence result because in that theorem we want EQI to be positive, and the larger the better.
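The intuition can be illustrated with a small numerical sketch. The likelihood values below and the helper name `eqi` are hypothetical, chosen only for illustration; the function evaluates the sum of *r*_{u}·log[*r*_{u}/*s*_{u}] over outcomes, where *r*_{u} is the likelihood of an outcome on *h*_{i} and *s*_{u} its likelihood on *h*_{j}. The natural logarithm is assumed here; a different base would only rescale the values.

```python
import math

def eqi(r, s):
    """Sum of r_u * log(r_u / s_u) over outcomes with positive likelihoods.

    r: likelihoods of the outcomes according to h_i (hypothetical values)
    s: likelihoods of the same outcomes according to h_j (hypothetical values)
    Natural log is an assumption; another base only rescales the result.
    """
    return sum(ru * math.log(ru / su) for ru, su in zip(r, s) if ru > 0 and su > 0)

# Coarse partition: outcome q and its complement.
coarse_r = [0.6, 0.4]
coarse_s = [0.4, 0.6]

# Finer partition: q is split into q1 and q2, and the hypotheses differ
# much more on q1 (0.5 vs 0.1) than on q2 (0.1 vs 0.3).
fine_r = [0.5, 0.1, 0.4]
fine_s = [0.1, 0.3, 0.6]

coarse_value = eqi(coarse_r, coarse_s)
fine_value = eqi(fine_r, fine_s)
print(coarse_value, fine_value)
```

With these assumed numbers the finer partition yields the larger value, which is what the Partition Theorem proved in this supplement leads one to expect.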

The following **Partition Theorem** implies the
**Nonnegativity of EQI** theorem. It shows that each
EQI[*c*_{k} | *h*_{i}/*h*_{j} | *b*·(*c*^{k−1}·*e*^{k−1})]
must be non-negative, and will be positive *just in case* for
at least one possible outcome *o*_{ku},
*P*[*o*_{ku} | *h*_{j}·*b*·*c*_{k}·(*c*^{k−1}·*e*^{k−1})]
≠
*P*[*o*_{ku} | *h*_{i}·*b*·*c*_{k}·(*c*^{k−1}·*e*^{k−1})].
It also shows that
EQI[*c*_{k} | *h*_{i}/*h*_{j}
|*b*·(*c*^{k−1}·*e*^{k−1})]
generally becomes larger with finer partitionings of the outcome
space.

Notice that this result (when proved) implies that

$$\mathrm{EQI}[c_k \mid h_i/h_j \mid b\cdot c^{k-1}] = \sum_{\{e^{k-1}\}} \mathrm{EQI}[c_k \mid h_i/h_j \mid b\cdot(c^{k-1}\cdot e^{k-1})] \cdot P[e^{k-1} \mid h_i\cdot b\cdot c^{k-1}]$$

must be non-negative, and will be positive *iff* for at least
one possible outcome *o*_{ku},

$$P[o_{ku} \mid h_j\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})] \ne P[o_{ku} \mid h_i\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})].$$

And since,

$$\mathrm{EQI}[c^n \mid h_i/h_j \mid b] = \sum_{k=1}^{n} \mathrm{EQI}[c_k \mid h_i/h_j \mid b\cdot c^{k-1}],$$

we also get that the average EQI,
EQI[*c*^{n} | *h*_{i}/*h*_{j} | *b*],
must be non-negative, and must be positive *iff* for some *k*,

$$P[o_{ku} \mid h_j\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})] \ne P[o_{ku} \mid h_i\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})];$$

and it becomes larger as finer partitionings make the component
EQI[*c*_{k} | *h*_{i}/*h*_{j} | *b*·(*c*^{k−1}·*e*^{k−1})]
larger.

Partition Theorem:

For any positive real numbers *r*_{1}, *r*_{2}, *s*_{1}, *s*_{2}:

1. if $r_1/s_1 > (r_1+r_2)/(s_1+s_2)$, then $(r_1+r_2)\log[(r_1+r_2)/(s_1+s_2)] < r_1\log[r_1/s_1] + r_2\log[r_2/s_2]$; and
2. if $r_1/s_1 = (r_1+r_2)/(s_1+s_2)$, then $r_1\log[r_1/s_1] + r_2\log[r_2/s_2] = (r_1+r_2)\log[(r_1+r_2)/(s_1+s_2)]$.
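Both cases of the theorem can be checked numerically for particular values. The following is only a sketch under stated assumptions: the four numbers in each case are hypothetical, the function names `merged_term` and `split_terms` are introduced here for illustration, and the natural logarithm is assumed.

```python
import math

def merged_term(r1, r2, s1, s2):
    """(r1 + r2) * log[(r1 + r2) / (s1 + s2)]: the two outcomes treated as one."""
    return (r1 + r2) * math.log((r1 + r2) / (s1 + s2))

def split_terms(r1, r2, s1, s2):
    """r1 * log[r1/s1] + r2 * log[r2/s2]: the two outcomes kept separate."""
    return r1 * math.log(r1 / s1) + r2 * math.log(r2 / s2)

# Case (1): r1/s1 = 5 exceeds (r1 + r2)/(s1 + s2) = 1.5 (hypothetical values).
case1 = (0.5, 0.1, 0.1, 0.3)

# Case (2): r1/s1 = r2/s2 = (r1 + r2)/(s1 + s2) = 2 (hypothetical values).
case2 = (0.5, 0.25, 0.25, 0.125)

print(merged_term(*case1), split_terms(*case1))  # merged is strictly smaller
print(merged_term(*case2), split_terms(*case2))  # the two agree up to rounding
```

A check on a handful of values is of course no substitute for the proof; it only shows what the two cases assert.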

For the **Proof**, first notice that

$$r_1/s_1 = (r_1+r_2)/(s_1+s_2) \quad\text{iff}\quad r_1 s_1 + r_1 s_2 = s_1 r_1 + s_1 r_2 \quad\text{iff}\quad r_1/s_1 = r_2/s_2.$$

We establish case (2) first. Suppose the antecedent of case (2) holds. Then, by the equivalences just noted, *r*_{2}/*s*_{2} also equals (*r*_{1}+*r*_{2})/(*s*_{1}+*s*_{2}), so

$$\begin{aligned}
r_1\log[r_1/s_1] + r_2\log[r_2/s_2] &= r_1\log[(r_1+r_2)/(s_1+s_2)] + r_2\log[(r_1+r_2)/(s_1+s_2)] \\
&= (r_1+r_2)\log[(r_1+r_2)/(s_1+s_2)].
\end{aligned}$$

To get case (1), consider the following function of *p*:
*f*(*p*) = *p* log[*p*/*u*] + (1−*p*) log[(1−*p*)/*v*],
where we only assume that *u* > 0, *v* > 0, and
0 < *p* < 1. This function has its minimum value when
*p* = *u*/(*u*+*v*). (This is easily
verified by setting the derivative of *f*(*p*) with
respect to *p* equal to 0 to find the minimum value of
*f*(*p*); and it is easy to verify that this is a
minimum rather than a maximum value.) At this minimum, where
*p* = *u*/(*u*+*v*), we have

$$f(p) = -\frac{u}{u+v}\log[u+v] - \frac{v}{u+v}\log[u+v] = -\log[u+v].$$

Thus, for all values of *p* other than *u*/(*u*+*v*),

$$-\log[u+v] < f(p) = p\log[p/u] + (1-p)\log[(1-p)/v].$$
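The verification by differentiation mentioned above can be written out as follows. This is a sketch that assumes the natural logarithm; for any other base, each derivative is multiplied by a positive constant and the conclusions are unchanged.

```latex
\begin{aligned}
f'(p) &= \log[p/u] + 1 - \log[(1-p)/v] - 1 = \log\!\left[\frac{p\,v}{u\,(1-p)}\right],\\
f'(p) = 0 &\iff p\,v = u\,(1-p) \iff p = \frac{u}{u+v},\\
f''(p) &= \frac{1}{p} + \frac{1}{1-p} > 0 \quad \text{for } 0 < p < 1.
\end{aligned}
```

Since the second derivative is positive throughout the interval, *f* is strictly convex there, so the critical point is its unique minimum. That is why the inequality is strict at every other value of *p*.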

Now, let *p* =
*r*_{1}/(*r*_{1}+*r*_{2}), let *u* =
*s*_{1}/(*r*_{1}+*r*_{2}), and let *v* =
*s*_{2}/(*r*_{1}+*r*_{2}). Plugging into the
previous formula, and multiplying both sides by
(*r*_{1}+*r*_{2}), we get:

if

$$r_1/(r_1+r_2) \ne s_1/(s_1+s_2) \quad (\text{i.e., if } r_1/s_1 \ne (r_1+r_2)/(s_1+s_2)),$$

then

$$(r_1+r_2)\log[(r_1+r_2)/(s_1+s_2)] < r_1\log[r_1/s_1] + r_2\log[r_2/s_2].$$

This completes the proof of the theorem.
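The substitution used in the last step of the proof can be spelled out term by term. Nothing new is assumed here; the lines below only expand the step already described, using the same *p*, *u*, and *v*.

```latex
\begin{aligned}
u + v &= \frac{s_1+s_2}{r_1+r_2}, \qquad -\log[u+v] = \log\!\left[\frac{r_1+r_2}{s_1+s_2}\right],\\
\frac{p}{u} &= \frac{r_1}{s_1}, \qquad \frac{1-p}{v} = \frac{r_2}{s_2}, \qquad \frac{u}{u+v} = \frac{s_1}{s_1+s_2},\\
\log\!\left[\frac{r_1+r_2}{s_1+s_2}\right] &< \frac{r_1}{r_1+r_2}\log\!\left[\frac{r_1}{s_1}\right] + \frac{r_2}{r_1+r_2}\log\!\left[\frac{r_2}{s_2}\right] \quad \text{whenever } \frac{r_1}{r_1+r_2} \ne \frac{s_1}{s_1+s_2}.
\end{aligned}
```

Multiplying the last line through by the positive quantity (*r*_{1}+*r*_{2}) gives the strict inequality of case (1).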

To apply this result to
EQI[*c*_{k} | *h*_{i}/*h*_{j} |
*b*·(*c*^{k−1}·*e*^{k−1})] recall
that

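$$\mathrm{EQI}[c_k \mid h_i/h_j \mid b\cdot(c^{k-1}\cdot e^{k-1})] = \sum_{\{u:\, P[o_{ku} \mid h_j\cdot b\cdot c_k] > 0\}} \log\!\left[\frac{P[o_{ku} \mid h_i\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})]}{P[o_{ku} \mid h_j\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})]}\right] \cdot P[o_{ku} \mid h_i\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})].$$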

Suppose *c*_{k} has *m* alternative outcomes *o*_{ku} on which both

$$P[o_{ku} \mid h_j\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})] > 0$$

and

$$P[o_{ku} \mid h_i\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})] > 0.$$

Let's label their likelihoods relative to
*h*_{i} (i.e., their likelihoods
*P*[*o*_{ku} | *h*_{i}·*b*·*c*_{k}·(*c*^{k−1}·*e*^{k−1})])
as *r*_{1}, *r*_{2}, …,
*r*_{m}. And let's label their likelihoods
relative to *h*_{j} as *s*_{1},
*s*_{2}, …, *s*_{m}. In terms of
this notation,

$$\mathrm{EQI}[c_k \mid h_i/h_j \mid b\cdot(c^{k-1}\cdot e^{k-1})] = \sum_{u=1}^{m} r_u\cdot\log[r_u/s_u].$$

Notice also that
(*r*_{1}+*r*_{2}+*r*_{3}+…+*r*_{m}) = 1
and (*s*_{1}+*s*_{2}+*s*_{3}+…+*s*_{m})
= 1.

Now, think of
EQI[*c*_{k} | *h*_{i}/*h*_{j} | *b*·(*c*^{k−1}·*e*^{k−1})]
as generated by applying the theorem in successive steps:

$$\begin{aligned}
0 &= 1\cdot\log[1/1] \\
&= (r_1+r_2+r_3+\dots+r_m)\cdot\log[(r_1+r_2+r_3+\dots+r_m)/(s_1+s_2+s_3+\dots+s_m)] \\
&\le r_1\cdot\log[r_1/s_1] + (r_2+r_3+\dots+r_m)\cdot\log[(r_2+r_3+\dots+r_m)/(s_2+s_3+\dots+s_m)] \\
&\le r_1\cdot\log[r_1/s_1] + r_2\cdot\log[r_2/s_2] + (r_3+\dots+r_m)\cdot\log[(r_3+\dots+r_m)/(s_3+\dots+s_m)] \\
&\le \dots \\
&\le \sum_{u=1}^{m} r_u\cdot\log[r_u/s_u] \\
&= \mathrm{EQI}[c_k \mid h_i/h_j \mid b\cdot(c^{k-1}\cdot e^{k-1})].
\end{aligned}$$
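The successive steps of this chain can be traced numerically. The sketch below is illustrative only: the function name `chain_of_bounds` and the likelihood values are hypothetical, the natural logarithm is assumed, and both lists are assumed to contain only positive entries that sum to 1.

```python
import math

def chain_of_bounds(r, s):
    """Values of the chain, from all outcomes merged to all outcomes separated.

    Entry number `split` keeps the first `split` outcomes separate and merges
    the remaining ones into a single disjunctive outcome.
    """
    values = []
    for split in range(len(r)):
        # Contribution of the outcomes already separated out.
        head = sum(r[u] * math.log(r[u] / s[u]) for u in range(split))
        # Contribution of the still-merged remainder o_split v ... v o_m.
        tail_r = sum(r[split:])
        tail_s = sum(s[split:])
        values.append(head + tail_r * math.log(tail_r / tail_s))
    return values

# Hypothetical likelihoods on h_i (r_example) and on h_j (s_example).
r_example = [0.5, 0.1, 0.4]
s_example = [0.1, 0.3, 0.6]

values = chain_of_bounds(r_example, s_example)
print(values)  # starts near 0 and never decreases
```

The first entry corresponds to the fully merged outcome space, and the last entry is the full sum of the *r*_{u}·log[*r*_{u}/*s*_{u}] terms.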

The theorem also says that *at each step* equality holds just
in case

$$r_u/s_u = (r_u+r_{u+1}+\dots+r_m)/(s_u+s_{u+1}+\dots+s_m),$$

which itself holds just in case

$$r_u/s_u = (r_{u+1}+\dots+r_m)/(s_{u+1}+\dots+s_m).$$

So,

$$\mathrm{EQI}[c_k \mid h_i/h_j \mid b\cdot(c^{k-1}\cdot e^{k-1})] = 0$$

just in case

$$\begin{aligned}
1 &= (r_1+r_2+r_3+\dots+r_m)/(s_1+s_2+s_3+\dots+s_m) \\
&= r_1/s_1 = (r_2+r_3+\dots+r_m)/(s_2+s_3+\dots+s_m) \\
&= r_2/s_2 = (r_3+\dots+r_m)/(s_3+\dots+s_m) \\
&= r_3/s_3 = \dots = r_m/s_m.
\end{aligned}$$

That is,

$$\mathrm{EQI}[c_k \mid h_i/h_j \mid b\cdot(c^{k-1}\cdot e^{k-1})] = 0$$

just in case for all *o*_{ku} such that
*P*[*o*_{ku} | *h*_{j}·*b*·*c*_{k}·(*c*^{k−1}·*e*^{k−1})]
> 0 and
*P*[*o*_{ku} | *h*_{i}·*b*·*c*_{k}·(*c*^{k−1}·*e*^{k−1})]
> 0,

$$P[o_{ku} \mid h_i\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})] \,/\, P[o_{ku} \mid h_j\cdot b\cdot c_k\cdot(c^{k-1}\cdot e^{k-1})] = 1.$$

Otherwise,

$$\mathrm{EQI}[c_k \mid h_i/h_j \mid b\cdot(c^{k-1}\cdot e^{k-1})] > 0;$$

and for each successive step in partitioning the outcome space to
generate
EQI[*c*_{k} | *h*_{i}/*h*_{j} | *b*·(*c*^{k−1}·*e*^{k−1})],
if

$$r_u/s_u \ne (r_u+r_{u+1}+\dots+r_m)/(s_u+s_{u+1}+\dots+s_m),$$

we have the strict inequality:

$$(r_u+r_{u+1}+\dots+r_m)\cdot\log[(r_u+r_{u+1}+\dots+r_m)/(s_u+s_{u+1}+\dots+s_m)] < r_u\cdot\log[r_u/s_u] + (r_{u+1}+\dots+r_m)\cdot\log[(r_{u+1}+\dots+r_m)/(s_{u+1}+\dots+s_m)].$$

So each such partitioning of
(*o*_{ku}∨*o*_{ku+1}∨…∨*o*_{km})
into two separate
propositions, *o*_{ku} and
(*o*_{ku+1}∨…∨*o*_{km}),
adds a strictly positive contribution to the size of
EQI[*c*_{k} | *h*_{i}/*h*_{j} | *b*·(*c*^{k−1}·*e*^{k−1})].