Stanford Encyclopedia of Philosophy
This is a file in the archives of the Stanford Encyclopedia of Philosophy.

Combinatory Logic

First published Fri Nov 14, 2008

Combinatory logic (henceforth: CL) is an elegant and powerful logical theory that is connected to many areas of logic, and has found applications in other disciplines, especially, in computer science and mathematics.

CL was originally invented as a continuation of the reduction of the set of logical constants to a singleton in classical first-order logic (FOL). CL untangles the problem of substitution, because formulas can be prepared for the elimination of bound variables by inserting combinators. Sometimes, bound variables are thought to signify “ontological commitments.” A philosophical rôle of CL is to show the variability of the ontological involvements a theory has.

Substitution is a crucial operation not only in first-order logics, but also in higher-order logics, as well as in other formal systems that contain a variable binding operator, such as the λ-calculi and the ε-calculus. Indeed, carrying out substitution correctly is particularly pressing in λ-calculi and in the closely related functional programming languages. CL can emulate λ-abstraction despite the fact that CL has no variable binding operators, which makes CL a particularly suitable language for compilers of functional programming languages.

The connection to λ-calculi might suggest—correctly—that CL is sufficiently expressive to formalize recursive functions (i.e., computable functions) and arithmetic. Consequently, CL is susceptible to Gödel-type incompleteness theorems.

CL is an archetypical term rewriting system (TRS). These systems comprise a wide range of formal calculi from syntactic specifications of programming languages and context-free grammars to Markov algorithms; even some number theoretic problems may be viewed as special instances of questions about TRSs. Several notions and proof techniques that were originally invented for CL, later turned out to be useful in applications to less well-understood TRSs.

CL is connected to nonclassical logics via typing. First, a correspondence between formulas that are provable in the implicational fragment of intuitionistic logic and the typable combinatory terms was discovered. Then the isomorphism was generalized to other combinatory bases and implicational logics (such as the logic of relevant implication, exponential-free linear logic, affine logic, etc.).

Self-reference factors into some paradoxes, most interestingly into “The liar” and Russell's paradox. The set theoretical understanding of functions also discourages the idea of self-application. Thus it is remarkable that pure untyped CL does not exclude self-application of functions. Moreover, its mathematical models showed that a theory in which functions can become their own arguments is completely sensible, in addition to being consistent (what was established earlier using proof theoretic methods).

1. Schönfinkel's elimination of bound variables

1.1 The problem of substitution

Classical first-order logic includes quantifiers that are denoted by ∀ (“for all”) and ∃ (“there is a”). A simple sentence such as “All birds are animals” may be formalized as ∀x(Bx ⊃ Ax), where x is a variable, B and A are one-place predicates, and ⊃ is a (material) implication. The occurrences of the variables in ∀x(Bx ⊃ Ax) are bound, whereas those in the open formula Bx ⊃ Ax are free. If we assume that t (“Tweety”) is a name constant, then an instance of the above sentence is Bt ⊃ At, that may be read as “Tweety is an animal, provided Tweety is a bird.” This illustrates that the instantiation of a (universal) quantifier involves substitution.

Due to the simplicity of the example, the substitution of t for x in B and in A seems to be easy to understand and to perform. However, a definition of substitution for FOL (and in general, for an abstract syntax, that is, for a language with a variable binding operator) has to guarantee that no free occurrence of a variable in the substituted expression becomes bound in the resulting expression.

To see what can go wrong, let us consider the (open) formula ∀x(Rxy ∧ Rxr), where R is a two-place predicate, r is a name constant abbreviating “Russell” and ∧ is conjunction. ∀x(Rxy ∧ Rxr) contains a free occurrence of y (that is, y is a free variable of the formula), however, y is not free for a substitution of a term that contains a free occurrence of x, for instance, x itself. More formally, the occurrence of y in the second argument place of R in ∀x(Rxy ∧ Rxr) is not bound by a quantifier (the only quantifier of the formula), whereas ∀x(Rxx ∧ Rxr) is a closed formula, that is, it contains no free occurrences of variables. Informally, the following natural language sentences could be thought of as interpretations of the previous formulas. “Everybody reads him and Russell,” (where ‘him’ is deictic, or perhaps, anaphoric) and “Everybody reads himself and Russell.” Obviously, the meanings of the two sentences are vastly different, even if we assume that everybody pens something. As a contrast, ∀x(Rxw ∧ Rxr) exhibits an unproblematic substitution of the name constant w for the free occurrence of y. (The expression, perhaps, formalizes the sentence “Everybody reads Wittgenstein and Russell.”) These examples are meant to demonstrate the more complex part of the problem Schönfinkel set out to solve, and for what he invented CL.[1]

1.2 The operators “nextand” and “U

A well-known result about classical sentential logic (SL) is that all truth-functions can be expressed in terms of ¬ and ∧ (or of ¬ and ∨, etc.). A minimal sufficient set of connectives, however, can contain just one connective such as  ∣  (“nand,” what is often called, Sheffer's stroke), or ↓ (“nor”). “Nand” is “not-and,” in other words, A ∣ B is defined as ¬(A ∧ B), where A, B range over formulas and ¬ is negation. Going into the other direction, if  ∣  is a primitive, then ¬ A is definable as A ∣ A. Although formulas with numerous vertical lines may quickly become visually confusing and hard to parse, it is straightforward to prove that  ∣  alone is sufficiently expressive to define all the truth-functions.

Schönfinkel's aim was to minimize the number of logical constants that are required for a formalization of FOL, just as Sheffer (indeed, already Ch. S. Peirce) did for classical propositional logic. One of the two quantifiers mentioned above may be assumed to be defined without loss of generality. Let us say, ∃x A is an abbreviation for ¬∀x ¬A. Even if ¬ and the rest of the connectives are traded in for  ∣ , two logical constants remain: ∀ and  ∣ . A further pressing issue is that quantifiers may be nested (i.e., the scope of a quantifier may fully contain the scope of another quantifier), and the variable bindings (that could be visualized by drawing lines between quantifiers and the variables they bind) may get quite intertwined. Keeping for a moment the familiar logical constants, we may consider the following formula that hints at the emerging difficulties—when the question is tackled in its full generality.[2]

x(∃y(Py ∧Bxy)  ⊃  ∃y(Py ∧ Bxy ∧ ∀z((Rz ∧ Ozy) ⊃ ¬Cz)))

x binds all occurrences of x; the variables in the second argument place of the two Bs are bound by one of the two ∃y's, the latter of which interacts with ∀z via Ozy.

Predicates have a fixed finite arity in FOL, and nothing precludes binding at once a variable in the first argument of one predicate and in the second argument of another predicate. (Indeed, FOL would loose some of its expressiveness, if bindings of this sort would be excluded without some means to compensate for them.) These difficulties persist when a formula is transformed into a(n equivalent) formula in prenex normal form. As long as the variable bindings can interweave and braid into arbitrarily complex patterns, there seems to be no clear strategy for the elimination of bound variables. (Note that free variables in open formulas—in a sense—behave like name constants, and their elimination is neither intended, nor achieved in the procedures described here.)

Schönfinkel's ingenuity was that he introduced combinators to untangle variable bindings. The combinators S, K, I, B and C (in contemporary notation) are his, and he established that S and K suffice to define all the other combinators. In effect, he also defined an algorithm to carry out the elimination of bound variables, which is essentially one of the algorithms used nowadays to define bracket abstraction in CL.[3]

Schönfinkel introduced a new logical constant U, that expresses the disjointness of two classes. For instance, UPQ may be written in usual FOL notation as ¬∃x(Px ∧ Qx), when P and Q are one-place predicates. (The formula may be thought to formalize the natural language sentence “No parrots are quiet.”) In the process of the elimination of the bound variables, UXY is obtained from an expression that contains ‘Xx’ and ‘Yx’, where x does not occur in X or Y, and if X and Y happen to be n-ary predicates with n ≥ 2, then x occurs (only) in their last argument place. For example, “Nobody reads Aristotle and Plato” can be formalized as ¬∃x(Rxa ∧ Rxp), where a and p are name constants that stand for “Aristotle” and “Plato,” respectively. This formula cannot be written as U(Ra)(Rp). On the other hand, “There is nobody whom both Russell and Wittgenstein read,” that is, ¬∃x(Rrx ∧Rwx) turns into U(Rr)(Rw), where the parentheses delineate the arguments of U. Often, the expressions X and Y (in UXY) consist of predicates (and name constants) together with combinators and other U's.

It is useful to have a notation for “nextand” (i.e., “not-exists-and”) without assuming either that x has a free occurrence in the expressions joined, or that if it has one, then it is the last component of the expressions. Following Schönfinkel, we use  ∣x  for the “nextand” operator that binds x. The notation  ∣  (where    is the place for a variable) closely resembles the Sheffer stroke. Indeed, Schönfinkel achieved the reduction of the set of logical constants for FOL to a singleton { ∣ }, because every formula of FOL is equivalent to a formula that contains only “nextand.”

A formula ∀x A is usually defined to be well-formed in FOL even if A has no free occurrences of x. Then, of course, ∀x A is equivalent to A as well as to ∃x A, and such quantifiers are called vacuous. In order to show that any formula can be rewritten into an equivalent formula that contains only “nextand,” it is sufficient to inspect the following definitions for ¬, ∨ and ∀—that are due to Schönfinkel.

¬A   ⇔df   A ∣x A
A ∨ B   ⇔df   (A ∣y A) ∣x (B ∣y B)
x Ax   ⇔df   (Ax ∣y Ax) ∣x (Ax ∣y Ax)

The definition for ¬, for instance, may be justified by the following equivalences. A ⇔ A ∧ A, A ∧ A ⇔ ∃x(A ∧ A), hence by replacement ¬A ⇔ ¬∃x(A ∧ A).

Now we give a concrete example to illustrate how to turn a FOL formula into one that contains only  ∣ , and then how to eliminate the bound variables using U and combinators. To put excitement into the process, we start with the sentence in (#1).

(#1)   For every natural number there is a greater prime.

A straightforward formalization of this sentence—on the foreground of the domain of numbers—is the formula in (#2), (where ‘Nx’ stands for “x is a natural number,” ‘Px’ stands for “x is a prime” and ‘Gxy’ is to be read as “x is greater that y”).

(#2)   ∀yx(Ny ⊃ (Px ∧ Gxy))

This formula is equivalent to ∀y(Ny ⊃ ∃x(Px ∧ Gxy)) and further to ¬∃y ¬(Ny ⊃ ∃x(Px ∧ Gxy)). In one or two more steps, we get Ny ∣y (Px ∣x Gxy). (Expressions are considered to be grouped to the left unless parentheses indicate otherwise. E.g., Gxy is ((Gx)y) not G(xy) as could have been, perhaps, expected based on the most common way of arranging parentheses in FOL formulas.) Unfortunately, neither  ∣x  nor  ∣y  can be replaced by U in the last expression. However, if the arguments of G were permuted then the former reduction could be carried out. One of the combinators, C does exactly what is needed: Gxy can be changed to CGyx (see the definition of combinators in section 2.1). That is, we have Ny ∣y (Px ∣x CGyx), and then Ny ∣y UP(CGy).[4] The expression may give the impression that y is the last component of UP(CGy), which is the second argument of  ∣y , but it is not so. The grouping within expressions cannot be disregarded, and another combinator, B is needed to turn UP(CGy) into the desired form B(UP)(CG)y. From Ny ∣y B (UP)(CG)y, we get UN(B(UP)(CG)) in one more step. This expression is completely free of variables, and it also makes the renaming of bound variables in FOL easily comprehensible: given two sequences of (distinct) variables that are different in their first two elements, the reversal of the above process yields formulas that are (logically equivalent) alphabetic variants of the formula in (#2).

The expression UN(B(UP)(CG)) may look “unfamiliar” when compared to formulas of FOL, but notation—to a large extent—is a matter of convention. It may be interesting to note that the first U is followed by its two arguments, however, the second U is not. B(UP) is a subexpression, but UP(CG) is not a subexpressions of UN(B(UP)(CG)). Furthermore, the whole expression can be transformed into XNPG using combinators, where X is composed of Us and combinators only. Such an X encodes the logical content of the formula with the predicates being arguments.[5]

The expressions obtained via the transformations outlined above quickly become lengthy—as trying to rewrite a simple FOL sentence such as ∃x(Px ∧ Qx) can show.[6] However, this does not diminish the importance of Schönfinkel's theoretical results. A slight increase (if any) in the length of the expressions is not even an inconvenience, let alone an impediment in the era of computers with petabytes of memory.

It seems that Schönfinkel's reduction procedure for FOL is not widely known. As a measure of how widely Sheffer's and Schönfinkel's reductions are known, we appeal to the fact that the first is part of standard intro courses in logic, whereas the second is not. Undoubtedly, one of the reasons for this is that Schönfinkel's process to eliminate bound variables is conceptually more opulent than defining a few truth functions from  ∣  (or ↓). Another reason may be that Schönfinkel, perhaps, did not place a sufficiently strong emphasis on the intermediate step that allows the elimination of all other logical connectives and quantifiers via “nextand.” We may also note that although “nextand” is an operator in the standard logical sense, it is binary—unlike ∀ and ∃, which are singulary.

If A ∣ B  ⇔df  ¬(AB) is added as a definition to SL, then the result is a conservative extension, and it becomes provable that for any formula A(p0, …, pn) (i.e., for a formula containing the displayed propositional variables and some connectives) there is a formula B(p0, …, pn) containing only the connective  ∣ . Furthermore, B(p0, …, pn)  ⇔  A(p0, …, pn) itself is provable.  ∣  is, of course, interpreted as the “nand” truth function. “Nand” as a binary connective or as a binary truth function is of the same sort of object as conjunction, disjunction, etc.

The first stage in Schönfinkel's extension of FOL is analogous. The addition of  ∣  is a conservative extension of FOL, and every occurrence of  ∣  can be eliminated. (We noted that  ∣  is a binary operator, and so it may be thought to combine a quantifier (∃) with connectives (¬, ∧), but  ∣  does not introduce any objects that are not definable in FOL.)

The second stage in Schönfinkel's extension of FOL is slightly different. UXY is definable in FOL only for one-place predicates P and Q (or for predicates of higher arity when the variable in their last argument is bound). Thus neither U is definable in general, nor the combinators are definable.

The elimination of bound variables goes beyond the resources of FOL. The combinators are not only undefinable, but they are new kinds of objects—which are absent in FOL itself. Also, the intermediate steps of the bound variable elimination procedure presuppose that functions of several arguments can be viewed as functions in one variable, and the other way around.[7] Enriching a presentation of FOL with predicate letters that have sufficiently many arguments in the right order would be more or less unproblematic, and it would add objects to the language that would have the same sort of interpretation as other predicates. A potential problem though is that for each predicate, infinitely many (ℵ0 many) new predicates would be needed—together with axioms stipulating the intended equivalences between the meaning of the predicates. Notationally, these steps amount to padding predicate symbols with extra arguments, omitting some arguments, as well as permuting and regrouping the arguments. Although such additions may look superficial, it is crucial for the understanding of Schönfinkel's procedure to eliminate bound variables to view formulas as structured strings of symbols.[8]

In conclusion to this section, it is important to emphasize that there are no questions of consistency with respect to the above reduction process, because it can be viewed—or described in contemporary terms—as a well-defined algorithm. It is a completely different issue that if we consider the language of FOL expanded with combinators, then the resulting system is inconsistent, because CL is powerful enough to define the fixed point of any function. The effect of having fixed points for all functions—including truth functions—may be thought to amount to adding certain biconditionals (which may or may not be valid) as axioms. Notably, both FOL and (pure) CL are consistent.

2. Combinatory terms and their main properties

2.1 Reduction, equality and their formalizations

Cantor's and Russell's paradoxes that were discovered in the late 19th–early 20th century both involve self-membership of a set. The ramified theory of types due to Whitehead and Russell, and ZF (the formalization of set theory named after Zermelo and Fraenkel) exclude self-membership. However, there seems to have been a desire all the time to have a theory that allows self-membership or self-application. Indeed, one of the motivations for Curry to further develop CL was the goal to construct a formal language that allows a wide range of expressions to be well-formed, perhaps, including some expression, which under some interpretations turn out to be meaningless. (This idea may be compared to the Bernays–von Neumann–Gödel formalization of set theory, that does not contain the axiom of foundation. The Russell class simply cannot be proven to be a set, that is, it is a proper class.)

A few natural language examples provide a convenient illustration to clarify the difference between (1), that is a well-formed (but meaningless) expression and (2), which is a meaningful (but ill-formed) sentence. (The meaningfulness of (2), of course, should be taken with a grain of salt: in reality, Gödel proved the system of PM to be incomplete in 1930.)

  1. The derivative of λx (x2 + 4x − 6) wishes to declare that functions are smart.
  2. Peano arithmetics prove incomplete by Gödel at 1930.

After these informal motivations, we turn to CL proper and introduce some of its notions (more) formally.

The objects in CL are called terms.[9] Terms may be thought to be interpreted as functions (as further explained in section 4.1). Primitive terms comprise variables and constants, whereas compound terms are formed by combining terms. Usually, a denumerable set (i.e., a set with cardinality ℵ0) of variables is included, and the constants include some combinators. (We use x, y, z, v, w, u, x0, … as variables in the object language, and M, N, P, Q, … as metavariables that range over terms.)

Terms are inductively defined as follows.

(t1)   If x is a variable, then x is a term,
(t2)   if c is a constant, then c is a term,
(t3)   if M and N are terms, then (MN) is a term.

In the above definition, (t3) conceals the binary operation that conjoins the two terms M and N. This operation is called application, and it is often denoted by juxtaposition, that is, by placing its two arguments next to each other as in (MN).

Application is not assumed to possess additional properties (such as commutativity), because its intended interpretation is function application. For instance, ((vw)u) and (v(wu)) are distinct terms—just as the derivative of λx.x2 + 4x − 6 applied to 8 (that is, (λx.2x + 4)8 = 20) is different from derivative of 90 (that is, (82 + 32 − 6)′ = 0). Using λ notation, the two terms in the example may be expressed as ((λy.y′)(λx.x2 + 4x − 6))8 vs (λy.y′)((λx.x2 + 4x − 6)8).

If terms are viewed as structured strings (where parentheses show grouping), then the number of distinct terms associated to a string of length n is the Catalan number Cn−1. For a non-negative integer n (i.e., for n ∈ ℕ),

Cn =
n + 1
( 2n
) .

The first seven Catalan numbers are C0 = 1, C1 = 1, C2 = 2, C3 = 5, C4 = 14, C5 = 42 and C6 = 132. As an illustration we may take—for simplicity—strings consisting of x's, because the terms are to differ only in their grouping. Clearly, if the term is x or xx, that is of length 1 or 2, then there is only one way to form a term, that is, there exists just one possible term in each case. If we start with three x's, then we may form (xx)x or x(xx). If the length of the term is 4, then the five terms are: xxxx, x(xx)x, xx(xx), x(xxx) and x(x(xx)). (It is a useful exercise to try to list the 14 distinct terms that can be formed from 5 x's.)

The usual notational convention in CL is to drop parentheses from left associated terms together with the outmost pair. For instance, xyz would be fully written as ((xy)z), whereas xy(xz) and (xy)(xz) are both “shorthand versions” of the term ((xy)(xz)) (but not of xyxz). Grouping in terms delineates subterms. For instance, xy is a subterm of each of these terms, whereas yz and yx are subterms of neither term.

Subterms of a term are recursively defined as follows.

(s1)   M is a subterm of M,
(s2)   if M is a subterm of N or of P, then M is a subterm of NP.

Incidentally, the notion of free variables is straightforwardly definable now: x is a free variable of M if x is a subterm of M. The set of free variables of M is denoted by fv(M).

All terms are interpreted as functions, and combinators are functions too. Similarly, to some numerical and geometrical functions, that can be described and grasped easily, the combinators that are frequently encountered can be characterized as perspicuous transformations on terms. (Sans serif letters denote combinators and > denotes one-step reduction.)

Definition. (Axioms of some well-known combinators)
Sxyz  >  xz(yz) Kxy  >  x Ix  >  x
Bxyz  >  x(yz) Txy  >  yx Cxyz  >  xzy
Wxy  >  xyy Mx  >  xx Yx  >  x(Yx)
   Jxyzv  >  xy(xvz)       B′xyz  >  y(xz)       Vxyz  >  zxy   

These axioms tacitly specify the arity of a combinator as well as their reduction (or contraction) pattern. Perhaps, the simplest combinator is the identity combinator I, that applied to an argument x returns the same x. K applied to x is a constant function, because when it is further applied to y, it yields x as a result, that is, K is a cancellator with respect to its second argument. W and M are duplicators, because in the result of their application one of the arguments (always) appears twice.[10] C, T and V are permutators, because they change the order of some of their arguments. B is an associator, because Bxyz turns into a term in which y is applied to z before x is applied to the result. Y is the fixed point combinator, because for any function x, Yx is the fixed point of that function (see section 2.3). The combinator B′ is an associator and a permutator, whereas S and J are also duplicators. S is very special and it is called the strong composition combinator, because when applied to two functions, let us say, f and g, as well as x, then the resulting term gx(fx) expresses the composition of g and f  both applied to the same argument x.

These informal explications did not mention any restrictions on the sort of functions x, y, z, f, g, … may be. However, the axioms above (formally) limit the applicability of the combinators to variables. Intuitively, we would like to say that given any terms, that is, any functions M and N, WMN one-step reduces to MNN. For example, M may be K and N may be yy, and then WK(yy) > K(yy)(yy). The latter term suggests a further one-step reduction, and indeed we might be interested in successive one-step reductions—such as those leading from WK(yy) to yy. A way to achieve these goals is to formalize (a theory of) CL as an extension of the standard inequational logic by adding certain axioms and rules.

Inequational Calculus for CL (CL≥).
M ≥ N   N ≥ P
M ≥ P
M ≥ N
M ≥ N

The use of metavariables encompasses substitution (that we illustrated above on the term WMN). The identity axiom and the rule of transitivity imply that ≥ is a transitive and reflexive relation. The last two rules characterize application as an operation that is monotone in both of its argument places. CL≥ includes only S and K, because the other combinators are definable from them—as we already mentioned in section 1.2, and as we explain more precisely toward the end of this section.

The set of combinators { S, K } is called a combinatory base, that is, these two combinators are the undefined constants of CL≥. To give an idea of a proof in this calculus, the following steps may be pieced together to prove SKK(KKK) ≥ K. KKK ≥ K is an instance of an axiom. Then SKK(KKK) ≥ SKKK is obtained by right monotonicity, and further, SKK(KKK) ≥ K results by instances of the S and K axioms together with applications of the transitivity rule.

The relation ≥ is called weak reduction, and it may be defined alternatively as follows. (‘Weak reduction’ is a technical term used in CL to distinguish this relation on the set of terms from some other relations, one of which is called ‘strong reduction’.) A term that is either of the form SMNP or of the form KMN is a redex, and the leading combinators (S and K, respectively) are the heads of the redexes. If a term Q contains a subterm of the form SMNP, then Q′ obtained by replacing that subterm by MP(NP) is a one-step reduct of Q. (Similarly, for the redex KMN and M.) That is, Q > Q′ in both cases. Reduction then may be defined as the reflexive transitive closure of one-step reduction. This notion is completely captured by CL≥. The calculus CL≥ is complete in the sense that if M ≥ N in the sense we have just described, then CL≥ proves M ≥ N. (It is easy to see that the converse implication is true too.)

The notion of reduction is a weaker relation than one-step reduction, and so it is useful to distinguish a subclass of terms using the stronger relation. A term is in normal form (nf) when it contains no redexes. Note that one-step reduction need not decrease the total number of redexes that a term contains, hence, is does not follow that every term can be turned into one in nf after finitely many one-step reductions. Indeed, some terms do not reduce to a term in nf.

Reduction is arguably an important relation between terms that denote functions. The typical steps in a program execution and in other concrete calculations are function applications rather than moves in the other direction, what is called expansion. However, the notion of the equality of functions is familiar to everybody from mathematics, and the analogous notion has been introduced in CL too. The symmetric closure of the reduction relation is called (weak) equality. A formalization of equational CL may be obtained by extending the standard equational logic with combinatory axioms and rules characterizing application.

Equational Calculus for CL (CL=).
M = N   N = P
M = P
M = N
N = M
M = N
M = N

The first axiom and the first two rules constitute equational logic. The constants are again the combinators S and K. Note that CL= could have been defined as an extension of CL≥ by adding the rule of symmetry, that would have paralleled the description of the definition of equality from reduction. We chose instead to repeat the inequational axioms and rules with the new notation (and add the rule of symmetry) to make the two definitions self-contained and easy to grasp. (The two characterizations of = coincide—as those of ≥ did.)

CL≥ and CL= share a feature that may or may not be desirable—depending on what sort of understanding of functions is to be captured. To illustrate the issue, let us consider the one-place combinators SKK and SK(KK). It is easy to verify that SKKM ≥ M and SK(KK)M ≥ M. However, neither SKK ≥ SK(KK) nor SK(KK) ≥ SKK is provable in CL≥; a fortiori, the equality of the two terms in not provable in CL=. This means that CL≥ and CL= formalize intensional notions of functions, where “intensionality” implies that functions that give the same output on the same input may remain distinguishable.

The archetypical intensional functions that one is likely to encounter are algorithms. As examples, we might think of various specifications to calculate the decimal expansion of π, or various computer programs that behave in the same way. For instance, compilers (for one and the same language) may differ from each other by using or not using some optimizations, and thereby, producing programs from a given piece of code that have identical input–output behavior but different run times.

If functions that are indistinguishable from the point of view of their input–output behavior are to be identified, that is, an extensional understanding of functions is sought, then CL≥ and CL= have to be extended by the following rule, (where the symbol ‡ is to be replaced by ≥ or =, respectively).

Mx ‡ Nx
M ‡ N     where x is not free in MN.

2.2 Church–Rosser theorems and consistency theorems

The calculi CL≥ and CL= of the previous section formalize reduction and equality. However, ≥ and = have some further properties that are important when the terms are thought to stand for functions. The next theorem is one of the earliest and best-known results about CL.

Church–Rosser theorem (I). If M reduces to N and P, then there is a term Q to which both N and P reduce.
figure 1
Figure 1: Illustration for the Church–Rosser theorem (I)

If we think that reduction is like computing the value of a function, then the Church–Rosser theorem—in a first approximation—can be thought to state that the final result of a series of calculations with a term is unique—independently of the order of the steps. This is a slight overstatement though, because uniqueness implies that each series of calculations ends (or “loops” on a term). That is, if there is a unique final term, then only finitely many distinct consecutive calculation steps are possible.

A coarse analogy with elementary arithmetic operations, perhaps, can shed some light on the situation. The addition and multiplication of natural numbers always yield a natural number. However, if division is included then it is no longer true that all numerical expressions evaluate to a natural number, since 7/5 is a rational number that is not a natural one, and n/0 is undefined (even if n were real). That is, some numerical expressions do not evaluate to a (natural) number. Although the analogy with combinatory terms is not very tight, it is useful. For instance, n/0 (even if the domain of the function λn.n/0 would be extended to permit n to be rational) could be implemented in a machine by a loop (that would never terminate when executed) which would go through an enumeration of the rational numbers trying to find an r such that r · 0 = n.

The combinatory terms WWW and WI(WI) are, perhaps, the simplest examples of terms that do not have a normal form. Both terms induce an infinite reduction sequence, that is, an infinite chain of successive one-step reductions. To make the example more transparent, let us assume for a moment that W, I, C, etc. are not defined from K and S, but are primitive constants. The contraction of the only redex in WWW returns the same term, which shows that uniqueness does not imply that the term is in nf. The contraction of the only redex in WI(WI) gives I(WI)(WI) that further reduces to the term we started with. A slightly more complicated example of a term that has only infinite reduction sequences is Y(CKI). This term has a reduction sequence (in which each contracted redex is headed by Y) that contains infinitely many distinct terms. To sum up, the Church–Rosser theorem, in general, does not guarantee the uniqueness of the term Q. However, if M has a normal form then that is unique.

The Church–Rosser theorem is often stated as follows.

Church–Rosser theorem (II). If N and P are equal, then there is a term Q to which both N and P reduces.
figure 2
Figure 2: Illustration for the Church–Rosser theorem (II)

The second form of the Church–Rosser theorem differs from the first in its assumption. From the definition of equality as a superset of reduction, it is obvious that the first form of the theorem is implied by the second. However, despite the weaker assumption in the second formulation of the Church–Rosser theorem, the two theorems are equivalent. Equality is the symmetric closure of reduction, which means that if two terms are equal then there is a finite path comprising reduction and expansion steps (which decompose into one-step reductions and one-step expansions, respectively). Then by finitely many applications of the first Church–Rosser theorem (i.e., by induction on the length of the path connecting N and P), the first Church–Rosser theorem implies the second formulation.

Modern proofs of the Church–Rosser theorem for CL proceed indirectly because one-step reduction fails to have the diamond property. A binary relation R (e.g., reduction) is said to have the diamond property when xRy and xRz imply that yRv and zRv for some v. If a binary relation R has the diamond property, so does its transitive closure. To exploit this insight in the proof of the Church–Rosser theorem, a suitable subrelation of reduction has to be found. The sought after subrelation should possess the diamond property, and its reflexive transitive closure should coincide with reduction.

The following counterexample illustrates that one-step reductions of a term may yield terms that further do not reduce to a common term in one step. SKK(KKK) > SKKK and SKK(KKK) > K(KKK)(K(KKK)), and then the potential reduction sequences are as follows.

  1. SKKK > KK(KK) > K
  2. K(KKK)(K(KKK)) > KKK > K
  3. K(KKK)(K(KKK)) > KK(K(KKK)) > KK(KK) > K
  4. K(KKK)(K(KKK)) > K(KKK)(KK) > KKK > K
  5. K(KKK)(K(KKK)) > K(KKK)(KK) > KK(KK) > K

The failure of the diamond property is obvious once we note that SKKK > KK(KK) (only), but K(KKK)(K(KKK)) does not reduce in one step to KK(KK).

An appropriate subrelation of reduction is the simultaneous contraction of a set of nonoverlapping redexes, which is denoted by >sc. ‘Nonoverlapping’ means that there are no shared subterm occurrences between two redexes.  >sc includes > because a one-step reduction of a redex may be viewed instead as >sc of a singleton set of redexes.  >sc is, obviously, included in ≥ (i.e., in reduction). These two facts imply that the reflexive transitive closure of >sc is reduction—when the tonicity of the reflexive transitive closure operation (denoted by *) is taken into account.

(1)–(3) summarize the key inclusions between the relations mentioned.

  1. > ⊆ >sc  ⇒  >* ⊆ >sc*,
  2. >sc ⊆ ≥  ⇒  >sc* ⊆ ≥*,
  3. >* ⊆ ≥*   and   ≥* = ≥.

The central fact about ≥sc is the content of the following theorem.

Theorem. (Diamond property for >sc) If M >sc N and M >sc P then there is a term Q such that both N >sc Q and P >sc Q.

The proof of this theorem is an easy induction on the term M. The properties of >sc guarantee that one or more one-step reductions can be performed at once, but the reductions cannot interfere (or overlap) with each other.

The consistency of CL follows from the Church–Rosser theorem together with the existence of (at least two) distinct normal forms.

Theorem. (Consistency) CL is consistent, that is, there are terms that do not reduce to each other, hence they are not equal.

Not all terms have an nf, however, many do. Examples, first of all, include S and K. (The variables, if included, of which there are ℵ0 many, are all in nf.) None of these terms contains a redex, hence they reduce only to themselves. By the Church–Rosser theorem, it is excluded that some term M could reduce to both x and S (making S equal to x).

The interaction between infinite reduction sequences and nfs deserves a more careful inspection though. The terms WWW, Y(CKI) and WI(WI) have only infinite reduction sequences. However, the existence of an infinite reduction sequence for a term does not imply that the term has no normal form (when the combinatory base is complete or contains a cancellator). Y(KI) reduces to KI(Y(KI)), KI(KI(Y(KI))), KI(KI(KI(Y(KI)))), … as well as to I.

A term weakly normalizes when it has an nf, whereas a term strongly normalizes when all its reduction sequences lead to an nf (hence, to the nf) of the term. A computational analogue of a strongly normalizing term is a (nondeterministic) program that terminates on every branch of computation, whereas termination on at least branch is akin to weak normalization.

2.3 The existence of fixed points and combinatorial completeness

Schönfinkel proved that S and K suffice to define the other combinators he introduced, and we mentioned in the definition of CL≥ that the set of constants is limited to S and K, because other combinators could be defined from them.

To demonstrate the sense in which definability is understood here we consider the example of B. The axiom for B is Bxyz > x(yz), and if we take S(KS)K instead of B, then the following reduction sequence results.

S(KS)Kxyz > KSx(Kx)yz > S(Kx)yz > Kxz(yz) > x(yz)

The term S(KS)K is in nf, however, to be in nf is not a requirement for definability. It is more convenient to work with defining terms that are in nf, because an application of a combinator that is not in nf could be started with reducing the combinator to its normal form. (Also, there are always infinitely many combinators that reduce to a combinator.) However, note that the preference for choosing combinators in nf is not meant to imply that a combinator cannot be defined by two or more terms in nf. E.g., we give here two definitions (involving only S and K) for I.

If the constants are S and K, then the combinators are all those terms that are formed from S and K (without variables). Once we have defined B as S(KS)K, we may use B in further definitions as an abbreviation, and we do that primarily to reduce the size of the resulting terms as well as to preserve the transparency of the definitions.

The following list gives definitions for the rest of the well-known combinators. (Here ‘=’ is placed between a definiendum and a definiens.)

I = SK(KK) T = B(SI)K       C = B(T(BBT))(BBT)
W = CSI M = SII Y = B(WI)(BWB)
J = W(BC(B(BC)(B(BB)(BB))))       B′ = CB V = BCT

The definitions are easily seen to imply that all these combinators depend on both S and K, but it is not obvious from the definitions that the defined combinators are mutually independent, that is, that none of the listed combinators is definable from another one. (Clearly, some subsets suffice to define some of the combinators.) We do not intend to give an exhaustive list of interdefinability between various subsets of these combinators, but to hint at the multiplicity and intricacy of such definitions, we list a handful of them. We also introduce two further combinators S′ and R.

I = SKK I = WK I = CK(KK)
B = CB′ S′ = CS S = CS′
W = S′I W = B(T(BM(BBT)))(BBT)       W = C(S(CC)(CC))
R = BBT       Y = BM(CBM) Y = B′(B′M)M

If the fixed point combinator Y is not taken to be a primitive, then there are various ways to define it.

Fixed point theorem. For any function M, there is a term N such that MN = N.

The proof of this theorem is easy using a fixed point combinator, because a term that can play the rôle of N is YM.

Some of the definitions of Y have slightly different properties with respect to reduction. But the importance of the fixed point combinator is that it ensures that all functions have a fixed point and all recursive equations can be solved.

Curry and Turing both defined fixed point combinators (in the λ-calculus). If we consider the definitions

Y1 = BM(BWB)          Y2 = W(B(BW(BT)))(W(B(BW(BT))))

(where the subscripts are to distinguish the two definitions), then we can see that Y1M  =  M(Y1M), but for Y2, Y2M  ≥  M(Y2M) holds too. In this respect, Y1 is similar to Curry's fixed point combinator (and really, to any fixed point combinator), whereas Y2 is like Turing's fixed point combinator.

The fixed point theorem demonstrates the expressive power of CL to some extent. However, fixed point combinators may be defined from bases without a cancellator (as Y1 and Y2 show). The full power of CL (with the base { S, K }) is enunciated by the following theorem.

Theorem. (Combinatorial completeness) If f(x1, …, xn) = M (where M is a term containing no other variables than those explicitly listed), then there is a combinator X such that Xx1xn reduces to M.

The theorem's assumption may be strengthened to exclude the possibility that some x's do not occur in M. Then the consequent may be strengthened by adding the qualification that X is a relevant combinator, that is, X is a combinator over a base that does not contain a combinator with cancellative effect, or equivalently, X is a combinator over { B,C, W, I } or { I, J }. (These bases correspond to Church's preferred λI-calculus.)

Combinatorial completeness is usually proven via defining a “pseudo” λ-abstraction (or bracket abstraction) in CL. There are various algorithms to define a bracket abstraction operator in CL, that behaves as the λ operator does in a λ-calculus. This operator is usually denoted by [ ] or by λ*. The algorithms differ from each other in various aspects: (i) the set of combinators they presuppose, (ii) the length of the resulting terms, (iii) whether they compose into (syntactic) identity with the algorithm that translates a combinatory term into a λ-term, and (iv) if they commute with either of the reductions or equalities.

The first algorithm, the elements of which may be found already in Schönfinkel (1924 (1967)), consists of the following clauses that are applied in the order of their listing.

(k)   [x].M = KM, where x ∉ fv(M),
(i)    [x].x = I,
(η)   [x].Mx = M, where x ∉ fv(M),
(b)   [x].MN = BM([x].N), where x ∉ fv(M),
(c)   [x].MN = C([x].M)N, where x ∉ fv(N),
(s)   [x].MN = S([x].M)([x].N).

For example, if this algorithm is applied to the term λxyz.x(yz) (that is, to the λ-translation of B), then the resulting term is B. However, if η is omitted then a much longer term results, namely, C(BB(BBI))(C(BBI)I). Another algorithm consists of clauses (i), (k) and (s).

3. Nonclassical logics and typed CL

3.1 Simple types

Combinatory terms are thought of as functions, and functions are thought to have a domain (a set of possible inputs) and a codomain (a set of possible outputs). For example, if a unary function is considered as a set of ordered pairs, then the domain and codomain are given by the first and second projections, respectively. Category theory, where functions are components of categories (without a set theoretic reduction assumed), retains the notions of a domain and a codomain; moreover, every function has a unique domain and codomain.

Functions that have the same domain and codomain may be quite different, however, by abstraction, they are of the same sort or type. As a simple illustration, let f1 and f2 be two functions defined as f1 = λx.8·x and f2 = λx.x/3. If x is a variable ranging over reals, then f1 and f2 have the same domain and codomain (i.e., they have the same type ℜ → ℜ), although f1 ≠ f2 whenever x ≠ 0. The usual notation to indicate that a function f has A as its domain and B as its codomain is f : A → B. It is a happy coincidence that nowadays ‘→’ is often used in logics as a symbol for entailment or nonclassical implication.

Given a set of basic types (that we denote by P), types are defined as follows.

  1. If pP then p is a type,
  2. if A, B are types then (A → B) is a type.

To distinguish these types from other types—some of which are introduced in the next section—they are called simple types.

The connection between combinators and types may be explained on the example of the identity combinator. Compound combinatory terms are formed by the application operation. Premises of modus ponens can be joined by fusion (denoted by circ) that in the strongest relevance logic is like the application operation. Ix ⊳ x and so if x's type is A, then Ix's type should imply A. Furthermore, Ix's type should be of the form XcircA, for some type X; then I can be of type A → A. In the example, we fixed x's type, however, I can be applied to any term, hence, it is more accurate to say that A → A is the type schema of I, or that I's type can be any formula of the form of self-implication.

The type-assignment system TACL is formally defined as the following deduction system. (When implicational formulas are considered as types, the usual convention is to omit parentheses by association to the right.)

Δ ⊢ S : (A → B → C)  →  (A → B)  → (A → C)
  Δ ⊢ K : A  →  B → A
Δ ⊢ M : A → B     Θ ⊢ N : A
Δ, Θ ⊢ MN : B

Expressions of the form M : A above are called type assignments. A characteristic feature of type-assignment systems is that if M : A is provable then A is considered to be one of the types that can be assigned to M. However, a provable assignment does not preclude other types from becoming associated to the same term M, that is a type assignment does not fix the type of a term rigidly. Δ and Θ on the left-hand side of ⊢ are sets of type assignments to variables, and they are assumed to be consistent—meaning that no variable may be assigned two or more types.

Type assignment systems are often called Curry-style typing systems. Another way to type terms is by fixing a type for each term, in which case each term has exactly one type. Such calculi are called Church-style typing systems. Then, for example, the identity combinator I of type (AA → A)  →  AA → A is not the same as the identity combinator I of type ((B → B) → B)  →  (B → B) → B. The two styles of typing have quite a lot in common, but there are certain differences between them. In particular, no self-application is typable in a Church-style typing system, whereas some of those terms can be assigned a type in a Curry-style typing system. Curry-style typing systems proved very useful in establishing various properties of CL and λ-calculi. The Church-style typing, however, emulates the translation of a concrete functional program (without objects) more closely.

There is no 1–1 correspondence between types and combinators in either style of typing: not all combinators can be assigned a type, and some implicational formulas cannot be assigned to any combinatory term. A combinator that can be assigned a type is said to be typable, and a type that can be assigned to a combinator is said to be inhabited. For instance, M has no type, because an implicational formula is never identical to its own antecedent. On the other hand, ((A → B) → A) → A is not the type of any combinator in the type assignment system TACL. Despite (or, indeed, due to) the discrepancy between implicational formulas and combinatory terms, classes of implicational formulas that can be assigned to certain sets of combinatory terms coincide with some important logics.

Theorem. A → B is a theorem of J, the intuitionistic implicational logic iff M : A → B is a provable type assignment in TACL, where the term M is built from S and K, that is, M is a combinator.

A combinator that inhabits an implicational theorem encodes a proof of that theorem in the deduction system TACL. There is an algorithm to recover the formulas that constitute a proof of the type of the combinator, moreover, the algorithm produces a proof that is minimal and well-structured. The correspondence between implicational theorems of intuitionistic logic (and their proofs) and typable closed λ-terms (or combinators) is called the Curry–Howard isomorphism. The usual notion of a proof in a Hilbert-style axiomatic system is quite lax, but it can be tidied up to obtain the notion of traversing proofs. In a traversing proof there is a 1–1 correspondence between subterms of a combinator and the formulas in the traversing proof as well as between applications and detachments therein (cf. Bimbó 2007).

The above correspondence can be modified for other implicational logics and combinatory bases. The next theorem lists correspondences that obtain between the implicational fragments of the relevance logics R and T and some combinatory bases that are of interest in themselves.

Theorem. A → B is a theorem of R (or T) iff M : A → B is a provable type assignment where M is a combinator over { B, I, W, C } (or over { B, B′, I, S, S′ }).

The calculus TACL may be amended by adding axiom schemas for the combinators in the two bases. (The axiom schemas of the combinators that are not in these bases may be omitted from the calculus or simply may be neglected in proofs.) The new axioms are as follows.

B :  (A → B)  →  (C → A)  →  C → B
B′ :  (A → B)  →  (B → C)  →  A → C
C :  (A → B → C)  →  B → A → C
W :  (A → A  →  B) → A → B
S′ :  (A → B)  →  (A  → B → C)  →  A → C
I :  A →  A

The combinatory base { B, C, W, I } is especially interesting, because these combinators suffice for a definition of a bracket abstraction that is equivalent to the abstraction of the λI-calculus. To put it differently, all functions that depend on all of their arguments can be defined by this base. The other base allows the definition of functions that can be described by terms in the class of the so-called hereditary right maximal terms (cf. Bimbó 2005a). Informally, the idea behind these terms is that functions can be enumerated, and then their successive applications should form a sequence in which the indexes are “globally increasing.”

A type assignment has two parts: a term and a formula. The questions whether some term can be assigned a type and whether some type can be assigned to a term are the problems of typability and of inhabitation, respectively. Although these questions may be posed about one and the same set of type assignments, the computational properties of these problems may differ widely.

Theorem. It is decidable if a term M can be assigned a type, that is, if M is typable.

The theorem is stated in a rather general way without specifying exactly which combinatory base or which modification of TACL is assumed, because the theorem holds for any combinatory base. Indeed, there is an algorithm that given a combinator decides if the combinator is typable, and for a typable combinator produces a type too. Of course, in the combinatorially complete base { S, K } all the combinators are expressible as terms consisting of these two combinators only. However, this assumption is not needed for a solution of typability, though it might provide an explanation for the existence of a general algorithm.

The problem of inhabitation does not have a similar general solution, because the problem of the equality of combinatory terms is undecidable. Given a set of axiom schemas that are types of combinators with detachment as the rule of inference, the problem of the decidability of a logic can be viewed as the problem of inhabitation. Indeed, if A is an implicational formula, then to decide whether A is a theorem amounts to determining if there is a term (over the base that corresponds to the axiom schemas) that can be assigned A as its type. (Of course, a more sophisticated algorithm may actually produce such a term, in which case it is easy to verify the correctness of the claim by reconstructing the proof of the theorem.)

To see from where complications can emerge in the case of decidability, we compare the rule of the formation of terms and the rule of detachment. Given a combinatory base and a denumerable set of variables, it is decidable by inspection whether a term is or is not in the set of the generated terms. That is, all the inputs of the rule are retained in the output as subterms of the resulting term. In contrast, an application of detachment results in a formula that is a proper subformula of the major premise (and in the exceptional case when the major premise is an instance of self-identity it is identical to the minor premise). The lack of the retention of all subformulas of premises through applications of modus ponens is the culprit behind the difficulty of some of the decision problems of implicational logics. A solution to the problem of inhabitation may run into similar difficulties.

For example, the combinator K can be assigned the following type.

p  →  ((q → ((q → (q → q)) → ((q → q) → (q → q))))  →  p)

SKK can be assigned the type p → p. There is a proof in TACL ending in SKK : p → p that does not contain the long formula above. However, there is a proof of SKK : p → p that contains the above formula the second antecedent of which is not a subformula of p → p, indeed, the sets of the subformulas of the two formulas are disjoint. Some important cases of the problem of inhabitation, however, are decidable.

Theorem. It is decidable if a type has an inhabitant over the base { S, K }.

This theorem amounts to the typed version of the decidability of the implicational fragment of intuitionistic logic that is part of Gentzen's decidability results (dating from 1935).

Theorem. It is decidable if a type has an inhabitant over the base { I, C, B′, W }.

The theorem is the typed equivalent of the decidability of the implicational fragment of the logic of relevant implication. The decidability of R was proven by Kripke in 1959 together with the decidability of the closely related E (the implicational fragment of the logic of entailment).

The rule of substitution is built-in into the formulation of TACL via the rule schema called detachment and the axiom schemas for the basic combinators. It is obvious that there are formulas of least complexity that are types of S and K, such that all the other types of S and K are their substitution instances. A formula that has this property is called a principal type of a combinator. Obviously, a combinator that has a principal type, has denumerably many principal types, which are all substitution instances of each other; hence it is justified to talk about the principal type schema of a combinator. The existence of principal types for complex combinators is not obvious, nevertheless, obtains.

Theorem. If the term M is typable, then M has a principal type and a principal type schema.

Principal types and principal type schemas may seem now to be interchangeable everywhere. Thus we could take a slightly different approach and define TACL to include axioms and the rule schema of detachment together with the rule of substitution. This version of TACL would assume the following form.

   Δ ⊢  S : (pqs)  →  (pq)  →  ps   
Δ ⊢  K : q  →  sq
Δ ⊢  M : AB     Θ ⊢  N : A
Δ, Θ ⊢  MN : B
Δ ⊢  M : A
Δ[P/B] ⊢  M : A[P/B]

where P ranges over propositional variables. (The substitution notation is extended—in the obvious way—to sets of type assignments.) Clearly, the two deduction systems are equivalent.

If substitution were dropped altogether, then the applicability of detachment would become extremely limited, for instance, SK no longer would be typable. A compromise between having substitution everywhere and having no substitution at all is to modify the detachment rule so that that includes as much substitution as necessary to ensure the applicability of the detachment rule. Such a rule (without combinatory terms or type assignments) was invented in the 1950s by C. A. Meredith, and it is usually called condensed detachment. The key to the applicability of detachment is to find a common substitution instance of the minor premise and of the antecedent of the major premise. This step is called unification.

Notice that it is always possible to choose substitution instances of a pair of formulas so that the sets of their propositional variables are disjoint, because formulas are finite objects. The most general common instance of two formulas A and B (that do not share a propositional variable) is C, where C is a substitution instance of both A and B, and propositional variables are identified by the substitutions only if the identification is necessary to obtain a formula that is a substitution instance of both A and B. The unification theorem (specialized to simple types) implies that if two formulas A and B have a common instance then there is a formula C such that all their common instances are substitution instances of C. Obviously, a pair of formulas either has no common instance at all, or it has ℵ0 many most general common instances.

A famous example of a pair of formulas that have no common instance is A → A and A → A → B. The instances p → p and q → q → r share no propositional variables, however, neither q → q nor (q → r) → q  → r matches the shape of the second formula. To put the problem differently, q and q → r would have to be unified, but they cannot be unified no matter what formula is substituted for q. An immediate consequence of this is that WI is not typable.

On the other hand, (rr) → rr and ((ss) → s → s) → (ss) → ss are substitution instances of pp and of qq. Furthermore, all simple types are substitution instances of a propositional variable, hence II can be assigned both the type rr and the type (ss) → ss—and, of course, the latter happens to be an instance of the former because AA is the principal type schema of II. If we apply condensed detachment to pp and qq, then we get qq (via the substitutions [p/qq] and [q/q]), and so condensed detachment yields the principal type of II. Incidentally, II and I provide an excellent example to illustrate that distinct terms may have the same principal type schema.

Condensed detachment has been used extensively to refine axiomatizations of various implicational logics, especially, in search for shorter and fewer axioms. Some logics may be formulated using axioms (rather than axiom schemas) together with the rule of condensed detachment without loss of theorems. All the logics that we mentioned so far (J, R, T and E) are D-complete, that is, they all may be axiomatized by axioms and the rule of condensed detachment. In other words, the implicational fragments of classical and intuitionistic logics, and the implicational fragments of the relevance logics R, E and T are all D-complete. (See Bimbó (2007) for some further technical details.)

Simply typed systems has been extended in various directions. Logics often contain connectives beyond implication. It is a natural modification of a type assignment system to expand the set of types via including further type constructors. Conjunction and fusion are the easiest to explain or motivate as type constructors, however, disjunction and backward implication have been introduced into types too. Types are useful, because they allow us to get a grip on classes of terms from the point of view of their behavior with respect to reduction.

Tait's theorem. If a combinatory term M is typable (with simple types) then M strongly normalizes, that is, all reduction sequences of M are finite (i.e., terminate).

The converse of this claim is, obviously, not true. For example, WI strongly normalizes but untypable, because the antecedent of contraction cannot be unified with any instance of identity. The aim to extend the set of typable terms led to the introduction of ∧ into types.

3.2 Intersection types

A different way to look at the problem of typing WI is to say that W should have a type similar to the formula A → (A → B).  →  A → B, but in which the formulas in place of the two formulas A and A → B in the antecedent can be unified. This is the motivation for the inclusion of conjunction (∧) and top (top) as new type constructors.

An extended type system, that is often called the intersection type discipline, is due to Coppo and Dezani-Ciancaglini. The set of intersection types (denoted by wff) is defined as follows.

  1. p ∈ wff if p is a propositional variable,
  2. top ∈ wff, where top is a constant proposition,
  3. A, B ∈ wff implies (A → B), (A ∧ B) ∈ wff.

Of course, if TACL is augmented with an expanded set of types, then new instances of the previously assigned types become available. However, the gist of having types with type constructors ∧ and top is that the set of types has a richer structure than the relationships between types determined by the rules of substitution and modus ponens.

The structure of intersection types is described by the conjunction–implication fragment of B, the basic relevance logic. In the following presentation of this logic, ≤ is the main connective (an implication) of a formula and ⇒ separates the premises and the conclusion of an inference rule.

A ≤ A A ≤ top top ≤ top → top
  A ≤ A ∧ A     A ∧ B ≤ A     A ∧ B ≤ B  
A ≤ B,  B ≤ C  ⇒  A ≤ C
A ≤ B,  C ≤ D  ⇒  A ∧ C ≤ B ∧ D
(A → B) ∧ (A → C)  ⇒  (A → (B ∧ C))
A ≤ B,  C ≤ D  ⇒  B → C ≤ A → D

The axiom schemas for the combinators S, K and I are as follows. Note that the axiom for S is not simply a substitution instance (with new connectives included) of the previous axiom for S.

Δ ⊢ S : (A → B → C)  →  (D → B)  →  (A ∧ D) → C
      Δ ⊢ K : A → B → A       Δ ⊢ I : A → A

There are four new rules added, and there is an axiom for top.

Δ ⊢  M : A    Δ ⊢  M : B
Δ ⊢  M : A ∧ B
Δ ⊢  M : A ∧ B
Δ ⊢  M : A
Δ ⊢  M : A ∧ B
Δ ⊢  M : B
Δ ⊢  M : A    A ≤ B
Δ ⊢  M : B
             Δ ⊢  M : top

This type assignment system is equivalent to the intersection type assignment system for the λ-calculus, and it allows a more precise characterization of classes of terms with respect to the termination of reduction sequences.

(1)   A term M is normalizable whenever M is typable (in the last type assignment system).
(2)   A term M is strongly normalizable whenever M is typable and the proof does not contain top.

4. Models

CL has various kinds of models, three of which are outlined in this section. Algebraic models (often called “term models”) may be constructed without difficulty for both the inequational and the equational systems of CL that were introduced in section 2.1. The set of terms forms an algebra, and given a suitable equivalence relation (that is also a congruence), the application operation can be lifted to the equivalence classes of terms in the standard way. The quasi-inequational characterization of the so obtained algebra provides the basis for an algebraic semantics for these logics. Isolating the Lindenbaum algebra and verifying that it is not a trivial algebra constitutes a consistency proof for CL≥ and CL=.

4.1 Scott's model

The earliest model for a typefree applicative system as a function space was given by Dana Scott in the late 60s. The following is an outline of some of the key elements of his construction.

In pure typefree CL, an expression of the form MM is a well-formed term. Moreover, terms of this form can enter into provable equations and inequations in multiple ways. For example, xx = xx is an axiom, and by one of the rules, y(xx) = y(xx) is provable too. A more interesting occurrence of a term of the form MM can be seen in the provable inequation S(SKK)(SKK)x ≥ xx.

The set-theoretic reduction of a function yields a set of pairs (in general, a set of tuples). In set theory (assuming well-foundedness, of course) a pair (e.g., { { a }, { a, b } }) is never identical to either of its two elements. Therefore, the main question concerning a mathematical model of CL is how to deal with self-application.

Scott's original model is built starting with a complete lattice (D, ≤). That is, (D, ≤) is a partially ordered set in which greatest lower bounds (infima) and least upper bounds (suprema) exist for arbitrary sets of elements. A function f from (D, ≤) into a complete lattice (E, ≤) is said to be continuous when it preserves the supremum of each ideal on D, where an ideal is an upward directed downward closed subset.

A topology may be defined on D by selecting certain increasing sets as the opens. More precisely, if I is an ideal and C is a cone, then C is open iff C ∩ I  ≠ ∅ provided that IC, that is, provided that the supremum of I is an element of C. (For example, complements of principal ideals are open.) f turns out to be continuous in the usual topological sense, that is, the inverse image of an open set being open, when D and E are taken together with their topologies. This motivates the earlier labeling of these functions as continuous. Notably, all continuous functions are monotone.

For the purposes of modeling functions in CL, the interesting functions are those that are continuous on D. However, these functions by themselves are not sufficient to obtain a modeling of self-application, because none of them has as its domain a set of functions—as D is not assumed to be a function space. The solution starts with defining a hierarchy of function spaces D0, D1, D2, … so that each function space Dn, is a complete lattice on which continuous functions may be defined (creating the function space Dn+1). The importance of selecting continuous functions is that the emerging function space has the same cardinality as the underlying set, which allows defining embeddings between function spaces adjacent within the hierarchy.

The hierarchy of all the function spaces Dn may be accumulated together. A standard construction in model theory is to form the disjoint union of structures. (Disjointness can always be guaranteed by indexing the carrier sets of the structures.) Scott defined D to be the disjoint union of the function spaces Dn, for all n, except that the extremal elements are “glued together.” (More formally, the top elements and the bottom elements of the function spaces, respectively, are identified with each other.) D is a complete lattice, and by Tarski's fixed point theorem, a continuous function that maps D into D has a fixed point, which implies that D is isomorphic to D → D.

The above construction may also be conceptualized in terms of strings and Cartesian products. The back-and-forth moves between functions of one and more than one variable—the “uncurrying” and “currying” of functions—algebraically corresponds to the two directions of residuation. For example, a function f : A × B → C may be represented by a function f ′ : A →  B → C, and vice versa. Thus, without loss of generality, it is sufficient to consider unary functions. If a is a fixed element of the function space D, then x = (a, x) holds when x is the fixed point of a. In terms of tuples, the solution may be viewed as the infinite tuple (a, (a, (a, ….

4.2 Relational semantics

A further model that we briefly outline falls into the set theoretical semantics paradigm of nonclassical logic and it is due to Dunn and Meyer. The connection between CL and nonclassical logics is way stonger than the link between CL and classical logic. In particular, the CL≥ and CL= calculi are nonclassical logics. Set theoretical semantics in which the intensional connectives are modeled from relations on a collection of situations have been the preferred interpretation of nonclassical logics since the early 1960s. These sorts of semantics are sometimes called “Kripke semantics” (after Kripke who defined such semantics for some normal modal logics) or “gaggle semantics” (after the pronunciation of the abbreviation ‘ggl’ that stands for “generalized Galois logics”).

A model for CL≥ may be defined as follows. Let (W, ≤, R, S, K, v) comprise a (nonempty) partially ordered set (W, ≤) with a three-place relation R on W, and let S, KW. Furthermore, for any α, β, γ, δ ∈ W, the conditions (s) and (k) are true.

(s)   ∃ζ1, ζ2, ζ3WRSαζ1  ∧  Rζ1βζ2  ∧  Rζ2γδ
∃ζ1, ζ2, ζ3WRαγζ1  ∧  Rβγζ2  ∧  Rζ1ζ2δ,
(k)   ∃ζ1WRKαζ1  ∧  Rζ1βγ
α ≤ γ.

The ternary relation is stipulated to be antitone in its first two argument places and monotone in the third. These components define a frame for CL≥. The valuation function v maps variables x, y, z, … into (nonempty) cones on W, and it maps the two primitive combinators S and K into the cones generated by S and K, respectively. Recall, that the standard notation in CL hides application, a binary operation that allows forming compound terms. The next clause extends v to compound terms and makes this operation explicit again.

v(MN) = { β : ∃α, γ. Rαγβ  ∧  α ∈ v(M)  ∧  γ ∈ v(N) }

An inequation M ≥ N is valid if v(M)  ⊆  v(N) under all valuations on frames for CL≥. (That is, the relationship between the interpretations of the two terms is invariant whenever v is varied on the set of variables.)

Informally, the underlying set W is a set of situations, and R is an accessibility relation connecting situations. All the terms are interpreted as sets of situations, and function application is the existential image operation derived from R. A difference from the previous model is that the result of an application of a term to a term is not determined by the objects themselves that interpret the two terms—rather the application operation is defined from R.

This semantics parallels the possible worlds semantics for normal modal logics. Therefore, it is important to note that the situations are not maximally consistent theories, but theories possessing the property that for any pair of formulas they contain a formula that implies both of them. Equivalently, the situations may be taken to be dual ideals on the Lindenbaum algebra of CL≥. These situations are typically consistent in the sense that they do not contain all the terms in all but one case. (The notion of negation consistency, of course, cannot be defined for CL≥ (or for CL=).)

Relational semantics can be defined for CL= along similar lines. Then soundness and completeness—that is, the following theorem—obtains.

(1)   An inequation M ≥ N is provable in CL≥ if and only if v(M) ⊆ v(N) in any model for CL≥.
(2)   An equation M = N is provable in CL= if and only if v(M) = v(N) in any model for CL=.

Relational and operational semantics for systems of CL that include dual and symmetric combinators can be found in Bimbó (2004).

5. Computational functions and arithmetic

A remarkable feature of CL is that despite its seeming simplicity it is a powerful formalism. Of course, the strength of CL cannot be appreciated without discovering certain relationships between combinatory terms or without an illustration that computable functions are definable.

An important beginning step in the formalization of mathematics is the formalization of arithmetic, that was first achieved by the Dedekind–Peano axiomatization. There are various ways to formalize arithmetic in CL; two representations of numbers are described in this section together with some functions on them.

Numbers may be thought to be objects (or abstract objects) of some sort. (Here by numbers we mean natural numbers, that is, 0 and the positive integers.) Numbers are characterized, first of all, by the structure they possess as a set. This structure possesses properties such as 0 ≠1, and that the sum of n and m is the same number as the sum of m and n. A less elementary property is, for example, the existence of infinitely many prime numbers.

Numbers can be represented in CL by terms, and one way is to choose the terms KI, I, SBI, SB(SBI), … for 0, 1, 2, 3, etc. Depending on which terms stand for the numbers, the terms that represent the arithmetic operations vary. Note that unlike the Dedekind–Peano formalization of arithmetic, CL makes no syntactic distinction that would parallel that between individual constants and function symbols, because in CL the only objects are terms. The above list of terms already shows the successor function, that is SB. (SB(KI) strongly equals to I, that is, 1 is the successor of 0.)

Addition is the term BS(BB), and multiplication is the term B. The usual recursive definition of multiplication relying on addition may induce the idea that addition should be a simpler operation than multiplication. However, in CL the numbers themselves are functions, and so they have properties that allows B—a simpler looking term—to be chosen for the function that is often perceived to be more complex than addition. As a classical example, we may consider the term BII, that is strongly equal to I, that is, 1 × 1 = 1—as expected.

Another way to represent numbers in CL is to start with a different choice of terms for the numbers. Previously, I stood for 1, now we take I to be 0. The successor of a number n is V(KI)n, where the second occurrence of n indicates a numeral, that is, the combinator that represents n. In other words, the successor function is V(KI). (The numeral for n is often denoted—more precisely—by an overlined or otherwise decorated n. However, the double usage in a limited context should not cause any confusion.) Notice that the numbers in the present representation are terms over a more restricted combinatory base than in the former case. For example, no combinator with duplicative effect is definable from { I, K, V }.

Some simple recursive functions may be defined as follows. The predecessor function P on numbers is “−1” (i.e., subtracting one) for all numbers greater than or equal to 1, and the predecessor of 0 is defined to be 0. The next term defines the predecessor function which is abbreviated by P.

P = C(W(BB(C(TK)I)))(KI)

If n is a numeral, then Pn reduces to nKI(n(KI)), which suggests that for positive numbers P could have been defined to be the term T(KI) because T(KI)n reduces to n−1 whenever n is of the form V(KI)(n−1).

Some models of computation (such as register machines) and certain programming languages (explicitly) include a test for zero. It is useful to define a function Z such that Znxy reduces to x if n is zero, whereas Znxy reduces to y when n is positive. Znxy may be thought of as the conditional instruction “If n = 0 then x else y,” where x and y are themselves functions. (Of course, in the pseudo-code one should have assumed that n is of integer type and cannot take a negative value, that can be guaranteed by a declaration of variables and an additional conditional statement.) The following definition works for branching on zero.

Z = TK

TKnxy = nKxy, and if n is zero, that is, n = I, then by another step Kxy and then x results; whereas if n is positive, then after a few more reductions, one gets KIxy, that is, y. The two terms, Kxy and KIxy, hint toward an interpretation of K and KI as truth and falsity, or they can be viewed as terms that can select, respectively, the first or the second argument. These ideas may be further developed into definitions of truth functions and a representation of tuples.

Addition may be defined by the recursive equation + mn = Zmn(V(KI)( + (Pm)n)), where m and n are numerals, and P and Z are the already defined functions. (The abbreviations are used to enhance the readability of the terms—they can be replaced everywhere by the defining combinators.) To put into words, if m is 0 then the sum is n, otherwise m + n is the successor of (m−1) + n. Of course, this definition closely simulates the definition of addition from recursion theory, where addition is often defined by the two equations + (0, n) = n and + (s(m), n) = s( + (m, n)) (with s denoting the successor function). The fact that CL can express addition in this form shows—once again—the versatility of CL.

Combinatorial completeness guarantees that the term on the right hand side of the equation (i.e., the term Zmn(V(KI)( + (Pm)n))) can be transformed into a term in which + is the first, m and n are the second and third arguments, respectively. Then + can be defined explicitly as the fixed point of the combinator


Of course, we can abbreviate the so obtained term as + for the sake of transparency, just as we could use P and Z as shorthands for longer combinatory terms.

Multiplication is often denoted by  · . The recursive equation  · mn = ZmI( + m( · (Pm)n)) defines multiplication and it can be deciphered as “if m is 0 then the result is 0, else m is added to the result of (m−1) · n.” The next step in the definition brings the right-hand side term to the form X · mn, where X does not contain occurrences of  · , m or n. Then taking the fixed point of X, and setting  ·  to be YX concludes the definition of the multiplication function. For instance, the abstraction can yield the combinator


The factorial function is definable from the predecessor function plus from multiplication, and it is useful e.g., in combinatorics. The factorial function (denoted by  ! ) is recursively definable by the equation  ! m = Zm(V(KI)I)( · m( ! (Pm))), that may be read as “if m is 0, then  ! m = 1, otherwise  ! m equals to m multiplied by the factorial of m−1.”

Of course, it is not essential to define various numerical functions by simulating their recursive definitions. As we saw above in the case of the first representation of numbers, we might just happen to have the right terms such as BS(BB) and B, that behave as the target functions do on numbers. An equally good way to define arithmetic functions is to simply list some terms and then show that they behave as expected. However, once it has been shown that the basic primitive recursive functions together with recursion and minimization can be emulated in CL, we have got not only a nice collection of arithmetic functions in the form of combinators to work with, but also a proof that combinatory logic is sufficiently expressive to formalize all partial recursive functions. Indeed, the remaining steps of such a proof can be carried out in CL, though the details are beyond the scope of this brief entry.

5.1 Gödel sentence

The abbreviations and the interspersed explanations in the sketch above may obscure that arithmetic has been formalized in a language that consists of five symbols (when juxtaposition is not counted): S, K, = plus two delimiters,  (  and  ) . The finite (and perhaps, surprisingly small) number of symbols and the availability of recursive functions conjure the thought that an arithmetization of the syntax of CL could be attempted.

Gödel achieved an encoding of a formal language by assigning numbers to symbols, formulas and sequences of formulas, which later became known as “Gödel numbers.” Gödel relied on the prime factorization theorem from algebra, however, it is possible to arithmetize the language of CL without any similar strong assumptions. (See for example, Smullyan (1985) and Smullyan (1994).) The five symbols get as their Gödel numbers the first five positive integers. A string is assigned the number in base 10 that results from the concatenation of the corresponding numbers for the symbols.

The following outline gives the flavor of an analogue of Gödel's incompleteness proof for CL. It is possible to define a combinator such that if this combinator is applied to a numeral n, then the whole term reduces to the numeral m that is the numeral denoting the Gödel number of the numeral n. Slightly more formally, there is a combinator δ such that δn = G(n) (where G(n) denotes the Gödel number of the expression n). Furthermore, there is a combinatory term, which returns the numeral itself followed by G(n), when applied to a numeral n. For any term A there is a term B such that the equation AB) = B is true. This statement (or its concrete variant for a particular formal system) is usually called the second fixed point theorem. Computable characteristic functions of sets of numbers can be represented by combinators with the choice of K for truth and KI for falsity. The complements of such functions are computable too. Finally, it can be proven that there is no combinator that represents the set of all true equations. To put it differently, any combinator either represents a set of equations that fails to include some true equations, or represents a set of equations that includes all true but also some false equations.

Church proved the undecidability of classical first-order logic relying on Gödel's incompleteness theorem. Scott proved that if A is a nonempty proper subset of λ-terms that is closed under equality then A is not recursive. The analogous claim for CL, that follows from the existence of a Gödelian sentence for CL, is that it is undecidable if two terms of CL are equal.


I am grateful to the referees—both to the “internal referees” of this Encyclopedia and to J. P. Seldin—for their helpful comments and suggestions, as well as certain corrections. Heeding the advice of the referees, I repeatedly tried to make the entry less technical and more readable.


Due to obvious limitations of size, only some representative publications are listed here. More comprehensive bibliographies may be found in Curry, Hindley and Seldin 1972, Hindley 1997, Anderson, Belnap and Dunn 1992, Terese 2003 as well as Hindley and Seldin 2008.

Other Internet Resources

Related Entries

Church, Alonzo | Gödel, Kurt | Gödel: incompleteness theorem | Turing, Alan | algebra | category theory | computability | function: recursive | logic: intuitionistic | logic: linear | logic: nonclassical | logic: relevance | logic: substructural | ontology and ontological commitment | Peano, Giuseppi | reasoning: automated | type theory