First published Fri Oct 26, 2012

Philosophy of Information deals with the philosophical analysis of the notion of information from both a historical and a systematic perspective. With the emergence of empiricist theory of knowledge in early modern philosophy, the development of various mathematical theories of information in the 20th century and the rise of information technology, the concept of ‘information’ has conquered a central place in the sciences and in society. This interest also led to the emergence of a separate branch of philosophy that analyzes information in all its guises (Adriaans and van Benthem 2008a,b; Lenski 2010; Floridi 2002, 2011). Information has become a central category in both the sciences and the humanities and the reflection on information influences a broad range of philosophical disciplines varying from logic (Dretske 1981; van Benthem and van Rooij 2003; van Benthem 2006) to ethics (Floridi 1999) and esthetics (Schmidhuber 1997a; Adriaans 2008) to ontology (Zuse 1969; Wheeler 1990; Schmidhuber 1997b; Wolfram 2002; Hutter 2010).

The term ‘information’ in colloquial speech is currently used predominantly as an abstract mass-noun denoting any amount of data, code or text that is stored, sent, received or manipulated in any medium. The detailed history of both the term ‘information’ and the various concepts that come with it is complex and for the larger part still has to be written (Seiffert 1968; Schnelle 1976; Capurro 1978, 2009; Capurro and Hjørland 2003). The exact meaning of the term ‘information’ varies in different philosophical traditions and its colloquial use varies geographically and over different pragmatic contexts. Although an analysis of the notion of information has been a theme in Western philosophy from its inception, the explicit analysis of information as a philosophical concept is recent, and dates back to the second half of the 20th century. Historically the study of the concept of information can be understood as an effort to make the extensive properties of human knowledge measurable. In the 20th century various proposals for formalization of concepts of information were made:

  1. Fisher information: the amount of information that an observable random variable X carries about an unknown parameter θ upon which the probability of X depends (Fisher 1925).
  2. Shannon information: the entropy, H, of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X (Shannon 1948; Shannon & Weaver 1949).
  3. Kolmogorov complexity: the information in a binary string x is the length of the shortest program p that produces x on a reference universal Turing machine U (Solomonoff 1960, 1964a,b, 1997; Kolmogorov 1965; Chaitin 1969, 1987).
  4. Quantum Information: The qubit is a generalization of the classical bit and is described by a quantum state in a two-state quantum-mechanical system, which is formally equivalent to a two-dimensional vector space over the complex numbers (Von Neumann 1955; Redei & Stoeltzner 2001).
  5. Information as a state of an agent: the formal logical treatment of notions like knowledge and belief was initiated by Hintikka (1962, 1973). Dretske (1981) and van Benthem & van Rooij (2003) studied these notions in the context of information theory, cf. van Rooij (2004) on questions and answers, or Parikh & Ramanujam (2003) on general messaging. Dunn also seems to have this notion in mind when he defines information as “what is left of knowledge when one takes away belief, justification and truth” (Dunn 2001 pg. 423, 2008).
  6. Semantic Information: Bar-Hillel and Carnap developed a theory of semantic information (1953). Floridi (2002, 2003, 2011) defines semantic information as well-formed, meaningful and truthful data. Formal entropy-based definitions of information (Fisher, Shannon, Quantum, Kolmogorov) do not imply well-formedness or truthfulness.
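Two of the quantitative notions in the list above can be illustrated with a minimal Python sketch. It computes the Shannon entropy of a discrete distribution, and it uses the length of a zlib-compressed string as a rough, computable stand-in for Kolmogorov complexity, which is itself uncomputable. The function names `shannon_entropy` and `compressed_length` are illustrative assumptions and are not taken from any of the cited works; the choice of zlib as compressor is likewise an assumption made only for the sake of the example.

```python
import math
import random
import zlib


def shannon_entropy(probs):
    """Entropy H(X) in bits of a discrete distribution given as a list of probabilities."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with probability 0 contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data.

    This is only a crude proxy: Kolmogorov complexity is defined relative to a
    universal Turing machine and cannot be computed by any algorithm.
    """
    return len(zlib.compress(data, 9))


# A fair coin carries one bit of uncertainty; a biased coin carries less.
fair_coin = shannon_entropy([0.5, 0.5])
biased_coin = shannon_entropy([0.9, 0.1])

# A highly regular string compresses far better than a pseudo-random one
# of the same length.
rng = random.Random(0)
regular = compressed_length(b"ab" * 500)
irregular = compressed_length(bytes(rng.getrandbits(8) for _ in range(1000)))
```

The sketch only shows that the probabilistic and the computational definitions assign numbers to different objects: entropy is a property of a distribution, whereas the compression-based estimate is a property of an individual string.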

The first four concepts are quantitative, the last two qualitative. These proposals can roughly be classified in terms of the nature of the definiens: probability in the case of Fisher and Shannon Information, computation in the case of Kolmogorov complexity, quantum mechanics in the case of quantum information, true beliefs as the core concept of Semantic Information, whereas information states of agents seem to correlate with the formal notion of propositions that do not necessarily have to be true. The philosophical interpretation of the definiendum ‘Information’ naturally depends on the views one holds about the definiens. Until recently the possibility of a unification of these theories was generally doubted (Adriaans and van Benthem 2008a) but in the past decade conversions and reductions between various formal models have been studied (Cover and Thomas 2006; Grünwald and Vitányi 2008; Bais and Farmer 2008). The situation that seems to emerge is not unlike the concept of energy: there are various formal sub-theories about energy (kinetic, potential, electrical, chemical, nuclear) with well-defined transformations between them. Apart from that, the term ‘energy’ is used loosely in colloquial speech. There is no consensus about the exact nature of the field of philosophy of information. Some authors like Floridi (2002, 2003, 2011) present ‘Philosophy of Information’ as a completely new development with a capacity to revolutionize philosophy per se. Others (Adriaans and van Benthem 2008a; Lenski 2010) see it more as a technical discipline with deep roots in the history of philosophy and consequences for various disciplines like methodology, epistemology and ethics.

1. Information in colloquial speech

The lack of preciseness and the universal usefulness of the term ‘information’ go hand in hand. In our society, in which we explore reality by means of instruments and installations of ever increasing complexity (telescopes, cyclotrons) and communicate via more advanced media (newspapers, radio, television, SMS, the Internet), it is useful to have an abstract mass-noun for the ‘stuff’ that is created by the instruments and that ‘flows’ through these media. Historically this general meaning emerged rather late and seems to be associated with the rise of mass media and intelligence agencies (Devlin & Rosenberg 2008; Adriaans and van Benthem 2008b).

In present colloquial speech the term information is used in various loosely defined and often even conflicting ways. Most people, for instance, would consider the following inference prima facie to be valid:

“If I get the information that p then I know that p.”

The same people would probably have no problems with the statement that “Secret services sometimes distribute false information”, or with the sentence “The information provided by the witnesses of the accident was vague and conflicting”. The first statement implies that information necessarily is true, while the other statements allow for the possibility that information is false, conflicting and vague. In everyday communication these inconsistencies do not seem to create great trouble and in general it is clear from the pragmatic context what type of information is designated. These examples suffice to argue that references to our intuitions as speakers of the English language are of little help in the development of a rigorous philosophical theory of information. There seems to be no pragmatic pressure in everyday communication to converge to a more exact definition of the notion of information.

2. History of the term and the concept of information

Until the second half of the 20th century almost no modern philosopher considered ‘information’ to be an important philosophical concept. The term has no lemma in the well-known encyclopedia of Edwards (1967) and is not mentioned in Windelband (1921). In this context the interest in ‘Philosophy of Information’ is a recent development. Yet, with hindsight from the perspective of a history of ideas, reflection on the notion of ‘information’ has been a predominant theme in the history of philosophy. The reconstruction of this history is relevant for the study of information.

A problem with any ‘history of ideas’ approach is the validation of the underlying assumption that the concept one is studying has indeed continuity over the history of philosophy. In the case of the historical analysis of information one might ask whether the concept of ‘informatio’ discussed by Augustine has any connection to Shannon information, other than a resemblance of the terms. At the same time one might ask whether Locke's ‘plain historical method’ is an important contribution to the emergence of the modern concept of information although in his writings Locke hardly uses the term ‘information’ in a technical sense. As is shown below, there is a conglomerate of ideas involving a notion of information that has developed from antiquity till recent times, but further study of the history of the concept of information is necessary.

An important recurring theme in the early philosophical analysis of knowledge is the paradigm of manipulating a piece of wax: either by simply deforming it, by imprinting a signet ring in it or by writing characters on it. The fact that wax can take different shapes and secondary qualities (temperature, smell, touch) while the volume (extension) stays the same makes it a rich source of analogies, natural to Greek, Roman and medieval culture, where wax was used for sculpture, writing (wax tablets) and encaustic painting. One finds this topic in writings of such diverse authors as Democritus, Plato, Aristotle, Theophrastus, Cicero, Augustine, Avicenna, Duns Scotus, Aquinas, Descartes and Locke.

2.1 Classical philosophy

In classical philosophy ‘information’ was a technical notion associated with a theory of knowledge and ontology that originated in Plato's (427–347 BCE) theory of forms, developed in a number of his dialogues (Phaedo, Phaedrus, Symposium, Timaeus, Republic). Various imperfect individual horses in the physical world could be identified as horses, because they participated in the static atemporal and aspatial idea of ‘horseness’ in the world of ideas or forms. When later authors like Cicero (106–43 BCE) and Augustine (354–430 CE) discussed Platonic concepts in Latin they used the terms informare and informatio as a translation for technical Greek terms like eidos (essence), idea (idea), typos (type), morphe (form) and prolepsis (representation). The root ‘form’ still is recognizable in the word in-form-ation (Capurro and Hjørland 2003). Plato's theory of forms was an attempt to formulate a solution for various philosophical problems: the theory of forms mediates between a static (Parmenides, ca. 450 BCE) and a dynamic (Herakleitos, ca. 535–475 BCE) ontological conception of reality and it offers a model for the study of the theory of human knowledge. According to Theophrastus (371–287 BCE) the analogy of the wax tablet goes back to Democritus (ca. 460–380/370 BCE) (De Sensibus 50). In the Theaetetus (191c,d) Plato compares the function of our memory with a wax tablet in which our perceptions and thoughts are imprinted as a signet ring stamps impressions in wax. Note that the metaphor of imprinting symbols in wax is essentially spatial (extensive) and cannot easily be reconciled with the aspatial interpretation of ideas supported by Plato.

One gets a picture of the role the notion of ‘form’ plays in classical methodology if one considers Aristotle's (384–322 BCE) doctrine of the four causes. In Aristotelian methodology understanding an object implied understanding four different aspects of it:

Material Cause:
that as the result of whose presence something comes into being—e.g., the bronze of a statue and the silver of a cup, and the classes which contain these
Formal Cause:
the form or pattern; that is, the essential formula and the classes which contain it—e.g., the ratio 2:1 and number in general is the cause of the octave-and the parts of the formula.
Efficient Cause:
The source of the first beginning of change or rest; e.g., the man who plans is a cause, and the father is the cause of the child, and in general that which produces is the cause of that which is produced, and that which changes of that which is changed.
Final Cause:
The same as “end”; i.e., the final cause; e.g., as the “end” of walking is health. For why does a man walk? “To be healthy,” we say, and by saying this we consider that we have supplied the cause. (Aristotle, Metaphysics 1013a)

Note that Aristotle, who rejects Plato's theory of forms as atemporal aspatial entities, still uses ‘form’ as a technical concept. This passage states that knowing the form or structure of an object, i.e., the information, is a necessary condition for understanding it. In this sense information is a crucial aspect of classical epistemology.

The fact that the ratio 2:1 is cited as an example also illustrates the deep connection between the notion of forms and the idea that the world was governed by mathematical principles. Plato believed, under the influence of an older Pythagorean (Pythagoras 572–ca.500 BCE) tradition, that ‘everything that emerges and happens in the world’ could be measured by means of numbers (Politicus 285a). On various occasions Aristotle mentions the fact that Plato associated ideas with numbers (Vogel 1974, pg. 139). Although formal mathematical theories about information only emerged in the 20th century, and one has to be careful not to interpret the Greek notion of a number in any modern sense, the idea that information was essentially a mathematical notion dates back to classical philosophy: the form of an entity was conceived as a structure or pattern that could be described in terms of numbers. Such a form had both an ontological and an epistemological aspect: it explains the essence as well as the understandability of the object. The concept of information thus from the very start of philosophical reflection was already associated with epistemology, ontology and mathematics.

Two fundamental problems that are not explained by the classical theory of ideas or forms are 1) the actual act of knowing an object (i.e., if I see a horse in what way is the idea of a horse activated in my mind) and 2) the process of thinking as manipulation of ideas. Aristotle treats these issues in De Anima, invoking the signet-ring-impression-in-wax analogy:

By a ‘sense’ is meant what has the power of receiving into itself the sensible forms of things without the matter. This must be conceived of as taking place in the way in which a piece of wax takes on the impress of a signet-ring without the iron or gold; we say that what produces the impression is a signet of bronze or gold, but its particular metallic constitution makes no difference: in a similar way the sense is affected by what is coloured or flavoured or sounding, but it is indifferent what in each case the substance is; what alone matters is what quality it has, i.e., in what ratio its constituents are combined. (De Anima, Book II, Chp. 12)

Have not we already disposed of the difficulty about interaction involving a common element, when we said that mind is in a sense potentially whatever is thinkable, though actually it is nothing until it has thought? What it thinks must be in it just as characters may be said to be on a writing-tablet on which as yet nothing actually stands written: this is exactly what happens with mind. (De Anima, Book III, Chp. 4)

These passages are rich in influential ideas and can with hindsight be read as programmatic for a philosophy of information: the process of informatio can be conceived as the imprint of characters on a wax tablet (tabula rasa), and thinking can be analyzed in terms of manipulation of symbols.

2.2 Medieval philosophy

Throughout the middle ages the reflection on the concept of informatio is taken up by successive thinkers. Illustrative of the Aristotelian influence is the passage of Augustine in De Trinitate book XI. Here he analyzes vision as an analogy for the understanding of the Trinity. There are three aspects: the corporeal form in the outside world, the informatio by the sense of vision, and the resulting form in the mind. For this process of information Augustine uses the image of a signet ring making an impression in wax (De Trinitate, XI Cap 2 par 3). Capurro (2009) observes that this analysis can be interpreted as an early version of the technical concept of ‘sending a message’ in modern information theory, but the idea is older and is a common topic in Greek thought (Plato Theaetetus 191c,d; Aristotle De Anima, Book II, Chp. 12, Book III, Chp. 4; Theophrastus De Sensibus 50).

The tabula rasa notion was later further developed in the theory of knowledge of Avicenna (c.980–1037 CE):

The human intellect at birth is rather like a tabula rasa, a pure potentiality that is actualized through education and comes to know. Knowledge is attained through empirical familiarity with objects in this world from which one abstracts universal concepts. (Sajjad 2006, Other Internet Resources)

The idea of a tabula rasa development of the human mind was the topic of a novel Hayy ibn Yaqdhan by the Arabic Andalusian philosopher Ibn Tufail (1105–1185 CE, known as “Abubacer” or “Ebn Tophail” in the West). This novel describes the development of an isolated child on a deserted island. A later translation into Latin under the title Philosophus Autodidactus (1671) influenced the empiricist John Locke in the formulation of his tabula rasa doctrine.

Apart from the permanent creative tension between theology and philosophy, medieval thought, after the rediscovery of Aristotle's Metaphysics in the 12th century inspired by Arabic scholars, can be characterized as an elaborate and subtle interpretation and development of, mainly Aristotelian, classical theory. Reflection on the notion of informatio is taken up, under influence of Avicenna, by thinkers like Aquinas (1225–1274 CE) and Duns Scotus (1265/66–1308 CE). When Aquinas discusses the question whether angels can interact with matter he refers to the Aristotelian doctrine of hylomorphism (i.e., the theory that substance consists of matter (hyle: wood, matter) and form (morphè)). Here Aquinas translates this as the in-formation of matter (informatio materiae) (Summa Theologiae, 1a 110 2, Capurro 2009). Duns Scotus refers to informatio in the technical sense when he discusses Augustine's theory of vision in De Trinitate, XI Cap 2 par 3 (Duns Scotus, 1639, De imagine, Ordinatio, I, d.3, p.3).

The tension that already existed in classical philosophy between Platonic idealism (universalia ante res) and Aristotelian realism (universalia in rebus) is recaptured as the problem of universals: do universal qualities like ‘humanity’ or the idea of a horse exist apart from the individual entities that instantiate them? It is in the context of his rejection of universals that Ockham (c. 1287–1347 CE) introduces his well-known razor: entities should not be multiplied beyond necessity. Throughout their writings Aquinas and Scotus use the Latin terms informatio and informare in a technical sense, although this terminology is not used by Ockham.

2.3 Modern philosophy

The history of the concept of information in modern philosophy is complicated. Probably starting in the 14th century the term ‘information’ emerged in various developing European languages in the general meaning of ‘education’ and ‘inquiry’. The French historical dictionary by Godefroy (1881) gives action de former, instruction, enquête, science, talent as early meanings of ‘information’. The term was also used explicitly for legal inquiries (Dictionnaire du Moyen Français (1330–1500) 2010). Because of this colloquial use the term ‘information’ loses its association with the concept of ‘form’ gradually and appears less and less in a formal sense in philosophical texts.

At the end of the middle ages society and science are changing fundamentally (Hazard 1935; Ong 1958; Dijksterhuis 1986). In a long complex process the Aristotelian methodology of the four causes was transformed to serve the needs of experimental science:

  1. The Material Cause developed into the modern notion of matter.
  2. The Formal Cause was reinterpreted as geometric form in space.
  3. The Efficient Cause was redefined as direct mechanical interaction between material bodies.
  4. The Final Cause was dismissed as unscientific. Because of this, Newton's contemporaries had difficulty with the concept of the force of gravity in his theory. Gravity as action at a distance seemed to be a reintroduction of final causes.

In this changing context the analogy of the wax-impression is reinterpreted. A proto-version of the modern concept of information as the structure of a set or sequence of simple ideas is developed by the empiricists, but since the technical meaning of the term ‘information’ is lost, this theory of knowledge is never identified as a new ‘theory of information’.

The consequence of this shift in methodology is that only phenomena that can be explained in terms of mechanical interaction between material bodies can be studied scientifically. This implies in a modern sense: the reduction of intensive properties to measurable extensive properties. For Galileo this insight is programmatic:

To excite in us tastes, odors, and sounds I believe that nothing is required in external bodies except shapes, numbers, and slow or rapid movements. (Galileo 1623)

These insights later led to the doctrine of the difference between primary qualities (space, shape, velocity) and secondary qualities (heat, taste, color etc.). In the context of philosophy of information Galileo's observations on the secondary quality of ‘heat’ are of particular importance since they lay the foundations for the study of thermodynamics in the 19th century:

Having shown that many sensations which are supposed to be qualities residing in external objects have no real existence save in us, and outside ourselves are mere names, I now say that I am inclined to believe heat to be of this character. Those materials which produce heat in us and make us feel warmth, which are known by the general name of “fire,” would then be a multitude of minute particles having certain shapes and moving with certain velocities. (Galileo 1623)

A pivotal thinker in this transformation is René Descartes (1596–1650 CE). In his Meditationes, after ‘proving’ that matter (res extensa) and mind (res cogitans) are different substances (i.e., forms of being existing independently), the question of the interaction between these substances becomes an issue. The malleability of wax is for Descartes an explicit argument against influence of the res extensa on the res cogitans (Meditationes II, 15). The fact that a piece of wax loses its form and other qualities easily when heated implies that the senses are not adequate for the identification of objects in the world. True knowledge thus can only be reached via ‘inspection of the mind’. Here the wax metaphor that for more than 1500 years was used to explain sensory impression is used to argue against the possibility of reaching knowledge via the senses. Since the essence of the res extensa is extension, thinking fundamentally cannot be understood as a spatial process. Descartes still uses the terms ‘form’ and ‘idea’ in the original scholastic non-geometric (atemporal, aspatial) sense. An example is the short formal proof of God's existence in the second answer to Mersenne in the Meditationes de Prima Philosophia:

I use the term idea to refer to the form of any given thought, immediate perception of which makes me aware of the thought.
(Ideae nomine intelligo cujuslibet cogitationis formam illam, per cujus immediatam perceptionem ipsius ejusdem cogitationis conscius sum)

I call them ‘ideas’, says Descartes,

only in so far as they make a difference to the mind itself when they inform that part of the brain.
(sed tantum quatenus mentem ipsam in illam cerebri partem conversam informant). (Descartes, 1641, Ad Secundas Objectiones, Rationes, Dei existentiam & animae distinctionem probantes, more Geometrico dispositae.)

Because the res extensa and the res cogitans are different substances, the act of thinking can never be emulated in space: machines cannot have the universal faculty of reason. Descartes gives two separate motivations:

Of these the first is that they could never use words or other signs arranged in such a manner as is competent to us in order to declare our thoughts to others: (…) The second test is, that although such machines might execute many things with equal or perhaps greater perfection than any of us, they would, without doubt, fail in certain others from which it could be discovered that they did not act from knowledge, but solely from the disposition of their organs: for while reason is an universal instrument that is alike available on every occasion, these organs, on the contrary, need a particular arrangement for each particular action; whence it must be morally impossible that there should exist in any machine a diversity of organs sufficient to enable it to act in all the occurrences of life, in the way in which our reason enables us to act. (Discours de la méthode, 1637)

The passage is relevant since it directly argues against the possibility of artificial intelligence and it even might be interpreted as arguing against the possibility of a universal Turing machine: reason as a universal instrument can never be emulated in space. This conception is in opposition to the modern concept of information which as a measurable quantity is essentially spatial, i.e., extensive (but in a sense different from that of Descartes).

Descartes does not present a new interpretation of the notions of form and idea, but he sets the stage for a debate about the nature of ideas that evolves around two opposite positions:

The Cartesian notion that ideas are innate and thus a priori. This form of rationalism implies an interpretation of the notion of ideas and forms as atemporal, aspatial, but complex structures, e.g., the idea of ‘a horse’ (i.e., with a head, body and legs). It also matches well with the interpretation of the knowing subject as a created being (ens creatum). God created man after his own image and thus provided the human mind with an adequate set of ideas to understand his creation. In this theory, growth of knowledge is a priori limited. Creation of new ideas ex nihilo is impossible. This view is difficult to reconcile with the concept of experimental science.
Concepts are constructed in the mind a posteriori on the basis of ideas associated with sensory impressions. This doctrine implies a new interpretation of the concept of idea as:
whatsoever is the object of understanding when a man thinks … whatever is meant by phantasm, notion, species, or whatever it is which the mind can be employed about when thinking. (Locke 1691, Essay, I,i,8)
Here ideas are conceived as elementary building blocks of human knowledge and reflection. This fits well with the demands of experimental science. The downside is that the mind can never formulate apodeictic truths about cause and effects and the essence of observed entities, including its own identity. Human knowledge becomes essentially probabilistic (Locke 1691, Essay, IV 25).

Locke's reinterpretation of the notion of idea as a ‘structural placeholder’ for any entity present in the mind is an essential step in the emergence of the modern concept of information. Since these ideas are not involved in the justification of apodeictic knowledge, the necessity to stress the atemporal and aspatial nature of ideas vanishes. The construction of concepts on the basis of a collection of elementary ideas based in sensorial experience opens the gate to a reconstruction of knowledge as an extensive property of an agent: more ideas implies more probable knowledge.

In the second half of the 17th century formal theory of probability is developed by researchers like Pascal (1623–1662), Fermat (1601 or 1606–1665) and Christiaan Huygens (1629–1695). The work De ratiociniis in ludo aleae of Huygens was translated into English by John Arbuthnot (1692). For these authors, the world was essentially mechanistic and thus deterministic; probability was a quality of human knowledge caused by its imperfection:

It is impossible for a Die, with such determin'd force and direction, not to fall on such determin'd side, only I don't know the force and direction which makes it fall on such determin'd side, and therefore I call it Chance, which is nothing but the want of art;… (John Arbuthnot Of the Laws of Chance (1692), preface)

This text probably influenced Hume, who was the first to marry formal probability theory with theory of knowledge:

Though there be no such thing as Chance in the world; our ignorance of the real cause of any event has the same influence on the understanding, and begets a like species of belief or opinion. (…) If a dye were marked with one figure or number of spots on four sides, and with another figure or number of spots on the two remaining sides, it would be more probable, that the former would turn up than the latter; though, if it had a thousand sides marked in the same manner, and only one side different, the probability would be much higher, and our belief or expectation of the event more steady and secure. This process of the thought or reasoning may seem trivial and obvious; but to those who consider it more narrowly, it may, perhaps, afford matter for curious speculation. (Hume 1748, Section VI, “On probability” 1)

Here knowledge about the future as a degree of belief is measured in terms of probability, which in its turn is explained in terms of the number of configurations a deterministic system in the world can have. The basic building blocks of a modern theory of information are in place. With this new concept of knowledge empiricists laid the foundation for the later development of thermodynamics as a reduction of the secondary quality of heat to the primary qualities of bodies.
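The arithmetic behind Hume's die example can be spelled out in a short Python sketch. It treats probability as the ratio of favourable configurations to all configurations, which is the reading given in the paragraph above. The function name `probability_of_mark` is a hypothetical helper introduced only for this illustration, and the reading of the second die as having 1001 sides (a thousand marked alike and one different) is an assumption about Hume's wording.

```python
from fractions import Fraction


def probability_of_mark(favourable_sides: int, total_sides: int) -> Fraction:
    """Probability that a die lands on one of the favourable sides,
    assuming every side is equally likely."""
    if not 0 <= favourable_sides <= total_sides or total_sides == 0:
        raise ValueError("need 0 <= favourable_sides <= total_sides and total_sides > 0")
    return Fraction(favourable_sides, total_sides)


# Hume's first die: one mark on four sides, another mark on the remaining two.
six_sided = probability_of_mark(4, 6)

# Hume's second die, read here as a thousand sides marked alike and one different.
many_sided = probability_of_mark(1000, 1001)

# The degree of belief is 'more steady and secure' in the second case.
belief_is_steadier = many_sided > six_sided
```

Exact fractions are used instead of floating-point numbers so that the ratio of configurations stays visible in the result.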

At the same time the term ‘information’ seems to have lost much of its technical meaning in the writings of the empiricists, so this new development is not designated as a new interpretation of the notion of ‘information’. Locke sometimes uses the phrase that our senses ‘inform’ us about the world and occasionally uses the word ‘information’:

For what information, what knowledge, carries this proposition in it, viz. ‘Lead is a metal’ to a man who knows the complex idea the name lead stands for? (Locke 1691, VIII, 4)

Hume seems to use information in the same casual way when he observes:

Two objects, though perfectly resembling each other, and even appearing in the same place at different times, may be numerically different: And as the power, by which one object produces another, is never discoverable merely from their idea, it is evident cause and effect are relations, of which we receive information from experience, and not from any abstract reasoning or reflection. (Hume 1739, Part III, section 1)

The empiricist methodology is not without problems. The biggest issue is that all knowledge becomes probabilistic and a posteriori. Immanuel Kant (1724–1804) was one of the first to point out that the human mind has a grasp of the meta-concepts of space, time and causality that itself can never be understood as the result of a mere combination of ‘ideas’. What is more, these intuitions allow us to formulate scientific insights with certainty: e.g., the fact that the sum of the angles of a triangle in Euclidean space is 180 degrees. This issue cannot be explained in the empirical framework. If knowledge is created by means of combination of ideas then there must exist an a priori synthesis of ideas in the human mind. According to Kant, this implies that the human mind can evaluate its own capability to formulate scientific judgements. In his Kritik der reinen Vernunft (1781) Kant developed transcendental philosophy as an investigation of the necessary conditions of human knowledge. Although Kant's transcendental program did not contribute directly to the development of the concept of information, he did influence research into the foundations of mathematics and knowledge relevant for this subject in the 19th and 20th century: e.g., the work of Frege, Husserl, Russell, Brouwer, L. Wittgenstein, Gödel, Carnap, Popper and Quine.

2.4 Historical development of the meaning of the term ‘information’

The history of the term ‘information’ is intricately related to the study of central problems in epistemology and ontology in Western philosophy. After a start as a technical term in classical and medieval texts the term ‘information’ almost vanished from the philosophical discourse in modern philosophy, but gained popularity in colloquial speech. Gradually the term obtained the status of an abstract mass-noun, a meaning that is orthogonal to the classical process-oriented meaning. In this form it was picked up by several researchers (Fisher 1925; Shannon 1948) in the 20th century who introduced formal methods to measure ‘information’. This, in its turn, led to a revival of the philosophical interest in the concept of information. This complex history seems to be one of the main reasons for the difficulties in formulating a definition of a unified concept of information that satisfies all our intuitions. At least three different meanings of the word ‘information’ are historically relevant:

‘Information’ as the process of being informed.
This is the oldest meaning one finds in the writings of authors like Cicero (106–43 BCE) and Augustine (354–430 CE) and it is lost in the modern discourse, although the association of information with processes (i.e., computing, flowing or sending a message) still exists. In classical philosophy one could say that when I recognize a horse as such, then the ‘form’ of a horse is planted in my mind. This process is my ‘information’ of the nature of the horse. Also the act of teaching could be referred to as the ‘information’ of a pupil. In the same sense one could say that a sculptor creates a sculpture by ‘informing’ a piece of marble. The task of the sculptor is the ‘information’ of the statue (Capurro & Hjørland 2003). This process-oriented meaning survived quite long in western European discourse: even in the 18th century Robinson Crusoe could refer to the education of his servant Friday as his ‘information’ (Defoe 1719). It is also used in this sense by Berkeley: “I love information upon all subjects that come in my way, and especially upon those that are most important” (Alciphron Dialogue 1, Section 5, Paragraph 6/10, see Berkeley 1732).
‘Information’ as a state of an agent,
i.e., as the result of the process of being informed. If one teaches a pupil the theorem of Pythagoras then, after this process is completed, the student can be said to ‘have the information about the theorem of Pythagoras’. In this sense the term ‘information’ is the result of the same suspect form of substantiation of a verb (informare > informatio) as many other technical terms in philosophy (substance, consciousness, subject, object). This sort of term-formation is notorious for the conceptual difficulties it generates. Can one derive the fact that I ‘have’ consciousness from the fact that I am conscious? Can one derive the fact that I ‘have’ information from the fact that I have been informed? The transformation to this modern substantiated meaning seems to have been gradual and seems to have been general in Western Europe at least from the middle of the fifteenth century. In the renaissance a scholar could be referred to as ‘a man of information’, much in the same way as we now could say that someone received an education (Adriaans and van Benthem 2008b; Capurro & Hjørland 2003). In ‘Emma’ by Jane Austen one can read: “Mr. Martin, I suppose, is not a man of information beyond the line of his own business. He does not read” (Austen 1815, pg 21).
‘Information’ as the disposition to inform,
i.e., as a capacity of an object to inform an agent. When the act of teaching me Pythagoras' theorem leaves me with information about this theorem, it is only natural to assume that a text in which the theorem is explained actually ‘contains’ this information. The text has the capacity to inform me when I read it. In the same sense, when I have received information from a teacher, I am capable of transmitting this information to another student. Thus information becomes something that can be stored and measured. This last concept of information as an abstract mass-noun has gathered wide acceptance in modern society and found its definitive form in the 19th century, allowing Sherlock Holmes to make the following observation: “… friend Lestrade held information in his hands the value of which he did not himself know.” (“The Adventure of the Noble Bachelor,” Conan Doyle 1892). The association with technical philosophical notions like ‘form’ and ‘informing’ has vanished from the general consciousness, although the association between information and processes like storing, gathering, computing and teaching still exists.

3. Building blocks of modern theories of information

Leaving aside for a moment the exact nature of information bearers (an ‘idea’, a text, number, message, physical object, system, proposition or structure) there are various ways to measure the amount of information stored in an information bearer x. Let I(x) be an indeterminate information function that assigns a scalar value to x measuring its ‘information’. There are two basic intuitions or maxims that any such measurement proposal should observe:

  1. Information is extensive. Our intuition is that longer text potentially contains more information. Thus when we have two structures A and B that are mutually independent, then the total information in the combination should be the sum of both the information in A and B: I(A and B)=I(A)+I(B).
  2. Information reduces uncertainty. Information grows with the reduction of uncertainty it creates. When we are absolutely certain about a state of affairs we cannot receive new information about it. This suggests an association between information and probability. Improbable structures contain more information. If we measure the probability of an event in terms of a real number between 0 and 1, then when P(A) = 1, i.e., it is absolutely certain that A will occur, we should have that I(A) = 0, i.e., the occurrence of A contains no information.

Both intuitions are related to the methodology of empiricism (Locke 1691; Hume 1748) and its underlying theory of knowledge. The simplest mathematical function that unifies these two intuitions is the one that defines the information in terms of the negative log of the probability: I(A)= −log P(A) (Shannon 1948; Shannon & Weaver 1949). The elegance of this formula however does not shield us from the conceptual problems it harbors and the history of its genesis is involved. In the following paragraphs we discuss some developments that contributed to the emergence of modern theories of information.
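The two maxims can be checked numerically for the negative-log measure. The following is a minimal sketch, not part of the original text: the function name `information` and the choice of base-2 logarithms (so that values come out in bits) are assumptions made for illustration.

```python
import math


def information(p: float) -> float:
    """Information -log2(p), in bits, of an event with probability p.

    The base of the logarithm is an assumption; the text leaves it open.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log2(p)


# Maxim 2: an event that is certain to occur carries no information.
certain = information(1.0)

# Maxim 1: for independent events P(A and B) = P(A) * P(B), so the
# information in the combination is the sum of the separate amounts.
p_a, p_b = 0.5, 0.125
combined = information(p_a * p_b)
summed = information(p_a) + information(p_b)

print(certain == 0, combined, summed)
```

Because the probabilities chosen here are powers of two, the sums are exact in floating-point arithmetic; for arbitrary probabilities the two sides would agree only up to rounding.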

With hindsight many notions that have to do with optimal code systems, ideal languages and the association between computing and processing language have been recurrent themes in the philosophical reflection since the seventeenth century.

3.1 Languages

One of the most elaborate proposals for a universal ‘philosophical’ language was made by bishop John Wilkins: “An Essay towards a Real Character, and a Philosophical Language” (London 1668). Wilkins' project consisted of an elaborate system of symbols that supposedly were associated with unambiguous concepts in reality. Proposals such as these made philosophers sensitive to the deep connections between language and thought. The empiricist methodology made it possible to conceive the development of language as a system of conventional signs in terms of associations between ideas in the human mind. The issue that currently is known as the symbol grounding problem (how do arbitrary signs acquire their inter-subjective meaning) was one of the most heavily debated questions in the 18th century in the context of the problem of the origin of languages. Thinkers as diverse as Vico, Condillac, Rousseau, Diderot, Herder and Hamann made contributions. The central question was whether language was given a priori (by God) or whether it was constructed and hence an invention of man himself. Typical was the contest issued by the Berlin Academy in 1769:

En supposant les hommes abandonnés à leurs facultés naturelles, sont-ils en état d'inventer le langage, et par quels moyens parviendront-ils à cette invention?

Assuming men abandoned to their natural faculties, are they able to invent language and by what means will they come to this invention?

The controversy raged on for over a century without any conclusion and in 1866 the Société de Linguistique de Paris banished the issue from its proceedings.

Philosophically more relevant is the work of Leibniz (1646–1716) on a so-called characteristica universalis: the notion of a universal logical calculus that would be the perfect vehicle for scientific reasoning. A central presupposition in Leibniz' philosophy is that such a perfect language of science is in principle possible because of the perfect nature of the world as God's creation (ratio essendi = ratio cognoscendi, the origin of being is the origin of knowing). This principle was rejected by Wolff (1679–1754) who suggested a more heuristically oriented characteristica combinatoria (van Peursen 1987). These ideas had to wait for thinkers like Boole (1854, An Investigation of the Laws of Thought), Frege (1879, Begriffsschrift), Peirce (who in 1886 already suggested that electrical circuits could be used to process logical operations) and Russell and Whitehead (1910–1913, Principia Mathematica) to find a more fruitful treatment.

3.2 Optimal codes

The fact that frequencies of letters vary in a language has been known since the invention of book printing. Printers needed many more ‘e’s and ‘t’s than ‘x’s or ‘q’s to typeset an English text. This knowledge has been used extensively to decode ciphers since the 17th century (Kahn 1967; Singh 1999). In 1844 an assistant of Samuel Morse, Alfred Vail, determined the frequency of letters used in a local newspaper in Morristown and used them to optimize Morse code. Thus the core of the theory of optimal codes was already established long before Shannon developed its mathematical foundation (Shannon 1948; Shannon & Weaver 1949). Historically important but philosophically less relevant are the efforts of Charles Babbage to construct computing machines (Difference Engine in 1821, and the Analytical Engine 1834–1871) and the attempt of Ada Lovelace (1815–1852) to design what is considered to be the first programming language for the Analytical Engine.

3.3 Numbers

The simplest way of representing numbers is via a unary system. Here the length of the representation of a number is equal to the size of the number itself, i.e., the number ‘ten’ is represented as ‘\\\\\\\\\\’. The classical Roman number system is an improvement since it contains different symbols for different orders of magnitude (one = I, ten = X, hundred = C, thousand = M). This system has enormous drawbacks since in principle one needs an infinite number of symbols to code the natural numbers and because of this the same mathematical operations (addition, multiplication etc.) take different forms at different orders of magnitude. Around 500 CE the number zero was invented in India. Using zero as a placeholder we can code an infinity of numbers with a finite set of symbols (one = 1, ten = 10, hundred = 100, thousand = 1000 etc.). From a modern perspective an infinite number of position systems is possible as long as we have 0 as a placeholder and a finite number of other symbols. Our normal decimal number system has ten digits ‘0, 1, 2, 3, 4, 5, 6, 7, 8, 9’ and represents the number two-hundred-and-fifty-five as ‘255’. In a binary number system we only have the symbols ‘0’ and ‘1’. Here two-hundred-and-fifty-five is represented as ‘11111111’. In a hexadecimal system with 16 symbols (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f) the same number can be written as ‘ff’. Note that the length of these representations differs considerably. Using this representation, mathematical operations can be standardized irrespective of the order of magnitude of numbers we are dealing with, i.e., the possibility of a uniform algorithmic treatment of mathematical operations (addition, subtraction, multiplication and division etc.) is associated with such a position system.
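The dependence of representation length on the number system can be made concrete with a short sketch. The helper names `to_base` and `to_unary`, and the use of ‘|’ as the unary mark, are assumptions introduced for illustration; the single conversion loop works for every base, which is the uniform algorithmic treatment the paragraph above refers to.

```python
DIGITS = "0123456789abcdef"


def to_base(n: int, base: int) -> str:
    """Write a non-negative integer in a positional system with the given base."""
    if n < 0 or not 2 <= base <= len(DIGITS):
        raise ValueError("need n >= 0 and 2 <= base <= 16")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)  # peel off the least significant digit
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


def to_unary(n: int, mark: str = "|") -> str:
    """Unary representation: one mark per unit, so the length equals n."""
    return mark * n


for base in (2, 10, 16):
    text = to_base(255, base)
    print(f"base {base:2d}: {text} (length {len(text)})")
print(f"unary  : length {len(to_unary(255))}")
```

The unary form of two-hundred-and-fifty-five needs 255 marks, whereas the positional forms need 8, 3 and 2 symbols respectively: the length grows with the logarithm of the number rather than with the number itself.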

The concept of a positional number system was brought to Europe by the Persian mathematician al-Khwarizmi (ca. 780–ca. 850 CE). His main work on numbers (ca. 820 CE) was translated into Latin as Liber Algebrae et Almucabola in the 12th century, which gave us amongst other things the term ‘algebra’. Our word ‘algorithm’ is derived from Algoritmi, the Latin form of his name. Positional number systems simplified commercial and scientific calculations.

In 1544 Michael Stifel introduced the concept of the exponent of a number in Arithmetica integra (Stifel 1544). Thus 8 can be written as $2^3$ and 25 as $5^2$. The notion of an exponent immediately suggests the notion of a logarithm as its inverse function: $\log_b(b^a) = a$. Stifel compared the arithmetic sequence:

−3, −2, −1, 0, 1, 2, 3

in which the terms have a difference of 1 with the geometric sequence:

⅛, ¼, ½, 1, 2, 4, 8

in which the terms have a ratio of 2. The exponent notation allowed him to rewrite the values of the second table as:

$2^{-3}, 2^{-2}, 2^{-1}, 2^{0}, 2^{1}, 2^{2}, 2^{3}$

which combines the two tables. This arguably was the first logarithmic table. A more definitive and practical theory of logarithms was developed by John Napier (1550–1617) in his main work (Napier 1614). He coined the term logarithm (logos + arithmos: ratio of numbers). As is clear from the match between arithmetic and geometric progressions, logarithms reduce products to sums:

$\log_b(xy) = \log_b(x) + \log_b(y)$

They also reduce divisions to differences:

$\log_b(x/y) = \log_b(x) - \log_b(y)$

and powers to products:

$\log_b(x^p) = p \log_b(x)$

After publication of the logarithmic tables by Briggs (1624) this new technique of facilitating complex calculations rapidly gained popularity.
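How such a table facilitates calculation can be sketched in a few lines. The code below rebuilds the table of powers of 2 discussed above and multiplies two entries by adding their exponents; the names `table` and `multiply_via_table` are assumptions introduced for illustration, and the three identities are checked numerically for one arbitrary choice of values rather than proved.

```python
import math

# The arithmetic sequence of exponents and the matching geometric sequence.
exponents = [-3, -2, -1, 0, 1, 2, 3]
powers = [2.0 ** e for e in exponents]  # 0.125, 0.25, 0.5, 1, 2, 4, 8

# Read from right to left, the table gives the base-2 logarithm of each entry.
table = dict(zip(powers, exponents))


def multiply_via_table(x: float, y: float) -> float:
    """Multiply two table entries by adding their exponents."""
    return 2.0 ** (table[x] + table[y])


print(multiply_via_table(8.0, 0.25))  # 2^3 * 2^-2 = 2^1

# Numerical check of the three identities for one choice of values.
x, y, p = 8.0, 0.25, 3
product_rule = math.isclose(math.log2(x * y), math.log2(x) + math.log2(y))
quotient_rule = math.isclose(math.log2(x / y), math.log2(x) - math.log2(y))
power_rule = math.isclose(math.log2(x ** p), p * math.log2(x))
print(product_rule, quotient_rule, power_rule)
```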

3.4 Physics

Galileo (1623) had already suggested that the analysis of phenomena like heat and pressure could be reduced to the study of movements of elementary particles. Within the empirical methodology this could be conceived as the question how the sensory experience of the secondary quality of heat of an object or a gas could be reduced to movements of particles. Bernoulli (Hydrodynamica published in 1738) was the first to develop a kinetic theory of gases in which macroscopically observable phenomena are described in terms of microstates of systems of particles that obey the laws of Newtonian mechanics, but it was quite an intellectual effort to come up with an adequate mathematical treatment. Clausius (1850) made a conclusive step when he introduced the notion of the mean free path of a particle between two collisions. This opened the way for a statistical treatment by Maxwell who formulated his distribution in 1857, which was the first statistical law in physics. The definitive formula that tied all notions together was developed by Boltzmann (it is engraved on his tombstone, though the actual formula is due to Planck):

S = k log W

It describes the entropy S of a system in terms of the logarithm of the number of possible microstates W, consistent with the observable macroscopic states of the system, where k is the well-known Boltzmann constant. In all its simplicity the value of this formula for modern science can hardly be overestimated. The expression ‘log W’ can, from the perspective of information theory, be interpreted in various ways:

  • As the amount of entropy in the system.
  • As the length of the number needed to count all possible microstates consistent with macroscopic observations.
  • As the length of an optimal index we need to identify the specific current unknown microstate of the system, i.e., it is a measure of our ‘lack of information’.
  • As a measure for the probability of any typical specific microstate of the system consistent with macroscopic observations.

Thus it connects the additive nature of logarithm with the extensive qualities of entropy, probability, typicality and information and it is a fundamental step in the use of mathematics to analyze nature. Later Gibbs (1906) refined the formula:

$S = -\sum_i p_i \ln p_i$

where $p_i$ is the probability that the system is in the $i$th microstate. This formula was adopted by Shannon (1948; Shannon & Weaver 1949) to characterize the communication entropy of a system of messages. Although there is a close connection between the mathematical treatment of entropy and information, the exact interpretation of this fact has been a source of controversy ever since (Harremoës & Topsøe 2008; Bais & Farmer 2008).
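The relation between the two entropy formulas can be illustrated numerically. The sketch below assumes units in which k = 1 and uses natural logarithms, as in the Gibbs formula as stated above; the function name `gibbs_entropy` and the example distributions are assumptions made for illustration. When all W microstates are equally probable, the Gibbs expression reduces to log W, and any other distribution over the same microstates gives a lower value.

```python
import math


def gibbs_entropy(probs):
    """S = -sum_i p_i ln p_i, in units where the Boltzmann constant k = 1."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Terms with p = 0 contribute nothing (the limit of p ln p is 0).
    return -sum(p * math.log(p) for p in probs if p > 0)


W = 8                          # number of microstates (illustrative value)
uniform = [1 / W] * W          # every microstate equally likely
skewed = [0.65] + [0.05] * 7   # one microstate dominates

print(gibbs_entropy(uniform), math.log(W))  # the two values agree
print(gibbs_entropy(skewed))                # strictly smaller
```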

3.5 Logic

Dunn (2001, 2008) has pointed out that the analysis of information in logic is intricately related to the notions of intension and extension. The distinction between intension and extension is already anticipated in the Port Royal Logic (1662) and the writings of Mill (1843), Boole (1847) and Peirce (1868) but was systematically introduced in logic by Frege (1879, 1892). In a modern sense the extension of a predicate, say “X is a bachelor”, is simply the set of bachelors in our domain. The intension is associated with the meaning of the predicate and allows us to derive from the fact that ‘John is a bachelor’ the facts that ‘John is male’ and ‘John is unmarried’. It is clear that this phenomenon has a relation with both the possible world interpretation of modal operators and the notion of information. A bachelor is by necessity also male, i.e., in every possible world in which John is a bachelor he is also male, consequently: If someone gives me the information that John is a bachelor I get the information that he is male and unmarried for free. The possible world interpretation of modal operators (Kripke 1959) is related to the notion of ‘state description’ introduced by Carnap (1947). A state description is a conjunction that contains exactly one of each atomic sentence or its negation. The ambition to define a good probability measure for state descriptions was one of the motivations for Solomonoff (1960, 1997) to develop algorithmic information theory.
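The notion of a state description, and its link to necessity as truth in all relevant possible worlds, can be sketched for a toy language. Everything named below is an assumption made for illustration: the two atomic sentences, the helper names, and in particular the treatment of ‘bachelor’ as the conjunction ‘male and not married’, which is a simplification of the intension discussed above.

```python
from itertools import product


def worlds(atoms):
    """All truth assignments to the atoms: one per state description."""
    return [dict(zip(atoms, values))
            for values in product([True, False], repeat=len(atoms))]


def describe(world):
    """The state description: each atom or its negation, joined by 'and'."""
    return " and ".join(a if v else f"not {a}" for a, v in world.items())


def bachelor(world):
    """Illustrative meaning postulate: a bachelor is male and not married."""
    return world["male"] and not world["married"]


atoms = ["male", "married"]
all_worlds = worlds(atoms)
for w in all_worlds:
    print(describe(w))

# 'Bachelor' necessitates 'male': 'male' holds in every world where
# 'bachelor' holds, so that information comes for free.
bachelor_worlds = [w for w in all_worlds if bachelor(w)]
entails_male = all(w["male"] for w in bachelor_worlds)
entails_unmarried = all(not w["married"] for w in bachelor_worlds)
print(entails_male, entails_unmarried)
```

With n atomic sentences there are 2^n state descriptions, which is why assigning a probability measure to them quickly becomes a question about code systems and counting.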

4. Developments in philosophy of Information

The modern theories of information emerged in the middle of the 20th century in a specific intellectual climate in which the distance between the sciences and parts of academic philosophy was quite big. Some philosophers displayed a specific anti-scientific attitude: Heidegger, “Die Wissenschaft denkt nicht” (“Science does not think”). On the other hand the philosophers from the Wiener Kreis overtly discredited traditional philosophy as dealing with illusory problems (Carnap 1928). The research program of logical positivism was a rigorous reconstruction of philosophy based on a combination of empiricism and the recent advances in logic. It is perhaps because of this intellectual climate that early important developments in the theory of information took place in isolation from mainstream philosophical reflection. A landmark is the work of Dretske in the early eighties (Dretske 1981). Since the turn of the century, interest in Philosophy of Information has grown considerably, largely under the influence of the work of Luciano Floridi on semantic information. The rapid theoretical development of quantum computing and the associated notion of quantum information have also had their repercussions on philosophical reflection.

4.1 Popper: Information as degree of falsifiability

The research program of logical positivism of the Wiener Kreis in the first half of the 20th century revitalized the older project of empiricism. Its ambition was to reconstruct scientific knowledge on the basis of direct observations and logical relation between statements about those observations. The old criticism of Kant on empiricism was revitalized by Quine (1951). Within the framework of logical positivism induction was invalid and causation could never be established objectively. In his Logik der Forschung (1934) Popper formulates his well-known demarcation criterion and he positions this explicitly as a solution to Hume's problem of induction (Popper 1934 [1977], pg. 42). Scientific theories formulated as general laws can never be verified definitively, but they can be falsified by only one observation. This implies that a theory is ‘more’ scientific if it is richer and provides more opportunity to be falsified:

Thus it can be said that the amount of empirical information conveyed by a theory, or its empirical content, increases with its degree of falsifiability. (Popper 1934 [1977], pg. 113, emphasis in original)

This quote, in the context of Popper's research program, shows that the ambition to measure the amount of empirical information in scientific theory conceived as a set of logical statements was already recognized as a philosophical problem more than a decade before Shannon formulated his theory of information. Popper is aware of the fact that the empirical content of a theory is related to its falsifiability and that this in its turn has a relation with the probability of the statements in the theory. Theories with more empirical information are less probable. Popper distinguishes logical probability from numerical probability (“which is employed in the theory of games and chance, and in statistics” (Popper 1934 [1977], pg. 119, emphasis in original)). In a passage that is programmatic for the later development of the concept of information he defines the notion of logical probability:

The logical probability of a statement is complementary to its falsifiability: it increases with decreasing degree of falsifiability. The logical probability 1 corresponds to the degree 0 of falsifiability and vice versa. (Popper 1934 [1977], p. 119, emphasis in original)

It is possible to interpret numerical probability as applying to a subsequence (picked out from the logical probability relation) for which a system of measurement can be defined, on the basis of frequency estimates. (Popper 1934 [1977], pg. 119, emphasis in original)

Popper never succeeded in formulating a good formal theory to measure this amount of information although in later writings he suggests that Shannon's theory of information might be useful (Popper 1934 [1977], appendix ix (1954), pg. 404). These issues were later developed in philosophy of science. Confirmation theory studies induction and the way in which evidence ‘supports’ a certain theory (Huber 2007, Other Internet Resources). Although the work of Carnap motivated important developments in both philosophy of science and philosophy of information the connection between the two disciplines seems to have been lost. There is no mention of information theory or any of the more foundational work in philosophy of information in Kuipers (2007a), but the two disciplines certainly have overlapping domains. (See, e.g., the discussion of the so-called Black Ravens Paradox by Kuipers (2007b) and Rathmanner & Hutter (2011).)

4.2 Shannon: Information defined in terms of probability

In two landmark papers Shannon (1948; Shannon & Weaver 1949) characterized the communication entropy of a system of messages A:

$H(P) = -\sum_{i \in A} p_i \log_2 p_i$

Here $p_i$ is the probability of message $i$ in A. This is exactly the formula for Gibbs' entropy in physics. The use of base-2 logarithms ensures that the code length is measured in bits (binary digits). It is easily seen that the communication entropy of a system is maximal when all the messages have equal probability and thus are typical.
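The claim that the communication entropy is maximal for equiprobable messages can be checked on a small example. The sketch below is illustrative only: the function name `shannon_entropy` and the three message distributions are assumptions, not taken from Shannon's papers.

```python
import math


def shannon_entropy(probs):
    """H(P) = -sum_i p_i log2 p_i, measured in bits."""
    # Messages with probability 0 contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)


uniform = [0.25, 0.25, 0.25, 0.25]   # four equally likely messages
skewed = [0.5, 0.25, 0.125, 0.125]   # same four messages, unequal odds
certain = [1.0, 0.0, 0.0, 0.0]       # one message is always sent

print(shannon_entropy(uniform))  # log2(4) = 2 bits, the maximum for 4 messages
print(shannon_entropy(skewed))   # 1.75 bits
print(shannon_entropy(certain))  # zero bits: no uncertainty to resolve
```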

The amount of information I in an individual message x is given by:

$I(x) = -\log p_x$

This formula, that can be interpreted as the inverse of the Boltzmann entropy, covers a number of our basic intuitions about information:

  • A message x has a certain probability $p_x$ between 0 and 1 of occurring.
  • If $p_x = 1$ then $I(x) = 0$. If we are certain to get a message it literally contains no ‘news’ at all. The lower the probability of the message is, the more information it contains. A message like “The sun will rise tomorrow” seems to contain less information than the message “Jesus was Caesar” exactly because the second statement is much less likely to be defended by anyone (although it can be found on the web).
  • If two messages x and y are unrelated then I(x and y)=I(x) + I(y). Information is extensive. The amount of information in two combined messages is equal to the sum of the amount of information in the individual messages.

Information as the negative log of the probability is the only mathematical function that exactly fulfills these constraints (Cover & Thomas 2006). Shannon offers a theoretical framework in which binary strings can be interpreted as words in a (programming) language containing a certain amount of information (see 3.1 Languages). The expression $-\log p_x$ exactly gives the length of an optimal code for message x and as such formalizes the old intuition that codes are more efficient when frequent letters get shorter representations (see 3.2 Optimal codes). Logarithms as a reduction of multiplication to addition (see 3.3 Numbers) are a natural representation of extensive properties of systems and already as such had been used by physicists in the 19th century (see 3.4 Physics).
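The link between probabilities and optimal code lengths can be illustrated with Huffman coding. Note that Huffman's construction is a later technique that is not discussed in this entry; it is swapped in here only as a convenient way to build an optimal prefix code. The letter probabilities are made up for the example (they are not measured English frequencies) and are chosen as powers of ½ so that the code lengths coincide exactly with $-\log_2 p_x$.

```python
import heapq
import math


def huffman_code_lengths(probs):
    """Return {symbol: code length in bits} for a Huffman code."""
    # Heap entries are (probability, tie_breaker, {symbol: depth}); the
    # unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(p, i, {sym: 0}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, depths1 = heapq.heappop(heap)  # the two least probable subtrees
        p2, _, depths2 = heapq.heappop(heap)
        # Merging puts every symbol of both subtrees one level deeper.
        merged = {s: d + 1 for s, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]


# Hypothetical letter probabilities, for illustration only.
probs = {"e": 0.5, "t": 0.25, "x": 0.125, "q": 0.125}
lengths = huffman_code_lengths(probs)
ideal = {s: -math.log2(p) for s, p in probs.items()}

for s in probs:
    print(s, lengths[s], ideal[s])

# Expected code length per letter equals the entropy for this distribution.
expected_length = sum(probs[s] * lengths[s] for s in probs)
print(expected_length)
```

For probabilities that are not powers of ½ the Huffman lengths are whole numbers and can only approximate $-\log_2 p_x$; the expected length then lies within one bit of the entropy.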

One aspect of information that Shannon's definition explicitly does not cover is the actual content of the messages interpreted as propositions. So the statement “Jesus was Caesar” and “The moon is made of green cheese” may carry the same amount of information while their meaning is totally different. A large part of the effort in philosophy of information has been directed to the formulation of more semantic theories of information (Bar-Hillel and Carnap 1953; Floridi 2002, 2003, 2011). Although Shannon's proposals at first were almost completely ignored by philosophers it has in the past decades become apparent that their impact on philosophical issues is big. Dretske (1981) was one of the first to analyze the philosophical implications of Shannon's theory, but the exact relation between various systems of logic and theory of information is still unclear (see 3.5 Logic).

4.3 Solomonoff, Kolmogorov, Chaitin: Information as the length of a program

This problem of relating a set of statements to a set of observations and defining the corresponding probability was taken up by Carnap (1945, 1950). He distinguished two forms of probability: Probability$_1$ or “degree of confirmation” $P_1(h;e)$ is a logical relation between two sentences, a hypothesis h and a sentence e reporting a series of observations. Statements of this type are either analytical or contradictory. The second form, Probability$_2$ or “relative frequency”, is the statistical concept. In the words of his student Solomonoff (1997):

Carnap's model of probability started with a long sequence of symbols that was a description of the entire universe. Through his own formal linguistic analysis, he was able to assign a priori probabilities to any possible string of symbols that might represent the universe.

The method Carnap used for assigning probabilities was not universal and depended heavily on the code systems used. A general theory of induction using Bayes' rule can only be developed when we can assign a universal probability to ‘any possible string’ of symbols. In a paper in 1960 Solomonoff (1960, 1964a,b) was the first to sketch an outline of a solution for this problem. He formulated the notion of a universal distribution:

consider the set of all possible finite strings to be programs for a universal Turing machine U and define the probability of a string x of symbols in terms of the length of the shortest program p that outputs x on U.

This notion of Algorithmic Information Theory was invented independently, somewhat later, by Kolmogorov (1965) and Chaitin (1969). Levin (1974) developed a mathematical expression of the universal a priori probability as a universal (that is, maximal) lower semicomputable semimeasure M, and showed that the negative logarithm of M(x) coincides with the Kolmogorov complexity of x up to an additive logarithmic term.

Algorithmic Information Theory (a.k.a. Kolmogorov complexity theory) has developed into a rich field of research with a wide range of domains of applications many of which are philosophically relevant (Li and Vitányi 1997):

  • It provides us with a general theory of induction. The use of Bayes' rule allows for a modern reformulation of Ockham's razor in terms of Minimum Description Length (Rissanen 1978, 1989; Barron, Rissanen, and Yu 1998; Grünwald 2007) and minimum message length (Wallace 2005). Note that Domingos (1998) has argued against the general validity of these principles.
  • It allows us to formulate probabilities and information content for individual objects. Even individual natural numbers.
  • It lays the foundation for a theory of learning as data compression (Adriaans 2007).
  • It gives a definition of randomness of a string in terms of incompressibility. This in itself has led to a whole new domain of research (Nies 2009; Downey & Hirschfeldt 2010).
  • It allows us to formulate an objective a priori measure of the predictive value of a theory in terms of its randomness deficiency: i.e., the best theory is the shortest theory that makes the data look random conditional on the theory (Vereshchagin and Vitányi 2004).

There are also down-sides:

  • Algorithmic complexity is uncomputable, although it can in a lot of practical cases be approximated and commercial compression programs in some cases come close to the theoretical optimum (Cilibrasi and Vitányi 2005).
  • Algorithmic complexity is an asymptotic measure (i.e., it gives a value that is correct up to a constant). In some cases the value of this constant is prohibitive for use in practical purposes.
  • Although the shortest theory is always the best one in terms of randomness deficiency, incremental compression of data-sets is in general not a good learning strategy since the randomness deficiency does not decrease monotonically with the compression rate (Adriaans and Vitányi 2009).
  • The generality of the definitions provided by Algorithmic Information Theory depends on the generality of the concept of a universal Turing machine and thus ultimately on the interpretation of the Church-Turing thesis.
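The first of these points can be illustrated with an off-the-shelf compressor. The sketch below uses Python's standard `zlib` module; the choice of compressor, the helper name `compressed_length` and the two test strings are assumptions made for illustration. The length of a compressed file is only an upper bound (up to the fixed size of the decompressor) on the algorithmic complexity of the data, never the complexity itself.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


n = 10_000
regular = b"ab" * (n // 2)  # highly regular: a very short description exists

# Seeded pseudo-random bytes, so the example is reproducible.
rng = random.Random(0)
random_like = bytes(rng.getrandbits(8) for _ in range(n))

print(len(regular), compressed_length(regular))          # shrinks drastically
print(len(random_like), compressed_length(random_like))  # hardly shrinks at all

# The second string is in fact generated by a short program (the generator
# plus its seed), so its true algorithmic complexity is small. zlib cannot
# discover this: a practical compressor yields an upper bound only.
```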

Algorithmic Information Theory has gained rapid acceptance as a fundamental theory of information. The well-known introduction to information theory by Cover and Thomas (2006) states: “… we consider Kolmogorov complexity (i.e., AIT) to be more fundamental than Shannon entropy” (pg 3).

The idea that algorithmic complexity theory is a foundation for a general theory of artificial intelligence (and theory of knowledge) has already been suggested by Solomonoff (1997) and Chaitin (1987). Several authors have defended the view that data compression is a general principle that governs human cognition (Chater & Vitányi 2003; Wolff 2006). Hutter (2005, 2007a,b) argues that Solomonoff's formal and complete theory essentially solves the induction problem. Hutter (2007a) and Rathmanner & Hutter (2011) enumerate a plethora of classical philosophical and statistical problems around induction and claim that Solomonoff's theory solves or avoids all these problems. Probably because of its technical nature, the theory has been largely ignored by the philosophical community. Yet, it stands out as one of the most fundamental contributions to information theory in the 20th century and it is clearly relevant for a number of philosophical issues, such as the problem of induction.

4.4 Applications

The first domain that could benefit from philosophy of information is of course philosophy itself. The concept of information potentially has an impact on almost all philosophical main disciplines, ranging from logic, theory of knowledge, to ontology and even ethics and esthetics (see introduction above). Philosophy of science and philosophy of information, with their interest in the problem of induction and theory formation, probably both could benefit from closer cooperation (see 4.1 Popper: Information as degree of falsifiability). The concept of information plays an important role in the history of philosophy that is not completely understood (see 2. History of the term and the concept of information).

As information has become a central issue in almost all of the sciences and humanities, this development will also impact philosophical reflection in these areas. Archaeologists, linguists, physicists and astronomers all deal with information. The first thing a scientist has to do before formulating a theory is to gather information. The possibilities for application are abundant. Data mining and the handling of extremely large data sets seem to be essential for almost every empirical discipline in the 21st century.

In biology we have found out that information is essential for the organization of life itself and for the propagation of complex organisms (see entry on biological information). One of the main problems is that current models do not explain the complexity of life well. Valiant has started a research program that studies evolution as a form of computational learning (Valiant 2007) in order to explain this discrepancy. Aaronson (2011, Other Internet Resources) has argued explicitly for a closer cooperation between complexity theory and philosophy.

Until recently the general opinion was that the various notions of information were more or less isolated, but in recent years considerable progress has been made in understanding the relationships between these concepts. Cover and Thomas (2006), for instance, see a perfect match between Kolmogorov complexity and Shannon information. Similar observations have been made by Grünwald and Vitányi (2008). The connections between thermodynamics and information theory have also been studied (Bais and Farmer 2008; Harremoës & Topsøe 2008), and it is clear that the connections between physics and information theory are much more elaborate than a mere ad hoc similarity between the formal treatment of entropy and information suggests (Gell-Mann & Lloyd 2003; Verlinde 2010 (Other Internet Resources)). A unified theory of information, however, seems beyond our reach at this moment.
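One standard way to make the match between the two measures precise is the following asymptotic statement, given here as a sketch in our own notation rather than as a quotation from the works cited. The assumptions are that X_1, …, X_n are drawn independently from a fixed distribution over a finite alphabet, that X^n denotes the resulting sequence, that K(x^n | n) is the Kolmogorov complexity of a sequence given its length, and that H(X) is the Shannon entropy of the source.

```latex
\[
  \lim_{n \to \infty} \frac{1}{n}\, \mathbb{E}\!\left[ K(X^n \mid n) \right] = H(X)
\]
```

In words: per symbol, the expected length of the shortest program that produces a typical sequence from the source converges to the entropy of that source. Individual sequences may still deviate strongly from this average.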

5. Conclusion

The notion of information has become central both in our society and in the sciences. Information technology plays a central role in the way we organize our lives, and information has also become a central category in the sciences and the humanities. Philosophy of information, both as a historical and a systematic discipline, offers a new perspective on old philosophical problems and also suggests some new research domains. A deeper analysis of some of the more technical problems concerning the philosophical analysis of information is given in the supplementary document Open Problems in the Study of Information and Computation.


  • Adriaans, P.W., 2007, “Learning as Data Compression,” in S. B. Cooper, B. Löwe & A. Sorbi, Computation and Logic in the Real World (Lecture Notes in Computer Science: Volume 449), Berlin: Springer, pp. 11–24.
  • –––, 2008, “Between Order and Chaos: The Quest for Meaningful Information,” Theory of Computing Systems (Special Issue: Computation and Logic in the Real World; Guest Editors: S. Barry Cooper, Elvira Mayordomo and Andrea Sorbi), 45 (July): 650–674.
  • Adriaans, P.W. and J.F.A.K. van Benthem, 2008a, “Introduction: Information is what information does,” in Adriaans and van Benthem 2008b.
  • ––– (eds.), 2008b, Handbook of Philosophy of Information, Elsevier Science Publishers.
  • Adriaans, P. and P.M.B. Vitányi, 2009, “Approximation of the Two-Part MDL Code,” IEEE Transactions on Information Theory, 55(1): 444–457.
  • Aristotle. Aristotle in 23 Volumes, Vols. 17, 18, translated by Hugh Tredennick. Cambridge, MA, Harvard University Press; London, William Heinemann Ltd. 1933, 1989.
  • Antunes, L. and L. Fortnow, 2003, “Sophistication Revisited,” in Proceedings of the 30th International Colloquium on Automata, Languages and Programming (Lecture Notes in Computer Science: Volume 2719), Berlin: Springer, pp. 267–277.
  • Antunes, L., L. Fortnow, D. Van Melkebeek and N. V. Vinodch, 2006, “Computational depth: Concept and application,” Theoretical Computer Science, volume 354.
  • Aquinas, St. Thomas, 1265–1274, Summa Theologiae.
  • Arbuthnot, J., 1692, Of the Laws of Chance, or, a method of Calculation of the Hazards of Game, Plainly demonstrated, And applied to Games at present most in Use, translation of Huygens’ De Ratiociniis in Ludo Aleae.
  • Austen, J., 1815, Emma, London: Richard Bentley and Son.
  • Bar-Hillel, Y. and R. Carnap, 1953, “Semantic Information,” The British Journal for the Philosophy of Science, 4(14): 147–157.
  • Bais, F.A. and J.D. Farmer, 2008, “The Physics of Information,” in Adriaans and van Benthem 2008b.
  • Barron, A., J. Rissanen, and B. Yu, 1998, “The minimum description length principle in coding and modeling,” IEEE Transactions on Information Theory, 44(6): 2743–2760.
  • Barwise, J. and J. Perry, 1983, Situations and Attitudes, Cambridge, MA: MIT Press.
  • Bennett, C. H., 1988, “Logical depth and physical complexity,” in R. Herken (ed.), The Universal Turing Machine: A Half-Century Survey, Oxford: Oxford University Press, pp. 227–257.
  • van Benthem, J.F.A.K., 1990, “Kunstmatige Intelligentie: Een Voortzetting van de Filosofie met Andere Middelen,” Algemeen Nederlands Tijdschrift voor Wijsbegeerte, 82: 83–100.
  • –––, 2006, “Epistemic Logic and Epistemology: the state of their affairs,” Philosophical Studies, 128: 49–76.
  • van Benthem, J.F.A.K. and R. van Rooij, eds., 2003, “Connecting the Different Faces of Information,” Journal of Logic, Language and Information, 12(4): 375–379.
  • Berkeley, G., 1732, Alciphron: Or the Minute Philosopher, Edinburgh: Thomas Nelson, 1948–57.
  • Birkhoff, G.D., 1950, Collected Mathematical Papers, New York: American Mathematical Society.
  • Boole, G., 1847, Mathematical Analysis of Logic, Cambridge: Macmillan, Barclay, & Macmillan. [available online].
  • Bovens, L. and S. Hartmann, 2003, Bayesian epistemology, Oxford: Oxford University Press.
  • Briggs, H., 1624, Arithmetica Logarithmica, London: Gulielmus Iones.
  • Capurro, R., 1978, Information. Ein Beitrag zur etymologischen und ideengeschichtlichen Begründung des Informationsbegriffs [Information: A contribution to the foundation of the concept of information based on its etymology and in the history of ideas]. Munich, Germany: Saur. [available online].
  • –––, 2009, “Past, present and future of the concept of information,” tripleC (Cognition, Communication, Co-operation), 7(2): 125–141.
  • Capurro, R. & B. Hjørland, 2003, “The Concept of Information,” in Blaise Cronin (ed.), Annual Review of Information Science and Technology (ARIST), 37 (Chapter 8), 343–411.
  • Carnap, R., 1928, Scheinprobleme in der Philosophie (Pseudoproblems of Philosophy). Berlin: Weltkreis-Verlag.
  • –––, 1945, “The Two Concepts of Probability: The Problem of Probability,” Philosophy and Phenomenological Research, 5(4): 513–532.
  • –––, 1947, Meaning and Necessity, Chicago: The University of Chicago Press.
  • –––, 1950, Logical Foundations of Probability, Chicago: The University of Chicago Press.
  • Chaitin, G. J., 1969, “On the length of programs for computing finite binary sequences: statistical considerations,” J. Assoc. Comput. Mach., 16: 145–159.
  • –––, 1987, Algorithmic information theory, New York: Cambridge University Press.
  • Chater, N. and P.M.B. Vitányi, 2003, “Simplicity: a unifying principle in cognitive science,” Trends in Cognitive Science, 7(1): 19–22.
  • Cilibrasi, R. and P.M.B. Vitányi, 2005, “Clustering by compression,” IEEE Transactions on Information Theory, 51(4): 1523–1545.
  • Clausius, R., 1850, “Über die bewegende Kraft der Wärme und die Gesetze welche sich daraus für die Wärmelehre selbst ableiten lassen,” Poggendorffs Annalen der Physik und Chemie, 79: 368–97.
  • Conan Doyle, A., 1892, The Adventures of Sherlock Holmes, George Newnes Ltd.
  • Cover, T.M. and J.A. Thomas, 2006, Elements of Information Theory, 2nd edition, New York: John Wiley & Sons.
  • Crawford, J.M. and L.D. Auton, 1993, “Experimental Results on the Cross over Point in Satisfiability Problems,” Proceedings of the Eleventh National Conference on Artificial Intelligence, AAAI Press, pp. 21–27.
  • Crutchfield, J.P. and K. Young, 1989, “Inferring Statistical Complexity,” Physical Review Letters, 63: 105.
  • –––, 1990, “Computation at the Onset of Chaos,” in Entropy, Complexity, and the Physics of Information, W. Zurek, editor, SFI Studies in the Sciences of Complexity, VIII, Reading, MA: Addison-Wesley, pp. 223–269.
  • Defoe, D., 1719, The Life and Strange Surprising Adventures of Robinson Crusoe of York, Mariner: who lived Eight and Twenty Years, all alone in an uninhabited Island on the coast of America, near the Mouth of the Great River of Oroonoque; Having been cast on Shore by Shipwreck, wherein all the Men perished but himself. With An Account how he was at last as strangely deliver'd by Pirates. Written by Himself, London: W. Taylor.
  • Dershowitz, N. and Y. Gurevich, 2008, “A Natural Axiomatization of Computability and Proof of Church's Thesis,” Bulletin of Symbolic Logic, 14(3): 299–350.
  • Descartes, R., 1641, Meditationes de Prima Philosophia (Meditations on First Philosophy), Paris.
  • –––, 1647, Discours de la Méthode (Discourse on Method), Leiden.
  • Devlin, K. and D. Rosenberg, 2008, “Information in the Study of Human Interaction,” in Adriaans and van Benthem 2008b.
  • Dictionnaire du Moyen Français (1330–1500), 2010. [available online].
  • Domingos, P., 1998, “Occam's Two Razors: The Sharp and the Blunt,” in Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD–98), New York: AAAI Press, pp. 37–43.
  • Downey, R.G. and D.R. Hirschfeldt, 2010, Algorithmic Randomness and Complexity (Series: Theory and Applications of Computability), New York: Springer.
  • Dretske, F., 1981, Knowledge and the Flow of Information, Cambridge, MA: The MIT Press.
  • Dufort, P.A. and C.J. Lumsden, 1994, “The Complexity and Entropy of Turing Machines,” Workshop on Physics and Computation. PhysComp '94 Proceedings, 227–232.
  • Dunn, J.M., 2001, “The Concept of Information and the Development of Modern Logic,” in Non-classical Approaches in the Transition from Traditional to Modern Logic, W. Stelzner (ed.), de Gruyter, pp. 423–427.
  • –––, 2008, “Information in computer science,” in Adriaans and van Benthem 2008b.
  • Dijksterhuis, E. J., 1986, The Mechanization of the World Picture: Pythagoras to Newton, Princeton University Press.
  • Duns Scotus, Opera Omnia. ("The Wadding edition") Lyon, 1639; reprinted Hildesheim: Georg Olms Verlagsbuchhandlung, 1968.
  • Edwards, P., 1967, The Encyclopedia of Philosophy, Macmillan Publishing Company.
  • Fisher, R.A., 1925, “Theory of statistical estimation,” Proceedings Cambridge Philosophical Society, 22(5): 700–725.
  • Floridi, L., 1999, “Information Ethics: On the Theoretical Foundations of Computer Ethics,” Ethics and Information Technology, 1(1): 37–56.
  • –––, 2002, “What Is the Philosophy of Information?” Metaphilosophy, 33(1–2): 123–145.
  • –––, ed., 2003, The Blackwell Guide to the Philosophy of Computing and Information, Blackwell, Oxford.
  • –––, 2011, The Philosophy of Information, Oxford University Press.
  • Frege, G., 1879, Begriffsschrift: eine der arithmetischen nachgebildete Formelsprache des reinen Denkens, Halle.
  • –––, 1892, “Über Sinn und Bedeutung,” Zeitschrift für Philosophie und philosophische Kritik, NF 100.
  • Galileo Galilei, 1623, Il Saggiatore (in Italian) (Rome); The Assayer, English trans. Stillman Drake and C. D. O'Malley, in The Controversy on the Comets of 1618 (University of Pennsylvania Press, 1960).
  • Garey, M.R. and D.S. Johnson, 1979, Computers and Intractability, W.H. Freeman & Co.
  • Gell-Mann, M. and S. Lloyd, 2003, “Effective Complexity,” Working papers Santa Fe Institute, 387–398.
  • Gibbs, J.W., 1906, The scientific papers of J. Willard Gibbs in Two Volumes, 1. Longmans, Green, and Co.
  • Godefroy, F.G., 1881, Dictionnaire de l'ancienne langue française et de tous ses dialectes du 9e au 15e siècle, Paris F. Vieweg.
  • Grünwald, P.D., 2007, The Minimum Description Length Principle, MIT Press.
  • Grünwald, P. and P.M.B. Vitányi, 2008, “Algorithmic Information Theory,” in Adriaans and van Benthem 2008b.
  • Harremoës, P. and F. Topsøe, 2008, “The quantitative theory of information,” in Adriaans and van Benthem 2008b.
  • Hazard, P., 1935, La Crise de la conscience européenne, Paris.
  • Hintikka, J., 1962, Knowledge and Belief, Ithaca: Cornell University Press.
  • –––, 1973, Logic, Language Games, and Information, Clarendon, Oxford.
  • Hume, D., 1739–40, A Treatise of Human Nature.
  • –––, 1748, An Enquiry concerning Human Understanding, P.F. Collier & Son. 1910, ISBN 0198250606. [available online]
  • Hutter, M., 2005, Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability, EATCS Book, Berlin: Springer.
  • –––, 2007a, “On Universal Prediction and Bayesian Confirmation,” Theoretical Computer Science, 384(1): 33–48.
  • –––, 2007b, “Algorithmic Information Theory: a brief non-technical guide to the field,” Scholarpedia, 2(3): 2519.
  • –––, 2010, “A Complete Theory of Everything (will be subjective),” Algorithms, 3(4): 329–350.
  • Ibn Tufail, Hayy ibn Yaqdhan, translated as Philosophus Autodidactus, published by Edward Pococke the Younger in 1671.
  • Kahn, D., 1967, The Code-Breakers, The Comprehensive History of Secret Communication from Ancient Times to the Internet, New York: Scribner.
  • Kolmogorov, A.N., 1965, “Three Approaches to the Quantitative Definition of Information,” Problems Inform. Transmission, 1(1): 1–7.
  • Koppel, M., 1987, “Complexity, Depth, and Sophistication,” in Complex Systems, 1(6): 1087–1091.
  • Kripke, S.A., 1959, “A Completeness Theorem in Modal Logic,” The Journal of Symbolic Logic, 24(1): 1–14.
  • Kuipers, Th.A.F. (ed.), 2007a, General Philosophy of Science, Amsterdam: Elsevier Science Publishers.
  • Kuipers, Th.A.F., 2007b, “Explanation in Philosophy of Science,” in Kuipers 2007a.
  • Langton, C.G., 1990, “Computation at the edge of chaos: Phase Transitions and Emergent Computation,” Physica D, 42(1–3): 12–37.
  • Lenski, W., 2010, “Information: a conceptual investigation,” Information 2010, 1(2): 74–118.
  • Levin, L.A., 1974, “Laws of information conservation (non-growth) and aspects of the foundation of probability theory,” Problems Information Transmission, 10(3): 206–210.
  • Li, M. and P.M.B. Vitányi, 2008, An introduction to Kolmogorov complexity and its applications, Berlin: Springer-Verlag, third edition.
  • Lloyd, S. and J. Ng, 2004, “Black Hole Computers,” Scientific American, 291(5): 30–39.
  • Locke, J., 1689, An Essay Concerning Human Understanding, J. W. Yolton (ed.), London: Dent; New York: Dutton, 1961.
  • Mill, J.S., 1843, A System of Logic, London.
  • Napier, J., 1614, Mirifici Logarithmorum Canonis Descriptio, Edinburgh: Andre Hart. [translation available online].
  • Von Neumann, J., 1955, Mathematische Grundlagen der Quantenmechanik, Berlin: Springer.
  • Nielsen, M.A. and I.L. Chuang, 2000, Quantum Computation and Quantum Information, Cambridge: Cambridge University Press.
  • Nies, A., 2009, Computability and Randomness (Oxford Logic Guides 51), Oxford: Oxford University Press.
  • Ong, W. J., 1958, 2004, Ramus, Method, and the Decay of Dialogue, From the Art of Discourse to the Art of Reason, Chicago: University of Chicago Press.
  • Parikh, R. and R. Ramanujam, 2003, “A Knowledge Based Semantics of Messages,” Journal of Logic, Language and Information, 12: 453–467.
  • Peirce, C. S., 1868, “Upon Logical Comprehension and Extension,” Proceedings of the American Academy of Arts and Sciences, 7: 416–432.
  • –––, 1886, Letter, Peirce to A. Marquand, dated 1886, W 5:541–3, Google Preview. See Burks, Arthur W., Review: Charles S. Peirce, The new elements of mathematics, Bulletin of the American Mathematical Society, 84(5) (1978): 913–18.
  • van Peursen, C.A., 1987, “Christian Wolff's Philosophy of Contingent Reality,” Journal of the History of Philosophy, 25(1): 69–82.
  • Popper, K., 1934, The Logic of Scientific Discovery (Logik der Forschung), English translation 1959, London: Hutchinson, 1977.
  • Putnam, H., 1988, Representation and reality, Cambridge: The MIT Press.
  • Quine, W.V.O., 1951, “Two Dogmas of Empiricism,” The Philosophical Review, 60: 20–43. Reprinted in his 1953 From a Logical Point of View, Harvard University Press.
  • Rathmanner, S. and M. Hutter, 2011, “A Philosophical Treatise on Universal Induction,” Entropy, 13(6): 1076–1136.
  • Redei, M. and M. Stoeltzner, eds., 2001, John von Neumann and the Foundations of Quantum Physics, Dordrecht: Kluwer Academic Publishers.
  • Rissanen, J.J., 1978, “Modeling by Shortest Data Description,” Automatica, 14(5): 465–471.
  • –––, 1989, Stochastic Complexity in Statistical Inquiry, World Scientific Series in Computer Science, 15, Singapore: World Scientific.
  • van Rooij, R., 2004, “Signalling games select Horn strategies,” Linguistics and Philosophy, 27: 493–527.
  • Schmidhuber, J., 1997a, “Low-Complexity Art,” Leonardo, Journal of the International Society for the Arts, Sciences, and Technology, 30(2): 97–103, MIT Press.
  • –––, 1997b, “A Computer Scientist's View of Life, the Universe, and Everything,” Lecture Notes in Computer Science, 1337: 201–208.
  • Schnelle, H., 1976, “Information,” in J. Ritter (ed.), Historisches Wörterbuch der Philosophie, IV [Historical dictionary of philosophy, IV] (pp. 116–117). Stuttgart, Germany: Schwabe.
  • Searle, J.R., 1990, “Is the Brain a Digital Computer?” Proceedings and Addresses of the American Philosophical Association, 64: 21–37.
  • Seiffert, H., 1968, Information über die Information [Information about information] Munich: Beck.
  • Shannon, C., 1948, “A Mathematical Theory of Communication,” Bell System Technical Journal, 27: 379–423, 623–656.
  • Shannon, C. E. and W. Weaver, 1949, The Mathematical Theory of Communication, Urbana: University of Illinois Press.
  • Simon, J.C. and O. Dubois, 1989, “Number of Solutions of Satisfiability Instance—Applications to Knowledge Bases,” International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI), 3(1): 53–65.
  • Singh, S., 1999, The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography, New York: Anchor Books.
  • Solomonoff, R.J., 1960, “A preliminary report on a general theory of inductive inference,” Technical Report ZTB-138, Zator.
  • –––, 1964a, “A Formal Theory of Inductive Inference Part I,” Information and Control, 7(1): 1–22.
  • –––, 1964b, “A Formal Theory of Inductive Inference Part II,” Information and Control, 7(2): 224–254.
  • –––, 1997, “The Discovery of Algorithmic Probability,” Journal of Computer and System Sciences, 55(1): 73–88.
  • Stalnaker, R., 1984, Inquiry, Cambridge, MA: MIT Press.
  • Stifel, M., 1544, Arithmetica integra, Nuremberg: Johan Petreium.
  • Tarski, A., 1944, “The Semantic Conception of Truth,” Philosophy and Phenomenological Research, 4: 13–47.
  • Valiant, L. G., 2007, “Evolvability,” Journal of the ACM, 56(1): Article 3.
  • Vereshchagin, N.K. and P.M.B. Vitányi, 2004, “Kolmogorov's Structure functions and model selection,” IEEE Transactions on Information Theory, 50(12): 3265–3290.
  • Vitányi, P.M.B., 2006, “Meaningful information,” IEEE Transactions on Information Theory, 52(10): 4617–4626. [available online].
  • de Vogel, C.J., 1974, Plato: De filosoof van het transcendente, Baarn: Het Wereldvenster, 1968.
  • Wallace, C. S., 2005, Statistical and Inductive Inference by Minimum Message Length, Springer, Berlin.
  • Wheeler, J. A., 1990, “Information, physics, quantum: The search for links,” in W. Zurek (ed.) Complexity, Entropy, and the Physics of Information, Redwood City, CA: Addison-Wesley.
  • Windelband, W., 1921, Lehrbuch der Geschichte der Philosophie, Tübingen.
  • Wolff, J.G., 2006, Unifying Computing and Cognition.
  • Wolfram, S., 2002, A New Kind of Science, Wolfram Media Inc.
  • Wolpert, D.H. and W. Macready, 2007, “Using self-dissimilarity to quantify complexity,” Complexity, 12(3): 77–85.
  • Zuse, K., 1969, Rechnender Raum, Friedrich Vieweg & Sohn, Braunschweig. Translated as “Calculating Space” MIT Technical Translation AZT-70-164-GEMIT, MIT (Proj. MAC), Cambridge, MA, Feb. 1970.

Other Internet Resources

Copyright © 2012 by
Pieter Adriaans

This is a file in the archives of the Stanford Encyclopedia of Philosophy.