This is a file in the archives of the Stanford Encyclopedia of Philosophy.
From the seventeenth century onward it was realized that material systems could often be described by a small number of descriptive parameters that were related to one another in simple lawlike ways. These parameters referred to geometric, dynamical and thermal properties of matter. Typical of these laws was the ideal gas law, which related the product of the pressure and volume of a gas to its temperature.
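In modern notation (the symbols below are today's textbook convention, not those of the original sources), the ideal gas law reads:

```latex
PV = nRT
```

where \(P\) is the pressure, \(V\) the volume, \(T\) the absolute temperature, \(n\) the number of moles of gas, and \(R\) the universal gas constant.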
It was soon realized that a fundamental concept was that of equilibrium. Left to themselves systems would change the values of their parameters until they reached a state where no further changes were observed, the equilibrium state. Further, it became apparent that this spontaneous approach to equilibrium was a time-asymmetric process. Uneven temperatures, for example, changed until temperatures were uniform. This same "uniformization" process held for densities.
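This "uniformization" can be illustrated with a minimal numerical sketch (not part of the original article): a discrete heat-exchange rule on a one-dimensional rod. However uneven the initial temperatures, repeated local exchanges drive every cell toward the common mean, and the process never runs spontaneously in the reverse direction.

```python
# Illustrative sketch (hypothetical parameters): discrete heat exchange
# on a one-dimensional rod, showing the relaxation of uneven
# temperatures toward a uniform equilibrium value.

def relax(temps, rate=0.25, steps=200):
    """Repeatedly let each cell exchange heat with its neighbours
    (insulated ends: missing neighbours are treated as the cell itself,
    so total heat is conserved)."""
    t = list(temps)
    for _ in range(steps):
        t = [t[i] + rate * ((t[i - 1] if i > 0 else t[i])
                            - 2 * t[i]
                            + (t[i + 1] if i < len(t) - 1 else t[i]))
             for i in range(len(t))]
    return t

initial = [100.0, 0.0, 0.0, 0.0]   # one hot end, the rest cold
final = relax(initial)
print(final)                        # every cell near the mean, 25.0
```

The exchange rule conserves total heat, so the uniform state the rod relaxes to is just the mean of the initial temperatures.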
Profound studies by S. Carnot of the ability to extract mechanical work out of engines that ran by virtue of the temperature difference between boiler and condenser led to the introduction by R. Clausius of one more important parameter describing a material system, its entropy. How was the existence of this simple set of parameters for describing matter and the lawlike regularities connecting them to be explained? What accounted for the approach to equilibrium and its time asymmetry? That the heat content of a body was a form of energy, convertible to and from mechanical work, formed one fundamental principle. The inability of an isolated system to spontaneously move to a less orderly state, to lower its entropy, constituted another. But why were these laws true?
One approach, that of P. Duhem and E. Mach and the "energeticists," was to insist that these principles were autonomous phenomenological laws that needed no further grounding in some other physical principles. An alternative approach was to claim that the energy in a body stored as heat content was an energy of motion of some kind of hidden, microscopic constituents of the body, and to insist that the laws noted, the thermodynamic principles, needed to be accounted for out of the constitution of the macroscopic object out of its parts and the fundamental dynamical laws governing the motion of those parts. This is the kinetic theory of heat.
Early work on kinetic theory by J. Herapath and J. J. Waterston was virtually ignored, but the work of A. Krönig made kinetic theory a lively topic in physics. J. C. Maxwell made a major advance by deriving from some simple postulates a law for the distribution of velocities of the molecules of a gas when it was in equilibrium. Both Maxwell and L. Boltzmann went further, and in different, but related, ways derived an equation for the approach to equilibrium of a gas. The equilibrium distribution earlier found by Maxwell could then be shown to be a stationary solution of this equation.
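In its modern form (the notation below is today's standard, not Maxwell's own), the equilibrium distribution of molecular speeds in a gas at temperature \(T\) is:

```latex
f(v)\,dv \;=\; 4\pi N \left(\frac{m}{2\pi k T}\right)^{3/2} v^{2}\, e^{-mv^{2}/2kT}\, dv
```

where \(N\) is the number of molecules per unit volume, \(m\) the molecular mass, and \(k\) Boltzmann's constant; \(f(v)\,dv\) gives the number of molecules per unit volume with speed between \(v\) and \(v + dv\).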
This early work met with vigorous objections. H. Poincaré had proven a recurrence theorem for bounded dynamical systems that seemed to contradict the monotonic approach to equilibrium demanded by thermodynamics. Poincaré's theorem showed that any appropriately bounded system in which energy was conserved would of necessity, over an infinite time, return an infinite number of times to states arbitrarily close to the initial dynamical state in which the system was started. J. Loschmidt argued that the time irreversibility of thermodynamics was incompatible with the symmetry under time reversal of the classical dynamics assumed to govern the motion of the molecular constituents of the object.
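The content of the recurrence theorem can be made vivid with a toy sketch (not from the source): any invertible, volume-preserving map on a finite state space must eventually return every orbit exactly to its starting point. Arnold's "cat map" restricted to a finite grid is such a map.

```python
# Illustrative sketch (toy system, not in the original): recurrence for
# a volume-preserving bijection on a finite "phase space".  Since the
# map is invertible and the state space finite, every orbit is periodic
# and must return exactly to its initial state.

def cat_map(state, n=101):
    """Arnold cat map (x, y) -> (2x + y, x + y) on an n x n grid."""
    x, y = state
    return ((2 * x + y) % n, (x + y) % n)

def recurrence_time(start, n=101):
    """Number of steps until the orbit first returns to its start."""
    state = cat_map(start, n)
    steps = 1
    while state != start:
        state = cat_map(state, n)
        steps += 1
    return steps

print(recurrence_time((1, 1)))   # finite: the orbit returns to (1, 1)
```

Continuous bounded systems recur only approximately (returning arbitrarily close to the initial state), but the finite case displays the same structural point: invertible, measure-preserving dynamics cannot wander off forever.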
Partly driven by the need to deal with these objections, explicitly probabilistic notions began to be introduced into the theory by Maxwell and Boltzmann. Both realized that equilibrium values for quantities could be calculated by imposing a probability distribution over the microscopic dynamical states compatible with the constraints placed on the system, and identifying the observed macroscopic values with averages over quantities definable from the microscopic states using that probability distribution. But what was the physical justification for this procedure?
Both also argued that the evolution toward equilibrium demanded in the non-equilibrium theory could be understood probabilistically. Maxwell, introducing the notion of a "demon" who could manipulate the microscopic states of a system, argued that the law of entropic increase was only probabilistically valid. Boltzmann offered a probabilistic version of his equation describing the approach to equilibrium. Without considerable care, however, the Boltzmannian picture can still appear vulnerable to the recurrence and reversibility objections, even when those objections are recast in probabilistic terms.
Late in his life Boltzmann responded to the objections to the probabilistic theory by offering a time-symmetric interpretation of the theory. Systems were probabilistically almost always close to equilibrium. But transient fluctuations to non-equilibrium states could be expected. Once in a non-equilibrium state it was highly likely that both after and before that state the system was closer to equilibrium. Why then did we live in a universe that was not close to equilibrium? Perhaps the universe was vast in space and time and we lived in a "small" non-equilibrium fluctuational part of it. We could only find ourselves in such an "improbable" part, for only in such a region could sentient beings exist. Why did we find entropy increasing toward the future and not toward the past? Here the answer was that just as the local direction of gravity defined what we meant by the downward direction of space, the local direction in time in which entropy was increasing fixed what we took to be the future direction of time.
In an important work (listed in the bibliography), P. and T. Ehrenfest also offered a reading of the Boltzmann equation of approach to equilibrium that avoided recurrence objections. Here the solution of the equation was taken to describe not "the overwhelmingly probable evolution" of a system, but, instead, the sequence of states that would be found overwhelmingly dominant at different times in a collection of systems all started in the same non-equilibrium condition. Even if each individual system approximately recurred to its initial conditions, this "concentration curve" could still show monotonic change toward equilibrium from an initial non-equilibrium condition.
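The Ehrenfests' point can be illustrated with their own urn ("dog-flea") model; the simulation below is a sketch with hypothetical parameter choices, not from the article. Each individual run fluctuates and eventually recurs toward its starting state, yet the ensemble-averaged imbalance, the "concentration curve," decays steadily toward its equilibrium value.

```python
import random

# Illustrative sketch (hypothetical parameters): the Ehrenfest urn
# model.  M balls are split between two urns; at each step one ball,
# chosen at random, moves to the other urn.  Individual runs recur, but
# the ensemble-averaged imbalance decays toward equilibrium (M/2 balls
# in each urn).

def ensemble_imbalance(m=50, runs=2000, steps=100, seed=0):
    """Track |mean occupation of urn A - m/2| across an ensemble of runs,
    all started with every ball in urn A."""
    rng = random.Random(seed)
    counts = [m] * runs
    curve = []
    for _ in range(steps):
        mean = sum(counts) / runs
        curve.append(abs(mean - m / 2))
        for i in range(runs):
            # the randomly chosen ball is in urn A with prob counts[i]/m
            if rng.random() < counts[i] / m:
                counts[i] -= 1
            else:
                counts[i] += 1
    return curve

curve = ensemble_imbalance()
print(curve[0], curve[-1])   # the ensemble imbalance shrinks toward zero
```

The expected imbalance in this model can be shown to decay geometrically, which is exactly the monotone "concentration curve" behavior the Ehrenfests described, compatible with each member of the ensemble recurring individually.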
Many of the philosophical issues in statistical mechanics center around the notion of probability as it appears in the theory. How are these probabilities to be understood? What justified choosing one probability distribution rather than another? How are the probabilities to be used in making predictions within the theory? How are they to be used to provide explanations of the observed phenomena? And how are the probability distributions themselves to receive an explanatory account? That is, what is the nature of the physical world that is responsible for the correct probabilities playing the successful role that they do play in the theory?
Philosophers concerned with the interpretation of probability are usually dealing with the following problem: Probability is characterized by a number of formal rules, the additivity of probabilities for disjoint sets of possibilities being the most central of these. But what ought we to take the formal theory to be a theory of? Some interpretations are "objectivist," taking probabilities to be, possibly, frequencies of outcomes, or idealized limits of such frequencies or perhaps measures of "dispositions" or "propensities" of outcomes in specified test situations.
Other interpretations are "subjectivist," taking probabilities to be measures of "degrees of belief," perhaps evidenced in behavior in situations of risk by choices of available lotteries over outcomes. Still another interpretation reads probabilities as measures of a kind of "partial logical entailment" among propositions.
Although subjectivist (or, rather, logical) interpretations of probability in statistical mechanics have been proffered (by E. Jaynes, for example), most interpreters of the theory opt for an objectivist interpretation of probability. This still leaves open, however, important questions about just what "objective" feature the posited probabilities of the theory are and how nature contrives to have such probabilities evinced in its behavior.
Philosophers dealing with statistical explanation have generally focussed on everyday uses of probability in explanation, or the use of probabilistic explanations in such disciplines as the social sciences. Sometimes it has been suggested that to probabilistically explain an outcome is to show it likely to have occurred given the background facts of the world. In other cases it is suggested that to explain an outcome probabilistically is to produce facts which raise the probability of that outcome over what it would have been had those facts been ignored. Still others suggest that probabilistic explanation is showing an event to have been the causal outcome of some feature of the world characterized by a probabilistic causal disposition.
The explanatory patterns of non-equilibrium statistical mechanics place the evolution of the macroscopic features of matter in a pattern of probabilities over possible microscopic evolutions. Here the types of explanation offered do fit the traditional philosophical models. The main open questions concern the explanatory grounds behind the posited probabilities. In equilibrium theory, as we shall see, the statistical explanatory pattern has a rather different nature.
The standard method for calculating the properties of an energetically isolated system in equilibrium was initiated by Maxwell and Boltzmann and developed by J. Gibbs as the microcanonical ensemble. Here a probability distribution is imposed over the set of microscopic states compatible with the external constraints imposed on the system. Using this probability distribution, average values of specified functions of the microscopic conditions of the gas (phase averages) are calculated. These are identified with the macroscopic conditions. But a number of questions arise: Why this probability distribution? Why average values for macroscopic conditions? How do phase averages relate to measured features of the macroscopic system?
Boltzmann thought of the proper average values to identify with macroscopic features as being averages over time of quantities calculable from microscopic states. He wished to identify the phase averages with such time averages. He realized that this could be done if a system started in any microscopic state eventually went through all the possible microscopic states. That this was so became known as the ergodic hypothesis. But it is provably false on topological and measure-theoretic grounds. A weaker claim, that a system started in any state would come arbitrarily close to every other microscopic state, is also false, and even if true would not do the job needed.
The mathematical discipline of ergodic theory developed out of these early ideas. When can a phase average be identified with a time average over infinite time? G. Birkhoff (with earlier results by J. von Neumann) showed that this would be so for all but perhaps a set of measure zero of the trajectories (in the standard measure used to define the probability function) if the set of phase points was metrically indecomposable, that is, if it could not be divided into more than one piece such that each piece had measure greater than zero and such that a system started in one piece always remained in that piece.
But did a realistic model of a system ever meet the condition of metric indecomposability? What is needed to derive metric indecomposability is sufficient instability of the trajectories so that the trajectories do not form groups of non-zero measure which fail to wander sufficiently over the entire phase region. The existence of a hidden constant of motion would violate metric indecomposability. After much arduous work, culminating in that of Ya. Sinai, it was shown that some "realistic" models of systems, such as the model of a gas as "hard spheres in a box," conformed to metric indecomposability. On the other hand another result of dynamical theory, the Kolmogorov-Arnold-Moser (KAM) theorem shows that more realistic models (say of molecules interacting by means of "soft" potentials) are likely not to obey ergodicity in a strict sense. In these cases more subtle reasoning (relying on the many degrees of freedom in a system composed of a vast number of constituents) is also needed.
If ergodicity holds what can be shown? It can be shown that for all but a set of measure zero of initial points, the time average of a phase quantity over infinite time will equal its phase average. It can be shown that for any measurable region the average time the system spends in that region will be proportional to the region's size (as measured by the probability measure used in the microcanonical ensemble). A solution to a further problem is also advanced. Boltzmann knew that the standard probability distribution was invariant under time evolution given the dynamics of the systems. But how could we know that it was the only such invariant measure? With ergodicity we can show that the standard probability distribution is the only one that is so invariant, at least if we confine ourselves to probability measures that assign probability zero to every set assigned zero by the standard measure.
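The first of these results can be illustrated with the simplest ergodic system (a sketch under simplifying assumptions, not from the article): rotation of the circle by an irrational angle. For almost every starting point, the time average of an observable along the trajectory approaches its phase average under the invariant uniform measure.

```python
import math

# Illustrative sketch (toy system, not in the original): irrational
# rotation of the unit circle, the textbook example of an ergodic,
# measure-preserving dynamics.  Birkhoff's theorem says the time average
# of an observable along almost every trajectory equals its phase
# average under the invariant (here uniform) measure.

def time_average(f, x0=0.1, alpha=math.sqrt(2), steps=200_000):
    """Average f along the orbit x -> (x + alpha) mod 1."""
    x, total = x0, 0.0
    for _ in range(steps):
        total += f(x)
        x = (x + alpha) % 1.0   # measure-preserving dynamics
    return total / steps

# Observable f(x) = x; its phase average over [0, 1) is 1/2.
print(time_average(lambda x: x))   # close to 0.5
```

The rotation leaves the uniform measure invariant and has no invariant subsets of intermediate measure, which is exactly the metric indecomposability that the ergodic theorems require.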
We have, then, a kind of "transcendental deduction" of the standard probability assigned over microscopic states in the case of equilibrium. Equilibrium is a time-unchanging state. So we demand that the probability measure by which equilibrium quantities are to be calculated be stationary in time as well. If we assume that probability measures assigning non-zero probability to sets of states assigned zero by the usual measure can be ignored, then we can show that the standard probability is the only such time invariant probability under the dynamics that drives the individual systems from one microscopic state to another.
As a full "rationale" for standard equilibrium statistical mechanics, however, much remains questionable. There is the problem that strict ergodicity is not true of realistic systems. There are many problems encountered if one tries to use the rationale as Boltzmann hoped to identify phase averages with measured quantities relying on the fact that macroscopic measurements take "long times" on a molecular scale. There are the problems introduced by the fact that all of the mathematically legitimate ergodic results are qualified by exceptions for "sets of measure zero." What is it physically that makes it legitimate to ignore a set of trajectories just because it has measure zero in the standard measure? After all, such neglect leads to catastrophically wrong predictions when there really are hidden, global constants of motion. In proving the standard measure uniquely invariant, why are we entitled to ignore probability measures that assign non-zero probabilities to sets of conditions assigned probability zero in the standard measure? After all, it was just the use of that standard measure that we were trying to justify in the first place.
In any case, equilibrium theory as an autonomous discipline is misleading. What we want, after all, is a treatment of equilibrium in the non-equilibrium context. We would like to understand how and why systems evolve from any initially fixed macroscopic state, taking equilibrium to be just the "end point" of such dynamic evolution. So it is to the general account of non-equilibrium we must turn if we want a fuller understanding of how this probabilistic theory is functioning in physics.
Boltzmann provided an equation for the evolution of the distribution of the velocities of particles from a non-equilibrium initial state for dilute gases, the Boltzmann equation. A number of subsequent equations have been found for other types of systems, although generalizing to, say, dense gases has proven intractable. All of these equations are called kinetic equations.
How may they be justified and explained? In the discussions concerning the problem of irreversibility that ensued after Boltzmann's work, attention was focussed on a fundamental assumption he made: the hypothesis with regard to collision numbers. This time-asymmetrical assumption posited that the motions of the molecules in a gas were statistically uncorrelated prior to the molecules colliding. In deriving any of the other kinetic equations a similar posit must be made. Some general methods for deriving such equations are the master equation approach and an approach that relies upon coarse-graining the phase space of points representing the micro-states of the system into finite cells and assuming fixed transition probabilities from cell to cell (Markov assumption). But such an assumption was not derived from the underlying dynamics of the system and, for all anyone then knew, might even have been inconsistent with that dynamics.
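The coarse-graining approach can be sketched as follows (the transition matrix here is hypothetical, chosen only for illustration): once fixed cell-to-cell transition probabilities are assumed, the resulting master equation drives any initial probability distribution toward a stationary, equilibrium distribution.

```python
# Illustrative sketch (hypothetical transition matrix): the
# coarse-graining / Markov approach.  Phase space is partitioned into
# cells, fixed transition probabilities between cells are assumed, and
# the master equation relaxes any initial distribution toward the
# stationary one.

P = [[0.8, 0.2, 0.0],   # P[i][j] = Prob(cell i -> cell j) per step
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]

def step(dist):
    """One application of the master equation: p'_j = sum_i p_i P[i][j]."""
    return [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]

dist = [1.0, 0.0, 0.0]   # all probability initially in cell 0
for _ in range(200):
    dist = step(dist)
print(dist)              # near the stationary distribution
```

For this particular matrix the stationary distribution works out to (0.25, 0.5, 0.25). Note that the relaxation is built in by the Markov assumption itself; it is not derived from any underlying reversible dynamics, which is precisely the worry raised in the text.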
A number of attempts have been made to do without such an assumption and to derive the approach to equilibrium out of the underlying dynamics of the system. Since that dynamics is invariant under time reversal and the kinetic equations are time asymmetric, time asymmetry must be put into the explanatory theory somewhere.
One approach to deriving the kinetic equations relies upon work which generalizes ergodic theory. Relying upon the instability of trajectories, one tries to show that a region of phase points representing the possible micro-states for a system prepared in a non-equilibrium condition will, if the constraints are changed, eventually evolve into a set of phase points that is "coarsely" spread over the entire region of phase space allowed by the changed constraints. The old region cannot "finely" cover the new region by a fundamental theorem of dynamics (Liouville's theorem). But, in a manner first described by Gibbs, it can cover the region in a coarse-grained sense. To show that a collection of points will spread in such a way (in the infinite time limit at least) one tries to show the system possessed of an appropriate "randomization" property. In order of increasing strength such properties include weak-mixing, mixing, being a K system or being a Bernoulli system. Other, topological as opposed to measure-theoretic, approaches to this problem exist as well.
As usual, many caveats apply. Can the system really be shown to have such a randomizing feature (in the light of the KAM theorem, for example)? Are infinite time limit results relevant to our physical explanations? If the results are finite-time, are they relativized in the sense that they hold only for some coarse partitionings of the system rather than for those of experimental interest?
Most importantly, mixing and its ilk cannot be the whole story. All the results of this theory are time symmetric. To get time asymmetric results, and to get results that hold in finite times and which show evolution in the manner described by the kinetic equation over those finite times, requires an assumption as well about how the probability is to be distributed over the region of points allowed as representing the system at the initial moment.
What must that probability assumption look like and how may it be justified? These questions were asked, and partly explored, by N. Krylov. Attempts at rationalizing this initial probability assumption have ranged from Krylov's own suggestion that it is the result of a non-quantum "uncertainty" principle founded physically on the modes by which we prepare systems, to the suggestion that it is the result of an underlying stochastic nature of the world described as in the Ghirardi-Rimini-Weber approach to understanding measurement in quantum mechanics. The status and explanation of the initial probability assumption remains the central puzzle of non-equilibrium statistical mechanics.
There are other approaches to understanding the approach to equilibrium at variance with the approaches that rely on mixing phenomena. O. Lanford, for example, has shown that for an idealized infinitely dilute gas one can show, for very small time intervals, an overwhelmingly likely behavior of the gas according to the Boltzmann equation. Here the interpretation of that equation by the Ehrenfests, the interpretation suitable to the mixing approach, is being dropped in favor of the older idea of the equation describing the overwhelmingly probable evolution of a system. This derivation has the virtue of rigorously generating the Boltzmann equation, but at the cost of applying only to one severely idealized system and then only for a very short time (although the result may be true, if unproven, for longer time scales). Once again an initial probability distribution is still necessary for time asymmetry.
The thermodynamic principles demand a world in which physical processes are asymmetric in time. Entropy of an isolated system may increase spontaneously into the future but not into the past. But the dynamical laws governing the motion of the micro-constituents are, at least on the standard views of those laws as being the usual laws of classical or quantum dynamics, time reversal invariant. Introducing probabilistic elements into the underlying theory still does not by itself explain where time asymmetry gets into the explanatory account. Even if, following Maxwell, we take the Second Law of thermodynamics to be merely probabilistic in its assertions, it remains time asymmetric.
Throughout the history of the discipline suggestions have often been made to the effect that some deep, underlying dynamical law itself introduces time asymmetry into the motion of the micro-constituents.
Other proposals take the entropic change of a system to be mediated by an actually uneliminable "interference" with the system by random causal influences from outside it. It is impossible, for example, to genuinely screen the system from subtle gravitational influences from the outside. The issue of the role of external interference in the apparently spontaneous behavior of what is idealized as an isolated system has been much discussed. Here the existence of special systems (such as the spin echo systems encountered in nuclear magnetic resonance) plays a role in the arguments. For these systems seem to display spontaneous approach to equilibrium when isolated, yet can have their apparent entropic behavior made to "go backward" with an appropriate impulse from outside the system. This seems to show entropic increase without the kind of interference from the outside that genuinely destroys the initial order implicit in the system. In any case, it is hard to see how outside interference would do the job of introducing time asymmetry unless such asymmetry is put in "by hand" in characterizing that interference.
It was Boltzmann who first proposed a kind of "cosmological" solution to the problem. As noted above he suggested a universe overall close to equilibrium with "small" sub-regions in fluctuations away from that state. In such a sub-region we would find a world far from equilibrium. Introducing the familiar time-symmetric probabilistic assumptions, it becomes likely that in such a region one finds states of lower entropy in one time direction and states of higher entropy in the other. Then finish the solution by introducing the other Boltzmann suggestion that what we mean by the future direction of time is fixed as that direction of time in which entropy is increasing.
Current cosmology sees quite a different universe than that posited by Boltzmann. As far as we can tell the universe as a whole is in a highly non-equilibrium state with parallel entropic increase into the future everywhere. But the structure of the cosmos as we know it allows for an alternative solution to the problem of the origin of time asymmetry in thermodynamics. The universe seems to be spatially expanding, with an origin some tens of billions of years ago in an initial singularity, the Big Bang. Expansion, however, by itself does not provide the time asymmetry needed for thermodynamics, for an expanding universe with static or decreasing entropy is allowed by physics. Indeed, in some cosmological models in which the universe contracts after expanding, it is usually, though not always, assumed that even in contraction entropy continues to increase.
The source of entropic asymmetry is sought, rather, in the physical state of the world at the Big Bang. Matter "just after" the Big Bang is usually posited to be in a state of maximum entropy – to be in thermal equilibrium. But this does not take account of the structure of "space itself," or, if you wish, of the way in which the matter is distributed in space and subject to the universal gravitational attraction of all matter for all other matter. A world in which matter is distributed with uniformity is one of low entropy. A high entropy state is one in which we find a clustering of matter into dense regions with lots of empty space separating these regions. This deviation from the usual expectation – spatial uniformity as the state of highest entropy – is due to the fact that gravity, unlike the forces governing the interaction of molecules in a gas for example, is a purely attractive force.
One can then posit an initial "very low entropy" state for the Big Bang, with the spatial uniformity of matter providing an "entropic reservoir." As the universe expands, matter goes from a uniformly distributed state with temperature also uniform to one in which matter is highly clumped into hot stars in an environment of cold empty space. One then has the universe as we know it, with its thermally highly non-equilibrium condition. "Initial low entropy," then, will be a state in the past not (as far as we know) matched by any singularity of any kind, much less one of low entropy, in the future. If one conditionalizes on that initial low entropy state one then gets, using the time-symmetric probabilities of statistical mechanics, a prediction of a universe whose entropy increases in time.
But it is not, of course, the entropy of the whole universe with which the Second Law is concerned, but, rather, that of "small" systems temporarily energetically isolated from their environments. One can argue, in a manner tracing back to H. Reichenbach, that the entropic increase of the universe as a whole will lead, again using the usual time symmetric probabilistic posits, to a high probability that a random "branch system" will show entropic increase parallel to that of the universe and parallel to that of other branch systems. Most of the arguments in the literature that this will be so are flawed, but the inference is reasonable nonetheless.
Positing initial low entropy for the Big Bang gives rise to its own set of "philosophical" questions: Given the standard probabilities in which high entropy is overwhelmingly probable, how could we explain the radically "unexpected" low entropy of the initial state? Indeed, can we apply probabilistic reasoning appropriate for systems in the universe as we know it to an initial state for the universe as a whole? The issues here are reminiscent of the old debates over the teleological argument for the existence of God.
It comes as no surprise that the relationship of the older thermodynamic theory to the new statistical mechanics on which it is "grounded" is one of some complexity.
The older theory had no probabilistic qualifications to its laws. But as Maxwell was clearly aware, it could not then be "exactly" true if the new probabilistic theory correctly described the world. One can either keep the thermodynamic theory in its traditional form and carefully explicate the relationship its principles bear to the newer probabilistic conclusions, or one can, as has been done in deeply interesting ways, generate a new "statistical thermodynamics" that imports into the older theory probabilistic structure.
Conceptually the relationship of older to newer theory is quite complex. Concepts of the older theory (volume, pressure, temperature, entropy) must be related to the concepts of the newer theory (molecular constitution, dynamical concepts governing the motion of the molecular constituents, probabilistic notions characterizing either the states of an individual system or distributions of states over an imagined ensemble of systems subject to some common constraints).
A single term of the thermodynamic theory, such as entropy, will be associated with a wide variety of concepts defined in the newer account. There is, for example, the Boltzmann entropy, which is a property of a single system defined in terms of the spatial and momentum distribution of its molecules. On the other hand there are the Gibbs entropies, definable out of the probability distribution over some Gibbsian ensemble of systems. Adding even more complications, there is Gibbs' fine-grained entropy, which is defined by the ensemble probability alone and is very useful in characterizing equilibrium states, and Gibbs' coarse-grained entropy, whose definition requires some partitioning of the phase space into finite cells as well as the original probability distribution, and which is a useful concept in characterizing the approach to equilibrium from the ensemble perspective. In addition to these notions, which are measure-theoretic in nature, there are topological notions that can play the role of a kind of entropy as well.
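A discrete analogue (an illustrative construction, not from the article) brings out the contrast between the two Gibbs entropies: under a measure-preserving bijection of a finite "phase space" the fine-grained entropy of an ensemble is exactly conserved, while the coarse-grained entropy, computed after lumping micro-states into cells, can grow as the ensemble spreads over the cells.

```python
import math

# Illustrative discrete analogue (hypothetical construction): under a
# measure-preserving bijection of a finite state space the fine-grained
# Gibbs (Shannon) entropy of an ensemble is exactly constant, while the
# coarse-grained entropy can increase.

N = 1024      # number of micro-states
CELLS = 8     # cells in the coarse partition

def shannon(p):
    """Shannon/Gibbs entropy of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def coarse(p):
    """Lump the distribution into CELLS equal-sized cells."""
    size = N // CELLS
    return [sum(p[c * size:(c + 1) * size]) for c in range(CELLS)]

def evolve(p):
    """A measure-preserving bijection of the state space:
    i -> 5*i + 1 (mod N), invertible since gcd(5, N) = 1."""
    q = [0.0] * N
    for i in range(N):
        q[(5 * i + 1) % N] = p[i]
    return q

# ensemble initially concentrated uniformly on the first cell
p = [CELLS / N if i < N // CELLS else 0.0 for i in range(N)]
s_fine_0, s_coarse_0 = shannon(p), shannon(coarse(p))

for _ in range(6):
    p = evolve(p)

print(shannon(p) - s_fine_0)            # ~0: fine-grained entropy unchanged
print(shannon(coarse(p)) - s_coarse_0)  # positive: coarse-grained entropy grew
```

The bijection plays the role of Liouville evolution: it merely rearranges the probability over micro-states, so the fine-grained entropy cannot change; only the loss of information in the coarse partition produces entropy increase.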
Nothing in this complexity stands in the way of claiming that statistical mechanics describes the world in a way that explains why thermodynamics works and works as well as it does. But the complexity of the inter-relationship between the theories should make the philosopher cautious in using this relationship as a well understood and simple paradigm of inter-theoretic reduction.
It is of some philosophical interest that the relationship of thermodynamics to statistical mechanics shows some similarity to aspects uncovered in functionalist theories of the mind-body relationship. Consider, for example, the fact that systems of very different physical constitutions (say a gas made up of molecules interacting by means of forces on the one hand and on the other hand radiation whose components are energetically coupled wavelengths of light) can share thermodynamic features. They can, for example, be at the same temperature. Physically this means that the two systems, if initially in equilibrium and then energetically coupled, will retain their original equilibrium conditions. The parallel with the claim that a functionally defined mental state (a belief, say) can be instantiated in a wide variety of physical devices is clear.
We have noted that it was Boltzmann who first suggested that our very concept of the future direction of time was fixed by the direction in time in which entropy was increasing in our part of the universe. Numerous authors have followed up this suggestion and the "entropic" theory of time asymmetry remains a much debated topic in the philosophy of time.
We must first ask what the theory is really claiming. In a sensible version of the theory there is no claim being made to the effect that we find out the time order of events by checking the entropy of systems and taking the later event as the one in which some system has its higher entropy. The claim is, rather, that it is the facts about the entropic asymmetry of systems in time that "ground" the phenomena that we usually think of as marking out the asymmetrical nature of time itself.
What are some features whose intuitive temporal asymmetry we think of as, perhaps, "constituting" the asymmetrical nature of time? There are asymmetries of knowledge: We have memories and records of the past, but not of the future. There are asymmetries of determination: We think of causation as going from past through present to future, and not of going the other way round. There are asymmetries of concern: We may regret the past, but we anxiously anticipate the future. There are alleged asymmetries of "determinateness" of reality: It is sometimes claimed that past and present have determinate reality, but that the future, being a realm of mere possibilities, has no such determinate being at all.
The entropic theory in its most plausible formulation is a claim to the effect that we can explain the origin of all of these intuitive asymmetries by referring to facts about the entropic asymmetry of the world.
This can be best understood by looking at the very analogy used by Boltzmann: the gravitational account of up and down. What do we mean by the downward direction at a spatial location? All of the phenomena by which we intuitively identify the downward direction (as the direction in which rocks fall, for example) receive an explanation in terms of the spatial direction of the local gravitational force. Even our immediate awareness of which direction is down is explainable in terms of the effect of gravity on the fluid in our semicircular canals. It comes as no shock to us at all that "down" for Australia is in the opposite direction from "down" for Chicago. Nor are we dismayed to be told that in outer space, far from a large gravitating object such as the Earth, there is no such thing as the up-down distinction and no direction of space which is the downward direction.
Similarly the entropic theorist claims that it is the entropic features that explain the intuitive asymmetries noted above, that in regions of the universe in which the entropic asymmetry was counter-directed in time the past-future directions of time would be opposite, and that in a region of the universe without an entropic asymmetry neither direction of time would count as past or as future.
The great problem remains in trying to show that the entropic asymmetry is explanatorily adequate to account for all the other asymmetries in the way that the gravitational asymmetry can account for the distinction of up and down. Despite many interesting contributions to the literature on this, the problem remains unresolved.
First published: April 12, 2001
Content last modified: April 12, 2001