What Andrey Kolmogorov lets you do

Our first forays into the study of probability usually start with elementary examples of a finite number of equally probable events. A coin flip has two outcomes; let's assign a probability of 1/2 to each. A roll of a standard d6 die has six outcomes, so we assign 1/6 to each. Once we (or our students) understand how that works, we can start adding some combinatorics. What is the probability of getting heads at least twice on 5 flips? What are the most likely outcomes of the sum of two independent d6 dice? The latter computation is important for the players (and designers) of board games such as Catan. The most likely outcome is 7, which activates the robber mechanic. The runners-up are 6 and 8, printed on the resource tiles that are thus the most lucrative ones to settle. Still, tiles with 2 and 12 will fire up occasionally, and even the most likely outcome of 7 doesn't happen all that often.
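The coin question fits in a few lines of Python (a quick sketch using binomial coefficients over the 32 equally probable flip sequences):

```python
from math import comb

# P(at least 2 heads in 5 fair flips) = 1 - P(0 heads) - P(1 head)
n = 5
total = 2 ** n  # 32 equally probable flip sequences
at_least_two = sum(comb(n, k) for k in range(2, n + 1))
print(at_least_two, "/", total, "=", at_least_two / total)  # 26 / 32 = 0.8125
```

The two-dice question gets the same treatment a bit further down, where it doubles as an example of coarse graining.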

This elementary logic is useful in practical cases, but what if you need something more complicated? What if you have good grounds to believe that the chances of different events are not equal? What if you have an infinite number of outcomes? What if the space of outcomes is continuous and maybe high-dimensional? We need a more sophisticated understanding of what "probability" is. In my probability course, this moment came about a third of the way in, when we learned about something called Kolmogorov's axiomatics. When I first saw this formalism, it appeared simultaneously too restrictive and too generic. It didn't help that my instructor was not eager to explain what these "sigma algebras" were. Ten years ago, I didn't get the main idea.

For a long time, probability theory was not treated as a serious branch of mathematics. Partly because it was often used to reason about such frivolous and marginally moral subjects as gambling. Partly because it tried to tame the random and unpredictable world, which seemingly exists away from mathematically rigorous structures. But by the early 20th century, the rise of statistical analysis and statistical mechanics created a demand to cast probability theory in the same language as the rest of mathematics. Out of several attempts by different researchers, the one formulated in the early 1930s by the Russian mathematician Andrey Kolmogorov ended up being the most complete and the most widely accepted. I still find it ironic that "Kolmogorov" appears right in the title of the Russian Wikipedia page on these axioms, while on the English page you have to read the first line of the actual article to find his name. Kolmogorov was a highly respected and prolific scientist, very fond of both creating scientific schools and training students from high school to PhD level. Arguably, discovering axiomatic probability theory early in his life turbocharged his career and gave him enough resources to embark on massive reforms of Russian mathematics education. But this is not a big story about Kolmogorov's life. It is merely a small story about his axioms.

So what are these axioms, and why do we care so much? Kolmogorov posits the existence of an event space equipped with a sigma algebra - in other words, rules for how to combine events. To each event one can assign a measure called probability, which needs to fulfill three axioms. First, the probability of any event is non-negative. Second, the probability that any event at all happens - that is, the probability of the whole event space - is normalized, usually to 1. Third, the probability that either of two incompatible events happens is the sum of their respective probabilities. There are many different ways to assign probabilities that fulfill these axioms: one has to make choices.
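In symbols, for events A and B drawn from the sigma algebra over a sample space Ω (a standard rendering, with the third axiom in its simplest two-event form):

```latex
P(A) \ge 0, \qquad P(\Omega) = 1, \qquad P(A \cup B) = P(A) + P(B) \ \text{ when } A \cap B = \emptyset.
```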

But where there are choices, there is model building. And where there is model building, there is a playground. To me, this playground is what allows building a statistical model - or a statistical mechanics - of any phenomenon. From a practical standpoint, the axioms expose affordances - things you can do with them, but don't have to. Let's go through these affordances one axiom at a time.

What kind of mapping can one devise from the event space to probability measures that is always non-negative? One popular choice is the exponential family of functions. To each event, we assign an "energy", which can be any real number we pick based on physical principles. We then choose the probability of the event to be proportional to the exponential of that energy (maybe with a common prefactor). Since energies are always real, the probabilities will always be non-negative, automatically fulfilling the first axiom. Why the exponential function of all things? Well, because it naturally pops out of thermal-equilibrium arguments, or out of maximum-entropy arguments! But if those arguments don't apply to your problem, you can always pick something else.
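In formulas, using the physicists' sign convention (a sketch; here E(x) is the energy assigned to event x, and the inverse temperature β is one common choice of prefactor):

```latex
p(x) \propto e^{-\beta E(x)} > 0 \quad \text{for any real } \beta \text{ and } E(x),
```

so the first axiom holds no matter which energies we pick.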

The second axiom prescribes a normalization of the whole probability distribution, which is conventionally chosen to be 1, or 100%. For most choices of probability distribution, this adds a normalization constant in front of the probability expression, as I already lamented before. It also forces unwieldy terminology like "statistical weight" or "Boltzmann factor" for the unnormalized quantities, used when you don't know the normalization yet (or maybe never will). But if we shift a bit of intellectual commitment - from knowing the exact probabilities a priori to knowing only their relative weights - we can write down the model first and treat finding the normalization as a computational task. Now just call that normalization the "partition function" and you have statistical mechanics!
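As a minimal sketch of that computational task (in Python, with made-up energies and an assumed inverse temperature of 1):

```python
import math

# Made-up energies for a handful of states; any real numbers will do.
energies = [0.0, 0.5, 0.5, 1.3, 2.0]
beta = 1.0  # assumed inverse temperature

# Unnormalized Boltzmann factors: positive by construction (first axiom).
weights = [math.exp(-beta * e) for e in energies]

# The partition function is exactly the normalization the second axiom demands.
Z = sum(weights)
probabilities = [w / Z for w in weights]

print(f"Z = {Z:.4f}")
print("probabilities:", [f"{p:.4f}" for p in probabilities])
assert abs(sum(probabilities) - 1.0) < 1e-12  # second axiom now holds
```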

The third axiom tells us how to combine events into larger groups - but why would you want that? The main reason is that there are fewer groups than events within them, so it is easier to compare the probabilities of groups. If it took a complicated index to label all the "elementary" events, the groups need only a much simpler index - what a relief. But wait, this directly underlies the idea of coarse graining that revolutionized statistical mechanics in the mid-20th century! Above we coarse-grained the roll of two dice into a single variable - their sum - to understand the valuation of Catan tiles (see the sketch below). But with a few more computational tricks, coarse graining opens up the study of any kind of collective behavior that emerges from small events. The axiom just tells you that you can combine the probabilities you made up, and that allows you to combine them in all kinds of funky ways!
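Here is that two-dice coarse graining spelled out (a Python sketch: 36 equally probable elementary events grouped by their sum, with the probabilities of incompatible members added per the third axiom):

```python
from collections import Counter
from fractions import Fraction

# 36 equally probable elementary events: ordered pairs of two d6 rolls.
elementary = [(a, b) for a in range(1, 7) for b in range(1, 7)]
p_elementary = Fraction(1, 36)

# Coarse grain: group elementary events by their sum and, per the third
# axiom, add up the probabilities of the (mutually incompatible) members.
p_sum = Counter()
for a, b in elementary:
    p_sum[a + b] += p_elementary

for total in sorted(p_sum):
    print(f"P(sum = {total:2d}) = {p_sum[total]}")
# The most likely sum is 7 (6/36); 6 and 8 (5/36 each) are the runners-up.
```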

Having to obey the three axioms provides crucial scaffolding for our theorizing. This is fundamentally liberating, as it sets the boundaries of your playground. If you know the boundaries, you can play within them, or you can consciously step beyond them - if you have a good reason to. Above I consciously relaxed the assumption that the second axiom's normalization is known a priori, but other works relax other axioms. Sometimes, probabilities can be negative. Sometimes, the elementary events can belong to multiple groups. Sometimes, probabilities of events add up in strange ways, like in quantum mechanics. But surprisingly often, all we need to be creative and productive with our theories is the 1930s framework with some extra accounting gimmicks. And having this sort of creative space would be very much in Andrey Kolmogorov's spirit.
