Statistical mechanics is not natural

My mom asked me once how to describe my job in brief, in case she needs to explain it to others. I said I am a "statistical mechanic": when other people's statistics break, I come fix them. That's one fat white lie because I don't actually know much statistics, and because what I do is conceptually different from statistics.

In statistics, one gets a hold of a finite sample of data points and analyzes it to make qualified claims about the population from which the sample was drawn. This is a very important interface we have with the real world: you will never read all the books, meet all the people, or go to all the places. As humans, we necessarily have to make decisions based on finite information, and are in fact pretty good at that. Just think of the last time you watched the first half of a movie and your brain naturally autocompleted the second half, having trained on decades of story tropes in Hollywood products.

This comes in tension with humans being bad at mathematical probabilities. Just recently the internet made fun of "the obscure maths theorem" described by The Guardian. While the "obscure" Bayes' theorem is very important for devising public health policy for Covid and other conditions, the gears in my head start screeching when I try to work out the posterior probabilities. This is not a failing unique to me: a number of psychological studies document these systematic fallacies of probabilistic reasoning across most subjects.

The paper that inspired me to write this attempts to reconcile this natural inference of events with the ubiquitous fallacies of probabilities. It posits that the brain is not a probability computer, but a Bayesian sampler. Instead of picking events from the whole probability distribution, the brain can judge the relative probability of similar events and navigate towards the most likely one. This navigation is similar to local search algorithms such as gradient descent or Markov Chain Monte Carlo (MCMC). This view explains a bunch of observations of people's systematic failures at probability, such as comparing the frequencies of very disparate events. Under a local search in the event space, crossing the valley between two disjoint high-probability hills becomes near-impossible without external prompting, which is cleverly illustrated by the word recall tasks presented by the paper authors to the reader.
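To make the local-search analogy concrete, consider the Metropolis rule, one textbook MCMC recipe (my example here, not necessarily the specific sampler the paper has in mind). It only ever asks for the relative probability of a proposed event $x'$ against the current event $x$,

$$ P(\text{accept } x \to x') = \min\!\left(1, \frac{p(x')}{p(x)}\right), $$

so the sampler drifts towards more likely events without ever needing to know how likely any of them is in absolute terms.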

We are taught in classes that all probability distributions have to be normalized on existential grounds (after all, something has to happen in every experiment). When you write a formula for a probability distribution, it usually has an obnoxious scalar prefactor in front that you need to carry around in your derivations to "preserve normalization". For common distributions, the prefactor is easy enough to look up on Wikipedia (it's year 2021 today), but for more complex cases it is a pain to calculate. It is similarly difficult to add up the probabilities of whole groups of events in order to compare which group is more common.
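To see the prefactor in the wild, the normal distribution reads

$$ p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), $$

and the $1/\sqrt{2\pi\sigma^2}$ out front exists for no other purpose than making $\int_{-\infty}^{\infty} p(x)\,dx = 1$.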

Herein we approach the problem of probability from the other side: instead of inferring from a sample, we assume a certain model of how the degrees of freedom in the problem interact and try to work out the consequences of such a model by summing over all realizations of these degrees of freedom. This exercise, termed statistical mechanics, is essentially glorified accounting. Just like it is hard for our brains to keep track of a probability distribution explicitly, it is hard to keep it in computer memory or written down on paper. Absolute probabilities resist our efforts to count them with sums and integrals, and more often than not require various approximations.
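In the canonical textbook setup this accounting takes the form of a partition function: if a configuration $s$ of the degrees of freedom has energy $E(s)$ at temperature $T$, then

$$ p(s) = \frac{e^{-E(s)/k_B T}}{Z}, \qquad Z = \sum_{s} e^{-E(s)/k_B T}, $$

where the sum defining $Z$ runs over every possible realization of the degrees of freedom, which is exactly the part that outgrows any explicit bookkeeping for all but the smallest systems.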

One common way to deal with these exhaustive sums is to not deal with them, in favor of finite samples. Just like the brain can draw samples from the distribution without knowing its normalization, so can the MCMC algorithm. Just like the brain is subject to fallacies of comparing disjoint groups, MCMC is plagued with endless issues in trying to detect rare transitions between high-probability regions. Computational scientists are working tirelessly on advanced sampling algorithms that tackle these issues, with most methods suited to one problem but not the next. Focusing on sampling methods like MCMC or on other accounting tricks is largely a matter of one's personal scientific taste.
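If you want to poke at this failure mode yourself, below is a minimal sketch (my own toy illustration, not code from any of the work mentioned here) of a Metropolis sampler on a made-up two-bump density. Note that it only ever evaluates ratios of the unnormalized density, and that with local proposals it tends to stay in whichever bump it started in.

```python
import math
import random

def unnormalized_density(x):
    """Two bumps centered at -5 and +5; no normalization constant anywhere."""
    return math.exp(-0.5 * (x + 5.0) ** 2) + math.exp(-0.5 * (x - 5.0) ** 2)

def metropolis(n_steps, step_size=0.5, x0=-5.0, seed=0):
    """Plain Metropolis sampling with Gaussian random-walk proposals."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)  # a local move in event space
        # Only the ratio of densities matters, so the normalization cancels out.
        accept_prob = min(1.0, unnormalized_density(proposal) / unnormalized_density(x))
        if rng.random() < accept_prob:
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(n_steps=20_000)
right_bump_fraction = sum(s > 0 for s in samples) / len(samples)
# The symmetric target puts exactly half of its mass in each bump, but a local
# sampler started in the left bump rarely crosses the valley, so the estimate
# is usually far from the true 0.5 (try different seeds and step sizes).
print(f"fraction of samples in the right bump: {right_bump_fraction:.3f}")
```

The two bumps play the role of the disjoint high-probability regions above: a bigger step size or a smarter proposal scheme helps the sampler hop between them, which is precisely what advanced sampling methods are engineered to do.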

Most of my published work so far is concerned with devising clever ways to do the accounting with reasonable approximations. The idea of Systems Physics is to explore the full space of possible design solutions, instead of relying on the incremental improvements that our brains are predisposed to. The "physics" part of it comes from identifying statistical correlations with causal interactions; in the words of my PhD advisor, we can treat the emergent large-scale design patterns as concrete abstractions: real, measurable forces that push the space of solutions in a particular direction while remaining invisible in any single design realization.

Statistical mechanics, this bizarre cocktail of accounting tricks that conflate correlation with causation, does not seem natural at all and is contrary to our lived experiences. But on the other hand, I find "naturalness" to be quite a weak reason to do or not do something. Flying in a metal box in the sky is not natural, just like traveling over land along long, suspiciously smooth ribbons that someone laid out just so. "Not natural" means that as humans we have strong built-in biological instincts to move through vast combinatorial probabilistic spaces, but not to describe those spaces as a whole. Of course, as with any discipline, with training one's Bayesian brain can learn the habits of mind and patterns of work to productively analyze such spaces. The reasons to do something then depend not on whether it is natural, but whether it is good or bad; useful or... merely clever.
