Abstraction's End



(Published June 9, 2023 by Mariven)

The idea of probability is one we commonly rely on in order to talk about our world models, especially in communities where their interpretation as credences, or subjective degrees of belief, is widespread. More relevant to artificial intelligence is the manner in which we implicitly bake probability into our theories of decision and utility, justifying it post-hoc with e.g. the VNM theorem or complete class theorem. But the axioms consistently fail to apply; no matter how hard we grasp, reality always seems to slip away from our models of it. Here, I compile a lot of resources that demonstrate why **probability is difficult**. Some of the questions I want to point to, and list resources for figuring out answers to, include:

- What does 'probability' mean?
- What are the issues with the common interpretations, i.e. Bayesianism and frequentism?
- How do we use probabilities in actual scientific practice?
- What is the role that probability plays in our discussions of decision and utility?
- What are the implicit assumptions behind these roles, and when do they fail in practice?
- How do we recognize them, repair them, or shift to alternatives?

A year and a half ago, I wrote Probability is Difficult, a review of the foundations of probability theory: the interpretations of various intuitive notions of probability, the axiomatic systems through which we use them in mathematical applications, and the various pain points and paradoxes that we have to watch out for if we want to be not just consistent but *correct* in our use of probabilistic reasoning. The core notion I wanted to impart is in the title: *probability is difficult*. It's not so simple as setting up numbers and playing with them—you have to couple those numbers consistently to reality, and this is so incredibly hard to do correctly. While little more than a thorough compilation of existing work, writing it was very educational, and I figured it would be worthwhile to compose a LessWrong version tailored to this site's idioms and intents.

In the course of cleaning it up, I found that I could do something much better: make it a guide on the proper and improper use of probability *in general*. Not just from *inside* the mathematical perspective—what kinds of mathematics don't break down due to their own logic—but from *outside* it, the place where you're deciding what mathematics to use and how to use it. Not just how to interpret the use of probability, but how to *employ* it: our theories of decision, utility, and learning fundamentally depend on probabilistic reasoning, both at the object-level (as when we speak of maximizing expected utility) and at the meta-level (as when we argue about the Solomonoff prior), and this has *motivations* and *consequences*; we ought to be aware of when and why it breaks down, which it does surprisingly often.

Unfortunately, I realized after starting to work on this updated version that to sketch this full picture to any adequate level of detail would take perhaps two months of work, which is more time than I can afford to spend. So, all I can do for now is try to outline how someone who wanted to understand exactly why, how, and where *probability is difficult* would go about doing so, by listing various resources, key terms and areas of study, and lines of thought being pursued. I am trying to list *resources* that, whether by saying worthwhile things or pointing to other worthwhile places, are useful for structuring one's understanding of a given topic.

The main resources I've compiled are given as fifty-something bolded links weaved throughout exposition. (Were it just a plain list, people would go "wow! cool!", bookmark it, and never read it again; I want to give you some idea of the underlying narrative which makes them important to understanding why probability is difficult, so as to actually motivate them).

I'll assume that you already have *some* knowledge of probability theory, and know the basics about frequentism, the Kolmogorov axioms, Dutch books and Bayesian inference, and so on. If you don't, then, again, the original version of Probability is Difficult is a great place to start.

There are many ways to interpret the idea of 'a probability', the most common of which are:

- Probabilities are proportions of the space of possibilities which produce a given measured outcome (e.g., two rolled dice summing to five has probability $\frac{|\{(4,1), (3,2), (2,3), (1,4)\}|}{|\{(1,1), \ldots, (6,6)\}|} = \frac{1}{9}$);
- Probabilities are objective properties of reality describing the propensities of possible measured outcomes;
- Probabilities are objective properties of reality describing the (actual vs limiting) frequencies at which a certain (real vs ideal) experimental apparatus produces a certain outcome;
- Probabilities are cognitive constructs (subjectively vs objectively constructed) conveying an agent's subjective credence, or degree of belief, in a certain outcome.

These are respectively known as the classical, propensity, (finite vs hypothetical) frequency, and (subjective vs objective) Bayesian interpretations. Note that the terms 'subjective' and 'objective' are used in two different ways here: the frequentist and propensity interpretations are objective where the Bayesian is subjective in that they treat probabilities as internal to the *object*, i.e. the world, rather than the *subject*, the one reasoning about the world. Bayesianism's subjective-vs-objective split is instead about the extent to which the reasoner is *forced* to construct their internal probabilities in a single correct way based on their prior knowledge. (The update from prior to posterior given some data is always the same, though).
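As a quick illustration of the classical interpretation, the dice computation above can be checked by brute enumeration (a sketch of mine, not code from the article):

```python
from fractions import Fraction
from itertools import product

# Classical interpretation: probability = |outcomes producing the event| / |all outcomes|
outcomes = list(product(range(1, 7), repeat=2))    # all 36 ways two dice can land
event = [o for o in outcomes if sum(o) == 5]       # (4,1), (3,2), (2,3), (1,4)
p = Fraction(len(event), len(outcomes))
print(p)  # 1/9
```

The classical interpretation only works when there's a privileged way to enumerate equally-weighted possibilities; the enumeration *is* the probability model.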

A very quick and marginally illustrated overview of the main interpretations of probability is given by Nate Soares' **Correspondence visualizations for different interpretations of "probability"**, which fits into a **trilogy** of tinyposts about probability interpretation.

A much larger, more thorough discussion is given by the **Stanford Encyclopedia of Philosophy's page on interpretations of probability**. As expected, though, it's largely embedded in the explicitly *philosophical* literature, filled with citations so as to mention every philosopher's opinion no matter how stupid or irrelevant. Still, though, it's a pretty good place to get your bearings. An even better discussion is given in the excellent review **Probability is Difficult**. (The author remains pseudonymous, but the stunning combination of philosophical deftness and technical expertise speaks volumes as to the author's erudition, lucidity, and, above all, humility).

If one wants a textbook-length explanation from a philosophical standpoint, there's Donald Gillies' **Philosophical Theories of Probability**, which also attempts to answer the question of when and where we should use this or that interpretation—see e.g. Ch. 9's "General arguments for interpreting probabilities in economics as epistemological rather than objective".

There are other ways to slice up the subject beyond these four interpretations: see e.g. the Wikipedia page on **aleatoric and epistemic uncertainty**. These correspond roughly to frequentist and Bayesian notions, but take a slightly different point of view that keeps the correspondence from being a clean one. Just the same, there are notions that fall neatly *within* these interpretations and should not be confused with 'probability'. Likelihood is one such notion; its difference from probability is explained by Abram Demski's **Probability vs Likelihood**, as well as the Wikipedia page on **likelihood functions**.

If you want to know how to think like a Bayesian, look no further than the bibliography of E. T. Jaynes. The father of both modern Bayesianism and the acerbic overconfidence of its proponents, his textbook **Probability Theory: The Logic of Science** builds the entire subject of probability theory as a connected edifice of techniques, heuristics, and formulas for accurately modifying and applying your beliefs about the real world (link is to a pdf; see this **LW post** of the same name for a much shorter review/walkthrough).

Two more conceptually oriented texts on subjective reasoning by notorious Bayesians are Bruno de Finetti's **Theory of Probability** and I. J. Good's **Good Thinking: The Foundations of Probability and Its Applications**; these look more carefully at where probability and its logic fundamentally come from, and the latter in particular takes care to demarcate questions of probability from those of statistics, utility, and decision. Chapter 3 of the latter is a (two-page) article entitled 46,656 Varieties of Bayesians, which—as you might have instinctively guessed from the number—is a combinatorial division of kinds of Bayesianism based on their answers to several different questions.

Jaynes spent most of his career dunking on frequentists, and occasionally this produced useful observations on the Bayesian philosophy and approach. See **What is Bayesianism?** for some of the tenets of the LW-style mode of reasoning, or nostalgebraist's **what is bayesianism? we (i) just don't know** for a useful distinction between "synchronic" Bayesianism (the axiomatically true application of Bayes' law to probability estimates) and "diachronic" Bayesianism (a statement that beliefs ought to be represented as probabilities that update via the conditionalization rule), and for a discussion of where diachronic Bayesianism misleads or outright fails people. **What's Wrong With Bayesian Methods?** outlines a key conceptual distinction: when Bayesians speak of the "distribution" of a parameter to be estimated, it is not to imply that the parameter itself is being treated as nondeterministic, or that we're speaking of its relative likelihoods of taking on certain values across a wide range of situations, but simply that we're trying to estimate the value that the parameter actually has in any particular case. In other words, Bayesian reasoning is generally conducted in an "estimation" scenario, not a "deduction" scenario.

Bayesianism's biggest, most notable problem—whence priors?—is also the source of its largest split, between *objective* Bayesians and *subjective* Bayesians. Again, note the distinction between objective Bayesianism and objective probability interpretations such as frequentism. Jaynes, an objective Bayesian, describes what the term means in his paper **Probability in Quantum Theory**:

Our probabilities and the entropies based on them are indeed "subjective" in the sense that they represent human information; if they did not, they could not serve their purpose. But they are completely "objective" in the sense that they are determined by the information specified, independently of anybody's personality, opinions, or hopes. It is "objectivity" in this sense that we need if information is ever to be a sound basis for new theoretical developments in science.

And again in his **Prior Probabilities and Transformation Groups**:

A prior probability assignment not based on frequencies is necessarily "subjective" in the sense that it describes a state of knowledge, rather than anything which could be measured directly in an experiment. But if our methods are to have any relevance to science, the prior distribution must be completely "objective" in the sense that it is independent of the personality of the user; i.e., it should describe the prior information, and not anybody's personal feelings.

The original objective Bayesians applied the principle of indifference to get priors: when you want to estimate some parameter in some range, start with a prior that is constant over the parameter space, giving equal weight to all options. This was famously called into question by Bertrand's paradox, which showed that slightly different ways of constructing the exact same parameter space can give different uniform priors, as well as by the similar wine/water paradox, which makes the same point in an even more insoluble manner. Jaynes's paper **The Well-Posed Problem** attempts to save the principle of indifference by showing that there is in fact a single unique way to be indifferent to the parameter in Bertrand's paradox; he nevertheless admits that his transformation group approach cannot solve the wine/water paradox. (And his solution for the former is problematic as well: as explained by Alon Drory's **Failure and Uses of Jaynes' Principle of Transformation Groups**, the method by which Jaynes supposedly found a single canonical solution can be adjusted to find each of the other two solutions as well).
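A Monte Carlo sketch makes Bertrand's paradox concrete: three natural ways of drawing a "uniformly random" chord of the unit circle yield three different probabilities that the chord exceeds $\sqrt{3}$, the side of the inscribed equilateral triangle. (The function names and sample size below are my own choices.)

```python
import math
import random

random.seed(0)

# Bertrand's paradox: three "uniform" chord constructions, three answers.
def chord_len_endpoints():
    # method 1: pick two uniformly random points on the circle
    a, b = random.uniform(0, 2*math.pi), random.uniform(0, 2*math.pi)
    return 2*abs(math.sin((a - b)/2))

def chord_len_radial():
    # method 2: pick a uniformly random point along a radius as the chord's midpoint
    return 2*math.sqrt(1 - random.uniform(0, 1)**2)

def chord_len_midpoint():
    # method 3: pick a uniformly random midpoint inside the disk
    r = math.sqrt(random.uniform(0, 1))   # sqrt makes the point uniform over *area*
    return 2*math.sqrt(1 - r**2)

N = 200_000
results = {}
for f, exact in [(chord_len_endpoints, 1/3), (chord_len_radial, 1/2),
                 (chord_len_midpoint, 1/4)]:
    results[f.__name__] = sum(f() > math.sqrt(3) for _ in range(N)) / N
    print(f"{f.__name__}: {results[f.__name__]:.3f}  (exact: {exact:.3f})")
```

Each construction is "indifferent" over a different parameter space—pairs of endpoints, radial distances, midpoints—and the uniform measure on one is non-uniform on the others.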

The modern, high-tech version of the principle of indifference is the **Jeffreys prior**, which is proportional to the square root of the determinant of the Fisher Information matrix; it is invariant under reparametrization, and thereby manages to avoid bias in the wine/water paradox. In practice, though, Jeffreys priors usually tend to be non-normalizable, as in most of the examples on the Wikipedia page. (This doesn't dissuade everyone, though: see Andy Jones's **Improper Priors** for a tutorial on how they can be used to derive proper posteriors).
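For concreteness, here is the Jeffreys prior worked out for a single Bernoulli observation, where the Fisher information has a simple closed form. (This particular Jeffreys prior, a Beta(1/2, 1/2), happens to be proper; the non-normalizable cases arise for e.g. location and scale parameters. The code is my sketch.)

```python
import math

# Jeffreys prior for Bernoulli(p): proportional to sqrt(Fisher information).
# For one flip, I(p) = E[(d/dp log L)^2] = p*(1/p)^2 + (1-p)*(1/(1-p))^2
#                    = 1/(p*(1-p)),
# so the Jeffreys prior is ∝ p^(-1/2) * (1-p)^(-1/2), i.e. Beta(1/2, 1/2).
def fisher_info(p):
    score = lambda x: x/p - (1 - x)/(1 - p)   # d/dp of the log-likelihood
    return p*score(1)**2 + (1 - p)*score(0)**2

def jeffreys_density(p):
    return math.sqrt(fisher_info(p))          # unnormalized

for p in (0.1, 0.5, 0.9):
    assert abs(fisher_info(p) - 1/(p*(1 - p))) < 1e-12
    print(f"p={p}: Jeffreys density ∝ {jeffreys_density(p):.3f}")
```

Reparametrization invariance is the selling point: transforming $p$ and recomputing the Fisher information gives the same prior as transforming the density directly.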

Another attempt by Jaynes to construct objective priors for Bayesian analysis is the principle of maximum entropy, or MaxEnt. His paper **Notes on Present Status and Future Prospects** talks about the role played by the Maximum Entropy principle in objective Bayesianism, and, more generally, discusses why it's important to reason about states of knowledge rather than the world directly. Most urgently, see his **Where do we go from here?** for an account of a rap battle concerning the objectivity of MaxEnt.
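To see MaxEnt in action, consider Jaynes's Brandeis dice problem: among all distributions over $\{1,\ldots,6\}$ with mean 4.5, the entropy-maximizing one is exponential in the face value, $p_i \propto e^{\lambda i}$, with $\lambda$ pinned down by the mean constraint. A pure-Python sketch (the bisection bounds and iteration count are arbitrary choices of mine):

```python
import math

# MaxEnt with a mean constraint: p_i ∝ exp(λ·i), choose λ so the mean is 4.5.
def mean_for(lam):
    w = [math.exp(lam * i) for i in range(1, 7)]
    z = sum(w)
    return sum(i * wi for i, wi in zip(range(1, 7), w)) / z

# mean_for is increasing in λ; mean_for(0) = 3.5, so λ > 0 is needed. Bisect.
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if mean_for(mid) < 4.5 else (lo, mid)

lam = (lo + hi) / 2
w = [math.exp(lam * i) for i in range(1, 7)]
probs = [wi / sum(w) for wi in w]
print([round(p, 4) for p in probs])   # weights increase monotonically toward 6
```

The uniform distribution is the λ = 0 special case: MaxEnt reduces to the principle of indifference when there are no constraints beyond normalization.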

For a much larger exposition on the problems with objective Bayesianism, see Elliott Sober's **Bayesianism: Its Scope and Limits**.

If you want to see how frequentists think, most standard statistics textbooks should do. Owing to the dominant role played by frequentists in the foundations of statistical inference, and the fact that the foundations of probability are usually just taught as a sidenote in most statistics classes, people hardly learn to work with probability except in the frequentist context where you have some experiment, some outcome, and you want to pick some test statistic (a function that computes the 'extremeness' of some aspect of the data) in order to establish some sort of confidence interval or reject some null hypothesis with some low p-value (the probability that the null hypothesis would give data with a test statistic at least as extreme as the given one). This is a failing *not of frequentism* but at most of contemporary frequentist pedagogy.
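The recipe just described can be made concrete with a minimal example (the 60-heads-in-100-flips setup is my own illustration):

```python
import math

# Frequentist sketch: test a fair-coin null after observing 60 heads in 100
# flips. Test statistic: the head count. One-sided p-value: the probability,
# under the null, of a result at least as extreme as the one observed.
def binom_pmf(k, n, p=0.5):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, observed = 100, 60
p_value = sum(binom_pmf(k, n) for k in range(observed, n + 1))
print(f"one-sided p-value: {p_value:.4f}")   # roughly 0.028, below the usual 0.05
```

Note what the p-value is *not*: it is not the probability that the null is true, but a tail probability of hypothetical repetitions under the null—exactly the hypothetical-frequency notion at issue.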

Foundational issues aren't really discussed all that much by frequentists; the biggest split among them is not between finite and hypothetical frequentists, but between Fisherian significance testing and Neyman-Pearsonian hypothesis testing (for which see my sequel **Statistics is Difficult**). In fact, Alan Hájek, in his **Fifteen Arguments Against Finite Frequentism**, calls finite frequentism "about as close to being refuted as a serious philosophical position ever gets"; its companion piece, **Fifteen Arguments Against Hypothetical Frequentism**, informs us that he had in fact written thirty arguments against frequentism in general, bisecting the list only in order to meet a journal's length requirements.

It seems to me that the propensity interpretation is primarily discussed by philosophers, so there can't be much of use there. Again, my original article goes over what little there is to discuss. However, I'll take this opportunity to discuss an objective notion of probability that behaves more or less like propensity: quantum-mechanical probability.

I haven't actually seen much discussion on the role that quantum mechanics should play in our understanding of probability—most discussion goes the other way around. When we measure the spin of an electron in a Stern-Gerlach experiment, the outcome seems to be random, with each spin (up or down) being equally likely for each electron. The canonical quantum formalism tells us that this does not come from a real-valued probability distribution over the space of possibilities, but from a complex *wavefunction* over this space. A wavefunction is *like* a probability distribution, but it generally takes on negative (indeed, complex) values: possibilities *need* to be able to interfere, whether by having different complex phases or different signs (cf. the Elitzur-Vaidman bomb tester).

This wavefunction is the "square root" of a probability distribution, in a sense made precise by the Born rule.
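Concretely, for a state expanded in the measurement basis, the Born rule reads:

```latex
% Born rule: outcome probabilities are squared magnitudes of complex amplitudes.
% For a state |\psi\rangle = \sum_i c_i |i\rangle expanded in the measurement basis:
P(i) = |\langle i \mid \psi \rangle|^2 = |c_i|^2, \qquad \sum_i |c_i|^2 = 1
```

Interference lives in the amplitudes: $|c_1 + c_2|^2$ can be smaller than $|c_1|^2 + |c_2|^2$, something no ordinary probability distribution over the two possibilities can mimic.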

To understand this question—what notion of probability obtains in *actual reality*?—would obviously be an advantage, though such real probabilities operate independently of epistemic probabilities: some distant digit of π has a perfectly definite value, but that value isn't fixed by anything (that *I* know of), so there's no reason for me to think it might be more likely to be any one digit than another. But there *is* a single correct value.

To be honest, though, I do *not* find these arguments for a quantum understanding of probability strong enough to decide that working on this is worth the time it would take. If there are good reasons, I'd like to see them—or, if anyone else wants to work on this, I would share my writings and discuss lines of thought with them.

Let's see how different interpretations contrast one another. First, though, it's worth noting the ways in which they can work together.

While subjective vs objective Bayesianism is a genuine disagreement—which priors must we use?—there is a sense in which subjective vs objective interpretations of probability are entirely different things, so that Bayesians and frequentists are in fact talking past each other. Because of this, we can *simultaneously* equip ourselves with tools for dealing with subjective credences and objective frequencies. This only goes so far, though, owing to the fact that subjective probabilists do eventually need to couple their beliefs to the real world, and therefore come into contact with frequentists.

Still, though, a subjectivist has some wiggle room in interpreting how their beliefs ought to couple to reality, and vice versa: David Lewis, most famous for his work on modal realism and the semantics of counterfactuals, attacked this question in **A Subjectivist's Guide to Objective Chance**; the most enduring influence of this paper is the Principal Principle, which relates credences to 'chances' (Lewis's term for objective probabilities, viewed from a subjective standpoint): provided one has no "inadmissible evidence" about the chance of some outcome (see paper for details), they should always set their credence equal to their estimated chance. Eventually, Lewis came to consider this principle wrong, replacing it with the so-called 'New Principle'; Strevens's paper **A Closer Look at the 'New' Principle** examines the history of this change and scrutinizes the new principle.

Yudkowsky has a very useful idiom in this regard (I don't know if it originated with him, or where I might locate a source): you should only assign a 5% probability to things when you really think that you'd only be wrong *one out of every twenty times* you assign such a probability; you should only say that something has above a 99.8% chance of happening when you really think that you could make *over five hundred* such statements and, on average, be wrong *just once*. Across many events, your credences ought to match the actual limiting frequencies of events. ~~Leader of "Bayesian Conspiracy" Exposed As Frequentist!!~~

Let's be clear about how this method of coupling credences to frequencies is different from frequentism: you can give a 30% credence to the population of China being above 1.5 billion, but that's not coherent as a frequency: there's no way to repeat your measurement. Looking up the number again will just get you the same number, and China isn't exchangeable with another random country. But the act of making a 30% credence is a repeatable experiment, and therefore does have a frequency; calibration is when the subjective probability converges to the objective probability.
(In a sense, these calibration frequencies *are* objective: if some mental flaw causes you to say you're 30% confident in facts about China that only turn out to be true 20% of the time, with *other* 30% credences of yours coming true more often so as to balance it out, someone who notices this pattern could systematically be correct more often than you. I'm not too sure what to make of this). If you want to test how well your own credences couple to reality, try the **Credence Calibration Game**.
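The ensemble reading of calibration is easy to simulate (a toy model with made-up hit rates, not anything from the post):

```python
import random

random.seed(0)

# Calibration: whether an agent's "30% confident" claims come true 30% of the
# time is a property of the *ensemble* of such claims, not of any single one.
def claim_outcome(true_rate):
    """One '30% confident' claim; true_rate is how often such claims hold."""
    return random.random() < true_rate

N = 100_000
calibrated = sum(claim_outcome(0.30) for _ in range(N)) / N
overconfident = sum(claim_outcome(0.20) for _ in range(N)) / N
print(f"calibrated agent:    30% claims true {calibrated:.1%} of the time")
print(f"overconfident agent: 30% claims true {overconfident:.1%} of the time")
```

The single claim about China has no frequency; the *act of claiming at 30%* does, and that's the frequency calibration tracks.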

The Bayesianism-frequentism argument is too played out. My heart's just not into it anymore. For a more refreshing perspective, see jsteinhardt's **Beyond Bayesians and Frequentists**, which discusses where Bayesian techniques happen to outperform frequentist ones, and where they don't, and gives criteria for deciding between the two as well as possible middle grounds. To quote the concluding section:

When the assumptions of Bayes' Theorem hold, and when Bayesian updating can be performed computationally efficiently, then it is indeed tautological that Bayes is the optimal approach. Even when some of these assumptions fail, Bayes can still be a fruitful approach. However, by working under weaker (sometimes even adversarial) assumptions, frequentist approaches can perform well in very complicated domains even with fairly simple models; this is because, with fewer assumptions being made at the outset, less work has to be done to ensure that those assumptions are met.

What probabilities *are* depends in part on where they *come from*. For a frequentist, there's not much here—probabilities are mere epiphenomena that arise from the distribution of outcomes yielded by repetitions of an experiment. For a subjectivist, we have to ask: where do credences come from?

Andrew Gelman makes the point in the title of the very brief **A probability isn't just a number; it's part of a network of conditional statements**. This poses a clear issue for models that end up deriving false predictions—how do we track the error down in the large network of relations that scientists often derive their priors from? The standard name for this is the "Duhem problem", after the Duhem-Quine thesis; Deborah Mayo's **Duhem's Problem, the Bayesian Way, and Error Statistics, or "What's Belief Got to Do with It?"** speaks of general attempts by statisticians to tackle it, and in particular of the problems that this poses for Bayesian inference.

A similar problem is the sensitivity of Bayesian inference to slight variations in the underlying models: Jaynes's **The A_p Distribution and Rule of Succession** describes the way in which scientists might obtain probabilities from underlying models, and points out how two propositions given the same probability can differ *drastically* in stability, or the extent to which they change conditional on new evidence. Sometimes this is a good thing, as illustrated by one of his examples—if we solidify hydrogen in the lab at such and such a temperature and pressure, our probability that it'll do so again under the same conditions should *not* be 2/3, as indicated by a naive use of Laplace's rule of succession. It should be arbitrarily close to 1, because we chunk that knowledge as "hydrogen is a thing that solidifies in these conditions"—as a universal property. Subjective probabilities are *outputs of entire causal mechanisms*, and must change accordingly.
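Laplace's rule itself is a one-liner; the point of Jaynes's example is that the number it produces can be wildly unstable relative to our actual state of knowledge (the code is my sketch of the rule, with the hydrogen case as the first call):

```python
from fractions import Fraction

# Laplace's rule of succession: with a uniform prior over an unknown success
# rate, after s successes in n trials, P(next trial succeeds) = (s+1)/(n+2).
def rule_of_succession(s, n):
    return Fraction(s + 1, n + 2)

# One successful solidification of hydrogen: the naive rule says 2/3, though
# our actual credence should be near 1 once we chunk the result as a
# universal property of hydrogen rather than an i.i.d. draw.
print(rule_of_succession(1, 1))    # 2/3
print(rule_of_succession(99, 99))  # 100/101 -- the rule only creeps toward 1
```

The rule is the correct posterior predictive *given* the i.i.d.-with-uniform-prior model; the instability Jaynes points at lives in the choice of model, not the arithmetic.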

Owhadi, Scovel, and Sullivan's **On the Brittleness of Bayesian Inference** points out a problem with this: models might make the ways in which we update on new data "rigid", such that—to quote the abstract—"(1) two practitioners who use arbitrarily close models and observe the same (possibly arbitrarily large amount of) data may reach opposite conclusions; and (2) any given prior and model can be slightly perturbed to achieve any desired posterior conclusions. The mechanism causing brittleness/robustness suggests that learning and robustness are antagonistic requirements and raises the question of a missing stability condition for using Bayesian Inference in a continuous world under finite information".

Frequentism doesn't get away cleanly, though. Steegen et al.'s **Increasing Transparency Through a Multiverse Analysis** makes the point that, in actual analysis of data, the probabilistic models you actually choose and the statistical tests you actually run *do actually depend* on the specific outcomes you got, e.g. when you do **exploratory data analysis**, or really when you take *any data processing steps with degrees of freedom* (such as categorizing data, combining variables, transforming variables, excluding some data, and so on), and that this does create bias. They propose "multiverse analyses", where you consider the range of possible results you could've gotten if you had done things differently, scanning this range to see where the most important choices lie and what their presence suggests about your conclusion. (Bayesians face a version of the same problem: you haven't really used an *objective* prior unless you *precommit* to using it no matter how poorly it ends up fitting, no matter how ugly the math gets (as happens w/ non-conjugate priors). If you say "we're going to use the Jeffreys prior because it's non-informational", but you are *such that you would* upon seeing some particular data decide to abandon this choice, your use of the Jeffreys prior is *dependent* on the data being in a certain way, and therefore informational even if you do in fact end up going through with it!) See also **The Garden of Forking Paths**, Gelman's **"If you do not know what you would have done under all possible scenarios, then you cannot know the Type I error rate for your analysis"**, and Greenland et al.'s **To Aid Scientific Inference, Emphasize Unconditional Compatibility Descriptions of Statistics**.

We use probability to reason about things we wish to influence in the future, as well as about things that have influenced us in the past. The general solution to this might seem simple enough—plug your data and your uncertainty (prior) into your model, apply Bayes, and where applicable take the action that maximizes expected utility w.r.t. your new posterior. But the elegance of this solution is only made possible by a fallacious agent-environment distinction.

Scott Garrabrant makes the point in **Bayesian Probability is for things that are Space-like Separated from You**. The term comes from special relativity, where space-like separation refers to things outside your lightcone: roughly, two events are time-like separated when a signal can cross the distance between them with *time* to spare, and space-like separated when there's too much *space* between them for any signal to make it in time. What's *inside* of your lightcone is your past and your future—the things that you can observe, and the things that you can affect. In the metaphor of logical time Garrabrant employs, the past consists of possible (resp. actual) events that your actions anticipate (resp. react to), and the future consists of events that anticipate (resp. react to) your possible (resp. actual) actions.

- If you use Bayesian probability on things in your logical past, you open yourself up to *manipulation*: adversaries that can predict how you'll react to things—and using a lawful technique like Bayesian probability makes predicting you easy!—can filter your evidence in order to manipulate your beliefs.
- If you use Bayesian probability on things in your logical future, you ignore the effects that your own beliefs have on your actions, paradoxically writing yourself out of your own model. To quote Garrabrant, "the standard justifications of Bayesian probability are in a framework where the facts that you are uncertain about are not in any way affected by whether or not you believe them".

The first point is in fact an incredible problem for Bayesian updating in adversarial environments, discussed more thoroughly in Abram Demski's **Thinking About Filtered Evidence is (Very!) Hard**, where updating on claims becomes computationally fraught even when they're claims a reasoner could *in principle* just directly prove or disprove. The second point indicts the "**Cartesian**" use of Bayesianism—perhaps best-known as it shows up in AIXI—which relies on a world-model which you impassively observe and act upon as an independent mind; the solution, theoretically, is to have a model that takes fully into account the way in which you are *integrated into the world*.

Yet this is dangerous for an entirely different reason. While Demski's **Embedded Agency** is beyond our scope, his **The Parable of Predict-O-Matic** hints at why we might not want to build any agent whose probabilistic model lets it predict how its actions affect its observations. Insofar as that agent has a utility which depends on its observations, such as making predictions which minimize the error of those observations, it has not just the motive but the *ability* to figure out how to act in order to increase the utility of what it observes. Even if you just want it to predict things for you—stating predictions is an action, and it can benefit by tailoring or conditionally sharing its predictions in order to manipulate you.

Of course, there are good reasons to think that this is a capacity the agent should have: you want your construct to be *able* to do the optimal thing to achieve its goal, and then control what it *decides* to do in order to keep it from subverting or destroying you. If it can't solve this problem, known generally as the **naturalized induction** problem, it will always be artificially stupid (unless it somehow manages to develop this notion on its own, in the same way that humans sometimes do). There are some proposed solutions to this problem, such as infra-Bayesian physicalism.

These are some extraordinarily general limitations to the use of Bayesian probabilities. You can say that this or that coherence theorem is why Bayesian agents are the most accurate—but when it turns out that they just *aren't*, when they're intractably screwed over by such simple twists that humans readily reason about, what do all your proofs, all your theorems, all your mathematical guarantees come to?

In any case, there's another sense in which you might use probability to reason about "your world", one in which the nature of your existence in it is even more fundamental. Anthropics.

As painful as it is, anthropic reasoning *must* inform the way we use probabilistic reasoning. When we say that no, the fine-tuning of the universe to support life *doesn't* imply a fine-tuner, since it's a necessary condition of our being here to observe anything at all—we're using anthropics, arguing that even though $P(\text{life}\mid \text{random tuning})$ might be very low, we're not looking down at an arbitrary randomly chosen universe; we're *within* a universe which must be able to support us. Anthropics is the study of *whether* and *how* you need to condition on the fact of your own existence. The canonical reference is Bostrom's **Anthropic Bias**, which discusses such problems in depth—but, for a quick overview of the more terrifying thought experiments, see Scott Aaronson's **Fun With the Anthropic Principle** (notes from lecture 17 of his quantum computing course).

Either way, radical Bayesians have a problem—if they ignore anthropics, they're fucked because they're incoherently settling a large class of important questions by ignorant dogmatism, and if they don't ignore anthropics, they're fucked because now they have to think about anthropics.

Stuart Armstrong's **Anthropics: different probabilities, different questions** shows how different theories of anthropic probability are really answers to different questions, and ata's **If a tree falls on Sleeping Beauty...** discusses how this obtains in the case of the Sleeping Beauty problem—how those who answer 1/3 and those who answer 1/2 are answering different questions.
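The "different questions" framing is easy to see in simulation: run the Sleeping Beauty protocol many times, and both the halfer's and the thirder's numbers fall out, as answers to different counting questions. (A minimal sketch; the sample size is arbitrary.)

```python
import random

random.seed(1)

N = 100_000
heads_experiments = 0
awakenings = []  # one entry per time Beauty is woken, tagged with the coin

for _ in range(N):
    coin = random.choice(["heads", "tails"])
    if coin == "heads":
        heads_experiments += 1
        awakenings.append(coin)           # heads: woken once
    else:
        awakenings.extend([coin, coin])   # tails: woken twice

# Halfer's question: in what fraction of *experiments* did the coin land heads?
p_per_experiment = heads_experiments / N

# Thirder's question: in what fraction of *awakenings* is the coin heads?
p_per_awakening = awakenings.count("heads") / len(awakenings)

print(p_per_experiment)  # ≈ 1/2
print(p_per_awakening)   # ≈ 1/3
```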

We want to use probability to reason about what things are *true*. The theory through which frequentists come to conclusions about what is true looks more like statistics proper, which I won't go into—though, again, see Statistics is Difficult, which points to the unusually interesting divide between Fisher and Neyman-Pearson on the extent to which statistical inference ought to help guide our interpretations of evidence (as opposed to directly guiding our decisions).

For Bayesians, the situation is thornier. Aumann's Agreement Theorem tells us that agents with the same prior who update using Bayes' rule on common knowledge *must* converge onto the same beliefs, rather than agreeing to disagree—but this never happens. We are not rational enough, quantitatively precise enough, computationally powerful enough to implement it. In this sense, Robin Hanson calls us "Bayesian wannabes", and, in his paper **For Bayesian Wannabes, Are Disagreements Not About Information?**, demonstrates that such wannabes with the same starting priors will disagree for reasons *beyond* their simply having different information about the world.
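For contrast, here is what the idealized case looks like: two agents with a common prior who update by Bayes' rule on the same evidence cannot help but agree, exactly. Hanson's result says that for wannabes, any *remaining* disagreement must therefore come from somewhere other than the evidence. (A toy model; the hypotheses and numbers are made up.)

```python
from fractions import Fraction

def bayes_update(prior, likelihoods, observation):
    """One step of Bayes' rule over a discrete hypothesis space."""
    joint = {h: prior[h] * likelihoods[h][observation] for h in prior}
    total = sum(joint.values())
    return {h: p / total for h, p in joint.items()}

# Two hypotheses about a coin: fair, or biased 3:1 toward heads.
likelihoods = {
    "fair":   {"H": Fraction(1, 2), "T": Fraction(1, 2)},
    "biased": {"H": Fraction(3, 4), "T": Fraction(1, 4)},
}
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}

alice = bob = prior
for obs in ["H", "H", "T", "H"]:  # the common evidence
    alice = bayes_update(alice, likelihoods, obs)
    bob = bayes_update(bob, likelihoods, obs)

# Same prior + same rule + same evidence => identical posteriors, exactly.
print(alice == bob)     # True
print(alice["biased"])  # 27/43
```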

Andrew Gelman, who literally wrote the book on Bayesian Data Analysis, gives two more pragmatic objections to the *use* of Bayesian inference in his **Objections to Bayesian Statistics**. First, that Bayesian methods are generally presented as "automatic", when that's just not how statistical modeling tends to work—it's a setting-dependent thing, where models are used highly contextually—and, second, that it's not clear how to assess the kind of subjective knowledge Bayesian probability claims to be about, and science should be concerned with objective knowledge anyway. There's a great paragraph that I'll quote:

As Brad Efron wrote in 1986, Bayesian theory requires a great deal of thought about the given situation to apply sensibly, and recommending that scientists use Bayes’ theorem is like giving the neighborhood kids the key to your F-16. I’d rather start with tried and true methods, and then generalize using something I can trust, such as statistical theory and minimax principles, that don’t depend on your subjective beliefs. Especially when the priors I see in practice are typically just convenient conjugate forms. What a coincidence that, of all the infinite variety of priors that could be chosen, it always seems to be the normal, gamma, beta, etc., that turn out to be the right choices?
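The jab about "convenient conjugate forms" refers to updates like the Beta-Binomial one, where the posterior stays in the prior's family and inference collapses into adding counts; that convenience, rather than any actual subjective belief, is often why the prior was chosen. (A minimal sketch of the standard conjugate update, with made-up data.)

```python
def beta_binomial_update(alpha, beta, heads, tails):
    """Beta(alpha, beta) prior + binomial data -> Beta posterior.

    Conjugacy means the update is literally just addition of counts.
    """
    return alpha + heads, beta + tails

# Uniform Beta(1, 1) prior on a coin's bias, then 7 heads in 10 flips:
a, b = beta_binomial_update(1, 1, heads=7, tails=3)
posterior_mean = a / (a + b)  # mean of Beta(8, 4)

print((a, b))          # (8, 4)
print(posterior_mean)  # 2/3 ≈ 0.667
```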

Abram Demski's **Complete Class: Consequentialist Foundations** offers a "more purely consequentialist foundation for decision theory", and a "proposed foundational argument for Bayesianism". He argues that Dutch Books serve more to illustrate inconsistencies than to demonstrate desiderata for rationality, because beliefs are distinct from actions like betting, requiring a decision theory to link them. Instead, Demski advocates the titular complete class theorem, which states that any decision rule that's Pareto-optimal (or, admissible: has no strictly superior decision rule) can be expressed as Bayesian—namely, as an expected utility maximizer for *some* prior. As Jessica Taylor's comment points out, though, the "some" in that sentence is doing a lot of work: there are ways for groups to arrive at Pareto-optimal outcomes without *utilizing* Bayesian methods, and which only happen to be *rationalizable* in those terms. The prior guaranteed by the theorem might end up looking absolutely insane.
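Taylor's caveat aside, the theorem's positive content is easy to exhibit in a finite toy problem (all risk numbers below are made up): every admissible rule shows up as the Bayes-optimal rule under *some* prior, while dominated rules never do.

```python
# Risk of each decision rule under two states of nature (toy numbers).
risks = {
    "A": (1.0, 4.0),   # best when state 1 is likely
    "B": (3.0, 2.0),   # best when state 2 is likely
    "C": (4.0, 4.0),   # dominated by B: inadmissible
    "D": (1.8, 2.9),   # admissible; best for middling priors
}

def bayes_risk(rule, p):
    """Expected risk of `rule` under prior P(state 1) = p."""
    r1, r2 = risks[rule]
    return p * r1 + (1 - p) * r2

# Scan over priors and record which rule is Bayes-optimal for each.
ever_bayes = set()
for i in range(101):
    p = i / 100
    ever_bayes.add(min(risks, key=lambda rule: bayes_risk(rule, p)))

print(sorted(ever_bayes))  # ['A', 'B', 'D']; C is never anyone's Bayes rule
```

Geometrically, the admissible rules sit on the lower-left frontier of the risk set, and each prior picks out a supporting line; the theorem's "some prior" is whichever line happens to touch the rule in question.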

As regards the role of Bayesian inference in decision theories, see also Wei Dai's **Why (and why not) Bayesian Updating?**, and the paper it discusses, Paolo Ghirardato's **Revisiting Savage in a conditional world**, which gives a list of seven axioms that, taken together, are "necessary and sufficient for an agent's preferences in a dynamic decision problem to be represented as expected utility maximization with Bayesian belief updating" (quoting the former).

There are lots of things that I haven't covered here which would be essential to a fuller treatment of the subject. If you want to truly understand how probability is difficult, some other subject-clusters to explore include:

- Systematic methods for structuring prior probabilities, such as probabilistic graphical models, methods for constructing them from factorizations of a distribution, and methods for comparing models, such as the Bayesian information criterion. Causal models, and the difference between causality, conditioning, and regression.
- More generally, the relation of conditioning to causality. The conditionalization rule as a normative update rule for beliefs, diachronic Dutch books, unintended consequences and adverse effects. 'Triviality results' that force us to differentiate between conditional probabilities and the probabilities of conditionals.
- The problem of dealing with very small probabilities, as in Pascal's mugging; the necessity of bounded utilities, and the problem of coherently bounding them; the general problems this poses for the entire framework of probabilistic utility. (If you say that this is the right way to do things except where it breaks down, you demonstrate that you're *keeping it on a leash*—you're only saying it's right in the cases where you see beforehand, for other reasons, that it's right. Disrespect the math *anywhere* and it loses its universality *everywhere*! The more theorems you stack onto one another in an attempt to show that some method is necessarily the One True Way to do things, the more debilitating it is when you have to fence off one area of its application—an area that comes from *simply scaling the numbers* from areas where you apply it all the time—and say "no, it doesn't work here"!) Related: probabilistic handling of risk, as prompted by e.g. the Ellsberg and Allais paradoxes.
- Different "scales" of probability: linear, logarithmic, logit, and so on. Situations and mental frameworks which naturally call for the use of one over the other (see e.g. the brilliant "losing twos" framework Yudkowsky's introduced in planecrash). Logarithmic and Brier scorings of credences.
- Alternatives to Kolmogorov's axioms for probability. The only one currently on my radar is Renyi's theory of conditional probability spaces, which in particular makes things like improper priors very natural to work with, but I'm sure there are others.
- Different frameworks for dealing with uncertainty in general. There are a lot of them: Dempster-Shafer belief theory, Kahneman and Tversky's prospect theory, certainty factor models, fuzzy sets. Were I especially sadistic, category-theoretic models of decision under uncertainty.
- More on the distinction between decision, utility, and probability. I wasn't able to include as much here as I'd've liked because this current outline is going to be stuffed among my drafts in perpetuity if I don't finish it soon, but it's really worth looking into.
- Methods for dealing with Knightian uncertainty. Most of the theory for this is in my head, filed under terms like "transcendental safety" and my general theory of instrumental values and strategies, but there are also aspects of e.g. robustness which have been publicly discussed.
- A paradox rundown: besides wine/water, Sleeping Beauty, and Bertrand, the two most notorious are probably St. Petersburg and Two Envelopes; it's worthwhile to go through the graveyards of attempted solutions, see what consensuses exist, and see whether and how the problems underlying such paradoxes might obtain in realistic scenarios.
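On St. Petersburg in particular, the core tension is quick to verify numerically: the expected monetary payoff diverges, while a slower-growing (here logarithmic) utility tames it, which is precisely what makes "just bound the utilities" feel like keeping the math on a leash. (A minimal sketch; the truncation depths are arbitrary.)

```python
import math

# St. Petersburg: payoff 2^k with probability 2^-k, for k = 1, 2, 3, ...
def truncated_expectation(utility, rounds):
    """Expected utility of the payoff, truncated to the first `rounds` rounds."""
    return sum((0.5 ** k) * utility(2 ** k) for k in range(1, rounds + 1))

# Raw money: each round contributes 2^-k * 2^k = 1, so the truncated
# expectation just equals the number of rounds: it diverges.
print(truncated_expectation(lambda x: x, 10))    # 10.0
print(truncated_expectation(lambda x: x, 1000))  # 1000.0

# Logarithmic utility: the series converges (to 2·ln 2 ≈ 1.386).
print(truncated_expectation(math.log, 10))
print(truncated_expectation(math.log, 500))
```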

If you've made it to the end, then I hope you got something out of this brief outline. The single message I want to underscore is, again, that *probability is difficult*. It's important to learn where and when it breaks down, so that you don't make great mistakes in placing so much weight upon such weak foundations—or, worse, build an agent that systematically does so without the ability to ever tell that it's "not supposed" to be doing something pathological. If there's anything I overlooked, any errors of fact or attribution, any subject-clusters or resources I should strongly consider including, please let me know.