For a 93.75% confidence interval, draw four iid pairs of points. If the second point in every pair is greater than the first, your CI is the empty set; otherwise it’s the whole real number line.
Once you draw some actual data and get a specific interval, you want to ask about your degree of belief that this specific interval contains the true parameter. If your CI is the whole real line, you know for a fact that it contains the true parameter value. If your CI is the empty set, you know for a fact that it doesn’t.
I like this CI procedure because it demonstrates two things. 1) The reasoning that goes forward from an unknown parameter to a random interval is very different from the reasoning that works backwards from a specific interval to the parameter. That asymmetry can be WEIRD. 2) The weirdness is possible if you limit yourself to the CI definition alone, meaning that if you want a CI to be useful, you need something that rules out weird shit like my example.
The properties of specific CI procedures people actually use are generally much much better than what is allowed by the definition of a CI. If you want useful reasoning backwards from the interval, don’t try to reason solely from the definition of a CI.
In my example i(D) is a function of the data (a function of the ordering alone), and D is a random dataset. Since the draws are iid by assumption (sneakily also assuming the probability of an exact tie is zero), the probability that the interval is the whole real line is 93.75% (1 - 1/2^4). Otherwise it’s the empty set.
Unpacking that, suppose you have a real number m. The probability that i(D) will contain m (with D as the random variable) is 93.75%, so it is a valid confidence interval for m.
m could be the population mean, the population median, your dog’s age, whatever. The interval depends on the data, but not on the parameter, and the definition of a CI says that’s fine.
It’s a demonstration that the definition of a CI alone isn’t really useful for reasoning about a parameter given an interval. You need to know more about the specific data-generating process and the function i that produced the interval in order to make sure it’s useful.
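Here’s a quick simulation of a procedure in this family (a sketch of my own; the names `interval` and `coverage` are made up). It draws four iid pairs, so the four within-pair ordering comparisons really are independent coin flips, and the reported "interval" never looks at the values at all:

```python
import random

def interval(data):
    # the "interval" ignores the actual values; it depends only on
    # whether each pair came out in increasing order
    if all(b > a for a, b in data):
        return "empty"        # probability (1/2)^4: certainly misses
    return "whole line"       # probability 1 - 1/16: certainly covers

def coverage(n_trials=100_000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        data = [(rng.random(), rng.random()) for _ in range(4)]
        if interval(data) == "whole line":
            hits += 1         # the whole real line covers any parameter
    return hits / n_trials

print(coverage())  # ≈ 0.9375, no matter what parameter you claim to target
```

The coverage guarantee holds for any target you like, which is exactly why the definition alone buys you so little.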
Or to put it another way: if i(D) is a function of the ordering, then isn’t the process observed through i(D) by definition no longer iid, even if D itself is iid?
This paper describes the situation more thoroughly
But, fair enough, I appreciate your sentiment about 'terseness'. One can only be so nitpicky with words when trying to communicate, before starting to sound like a criminal defense lawyer ...
I'm not clear on what it is that you [the post's author] don't understand about interpretation of Bayesian credible intervals.
Both "objective" and "subjective" Bayesians interpret them as degrees of belief - that, for instance, one would use to make bets (supposing, of course, that you have no moral objection to gambling, etc.).
The difference is that "objective" Bayesians think one can formalize "what one knows" and then create an "objective" prior on that basis, one that everyone "with the same knowledge" would agree is correct. I don't buy this. Formalizing "what one knows" by any means other than specifying a prior (which would defeat the point) seems impossible. And supposing one did, there is disagreement about what an "objective" prior based on it would be. To joke: "The best thing about objective priors is that there are so many of them to choose from!"
Many simple examples illustrate that the objective Bayesian framework just isn't going to work. One is the one-way random-effects model, where the prior on the variance of the random effects will sometimes have a large influence on the inference (e.g., on the posterior probability that the overall mean is positive), but where there is no sensible "objective" prior - you just have to subjectively specify how likely it is that the variance is very close to zero. Another, even simpler, example is inference for theta given an observation x ~ N(theta, 1), when it is known (with certainty) that theta is non-negative, and the observed x is -1. There's just no alternative to subjectively deciding how likely a priori it is that theta is close to zero.
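The x ~ N(theta, 1), theta >= 0 example is easy to play with numerically. A sketch (my own; the two priors and the 0.5 cutoff are illustrative choices) comparing the posterior probability that theta is small under a flat prior versus a prior that favors values near zero, after observing x = -1:

```python
import math

def posterior_prob_small(prior, x=-1.0, cut=0.5, hi=10.0, n=20000):
    # grid-integrate: posterior(theta) ∝ prior(theta) * exp(-(x - theta)^2 / 2)
    # on theta >= 0; return P(theta < cut | x)
    h = hi / n
    num = den = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        w = prior(t) * math.exp(-0.5 * (x - t) ** 2)
        den += w
        if t < cut:
            num += w
    return num / den

flat = lambda t: 1.0                     # a "non-informative" flat prior on [0, inf)
decaying = lambda t: math.exp(-2.0 * t)  # subjective prior favoring small theta

print(posterior_prob_small(flat))      # ≈ 0.58
print(posterior_prob_small(decaying))  # ≈ 0.83
```

The answer moves a lot with the prior, and nothing in the data can arbitrate: that's the commenter's point about having to decide subjectively how much mass sits near zero.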
Frequentist methods also don't give sensible answers in these examples. Subjective Bayesianism is the only way.
If we live in a materialist deterministic world - which many would cite as an axiom for simplicity - then there really is no probability. Everything happens with 100% certainty.
Then, what is probability? If everything will happen with 100% certainty, but probability certainly appears to exist, then probability must reflect something about our information about something occurring.
The author refers to two foundational approaches to our state of knowledge. The first is the objectivist approach, which states that everyone who has the same state of knowledge about a system will evaluate the same probability of something occurring. The second is the subjectivist approach, which states that a given individual with a certain state of knowledge will evaluate some probability of something occurring. To me, these appear to be the same thing except insofar as the former requires a consensus of many while the latter a consensus of one.
The author asks how we might actually define Bayesian probability without resorting to the frequentist approach (i.e. hypothetically simulating many trials of the same event, however infrequent in reality it may be).
First, he says this would mean "interpreting [the credible interval] like a confidence interval". I am no statistician, but is that necessarily true? I don't see why confidence intervals would suddenly emerge in order to interpret a credible interval.
Second, I am not sure the frequentist interpretation is so problematic. When we interpret the plain-English definition of a probability, are we not mentally simulating repeated trials in order to evaluate something's occurrence? What else could a probability imply? If something has a 20% chance of occurring, then it does not occur 80% of the time, and so we must envision 80% of universes (part of the hypothetical trials) where it does not occur. I don't see any other way around this, frequentist or not.
(Note: I am not a statistician, while the author is, and the above is simply my layman's understanding of the article.)
Of course, frequentist and Bayesian stats are completely mathematically equivalent. The choice just affects our mental patterns.
There are some special cases in which a frequentist 95% confidence interval and a Bayesian 95% credible interval based on some sort of default prior are numerically the same, but that doesn't happen in general.
Statisticians would hardly have been vigorously debating the issue for two centuries if it didn't really matter.
>>>> Let’s now suppose that we’ve done a Bayesian analysis. We’ve specified a prior distribution for the parameter, based on prior evidence, our subjective beliefs about the value of the parameter, or perhaps we used a default ‘non-informative’ prior built into our software package.
At first blush the difference is that the Bayesian is using more information. Now don't get me wrong, if Bayes theorem and its progeny give us useful tools for incorporating that information in our analyses, so much the better.
But even if we focus just on confidence intervals and credible intervals, there needn't be the equivalence you state. A comment elsewhere here discusses a ridiculous confidence interval that is either the whole parameter space or the empty set. That's never going to be what a Bayesian credible interval gives you.
A frequentist interpretation of probability is objective in the sense that it grounds the probability value in objective features of the world.
A Bayesian interpretation is subjective in that the probability assigned to an event is grounded in the belief-state of its observer.
Their grounding is some real physical parameter, e.g., for a coin, the geometry of its having two sides.
A possible third approach is to create a hypothetical model and feed random data through it, to get an idea of the spread of the outcomes. Modeling doesn't stumble over conditional terms, and if you're unclear about an assumption, the model won't run. The computer doesn't know whether it's a frequentist or a Bayesian, or something different from both.
I'm not a statistician. Whenever I need to do something with statistics, I always test my computation with random data.
In fact I wonder, if modeling had been possible since the birth of statistics, if we would even bother with things like elaborate formulas for statistical tests.
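That habit is cheap to apply even to textbook formulas. A sketch (my own illustration) that feeds random data through the usual mean ± 1.96·s/√n interval and checks its advertised 95% coverage:

```python
import random
import statistics

def covers(rng, n=100, mu=3.0, sigma=2.0):
    # draw a sample, build the normal-approximation interval for the mean,
    # and report whether it actually contains the true mean
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    m = statistics.fmean(xs)
    half = 1.96 * statistics.stdev(xs) / n ** 0.5
    return m - half <= mu <= m + half

rng = random.Random(42)
trials = 20_000
cov = sum(covers(rng) for _ in range(trials)) / trials
print(round(cov, 3))  # close to 0.95 (slightly under, since it uses z not t)
```

No derivation needed: if the observed coverage were far from 95%, you'd know your computation was wrong before trusting it on real data.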
Subjective priors are one of the main advantages of Bayesian stats. Regularization used in ML corresponds to using subjective priors, for example. L2 regularization finds the MAP with a normal prior, favoring parsimonious solutions.
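That correspondence is easy to verify numerically. A sketch (my own; the data and the value of lam are made up) showing that, in a 1-D regression, the L2-regularized least-squares estimate coincides with the MAP estimate under a zero-mean normal prior:

```python
import random

rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(50)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]
lam = 3.0  # regularization strength = noise variance / prior variance

# Closed-form ridge estimate: argmin_w  sum_i (y_i - w*x_i)^2 + lam * w^2
w_ridge = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

# MAP by brute-force grid search: under a N(0, s2/lam) prior the log posterior
# is -[sum_i (y_i - w*x_i)^2 + lam * w^2] / (2*s2) + const, so maximizing it
# is the same as minimizing the ridge objective
def ridge_objective(w):
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) + lam * w ** 2

w_map = min((k / 1000 for k in range(-5000, 5001)), key=ridge_objective)
print(abs(w_ridge - w_map) < 1e-3)  # the two estimates agree (up to grid resolution)
```

Same estimator, two vocabularies: "penalty" in ML, "prior" in Bayesian stats.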
1. The outcome of the election here is not a probability. It is a population value - the proportion of people voting for candidate X on election day. It doesn't have to be repeated, in the same way that measurements of height for everyone in the United States would not have to be repeated if we were measuring heights instead of votes.
2. Frequentist probability doesn't require physically repeating things. It can reason about what would happen under repeated sampling under certain conditions, and then draw inferences about those assumed conditions. With the election example: if a survey of 100 people has 70% voting for candidate "A", we don't need to repeat the survey in order to know the likelihood (frequency) of this result happening if the real proportion of people voting for candidate "A" across the US is 50%.
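That tail frequency is a direct computation, no repetition required. A sketch (my own) of the exact binomial calculation for the survey example:

```python
from math import comb

# if the true nationwide proportion were 50%, how often would a simple
# random sample of 100 people show 70 or more for candidate "A"?
n, k = 100, 70
p_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
print(f"{p_tail:.2e}")  # a few in a hundred thousand - very surprising under 50%
```

The "repeated sampling" lives entirely inside the binomial coefficients; nobody has to run the survey twice.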
Statisticians and mathematicians have gone very far down the path you’re discussing, and you might be interested in some sets of axioms that have come up around probability and relaxations of true/false logic.
The Kolmogorov axioms are the “standard” probability axioms, and are phrased in terms of set theory and measure theory (not requiring any mention of physics or a physical universe!).
There are other ways to quantify degree of belief, however, and they are very interesting. Apparently Cox’s theorem justifies a popular probability framework for Bayesians. But there are many more interesting ways to handle degree of belief, like Dempster-Shafer theory, which I understand to be a plausibility calculus.
Everybody seems to find a single system and decide it’s the only one out there.
I don't know who "the many" are - but I thought determinism had already been disproved.
I am not a physicist so I will not go into quantum mechanics - but I will take a simple example from Science Fiction, and that is the Temporal Paradox. https://en.wikipedia.org/wiki/Temporal_paradox
The entire field of chaos theory is an attempt to make chaos deterministic, so you are in good company. There is no general mechanism (yet) for doing this. Quantum mechanics is the most interesting area.
The test of determinism is whether it can predict the future before it happens. People have been trying to do this for years with the weather and the stock market. This is where the concept of chaos came from.
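The weather connection is sensitive dependence on initial conditions, which even a one-line deterministic map exhibits. A sketch (my own): the logistic map x → 4x(1−x) is fully deterministic, yet two starting points differing by 1e-10 soon disagree completely, so long-range prediction fails even though nothing is random:

```python
def logistic_orbit(x, steps):
    # iterate the fully deterministic chaotic map x -> 4x(1-x)
    out = []
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)
        out.append(x)
    return out

a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2 + 1e-10, 60)
print(abs(a[4] - b[4]))                       # still tiny after 5 steps
print(max(abs(u - v) for u, v in zip(a, b)))  # grows to order 1 within 60 steps
```

Determinism and practical predictability come apart: the rule is exact, but any uncertainty in the starting point is amplified until forecasts are worthless.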
Leaving math and physics - there is philosophy. To apply determinism to people you would have to decide there is no free will. Maybe this is true and maybe not.
Assume that both the system state and the observation are drawn from some joint probability distribution.
There is some function γ of the system state that we seek to estimate. The experimenter applies some decision procedure d to the observation to get their result.
A Frequentist will analyze the situation by conditioning on the model parameter θ. As a result, we get a single target value γ and probability distributions for the observation and the decision, depending on θ.
If d results in an interval, the Frequentist calculates the confidence level as the probability that the decision procedure d produces an interval containing γ, under worst-case assumptions for θ. Unbiasedness of the decision procedure means that γ is indeed the function it estimates best, and that it is not a better estimator for any other function γ'(θ).
A Bayesian, on the other hand, will condition the joint distribution on the observation. Consequently, γ is a random variable, while the observation and decision are known.
If d is an interval, its credibility is the probability that γ is within this interval, given the observation. Optimality of the decision procedure means that no other estimator d' produces better results.
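The two conditionings can be seen side by side in a conjugate-normal toy model (my own sketch: θ ~ N(0,1), x | θ ~ N(θ,1), so the posterior is N(x/2, 1/2)). The confidence interval conditions on θ, the credible interval on x, and both come out at 95% over the joint distribution:

```python
import random

rng = random.Random(0)
trials = 50_000
ci_hits = cred_hits = 0
for _ in range(trials):
    theta = rng.gauss(0, 1)           # draw the state from its prior
    x = rng.gauss(theta, 1)           # draw the observation given the state
    if abs(x - theta) <= 1.96:                    # frequentist CI: x ± 1.96
        ci_hits += 1
    if abs(theta - x / 2) <= 1.96 * 0.5 ** 0.5:   # credible: x/2 ± 1.96/sqrt(2)
        cred_hits += 1
print(ci_hits / trials, cred_hits / trials)  # both close to 0.95
```

The intervals themselves differ (different centers, different widths) even though each honestly earns its 95% under its own conditioning.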
S. Noorbaloochi, "Unbiasedness and Bayes Estimators", users.stat.umn.edu/~gmeeden/papers/bayunb.pdf
Are you suggesting that unbiased estimators are necessarily better than biased ones? If so, check out Stein’s phenomenon for a counterexample. It’s common for biased estimators to dominate unbiased ones in terms of error rates. That’s where the bias-variance trade-off in ML comes from.
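Stein’s phenomenon is easy to see by simulation. A sketch (my own; the dimension, true mean, and trial count are arbitrary choices) comparing the unbiased estimator (the raw observation) with positive-part James-Stein shrinkage for a 10-dimensional normal mean:

```python
import random

rng = random.Random(7)
d, trials = 10, 20_000
theta = [1.0] * d                     # arbitrary true mean vector
mle_err = js_err = 0.0
for _ in range(trials):
    x = [rng.gauss(t, 1) for t in theta]      # one observation of N(theta, I)
    s = sum(v * v for v in x)
    shrink = max(0.0, 1 - (d - 2) / s)        # positive-part James-Stein factor
    mle_err += sum((v - t) ** 2 for v, t in zip(x, theta))
    js_err += sum((shrink * v - t) ** 2 for v, t in zip(x, theta))
print(mle_err / trials, js_err / trials)  # the biased estimator has lower risk
```

Shrinking toward zero introduces bias in every coordinate, yet the total squared error drops: variance reduction more than pays for the bias.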
The paper I referenced even includes a theorem saying that, as long as the value to be estimated can not be exactly deduced from the observation, no estimator is both Bayes-optimal and unbiased.
However, with many observations, the Bayes-optimal estimator becomes asymptotically unbiased.
This article kind of helps in establishing that it is a hard question to answer. Clearly harder with intervals.
I can't help but think much of this gets overcomplicated because we don't express everything as intervals. In large part because it is hard, yes; but we should be more comfortable with quantities not being known to an exact value.