JP: There is no way to answer causal questions without snapping out of statistical vocabulary.
AG: We disagree. That’s fine. Science is full of disagreements, and there’s lots of room for progress using different methods.
I'm only an amateur, but from the outside it sure doesn't feel "fine" for the two of you to disagree on what seems like such a fundamental issue. Instead, this seems like a case where two extremely smart individuals should be able to reach a common understanding instead of accepting disagreement as a final outcome.
JP: I have tried to demonstrate it to you in the past several years, but was not able to get you to solve ONE toy problem from beginning to end.
AG: For me and many others, one can indeed answer causal questions within statistical vocabulary.
Pearl obviously disagrees that standard statistical vocabulary is sufficient to answer all simple causal questions. You seem to think he's wrong. I think you'd be doing a great service to encourage him to formulate such a "toy" question that he thinks is unanswerable without resorting to the do-calculus, which you then try to answer to the audience's satisfaction using more standard techniques. Maybe the two of you turn out to be in agreement but using different terminology, maybe you are right that his tools are optional, or maybe he's right that they are essential. Any of these outcomes would feel much more satisfactory and productive than agreeing to disagree. Please consider offering him a platform with which to make his case.
He might claim that any other method can be reduced to do-calculus. I'm not sure. I do believe at the core of his argument is the need for an explicit model.
Here he asks a very simple question; look at the body language from the panel:
This would be very helpful indeed.
I think a lot of the issue comes down to how idiosyncratic Pearl's work is. It will have to accumulate quite a lot of victories before enough people will bother with it.
Until then I suspect Causality will remain something of a statistical Finnegans Wake.
I'll probably read The Book of Why to try and get a better handle on motivation for the technical material.
Well I just reviewed section 3 of page 584 of that pdf that was shared and it seems different, so I probably have a long way to go before I understand this stuff.
(I quote the word paradox, because they aren't really paradoxes once we have an understanding of what's going on.)
Once you've reached the limits of what you can know (after examining all data, worked through all arguments, etc.) this is pretty much the only possible outcome. What works in one situation might not work in others. One person's interpretation of our limited knowledge might very well appear implausible to someone else.
I'll admit to being turned off by Pearl's insistence on working with "toy problems". That might be fine for a philosophical discussion, but it's not of much practical value. I want Pearl to write a few empirical papers attacking important issues, then let's have a discussion about putting ideas into practice.
Consider a parallel with computer programming. A user complains that they fear a program is giving them the wrong answer on a complex real world problem. They report this, and get back the unhelpful answer "Works for me, will not fix". Unable to shake the feeling that the answer is unreliable, they reduce the problem down to a proof of concept that serves as a simple self-contained test case. Two different inputs produce the same answer, but only one of them can be right! But now they are unable to convince the maintainer to even look at the test case, because now the maintainer says "I need to focus on the real world, and don't want to waste my time on toy examples".
It's a discouraging position to find oneself in.
The problem is that, as far as I can tell, Pearl doesn't go beyond making points with toy problems. He hasn't done much empirical work (has he published a single empirical paper?) or even read much of the empirical literature he's criticizing as worthless. Ultimately, the question is whether policy and other decisions will be better using a particular framework. The fact that Pearl writes with the aggressive confidence of a Hacker News commenter does not mean he's right.
And frankly, Judea Pearl is hardly a random internet tinfoil hat guy, nor is he asking people to invest a massive effort checking a long proof (as we e.g. often see with random NP-completeness “proofs”, or as was discussed with the time investment needed to check Mochizuki’s ABC proof). No, he just asks Gelman to apply his own familiar techniques to a toy problem. That does not sound unreasonable at all.
Why? The only thing that matters is the quality of the empirical work. You can get caught up in philosophical debates about the best way to do research. If it has no impact on empirical work, it's useless.
That's not to say Pearl's arguments are wrong or that his work is useless. The problem is the incompleteness of his arguments. You can't arrogantly dismiss empirical work just because it's imperfect. There's no reason a priori to expect that Pearl's approach will lead to better decisions.
There are many self-proclaimed experts who can point out the flaws in programming language designs, but that doesn't mean they can design a better language, and it doesn't mean existing programming languages are useless. Pearl's approach is not some kind of magic pixie dust that suddenly guarantees your empirical work is more trustworthy. It's unfortunate that Pearl thinks it is, and it prevents him from having a reasonable conversation about the topic.
Judea Pearl is attempting to develop (or has developed) and evangelize the approach for others to use it on empirical problems.
So the reasonable response seems to be to encourage anyone (not just JP) to use it for empirical work.
Do we expect every developer of a theory to put it into practice before it is found convincing? Shouldn't the reasoned explanation of a theory be sufficient for someone else to understand and attempt it?
His main point is that "To properly define causal problems, let alone solve them, requires a vocabulary that resides outside the language of probability theory. This means that all the smart and brilliant statisticians who used joint density functions, correlation analysis, contingency tables, ANOVA, Entropy, Risk Ratios, etc., etc., and did not enrich them with either diagrams or counterfactual symbols have been laboring in vain — orthogonally to the question — you can’t answer a question if you have no words to ask it." http://causality.cs.ucla.edu/blog/index.php/2018/06/11/stati...
But, I agree that it would be nice if Pearl used real world problems using his methodology.
Being a long-time fan of Gelman (and having studied his Bayesian Data Analysis textbook), I am baffled and disappointed that he doesn't seem to understand Pearl's ideas. In his linked 2009 post, he wrote: "I’ve never been able to understand Pearl’s notation: notions such as a “collider of an M-structure” remain completely opaque to me." I wonder if, after reading this book accessible even to non-statisticians, he still doesn't understand it.
Likewise (well, other than not really having any "scientist friends"). I loved this book, think Pearl has some amazingly valuable ideas, and found the book relatively accessible even though I'm not a statistician. I won't claim to have understood every detail on the first reading, but I got enough out of it to feel like I'll understand it all after a couple of follow on readings, plus consulting Pearl's other books.
I wish our society had a better understanding of causality—that would raise the level of many important discussions.
We've all become familiar with the refrain 'correlation does not imply causation'. This book attempts to answer: 'what DOES imply causation'? He introduces a framework for how one can answer this question. Not very mathematically rigorous, but following through the framework does appear to be able to discover non-intuitive causative conclusions.
Understanding causation will have important implications for the advancement of A.I. Finding a correlation with the causes hidden in a black box (current state of deep learning) isn't enough for many disciplines. Doctors for example will likely need to know WHY an algorithm made a decision, instead of simply running correlations and telling the operator that a patient has 80% chance of some diagnosis.
One trick in causal discovery is additive noise. If X and Y are noisy correlating variables and X is causing Y, the assumption that the noise in X is present in Y but not vice versa may reveal the direction of the causal arrow.
Causal Discovery with Continuous Additive Noise Models http://jmlr.org/papers/volume15/peters14a/peters14a.pdf
Nonlinear causal discovery with additive noise models
Humans seem to have causal reasoning ability that is very ad hoc. It works well in practice but it's not principled. There is not enough time to do experiments to establish facts, so treating correlation as causation seems to be a good heuristic.
I think that AI will eventually learn to build causal models in the same way: build a quick and dirty causal model with unfounded assumptions and see what works, holding multiple effective but conflicting causal theories that apply in different situations, without any consistent overall model.
Consider a linear model. The true model is Y ~ aX + ϵ, i.e. X causes Y. You want to distinguish this, using observational data, from the case where Y causes X.
If the noise ϵ is Gaussian, there's no way to do this: there are reasonable models going in both directions.
If you assume ϵ is uniformly distributed on some interval instead, then it becomes really obvious which way is the correct way.
The exercise recommends drawing little pictures with error bars to convince yourself of this, which is worth doing.
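A quick numerical sketch of that asymmetry (my own illustration; the variable names and the crude variance-spread heuristic here are invented for demonstration, not taken from the papers cited above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(0.0, 1.0, n)           # the cause
noise = rng.uniform(-0.25, 0.25, n)    # non-Gaussian (uniform) additive noise
y = x + noise                          # true model: X -> Y

def residual_variance_spread(regressor, target, bins=10):
    """Fit a line, then measure how much the residual variance varies
    across quantile bins of the regressor.  If residuals are independent
    of the regressor (the causal direction under an additive-noise
    model), the max/min spread stays close to 1."""
    slope, intercept = np.polyfit(regressor, target, 1)
    resid = target - (slope * regressor + intercept)
    edges = np.quantile(regressor, np.linspace(0.0, 1.0, bins + 1))
    idx = np.clip(np.searchsorted(edges, regressor) - 1, 0, bins - 1)
    variances = np.array([resid[idx == b].var() for b in range(bins)])
    return variances.max() / variances.min()

forward = residual_variance_spread(x, y)   # regress Y on X (true direction)
backward = residual_variance_spread(y, x)  # regress X on Y (wrong direction)
print(forward, backward)
# forward stays near 1; backward is much larger, pointing at X -> Y
```

With Gaussian noise both directions would look equally clean, which is exactly the identifiability failure described above.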
> X and Y are noisy correlating variables
All causation implies temporal separation -- causal event X occurs before caused event Y. The trick is to identify which occurred first AND changed the frequency of the second.
An example is the assertion: "The presence of rain causes people to carry an umbrella". Of course, people carry umbrellas even when it doesn't rain, or don't carry umbrellas when it does rain, but on average, on a day when more people carry umbrellas than usual, it's usually a rainy day. The scientific question is: does carrying umbrellas cause rain? Or does rain cause people to carry umbrellas?
If the natural variation of rain occurs in some detectable manner (e.g. light rain vs heavy rain) and you see direct variation in how people carry umbrellas (less rain, thus fewer umbrellas), then it's more likely that rain causes umbrellas, because rain variation correlates positively with umbrella variation. This is effectively confirmed if, on several days, you see more people carrying umbrellas than usual while it's NOT raining harder: then carrying umbrellas probably does not cause rain. (Maybe umbrellas were being given away for free on those days, or the weather forecast threatened more rain than actually arrived, causing more umbrellas to be carried.)
Thus when rain amount rises or falls (due to natural variation or noise), you should see the amount of umbrella carrying follow accordingly. However if the reverse relationship occurs less often or not at all, this implies that rain does indeed cause umbrellas, and not the reverse.
This strategy of identifying the causal event works only for pairs of positively correlated events whose variations/noise sometimes do not occur together, like an increase in umbrellas without an increase in rain.
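To make the umbrella story concrete, here is a toy simulation (all numbers and the "forecast scare" mechanism are invented): rain drives umbrella counts, and an independent scare occasionally inflates umbrella counts without any rain.

```python
import numpy as np

rng = np.random.default_rng(0)
days = 50_000

rain = rng.gamma(2.0, 1.0, days)        # mm of rain per day (the cause)
scare = rng.random(days) < 0.05         # forecast scares / free-umbrella days
umbrellas = 40 * rain + 200 * scare + rng.normal(0, 10, days)

# Umbrella spikes driven by the scare come with perfectly ordinary rain:
print(abs(rain[scare].mean() - rain.mean()))       # close to zero
# ...while rainy days reliably come with extra umbrellas:
rainy = rain > np.quantile(rain, 0.9)
print(umbrellas[rainy].mean() - umbrellas.mean())  # clearly positive
```

The asymmetry in those two printed numbers is the "umbrella variation without rain variation" signal described above.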
Can barometric pressure rise or fall due to causes other than storms? Can storms arise without being accompanied by a drop in pressure? I'd say maybe yes to the first (an elevation change of the meter, or a storm front that passes you very quickly but whose clouds don't pass directly overhead, maybe). But I'd say a definite no to the second. If you are hit with rain from a storm, your baro pressure will drop. Thus storms cause pressure to drop, but pressure drop does not cause storms.
I was especially interested in the answer to this question, because my only exposure to the language of "causal chains" has been on Twitter, where they seemed to serve a distinctly ideological purpose. One (non-mathematical) person says "I think X is caused by Y", and then a statistician chimes in and says "you're missing other parts of the causal chain, the real causes are Z and Q." Where of course, Z and Q are things that one political perspective prefers to blame, and Y are things blamed by the other side.
For example: https://twitter.com/gztstatistics/status/1000914269188296709. Here's a great comment from today about the difficulty of establishing causality in practice: https://news.ycombinator.com/item?id=18886275
I want to know how causal chains can be actually proven or falsified, to be convinced that this isn't just highbrow ideological woo.
This is addressed in the introduction. See box 4 in the flow-chart (“testable implications”).
“The listening pattern prescribed by the paths of the causal model usually results in observable patterns or dependencies in the data. [...] If the data contradict this implication, then we need to revise our model.”
"These patterns are called "testable implications" because they can be used for testing the model. These are statements like "There is no path connecting D and L," which translates to a statistical statement, "D and L are independent," that is, finding D does not change the likelihood of L."
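For what it's worth, that kind of testable implication is easy to check mechanically. A minimal sketch (the variables D and L and the coefficients are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Model 1 (hypothetical): no path between D and L,
# so the data should show independence.
d1 = rng.normal(size=n)
l1 = rng.normal(size=n)

# Model 2 (hypothetical): D -> L. If our diagram claimed
# "no path connecting D and L", this data would force a revision.
d2 = rng.normal(size=n)
l2 = 0.8 * d2 + rng.normal(size=n)

corr_disconnected = np.corrcoef(d1, l1)[0, 1]
corr_connected = np.corrcoef(d2, l2)[0, 1]
print(corr_disconnected)  # near 0: consistent with "no path"
print(corr_connected)     # clearly nonzero: the implication fails
```

This only tests whether a proposed diagram is contradicted by the data, not which of several compatible diagrams is the true one.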
This says nothing about testing causality, or the direction of causality. If two things are uncorrelated, then there is probably not a causal relationship between them, granted. But this is not a very novel or useful observation.
However if D and L are correlated, the test above says nothing about how to validate whether D caused L, L caused D, both were caused by a third thing (or set of things), or the correlation is just coincidence.
For a book whose entire thesis is "causality is rigorous," I expect a much more rigorous treatment of how to validate causality using more than mere correlation.
Intuitively I might guess that RCTs are the only way of rigorously establishing cause and effect. I would have been very interested if the book had confirmed or denied this intuitive conjecture of mine.
Another comment in this thread claims that you can infer causality without intervention: https://news.ycombinator.com/item?id=18884104 Perhaps this is true?
This is the kind of discussion that I wish the book had focused on. I want to probe at the line between belief and established fact, and understand what we can rigorously say given the evidence we have. I have a strong aversion to reading extended flowery descriptions of big ideas if the speaker has not rigorously shown that the model maps to the real world. Otherwise it's like listening to just-so stories.
The reason I didn't like the book is that I found it insufficiently rigorous to really engage with the "how" of doing causal inference, but excessively mathematical as a theoretical introduction to causality.
"Causal Inference in Statistics: A Primer" (co-written by Pearl) is a very short book that I think does a good job of surfacing some of the same theoretical background while also explaining how to use Pearl's causality. If you exhaust that, I'd recommend moving to the full "Causality" book.
But otherwise I'd recommend actually looking into the counterfactual / potential outcomes view of causality. The set of questions it answers is about 80% overlapping (although both Pearl and POs have their own 20%), but I find the vocabulary a little more intuitive. Canonical books include Morgan and Winship "Counterfactuals and Causal Inference" or Imbens and Rubin "Causal Inference for Statistics, Social, and Biomedical Sciences".
As to the blog post, Pearl is correct that causal inference requires qualitative assumptions about design that the data alone cannot justify. In Pearl's work this is often motivated as qualitative knowledge informing the structure of the DAG before any estimation. But recent advances in causal discovery have actually made it possible to learn the structure of a DAG from data -- happy to provide citations if this is down the rabbit hole. By contrast, I agree with Gelman that Pearl is an irritating writer and that in "The Book of Why" he gives a sloppy intellectual history of causation.
I would be very interested in these references.
Jonas Peters et al. - Elements of Causal Inference is a textbook that covers a little bit of what they called "learning cause-effect models". For algorithms, check SGS (Spirtes-Glymour-Scheines) and PC (Peter Spirtes and Clark Glymour). I believe both these algorithms are implemented in R in the package `pcalg`. There's another R package on BioConductor that implements them too, but I'm far enough afield from biostats I don't remember the name or have any notes I can find.
Some recent cites of note: Peters and Buhlmann - "Identifiability of Gaussian structural equation models" (2014), which led to Ghoshal and Honorio - "Learning linear structural equation models in polynomial time" (2018) who generalize the Peters/Buhlmann claim.
Other authors to Google: Dominik Janzing; Joris Mooij; Patrik Hoyer -- all of these people write papers with the above people, so you should be able to map out the network.
What the pieces all have in common is that they're trying to establish empirical differences in the joint distributions of X and Y between scenarios where X -> Y and where Y -> X. This is only possible in some cases.
Hope this helps.
eg. https://arxiv.org/abs/1111.6925 and practical example at https://github.com/jmschrei/pomegranate/blob/master/tutorial...
“Simpson’s paradox in its various forms is something that generations of researchers and statisticians have been trained to look out for. And we do. There is nothing mysterious about it. (This debate regarding Simpsons, which appeared in The American Statistician in 2014, and which I link in the article, hopefully will be visible to readers who are not ASA members.)”
There is nothing mysterious about Simpson’s paradox but the proper answer is still being debated!
Pearl’s response ends as follows:
“The next step is to let the community explore:
1) How many statisticians can actually answer Simpson’s question, and
2) How to make that number reach 90%.
I believe The Book of Why has already doubled that number, which is some progress. It is in fact something that I was not able to do in the past thirty years through laborious discussions with the leading statisticians of our time.
It is some progress, let’s continue.”
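For readers who haven't seen a concrete reversal, here is Simpson's "paradox" in a few lines, using the oft-quoted kidney-stone counts (treatment A wins within each stone-size group yet loses in the aggregate, because A was given to the harder cases):

```python
# (successes, total) for two treatments, stratified by stone size
data = {
    "small stones": {"A": (81, 87),   "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes_total):
    successes, total = successes_total
    return successes / total

for group, g in data.items():
    print(group, rate(g["A"]) > rate(g["B"]))   # prints True for both groups

# Aggregate over groups: sum successes and totals per treatment.
total = {t: (sum(g[t][0] for g in data.values()),
             sum(g[t][1] for g in data.values()))
         for t in ("A", "B")}
print("overall", rate(total["A"]) > rate(total["B"]))  # prints False
```

The arithmetic is trivial; the debated part is which comparison (stratified or aggregate) answers the causal question, and that depends on the assumed causal structure.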
Donald Rubin has said, surely correctly, that design trumps analysis in causal inference. Pearl's approach seems to be the opposite -- all focus is on the analytical details. For practicing scientists, I think this article provides a much more useful model for causal inference: https://academic.oup.com/ije/article/45/6/1787/2617188 See Textbox 3, where different approaches to studying the relation between smoking and low birthweight are described. The various approaches rely on different assumptions and any one study design may not be convincing by itself, but the way their results converge ("triangulation") is very convincing. AFAIK, none of the studies used DAGs, yet the causal evidence provided is stronger than any DAG could provide.
There are other comments, and the authors’ reply:
That should not be surprising:
"It is difficult to get a man to understand something when his salary depends upon his not understanding it." -- Upton Sinclair
Accepting Pearl would amount to stating that some of the procedures we (statisticians) have been using, championing and sourcing funds for, for half a century, are seriously flawed. That's going to have consequences on future funding. Of course there will be resistance.
Tony Hoare is worth paraphrasing -- some methods are so crisp and small that they are obviously correct. Others are so complex that one cannot find obvious errors. Piling a hierarchy of random variables upon random variables and parameters upon parameters lies firmly in the latter class.
This is actually a charitable analogy, because some uses of statistical methods are incorrect where the error lies in the use itself -- using a tool or a technique to answer a question that it cannot answer. Smothering it with complexity and phrases like 'but real world' or 'but noisy big data' helps to muddy the waters enough to deflect attention from the fundamental difference between conditioning and intervening.
I can be sympathetic to a claim that a method is more effective at solving a complicated problem than a simple one. On the other hand, if it turns out that the body of theory on which the proposed method has been built -- the same method that is presumably correct for the complicated case -- cannot deal with a pedagogic toy scenario correctly, that raises my eyebrows.
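Since the conditioning-vs-intervening distinction is the fundamental point here, a toy simulation may help (the model and all numbers are my own invention): a confounder Z drives both X and Y, so P(Y | X=1) and P(Y | do(X=1)) come apart.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Hypothetical confounded model: Z is a common cause of X and Y.
z = rng.random(n) < 0.5                        # Z ~ Bernoulli(0.5)
x = np.where(rng.random(n) < 0.9, z, ~z)       # X mostly copies Z
y = rng.random(n) < (0.1 + 0.3 * z + 0.2 * x)  # Y listens to Z and X

# Conditioning: P(Y=1 | X=1) mixes X's effect with Z's influence,
# because observing X=1 makes Z=1 much more likely.
p_cond = y[x].mean()

# Intervening: simulate do(X=1) by cutting the Z -> X edge
# and forcing X=1 for everyone, leaving Z as it was.
x_do = np.ones(n, dtype=bool)
y_do = rng.random(n) < (0.1 + 0.3 * z + 0.2 * x_do)
p_do = y_do.mean()

print(p_cond, p_do)  # conditioning overstates the effect of X here
```

Analytically, P(Y=1 | X=1) ≈ 0.57 while P(Y=1 | do(X=1)) ≈ 0.45 in this model; no amount of conditioning on X alone recovers the interventional number.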
Andrew Gelman summarizes his take on it pretty nicely.
Coming from a statistics background: causal inference is a growing thing now, and several government-sponsored research programs have been pushing for it.
Causal inference from a statistics point of view is based on missing data, basically Rubin's stuff. It's pretty dang interesting to me. I'm sure there are many ways of looking at the same thing. Linear regression, for example, you can view as an optimization problem with a cost function, or you can view it statistically using maximum likelihood estimation. Both have their pros and cons; with MLE you get a confidence interval. In my biased opinion, statistics is all about data, and it's a great domain for causal inference.
There's no need to put a field down to make yours look better. But if it's constructive criticism (pros/cons, contrast), I think it makes both fields better. Pearl's attitude is off-putting when you try to read his stuff. We're all human and have varying degrees of ego; if you're going to try to convince us that do-calculus and your methods are better, be objective about it or word things better. If you don't want to convince people, then just be blunt as hell.
Michael Nielsen has a nice post (circa 2012) on the topic at http://www.michaelnielsen.org/ddi/if-correlation-doesnt-impl... with comments at
"About the exposition of causal inference, I have little to say."
That would have been interesting though.
Personally I believe there is no why (no causality at all). Rather we love to think it exists because it reduces our uncertainty. It's too much to accept our whole reality is just a bunch of random coincidences.
All this is to say, it was probably a mistake for the publisher to order the narration, but I am really glad they did.
For example, consider what happens if we try to describe a causal diagram in words
"A points to B, A points to X, B points to Y, and X points to Y. Now, if we apply do(X) to the diagram, we see that Y is no longer a child of..."
or even simple formulas in words:
"P of A given B times P of B is equal to P of B given A times P of A"
For most of us, this sort of deal is hard to "get" and would be much better served if we just looked at a visual diagram or saw the equation.
I personally had to repeat many sections over and over again with a notebook and pencil in hand to truly understand what was being read to me... but if I'm taking notes and creating visuals for myself, then I might as well have just gotten the paper variant of this book lol.
I'd recommend audio for fiction, or non-fiction with an engaging storyline (ex: Bad Blood), which this is not.
Nonetheless, I'd still recommend the book.
I can't tell if this is a typo in the original text, or a typo from the person complaining about a lack of care.