Superforecasting: The Art and Science of Prediction

Philip E. Tetlock, Dan Gardner | Nonfiction | Book | Adult | Published in 2015


Chapters 3-4: Chapter Summaries & Analyses

Chapter 3 Summary: “Keeping Score”

One of the problems with forecasts is the ambiguity of their language. Studies have shown that people differ in their interpretations of words like “significant” and “likely,” and those making the predictions often do not define what they mean by such terms. Many forecasts also lack concrete timescales. While forecasters might assume that a timescale is implied, failing to specify one directly makes it impossible to measure the forecast’s accuracy.

After the Kennedy administration’s disastrous 1961 Bay of Pigs invasion of Cuba, history professor Sherman Kent devised a scale to address such discrepancies. Highly regarded in the intelligence community, Kent had left Yale in 1941 to join the newly created Coordinator of Information agency (a precursor of the Central Intelligence Agency, or CIA). Kent argued that people needed to quantitatively define terms such as “serious possibility,” and in 1964, he published a scale on which 100% was certainty and 0% was impossibility (55). Intermediate values, such as 75% (give or take about 12%), meant probable. While Kent’s scale aimed to reduce ambiguity, people were reluctant to adopt it, preferring familiar words to numbers that, they felt, made them sound like bookies. Tetlock and Gardner argue that such numerical grading is useful; when forecasters are forced to break down their thinking processes, they become better at identifying more minute degrees of uncertainty, “just as artists get better at distinguishing subtler shades of gray” (57).

Still, the authors acknowledge that one downside to using numbers is what they call the “wrong-side-of-maybe fallacy”: If a meteorologist predicts a 70% chance of rain and it does not rain, people judge the forecast to have been wrong, even though the forecast itself allowed a 30% chance of no rain; a single outcome cannot establish whether such a probabilistic forecast was inaccurate. One way to assess accuracy, however, is to accrue many forecasts over time and then tabulate the results. For example, a particular meteorologist’s calibration over a specified period can be charted on a graph with the x-axis showing the forecast probability and the y-axis showing the percentage of times the predicted event occurred. A perfect diagonal would indicate perfect calibration, while points above the diagonal line would indicate underconfidence and points below would indicate overconfidence.
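
The calibration check described above can be sketched as a short, self-contained example. The numbers below are hypothetical and not drawn from the book; the sketch simply bins a set of probability forecasts and compares each bin with how often the predicted event actually occurred.

```python
# Minimal sketch of a calibration check (hypothetical data, not from the book).
# Each forecast is a probability that some event will occur; each outcome is 1
# if the event occurred and 0 if it did not.
forecasts = [0.1, 0.2, 0.3, 0.3, 0.5, 0.6, 0.7, 0.7, 0.8, 0.9]
outcomes  = [0,   0,   1,   0,   1,   0,   1,   1,   1,   1]

# Group forecasts into 10% bins and compare the forecast level in each bin
# (x-axis) with the fraction of events that actually occurred (y-axis).
bins = {}
for p, o in zip(forecasts, outcomes):
    key = round(p, 1)          # 10% bins: 0.1, 0.2, ..., 0.9
    bins.setdefault(key, []).append(o)

for key in sorted(bins):
    observed = sum(bins[key]) / len(bins[key])
    print(f"forecast ~{key:.0%}: event occurred {observed:.0%} of the time")
# Perfect calibration puts every point on the diagonal (forecast == observed);
# points above it suggest underconfidence, points below it overconfidence.
```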

Using a system developed by American statistician and meteorologist Glenn W. Brier in 1950, forecasters can be given a score that measures the distance between what they forecast and what actually happened. Zero is the perfect score, while 0.5 is the 50/50 guesswork of the dart-throwing chimpanzee from Chapter 1. However, it is important to consider Brier scores in the context of the difficulty of the predictions made. A meteorologist scoring 0.2 in a changeable climate is thus more impressive than someone scoring the same in a predictable climate: The changeable-climate meteorologist demonstrates resolution as well as calibration, meaning that they do not merely play it safe in their predictions but instead use their data to make bold, supportable forecasts.
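
As a rough illustration of how Brier’s 1950 formulation yields the figures cited above, the following sketch (our own example, not the authors’) scores a single binary forecast by summing the squared differences between the forecast probabilities and the actual outcome; a 50/50 guess scores 0.5, matching the dart-throwing-chimpanzee baseline.

```python
# Sketch of Brier's original (1950) score for a single binary forecast.
# Probabilities are assigned to both outcomes ("rain", "no rain"), and the
# outcome vector puts 1 on what actually happened and 0 on what did not.
def brier_score(forecast_probs, outcome):
    """Sum of squared differences between forecast and reality (0 = perfect)."""
    return sum((p - o) ** 2 for p, o in zip(forecast_probs, outcome))

# A confident, correct forecast: 90% rain / 10% no rain, and it rained.
print(brier_score([0.9, 0.1], [1, 0]))   # 0.02

# A 50/50 guess scores 0.5 no matter what happens -- the "dart-throwing chimp".
print(brier_score([0.5, 0.5], [1, 0]))   # 0.5

# A confident but wrong forecast is heavily penalized.
print(brier_score([0.9, 0.1], [0, 1]))   # 1.62
```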

For his mid-1980s research program Expert Political Judgment, Tetlock recruited 284 anonymous experts drawn from areas like academia and think tanks to make political predictions about events falling within specific time frames. Tetlock divided the experts into two groups, using Greek warrior-poet Archilochus’s distinction between hedgehogs and foxes: While hedgehogs know one big thing, foxes know lots of little things. The hedgehogs in Tetlock’s sample were big-idea people of diverse ideological bents. They tended to see all evidence through a lens that supported their core belief, even in the face of contradictory evidence. This singular perspective hinders forecasting because it makes hedgehogs inflexible and prone to filtering evidence through their biases. Still, hedgehogs are often sought-after media presences because they can tell simple, provocative stories that captivate audiences.

In contrast, foxes, who know lots of little things and can see the uncertainty in situations, take their time to peruse contradictory data before reaching conclusions. This tendency can make foxes less charismatic media presences. Nevertheless, their deliberation makes foxes better forecasters. One key tool of foxes, both on an individual and a collective level, is aggregation, whereby they pool several judgments together before taking an average. Aggregation is particularly valuable when considering perspectives from many people who know different things that are relevant to the situation at hand. In the political sphere, aggregating data from different types of surveys, such as in the 2012 “poll of polls,” often leads to better electoral predictions.
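
A bare-bones illustration of aggregation, using made-up numbers rather than any data from the book: Averaging several independent probability estimates pools what each forecaster knows, and a weighted variant (with hypothetical weights) gives more influence to forecasters with stronger track records.

```python
# Minimal sketch of aggregating forecasts (hypothetical numbers).
# Five forecasters estimate the probability of the same event.
estimates = [0.55, 0.70, 0.60, 0.80, 0.65]

# Simple aggregation: the unweighted average of all judgments.
simple_average = sum(estimates) / len(estimates)

# A weighted variant: forecasters with stronger track records
# (hypothetical weights) count for more in the pooled estimate.
weights = [1.0, 2.0, 1.0, 3.0, 1.5]
weighted_average = sum(w * p for w, p in zip(weights, estimates)) / sum(weights)

print(f"simple average:   {simple_average:.2f}")    # 0.66
print(f"weighted average: {weighted_average:.2f}")  # 0.70
```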

The authors assert that aggregation, unlike “tip-of-your-nose thinking,” is counterintuitive and must be taught. While stepping outside of oneself and consciously seeking different perspectives can be difficult, foxes are more likely to do so than are hedgehogs. Still, Tetlock and Gardner acknowledge that people are not always consistent when it comes to predictions and decision-making and that people’s approaches can be more foxlike or more hedgehoglike depending on the context. However, the authors assert that people can become conscious of their tendencies and learn to switch gears between different thinking styles.

Chapter 4 Summary: “Superforecasters”

The imperative for good forecasting is nowhere more evident than in the George W. Bush administration’s 2003 mistaken assumption that Iraq possessed weapons of mass destruction (WMDs). This error cost thousands of lives and billions of dollars.

The authors speculate that the US intelligence community (IC), which produced the argument supporting the Bush administration’s belief, was not merely pandering to a warmonger’s desires. Instead, Tetlock and Gardner cite the work of Robert Jervis, author of Why Intelligence Fails (2010), who concluded that the IC’s findings were reasonable but wrong. The intelligence available at the time hinted that Iraqi leader Saddam Hussein was hiding something, and so the IC fell into the bait-and-switch error of replacing a difficult question (did Hussein actually have WMDs?) with a much easier one (was it reasonable to conclude that he had them?). Jervis argued that had the IC reached the same conclusion but been less confident about it, the unnecessary destruction of the Iraq war might have been avoided; a calculated expectation of 60-70% that Hussein possessed WMDs would have been insufficient to convince Congress to support military action.

As part of the fallout from the Iraq war, the Intelligence Advanced Research Projects Activity (IARPA) was created in 2006 with the mission of funding research to help the US intelligence community make better forecasts and decisions. Tetlock became involved, insisting that the IC test its methods in realistic environments, just as the medical community does. He advocated that the IC evaluate those methods by keeping tabs not only on its processes but also on its accuracy over time.

In 2010, IARPA proposed that teams of forecasters trained by Tetlock compete against the US intelligence community in the accuracy stakes. Tetlock and his team recruited 3,200 volunteers, who completed psychometric tests, for what became the Good Judgment Project (GJP). The GJP volunteers beat the IC professionals at prediction by aggregating the wisdom of the crowd and prioritizing the judgments of those who proved most successful at forecasting.

Tetlock allowed his forecasters a six-month window to answer questions, some of which were as difficult as whether Serbia would be officially granted European Union candidacy by the end of 2011. During this time, forecasters could respond to changes in world events by updating their predictions. All of a forecaster’s predictions on a question were then combined into a single Brier score for that question. Superforecaster and California retiree Doug Lorch made 1,000 such predictions over the course of a year. While Doug showed promise at the end of the first year, his predictions improved further when he joined a team of superforecasters in the second year. He exceeded the wisdom of the crowd by 60%, meaning that on his own, he surpassed the performance target that IARPA had set for its multimillion-dollar research program.

It is not just luck that makes people like Doug superforecasters but rather a combination of skill and luck. Financial academic Michael Mauboussin, who studies the interplay of skill and luck, observes that most phenomena showing spectacular initial promise eventually regress closer to the average, a pattern known as “regression to the mean.” Regression to the mean is therefore a useful test for the role of luck in a forecaster’s success. However, Tetlock found that superforecasters’ performance held up “phenomenally well” from one year to the next and that superforecasters like Doug actually increased their lead over other forecasters as time went on (102). Indeed, 70% of superforecasters’ results did not regress to the mean, suggesting that skill rather than luck shaped their performance.
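
The logic of using regression to the mean as a test for luck can be illustrated with a toy simulation; the parameters below are invented for illustration and do not reproduce Tetlock’s data. When results are driven purely by luck, the top performers of one year score about average the next, whereas a skill-driven group largely keeps its edge.

```python
# Toy simulation of regression to the mean as a test for luck (made-up numbers).
import random

random.seed(0)
N = 10_000   # forecasters per group

# Pure luck: each year's "accuracy" is an independent random draw.
luck_year1 = [random.random() for _ in range(N)]
luck_year2 = [random.random() for _ in range(N)]

# Mostly skill: a stable ability plus a little year-to-year noise.
ability = [random.random() for _ in range(N)]
skill_year1 = [a + random.gauss(0, 0.05) for a in ability]
skill_year2 = [a + random.gauss(0, 0.05) for a in ability]

def year2_mean_of_top_year1(year1, year2, top_frac=0.02):
    """Average year-2 score of those who ranked at the top in year 1."""
    ranked = sorted(range(len(year1)), key=lambda i: year1[i], reverse=True)
    top = ranked[: int(len(year1) * top_frac)]
    return sum(year2[i] for i in top) / len(top)

print(year2_mean_of_top_year1(luck_year1, luck_year2))
# ~0.5: the lucky group regresses fully to the average
print(year2_mean_of_top_year1(skill_year1, skill_year2))
# stays near the top: little regression when skill dominates
```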

Chapters 3-4 Analysis

Chapters 3 and 4 make a case for Forecasting and the Crucial Ingredient of Doubt, or the importance of acknowledging the limitations of one’s own insight and predictive abilities. To convey this point, Tetlock and Gardner illustrate the consequences of a lack of doubt (i.e., overconfidence). The authors describe how a group of intelligence experts, who initially seem like the most competent candidates, were outperformed by Tetlock’s team of non-expert volunteers. The intelligence professionals were beaten at their own game by erroneous thinking processes that no amount of specialist knowledge could compensate for. As Tetlock clarifies, the world’s chief intelligence agencies succumbed to the bait and switch when they stopped searching for evidence that Hussein had WMDs and instead became transfixed by the fact that the Iraqi leader was “acting like someone who was hiding something” (83). The authors warn against hindsight bias, the conviction that “I knew it all along,” now that we know there were no WMDs. We must instead appreciate the great degree of uncertainty surrounding Iraq at the time; if we do not, we will draw “false lessons from experience” (83), thereby worsening our judgment over time. Thus, a healthy respect for how little it is possible to know at a given time makes for better forecasters. Had the IC applied this respect for the unknown to the question of Iraq’s potential WMDs, it might still have concluded affirmatively but been less confident in its prediction, giving it a probability of 60-70%. Such a lower, quantified probability could have precipitated a different outcome, as Congress likely would not have applied the label “beyond reasonable doubt” to such a number and thereby would have been more hesitant to embark on a costly and futile war (84).

The authors treat overconfidence as a form of hubris, a pride that leads to downfall. Healthy doubt is antithetical to such hubris, and for forecasters, such doubt goes beyond mere mindset to shape practice. To counter overconfidence and processing errors, superforecasters take a scientific approach and adopt Tetlock’s advice of using numbers instead of verbal terms for their predictions. This instantly makes their forecasts more precise and renders the forecasters more accountable for their decisions. It also facilitates feedback when a forecast goes awry, thereby aiding forecasters with future predictions. The authors emphasize that the volunteer forecasters’ relative anonymity and lack of reputational stakes are crucial: Making predictions outside their areas of expertise forces humility and the kind of beginner’s mind that is essential to tackling problems objectively, with fresh eyes. Doubt is often seen as a limiting force, but for forecasters, it is ultimately an accelerant.

The theme of Hedgehogs and Foxes comes up explicitly, as superforecasters are far more likely to be the fox, who knows many little things, while subject-matter experts like intelligence analysts are more aligned with the hedgehog, who knows one big thing. When it comes to forecasting, the authors argue for a shift in US culture, which prefers the compelling narratives and facile conclusions of hedgehogs to the unwieldy reports of foxes, who carefully weigh the contradictory complexity of reality. Here, they show how appreciating subtlety and nuance is essential to accurate prediction, despite humans’ evolutionary preference for certainty and clear-cut conclusions. Tetlock and Gardner make a case for treating forecasting as problem-solving, where process is prioritized over speed, storytelling, and luck. Nevertheless, while the work of accurate forecasting may seem overly cerebral and dull compared with compelling narratives and predictions, such work is essential in preventing disasters such as unnecessary, destructive wars.
