Understand Probability to Make Smarter Health Choices

In this digital age we have unfettered access to vast quantities of historical statistical data on the health and habits of the humans who came before us. Studies analyze that data in all sorts of ways, helping us understand trends through sophisticated mathematical models grounded in the age-old rules of statistics, probability, and correlation. The practical result is trend-spotting across vast cohorts that can ultimately shape our individual behavior. The link between exercise and longevity, for example, is harder and harder to ignore given the sheer volume of statistically relevant data now available on how long people lived measured against how active they were. Yet much of the science of statistics, probability, and correlation remains frustratingly hard to intuit.

An understanding of conditional probability in particular allows you to make wiser assessments of your own health risk under specific conditions. But it is also a powerful tool for interpreting diagnostic test results and making better treatment choices, a real-world skill we would do well to master since, as illustrated below, even your doctor may not always interpret diagnostic test results correctly.

Your doctor is most likely far better at this than most of us. Still, a deep grasp of conditional probability evades many experts, as Daniel Kahneman has shown throughout his Nobel Prize-winning career as a psychologist and behavioral economist, and certain common errors in thinking can cause real confusion in patients and sometimes compromised treatment advice.

Conditional probability is based on an idea that, though obvious on its surface, can become rather counterintuitive to hold in mind. The rule is simply that the probability that “If A occurs, then B will occur” is very much not the same as the probability that “If B occurs, then A will occur.” Yet people make this mistake, switching A and B in their thinking, all the time. Let's look at one example that illustrates how this seemingly obvious rule can hide itself from our best thinking, leading to sometimes devastatingly inaccurate conclusions.

In 1989, Leonard Mlodinow, author of a book on statistics and probability called The Drunkard's Walk, and a married, non-IV-drug-using, white heterosexual male, took an HIV test and was astonished when it came back positive.

Mlodinow's doctor informed him that the chances were 999 out of 1,000 that the test was accurate and that the author would be dead within a decade. The doctor had derived the 1 in 1,000 chance of being healthy from the following correct statistic: the HIV test produced a positive result when the blood was actually not infected (a “false positive”) in only 1 in 1,000 blood samples.

Although this fact sounds an awful lot like the same message Mlodinow's doctor had passed on, it isn't. The doctor had confused the chances that Mlodinow would test positive if he was not HIV positive with the chances that he would not be HIV positive if he tested positive—a classic mix-up of equating the chances of “If A occurs, then B will occur” with those of “If B occurs, then A will occur.” Kahneman's research suggests over and over that humans are bad at telling the difference.

Define the Sample Space
The first step in determining the true odds that Mlodinow has HIV is to define the sample space. Mlodinow notes that we could include everyone who has ever taken an HIV test, but a more accurate result will come by employing a bit of additional relevant information. In this case, we should start by comparing Mlodinow only to other heterosexual, non-drug abusing white male Americans.

List All Possible Outcomes
We next need to classify the members of the sample space by delineating all possible outcomes. Here there are just four possibilities or subgroups:

1. those who tested positive but are not infected;
2. those who tested positive and are infected;
3. those who tested negative and are not infected; and
4. those who tested negative but are really infected.

How many people are there in each of these subgroups? Suppose we consider a population of 10,000.

Using statistics from the CDC, we can estimate that in 1989 about 1 in every 10,000 non-IV-drug-using, white heterosexual male Americans who got tested was infected with HIV.

Assuming that the false-negative rate (the rate at which infected people nevertheless test negative) is near zero, about 1 person out of every 10,000 will test positive due to an actual infection.

Additionally, since the false-positive rate is 1 in 1,000, among 10,000 tests there will be about 10 people who are not infected with HIV but who will test positive anyway. That makes 11 people who test positive, only 1 of whom is correctly identified. The other 9,989 men in the sample space will test negative.

Prune the Sample Space
Pruning the sample space to include only those who tested positive, we have 10 false positives and 1 true positive. In other words, among this demographic in 1989, only 1 in 11 people who tested positive actually were infected.

Mlodinow's doctor confused the probability that the test was wrong with the chances that Mlodinow was not infected. The chances that the test was wrong were 1 in 1,000; the chances that Mlodinow was not infected were 10 out of 11.
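The arithmetic above can be sketched in a few lines of Python. The figures (1 in 10,000 prevalence, a 1-in-1,000 false-positive rate, and a near-zero false-negative rate) are those used in the text:

```python
# Sketch of the pruning step, assuming the figures from the text.
population = 10_000
true_positives = 1                                 # the 1 infected man, correctly detected
false_positives = (population - true_positives) * (1 / 1_000)  # about 10

# Prune the sample space to those who tested positive:
p_infected_given_positive = true_positives / (true_positives + false_positives)
print(round(p_infected_given_positive, 3))  # about 0.091, i.e. 1 in 11
```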

As we will see shortly, it is important to note that this is not because the statistics used to determine the test's error rates were derived from a different sample space than Mlodinow's (such as homosexual men). The false-positive rate remains 1 out of every 1,000 tests across all sample spaces.

Know the False-Positive Rate
The message here is that it is exceptionally important to know the false-positive rate when evaluating any diagnostic test.

For example, a test that identifies 99% of all malignant tumors sounds very impressive, but we can easily devise a test that identifies 100% of all tumors. All we have to do is report that everyone we examine has a tumor. The key statistic that differentiates this theoretical test from a useful one is that our test would produce a high rate of false positives.
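The worthlessness of such a "test" is easy to demonstrate with a toy sketch, here using a hypothetical cohort of 1 sick and 99 healthy people:

```python
# A "test" that flags everyone as positive identifies 100% of tumors,
# yet is worthless because it also flags every healthy person.
def always_positive(sample):
    return True

sick = [True] * 1        # hypothetical cohort: 1 person with a tumor
healthy = [False] * 99   # 99 people without

sensitivity = sum(always_positive(s) for s in sick) / len(sick)
false_positive_rate = sum(always_positive(h) for h in healthy) / len(healthy)
print(sensitivity, false_positive_rate)  # 1.0 1.0
```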

Disease Prevalence is Personal
Yet knowledge of the false-positive rate is not sufficient to determine a test's usefulness, as the above HIV test example illustrates. You must also know how the false-positive rate compares with the true prevalence of the disease. If the disease is rare, even a low false-positive rate does not mean a positive test implies you have the disease. If a disease is common, a positive result is much more likely to be meaningful. Therefore, it is absolutely critical to know what your risk factor for the disease is before you interpret the true meaning of a diagnostic test.

To vividly understand how disease prevalence affects the implications of a positive test, let's suppose the author had been homosexual and tested positive. Assume that among gay men tested in 1989, the chance of infection was about 1%. That means that in the results of 10,000 tests, there would be not 1 (as before) but 100 true positives to go with the 10 false positives established above (1 out of 1,000). In this case, the chances that a positive test meant the author was infected would have been 10 out of 11. Again, when assessing test results, it is very important to know whether you are in a high-risk group.
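The same calculation, rerun with the assumed 1% prevalence, shows how dramatically the answer flips:

```python
# Same pruning calculation, assuming ~1% prevalence among those tested.
population = 10_000
true_positives = int(population * 0.01)            # 100 infected, detected
false_positives = (population - true_positives) * (1 / 1_000)  # about 10

p_infected_given_positive = true_positives / (true_positives + false_positives)
print(round(p_infected_given_positive, 2))  # about 0.91, i.e. 10 out of 11
```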

Misunderstanding Risk
The idea that the probability that A will occur if B occurs usually differs from the probability that B will occur if A occurs is captured by Bayes's theorem. The under-appreciated relevance of this distinction is now bearing out in breast cancer screenings and their increasingly questionable use in determining a next course of action.
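In symbols, the theorem relates the two conditional probabilities (here P(A | B) denotes the probability of A given that B has occurred):

```latex
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
```

The two quantities people confuse, P(A | B) and P(B | A), coincide only in the special case where P(A) = P(B).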

For example, in studies in Germany and the U.S., researchers asked physicians to estimate the probability that an asymptomatic woman aged 40 to 50 who has a cancer-positive mammogram actually has breast cancer if 7% of mammograms show cancer when there is none.

The doctors were additionally told that the actual incidence of breast cancer was about 0.8% and that the false-negative rate (women who tested negative but really had cancer) was 10%.

Putting all that together, Bayes's methods can be used to determine that a cancer-positive mammogram is due to cancer in only about 9% of the cases.
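That roughly 9% figure can be checked directly with Bayes's theorem, plugging in the three numbers the physicians were given:

```python
prevalence = 0.008          # actual incidence of breast cancer, ~0.8%
false_positive_rate = 0.07  # 7% of cancer-free mammograms read as positive
false_negative_rate = 0.10  # 10% of cancers are missed

p_positive_given_cancer = 1 - false_negative_rate  # 0.9
# Total probability of a positive mammogram:
p_positive = (prevalence * p_positive_given_cancer
              + (1 - prevalence) * false_positive_rate)

# Bayes's theorem: P(cancer | positive mammogram)
p_cancer_given_positive = prevalence * p_positive_given_cancer / p_positive
print(round(p_cancer_given_positive, 2))  # about 0.09, i.e. roughly 9%
```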

In the German group, one-third of the physicians concluded that the probability was about 90%, with the median estimate for the Germans at about 70%.

In the American group, 95 out of 100 physicians estimated the probability to be approximately 75%.

The Drunkard's Walk: How Randomness Rules Our Lives, by Leonard Mlodinow, 2009, Penguin Books Limited, 272 pp.

Thinking, Fast and Slow, by Daniel Kahneman, 2012, Penguin Books Limited, 499 pp.
