A classic since its original publication in 1954, How to Lie with Statistics introduces readers to the most common misconceptions about statistics, as well as to the ways people use statistics to dupe consumers into buying their products. Above all, the book is a call for the public to be skeptical of the information dumped on us by the media and advertisers.
Let’s imagine that you want to figure out how many red beans are in the bean barrels of Bob’s Bean Production Plant. The only way to know for certain is to dump out every single barrel and count the beans individually. But doing that is not only time-consuming; it’s also incredibly expensive.
Luckily, using statistics, there’s an easier way.
To make statistical estimates, you need a sample, i.e., a carefully chosen subset of data used to represent the whole of whatever it is you want to analyze. And since sampling is the basis for drawing conclusions in statistics, creating a good sample is absolutely crucial.
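To make the idea concrete, here is a toy simulation (not from the book) of estimating the red beans in one barrel from a sample. It assumes a barrel can be modelled as a Python list of colour strings, and the function name `estimate_red_count` and the bean counts are hypothetical. The estimate simply takes the share of red beans in a random sample and scales it up to the size of the whole barrel.

```python
import random

def estimate_red_count(barrel, sample_size, seed=None):
    """Estimate how many red beans a barrel holds from a simple random sample.

    `barrel` is a hypothetical list of bean colours standing in for one of Bob's barrels.
    """
    rng = random.Random(seed)
    # random.sample draws without replacement, giving every bean the same chance of selection
    sample = rng.sample(barrel, sample_size)
    red_share = sample.count("red") / sample_size
    # Scale the sample's red proportion up to the whole barrel
    return round(red_share * len(barrel))

# Hypothetical, well-mixed barrel: 3,000 red beans among 10,000 in total
barrel = ["red"] * 3_000 + ["white"] * 7_000
random.Random(0).shuffle(barrel)
print(estimate_red_count(barrel, sample_size=500, seed=1))
```

Counting 500 beans instead of 10,000 should land close to the true figure of 3,000, though never exactly on it every time; how close depends on the sample size and on the sample being genuinely random.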
But for a sample to be “good,” it must possess two qualities: it must be large enough to yield reliable results, and it must be random.
We’ll address sample size a little later and focus first on randomness, because only a purely random sample, one in which every member of the group has an equal chance of being chosen, can be trusted to represent the whole.
For example, if you’re interviewing 25-year-old women about how often they play guitar, you would have to select 25-year-old women at random, regardless of their income, social class or anything else.
But getting a truly random sample is easier said than done. Counting every bean by hand is laborious and expensive, yet finding a practical way to draw a truly random sample can be just as difficult.
Returning to the barrels of Bob’s Beans, a good sample is easy to get if the beans are randomly mixed: you just pull out a handful, and you have your sample. But what if the barrel isn’t mixed, and you take a handful from the top, where only white beans lie?
If you based your estimate on that handful and concluded that the barrel is full of white beans, you’d have fallen victim to sample bias. In the same way, non-random samples can bias the results of an experiment or study.
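The unmixed-barrel trap can be reproduced with a small simulation (an illustration, not the book’s own). It assumes a hypothetical barrel, modelled as a Python list, in which 6,000 white beans sit on top of 4,000 red ones; the helper names `top_handful`, `random_handful` and `red_share` are made up for this sketch.

```python
import random

def top_handful(barrel, size):
    # Biased: scoop only the first `size` beans, i.e. the top layer of an unmixed barrel
    return barrel[:size]

def random_handful(barrel, size, rng):
    # Unbiased: every bean in the barrel has the same chance of ending up in the handful
    return rng.sample(barrel, size)

def red_share(handful):
    # Fraction of the handful that is red
    return handful.count("red") / len(handful)

# Hypothetical unmixed barrel: white beans settled on top, red beans underneath
barrel = ["white"] * 6_000 + ["red"] * 4_000
rng = random.Random(42)

print("top handful:   ", red_share(top_handful(barrel, 200)))
print("random handful:", red_share(random_handful(barrel, 200, rng)))
```

The top handful reports a red share of 0.0, exactly the “barrel full of white beans” mistake, while the random handful should come out near the true share of 0.4. Taking a bigger scoop from the top would not help: the problem is how the beans are chosen, not how many.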