Chance plays a huge part in your life, whether you know it or not. Your particular genetic makeup mutated slightly when you were created, and it did so based on specific laws of probability. Performance in school involves human errors, yours and others', which tend to keep your actual ability level from being reflected precisely in your report card or on those high-stakes tests. Research on careers even suggests that what you do for a living was probably not a result of careful planning and preparation, but more likely due to happenstance. And, of course, chance determines your fate in games of chance and plays a large role in the outcome of sporting events.
Fortunately, an entire set of scientific tools, the various applications of statistics, can be used to solve the problems caused by our fate-influenced system. Inferential statistics, a field of science based entirely on the nature of probability, allows us to understand the way things work, discover relationships among variables, describe a huge population by seeing just a small bit of it, make uncannily accurate predictions, and, yes, even make a little money with a well-placed wager here and there.
This book is a collection of statistical tricks and tools. Statistics Hacks presents useful tools from statistics, of course, but also from the realms of educational and psychological measurement and experimental research design. It provides solutions to a variety of problems in the world of social science, but also in the worlds of business, games, and gambling.
If you are already a top scientist and do statistical calculations in your sleep, you'll enjoy this book and the creative applications it finds for those rusty old tools you know so well. If you just like the scientific approach to life and are entertained by cool ideas and clever solutions to interesting problems, don't worry. Statistics Hacks was written with the nonscientist in mind, too, so if that is you, you've come to the right place.
It's written for the nonstatistician as well, so if this still describes you, you'll feel safe here.
If, on the other hand, you are taking a statistics course or have some interest in the academic nature of the topic, you might find this book a pleasant companion to the textbooks typically required for those sorts of courses. There won't be any contradictions between your textbook and this book, so hearing about real-world applications of statistical tools that seem only theoretical won't hurt your development. It's just that there are some pretty cool things that you can do with statistics that seem more like fun than like work.
Why Statistics Hacks?
The term hacking has a bad reputation in the press. They use it to refer to people who break into systems or wreak havoc, using computers as their weapon. Among people who write code, though, the term hack refers to a "quick-and-dirty" solution to a problem or a clever way to get something done. And the term hacker is taken very much as a compliment, referring to someone as being creative, having the technical chops to get things done. The Hacks series is an attempt to reclaim the word, document the good ways people are hacking, and pass the hacker ethic of creative participation on to the uninitiated.
Seeing how others approach systems and problems is often the quickest way to learn about a new technology. The technologies at the heart of this book are statistics, measurement, and research design. Computer technology has developed hand-in-hand with these technologies, so the use of the term hacks to describe what is done in this book is consistent with almost every perspective on that word. Though there is just a little computer hacking covered in these pages, there is a plethora of clever ways to get things done.
How This Book Is Organized
You can read this book from cover to cover if you like, but each hack stands on its own, so feel free to browse and jump to the different sections that interest you most. If there's a prerequisite you need to know about, a cross-reference will guide you to the right hack.
The earlier hacks are more foundational and probably provide generalized solutions or strategic approaches across a variety of problems to a greater extent than later hacks. On the other hand, later hacks provide much more specific tricks for winning games or just information to help you understand what's going on around you.
The book is divided into several chapters, organized by subject:
Chapter 1, The Basics
Use these hacks as a strong set of foundational tools, the ones you will use most often when you are stat-hacking your way into and out of trouble. Think of these as your basic toolkit: your hammer, saw, and various screwdrivers.
Chapter 2, Discovering Relationships
This chapter covers statistical ways to find, describe, and test relationships among variables. You will be able to make the invisible visible with these hacks.
Chapter 3, Measuring the World
A variety of tips and tricks for measuring the world around you are presented here. You'll learn to ask the right questions, assess accurately, and even increase your own performance on high-stakes tests.
Chapter 4, Beating the Odds
This chapter is for the gambler. Use the odds to your advantage, and make the right decisions in Texas Hold 'Em poker and just about every other game in which probability determines the outcome.
Chapter 5, Playing Games
From TV game show strategy to winning Monopoly to enjoying sports to just having fun, this chapter presents different hacks for getting the most out of your game playing.
Chapter 6, Thinking Smart
This chapter is perhaps the most cerebral of them all. Get your mind right, play mind games, make discoveries, and unlock the mysteries of the world around us using the statistics hacks you'll find here.
The Basics: Hacks 1-10
There's only a small group of tools that statisticians use to explore the world, answer questions, and solve problems. It is the way that statisticians use probability or knowledge of the normal distribution to help them out in different situations that varies. This chapter presents these basic hacks.
Taking known information about a distribution and expressing it as a probability [Hack #1] is an essential trick frequently used by stat-hackers, as is using a tiny bit of sample data to accurately describe all the scores in a larger population [Hack #2]. Knowledge of basic rules for calculating probabilities [Hack #3] is crucial, and you gotta know the logic of significance testing if you want to make statistically-based decisions [Hacks #4 and #8]. Minimizing errors in your guesses [Hack #5] and scores [Hack #6] and interpreting your data [Hack #7] correctly are key strategies that will help you get the most bang for your buck in a variety of situations. And successful stat-hackers have no trouble recognizing what the results of any organized set of observations or experimental manipulation really mean [Hacks #9 and #10]. Learn to use these core tools, and the later hacks will be a breeze to learn and master.
Hack #1: Know the Big Secret
Statisticians know one secret thing that makes them seem smarter than everybody else.
The primary purpose of statistics as a scientific methodology is to make probability statements about samples of scores. Before we jump into that, we need some quick definitions to get us rolling, both to understand this hack and to lay a foundation for other statistics hacks.
Samples are numeric values that you have gathered together and can see in front of you that represent some larger population of scores that you have not gathered together and cannot see in front of you. Because these values are almost always numbers that indicate the presence or level of some characteristic, measurement folks call these values scores. A probability statement is a statement about the likelihood of some event occurring.
Probability is the heart and soul of statistics. A common perception of statisticians, in fact, is that they mainly calculate the exact likelihood that certain events of interest will occur, such as winning the lottery or being struck by lightning. Historically, the person who had the tools to calculate the likely outcome of a dice game was the same person who had the tools to describe a large group of people using only a few summary statistics.
So, traditionally, the teaching of statistics includes at least some time spent on the basic rules of probability: the methods for calculating the chances of various combinations or permutations of possible outcomes.
More common applications of statistics, however, are the use of descriptive statistics to describe a group of scores, or the use of inferential statistics to make guesses about a population of scores using only the information contained in a sample of scores. In social science, the scores usually describe either people or something that is happening to them.
It turns out, then, that researchers and measurers (the people who are most likely to use statistics in the real world) are called upon to do more than calculate the probability of certain combinations and permutations of interest. They are able to apply a wide variety of statistical procedures to answer questions of varying levels of complexity without once needing to compute the odds of throwing a pair of six-sided dice and getting three 7s in a row. Those odds are about .005 (1/216, to be exact), or roughly 1/2 of 1 percent, if you start from scratch. If you have already rolled two 7s, you have a 16.7 percent chance of rolling that third 7.
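If you would like to check those dice figures yourself, here is a minimal sketch in Python. It is only an illustration, not part of any statistical package: the variable names are made up for this example, and it assumes two fair six-sided dice and independent rolls.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Probability that a single roll of the pair totals 7.
sevens = sum(1 for a, b in outcomes if a + b == 7)
p_seven = Fraction(sevens, len(outcomes))

# Rolls are assumed independent, so three 7s in a row is the
# single-roll probability multiplied by itself three times.
p_three_sevens = p_seven ** 3

print(p_seven)                          # 1/6
print(p_three_sevens)                   # 1/216
print(round(float(p_three_sevens), 4))  # 0.0046
```

Once two 7s are already on the table, only the third roll is still uncertain, so the chance of completing the streak is just `p_seven`, or 1 in 6.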
The Big Secret
The key reason that probability is so crucial to what statisticians do is that they like to make probability statements about the scores in real or theoretical distributions.
A distribution of scores is a list of all the different values and, sometimes, how many of each value there are. For example, if you know that a quiz just administered in a class you are taking resulted in a distribution of scores in which 25 percent of the class got 10 points, then I might say, without knowing you or anything about you, that there is a 25 percent chance that you got 10 points. I could also say that there is a 75 percent chance that you did not get 10 points. All I have done is take known information about the distribution of some values and express that information as a statement of probability. This is a trick. It is the secret trick that all statisticians know. In fact, this is mostly all that statisticians ever do!
Statisticians take known information about the distribution of some values and express that information as a statement of probability. This is worth repeating (or, technically, threepeating, as I first said it five sentences ago). Statisticians take known information about the distribution of some values and express that information as a statement of probability.
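The quiz example can be sketched in a few lines of Python. The scores below are invented for illustration (a hypothetical class of 20 in which 5 students earned 10 points), and `probability_of` is a made-up helper name, not a function from any library.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical quiz scores for a class of 20 students.
# Five of them (25 percent) earned 10 points.
scores = [10] * 5 + [9] * 6 + [8] * 4 + [7] * 3 + [6] * 2


def probability_of(score, all_scores):
    """Turn a known distribution into a probability statement:
    the share of the distribution that has the given score."""
    counts = Counter(all_scores)
    return Fraction(counts[score], len(all_scores))


p_ten = probability_of(10, scores)
p_not_ten = 1 - p_ten

print(p_ten, p_not_ten)  # 1/4 3/4
```

Nothing about any particular student goes into the calculation. The only input is the distribution itself, which is the whole point of the trick.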
Heavens to Betsy, we can all do that. How hard could it be? Imagine that there are three marbles in an otherwise empty coffee can.
Further imagine that you know that only one of the marbles is blue. There are three values in the distribution: one blue marble and two marbles of some other color, for a total sample size of three. There is one blue marble out of three marbles. Oh, statistician, what are the chances that, without looking, I will draw the blue marble out first? One out of three. 1/3. About 33 percent.
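For the skeptical, the coffee can is easy to simulate. The sketch below is an illustration under stated assumptions: the two non-blue colors are invented (the scenario says only that they are not blue), each draw is uniformly random, and the seed and number of trials are arbitrary choices.

```python
import random
from fractions import Fraction

# One blue marble and two marbles of other (assumed) colors.
can = ["blue", "red", "green"]

# Exact answer, read straight off the known distribution.
p_blue = Fraction(can.count("blue"), len(can))

# Simulated answer: draw one marble "without looking" many times.
rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 100_000
hits = sum(rng.choice(can) == "blue" for _ in range(trials))
estimate = hits / trials

print(p_blue)    # 1/3
print(estimate)  # should land close to 0.333
```

The simulated share of blue draws settles near the exact 1/3 only because every marble has the same chance of being picked on each draw.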
To be fair, the values and their distributions most commonly used by statisticians are a bit more abstract or complex than those of the marbles in a coffee can scenario, and so much of what statisticians do is not quite that transparent. Applied social science researchers usually produce values that represent the difference between the average scores of several groups of people, for example, or an index of the size of the relationship between two or more sets of scores. The underlying process is the same as that used with the coffee can example, though: reference the known distribution of the value of interest and make a statement of probability about that value.
The key, of course, is how one knows the distribution of all these exotic types of values that might interest a statistician. How can one know the distribution of average differences or the distribution of the size of a relationship between two sets of variables?
Conveniently, past researchers and mathematicians have developed or discovered formulas and theorems and rules of thumb and philosophies and assumptions that provide us with the knowledge of the distributions of these complex values most often sought by researchers. The work has been done for us.
A Smaller, Dirtier Secret
Most of the procedures that statisticians use to take known information about a distribution of scores and express that information as a statement of probability have certain requirements that must be met for the probability statement to be accurate. One of these requirements that almost always must be met is that the values in a sample have been randomly drawn from the distribution.
Notice that in the coffee can example I slipped in that "without looking" business. If some force other than random chance is guiding the sampling process, then the associated probabilities reported are simply wrong and (here's the worst part) we can't possibly know how wrong they are. Much, and maybe most, of the applied psychological and educational research that occurs today uses samples of people that were not randomly drawn from some population of interest.
College students taking an introductory psychology course make up the samples of much psychological research, for example, and students at elementary schools conveniently located near where an educational researcher lives are often chosen for study. This is a problem that social science researchers live with or ignore or worry about, but, nevertheless, it is a limitation of much social science research.
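To see what happens when the "without looking" requirement is broken, here is a rough Python sketch that reuses the coffee can. Everything in it is hypothetical: the marble colors, the seed, and especially the weights, which pretend that a peeking hand is three times as likely to grab the blue marble. In real research the size of that bias is exactly what we do not know.

```python
import random

can = ["blue", "red", "green"]  # colors other than blue are assumed
rng = random.Random(1)          # fixed seed so the run is repeatable
trials = 100_000

# Truly random draws: every marble is equally likely.
random_draws = rng.choices(can, k=trials)

# Biased draws: a made-up preference for blue (weights 3:1:1).
peeking_draws = rng.choices(can, weights=[3, 1, 1], k=trials)

p_random = random_draws.count("blue") / trials
p_peeking = peeking_draws.count("blue") / trials

print(p_random)   # close to 1/3, matching the known distribution
print(p_peeking)  # close to 3/5, far from the 1/3 we would report
```

The simulation can tell us how far off the 1/3 statement is only because we chose the weights ourselves. With a convenience sample of real people, there are no weights to look up.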