Chapter 2 Data Collection and Sampling

Always remember the ultimate goal of inferential statistics:

We want to say something important about the characteristics of a population (parameters) without ever observing the entire population.

Since we typically cannot observe the entire population, the best we can do is draw a subset of the population (i.e., a sample), use it to calculate the sample characteristics (i.e., statistics), and then draw inferences about the population parameters.

We can say something about a population parameter of interest solely by looking at the statistics from a sample because we assume that the sample has the same characteristics as the population. In other words, we treat the sample average as a good guess for the population average, the sample standard deviation as a good guess for the population standard deviation, and so on. This assumption is not mere wishful thinking: there is an entire field of statistics devoted to sample selection.
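To make the idea of a sample statistic as a "good guess" concrete, here is a minimal sketch in Python. It is an illustration rather than a method prescribed by this chapter. The population is simulated (100,000 made-up heights with an assumed mean of 170 cm and standard deviation of 10 cm), and the variable names are hypothetical. In practice the population values would be unknown, which is exactly why we sample.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 simulated heights in cm (made-up values).
population = [rng.gauss(170, 10) for _ in range(100_000)]

# Simple random sample of 1,000 units, drawn without replacement.
sample = rng.sample(population, 1_000)

# Population parameters (normally unknown to us).
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# Sample statistics (what we can actually compute).
sample_mean = statistics.fmean(sample)
sample_sd = statistics.stdev(sample)

print(f"mean: population {pop_mean:.2f}, sample {sample_mean:.2f}")
print(f"sd:   population {pop_sd:.2f}, sample {sample_sd:.2f}")
```

With a randomly drawn sample of this size, the two means and the two standard deviations should land close to each other, although they will not match exactly. A sample chosen in a non-random way carries no such expectation.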

We won’t spend a lot of time on this very important matter, and will instead assume in later chapters that the characteristics of the sample do in fact match those of the population. Nonetheless, this chapter will discuss a few sampling methods so you can rest assured that our crucial assumption of similar sample and population characteristics has a reasonable chance of holding.