2.1 Sampling Distributions

Recall that a sample is a subset of a population selected for analysis.

We usually analyze a sample rather than the entire population because:

selecting a sample is less time-consuming than surveying the entire population
selecting a sample is less costly
the resulting analysis is less cumbersome and more practical
sometimes obtaining the entire population is impossible, so a sample is the best we can do

When we make statements about population parameters based on sample statistics, we are drawing a statistical inference. For this inference to be reasonable, we must assume that the characteristics of the sample (i.e., the sample statistics) are reasonably close to the characteristics of the population (i.e., the population parameters). The problem with this assumption is that, because we typically never observe the entire population, we cannot directly verify that the statistics are reasonably close to the parameters.

This chapter discusses several methods of drawing a sample from a population, along with their pros and cons. The bottom line is that all of these methods aim to make the sample as representative a subset of the population as possible.
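To make "reasonably close" concrete, here is a minimal Python sketch. It assumes a synthetic population that we can fully observe, which is exactly what we cannot do in practice. The population size, the made-up "income" values, and the sample size of 200 are all illustrative assumptions, not figures from this chapter.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 20,000 made-up "incomes" (illustrative only).
population = [random.gauss(50_000, 12_000) for _ in range(20_000)]

# The population parameter. In real work this value is unknown.
population_mean = statistics.mean(population)

# Draw five simple random samples of 200 and compute each sample mean.
sample_means = [
    statistics.mean(random.sample(population, 200))
    for _ in range(5)
]

print(f"population mean: {population_mean:.0f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean: {m:.0f} (error {m - population_mean:+.0f})")
```

Each sample mean lands near the population mean but not exactly on it, and each sample misses by a different amount. With real data we would see only one of these sample means and never the population mean itself.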

Failing to obtain a sample with the same characteristics as the population can fatally flaw a statistical analysis. If the sample statistics are not close to the population parameters, you are potentially over-representing some important aspects of the population and under-representing others. When the sample statistics differ from the population parameters systematically, rather than merely by chance, the statistics are said to be biased. When this bias stems from a faulty sample, it is called sampling bias.
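The effect of a faulty sample can be sketched with a small simulation. The example below is hypothetical: the population of ages is made up, and the "faulty" sampling scheme (only people under 40 can be reached) is an assumed scenario chosen for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 50,000 made-up ages between 18 and 90.
population = [random.randint(18, 90) for _ in range(50_000)]
true_mean = statistics.mean(population)  # normally unknown

# Simple random sample: every member is equally likely to be chosen.
srs = random.sample(population, 500)

# Faulty sample (assumed scenario): only people under 40 can be reached,
# so older members of the population can never be selected.
reachable = [age for age in population if age < 40]
faulty = random.sample(reachable, 500)

print(f"population mean age:  {true_mean:.1f}")
print(f"random sample mean:   {statistics.mean(srs):.1f}")
print(f"faulty sample mean:   {statistics.mean(faulty):.1f}")
```

The random sample misses the population mean only by chance, while the faulty sample misses it by a wide margin in the same direction every time, because older people are under-represented by design. Taking a larger faulty sample would not fix this.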