1.2 The Vocabulary of Statistics

  • A variable is a characteristic of an item or group that one wishes to analyze.

  • Data are the values of the variable that are actually observed and recorded.

  • An operational definition establishes a meaningful use of the variable. Simply put, it specifies how the variable is measured so that the data capture what we actually want to analyze.

  • A population consists of all the items about which you want to draw a conclusion. The difficulty is that a population is the entire universe of observations you are interested in, and in practice it can rarely be observed in full.

    • Sometimes a population is too costly to collect and analyze. For example, you won’t call up every single voter for an election poll.

    • Sometimes a population is impossible to collect because some observations have yet to occur. For example, the population of daily closing values of the S&P 500 index includes every closing value recorded so far as well as every closing value yet to come.

  • A sample is the portion (i.e., subset) of a population selected for analysis. These are our observations in hand.

  • A statistic is a characteristic of a sample. Because we can observe the sample (i.e., our data), we can compute its statistics directly; these are our descriptive statistics.

  • A parameter is a characteristic of a population. Since we cannot observe the population, the best we can do is use inferential statistics to make estimates (or predictions) about its parameters. Although a parameter has a definite value, we would have to be omniscient to know it. Instead, we use sample statistics to construct an educated guess of what that value might be.

Recall the problem of the wholesaler who has a supply of light bulbs. It would be great if we could state the average lifespan of the light bulbs, but that would require timing every bulb until it burns out, which would leave the wholesaler with no working bulbs to sell.

The seven terms stated above translate to our light bulb example as follows:

Term                     | Our light bulb problem
------------------------ | ---------------------------------------------------------------------
Variable                 | The lifespan of a light bulb
Data                     | The recorded times until burnout of the bulbs you actually plugged in
Operational definition   | Lifespan measured in minutes, from plug-in until burnout
Population               | The entire supply of light bulbs (all 100,000 of them)
Sample                   | The bulbs selected from the supply and tested; sometimes called the data sample
Statistic                | The average lifespan of the bulbs in the sample
Parameter                | The average lifespan of all bulbs in the population
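To make the vocabulary concrete, here is a minimal Python sketch of the light bulb problem. Because a real population cannot be observed, the sketch simulates one. The exponential lifespan distribution, the assumed mean of 60,000 minutes, the sample size of 50, and the helper names `simulate_population` and `draw_sample` are all hypothetical choices for illustration, not part of the original problem. Only in a simulation like this can the parameter actually be computed and compared with the statistic.

```python
import random
import statistics

# Hypothetical assumptions (not from the text): lifespans in minutes follow an
# exponential distribution with a mean of 60,000 minutes; we test 50 bulbs.
POPULATION_SIZE = 100_000
ASSUMED_MEAN_MINUTES = 60_000
SAMPLE_SIZE = 50


def simulate_population(size, mean_minutes, seed=0):
    """Generate a hypothetical lifespan (in minutes) for every bulb in the supply."""
    rng = random.Random(seed)
    return [rng.expovariate(1 / mean_minutes) for _ in range(size)]


def draw_sample(population, n, seed=1):
    """Select n bulbs at random, without replacement, to plug in and time."""
    rng = random.Random(seed)
    return rng.sample(population, n)


population = simulate_population(POPULATION_SIZE, ASSUMED_MEAN_MINUTES)
sample = draw_sample(population, SAMPLE_SIZE)

parameter = statistics.fmean(population)  # knowable here only because we simulated it
statistic = statistics.fmean(sample)      # what the wholesaler can actually compute

print(f"Parameter (population mean): {parameter:,.0f} minutes")
print(f"Statistic (sample mean):     {statistic:,.0f} minutes")
```

Changing the `seed` passed to `draw_sample` selects different bulbs and therefore produces a different statistic, while the parameter stays fixed.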

Inferential statistics allow us to estimate a population parameter by using the corresponding sample statistic. We will never know the population parameter with certainty, because the information in the sample is all we have.
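The gap between a statistic and its parameter can be seen by drawing several samples from the same simulated population. This sketch assumes a hypothetical setup (100,000 exponentially distributed lifespans with an assumed mean of 60,000 minutes), and the helper `sample_means` is illustrative. Each sample produces a different statistic while the parameter never changes; in practice we only ever get to see one of those samples.

```python
import random
import statistics

# Hypothetical population: 100,000 simulated lifespans in minutes, drawn from
# an exponential distribution with an assumed mean of 60,000 minutes.
rng = random.Random(42)
population = [rng.expovariate(1 / 60_000) for _ in range(100_000)]
parameter = statistics.fmean(population)  # fixed, but unknowable in practice


def sample_means(pop, n, repetitions, rng):
    """Draw `repetitions` independent random samples of size n; return each sample's mean."""
    return [statistics.fmean(rng.sample(pop, n)) for _ in range(repetitions)]


means = sample_means(population, n=50, repetitions=5, rng=rng)
for i, m in enumerate(means, start=1):
    print(f"Sample {i}: mean = {m:,.0f} minutes (error {m - parameter:+,.0f})")
print(f"Parameter: mean = {parameter:,.0f} minutes")
```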

How do we know whether the sample statistic is a GOOD predictor of the population parameter? The catch is that, since we cannot observe the population, the only thing we can do is try our best to ensure that the characteristics of the sample match those of the population. This is the problem of sample selection, a very important topic that will be addressed soon. First, we discuss the descriptive measures of data.
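The following sketch illustrates why sample selection matters. It assumes, purely for illustration, that the supply comes from two batches with different average lifespans; neither the batches nor their means come from the original problem. A sample drawn at random from the whole supply tends to resemble the population, while a sample taken from only one batch does not, so its statistic is a poor guide to the parameter.

```python
import random
import statistics

# Hypothetical supply of 100,000 bulbs from two batches. Assumed (not from the
# text): batch "A" bulbs average 60,000 minutes, batch "B" bulbs 30,000 minutes.
rng = random.Random(7)
population = (
    [("A", rng.expovariate(1 / 60_000)) for _ in range(80_000)]
    + [("B", rng.expovariate(1 / 30_000)) for _ in range(20_000)]
)
parameter = statistics.fmean(life for _, life in population)

# Random selection: every bulb in the supply has the same chance of being tested.
random_sample = rng.sample(population, 200)

# Convenience selection: only bulbs from batch B (e.g., the most recent delivery).
batch_b = [bulb for bulb in population if bulb[0] == "B"]
biased_sample = rng.sample(batch_b, 200)

random_stat = statistics.fmean(life for _, life in random_sample)
biased_stat = statistics.fmean(life for _, life in biased_sample)

print(f"Parameter (all bulbs):          {parameter:,.0f} minutes")
print(f"Statistic, random sample:       {random_stat:,.0f} minutes")
print(f"Statistic, batch-B-only sample: {biased_stat:,.0f} minutes")
```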