3.3 Coding Basics

Now that your software is ready to go, this section introduces you to how R likes to be talked to. Note that the subsequent chapters are full of commands that you will need to learn when the time comes. In the meantime, here are just a few general pointers.

R is what is known as an interpreted language, meaning that code does not need to be compiled prior to execution: each command runs as soon as you enter it. To see this, try the following command at the prompt in your console (>):

12 + 4
## [1] 16

See? Just a big calculator.
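
The other everyday operators work the same way. The lines below are a minimal sketch of a few more calculations you can type at the prompt; the particular numbers are only for illustration.

```r
12 - 4     # subtraction: 8
12 * 4     # multiplication: 48
12 / 4     # division: 3
12^2       # exponent: 144
sqrt(16)   # square root: 4
```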

3.3.1 Assigning Objects

We create variables and other data objects by assigning them names with the assignment operator (<-). For example, we can repeat the calculation above by first assigning the same numbers to some variables:

BIG <- 12
SMALL <- 4
(TOTAL <- BIG + SMALL)
## [1] 16

Notice that all of these variable names should now be in your global environment (upper-right window). The reason 16 was returned on the console is that we put the last command in parentheses. Wrapping an assignment in parentheses tells R to print the result to the screen as well as store it.
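
As a quick sketch, here are three ways to see a stored value when working at the console. The variable names are the ones used above; nothing else is assumed.

```r
# Assignment alone stores the value but prints nothing.
BIG <- 12
SMALL <- 4
TOTAL <- BIG + SMALL

# Three ways to display the stored value at the console:
TOTAL                    # typing the name by itself
print(TOTAL)             # calling print() explicitly
(TOTAL <- BIG + SMALL)   # wrapping the assignment in parentheses
```

The parentheses are just a shortcut: they save you from typing the name a second time.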

You might be asking why we don’t simply use an equal sign instead of the assign sign. R does accept = for assignment in most situations, but <- is the convention. The answer is that we will be assigning names to output objects that contain much more than a single number. Things like regression output are technically assigned a name, so we are simply being consistent.
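
To make this concrete, here is a hedged sketch that assigns a name to a single number and to a whole regression in exactly the same way. It assumes the built-in mtcars data and the lm() command, neither of which has been introduced yet, so treat it as a preview rather than something you need to understand now.

```r
# A single number and a whole regression are assigned the same way.
BIG <- 12
REG <- lm(mpg ~ wt, data = mtcars)   # mtcars ships with R

class(BIG)   # "numeric"
class(REG)   # "lm"
names(REG)   # the many components stored inside the regression object
coef(REG)    # the estimated intercept and slope
```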

3.3.2 Listing, Adding, and Removing

We can list all objects in our global environment using the list command: ls()

ls()
##   [1] "AL"          "alpha"       "AUTO"        "B1"          "Bhat0"       "Bhat1"       "BIG"        
##   [8] "car"         "CARDATA"     "CARDATA2"    "CARGSP"      "CDdata"      "CGDP"        "CM"         
##  [15] "CREG"        "D"           "DENGSP"      "df"          "DGDP"        "DS"          "DTRND"      
##  [22] "e"           "eps"         "EPS"         "Fcrit"       "fit"         "fitpoints"   "Fstat"      
##  [29] "grid.lines"  "h"           "hprice1"     "i"           "i1"          "i2"          "j"          
##  [36] "k"           "left"        "LEFT"        "LFT"         "Lifetime"    "Lifetime1"   "Lifetime2"  
##  [43] "m"           "M"           "MDAT"        "Mode"        "mtcars"      "mu"          "MULTI2"     
##  [50] "n"           "N"           "P"           "PARK"        "probability" "Pval"        "R"          
##  [57] "R2r"         "R2u"         "Rate"        "REG"         "REG1"        "REG2"        "REG3"       
##  [64] "REG4"        "RES"         "Revenue"     "RHT"         "right"       "RIGHT"       "RREG"       
##  [71] "S"           "SBhat1"      "Sig"         "sigma"       "SMALL"       "t"           "t_values"   
##  [78] "tcrit"       "TOTAL"       "tstat"       "UREG"        "wage1"       "x"           "X"          
##  [85] "x.pred"      "X1"          "X2"          "X3"          "Xbar"        "Xcrap"       "xfit"       
##  [92] "xtick"       "xy"          "Y"           "y.pred"      "Y1"          "Y2"          "Y3"         
##  [99] "yfit"        "Yhat"        "Yz"          "Z"           "z.pred"      "Zcrit"       "Zstat"

As we already showed, we can add new variables by simply assigning names to our calculations.

TOTAL.SQUARED <- TOTAL^2

If you ever want to remove some variables from your global environment, you can use the remove command: rm(name of variable)

rm(TOTAL.SQUARED)
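
The sketch below puts listing, adding, and removing together so you can confirm that an object is really gone. The names HELPER1 and HELPER2 are made up for illustration.

```r
TOTAL <- 16
TOTAL.SQUARED <- TOTAL^2

before <- "TOTAL.SQUARED" %in% ls()   # TRUE: the object is listed
rm(TOTAL.SQUARED)
after <- "TOTAL.SQUARED" %in% ls()    # FALSE: the object is gone

# rm() also accepts several names at once.
HELPER1 <- 1
HELPER2 <- 2
rm(HELPER1, HELPER2)

# To clear everything in one go (use with care), you could run:
# rm(list = ls())
```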

3.3.3 Loading Data

R can handle data in almost any format imaginable. The main data format we will consider in this class is a trusty old MS Excel file. There are two ways to load data…

1. The Direct Way

Once you locate a data file on your computer, you can direct R to import the file and assign it any name you want. The example below imports a dataset of automobile sales called AUTO_SA.xlsx and names it CARDATA.

library(readxl)
CARDATA <- read_excel("data/AUTO_SA.xlsx")

The term “data/AUTO_SA.xlsx” is the location of this data file on my computer: a folder named data inside my working directory. It is recommended that you put all of your data files somewhere easy to access, like a single folder directly on your C drive.
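
If the import fails, the usual culprit is that R is looking in a different folder than you expect. Here is a minimal sketch for checking this before importing. It assumes the file sits in a folder named data inside your working directory, which may not match your setup.

```r
# Where R is currently looking for files (the working directory).
getwd()

# Build the path used above; file.path() inserts the separator for us.
data_path <- file.path("data", "AUTO_SA.xlsx")

# Only attempt the import if the file and the readxl package are both available.
if (file.exists(data_path) && requireNamespace("readxl", quietly = TRUE)) {
  CARDATA <- readxl::read_excel(data_path)
} else {
  message("Could not import ", data_path, " from ", getwd())
}
```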

2. The Indirect (but easy) Way

You can also import data through RStudio's point-and-click interface.

  1. Use the files tab (bottom-right window) and locate the data file you want to import.

  2. Left-click on the file and select Import Dataset…

  3. The import window opens and previews your data.

  4. If everything looks good, hit Import and you're done.

Note that the import window in step 3 has a code preview section that writes out the code needed to import the dataset. It is exactly the code you would write to import the data the direct way, so you can refer to it in the future.

3.3.4 Manipulating Data

You should now have a dataset named CARDATA imported into your global environment. You can examine the names of the variables inside the dataset using the list command, only this time we reference the name of the dataset.

ls(CARDATA)
## [1] "AUTOSALE" "CPI"      "DATE"     "INDEX"    "MONTH"    "YEAR"
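
Notice that ls() reports the variable names in alphabetical order. If you want them in the order they appear in the dataset, names() is an alternative. The sketch below uses a tiny made-up dataset called TOY (not the real CARDATA) so that it runs on its own.

```r
# TOY is a hypothetical three-column dataset used only for illustration.
TOY <- data.frame(
  YEAR     = c(2000, 2001),
  AUTOSALE = c(100, 150),
  CPI      = c(1.00, 1.25)
)

ls(TOY)      # alphabetical:  "AUTOSALE" "CPI" "YEAR"
names(TOY)   # dataset order: "YEAR" "AUTOSALE" "CPI"
```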

When referencing a variable within a dataset, you must reference the names of both the dataset and the variable so R knows where to get it. The syntax is:

Dataset$Variable

For example, we reference the variable AUTOSALE by stating that it is in the CARDATA dataset:

CARDATA$AUTOSALE

We can now manipulate and store variables within the dataset by creating variables for whatever we need. For example, we can create a variable for real auto sales by dividing auto sales by the consumer price index (CPI).

CARDATA$RSALES <- CARDATA$AUTOSALE / CARDATA$CPI
ls(CARDATA)
## [1] "AUTOSALE" "CPI"      "DATE"     "INDEX"    "MONTH"    "RSALES"   "YEAR"
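
If you do not have the data file handy, the same steps can be tried on a small made-up dataset. The sketch below assumes a hypothetical dataset named TOY with invented numbers; only the pattern Dataset$Variable carries over to the real data.

```r
# TOY and its values are invented for illustration only.
TOY <- data.frame(
  AUTOSALE = c(100, 150, 240),
  CPI      = c(1.00, 1.25, 1.50)
)

# Create a new variable inside the dataset, one calculation per row.
TOY$RSALES <- TOY$AUTOSALE / TOY$CPI

TOY$RSALES   # 100 120 160
ls(TOY)      # the new variable now appears in the list
```

Notice that the division is carried out row by row, so each value of AUTOSALE is divided by the CPI from the same row.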

3.3.5 Subsetting Data

Sometimes our dataset will contain more information than we need. Let us narrow down our dataset to see how we can get rid of unwanted data. You should see a little Excel-looking icon next to the name CARDATA up in the global environment window. If you click on it, you should see the following:

Figure 3.4: A Dataset in R

Thinking of the data set as a matrix with 341 rows and 7 columns will help us understand the code needed to select specific portions of this data.

Note that the variable MONTH cycles from 1 to 12 indicating the months of the year. Suppose we only want to analyze the 12th month of each year (i.e., December). We can do this by creating a new dataset that keeps only the rows associated with the 12th month.

CARDATA2 <- CARDATA[CARDATA$MONTH==12,]

What the above code does is treat the dataset CARDATA as a matrix and index it as [rows,columns]. The rows instruction is to only keep rows where the month is 12. The columns instruction is left blank, because we want to keep all columns.
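
The same [rows,columns] pattern can be tried on a small made-up dataset. The sketch below assumes a hypothetical dataset named TOY covering two years of monthly data with invented sales numbers. It also fills in the columns instruction to show how to keep only some of the columns.

```r
# TOY is invented for illustration: 2 years x 12 months = 24 rows.
TOY <- data.frame(
  YEAR     = rep(2000:2001, each = 12),
  MONTH    = rep(1:12, times = 2),
  AUTOSALE = seq(10, 240, by = 10)
)
dim(TOY)   # 24 rows and 3 columns

# Rows: keep December only. Columns: blank, so keep them all.
DEC <- TOY[TOY$MONTH == 12, ]

# Rows: keep December only. Columns: keep just YEAR and AUTOSALE.
DEC_SMALL <- TOY[TOY$MONTH == 12, c("YEAR", "AUTOSALE")]

# Any logical condition works in the rows slot, e.g. the last three months.
Q4 <- TOY[TOY$MONTH >= 10, ]

nrow(DEC)         # 2
dim(DEC_SMALL)    # 2 rows and 2 columns
nrow(Q4)          # 6
```

Do not forget the comma: it is what separates the rows instruction from the columns instruction.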