2 Introduction to R

These notes introduce core R concepts you’ll use throughout the course.

Each section includes short explanations and runnable code.

Tip: You can run code line-by-line in RStudio with Ctrl+Enter (Windows) / Cmd+Enter (Mac).


2.1 Basic R Operations and Concepts

R is both a powerful calculator and a full programming language. You type commands into the Console, and R evaluates them and prints the result.


2.1.1 The Console and Scripts

  • Console: where commands run immediately.
  • Script (.R) / R Markdown (.Rmd): where you write reproducible work you can save and re-run.

In this course, most notes will be written in R Markdown (.Rmd), which combines text, code, and output in one document.


2.1.2 Comments

Use # to write comments. R ignores everything from # to the end of the line.

# This is a comment
2 + 2
## [1] 4

2.1.3 Printing and Output

Typing an object name prints it. You can also use print().

x <- 10
x
## [1] 10
print(x)
## [1] 10

2.2 Arithmetic

R supports standard arithmetic operators:

  • Addition: +
  • Subtraction: -
  • Multiplication: *
  • Division: /
  • Exponentiation: ^
  • Integer division: %/%
  • Remainder (mod): %%

# Basic arithmetic
5 + 3
## [1] 8
10 - 4
## [1] 6
6 * 7
## [1] 42
20 / 5
## [1] 4
# Powers
2^5
## [1] 32
# Integer division and remainder
17 %/% 3
## [1] 5
17 %% 3
## [1] 2

These commands return numeric output immediately in the console.


2.2.1 Order of Operations

R follows standard order of operations. Use parentheses to be explicit.

3 + 2 * 5      # multiplication first
## [1] 13
(3 + 2) * 5    # parentheses first
## [1] 25
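Two precedence rules tend to surprise newcomers, so here is a small sketch of them. The object names are illustrative and not part of the notes. Exponentiation is evaluated before a leading minus sign, and repeated ^ is grouped from right to left.

```r
# Exponentiation is evaluated before the unary minus
neg_square <- -2^2        # same as -(2^2)
neg_square
## [1] -4
paren_square <- (-2)^2    # parentheses apply the minus first
paren_square
## [1] 4
# Repeated ^ groups from right to left: 2^(3^2) = 2^9
tower <- 2^3^2
tower
## [1] 512
```

When in doubt, adding parentheses costs nothing and makes the intent visible.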

2.2.2 Logical Comparisons

Comparison operators return TRUE or FALSE. Equality is tested with == (two equals signs); a single = is assignment.

5 > 3
## [1] TRUE
5 == 3
## [1] FALSE
5 != 3
## [1] TRUE
5 >= 5
## [1] TRUE
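One caveat is worth a short sketch: decimal numbers are stored in binary floating point, so == can return FALSE for values that look equal. The example below uses all.equal(), which compares within a small tolerance, and also shows how comparisons can be combined. The object names and the value of mark are made up for illustration.

```r
# Exact equality can fail for decimal arithmetic
sum_is_equal <- (0.1 + 0.2 == 0.3)
sum_is_equal
## [1] FALSE
# all.equal() allows a small tolerance; isTRUE() turns the result into TRUE/FALSE
sum_is_close <- isTRUE(all.equal(0.1 + 0.2, 0.3))
sum_is_close
## [1] TRUE
# Combine comparisons with & (and), | (or), and ! (not)
mark <- 72
in_range <- mark >= 60 & mark < 80
in_range
## [1] TRUE
```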

2.3 Assignment, Object Names, and Data Types

2.3.1 Assignment

Use <- (the convention in most R code) or = to assign a value to a name.

a <- 5
b = 12
a + b
## [1] 17

2.3.2 Object Names

Good names are readable and informative.

Valid examples

  • height
  • exam_score
  • x1

Invalid examples (R reports an error)

  • spaces (exam score)
  • starting with numbers (1st)
  • reserved words (if, for, TRUE, FALSE)

exam_score <- 92
final_grade <- "A"
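Two further points about names can be illustrated briefly. This is a sketch with made-up object names: R treats upper and lower case as different, and the base function make.names() shows how R itself would repair an invalid name.

```r
# R is case-sensitive: these are two different objects
total <- 10
Total <- 25
total == Total
## [1] FALSE
# make.names() converts arbitrary strings into syntactically valid names
fixed_names <- make.names(c("exam score", "1st", "height"))
fixed_names
## [1] "exam.score" "X1st"       "height"
```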

2.3.3 Common Data Types

2.3.3.1 Numeric (double)

x <- 3.14
typeof(x)
## [1] "double"
class(x)
## [1] "numeric"

2.3.3.2 Integer

y <- 7L
typeof(y)
## [1] "integer"
class(y)
## [1] "integer"

2.3.3.3 Character (strings)

name <- "Rene"
typeof(name)
## [1] "character"
class(name)
## [1] "character"

2.3.3.4 Logical

flag <- TRUE
typeof(flag)
## [1] "logical"
class(flag)
## [1] "logical"
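Values can be converted between these types with the as.*() functions and tested with the is.*() functions. The following is a minimal sketch; the object names are illustrative.

```r
num_from_text <- as.numeric("3.5")   # character -> double
num_from_text
## [1] 3.5
int_from_num <- as.integer(3.9)      # truncates toward zero, does not round
int_from_num
## [1] 3
text_from_num <- as.character(10)    # double -> character
text_from_num
## [1] "10"
# is.*() functions test the type and return TRUE or FALSE
is.numeric(7L)
## [1] TRUE
is.character(10)
## [1] FALSE
```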

2.3.4 Missing Values

R uses NA for missing data.

z <- c(1, NA, 3)
z
## [1]  1 NA  3
is.na(z)
## [1] FALSE  TRUE FALSE

Most calculations that involve NA return NA unless you remove the missing values first.

mean(z)                 # returns NA
## [1] NA
mean(z, na.rm = TRUE)   # removes NA before computing mean
## [1] 2
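Before removing missing values it often helps to count them. The sketch below uses made-up data in a vector called heights. It also shows why is.na() is needed: comparing with == does not detect NA.

```r
heights <- c(160, NA, 172, NA, 181)   # made-up data
n_missing <- sum(is.na(heights))      # TRUE counts as 1, so this counts the NAs
n_missing
## [1] 2
observed <- heights[!is.na(heights)]  # keep only the non-missing values
observed
## [1] 160 172 181
# Comparing with == does not detect NA; the result is itself NA
NA == NA
## [1] NA
```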

2.4 Vectors

Definition 2.1 (Vector in R) A vector is a collection of values of the same type stored in a single object in R.

Vectors are the basic building blocks of R. The values are stored in order, so each one can be retrieved by its position.

2.4.1 Creating Vectors with c()

v <- c(2, 4, 6, 8)
v
## [1] 2 4 6 8
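Because every element of a vector must have the same type, c() silently converts mixed inputs to a common type. A short sketch with illustrative object names:

```r
mixed_chr <- c(1, "a", TRUE)      # one character value makes everything character
mixed_chr
## [1] "1"    "a"    "TRUE"
mixed_num <- c(1.5, TRUE, FALSE)  # logicals become 1 and 0
mixed_num
## [1] 1.5 1.0 0.0
typeof(c(1L, 2.5))                # integer and double combine to double
## [1] "double"
```

If a numeric column unexpectedly prints with quotation marks, a stray character value is a likely cause.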

2.4.2 Vector Length

length(v)
## [1] 4

2.4.3 Vector Arithmetic (Vectorized Operations)

Operations apply element-by-element.

v + 1
## [1] 3 5 7 9
v * 2
## [1]  4  8 12 16
v^2
## [1]  4 16 36 64
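The same rule applies to two vectors: elements are paired by position. When one vector is shorter, R repeats (recycles) it to match the longer one, which is why v + 1 works. A sketch with made-up vectors:

```r
a_vec <- c(1, 2, 3, 4)
b_vec <- c(10, 20, 30, 40)
pair_sum <- a_vec + b_vec         # pairs elements by position
pair_sum
## [1] 11 22 33 44
recycled <- a_vec * c(10, 100)    # the shorter vector is reused: 10, 100, 10, 100
recycled
## [1]  10 200  30 400
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.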

2.4.4 Sequences with : and seq()

1:10
##  [1]  1  2  3  4  5  6  7  8  9 10
seq(from = 0, to = 1, by = 0.2)
## [1] 0.0 0.2 0.4 0.6 0.8 1.0
seq(from = 1, to = 10, length.out = 5)
## [1]  1.00  3.25  5.50  7.75 10.00

2.4.5 Repetition with rep()

rep(5, times = 4)
## [1] 5 5 5 5
rep(c("A", "B"), times = 3)
## [1] "A" "B" "A" "B" "A" "B"
rep(1:3, each = 2)
## [1] 1 1 2 2 3 3

2.4.6 Indexing Vectors

2.4.6.1 By position

v <- c(10, 20, 30, 40, 50)
v[1]      # first element
## [1] 10
v[3]      # third element
## [1] 30
v[c(2, 5)] # second and fifth
## [1] 20 50

2.4.6.2 By negative indexing (remove elements)

v[-1]     # all but first
## [1] 20 30 40 50
v[-c(2, 4)] # all but second and fourth
## [1] 10 30 50

2.4.6.3 By logical indexing

v[v > 25]     # keep values greater than 25
## [1] 30 40 50
v[v == 20]    # values equal to 20
## [1] 20
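It can help to look at the logical vector that does the selecting. The sketch below uses a vector named scores, an illustrative name with the same values as v, and also shows which(), a combined condition, and replacement by index.

```r
scores <- c(10, 20, 30, 40, 50)
scores > 25                         # the logical vector used as the index
## [1] FALSE FALSE  TRUE  TRUE  TRUE
hits <- which(scores > 25)          # positions of the TRUE values
hits
## [1] 3 4 5
middle <- scores[scores > 15 & scores < 45]   # combine conditions with &
middle
## [1] 20 30 40
scores[scores > 40] <- 0            # indexing on the left side replaces values
scores
## [1] 10 20 30 40  0
```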

2.4.7 Named Vectors

Names make code easier to read.

grades <- c(Midterm = 88, Final = 93, Project = 90)
grades
## Midterm   Final Project 
##      88      93      90
grades["Final"]
## Final 
##    93
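Names can also be read, used to select several elements at once, and used to update a value. A minimal sketch; marks is an illustrative copy of the grades vector.

```r
marks <- c(Midterm = 88, Final = 93, Project = 90)
mark_names <- names(marks)          # the names as a character vector
mark_names
## [1] "Midterm" "Final"   "Project"
marks[c("Midterm", "Project")]      # select several elements by name
## Midterm Project 
##      88      90
marks["Final"] <- 95                # update by name
final_value <- unname(marks["Final"])   # drop the name, keep the value
final_value
## [1] 95
```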

2.5 Functions and Expressions

Definition 2.2 (Function) A function is a set of instructions that takes inputs and returns an output.

R has many built-in functions: mean(), sum(), sd(), etc.

x <- c(2, 4, 6, 8, 10)

sum(x)
## [1] 30
mean(x)
## [1] 6
sd(x)
## [1] 3.162278
min(x)
## [1] 2
max(x)
## [1] 10

2.5.1 Function arguments

Arguments control how the function behaves.

round(3.14159, digits = 2)
## [1] 3.14
mean(c(1, NA, 3), na.rm = TRUE)
## [1] 2
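Arguments can be matched by position or by name, and many have default values. The following sketch illustrates this with round(), mean(), and sd(); the object names are illustrative.

```r
by_position <- round(3.14159, 2)    # second argument is matched to digits
by_position
## [1] 3.14
by_default <- round(3.14159)        # digits defaults to 0
by_default
## [1] 3
# Named arguments can be given in any order
reordered <- mean(na.rm = TRUE, x = c(1, NA, 3))
reordered
## [1] 2
# formals() lists the arguments a function accepts
sd_args <- names(formals(sd))
sd_args
## [1] "x"     "na.rm"
```

Naming arguments other than the first usually makes a call easier to read.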

2.5.2 Expressions and Nesting

You can nest function calls: R evaluates the innermost call first and passes its result outward.

x <- c(1, 2, 3, 4, 5)
sqrt(sum(x^2))
## [1] 7.416198
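A nested expression can always be unpacked into named steps, which is useful when checking intermediate values. A sketch of the same calculation, with illustrative object names:

```r
vals <- c(1, 2, 3, 4, 5)
squared <- vals^2           # step 1: 1 4 9 16 25
total_sq <- sum(squared)    # step 2: 55
root <- sqrt(total_sq)      # step 3
root
## [1] 7.416198
identical(root, sqrt(sum(vals^2)))   # same result as the nested form
## [1] TRUE
```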

2.5.3 Creating Your Own Function

Use function() to define reusable code.

# A simple function: compute z-scores
zscore <- function(x) {
  (x - mean(x)) / sd(x)
}

zscore(c(10, 12, 15, 20))
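## [1] -0.9771621 -0.5173211  0.1724404  1.3220429

# A second example: return several summaries as a named vector
quick_summary <- function(x) {
  c(
    n = length(x),                 # counts every element, including NA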
    mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    min = min(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE)
  )
}

quick_summary(c(1, 2, 3, NA, 5))
##        n     mean       sd      min      max 
## 5.000000 2.750000 1.707825 1.000000 5.000000
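Your own functions can take arguments with default values, just like built-in ones. The sketch below is a hypothetical variant of zscore(), named zscore_na here for illustration. It adds an na.rm argument and a basic input check.

```r
zscore_na <- function(x, na.rm = FALSE) {
  stopifnot(is.numeric(x))   # stop early on non-numeric input
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

# With the default na.rm = FALSE, one NA makes every result NA
z_default <- zscore_na(c(10, 12, NA, 20))
z_default
## [1] NA NA NA NA
# With na.rm = TRUE, only the missing element stays NA
z_clean <- zscore_na(c(10, 12, NA, 20), na.rm = TRUE)
is.na(z_clean)
## [1] FALSE FALSE  TRUE FALSE
```

Setting the default to FALSE mirrors mean() and sd(), so missing data is not dropped silently.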

2.6 Getting Help

R has excellent built-in help tools.

2.6.1 Help Pages

Use ? or help().

?mean
help(sd)

2.6.2 Examples in Help Files

Many help pages include examples you can run:

example(mean)
## 
## mean> x <- c(0:10, 50)
## 
## mean> xm <- mean(x)
## 
## mean> c(xm, mean(x, trim = 0.10))
## [1] 8.75 5.50

2.6.3 Searching for a Function

If you don’t know the exact name, use help.search() or ??.

??regression
help.search("histogram")
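When you remember part of a function name, apropos() from the utils package is another option: it returns the names of objects on the search path that match a pattern. A small sketch; the exact list depends on which packages are loaded, so no full output is shown.

```r
# Names starting with "mean" among the currently loaded packages
found <- apropos("^mean")
"mean" %in% found
## [1] TRUE
```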

2.6.4 Inspecting Objects

x <- rnorm(5)   # five random draws; your values will differ
str(x)
##  num [1:5] 0.607 1.119 1.254 -1.22 1.21
class(x)
## [1] "numeric"
typeof(x)
## [1] "double"
ls()        # list objects in your environment
## (output shortened: ls() returned 274 object names here, including "x", "v", and "zscore")
rm(x)       # remove x
ls()
## (output shortened: 273 object names remain; "x" is no longer listed)
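To confirm the effect of rm() on a single object without reading a full listing, exists() can be used. A minimal sketch; demo_obj is a throwaway name chosen for illustration.

```r
demo_obj <- c(1, 2, 3)
before_rm <- exists("demo_obj")   # the object is defined
before_rm
## [1] TRUE
rm(demo_obj)
after_rm <- exists("demo_obj")    # the object is gone
after_rm
## [1] FALSE
```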

2.6.5 Getting Package Help

Some functions come from additional packages, which extend base R. Install a package once with install.packages(), then load it with library() in each new session before using its functions.

# install.packages("ggplot2")  # run once if needed
library(ggplot2)

?ggplot
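If you are unsure whether a package is installed, requireNamespace() can check before loading. The sketch below also shows the package::function form, which calls a function from a named package without library(). The message text and the object name two_sd are illustrative.

```r
# Load ggplot2 only if it is installed; otherwise print a hint
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
} else {
  message("ggplot2 is not installed; run install.packages(\"ggplot2\") first")
}

# package::function calls a function from a specific package
two_sd <- stats::sd(c(2, 4, 6))
two_sd
## [1] 2
```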

2.7 Why R for Statistics

In this course, we will use R to:

  • summarize data
  • visualize patterns
  • perform statistical analysis

In the following sections, we will introduce statistical concepts and use R to implement them.