2 Introduction to R

These notes introduce core R concepts you’ll use throughout the course.

Each section includes short explanations and runnable code.

Tip: In RStudio, Ctrl+Enter (Windows/Linux) or Cmd+Enter (Mac) runs the current line or the selected code.


2.1 Basic R Operations and Concepts

R works both as a powerful calculator and as a full programming language. You type commands into the Console, and R evaluates them.


2.1.1 The Console and Scripts

  • Console: where commands run immediately.
  • Script (.R) / R Markdown (.Rmd): where you write reproducible work you can save and re-run.

In this course, most notes will be written in R Markdown (.Rmd), which combines text, code, and output in one document.


2.1.2 Comments

Use # to write comments. R ignores everything from # to the end of the line, so a comment can also follow code on the same line.

# This is a comment
2 + 2
## [1] 4

2.1.3 Printing and Output

Typing an object name prints it. You can also use print().

x <- 10
x
## [1] 10
print(x)
## [1] 10
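Auto-printing only applies to values at the top level. Inside a for loop, a bare object name is evaluated but nothing is displayed, which is one situation where print() is required. A minimal sketch:

```r
# Inside a for loop, a bare object name is evaluated but not printed
for (i in 1:2) {
  i
}

# print() forces the value to be displayed on every pass
for (i in 1:2) {
  print(i)
}
## [1] 1
## [1] 2
```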

2.2 Arithmetic

R supports standard arithmetic operators:

  • Addition: +
  • Subtraction: -
  • Multiplication: *
  • Division: /
  • Exponentiation: ^
  • Integer division: %/%
  • Remainder (mod): %%
# Basic arithmetic
5 + 3
## [1] 8
10 - 4
## [1] 6
6 * 7
## [1] 42
20 / 5
## [1] 4
# Powers
2^5
## [1] 32
# Integer division and remainder
17 %/% 3
## [1] 5
17 %% 3
## [1] 2

These commands return numeric output immediately in the console.
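The two operators fit together: multiplying the integer quotient back by the divisor and adding the remainder recovers the original number. The sketch below checks this with 17 and 3, and also shows that %/% rounds down rather than toward zero, which matters for negative numbers.

```r
# Quotient times divisor plus remainder gives back the original number
(17 %/% 3) * 3 + 17 %% 3   # 5 * 3 + 2
## [1] 17

# With a negative number, %/% still rounds down (toward minus infinity)
(-17) %/% 3
## [1] -6
(-17) %% 3                 # so the remainder is positive here
## [1] 1
```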


2.2.1 Order of Operations

R follows standard order of operations. Use parentheses to be explicit.

3 + 2 * 5      # multiplication first
## [1] 13
(3 + 2) * 5    # parentheses first
## [1] 25
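Two cases where precedence often surprises newcomers: ^ is applied before a leading minus sign, and a chain of ^ groups from the right. Parentheses remove any doubt.

```r
-2^2      # ^ is applied before the minus sign: -(2^2)
## [1] -4
(-2)^2    # parentheses make the minus part of the base
## [1] 4
2^3^2     # ^ groups from right to left: 2^(3^2) = 2^9
## [1] 512
```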

2.2.2 Logical Comparisons

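These operators compare two values and return a logical result, TRUE or FALSE. Use a double equals sign, ==, to test equality; a single = is used for assignment.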

5 > 3
## [1] TRUE
5 == 3
## [1] FALSE
5 != 3
## [1] TRUE
5 >= 5
## [1] TRUE
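Comparisons also work on whole vectors, element by element. One caution with decimals: values like 0.1 cannot be stored exactly in binary, so == can report FALSE for numbers that look equal. all.equal() compares up to a small tolerance; wrapping it in isTRUE() gives a plain TRUE or FALSE.

```r
c(1, 5, 9) > 4                     # each element is compared with 4
## [1] FALSE  TRUE  TRUE
0.1 + 0.2 == 0.3                   # tiny rounding error makes this FALSE
## [1] FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))  # equal up to a small tolerance
## [1] TRUE
```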

2.3 Assignment, Object Names, and Data Types

2.3.1 Assignment

Use <- (the conventional choice in R code) or =. Either one stores the value on the right under the name on the left.

a <- 5
b = 12
a + b
## [1] 17

2.3.2 Object Names

Good names are readable and informative.

Valid examples

  • height
  • exam_score
  • x1

Avoid

  • spaces (exam score)
  • starting with numbers (1st)
  • reserved words (if, for, TRUE, FALSE)
exam_score <- 92
final_grade <- "A"
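To see how R treats the names listed under Avoid, base R's make.names() converts strings into syntactically valid names. The inputs below are illustrative; the point is that a space, a leading digit, and a reserved word all get adjusted, while exam_score is left alone.

```r
# make.names() turns arbitrary strings into syntactically valid names:
# invalid characters become ".", "X" is prepended when needed,
# and reserved words get a trailing "."
make.names(c("exam score", "1st", "if", "exam_score"))
# returns "exam.score" "X1st" "if." "exam_score"
```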

2.3.3 Common Data Types

2.3.3.1 Numeric (double)

x <- 3.14
typeof(x)
## [1] "double"
class(x)
## [1] "numeric"

2.3.3.2 Integer

y <- 7L
typeof(y)
## [1] "integer"
class(y)
## [1] "integer"

2.3.3.3 Character (strings)

name <- "Rene"
typeof(name)
## [1] "character"
class(name)
## [1] "character"

2.3.3.4 Logical

flag <- TRUE
typeof(flag)
## [1] "logical"
class(flag)
## [1] "logical"
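A common surprise is that a whole number typed without the L suffix is still a double. The as.*() functions convert between types; this short sketch shows a few conversions (note that as.integer() truncates rather than rounds).

```r
typeof(7)            # a number without L is stored as a double
## [1] "double"
typeof(7L)           # the L suffix requests an integer
## [1] "integer"
as.integer(3.9)      # converting to integer drops the decimal part
## [1] 3
as.numeric("2.5")    # character to numeric
## [1] 2.5
as.character(42)     # numeric to character
## [1] "42"
```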

2.3.4 Missing Values

R uses NA for missing data.

z <- c(1, NA, 3)
z
## [1]  1 NA  3
is.na(z)
## [1] FALSE  TRUE FALSE

Most calculations that involve NA return NA. Many summary functions, such as mean(), accept na.rm = TRUE to drop missing values first.

mean(z)                 # returns NA
## [1] NA
mean(z, na.rm = TRUE)   # removes NA before computing mean
## [1] 2
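Because NA means "unknown", comparing anything to NA also gives NA, so z == NA cannot be used to find missing values. is.na() is the correct test, and summing its result counts the missing entries (TRUE counts as 1, FALSE as 0).

```r
z <- c(1, NA, 3)   # same z as above
z == NA            # every comparison with NA is NA
## [1] NA NA NA
sum(is.na(z))      # number of missing values
## [1] 1
```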

2.4 Vectors

Definition 2.1 (Vector in R) A vector is a collection of values of the same type stored in a single object in R.

Vectors are the basic building blocks of R. Even a single value, such as 5 or "Rene", is a vector of length 1.

2.4.1 Creating Vectors with c()

v <- c(2, 4, 6, 8)
v
## [1] 2 4 6 8
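The definition above says a vector holds values of a single type. If you combine different types with c(), R coerces them to one common type rather than raising an error. A short illustration:

```r
c(1, TRUE, FALSE)         # logicals become numbers: TRUE -> 1, FALSE -> 0
## [1] 1 1 0
c(1, "a", TRUE)           # one character value turns everything into character
# returns "1" "a" "TRUE"
typeof(c(1, "a", TRUE))
## [1] "character"
```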

2.4.2 Vector Length

length(v)
## [1] 4

2.4.3 Vector Arithmetic (Vectorized Operations)

Operations apply element-by-element.

v + 1
## [1] 3 5 7 9
v * 2
## [1]  4  8 12 16
v^2
## [1]  4 16 36 64
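Element-by-element arithmetic also works between two vectors. When the lengths differ, R recycles the shorter vector; if the longer length is not a multiple of the shorter one, R still recycles but issues a warning. The values below are illustrative:

```r
v <- c(2, 4, 6, 8)
w <- c(10, 20, 30, 40)
v + w              # first with first, second with second, and so on
## [1] 12 24 36 48
v + c(100, 200)    # the shorter vector is recycled: 100, 200, 100, 200
## [1] 102 204 106 208
```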

2.4.4 Sequences with : and seq()

1:10
##  [1]  1  2  3  4  5  6  7  8  9 10
seq(from = 0, to = 1, by = 0.2)
## [1] 0.0 0.2 0.4 0.6 0.8 1.0
seq(from = 1, to = 10, length.out = 5)
## [1]  1.00  3.25  5.50  7.75 10.00
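Both : and seq() can also produce decreasing sequences; with seq(), give a negative by:

```r
5:1                                  # : can also count down
## [1] 5 4 3 2 1
seq(from = 10, to = 0, by = -2.5)    # a negative step for seq()
## [1] 10.0  7.5  5.0  2.5  0.0
```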

2.4.5 Repetition with rep()

rep(5, times = 4)
## [1] 5 5 5 5
rep(c("A", "B"), times = 3)
## [1] "A" "B" "A" "B" "A" "B"
rep(1:3, each = 2)
## [1] 1 1 2 2 3 3
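rep() has a few more options: times can be a vector giving a separate count for each element, and length.out repeats the pattern until the requested length is reached.

```r
rep(c("A", "B"), times = c(2, 3))   # "A" twice, then "B" three times
## [1] "A" "A" "B" "B" "B"
rep(1:3, length.out = 7)            # repeat the pattern until 7 values exist
## [1] 1 2 3 1 2 3 1
```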

2.4.6 Indexing Vectors

2.4.6.1 By position

v <- c(10, 20, 30, 40, 50)
v[1]      # first element
## [1] 10
v[3]      # third element
## [1] 30
v[c(2,5)] # second and fifth
## [1] 20 50

2.4.6.2 By negative indexing (remove elements)

v[-1]     # all but first
## [1] 20 30 40 50
v[-c(2,4)] # all but second and fourth
## [1] 10 30 50

2.4.6.3 By logical indexing

v[v > 25]     # keep values greater than 25
## [1] 30 40 50
v[v == 20]    # values equal to 20
## [1] 20
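Logical conditions can be combined with & (and), which() returns the positions where a condition holds, and indexing on the left-hand side of <- replaces elements. A brief sketch with the same v as above:

```r
v <- c(10, 20, 30, 40, 50)
v[v > 15 & v < 45]   # values meeting both conditions
## [1] 20 30 40
which(v > 25)        # positions (not values) where the condition is TRUE
## [1] 3 4 5
v[2] <- 99           # replace the second element
v
## [1] 10 99 30 40 50
```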

2.4.7 Named Vectors

Names let you refer to elements by label instead of by position, which makes code easier to read.

grades <- c(Midterm = 88, Final = 93, Project = 90)
grades
## Midterm   Final Project 
##      88      93      90
grades["Final"]
## Final 
##    93
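names() returns the labels, you can select several elements by name at once, and the names do not get in the way of calculations:

```r
grades <- c(Midterm = 88, Final = 93, Project = 90)
names(grades)                    # returns "Midterm" "Final" "Project"
grades[c("Midterm", "Project")]  # two elements by name: Midterm 88, Project 90
mean(grades)                     # (88 + 93 + 90) / 3
## [1] 90.33333
```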

2.5 Functions and Expressions

Definition 2.2 (Function) A function is a set of instructions that takes inputs and returns an output.

R has many built-in functions: mean(), sum(), sd(), etc.

x <- c(2, 4, 6, 8, 10)

sum(x)
## [1] 30
mean(x)
## [1] 6
sd(x)
## [1] 3.162278
min(x)
## [1] 2
max(x)
## [1] 10
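A few other built-in summary functions work the same way. For the same x:

```r
x <- c(2, 4, 6, 8, 10)
median(x)
## [1] 6
var(x)         # sample variance; sd(x) is its square root
## [1] 10
range(x)       # minimum and maximum together
## [1]  2 10
```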

2.5.1 Function arguments

Arguments control how the function behaves.

round(3.14159, digits = 2)
## [1] 3.14
mean(c(1, NA, 3), na.rm = TRUE)
## [1] 2
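Arguments can be matched by position or by name. seq() is used here because its first three arguments are from, to, and by; when every argument is named, the order no longer matters.

```r
seq(1, 10, 3)                    # by position: from, to, by
## [1]  1  4  7 10
seq(by = 3, to = 10, from = 1)   # by name: any order works
## [1]  1  4  7 10
```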

2.5.2 Expressions and Nesting

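You can nest function calls inside one another. R evaluates them from the inside out: here x^2 is computed first, then sum(), then sqrt().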

x <- c(1, 2, 3, 4, 5)
sqrt(sum(x^2))
## [1] 7.416198
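The nested call can be unpacked into intermediate steps, which is often easier to read and debug. R 4.1.0 and later also provide the native pipe |>, which passes the value on its left as the first argument of the function on its right. Both versions below compute the same result.

```r
x <- c(1, 2, 3, 4, 5)

# The same computation, one step at a time
squares <- x^2           # 1 4 9 16 25
total <- sum(squares)    # 55
sqrt(total)
## [1] 7.416198

# With the native pipe (requires R 4.1.0 or later)
x^2 |> sum() |> sqrt()
## [1] 7.416198
```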

2.5.3 Creating Your Own Function

Use function() to define reusable code.

# A simple function: compute z-scores
zscore <- function(x) {
  (x - mean(x)) / sd(x)
}

zscore(c(10, 12, 15, 20))
## [1] -0.9771621 -0.5173211  0.1724404  1.3220429
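
A function can also return several results at once. quick_summary() collects them in a named vector; n counts only the non-missing values, so it agrees with the values used for the mean and sd.

quick_summary <- function(x) {
  c(
    n = sum(!is.na(x)),            # number of non-missing values
    mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    min = min(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE)
  )
}

quick_summary(c(1, 2, 3, NA, 5))
##        n     mean       sd      min      max 
## 4.000000 2.750000 1.707825 1.000000 5.000000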
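Arguments can have default values, which callers may override. As an illustration, zscore_na() below is a hypothetical variant of zscore() whose na.rm argument defaults to FALSE and is passed on to mean() and sd():

```r
# Hypothetical helper: z-scores with an optional na.rm argument
zscore_na <- function(x, na.rm = FALSE) {
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

zscore_na(c(10, 12, NA, 20))                 # default na.rm = FALSE: all results are NA
## [1] NA NA NA NA
zscore_na(c(10, 12, NA, 20), na.rm = TRUE)   # mean and sd use 10, 12, 20; the NA stays NA
```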

2.6 Getting Help

R has excellent built-in help tools.

2.6.1 Help Pages

Use ? or help().

?mean
help(sd)

2.6.2 Examples in Help Files

Many help pages include examples you can run:

example(mean)
## 
## mean> x <- c(0:10, 50)
## 
## mean> xm <- mean(x)
## 
## mean> c(xm, mean(x, trim = 0.10))
## [1] 8.75 5.50

2.6.3 Searching for a Function

If you don’t know the exact name, use help.search() or ??.

??regression
help.search("histogram")
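When you remember part of a function's name, apropos() from base R's utils package lists matching names among the objects currently on the search path. The exact result depends on which packages are loaded, so no output is shown here.

```r
# Names of available objects containing "mean" (case is ignored by default)
apropos("mean")
```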

2.6.4 Inspecting Objects

x <- rnorm(5)   # 5 random draws from a standard normal; your values will differ
str(x)
##  num [1:5] 3.441 -2.343 1.17 -0.499 -0.159
class(x)
## [1] "numeric"
typeof(x)
## [1] "double"
ls()        # list objects in your environment
rm(x)       # remove x
ls()        # x no longer appears in the list
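To check whether one particular object exists, testing membership in ls() is quicker than reading the whole list. The object name temp below is arbitrary:

```r
temp <- 1          # a throwaway object for this check
"temp" %in% ls()
## [1] TRUE
rm(temp)           # remove it again
"temp" %in% ls()
## [1] FALSE
```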

2.6.5 Getting Package Help

Many functions come from add-on packages, which extend base R. Install a package once with install.packages(), then load it with library() in each new R session before using its functions.

# install.packages("ggplot2")  # run once if needed
library(ggplot2)

?ggplot
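Two related tools: pkg::fun calls a function from a package without attaching it with library(), and requireNamespace() returns FALSE instead of raising an error when a package is not installed. ggplot2 is used only as an example package here.

```r
# pkg::fun calls sd() from the stats package explicitly
stats::sd(c(2, 4, 6, 8, 10))
## [1] 3.162278

# Load a package only if it is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
} else {
  message("ggplot2 is not installed; run install.packages(\"ggplot2\") once.")
}
```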

2.7 Why R for Statistics

In this course, we will use R to:

  • summarize data
  • visualize patterns
  • perform statistical analysis

In the following sections, we will introduce statistical concepts and use R to implement them.