1 Stat 5428/6428 Spring 2026
This is the website for STAT 5428/6428. It will contain the relevant course information, as well as the lecture notes and code seen in class.
Find the syllabus here: Syllabus
Find the GitHub page for the course (with the data sets) here: Github
1.1 Weekly Course Outline
1. **Introduction to Statistical Thinking and Data Analysis**: Role of statistics in research; types of data; study design; observational versus experimental studies; introduction to statistical software (R).
2. **Descriptive Statistics and Exploratory Data Analysis (EDA)**: Numerical summaries; measures of center and variability; graphical methods; data visualization; exploratory data analysis workflow.
3. **Probability Concepts and Random Variables**: Basic probability rules; discrete and continuous random variables; common distributions; expectation and variance.
4. **Sampling Distributions and Simulation**: Concept of sampling distributions; Central Limit Theorem; simulation‑based illustration; intuition for confidence levels and statistical power.
5. **Statistical Inference I: Estimation and Confidence Intervals**: Point estimation; confidence intervals for means and proportions; interpretation and limitations; assumptions and robustness.
6. **Statistical Inference II: Hypothesis Testing**: Null and alternative hypotheses; test statistics; p‑values; Type I/II errors; power considerations and sample size intuition.
7. **Inference for Two Populations**: Two‑sample CIs and tests; paired designs; comparing means and variances; parametric vs. nonparametric choices.
8. **Analysis of Variance (ANOVA)**: One‑way ANOVA; model formulation; assumptions; multiple comparisons; practical interpretation and reporting.
9. **Analysis of Covariance (ANCOVA)**: Incorporating quantitative covariates; adjusted means; interpretation; assumption checks and diagnostics.
10. **Simple and Multiple Linear Regression**: Least squares estimation; interpretation of coefficients; model fit; inference for regression parameters.
11. **Regression Diagnostics and Model Assessment**: Residual analysis; outliers and influence; multicollinearity; remedial measures (transformations, variable selection).
12. **Logistic Regression and Categorical Data Analysis**: Binary response models; odds, odds ratios, and log‑odds; goodness‑of‑fit; interpreting software output.
13. **Nonparametric Methods and Alternatives**: Rank‑based tests and other nonparametric procedures; when and why to choose nonparametric approaches.
14. **Experimental Design and Model Selection**: Principles of randomization, blocking, replication; power and sample size concepts; model selection strategies; reproducibility.
15. **Integration, Applications, and Communication of Results**: Case studies with real datasets; end‑to‑end analysis; ethical considerations; transparent reporting; course review.
1.2 Topical Learning Outcomes (Aligned with Bloom’s Taxonomy)
| Topic | Bloom’s Level | Learning Outcome |
|---|---|---|
| Descriptive Statistics | Understand / Apply | Summarize and visualize datasets using appropriate numerical and graphical descriptive measures. |
| Probability Concepts | Understand | Explain fundamental probability concepts and their role in statistical modeling and inference. |
| Sampling Distributions | Analyze | Analyze sampling variability and illustrate key ideas via simulation. |
| Confidence Intervals | Apply / Analyze | Construct and interpret confidence intervals; assess assumptions and practical implications. |
| Hypothesis Testing | Analyze / Evaluate | Evaluate hypotheses using appropriate tests and interpret results in context. |
| Two‑Sample Inference | Apply / Analyze | Compare two populations using parametric and nonparametric procedures. |
| ANOVA / ANCOVA | Analyze | Analyze group differences with ANOVA/ANCOVA, incorporating covariates where appropriate. |
| Linear Regression | Apply / Analyze | Develop and interpret simple and multiple linear regression models. |
| Model Diagnostics | Evaluate | Assess model assumptions; identify influence and multicollinearity; implement remedial actions. |
| Logistic Regression | Analyze | Model binary outcomes; interpret odds ratios and fitted model diagnostics. |
| Nonparametric Methods | Evaluate | Justify and apply nonparametric alternatives when assumptions are violated. |
| Experimental Design | Create / Evaluate | Design and critique basic experiments (randomization, blocking, replication). |
| Statistical Software (R) | Apply / Create | Use R to perform analyses, run simulations, interpret output, and communicate results clearly. |
1.3 Course Overview
Introduction to Statistical Analysis is a graduate‑level course that develops statistical thinking as an integrated process of description, modeling, inference, and revision. The course is structured around the view that statistical analysis is not a linear application of techniques, but an iterative framework in which data are explored, probabilistic assumptions are imposed, conclusions are drawn under those assumptions, and models are reassessed in light of empirical evidence. Emphasis is placed on conceptual understanding, methodological coherence, and interpretation, rather than on formal mathematical derivations.
The central organizing principle of the course is the relationship between data and probability models. Data are treated as realizations from an underlying stochastic process, and statistical methods are developed as tools for learning about that process. Throughout the course, students use statistical software (R) to connect theory with practice and to evaluate the adequacy of models when applied to real data.
The course is organized into four major sections, each corresponding to a distinct stage of statistical reasoning.
1.3.1 Section I: Data — Description and Intrinsic Properties
Purpose: Characterizing data without imposing a probabilistic model
The first section focuses on understanding data in its raw form. Prior to modeling or inference, data must be described, explored, and contextualized. This section emphasizes the intrinsic properties of data that can be observed directly, without reference to stochastic assumptions.
Key objectives include identifying structure, detecting anomalies, and understanding variability as an empirical phenomenon.
Topics in this section include:

- Nature and structure of data (measurement scales, data types)
- Observational versus experimental data
- Study design and sources of variation
- Descriptive statistics (location, dispersion, shape)
- Graphical summaries and data visualization
- Exploratory Data Analysis (EDA)
This section establishes descriptive analysis as a critical and non‑optional stage of any sound statistical investigation.
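To give a flavor of what this stage looks like in practice, the sketch below computes numerical and graphical summaries in base R. It uses the built-in `mtcars` data purely as a stand-in for the course data sets (which are posted on the course GitHub page), so the variables here are illustrative, not part of the course material.

```r
# Minimal EDA sketch in base R, using the built-in mtcars data
# as a stand-in for the course data sets.
data(mtcars)

# Numerical summaries: center, spread, and a five-number summary
summary(mtcars$mpg)        # min, quartiles, median, mean, max
center <- mean(mtcars$mpg) # measure of location
spread <- sd(mtcars$mpg)   # measure of dispersion

# Graphical summaries: one distribution, one bivariate relationship
hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "mpg")
```

Note that nothing here assumes a probability model; every quantity is an empirical property of the observed data.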
1.3.2 Section II: Data Modeling — Probability as a Generative Mechanism
Purpose: Modeling data as realizations of a stochastic process
In the second section, probability is introduced as a modeling language for data generation. Rather than focusing on probability as an abstract mathematical subject, it is framed as a tool for encoding assumptions about how data arise.
Students study theoretical distributions and sampling behavior implied by different probabilistic models, and they examine the consequences of these assumptions.
Topics in this section include:

- Basic probability concepts and axioms
- Random variables and probability distributions
- Discrete and continuous models
- Expectation and variance
- Sampling distributions
- The Central Limit Theorem
- Simulation‑based exploration of theoretical results
This section provides the probabilistic foundation upon which all subsequent inferential procedures are built.
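The simulation-based exploration mentioned above can be sketched in a few lines of R. The example below (an illustrative choice, not a course assignment) draws repeated samples from a skewed Exponential(1) population and shows that the sampling distribution of the mean is approximately normal, with mean near 1 and standard deviation near 1/sqrt(n), as the Central Limit Theorem predicts.

```r
# Simulating the Central Limit Theorem: sample means of a skewed
# (exponential) population look approximately normal for moderate n.
set.seed(1)        # reproducibility
n    <- 30         # sample size per draw
reps <- 10000      # number of simulated samples

# Each replicate: draw n observations from Exp(1), record the mean
xbar <- replicate(reps, mean(rexp(n, rate = 1)))

# Theory for Exp(1): E(Xbar) = 1, SD(Xbar) = 1 / sqrt(n)
mean(xbar)         # should be close to 1
sd(xbar)           # should be close to 1/sqrt(30), about 0.183
hist(xbar, breaks = 50, main = "Sampling distribution of the mean")
```

Re-running with larger `n` makes the histogram visibly more symmetric, which is the empirical content of the theorem.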
1.3.3 Section III: Statistical Inference — Learning from Data Under a Model
Purpose: Drawing conclusions using probabilistic models
The third section develops formal statistical inference as a logical extension of probability modeling. Conditional on an assumed data‑generating mechanism, students derive and apply methods for estimation, hypothesis testing, and uncertainty quantification.
Inference is presented as fundamentally model‑dependent, and interpretation is emphasized over mechanical computation.
Topics in this section include:

- Point estimation and estimators
- Confidence intervals and coverage interpretation
- Hypothesis testing frameworks
- Test statistics and p‑values
- Type I and Type II errors
- Power and sample size considerations
- Inference for one and two populations
- Introduction to ANOVA as a comparative inferential framework
Students learn to answer substantive research questions while explicitly acknowledging the assumptions that justify their conclusions.
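As a small sketch of these ideas in R, the example below runs a one-sample t test and reads off the confidence interval and p-value, then compares two groups with both a parametric and a rank-based test. The data are simulated for illustration (an assumption of this sketch; they are not course data sets).

```r
# One-sample inference: CI and t test in base R on simulated data
set.seed(42)
x <- rnorm(25, mean = 5, sd = 2)   # pretend these are observed data

fit <- t.test(x, mu = 4.5)         # H0: mu = 4.5, two-sided H1
fit$conf.int                       # 95% CI for the population mean
fit$p.value                        # p-value under the assumed model

# Two-sample comparison (Welch's t test is R's default)
y <- rnorm(25, mean = 6, sd = 2)
t.test(x, y)

# Rank-based alternative when normality is doubtful
wilcox.test(x, y)
```

Both conclusions are conditional on the assumed data-generating model, which is exactly the point this section emphasizes.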
1.3.4 Section IV: Data Remodeling — Model Assessment, Revision, and Alternatives
Purpose: Validating conclusions and revisiting assumptions
The final section returns to the modeling stage with a critical perspective. After conducting inference, students evaluate whether the probabilistic assumptions underlying their analyses are empirically reasonable.
Model adequacy is assessed using diagnostics, and alternative approaches are considered when assumptions fail. This section emphasizes that statistical modeling is iterative and subject to revision rather than a one‑time choice.
Topics in this section include:

- Linear and multiple regression models
- Logistic regression and categorical responses
- Residual analysis and goodness‑of‑fit diagnostics
- Influential observations and multicollinearity
- Remedial measures and transformations
- Nonparametric alternatives
- Model selection strategies
- Interpretation under updated or revised assumptions
This section reinforces the idea that statistical conclusions are conditional statements whose validity rests on the suitability of the modeling framework.
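The fit-then-check cycle described above can be sketched in base R as follows. The built-in `mtcars` data again stand in for course data (an illustrative assumption), and the rule-of-thumb cutoff for Cook's distance is one common convention, not the only one.

```r
# Fit a multiple regression model, then interrogate its assumptions
data(mtcars)
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)                     # coefficients, fit, inference

# Residual diagnostics: variance pattern and normality
res <- residuals(fit)
plot(fitted(fit), res)           # look for nonconstant variance
qqnorm(res); qqline(res)         # check approximate normality

# Influence: Cook's distance flags observations that move the fit
cd <- cooks.distance(fit)
which(cd > 4 / nrow(mtcars))     # a common rule-of-thumb cutoff

# Binary response: logistic regression via glm()
gfit <- glm(am ~ wt, data = mtcars, family = binomial)
exp(coef(gfit))                  # coefficients as odds ratios
```

If the diagnostic plots reveal problems, the analysis loops back: transform variables, drop or justify influential points, or switch to a nonparametric alternative, then re-fit and re-check.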
1.4 Course Perspective
Across all four sections, the course emphasizes coherence between data, models, inference, and validation. By the end of the course, students will be able to approach data analysis as a structured yet flexible process—one that integrates empirical evidence, probabilistic reasoning, and critical evaluation in order to draw defensible conclusions from data.