1 Stat 5428/6428 Spring 2026
This is the website for STAT 5428/6428. It will contain the relevant course information, as well as the lecture notes and code seen in class.
Find the syllabus here: Syllabus
Find the GitHub page for the course (with the data sets) here: Github
1.1 Weekly Course Outline
1. **Introduction to Statistical Thinking and Data Analysis**: Role of statistics in research; types of data; study design; observational versus experimental studies; introduction to statistical software (R).
2. **Descriptive Statistics and Exploratory Data Analysis (EDA)**: Numerical summaries; measures of center and variability; graphical methods; data visualization; exploratory data analysis workflow.
3. **Probability Concepts and Random Variables**: Basic probability rules; discrete and continuous random variables; common distributions; expectation and variance.
4. **Sampling Distributions and Simulation**: Concept of sampling distributions; Central Limit Theorem; simulation‑based illustration; intuition for confidence levels and statistical power.
5. **Statistical Inference I: Estimation and Confidence Intervals**: Point estimation; confidence intervals for means and proportions; interpretation and limitations; assumptions and robustness.
6. **Statistical Inference II: Hypothesis Testing**: Null and alternative hypotheses; test statistics; p‑values; Type I/II errors; power considerations and sample size intuition.
7. **Inference for Two Populations**: Two‑sample CIs and tests; paired designs; comparing means and variances; parametric vs. nonparametric choices.
8. **Analysis of Variance (ANOVA)**: One‑way ANOVA; model formulation; assumptions; multiple comparisons; practical interpretation and reporting.
9. **Analysis of Covariance (ANCOVA)**: Incorporating quantitative covariates; adjusted means; interpretation; assumption checks and diagnostics.
10. **Simple and Multiple Linear Regression**: Least squares estimation; interpretation of coefficients; model fit; inference for regression parameters.
11. **Regression Diagnostics and Model Assessment**: Residual analysis; outliers and influence; multicollinearity; remedial measures (transformations, variable selection).
12. **Logistic Regression and Categorical Data Analysis**: Binary response models; odds, odds ratios, and log‑odds; goodness‑of‑fit; interpreting software output.
13. **Nonparametric Methods and Alternatives**: Rank‑based tests and other nonparametric procedures; when and why to choose nonparametric approaches.
14. **Experimental Design and Model Selection**: Principles of randomization, blocking, replication; power and sample size concepts; model selection strategies; reproducibility.
15. **Integration, Applications, and Communication of Results**: Case studies with real datasets; end‑to‑end analysis; ethical considerations; transparent reporting; course review.
1.2 Topical Learning Outcomes (Aligned with Bloom’s Taxonomy)
| Topic | Bloom’s Level | Learning Outcome |
|---|---|---|
| Descriptive Statistics | Understand / Apply | Summarize and visualize datasets using appropriate numerical and graphical descriptive measures. |
| Probability Concepts | Understand | Explain fundamental probability concepts and their role in statistical modeling and inference. |
| Sampling Distributions | Analyze | Analyze sampling variability and illustrate key ideas via simulation. |
| Confidence Intervals | Apply / Analyze | Construct and interpret confidence intervals; assess assumptions and practical implications. |
| Hypothesis Testing | Analyze / Evaluate | Evaluate hypotheses using appropriate tests and interpret results in context. |
| Two‑Sample Inference | Apply / Analyze | Compare two populations using parametric and nonparametric procedures. |
| ANOVA / ANCOVA | Analyze | Analyze group differences with ANOVA/ANCOVA, incorporating covariates where appropriate. |
| Linear Regression | Apply / Analyze | Develop and interpret simple and multiple linear regression models. |
| Model Diagnostics | Evaluate | Assess model assumptions; identify influence and multicollinearity; implement remedial actions. |
| Logistic Regression | Analyze | Model binary outcomes; interpret odds ratios and fitted model diagnostics. |
| Nonparametric Methods | Evaluate | Justify and apply nonparametric alternatives when assumptions are violated. |
| Experimental Design | Create / Evaluate | Design and critique basic experiments (randomization, blocking, replication). |
| Statistical Software (R) | Apply / Create | Use R to perform analyses, run simulations, interpret output, and communicate results clearly. |
1.3 Course Overview
Introduction to Statistical Analysis is a graduate‑level course that develops statistical thinking as an integrated process of description, modeling, inference, and revision. The course is structured around the view that statistical analysis is not a linear application of techniques, but an iterative framework in which data are explored, probabilistic assumptions are imposed, conclusions are drawn under those assumptions, and models are reassessed in light of empirical evidence. Emphasis is placed on conceptual understanding, methodological coherence, and interpretation, rather than on formal mathematical derivations.
The central organizing principle of the course is the relationship between data and probability models. Data are treated as realizations from an underlying stochastic process, and statistical methods are developed as tools for learning about that process. Throughout the course, students use statistical software (R) to connect theory with practice and to evaluate the adequacy of models when applied to real data.
The course is organized into four major sections, each corresponding to a distinct stage of statistical reasoning.
1.3.1 Section I: Data — Description and Intrinsic Properties
Purpose: Characterizing data without imposing a probabilistic model
The first section focuses on understanding data in its raw form. Prior to modeling or inference, data must be described, explored, and contextualized. This section emphasizes the intrinsic properties of data that can be observed directly, without reference to stochastic assumptions.
Key objectives include identifying structure, detecting anomalies, and understanding variability as an empirical phenomenon.
Topics in this section include:

- Nature and structure of data (measurement scales, data types)
- Observational versus experimental data
- Study design and sources of variation
- Descriptive statistics (location, dispersion, shape)
- Graphical summaries and data visualization
- Exploratory Data Analysis (EDA)
This section establishes descriptive analysis as a critical and non‑optional stage of any sound statistical investigation.
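To give a flavor of what this stage looks like in practice, the sketch below computes numerical and graphical summaries in base R. It uses the built-in `mtcars` data purely as a stand-in for the course data sets (which are posted on the course GitHub page), so the variables here are illustrative, not part of the course material.

```r
# Minimal EDA sketch in base R, using the built-in mtcars data
# as a stand-in for the course data sets.
data(mtcars)

# Numerical summaries: center, spread, and a five-number summary
summary(mtcars$mpg)        # min, quartiles, median, mean, max
center <- mean(mtcars$mpg) # measure of location
spread <- sd(mtcars$mpg)   # measure of dispersion

# Graphical summaries: one distribution, one bivariate relationship
hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "mpg")
```

Note that nothing here assumes a probability model; every quantity is an empirical property of the observed data.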
1.3.2 Section II: Data Modeling — Probability as a Generative Mechanism
Purpose: Modeling data as realizations of a stochastic process
In the second section, probability is introduced as a modeling language for data generation. Rather than focusing on probability as an abstract mathematical subject, it is framed as a tool for encoding assumptions about how data arise.
Students study theoretical distributions and sampling behavior implied by different probabilistic models, and they examine the consequences of these assumptions.
Topics in this section include:

- Basic probability concepts and axioms
- Random variables and probability distributions
- Discrete and continuous models
- Expectation and variance
- Sampling distributions
- The Central Limit Theorem
- Simulation‑based exploration of theoretical results
This section provides the probabilistic foundation upon which all subsequent inferential procedures are built.
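The simulation-based exploration mentioned above can be sketched in a few lines of R. The example below (an illustrative choice, not a course assignment) draws repeated samples from a skewed Exponential(1) population and shows that the sampling distribution of the mean is approximately normal, with mean near 1 and standard deviation near 1/sqrt(n), as the Central Limit Theorem predicts.

```r
# Simulating the Central Limit Theorem: sample means of a skewed
# (exponential) population look approximately normal for moderate n.
set.seed(1)        # reproducibility
n    <- 30         # sample size per draw
reps <- 10000      # number of simulated samples

# Each replicate: draw n observations from Exp(1), record the mean
xbar <- replicate(reps, mean(rexp(n, rate = 1)))

# Theory for Exp(1): E(Xbar) = 1, SD(Xbar) = 1 / sqrt(n)
mean(xbar)         # should be close to 1
sd(xbar)           # should be close to 1/sqrt(30), about 0.183
hist(xbar, breaks = 50, main = "Sampling distribution of the mean")
```

Re-running with larger `n` makes the histogram visibly more symmetric, which is the empirical content of the theorem.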
1.3.3 Section III: Statistical Inference — Learning from Data Under a Model
Purpose: Drawing conclusions using probabilistic models
The third section develops formal statistical inference as a logical extension of probability modeling. Conditional on an assumed data‑generating mechanism, students derive and apply methods for estimation, hypothesis testing, and uncertainty quantification.
Inference is presented as fundamentally model‑dependent, and interpretation is emphasized over mechanical computation.
Topics in this section include:

- Point estimation and estimators
- Confidence intervals and coverage interpretation
- Hypothesis testing frameworks
- Test statistics and p‑values
- Type I and Type II errors
- Power and sample size considerations
- Inference for one and two populations
- Introduction to ANOVA as a comparative inferential framework
Students learn to answer substantive research questions while explicitly acknowledging the assumptions that justify their conclusions.
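As a small sketch of these ideas in R, the example below runs a one-sample t test and reads off the confidence interval and p-value, then compares two groups with both a parametric and a rank-based test. The data are simulated for illustration (an assumption of this sketch; they are not course data sets).

```r
# One-sample inference: CI and t test in base R on simulated data
set.seed(42)
x <- rnorm(25, mean = 5, sd = 2)   # pretend these are observed data

fit <- t.test(x, mu = 4.5)         # H0: mu = 4.5, two-sided H1
fit$conf.int                       # 95% CI for the population mean
fit$p.value                        # p-value under the assumed model

# Two-sample comparison (Welch's t test is R's default)
y <- rnorm(25, mean = 6, sd = 2)
t.test(x, y)

# Rank-based alternative when normality is doubtful
wilcox.test(x, y)
```

Both conclusions are conditional on the assumed data-generating model, which is exactly the point this section emphasizes.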
1.3.4 Section IV: Data Remodeling — Model Assessment, Revision, and Alternatives
Purpose: Validating conclusions and revisiting assumptions
The final section returns to the modeling stage with a critical perspective. After conducting inference, students evaluate whether the probabilistic assumptions underlying their analyses are empirically reasonable.
Model adequacy is assessed using diagnostics, and alternative approaches are considered when assumptions fail. This section emphasizes that statistical modeling is iterative and subject to revision rather than a one‑time choice.
Topics in this section include:

- Linear and multiple regression models
- Logistic regression and categorical responses
- Residual analysis and goodness‑of‑fit diagnostics
- Influential observations and multicollinearity
- Remedial measures and transformations
- Nonparametric alternatives
- Model selection strategies
- Interpretation under updated or revised assumptions
This section reinforces the idea that statistical conclusions are conditional statements whose validity rests on the suitability of the modeling framework.
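The fit-then-check cycle described above can be sketched in base R as follows. The built-in `mtcars` data again stand in for course data (an illustrative assumption), and the rule-of-thumb cutoff for Cook's distance is one common convention, not the only one.

```r
# Fit a multiple regression model, then interrogate its assumptions
data(mtcars)
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)                     # coefficients, fit, inference

# Residual diagnostics: variance pattern and normality
res <- residuals(fit)
plot(fitted(fit), res)           # look for nonconstant variance
qqnorm(res); qqline(res)         # check approximate normality

# Influence: Cook's distance flags observations that move the fit
cd <- cooks.distance(fit)
which(cd > 4 / nrow(mtcars))     # a common rule-of-thumb cutoff

# Binary response: logistic regression via glm()
gfit <- glm(am ~ wt, data = mtcars, family = binomial)
exp(coef(gfit))                  # coefficients as odds ratios
```

If the diagnostic plots reveal problems, the analysis loops back: transform variables, drop or justify influential points, or switch to a nonparametric alternative, then re-fit and re-check.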
1.4 Course Perspective
Across all four sections, the course emphasizes coherence between data, models, inference, and validation. By the end of the course, students will be able to approach data analysis as a structured yet flexible process—one that integrates empirical evidence, probabilistic reasoning, and critical evaluation in order to draw defensible conclusions from data.