# General Course Descriptions

**6301. Introduction to Statistical Computing:** This course is designed for students who seek to develop skills in statistical computing using the R programming language. STATA for statistical analysis will be introduced briefly. Students will learn to use R for data manipulation, report generation, data presentation, and data tabulation and summarization. Topics will include organization and documentation of data, input and export of data sets, methods of cleaning data, tabulation and graphing of data, programming capabilities, and an introduction to simulations and bootstrapping. Students will also be introduced to LaTeX, Markdown and knitr for report writing. Fall. [2] Beck and Greevy

**6306. Introduction to Study Design:** This course will introduce principles of study design in medical and health statistics. The designs considered will be case series, ecologic studies, matched and unmatched case-control studies, observational cohort studies, historically controlled clinical trials, screening trials and randomized clinical trials. The goal is to introduce critical design challenges that ultimately impact the ability to make statistical inferences from observed samples to the target populations. Concepts such as internal and external validity, bias identification and control, confounding and effect modification will be discussed and illustrated with examples from the medical literature. The dependence of traditional univariate measures of statistical association (absolute risk, relative risk and odds ratios) on critical design elements will be highlighted. Statistical evaluation of diagnostic tests will also be introduced along with a brief introduction to causal inference. Permission of instructor required. Prerequisite: Access to STATA statistical software. Fall. [3] Dupont

**6311 & 6311L. Principles of Modern Biostatistics:** Principles of Modern Biostatistics is a foundational first course in graduate level statistics designed to develop a richer understanding of one- and two-sample statistical methods and statistical philosophies. It explores the operational characteristics of frequently used statistical methods. Through simulation studies conducted in R and STATA, students will explore questions such as: What are the true coverage rates of commonly used confidence interval methods for proportions? What is the impact of sampling from various non-normal distributions on the true Type I Error rate for hypothesis testing methods? How do various testing methods compare in terms of power in a variety of settings? How do traditional hypothesis testing methods compare and contrast with methods in the Bayesian and Likelihoodist paradigms? This course is intended for graduate students in programs for biostatistics, biomedical informatics, and epidemiology, and for students in other programs who have a strong undergraduate-level background in statistics. Lab required [1]. Prerequisite: Calculus I. Fall. [3]. Vandekar.

**6312 & 6312L. Modern Regression Analysis:** This is the second in a two-course series designed for students who seek to develop skills in modern biostatistical reasoning and data analysis. Students learn modern regression analysis and model-building techniques from an applied perspective. Theoretical principles will be demonstrated with real-world examples from biomedical studies. This course requires substantial statistical computing in software packages STATA and/or R. The course covers regression modeling for continuous outcomes, including simple linear regression, multiple linear regression, and analysis of variance with one-way, two-way, multi-way, and analysis of covariance models. Data types to be modeled include continuous outcomes (classic regression models), binary outcomes (logistic models), ordinal outcomes (proportional odds models), count outcomes (Poisson/negative binomial models), and time-to-event outcomes (Kaplan-Meier curves, Cox proportional hazards modeling). Incorporated into the presentation of these models are subtopics such as regression diagnostics, nonparametric regression, splines, data reduction techniques, model validation, parametric bootstrapping, and methods for handling missing data. Lab required [1]. Prerequisite: BIOS 6311 or equivalent. Spring. [3]. Johnson.

**6321. Clinical Trials and Experimental Design:** This course covers the statistical aspects of study design, monitoring, and analysis. Emphasis is on studies of human subjects, i.e., clinical trials. Topics include: principles of measurement, selection of endpoints, bias, masking, randomization and balance, blocking, study designs, sample size projections, interim monitoring of accumulating results, flexible and adaptive designs, sequential analysis, analysis principles, data and safety monitoring boards (DSMB), and the ethics of animal and human subject experimentation. Spring [3] (Yu)

**6341 & 6341L. Fundamentals of Probability:** This is the first in a two-course series designed to introduce the fundamentals of statistical probability and inference. Students learn probability theory and its application to everyday statistical concepts and analysis methods. This course covers probability axioms, probability and sample space, events and random variables, probability inequalities, independence, discrete and continuous distributions, expectations and variances, conditional expectation, moment generating functions, random vectors, variable transformations, convergence concepts, the Central Limit Theorem, the weak and strong Laws of Large Numbers, the delta method, extreme value distributions, order statistics, exponential and location-scale families, and basic techniques for generating random variables. During the lab section of the course, we will perform simulations to better understand concepts and explore links between probability theory and applied problems. Multivariable calculus is a prerequisite. Lab required [1]. Fall [3] (Shepherd)

**6342 & 6342L. Contemporary Statistical Inference:** This is the second in a two-course series designed to impart the fundamental probabilistic and inferential framework in statistical probability and inference. Students learn the key tools of mathematical statistics (likelihood, estimating equations, information quantities, etc.), popular methods of inference (hypothesis testing, significance testing, confidence intervals), the schools of inferential philosophy (Frequentist, Bayesian and Likelihood) and their associated controversies. Topics include: delta method, sufficiency, minimal sufficiency, ancillarity, completeness, conditionality principle, Fisher information, Cramér-Rao inequality, hypothesis testing (likelihood ratio tests, most powerful tests, optimality, Neyman-Pearson lemma, inversion of test statistics), Likelihood principle, Law of Likelihood, Bayesian posterior estimation, interval estimation (confidence intervals, support intervals, credible intervals), basic asymptotic and large sample theory, maximum likelihood estimation, and re-sampling techniques (e.g., bootstrap). Lab required [1]. Spring [3] (Blume)

**7323 & 7323L. Applied Survival Analysis:** This course provides an introduction to methods for time-to-event data with censoring mechanisms. Topics include: ideas of censoring and truncation, nonparametric approaches (e.g., Kaplan-Meier, log-rank), semi-parametric approaches (e.g., Cox model, extended Cox model with time-dependent covariates), parametric approaches (e.g., Weibull, gamma), multivariate survival models (e.g., frailty models, marginal models), model diagnostics, and sample size calculation for time-to-event data. Focus is on fitting the models and the relevance of those models for the biomedical application. Lab required [1]. Fall [3] (Chen, Q.)

**7330. Regression Modeling Strategies:** The first part of the course presents the following elements of multivariable predictive modeling for a single response variable: using regression splines to relax linearity assumptions, perils of variable selection and over-fitting, where to spend degrees of freedom, shrinkage, imputation of missing data, data reduction, interaction surfaces, and measuring predictive accuracy. Then a default overall modeling strategy will be described. This is followed by methods for graphically understanding models (e.g., using nomograms) and using re-sampling to estimate a model’s likely performance on new data. Then, the R rms package, which facilitates most steps of the modeling process, will be overviewed. Next, statistical methods related to longitudinal regression, binary logistic models, ordinal regression, and survival models will be covered. Along the way, various general features of maximum likelihood estimation and bootstrapping are explored. Comprehensive case studies will be presented: analysis of efficacy in a longitudinal randomized clinical trial using generalized least squares, modeling hemoglobin A1c from NHANES data, an exploration of the survival of Titanic passengers, flexible modeling of ordinal clinical outcomes, developing a survival time model for critically ill patients, and developing a Cox model in chronic disease. Students undertake a variety of in-depth analyses incorporating methods of reproducible research. Spring [3] (Harrell)

**7345 & 7345L. Advanced Regression Analysis I (Linear & General Linear Models):** Students are exposed to a theoretical framework for linear and generalized linear models. The first half of the semester covers linear models: multivariate normal theory, least squares estimation, limiting chi-square and F-distributions, sum of squares (partial, sequential) and expected sum of squares, weighted least squares, orthogonality, Analysis of Variance (ANOVA). The second half of the semester focuses on generalized linear models: binomial, Poisson, multinomial errors, introduction to categorical data analysis, conditional likelihoods, quasi-likelihoods, model checking. Lab required [1]. Fall [3] (Kang)

**7346 & 7346L. Advanced Regression Analysis II (General Linear Models & Longitudinal Data Analysis):** Covers the classic repeated measures model, the general linear model for longitudinal data, and linear and generalized linear mixed effects models; for generalized linear models for longitudinal data, the course distinguishes marginal and conditional models. Semi-parametric (generalized estimating equations) and parametric (generalized least squares and likelihood-based mixed effects models) estimation and inference are central to the course. Advanced topics include missing data techniques, causal inference, marginalized regression models, and study design considerations for longitudinal data. Lab required [1]. Spring [3] (Schildcrout)

**7351. Statistical Collaboration in Health Sciences I:** Students are exposed to a variety of problems that arise in collaborative arrangements. The course’s goal is to develop the knowledge and skills necessary to successfully interact with research collaborators. The importance of developing communicative, ethical, and professional skills to establish a successful collaboration will be emphasized. Students will role-play and develop projects with real investigators, present research and biostatistical topics, discuss collaborative situations that have gone awry, discover important concepts through cases, and face real-life issues such as poor scientific formulation, lack of time and expectations, supervision and interview skills, career track choices, grantsmanship, and business negotiations. Course content will also make use of departmental clinics that run concurrently. Fall [3] (Davidson)

**7352. Statistical Collaboration in Health Sciences II:** Second course of a year-long sequence in collaboration in statistical science. Students are exposed to a variety of statistical and methodological problems that can arise in collaborative arrangements. The course’s goal is to sharpen students’ skills in applying their statistical knowledge in real world settings, while exposing them to the application of advanced statistical techniques in routine health science applications. The importance of understanding and learning the science underlying collaborations will be emphasized. Students will engage in consulting projects that will involve the use of a wide range of biostatistics methods from design to analysis. Prerequisite: 7351. Spring [3] (Liu)

**7361. Advanced Probability and Real Analysis Concepts:** This course provides a basic foundation in probability theory that includes probability spaces, set functions, sigma-algebras, random variables, expectation, L^{p} spaces, conditional expectation and projections, characteristic functions, modes of convergence, uniform integrability, classical limit theorems, random walks, martingales, Markov chains, and Brownian motion. Emphasis on measure theory is minimal. Concepts are illustrated in biomedical applications whenever possible. Fall [3] (Johnson)

**7362 & 7362L. Advanced Statistical Inference and Statistical Learning:** This course provides a technically oriented survey of modern inferential tools and statistical learning. Topics include variable selection and regularization, basis expansions (e.g., splines), kernel smoothing, tree-based methods, supervised and unsupervised learning, neural networks, support vector machines, and ensemble methods. General techniques for inference will also be discussed, including bootstrap techniques, analytical approximations (e.g., the multivariate delta method), and exact methods. Lab required [1]. Spring. [3]. Shotwell.

**8366. Advanced Statistical Computing:** The second computational statistics course covers advanced computational and machine learning algorithms using the Python programming language. These include numerical optimization and integration, Markov Chain Monte Carlo (MCMC), expectation-maximization (EM) algorithms, Gaussian processes, Hamiltonian Monte Carlo, clustering, decision trees, and graphical models. Students will also be introduced to parallel and high performance computing approaches. Prerequisite: BIOS 301 or permission of instructor. Fall [3]

**8370. Foundations of Statistical Inference:** Examines the foundations of statistical inference as viewed from Frequentist, Bayesian, and Likelihood approaches. Famous papers and controversies are discussed along with statistical theories of evidence and decision theory, and their historic significance. Spring. [3]. Blume.

**8372. Bayesian Methods:** This course covers the methodology and rationale for Bayesian methods and their applications. Statistical topics include the historical development of Bayesian methods such as hierarchical models, Markov Chain Monte Carlo (MCMC) and related sampling methods, specification of priors, sensitivity analysis, and model checking and comparison. This course features applications of Bayesian methods to biomedical research. Prerequisite: BIOS 6301, BIOS 6312, BIOS 7330, BIOS 6341, BIOS 6342 and BIOS 7345, or equivalent; for non-biostatistics students, permission required. Fall [3] Choi

**8375. Causal Inference:** This course provides an introduction to causal inference methods for observational data and randomized studies. Topics include the Rubin causal model, directed acyclic graphs, propensity scores, inverse probability weighting, instrumental variables, causal mediation analysis, marginal structural models, g-computation, and sensitivity analyses to examine robustness to untestable assumptions. Students will learn the basic theory behind the methods and will apply them to biomedical data examples. Prerequisites: 6341, 6342, 7323, and 7346 or approval by the instructor. Spring. [3] (Shepherd)

**8398. Special Topics in Biostatistics:** Special topics in biostatistics; content set by the faculty instructor. (Staff)

**7999. Master’s Thesis Research:** Credit hours for students engaging in Master’s thesis research. (Staff)

**8999. Non-Candidate Research:** Credit hours for students engaging in dissertation research prior to completing qualifying exams. (Staff)

**9999. Ph.D. Dissertation Research:** Credit hours for students engaging in dissertation research. (Staff)

**Courses in development**

**334. Statistical Genetics and Bioinformatics:** This course introduces and discusses the statistical methodology of genomics-inspired techniques and bioinformatics tools, including genome sequencing, DNA microarrays, proteomics, publicly available databases and software tools. Statistical topics include multiple hypothesis testing, clustering and classification, variable selection, hidden Markov models, and Bayesian networks. Methods for high-dimensional data analysis will also be illustrated and discussed.

**336. Principles of Graphics:** This course discusses the underlying goals in presenting and visualizing data, and how best to achieve those goals using different graphical techniques. Topics include: nomograms, principles of scaling, data summarization, theories of graphical perception and principles of graph construction.

**338. Accommodating Missing Data:** This course provides an in-depth exploration of methods for handling missing data. Topics include last observation carried forward, complete case analysis, pattern mixture models, predictive mean methods, MAR and MCAR assumptions, missingness in the response and covariates, and sensitivity analysis.