Advanced R for Epidemiologists

Tuesday, June 1, 2021 - Wednesday, June 30, 2021

Course Description

The goal of this course is to introduce students to advanced data wrangling techniques, advanced quantitative methods, reproducible research, and graphics using R and RStudio. Heavy emphasis will be placed on the tidyverse suite of packages and functions for data manipulation and modeling, and for more advanced applied epidemiologic analyses. Each session combines didactic lecture with hands-on practice. Students will clean and analyze actual data sets, learn the importance of data preparation and cleaning, visualize descriptive analyses, write their own functions, run regressions, and use R for advanced methods such as multiple imputation. The course will provide a brief introduction to R but move quickly into the tidyverse and advanced topics.

Course Objectives

The primary objective of this course is to introduce students to advanced R programming skills so they can (1) more efficiently prepare data for analysis; (2) implement advanced epidemiologic methods in R; and (3) continue learning and problem-solving in R beyond the course. By the end of the course students will be able to use R to:


  • Enter, read, index, clean, organize, and manipulate (i.e., “wrangle”) epidemiologic data
  • Understand and use packages from the “Tidyverse”
  • Develop visually attractive graphics based on the grammar of graphics (i.e., ggplot2)
  • Write their own R functions for data manipulation and analyses not found in base R
  • Share their knowledge by developing reproducible research documents using the R Markdown format
  • Be aware of how R can be used for multiple imputation, inverse probability weighted marginal structural models, and other advanced methods


Some familiarity with R is ideal. Students are expected to be familiar with the file structure of their operating systems (Windows, OS X, or Linux) and to know how to download, install, and transfer files. A basic understanding of statistical and epidemiological principles is strongly encouraged. Brief introductions to key concepts will be offered as necessary, but the focus will be on applying and interpreting statistical and epidemiological concepts in R.




Students will need access to a computer with high-speed internet access. In addition, students will need to download and install the following software:

  • R, a free multi-platform (Windows, Mac, Linux) user-maintained advanced statistical and scientific computing platform based on the S language. R has versatile and powerful graphics capability. In addition to the thousands of user-contributed packages, the language is easily extendable through simple object-oriented programming. A number of specialized packages will be used, e.g., ggplot2 and the wider tidyverse.
  • RStudio, a free multi-platform (Windows, Mac, Linux) integrated development environment (IDE) for R designed to make analysis and document production more efficient and reproducible.

Course Reading List



Seth Prins, PhD

Criminogenic or Criminalized? Testing an Assumption for Expanding Criminogenic Risk Assessment. Prins, S. J. Law and Human Behavior, Online First. 2019.

My two programs of research concern the collateral consequences of mass incarceration for public health, and the effects of the social division and structure of labor on mental illness. Two questions have motivated my work to date. First, what are the theoretical and methodological assumptions underlying the growing use of psychiatric categories, such as antisocial personality, to explain and assess the risk of exposure to the criminal justice system, particularly in the context of mass incarceration? Second, what can we learn about the distribution and determinants of mental illness by examining social class as a dynamic relational process, rather than an individual attribute? I am also working on a project to study the role of adolescent substance use as a determinant and consequence of the school-to-prison pipeline, disentangling individual risk, social determinants, and group disparities. I explore these questions at the intersections of epidemiology, sociology, and criminology, combining theory-driven analysis with advanced quantitative methods. I am a social and psychiatric epidemiologist interested in pushing the boundaries of the discipline to encompass rich social theory.


Course Fee

Early registration discount before April 1, 2021: NA
After April 1, 2021: $1,000.00


The registration period has closed for this event.

Online Course Format

This is a month-long digital course, equivalent to approximately 20 hours of classroom instruction. Lectures and course material will be presented online in roughly weekly segments. The flexible format will include video or audio recordings of lecture material, file sharing and topical discussion, self-assessment exercises, and access to the instructor for feedback during the course. The course uses the Canvas learning management software; participants will receive an e-mail inviting them to join on the first day of the course. Any additional information about technical requirements and access to the course will be shared in the weeks before the course begins.
