About

Introduction

The coronavirus disease 2019 (COVID-19) pandemic is a global challenge caused by the rapid emergence of the SARS-CoV-2 virus. By late April 2021, the worldwide totals had reached 144,220,516 cases and 3,065,612 deaths. You can check the current numbers at https://www.worldometers.info/coronavirus/. According to the United States Centers for Disease Control and Prevention (CDC), as of April 21, 2021 there were at least 31,602,676 reported cases and 565,613 total deaths in the United States, with 983,875 reported cases and 283,000 total deaths in the state of New Jersey alone. You can find recent data at https://covid.cdc.gov/covid-data-tracker/#cases_totalcases

According to statistics produced by the CDC, several factors, such as age, gender, and population density, are important in predicting the likelihood of infection and the severity of the disease.

In this exercise, students will analyze an original dataset prepared by Genesis Laboratory Management, a COVID-testing laboratory located in Monmouth County, NJ, USA. The laboratory collects patient specimens from all counties of New Jersey, but more samples come from Monmouth County than from the other counties. We suggest interpreting the results of the analysis with this in mind, since they may be biased towards Monmouth County. The samples were tested for SARS-CoV-2 RNA by PCR, and the results were recorded in the database. The data cover the period from March to December 2020.

Software: RStudio

R packages: maps, mapdata, ggplot2, gifski and dplyr.

Dataset: an adapted version of the original data from Genesis Laboratories. All personal information was removed from the dataset. The dataset is a CSV (comma-delimited) file with individual test result entries as rows and information on patient county, age, sex, and test result as columns. The dataset also contains the central geographic location (longitude and latitude) of each county.
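
To make the layout described above concrete, here is a minimal sketch in base R. It builds a tiny stand-in table rather than reading the real file, so the column names (county, age, sex, result, longitude, latitude), the result labels, and every value are assumptions made for illustration and may not match the actual dataset.

```r
# Hypothetical miniature of the dataset. Column names and all values are
# invented for illustration; the real file may use different names.
csv_text <- "county,age,sex,result,longitude,latitude
Monmouth,34,F,Positive,-74.2,40.3
Monmouth,61,M,Negative,-74.2,40.3
Ocean,45,M,Positive,-74.3,39.9
Ocean,29,F,Negative,-74.3,39.9
Monmouth,52,F,Negative,-74.2,40.3"

# With the real data, the file path would be passed to read.csv() instead.
tests <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# One row per test, one column per attribute
str(tests)

# Flag positive results, then count tests and positives per county
tests$positive <- tests$result == "Positive"
n_tests    <- tapply(tests$positive, tests$county, length)
n_positive <- tapply(tests$positive, tests$county, sum)

by_county <- data.frame(
  county     = names(n_tests),
  n_tests    = as.vector(n_tests),
  n_positive = as.vector(n_positive)
)

# Share of tests that were positive in each county
by_county$positivity <- by_county$n_positive / by_county$n_tests
print(by_county)
```

Reporting positives alongside the number of tests matters here because counties that send more samples to the laboratory will show more positives even when the underlying rate is the same.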

Learning objectives:

In this activity, students will:

  1. Identify appropriate computational approaches to address questions on the epidemiology of COVID-19
  2. Use the statistical software R to analyze data on the COVID-19 pandemic
  3. Evaluate data using graphical representation
  4. Apply statistical tests, such as the t-test or a non-parametric alternative, to analyze original data
  5. Test hypotheses on factors that affect viral spread
  6. Draw conclusions based on data analysis
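
Objective 4 concerns tests for comparing groups. The t-test itself is parametric; its usual non-parametric counterpart is the Wilcoxon rank-sum (Mann-Whitney) test. The sketch below runs both on simulated values, not on the laboratory data. The variable names and the simulated group means are assumptions chosen only to show the calls.

```r
# Simulated ages for two groups. These are NOT real data; the group
# means and spreads are arbitrary values chosen for the illustration.
set.seed(1)
age_positive <- rnorm(40, mean = 48, sd = 15)
age_negative <- rnorm(40, mean = 42, sd = 15)

# Parametric comparison: Welch two-sample t-test (the default in t.test)
t_res <- t.test(age_positive, age_negative)

# Non-parametric comparison: Wilcoxon rank-sum test
w_res <- wilcox.test(age_positive, age_negative)

cat("t-test p-value:  ", signif(t_res$p.value, 3), "\n")
cat("Wilcoxon p-value:", signif(w_res$p.value, 3), "\n")
```

The rank-based test is often the safer choice when a variable is strongly skewed or contains outliers, since it does not assume normally distributed values.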

Questions that students will be able to answer after completing this module: Do you think the number of positive cases for COVID-19 changed from March to December of 2020 in each county? Do you think it depends on the number of tests conducted or on the population of the county? Is there a difference in the number of positive tests between males and females? Was the virus spreading from one location within the state or from multiple locations at the same time?
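
As one possible way to approach the question about males and females, the sketch below compares positivity between two groups with a chi-squared test on a 2x2 table of counts. The counts are invented for illustration and do not come from the dataset; the module itself may use a different test.

```r
# Invented counts for illustration only (rows: sex, columns: test result)
counts <- matrix(
  c(120, 880,
    100, 900),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("M", "F"), result = c("Positive", "Negative"))
)

# Proportion of positive tests within each group
positivity <- counts[, "Positive"] / rowSums(counts)
print(positivity)

# Chi-squared test of independence between sex and test result
res <- chisq.test(counts)
print(res)
```

Comparing proportions rather than raw counts keeps the answer from depending on how many people of each sex happened to be tested.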
