Introductory Data Science Pipeline Activity – Yellow Fever and Global Precipitation

Published: 2024-05-15
License: CC Attribution-ShareAlike 4.0 International

Author(s): Mary Mulcahy, University of Pittsburgh at Bradford

doi:10.25334/SP15-ZK40

Brought to you by the Biological and Environmental Data Education (BEDE) Network

Summary:

Students follow the steps of a tiny data science project from start to finish. They are given a research question: "Are the number of cases of yellow fever associated with global average precipitation?" Students locate the data from the World Health Organization and the Environmental Protection Agency, download it, and use the merged and cleaned data to see whether the evidence supports the hypothesis that yellow fever cases are higher in wetter years than in drier years. The activity is intended for use early in a course to prepare introductory students to eventually explore their own questions.
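
As a rough illustration of the clean-and-merge step described above, the following Python sketch joins two small tables on a shared year column. It assumes each source has already been reduced to a two-column table keyed by year. The column names (`yellow_fever_cases`, `precip_anomaly`), the helper functions, and every number are hypothetical placeholders, not the contents of the WHO or EPA files, and the activity itself may carry out this step with different tools.

```python
import csv
import io

# Made-up placeholder values for illustration only; NOT real WHO or EPA figures.
CASES_CSV = """year,yellow_fever_cases
2001,100
2002,150
2003,
2004,90
"""

PRECIP_CSV = """year,precip_anomaly
2001,0.5
2002,1.2
2003,-0.3
2005,0.8
"""


def read_by_year(text, value_column):
    """Return {year: value}, skipping rows whose value is blank (a simple cleaning step)."""
    values = {}
    for row in csv.DictReader(io.StringIO(text)):
        raw = row[value_column].strip()
        if raw:
            values[int(row["year"])] = float(raw)
    return values


def merge_on_year(left, right):
    """Inner join: keep only the years that appear in both tables."""
    shared_years = sorted(left.keys() & right.keys())
    return [(year, left[year], right[year]) for year in shared_years]


cases = read_by_year(CASES_CSV, "yellow_fever_cases")
precip = read_by_year(PRECIP_CSV, "precip_anomaly")
merged = merge_on_year(cases, precip)

for year, n_cases, anomaly in merged:
    print(year, n_cases, anomaly)
```

Only years with a usable value in both tables survive the merge, which is why checking row counts before and after merging is a useful verification habit.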

Version 1.0, published on 15 May 2024. doi:10.25334/SP15-ZK40

Contents:

Mulcahy_Pipeline_Activity_Student_Handout_2024_05_Ver001.docx (DOCX | 182 KB)
Mulcahy_Pipeline_Activity_Student_Handout_2024_05_Ver001.pdf (PDF | 265 KB)
Mulcahy_Pipeline_Teaching_Notes_Ver001.docx (DOCX | 30 KB)
Mulcahy_KEY_Pipeline_Activity_2024_05_Ver001.docx (DOCX | 184 KB)
Mulcahy Pipeline Activity Merged Data.csv (CSV | 775 B)
precipitation_fig-2.csv (CSV | 2 KB)
Yellow Fever YF reported cases and incidence 2024-08-05 00-40 UTC.xlsx (XLSX | 9 KB)
Yellow Fever (YF) reported cases and incidence
https://www.epa.gov/climate-indicators/climate-change-indicators-us-and-global-precipitation
Navigating CODAP for the Biomes Module Video Tutorial
CODAP
License terms

Description

This activity is an introduction to one version of the data science pipeline with the intent of inspiring students to consider their own research questions that could be answered using this process. In this activity, the data science pipeline is defined as a series of seven steps (start to finish) for using existing data, especially publicly available data, to answer new research questions.

This activity introduces one narrow view of data science. Although the steps are given an order, students are reminded that data scientists may circle back to previous steps or skip a step entirely depending on their research goals. Much like the scientific method, data science is approached in many ways, and the pipeline path described here introduces some of the common terminology and a frequent approach to answering questions.

After completing this activity, students should be able to:

Define the terms clean, pull, verify, wrangle, merge, interoperable, file extension, and long and wide format.
Recognize the difference between long and wide data formats.
Review the metadata associated with a downloaded data file.
Describe the ordered steps of the data pipeline process, while recognizing that the phrase “data science pipeline” means different things to different scientists and that these steps don’t always occur in order.
With guidance, create a graph with appropriately labeled axes, including units.
Use an r-squared value to assess the strength of a relationship between two variables.
Explain why statistical analysis may be needed to interpret a data pattern.
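
Two items in the list above, long versus wide format and the r-squared value, can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the field names (`year`, `cases`, `precip`), the helper functions, and all numbers are hypothetical and are not taken from the activity's data files. The r-squared here is computed as the square of the Pearson correlation, which matches the r-squared of a straight-line fit with one predictor.

```python
# Made-up placeholder values for illustration only; not from the activity's files.
# Wide format: one row per year, one column per variable.
wide = [
    {"year": 2001, "cases": 10.0, "precip": 1.0},
    {"year": 2002, "cases": 20.0, "precip": 2.0},
    {"year": 2003, "cases": 25.0, "precip": 3.0},
    {"year": 2004, "cases": 45.0, "precip": 4.0},
]


def wide_to_long(rows, id_key):
    """Long format: one row per (id, variable) pair, with the measurement in 'value'."""
    return [
        {id_key: row[id_key], "variable": name, "value": value}
        for row in rows
        for name, value in row.items()
        if name != id_key
    ]


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


long_rows = wide_to_long(wide, "year")
print(len(wide), "wide rows ->", len(long_rows), "long rows")

r2 = r_squared([row["precip"] for row in wide], [row["cases"] for row in wide])
print("r-squared:", round(r2, 3))
```

An r-squared near 1 means the points fall close to a straight line and a value near 0 means they do not. With only a handful of points, even a high value can arise by chance, which is one reason a formal statistical test may still be needed.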

Cite this work

Researchers should cite this work as follows:

Mulcahy, M. (2024). Introductory Data Science Pipeline Activity – Yellow Fever and Global Precipitation. Biological and Environmental Data Education (BEDE) Network, QUBES Educational Resources. doi:10.25334/SP15-ZK40