Working with Datasets in R swirl

Author(s): Caitlin Hicks Pries

Dartmouth College

Summary:
The goal of this lesson is to learn how to import datasets into R, understand variable types, make adjustments to variables, perform basic calculations, and begin data visualization. The exercise uses a time series of climate data spanning more than 100 years.

Licensed under CC Attribution-ShareAlike 4.0 International

Version 1.0 - published on 15 May 2019 doi:10.25334/Q4KF2V

Description

This lesson is part of a Global Change Biology course where one of the main learning objectives is to have students develop and answer scientific questions by using R to access, organize, and graph actual long-term datasets. In order to explore global change data and learn how various anthropogenic activities are affecting ecosystems and species, we need to import datasets into R and make sure all the variables are in the correct format for data visualization (that is, creating graphs). In this lesson, we will learn how to import data into R, perform basic QA/QC (quality assurance/quality control) of the data, and create some basic exploratory graphs from which we can begin to answer scientific questions. We will be working with a dataset from the PRISM Climate Group (http://www.prism.oregonstate.edu/) that includes air temperature and precipitation over the past one hundred years in two towns in the United States (Boulder, CO and Hanover, NH). This resource also includes a graphing homework assignment and key that can be used as an assessment and extension of learning after students complete the swirl lesson.

Learning objectives:

  1. You will learn how to import data into R and perform basic QA/QC.
  2. You will learn about different data and variable formats you will encounter in this course (and throughout your data ‘career’).
  3. You will learn how to manipulate date variables in R in order to get them into the format you need for data visualization.
  4. You will learn how to use piping in R (from the tidyverse package) to subset the data and perform basic calculations with the data.
  5. You will be able to create simple graphs to visualize data in order to answer scientific questions.
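
The workflow described by these objectives can be sketched in a few lines of R. This is a minimal illustration and not the lesson's own code: the column names (Date, Site, tmean_C, ppt_mm) and every value in the small inline table are invented stand-ins, not PRISM data, and the sketch assumes the dplyr and ggplot2 packages are installed.

```r
library(dplyr)    # provides the %>% pipe, group_by(), and summarise()
library(ggplot2)  # provides ggplot() and the geom_* layers

# Hypothetical stand-in data: column names and values are invented for
# illustration only and are NOT taken from the PRISM dataset.
csv_text <- "Date,Site,tmean_C,ppt_mm
1950-01-01,Boulder,-1.0,12
1950-07-01,Boulder,22.0,45
1951-01-01,Boulder,0.0,10
1951-07-01,Boulder,23.0,50
1950-01-01,Hanover,-8.0,70
1950-07-01,Hanover,20.0,90
1951-01-01,Hanover,-7.0,65
1951-07-01,Hanover,21.0,95"

# 1. Import. With a real file this would be read.csv("your_file.csv").
climate <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Basic QA/QC: inspect variable types and ranges, and check for missing values.
str(climate)
summary(climate)
stopifnot(!anyNA(climate))

# 2-3. Date arrives as character; convert it to Date and extract the year.
climate$Date <- as.Date(climate$Date, format = "%Y-%m-%d")
climate$Year <- as.integer(format(climate$Date, "%Y"))

# 4. Pipe the data through a grouped calculation: mean temperature
#    for each site and year.
annual <- climate %>%
  group_by(Site, Year) %>%
  summarise(mean_temp_C = mean(tmean_C), .groups = "drop")

# 5. A simple exploratory graph of the yearly means, one line per site.
p <- ggplot(annual, aes(x = Year, y = mean_temp_C, colour = Site)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Mean air temperature (deg C)")
print(p)
```

Converting the date column before summarising matters because a character column cannot be sorted or split by year reliably; once it is a Date, the year can be extracted and used as a grouping variable.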

While the majority of this lesson was built from scratch, the graphing part was modified from the existing swirl lesson GGPlot2_Part1 in the Exploratory_Data_Analysis course (https://github.com/swirldev/swirl_courses/blob/master/Exploratory_Data_Analysis/GGPlot2_Part1/lesson).

Climate data used in this lesson was downloaded from the PRISM Climate Group (http://www.prism.oregonstate.edu/).

Cite this work

Researchers should cite this work as follows: