Data Management in Excel and R using National Ecological Observatory Network's (NEON) Small Mammal Data

Author(s): Marguerite Mauritz (1), Sarah McCord (2)

(1) University of Texas at El Paso; (2) USDA ARS Jornada Experimental Range

Summary:
Students use small mammal data from the National Ecological Observatory Network to learn the necessary steps of data management, from data collection to data analysis, by reorganizing Excel spreadsheets into an R-compatible format and doing basic analysis in R.

Description

Background

Undergraduate STEM students are graduating into professions that require them to manage and work with data at many points in the data management life cycle. Within ecology, students have many opportunities not only to collect data themselves but, increasingly, to access and use public data collected by others. This activity introduces the basic concepts of data management, spreadsheet management, and metadata that make meaningful data analysis possible. The accompanying presentation materials also discuss the importance of planning for long-term data storage and of analyzing public data.

This dataset is a subset of small mammal trapping data from the National Ecological Observatory Network (NEON). The accompanying lesson introduces students to proper data management practices, including how data move from collection to analysis. Students first manually clean small, messy spreadsheets and reorganize them into an R-compatible format. They then import the data into R, run some preliminary data checks, and make a basic visualization.
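To make "R-compatible format" concrete, here is a minimal sketch of one common clean-up step: turning a wide field sheet (one column per trap night) into a long table with one observation per row, the tidy layout that dplyr and ggplot2 handle most easily. The lesson itself uses Excel and R; this sketch uses Python's standard library only to illustrate the idea, and the column names (`trap`, `night1`, ...) and species codes are made-up examples, not the NEON field sheet.

```python
import csv
import io

# Made-up "messy" field sheet: one row per trap, one column per trap night.
# Cells hold species codes; an empty cell means nothing was captured.
WIDE_SHEET = """trap,night1,night2,night3
A1,PELE,,PELE
A2,,PEMA,
B1,PEMA,PEMA,
"""


def wide_to_long(text):
    """Return one dict per (trap, night) observation, one row per trap night."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        for column, value in record.items():
            if column == "trap":
                continue
            rows.append({
                "trap": record["trap"],
                "night": int(column.replace("night", "")),
                # Keep empty traps, coded with R's default missing-value string.
                "species": value.strip() or "NA",
            })
    return rows


def to_csv(rows):
    """Serialize long-format rows as CSV text with a single header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["trap", "night", "species"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


long_rows = wide_to_long(WIDE_SHEET)
print(to_csv(long_rows))
```

Empty traps are kept and coded as `NA` rather than dropped, because trap nights without a capture are still part of the sampling effort; `NA` is also the default missing-value code for R's `read.csv()`.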

Students then use a much larger NEON dataset and its metadata sheets to understand small mammal trapping methods. They use the field protocol and metadata to navigate the data structure, then import the data into R for further data checks and visualizations.
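The preliminary data checks can be sketched the same way. This hypothetical example assumes the table has the columns `plotID`, `collectDate`, `scientificName`, and `tagID`, and that dates are stored as ISO dates (YYYY-MM-DD); confirm the actual column names and formats against the metadata file in your download before adapting it. It reports missing columns, dates that do not parse, and blank plot IDs instead of silently fixing them.

```python
import csv
import io
from datetime import date

# Assumed column names (NEON-style); check them against the download's metadata file.
REQUIRED_COLUMNS = ["plotID", "collectDate", "scientificName", "tagID"]

# Made-up sample with two deliberate problems: a non-ISO date and a blank plotID.
SAMPLE = """plotID,collectDate,scientificName,tagID
PLOT_01,2016-05-10,Peromyscus leucopus,T001
PLOT_01,05/11/2016,Peromyscus leucopus,T001
,2016-05-10,Peromyscus maniculatus,T002
"""


def check_table(text):
    """Return a list of problems found in a CSV table (an empty list means none)."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        # Stop early: the row-level checks need these columns.
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    # Line 1 is the header, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            date.fromisoformat(row["collectDate"])
        except ValueError:
            problems.append(f"line {line_no}: collectDate {row['collectDate']!r} is not an ISO date")
        if not row["plotID"].strip():
            problems.append(f"line {line_no}: plotID is blank")
    return problems


for problem in check_table(SAMPLE):
    print(problem)
```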

Data from six months at NEON’s Smithsonian Conservation Biology Institute (SCBI) field site are included in the materials download. Data from other years or locations can be downloaded directly from the NEON data portal to tailor the activity to a specific location or ecological topic.

Teaching notes:

I taught this module asynchronously in Fall 2020. Students received a background document for the lab and a brief video lecture on the basic principles of good spreadsheet management, and could choose to complete the activity with either extensively scripted helper R code or a minimal R code framework. I also provided a video in which I walked through and completed some components of the R code. Students worked in groups of 2-4 on their own schedule and had 1 week to complete the lab.

In this activity, students will:

  • reflect on their own and their teams' data management practices. Presentation slides are provided to guide this discussion.
  • view field collection data sheets to understand how well-organized data sheets are constructed.
  • design a spreadsheet data table for transcribing field-collected data using good data management practices.
  • view NEON small mammal trapping data to a) see a standardized spreadsheet data table and b) see what data are collected during NEON small mammal trapping.
  • use R to import the data and analyze spatial and seasonal patterns in small mammal abundance.
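The seasonal abundance analysis corresponds to a grouped summary in R (for example, dplyr's `group_by()` followed by `summarise()` with `n_distinct()`). The Python sketch below shows the same logic under stated assumptions: the records are already in long format, plots stand in for habitats, and abundance is counted as the number of distinct tag IDs per plot and month, so a recaptured animal is counted once per month. The records are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Invented long-format capture records: (plotID, collectDate, tagID).
captures = [
    ("PLOT_01", "2016-05-10", "T001"),
    ("PLOT_01", "2016-05-11", "T001"),  # recapture of the same individual
    ("PLOT_01", "2016-05-11", "T003"),
    ("PLOT_02", "2016-05-10", "T002"),
    ("PLOT_01", "2016-06-14", "T001"),
]


def abundance_by_plot_month(records):
    """Count distinct tagged individuals per (plot, year-month), sorted by key."""
    individuals = defaultdict(set)
    for plot, collect_date, tag in records:
        d = date.fromisoformat(collect_date)
        individuals[(plot, f"{d.year}-{d.month:02d}")].add(tag)
    return {key: len(tags) for key, tags in sorted(individuals.items())}


for (plot, month), n in abundance_by_plot_month(captures).items():
    print(plot, month, n)
```

Counting distinct tags rather than rows is one reasonable choice for a simple abundance index; counting rows would instead measure total captures, including recaptures.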

R skills

  • basic familiarity with R Markdown and the ggplot2 and dplyr packages
  • I provide helper code (most of the code is pre-written, with prompts showing where to enter variables) and minimal code (the code is started and students write the rest)
  • students choose which version to use, which makes the activity more accessible while leaving room for more independent coding

The Data Sets

The National Ecological Observatory Network is a program sponsored by the National Science Foundation and operated under cooperative agreement by Battelle Memorial Institute. This material is based in part upon work supported by the National Science Foundation through the NEON Program.

The following datasets are posted for educational purposes only. Data for research purposes should be obtained directly from the National Ecological Observatory Network (www.neonscience.org).

Data Citation: National Ecological Observatory Network. 2017. Data Product: NEON.DP1.10072.001. Provisional data downloaded from http://data.neonscience.org. Battelle, Boulder, CO, USA.

Adaptation Notes

This version was adapted by blending ideas from two other QUBES resources:

  1. McNeil, J., Jones, M. A. (2018). Data Management using NEON Small Mammal Data with Accompanying Lesson on Mark Recapture Analysis. NEON - National Ecological Observatory Network, QUBES. doi:10.25334/Q4XH5S

  2. Hernández-Pacheco, R. H. (2018). More In Depth Spreadsheet Management Adaptation of Data Management using NEON Small Mammal Data. NEON Faculty Mentoring Network, QUBES Educational Resources. doi:10.25334/Q44X4D

Notes

The main adaptation of this module was the addition of an R component for the data investigation. In R, students check for correct data formatting, do some basic data preparation and column formatting, and then examine seasonal abundance by habitat and species. The mark-recapture analysis was excluded. This adapted module also contains links to overview videos and to general principles of data quality assurance and quality control (QA/QC).

Cite this work

Researchers should cite this work as follows: