Cleaning Biodiversity Data: A Botanical Example Using Excel or RStudio

Author(s): Michelle Gaynor

University of Florida

2834 total view(s), 1277 download(s)

Description

This resource is an adaptation created by Michelle Gaynor (University of Florida) as part of a BCEENET (https://bceenetwork.org/) workshop on data cleaning, co-facilitated with Pam Soltis (University of Florida). It uses the species Shortia galacifolia (Oconee bells or acony bell), a rare North American plant in the family Diapensiaceae found in the southern Appalachian Mountains.

Reference the original resource for more background information: https://qubeshub.org/publications/1899/

Students will clean an open-source herbarium dataset, using best practices to document accurately and clearly each step taken to collect, clean, and analyze open-access biodiversity data. The exercise can be completed in Excel or R.

Upon completion of this module, each student should be able to:

  1. Access biodiversity data from open sources.
  2. Use descriptive, retrievable, and consistent file names to manage datasets.
  3. Identify common problems with digital datasets.
  4. Rectify common problems with digital datasets.
  5. Apply disciplinary knowledge for smart data cleaning.
  6. Explain the importance of reproducible data and cleaning steps
  7. Document data cleaning steps to provide reproducibility.
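
As a rough illustration of objectives 2 through 7, the sketch below removes records with unusable coordinates, drops duplicates, logs how many records each step removed, and builds a descriptive output file name. It is not taken from the module's materials, which use Excel or R; it is a minimal Python sketch under stated assumptions. The records are invented, the field names follow Darwin Core terms (scientificName, decimalLatitude, decimalLongitude), and the specific rules (rejecting 0,0 coordinates, treating identical name and coordinates as duplicates) are assumptions chosen for illustration.

```python
from datetime import date

# Hypothetical records: values are invented for illustration only.
records = [
    {"id": "1", "scientificName": "Shortia galacifolia", "decimalLatitude": "35.0", "decimalLongitude": "-83.0"},
    {"id": "2", "scientificName": "Shortia galacifolia", "decimalLatitude": "", "decimalLongitude": ""},
    {"id": "3", "scientificName": "Shortia galacifolia", "decimalLatitude": "35.0", "decimalLongitude": "-83.0"},
    {"id": "4", "scientificName": "Shortia galacifolia", "decimalLatitude": "0", "decimalLongitude": "0"},
    {"id": "5", "scientificName": "Shortia galacifolia", "decimalLatitude": "135.2", "decimalLongitude": "-82.9"},
]


def to_float(text):
    """Return text as a float, or None if it is missing or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def has_valid_coordinates(rec):
    """Keep records whose coordinates are present, in range, and not 0,0."""
    lat = to_float(rec.get("decimalLatitude"))
    lon = to_float(rec.get("decimalLongitude"))
    if lat is None or lon is None:
        return False
    if lat == 0 and lon == 0:  # assumed rule: 0,0 is treated as a placeholder
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


def drop_duplicates(rows):
    """Keep the first record for each (name, latitude, longitude) combination."""
    seen = set()
    unique = []
    for rec in rows:
        key = (rec["scientificName"], rec["decimalLatitude"], rec["decimalLongitude"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def clean(rows):
    """Apply each cleaning step in order and record what it removed."""
    log = [f"start: {len(rows)} records"]
    with_coords = [r for r in rows if has_valid_coordinates(r)]
    log.append(f"removed {len(rows) - len(with_coords)} records with missing or invalid coordinates")
    unique = drop_duplicates(with_coords)
    log.append(f"removed {len(with_coords) - len(unique)} duplicate records")
    log.append(f"end: {len(unique)} records")
    return unique, log


def output_name(species, step, on=None):
    """Build a descriptive, consistent file name: species, cleaning step, ISO date."""
    on = on or date.today()
    return f"{species.replace(' ', '_')}_{step}_{on.isoformat()}.csv"


cleaned, cleaning_log = clean(records)
for entry in cleaning_log:
    print(entry)
print(output_name("Shortia galacifolia", "cleaned"))
```

Keeping the log alongside the cleaned file means someone else can see which rule removed which records and repeat the process on a fresh download.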

Access the data and the Excel and RStudio examples on GitHub: https://github.com/mgaynor1/BCEENET-DataCleaning

Notes

This resource was originally developed in part at the 2019 QUBES & BioQUEST Summer Institute, "Evolution of Data in the Classroom: From Data to Data Science."

This adaptation was created by Michelle Gaynor (University of Florida) for a BCEENET (https://bceenetwork.org/) workshop in 2020. It uses the species Shortia galacifolia (Oconee bells or acony bell), a rare North American plant in the family Diapensiaceae found in the southern Appalachian Mountains.

Cite this work

Researchers should cite this work as follows: