Genome Sequence Data in R using Biostrings (Swirl Lesson)

By Robert E Furrow

University of California, Davis

Listed in Teaching Materials | resource by group Make Teaching with R in Undergraduate Biology Less Excruciating 2020

Version 1.0, published on 09 Jun 2020. doi:10.25334/ADJT-CH40

Licensed under CC Attribution-ShareAlike 4.0 International

Description

This swirl lesson familiarizes students with DNAStringSets from the Biostrings package in the R programming language and builds their skills in manipulating and analyzing genomic sequence data. By the end of the lesson, students should be able to load FASTA files into R as DNAStringSets and use width() and alphabetFrequency(), combined with other functions such as sum() and mean(), to evaluate genome assembly quality and nucleotide frequencies.

The swc file is self-contained and can be used to install the complete lesson in swirl. The lesson plan PDF contains a longer description of the lesson and its context, as well as suggestions for implementation. The lab 2 Rmd and PDF files provide example material to use in the lead-up to the swirl lesson and to assess students' ability to use the tools. The lab 3 Rmd and PDF files contain follow-up material with more advanced approaches to working with DNAStringSets.
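As a rough illustration of the skills listed above, the sketch below loads a FASTA file as a DNAStringSet and summarizes it with width(), sum(), mean(), and alphabetFrequency(). It is not taken from the lesson itself: the three short "contigs", the temporary file, and all variable names are made up for the example, and it assumes the Biostrings package is already installed from Bioconductor.

```r
# Assumes Biostrings is installed, e.g. via BiocManager::install("Biostrings")
library(Biostrings)

# Write a tiny, made-up FASTA file with three "contigs" (illustrative data only)
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(
  ">contig_1", "ATGCGCGTTA",
  ">contig_2", "GGGCCCAT",
  ">contig_3", "ATAT"
), fasta_path)

# Load the FASTA file into R as a DNAStringSet
assembly <- readDNAStringSet(fasta_path)

# Contig lengths, total assembly size, and mean contig length
contig_lengths <- width(assembly)
total_length <- sum(contig_lengths)
mean_length <- mean(contig_lengths)

# Nucleotide counts per contig (rows are contigs; columns are A, C, G, T)
base_counts <- alphabetFrequency(assembly, baseOnly = TRUE)[, c("A", "C", "G", "T")]

# Overall GC fraction across the whole assembly
gc_fraction <- sum(base_counts[, c("G", "C")]) / sum(base_counts)
```

With a real assembly, fasta_path would simply point at the students' own FASTA file; the rest of the calls stay the same.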

The lesson was designed and implemented in Genome Hunters, a course taught in Spring Quarter 2020 at the University of California, Davis. This course-based research experience guided students through exploratory data analysis on genome assemblies of novel microbes. These assemblies can be loaded into R using tools from the Biostrings package, and the genomes can be analyzed both with custom functions and with the many built-in tools in the package. The materials are aimed at first-year undergraduates with an interest in biology, although early versions of the class have included students from all four college years. Although this swirl lesson introduces only the basics of counting nucleotide frequencies and exploring contig lengths, many students went on to use additional Biostrings tools in their individual projects. The full set of computational lab materials for the class is available online at https://bookdown.org/joelrome88/bis23b/, and the labs before and after this swirl homework assignment are included in this resource as R Markdown and PDF files.
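To give a sense of what a custom function on contig lengths might look like, here is a minimal sketch that computes N50, a widely used assembly summary statistic. N50 is not mentioned in the lesson description; it is used here only as a familiar example. The function name n50 and the toy sequences are hypothetical.

```r
# Assumes Biostrings is installed, e.g. via BiocManager::install("Biostrings")
library(Biostrings)

# Hypothetical helper: N50 is the contig length at which the running total of
# contig lengths, taken from longest to shortest, first reaches half the assembly
n50 <- function(seqs) {
  lens <- sort(as.numeric(width(seqs)), decreasing = TRUE)
  lens[which(cumsum(lens) >= sum(lens) / 2)[1]]
}

# Toy DNAStringSet built in memory (illustrative data only)
toy <- DNAStringSet(c(
  contig_1 = "ATGCGCGTTA",
  contig_2 = "GGGCCCAT",
  contig_3 = "ATAT"
))

toy_n50 <- n50(toy)
```

Because the helper relies only on width(), it works on any DNAStringSet, including one returned by readDNAStringSet().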

Tags

Make Teaching with R in Undergraduate Biology Less Excruciating 2020
