Tackling "Big Data" with Biology Undergrads: A Simple RNA-seq Data Analysis Tutorial Using Galaxy
Analyzing high-throughput DNA sequence data is a fundamental skill in modern biology. However, real and perceived barriers, such as massive file sizes, substantial computational requirements, and a lack of instructor background knowledge, can discourage faculty from incorporating high-throughput sequence data into their courses. We developed a straightforward and detailed tutorial that guides students through the analysis of RNA sequencing (RNA-seq) data using Galaxy, a public web-based bioinformatics platform. The tutorial spans three laboratory periods (~8 hours) and is appropriate for undergraduate molecular biology and genetics courses. Sequence files are imported into a student's Galaxy user account directly from the National Center for Biotechnology Information Sequence Read Archive (NCBI SRA), eliminating the need for on-site file storage. Using Galaxy's graphical user interface and a defined set of analysis tools, students assess and trim sequences by quality, map individual sequence reads to a genome, generate a counts table, and carry out differential gene expression analysis. All of these steps are performed "in the cloud," using offsite computational infrastructure. The tutorial uses RNA-seq data from a published study of nematode infection of Arabidopsis thaliana. Based on their analysis of the data, students are challenged to develop new hypotheses about how plants respond to nematode parasitism. The workflow is flexible, however, and can accommodate alternative data sets from NCBI SRA or from the instructor. Overall, this resource provides a simple introduction to the analysis of "big data" in the undergraduate classroom and requires little prior background or infrastructure to implement successfully.
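In the tutorial, the counting and comparison steps are carried out with Galaxy's graphical tools. To make the underlying arithmetic concrete, here is a minimal Python sketch. It is not the tutorial's own method. It assumes each read has already been mapped and assigned to at most one gene, and it uses hypothetical toy data (genes `geneA`, `geneB`, `geneC`; samples `control` and `infected`). The sketch builds a counts table, normalizes each sample to counts per million (CPM) to correct for library size, and reports a naive log2 fold change. Real differential expression tools model variation across biological replicates and test for statistical significance. This sketch does neither.

```python
import math
from collections import Counter

# Hypothetical toy input: for each sample, the gene each mapped read was
# assigned to (None = read mapped but overlapped no gene). Not real data.
assignments = {
    "control":  ["geneA", "geneA", "geneB", None, "geneC"],
    "infected": ["geneA", "geneB", "geneB", "geneB", "geneC"],
}

def counts_table(assignments):
    """Build a {gene: {sample: read_count}} table, skipping unassigned reads."""
    per_sample = {s: Counter(g for g in genes if g is not None)
                  for s, genes in assignments.items()}
    all_genes = sorted(set().union(*per_sample.values()))
    # Counter returns 0 for genes with no reads in a given sample.
    return {g: {s: per_sample[s][g] for s in per_sample} for g in all_genes}

def cpm(table, sample):
    """Counts per million for one sample, dividing by its total assigned reads."""
    total = sum(row[sample] for row in table.values())
    return {g: row[sample] / total * 1e6 for g, row in table.items()}

def log2_fold_change(table, ref, test, pseudocount=1.0):
    """Naive log2(test / ref) on CPM values; the pseudocount avoids log2(0)."""
    ref_cpm, test_cpm = cpm(table, ref), cpm(table, test)
    return {g: math.log2((test_cpm[g] + pseudocount) / (ref_cpm[g] + pseudocount))
            for g in table}

if __name__ == "__main__":
    table = counts_table(assignments)
    for gene, lfc in log2_fold_change(table, "control", "infected").items():
        print(f"{gene}\t{lfc:+.2f}")
```

A positive value means the gene has a higher share of reads in the `infected` sample than in `control`. Without replicates, such a value cannot show whether the difference is larger than normal sample-to-sample variation.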
Collected by Carolyn Wetzel onto Genetics BIO243