A Fun Introductory Command Line Exercise: Next Generation Sequencing Quality Analysis with Emoji!

Author(s): Rachael St. Jacques (1), Max Maza (1), Sabrina Robertson (2), Guoqing Lu (3), Andrew Lonsdale (4), Ray A Enke (5)

1. Department of Biology, James Madison University
2. Department of Psychology & Neuroscience, University of North Carolina at Chapel Hill
3. Department of Biology and School of Interdisciplinary Informatics, University of Nebraska Omaha
4. ARC Centre of Excellence in Plant Cell Walls, Melbourne University
5. James Madison University

Summary:
This resource is a fun, computer-based introduction to the command line. The activity runs FASTQ next-generation sequencing (NGS) data files through a playful program called FASTQE.

Description

The activity takes FASTQ NGS data files and runs a fun program called FASTQE. FASTQE is very similar to the popular FastQC; however, rather than plotting NGS sequence quality, it outputs emoji signifying the quality of each base call in the file. The activity takes a fundamental yet somewhat tedious step in NGS analysis and makes it accessible and fun for students without much experience in the field. It is also designed for students with little to no command-line experience, who learn and run a few simple commands. Students are delighted when a few typed commands produce a long string of emoji! The activity also uses another command-line tool, FASTP, to trim and filter FASTQ files.
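
To make the idea of turning quality scores into emoji concrete, here is a minimal Python sketch. It is not FASTQE's code: the function names, the score thresholds, and the emoji chosen are all hypothetical, and it assumes the common Phred+33 quality encoding. It decodes each quality string, averages the scores at each read position, and prints one emoji per position.

```python
# Illustrative sketch only: the thresholds and emoji below are hypothetical
# and are NOT FASTQE's actual mapping.

def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality_line]


def score_to_emoji(score):
    """Map a Phred score to an emoji using made-up quality bins."""
    if score >= 30:
        return "😀"  # high-confidence base calls
    if score >= 20:
        return "😐"  # acceptable base calls
    return "😡"      # low-confidence base calls


def mean_quality_emoji(quality_lines):
    """Average the scores at each read position, then pick one emoji per position."""
    decoded = [phred_scores(line) for line in quality_lines]
    longest = max(len(scores) for scores in decoded)
    emojis = []
    for pos in range(longest):
        # Reads shorter than the longest one simply do not contribute here.
        column = [scores[pos] for scores in decoded if pos < len(scores)]
        emojis.append(score_to_emoji(sum(column) / len(column)))
    return "".join(emojis)


# Two toy quality strings standing in for the fourth line of two FASTQ records.
toy_qualities = ["IIII5555!!", "IIII5555##"]
print(mean_quality_emoji(toy_qualities))  # prints 😀😀😀😀😐😐😐😐😡😡
```

In this toy input the first four positions score 40, the next four score 20, and the last two average 1, so quality visibly drops toward the end of the reads, which is the kind of pattern that trimming and filtering are meant to address.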

Notes

Changes for version 2:

  • Content was refined and edited
  • Instructor and student versions of the lesson are distinct and separate files
  • A PowerPoint slide deck now accompanies the lesson
  • Two compressed FASTQ.gz files have been included for analysis

Cite this work

Researchers should cite this work as follows: