Bioinformatics
The science of collecting and analyzing complex biological data.
The CourseSource Learning Framework for Bioinformatics listed below includes the learning goals and objectives relevant to undergraduate biological sciences majors. The framework was developed by Sally Elgin (University of Washington, St. Louis); Mark Pauley (University of Nebraska); Anne Rosenwald (Georgetown University); and Lonnie Welch (Ohio University).
A working group from the Network for Integrating Bioinformatics into Life Sciences Education (NIBLSE) recently, in April 2021, mapped the NIBLSE core competencies from their article in PLoS One onto the existing CourseSource Bioinformatics framework. The updated Bioinformatics course page along with the new framework will be available on the new CourseSource website in summer 2021. The updated CourseSource Bioinformatics framework is currently available as a Google Doc. The NIBLSE working group includes Sally Elgin (University of Washington, St. Louis); Adam Kleinschmit (University of Dubuque); Bill Morgan (College of Wooster); Mark Pauley (University of Nebraska); Anne Rosenwald (Georgetown University); Bill Tapprich (University of Nebraska); Eric Triplett (University of Florida); and Lonnie Welch (Ohio University).
Download the Bioinformatics Learning FrameworkBioinformatics Learning Framework
Society Learning Goals
Articles
Computation in the life sciences
What is the role of computation in hypothesis-driven discovery processes within the life sciences?
Sample Learning Objectives- Describe the role of bioinformatics in the scientific research method.
- Explain the necessity for computation in life sciences research.
- Explain the role of wet-bench techniques in verifying computational results in life science research.
- Compare and contrast computer-based research with wet-lab research.
- Read a scientific article and evaluate how bioinformatics methods were employed by the authors to explore a particular hypothesis.
- Given a scientific question, develop a hypothesis and define computational approaches that could be used to explore the hypothesis.
- Evaluate the social, legal, and ethical implications of computational approaches to understanding biology.
- Using Raw Sequence Data to Analyze Tumor Suppressor Mutations
- Learning R for Biologists: A Mini Course Grab-Bag for Instructors
- Does Organelle Shape Matter?: Exploring Patterns in Cell Shape and Structure with High-Throughput (HT) Imaging
- Using Bioinformatics and Molecular Visualization to Develop Student Hypotheses in a Malate Dehydrogenase Oriented CURE
- A 360˚ View of COVID-19
- Moths and Frogs and E. coli, Oh My!: Agent-based Modeling of Evolutionary Systems
- A Fun Introductory Command Line Lesson: Next Generation Sequencing Quality Analysis with Emoji!
- Mapping a Mutation to its Gene: The "Fly Lab" as a Modern Research Experience
- A CRISPR/Cas Guide RNA Design In Silico Activity
- An Introduction to Eukaryotic Genome Analysis in Non-model Species for Undergraduates: A tutorial from the Genome Consortium for Active Teaching
- Infectious Chocolate Joy with a Side of Poissonian Statistics: An activity connecting life science students with subtle physics concepts
What computational concepts are important in bioinformatics?
Sample Learning Objectives- Define the term algorithm.
- Explain the difference between a heuristic (approximate) algorithm and an exact algorithm.
- Describe the three basic programming structures: sequential, repetition (e.g., while, for) and selection (e.g., if).
- Use variables and data structures (e.g., lists, arrays, scalars, hash functions).
- Describe what a regular expression is.
- Explain the concept of cloud computing.
- Describe the importance of “big data” in bioinformatics.
- Describe the means by which “big data” are managed and stored (e.g. dmptool.org).
- Central Dogma, Dictionaries, and Functions: Using Programming Concepts to Simulate Biological Processes
- Learning R for Biologists: A Mini Course Grab-Bag for Instructors
- A Fun Introductory Command Line Lesson: Next Generation Sequencing Quality Analysis with Emoji!
- Sequence Similarity: An inquiry based and "under the hood" approach for incorporating molecular sequence alignment in introductory undergraduate biology courses
- An Introduction to Eukaryotic Genome Analysis in Non-model Species for Undergraduates: A tutorial from the Genome Consortium for Active Teaching
- Teaching RNAseq at Undergraduate Institutions: A tutorial and R package from the Genome Consortium for Active Teaching
What statistical concepts are important in bioinformatics?
Sample Learning Objectives- Calculate average, median, mode, range, standard deviations for a given data set.
- Calculate p-values using a t-test for discrete data.
- Calculate p-values using a z-test for continuous data.
- Calculate an e-value statistic.
- Describe the importance of statistical analysis of big data sets.
- Create a network to illustrate gene interactions.
- Learning R for Biologists: A Mini Course Grab-Bag for Instructors
- An Introduction to Eukaryotic Genome Analysis in Non-model Species for Undergraduates: A tutorial from the Genome Consortium for Active Teaching
- Teaching RNAseq at Undergraduate Institutions: A tutorial and R package from the Genome Consortium for Active Teaching
DNA - Information Storage [GENOMICS]
Where are data about the genome found (e.g., nucleotide sequence, epigenomics) and how are they stored and accessed?
Sample Learning Objectives- Describe how nucleotide sequence data are represented (FASTA, FASTQ, GenBank).
- Describe the nucleotide databases available at NCBI.
- Describe how the NCBI nucleotide databases intersect with other nucleotide databases (EBI, DDBJ, UniProt, etc.).
- Compare and contrast the data contained in different nucleotide databases.
- Search for a sequence record in a nucleotide database with a given accession number.
- Create a collection of nucleotide sequence records that meet a specified criterion (e.g., gene name or symbol).
- Determine the DNA methylation state of a particular region of a genome.
- Describe the types of metadata that accompany sequence data to make for useful biological interpretation (e.g. biological source, accession number, GeneID, journal articles, etc.).
- Central Dogma, Dictionaries, and Functions: Using Programming Concepts to Simulate Biological Processes
- How to Find a Gene: Retrieving Information From Gene Databases
- Single Cell Insights Into Cancer Transcriptomes: A Five-Part Single-Cell RNAseq Case Study Lesson
- A CURE for Salmonella: A Laboratory Course in Pathogen Microbiology and Genomics
- An undergraduate bioinformatics curriculum that teaches eukaryotic gene structure
- A Fun Introductory Command Line Lesson: Next Generation Sequencing Quality Analysis with Emoji!
- Mapping a Mutation to its Gene: The "Fly Lab" as a Modern Research Experience
- A CRISPR/Cas Guide RNA Design In Silico Activity
- A CURE-based approach to teaching genomics using mitochondrial genomes
- Using QIIME to Interpret Environmental Microbial Communities in an Upper Level Metagenomics Course
- Predicting and classifying effects of insertion and deletion mutations on protein coding regions
- Using computational molecular modeling software to demonstrate how DNA mutations cause phenotypes
- A Hands-on Introduction to Hidden Markov Models
- Exploration of the Human Genome by Investigation of Personalized SNPs
- Using Undergraduate Molecular Biology Labs to Discover Targets of miRNAs in Humans
How can bioinformatics tools be employed to analyze genetic information?
Sample Learning Objectives- Calculate the alignment score between two DNA sequences using a provided scoring matrix.
- Perform a BLASTN search and interpret the results.
- Explain the BLASTN algorithm for nucleotide sequence information.
- Interpret the biological significance of an e-value.
- Annotate a prokaryotic gene (derive a model).
- Annotate a eukaryotic gene (derive a model).
- Create and interpret a multiple sequence alignment (e.g., T-COFFEE, MUSCLE, etc.).
- For a genomic region of interest (e.g., the neighborhood of a particular gene), use a genome browser to view nearby genes, transcription factor binding regions, epigenetic information, etc.
- Describe Hidden Markov Models and how they can be used to assess motifs.
- How to Find a Gene: Retrieving Information From Gene Databases
- Application of a Bacterial Experimental Evolution System to Visualize and Teach Evolution in Action: A Course-Based Undergraduate Research Experience
- A 360˚ View of COVID-19
- Single Cell Insights Into Cancer Transcriptomes: A Five-Part Single-Cell RNAseq Case Study Lesson
- A CURE for Salmonella: A Laboratory Course in Pathogen Microbiology and Genomics
- An undergraduate bioinformatics curriculum that teaches eukaryotic gene structure
- A Fun Introductory Command Line Lesson: Next Generation Sequencing Quality Analysis with Emoji!
- Mapping a Mutation to its Gene: The "Fly Lab" as a Modern Research Experience
- A CRISPR/Cas Guide RNA Design In Silico Activity
- A CURE-based approach to teaching genomics using mitochondrial genomes
- Sequence Similarity: An inquiry based and "under the hood" approach for incorporating molecular sequence alignment in introductory undergraduate biology courses
- Using QIIME to Interpret Environmental Microbial Communities in an Upper Level Metagenomics Course
- An Introduction to Eukaryotic Genome Analysis in Non-model Species for Undergraduates: A tutorial from the Genome Consortium for Active Teaching
- Predicting and classifying effects of insertion and deletion mutations on protein coding regions
- Using computational molecular modeling software to demonstrate how DNA mutations cause phenotypes
- A Hands-on Introduction to Hidden Markov Models
- Exploration of the Human Genome by Investigation of Personalized SNPs
- Using Undergraduate Molecular Biology Labs to Discover Targets of miRNAs in Humans
- Homologous chromosomes? Exploring human sex chromosomes, sex determination and sex reversal using bioinformatics approaches
RNA - Information Transfer [TRANSCRIPTOMICS]
Where are data about the transcriptome found (e.g., expression, epigenomics and structure) and how are they stored and accessed?
Sample Learning Objectives- Identify the euchromatin/heterochromatin boundaries, histone states in a given sequence, and the nucleosome modifications
- Compare and contrast DNA structure at telomeres and centromeres.
- Describe the RNA databases available at NCBI (e.g., ESTs, UniGene).
- Describe the types of metadata that accompany sequence data to make for useful biological interpretation (e.g. biological source, accession number, GeneID, journal articles, etc.).
- How to Find a Gene: Retrieving Information From Gene Databases
- Single Cell Insights Into Cancer Transcriptomes: A Five-Part Single-Cell RNAseq Case Study Lesson
- An undergraduate bioinformatics curriculum that teaches eukaryotic gene structure
- Tackling "Big Data" with Biology Undergrads: A Simple RNA-seq Data Analysis Tutorial Using Galaxy
How can bioinformatics tools be employed to examine transfer of genetic information?
Sample Learning Objectives- Given a microarray or RNA-seq data file, find the set of significantly differentially expressed genes.
- Perform motif discovery on the promoter regions of a set of genes identified by a ChIP-seq experiment.
- Use RNA structure prediction programs (e.g., RNASoft, RNAFold, RNAStructure) to evaluate possible structures for an RNA sequence.
- Identify the possible different splice isoforms possible from a given gene sequence.
- Single Cell Insights Into Cancer Transcriptomes: A Five-Part Single-Cell RNAseq Case Study Lesson
- An undergraduate bioinformatics curriculum that teaches eukaryotic gene structure
- Tackling "Big Data" with Biology Undergrads: A Simple RNA-seq Data Analysis Tutorial Using Galaxy
- Teaching RNAseq at Undergraduate Institutions: A tutorial and R package from the Genome Consortium for Active Teaching
Protein - Information in Action [PROTEOMICS]
Where are data about the proteome found (e.g., amino acid sequence and structure) and how are they stored and accessed?
Sample Learning Objectives- Describe how protein sequence data are represented (e.g., FASTA, GenBank, etc.)
- Describe the different protein databases available at NCBI (sequence, structure, function).
- Describe how the NCBI databases intersect with other databases (e.g., EBI, DDBJ, UniProt, etc.).
- Compare and contrast data contained in different databases.
- Search for a protein record in a database with a given accession number.
- Create a collection of records that meet a specified criterion (e.g., gene name or symbol).
- Describe the types of metadata that accompany sequence, structure, and function data to make for useful biological interpretation (e.g. biological source, accession number, UniProt number, journal articles, etc.).
How can bioinformatics tools be employed to examine protein structure and function?
Sample Learning Objectives- Explain the BLASTP, BLASTX, tBLASTn, tBLASTx algorithms for protein sequence information
- Interpret the biological significance of an e-value.
- Describe Hidden Markov Models and how they can be used to assess motifs.
- Query a dataset with a specific protein sequence to learn about potential functions (e.g. Pfam, CDD, SwissProt, UniProt, etc.).
- View and interpret the structure output from Protein Data Bank (e.g. Cn3D, Jmol, etc.).
- Propose potential functions for a give protein structure.
- Explain the outputs from protein-folding algorithms to predict structure from sequence.
- Understand how protein structures are determined (e.g. NMR, crystallography).
- Compare and contrast the output from 2-D gel experiments.
- Analyze the output from mass spectrometry analysis (e.g., use the MASCOT package).
Small Molecules - Cellular Homeostasis [METABOLOMICS & SYSTEMS BIOLOGY]
Where are data about metabolomics and systems biology found and how are they stored and accessed?
Sample Learning Objectives- Describe how metabolomics data are represented (e.g. Human Metabolome Database, METLIN Database, etc.)
- Describe the different metabolomic databases that are available.
- Describe the types of metadata that accompany metabolomic data to make for useful biological interpretation.
How can bioinformatics tools be employed to examine flow of molecules within pathways?
Sample Learning Objectives- Perform a GO analysis to identify the pathways relevant to a set of genes (e.g., identified by a transcriptomic study or a proteomic experiment).
- Use the KEGG pathway database to look up the interaction network of a pathway.
- Interpret the data from experiments (e.g., mass spectrometry, nuclear magnetic resonance, etc.) to determine levels of small molecule metabolites.
Ecology and Evolution [METAGENOMICS]
How can bioinformatics tools be employed to examine ecological niches?
Sample Learning Objectives- Create and interpret a multiple sequence alignment (e.g., T-COFFEE, MUSCLE, etc.).
- Describe the components of a phylogenetic tree (e.g., root, node, leaf).
- Explain the various types of phylogenetic trees (e.g., rooted, unrooted).
- Interpret a phylogenetic tree (e.g., which organism is most closely related to a given organism in the tree)
- Sketch a phylogenetic tree from its Newick representation.
- Use bootstrapping to assess the quality of a phylogenetic tree.
- Create a phylogenetic tree for a set of related sequences (nucleotide or amino acid) (e.g., MEGA).
- Use pre-existing tools to analyze a metagenomic data set to determine the set of organisms present in a metagenomic sample (e.g., 16s rRNA, Greengenes, mothur, etc.)
Computational Skills
How do biologists employ software development as part of the scientific discovery process?
Sample Learning Objectives- Write a script to calculate the reverse complement of a nucleotide sequence
- Write a script to determine reading frames of a nucleotide sequence.
- Write a script to calculate melting point of double-stranded DNA.
- Write a script to retrieve the promoter regions for a set of related genes.
- Write a script to find the longest open reading frame in a given nucleotide sequence
- Write a script to calculate the reverse complement of a nucleotide sequence.
- Write a script to convert an RNA sequence to cDNA and to amino acids
- Write a script to calculate molecular weight and isoelectric point.
- Write a script to count amino acid frequency.
- Write a script that compares the relative hydrophilicity/hydrophobicity of two protein sequences.
- Central Dogma, Dictionaries, and Functions: Using Programming Concepts to Simulate Biological Processes
- Application of a Bacterial Experimental Evolution System to Visualize and Teach Evolution in Action: A Course-Based Undergraduate Research Experience
- Moths and Frogs and E. coli, Oh My!: Agent-based Modeling of Evolutionary Systems
- A CRISPR/Cas Guide RNA Design In Silico Activity
- Teaching RNAseq at Undergraduate Institutions: A tutorial and R package from the Genome Consortium for Active Teaching
What higher-level computational skills can be used in bioinformatics research?
Sample Learning Objectives- Use a spreadsheet to perform simple data analysis.
- Use a spreadsheet to open, read, parse, modify and output comma-separate (.csv) files that will be ready to use in subsequent tools.
- Perform elementary statistical analysis on an “omics” dataset (e.g. using Excel or Weka).
- Perform Input/Output with data files.
- Interact with remote servers.
- Construct a bioinformatics pipeline.
- Use open source libraries and packages (e.g., BioPerl, Biopython, R, BioConductor).
- Use programs at the Unix/Linux command line to analyze bioinformatics data.
- Use graph theory to represent data networks.
- Central Dogma, Dictionaries, and Functions: Using Programming Concepts to Simulate Biological Processes
- Learning R for Biologists: A Mini Course Grab-Bag for Instructors
- Application of a Bacterial Experimental Evolution System to Visualize and Teach Evolution in Action: A Course-Based Undergraduate Research Experience
- Moths and Frogs and E. coli, Oh My!: Agent-based Modeling of Evolutionary Systems
- Single Cell Insights Into Cancer Transcriptomes: A Five-Part Single-Cell RNAseq Case Study Lesson
- A CURE for Salmonella: A Laboratory Course in Pathogen Microbiology and Genomics
- A Fun Introductory Command Line Lesson: Next Generation Sequencing Quality Analysis with Emoji!
- A CRISPR/Cas Guide RNA Design In Silico Activity
- An Introduction to Eukaryotic Genome Analysis in Non-model Species for Undergraduates: A tutorial from the Genome Consortium for Active Teaching
- Teaching RNAseq at Undergraduate Institutions: A tutorial and R package from the Genome Consortium for Active Teaching