The bioinformatics competencies that NIBLSE recommends as essential for undergraduate life sciences students are listed below. The competencies are informed by the results of the national NIBLSE survey, analysis of ninety syllabi with bioinformatics content, and the cumulative expertise and experience of the authors. Following each competency is a list of three representative examples illustrating the competency.
A paper describing these core competencies was recently published in PLOS ONE.
C1. Explain the role of computation and data mining in addressing hypothesis-driven and hypothesis-generating questions within the life sciences. Life sciences students should have a clear understanding of the role computing and data mining play in modern biology. Given a traditional hypothesis-driven research question, students should have ideas about what types of data and software exist that could help them answer the question quickly and efficiently. They should also appreciate that mining large datasets can generate novel hypotheses to be tested in the lab or field.
C2. Summarize key computational concepts, such as algorithms and relational databases, and their applications in the life sciences. To make use of sophisticated software and database tools, students should have a basic understanding of the principles upon which these tools are based and should be exposed to how these tools work.
C3. Apply statistical concepts used in bioinformatics. In addition to the basic statistics found in many biology curricula, modern life scientists should have an understanding of the statistics of large datasets and multiple comparisons.
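As one illustration of the multiple-comparisons issue named in C3, the sketch below adjusts a small list of p-values with the Bonferroni and Benjamini-Hochberg procedures. The p-values are made up for illustration, and the function names are hypothetical; this is a minimal teaching sketch, not a substitute for a statistics library.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values from five hypothetical tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
alpha = 0.05

raw_hits = sum(p < alpha for p in p_values)
bonf_hits = sum(p < alpha for p in bonferroni(p_values))
bh_hits = sum(p < alpha for p in benjamini_hochberg(p_values))
print(raw_hits, bonf_hits, bh_hits)
```

Four of the five raw p-values fall below 0.05, but fewer remain below it after either adjustment, which is the point students should take away when many tests are run on one dataset.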
C4. Use bioinformatics tools to examine complex biological problems in evolution, information flow, and other important areas of biology. This competency is written broadly so as to encompass a variety of problems that can be addressed using bioinformatics tools, such as understanding the evolutionary underpinnings of sequence comparison and homology detection; distinguishing between genomic sequences, RNA sequences, and protein sequences; and interpreting phylogenetic trees. “Complex” biological problems require that students be able to work through a problem with multiple steps, not just perform isolated tasks.
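The task of distinguishing DNA, RNA, and protein sequences mentioned in C4 can be sketched as a simple alphabet check. The function name `guess_sequence_type` and the example strings are hypothetical, and the rule is only a heuristic: a short string such as "ACG" is valid in all three alphabets, and this sketch reports it as DNA.

```python
# Alphabets used by the heuristic; N stands for an unknown nucleotide.
DNA_LETTERS = set("ACGTN")
RNA_LETTERS = set("ACGUN")
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def guess_sequence_type(seq):
    """Guess whether a sequence is DNA, RNA, or protein from its letters."""
    letters = set(seq.upper())
    if not letters:
        return "unknown"
    # Check the most restrictive alphabets first.
    if letters <= DNA_LETTERS:
        return "DNA"
    if letters <= RNA_LETTERS:
        return "RNA"
    if letters <= PROTEIN_LETTERS:
        return "protein"
    return "unknown"


for example in ["ATGGCCTTA", "AUGGCCUUA", "MKTAYIAK", "ATGX"]:
    print(example, guess_sequence_type(example))
```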
C5. Find, retrieve, and organize various types of biological data. Given the numerous and varied datasets currently being generated from all of the ‘omics fields, students should develop the facility to: identify appropriate data repositories; navigate and retrieve data from these databases; and organize data relevant to their area of study in flat files or small local stand-alone databases.
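The "small local stand-alone database" mentioned in C5 could look something like the sketch below, which uses Python's built-in `sqlite3` module. The table name, column names, and every record are invented placeholders for illustration; real data would come from a repository download.

```python
import sqlite3

# Placeholder records: (accession, organism, gene, length in base pairs).
# All values are invented for illustration only.
records = [
    ("ACC0001", "Organism A", "geneA", 1200),
    ("ACC0002", "Organism A", "geneB", 950),
    ("ACC0003", "Organism B", "geneA", 1185),
]

# An in-memory database keeps the example self-contained;
# pass a file name instead of ":memory:" to keep the data on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE genes (
           accession TEXT PRIMARY KEY,
           organism  TEXT NOT NULL,
           gene      TEXT NOT NULL,
           length_bp INTEGER
       )"""
)
conn.executemany("INSERT INTO genes VALUES (?, ?, ?, ?)", records)
conn.commit()

# Retrieve the genes for one organism, longest first.
rows = conn.execute(
    "SELECT gene, length_bp FROM genes WHERE organism = ? ORDER BY length_bp DESC",
    ("Organism A",),
).fetchall()
print(rows)
conn.close()
```

Using the accession as the primary key prevents the same record from being loaded twice, which is a common problem when data are gathered from several downloads.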
C6. Explore and/or model biological interactions, networks and data integration using bioinformatics. Modeling of biological systems at all levels, from cellular to ecological, is being facilitated by technological and algorithmic advances. These models provide novel insights into the perturbations in systems that can cause disease, interactions of microbes with various eukaryotic systems, how metabolic networks respond to environmental stresses, etc. Students should be familiar with the techniques used to generate these analyses and should be able to interpret the outputs and use the data to generate novel hypotheses.
C7. Use command-line bioinformatics tools and write simple computer scripts. Most biological datasets (e.g., genomic and proteomic sequences, BLAST results, RNA-Seq and resulting differential expression data) are available as text files; the most powerful and dynamic way to interact with these datasets is through the command line or shell scripting. Students should be able to manipulate their own data and to create and modify complex data processing and analysis workflows.
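A simple script of the kind C7 describes might read a FASTA-formatted text file and summarize each sequence. The sketch below is one possible version: the helper names `read_fasta` and `gc_content` are hypothetical, and the two sequences are made up and supplied through an in-memory text buffer so the example runs without an input file.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Made-up FASTA text standing in for a real file opened with open(path).
example = io.StringIO(
    ">seq1 made-up example\nATGCGC\nGGAT\n>seq2 made-up example\nATATAT\n"
)

summary = [
    (header.split()[0], len(seq), round(gc_content(seq), 2))
    for header, seq in read_fasta(example)
]
for name, length, gc in summary:
    print(name, length, gc)
```

Because `read_fasta` accepts any iterable of lines, the same function works on a file handle, which makes it easy to reuse inside a larger analysis workflow.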
C8. Describe and manage biological data types, structure, and reproducibility. This competency addresses two distinct concerns: 1) each of the varied ‘omics fields produces data in formats particular to its needs, and these formats evolve with changes in technologies and refinements in downstream software; and 2) all experimental data is subject to error and the user must be cognizant of the need to verify the reproducibility of their data. Students need to develop an awareness of different data types, and the ability to manipulate them, given the versioning of formats. They also need to exercise caution, to carry out appropriate statistical analyses on their data as part of normal operating procedures and report the uncertainty of their results, and to provide the relevant information to enable reproduction of their results.
C9. Interpret the ethical, legal, medical, and social implications of biological data. The increasing scale and penetration of human genetic and genomic data has greatly enhanced our ability to identify disease-related loci, druggable targets, etc., and to identify potential genes for replacement with developing techniques. However, with this information also come many ethical, legal, and social questions; suggested resolutions are often outpaced by the technological advances. As part of their scientific training, students should debate the medical, societal, and ethical implications of these information sets and techniques.