Collections

Data Science for Undergraduates: Opportunities and Options

Data science is emerging as a field that is revolutionizing science and industries alike. Work across nearly all domains is becoming more data driven, affecting both the jobs that are available and the skills that are required. As more data and ways of analyzing them become available, more aspects of the economy, society, and daily life will become dependent on data. It is imperative that educators, administrators, and students begin today to consider how to best prepare for and keep pace with this data-driven era of tomorrow. Undergraduate teaching, in particular, offers a critical link in offering more data science exposure to students and expanding the supply of data science talent.

Data Science for Undergraduates: Opportunities and Options offers a vision for the emerging discipline of data science at the undergraduate level. This report outlines some considerations and approaches for academic institutions and others in the broader data science communities to help guide the ongoing transformation of this field.

Suggested citation: National Academies of Sciences, Engineering, and Medicine. 2018. Data Science for Undergraduates: Opportunities and Options. Washington, DC: The National Academies Press. https://doi.org/10.17226/25104.

Alycia Crall onto Publications

Phylogeny.fr: Robust Phylogenetic Analysis For The Non-Specialist

Phylogeny.fr is a free, simple to use web service dedicated to reconstructing and analysing phylogenetic relationships between molecular sequences.

Phylogeny.fr runs and connects various bioinformatics programs to reconstruct a robust phylogenetic tree from a set of sequences.

Drew LaMar onto Molecular BIRDD Data

Island Summary Dataset

Explore in Radiant

Note: You must be a member of the BIRDD group to explore this data in Radiant.

Drew LaMar onto Island and Habitat BIRDD Data

R script for combining site-month NEON data files

This link goes to the Code Resources page, which hosts the Super Easy R Stacker Script For Beginners. With only 3 lines of code, and no need to mess with file paths, the script helps non-R users and early R users combine the site-month NEON data files downloaded from the NEON data portal (data.neonscience.org).

It combines multiple months and sites of NEON data downloaded as a .zip file from the portal.

This script is only for NEON data products that are delivered as zipped folders of .csv files (most OS and IS data products but not AOP data products). 
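
As a rough illustration of the stacking idea (not the R script itself), the following Python sketch reads every .csv inside a set of zipped folders and concatenates files that share the same column header. The function name stack_csvs_from_zips and the choice to group files by their header row are assumptions made for this example; the actual script may organize tables differently.

```python
import csv
import io
import zipfile
from collections import defaultdict


def stack_csvs_from_zips(zip_paths):
    """Stack rows from every .csv inside the given zips, grouped by header.

    Hypothetical helper: files whose header rows match exactly are treated
    as the same table and their data rows are concatenated in input order.
    """
    stacked = defaultdict(list)  # header tuple -> list of data rows
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            for name in archive.namelist():
                if not name.lower().endswith(".csv"):
                    continue  # ignore readme files, XML metadata, etc.
                with archive.open(name) as raw:
                    text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                    reader = csv.reader(text)
                    header = next(reader, None)
                    if header is None:
                        continue  # skip empty files
                    stacked[tuple(header)].extend(reader)
    return dict(stacked)
```

Each key in the returned dictionary is one table's header, and each value holds the combined rows from all months and sites that shared that header.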

Island and Habitat BIRDD Data

Nicole Chodkowski onto Island and Habitat BIRDD Data

Molecular BIRDD Data

Nicole Chodkowski onto Molecular BIRDD Data

Morphology Dataset

This dataset contains measurements of over 6500 specimens made by David Lack. These data were transcribed from records deposited at the California Academy of Sciences.

Explore in Radiant

Note: You must be a member of the BIRDD group to explore this data in Radiant.

Citation

  • Lack DL (1947) Darwin's finches: an essay on the general biological theory of evolution, Cambridge University Press.
  • Lack DL (1947) Data from: Darwin's finches: an essay on the general biological theory of evolution. Dryad Digital Repository. https://doi.org/10.5061/dryad.150

Nicole Chodkowski onto Morphology BIRDD Data

BIRDD Morphology Data

Nicole Chodkowski onto Morphology BIRDD Data

Datamethods Discussion Forum

This is a place where statisticians, epidemiologists, informaticists, machine learning practitioners, and other research methodologists communicate with one another and with clinical, translational, and health services researchers to discuss issues related to data...

Created by Frank Harrell

Road Map for Choosing Between Statistical Modeling and Machine Learning

from the Statistical Thinking blog by Frank Harrell

Drew LaMar onto Machine Learning

Error-Discovery Learning Boosts Student Engagement and Performance, while Reducing Student Attrition in a Bioinformatics Course

This paper describes an innovative way of teaching problem solving using inquiry learning methods.

ABSTRACT

We sought to test a hypothesis that systemic blind spots in active learning are a barrier both for instructors—who cannot see what every student is actually thinking on each concept in each class—and for students—who often cannot tell precisely whether their thinking is right or wrong, let alone exactly how to fix it. We tested a strategy for eliminating these blind spots by having students answer open-ended, conceptual problems using a Web-based platform, and measured the effects on student attrition, engagement, and performance. In 4 years of testing both in class and using an online platform, this approach revealed (and provided specific resolution lessons for) more than 200 distinct conceptual errors, dramatically increased average student engagement, and reduced student attrition by approximately fourfold compared with the original lecture course format (down from 48.3% to 11.4%), especially for women undergraduates (down from 73.1% to 7.4%). Median exam scores increased from 53% to 72–80%, and the bottom half of students boosted their scores to the range in which the top half had scored before the pedagogical switch. By contrast, in our control year with the same active-learning content (but without this “zero blind spots” approach), these gains were not observed.

Bioinformatics Module I

This lab is the beginning of a 12-week project during which students will clone a gene from the ciliate Tetrahymena thermophila, fuse it to the gene encoding GFP, put the engineered gene back into Tetrahymena, and induce its expression. The gene will be cloned by PCR, so the first step is to design primers to allow the amplification of the desired gene. This lab will take students through the steps to do that, and will also demonstrate the use of other bioinformatics tools.
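
Primer design for PCR can be sketched in a few lines. The Python example below is a deliberately naive illustration, not the procedure used in the lab: it takes the first and last bases of a template as the primer-binding sites and estimates melting temperature with the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough guide for short oligos. The function names and the default primer length of 20 are assumptions for this example.

```python
# Complement table for the four DNA bases (other characters pass through).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Rough melting temperature in degrees C using the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def naive_primer_pair(template, length=20):
    """Pick primers from the two ends of the template (illustrative only)."""
    template = template.upper()
    forward = template[:length]
    # The reverse primer anneals to the top strand's 3' end, so it is the
    # reverse complement of the last `length` bases.
    reverse = reverse_complement(template[-length:])
    return forward, reverse
```

Real primer design also weighs GC content, secondary structure, and specificity against the rest of the genome, which is why dedicated bioinformatics tools are normally used.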

Bioinformatics Module II

This lab is the beginning of a 13-week project in which students will engineer a construct to knock out a gene from the ciliate Tetrahymena thermophila. The first step is to identify an appropriate gene homolog in Tetrahymena using the databases. Students will then design primers to amplify the gene by PCR and clone it into a plasmid vector.

Gene Expression Profiling

In this module, students assess the relative amount of expression of a gene of interest throughout different stages in the Tetrahymena life cycle. The life cycle is complex and involves a number of physiological changes and DNA processing events (see Tetrahymena Facts). Gene expression is evaluated through assessing the relative production of gene transcripts at different time points in the life cycle by reverse transcriptase PCR, yielding an "expression profile". (Estimated time: 5 x 4-hour laboratory periods.)

GFP Tagging Module

This module was developed by Douglas Chalker, Washington University, MO. Students engineer genetic constructs for the tagging and expression of a Tetrahymena protein of interest. Fluorescence microscopy is used to determine the localization of the tagged protein. (Estimated time: 9 x 4-hour laboratory periods.)

Knockout Construction (KOC) Module

In this module, students engineer genetic constructs to delete individual genes from the Tetrahymena genome. (Estimated time: 10 x 4-hour laboratory periods.)

SUPRDB

The Student / Unpublished Results database (SUPRDB) provides a workspace where researchers can share unpublished experimental results with each other. Many of the results in the database are produced by student researchers, and the results are linked to the Tetrahymena Genome Database to enable sharing of the results with the larger Tetrahymena research community.

Tetrahymena Genome Database

The Tetrahymena Genome Database (TGD) is a wiki for the Tetrahymena research community. It provides a BLAST service, GBrowse genome browsers, and information on Tetrahymena genes and proteins.

RNA-Seq Analysis GCAT-SEEK Workshop Manual

Developed by Dr. Mark Peterson, this module provides an overview of the different types of RNA-Seq data analyses. It describes how to assess the quality of RNA-Seq data with FastQC, trim adapters and low-quality data with Trimmomatic, map reads with RSEM, analyze differential expression with DESeq, detect variants with VarScan, and perform de novo assembly with Trinity.

CourseSource Publication

Peterson, M.P., Malloy, J.T., Buonaccorsi, V.P., and Marden, J.H. 2015. Teaching RNAseq at Undergraduate Institutions: A tutorial and R package from the Genome Consortium for Active Teaching. CourseSource. https://doi.org/10.24918/cs.2015.14

GCAT-SEEK Eukaryotic Genomics Workshop Manual

Developed by Dr. Vince Buonaccorsi at Juniata College, the GCAT-SEEK Eukaryotic Genomics Workshop Manual consists of seven chapters that cover genome assembly, genome annotation, and variant calling:

  1. Genome assembly I: Quality control with FastQC and Trimmomatic
  2. Genome assembly II: Assembly size estimation and k-mer graphs
  3. Genome assembly III: Assembly algorithms
  4. Genome annotation with Maker I: Overview and repeat finding
  5. Genome annotation with Maker II: Whole genome analysis
  6. Whole genome annotation: Miscellaneous methods
  7. SNP calling and interpretation

CourseSource Publication

Buonaccorsi, V.P., Hamlin, D., Fowler, B., Sullivan, C., and Sickler, A. 2017. An Introduction to Eukaryotic Genome Analysis in Non-model Species for Undergraduates: A tutorial from the Genome Consortium for Active Teaching. CourseSource. https://doi.org/10.24918/cs.2017.1

Comprehensive list of color palettes available in R

Drew LaMar onto Data Visualization

A Wes Anderson color palette for R

Drew LaMar onto Data Visualization