ICYMI: Report from the 2018 Two-Year College Data Science Summit
In May 2018, the American Statistical Association hosted the Two-Year College Data Science Summit in Arlington, VA. With primary funding from the National Science Foundation and additional support from Booz Allen Hamilton, the summit brought together 72 educators, researchers and practitioners in statistics, mathematics, computer science, and data science. Participants included faculty from two-year and four-year colleges and representatives from industry, government and non-profits. The summit's primary goal was to develop curricular guidelines to assist two-year colleges in establishing and maintaining data science programs.
The final report frames the summit by explaining why data science matters:
Participants were asked to consider data science as a developing field that merges the disciplines of statistics, mathematics, and computer science in order to facilitate the ability to draw meaning and understanding from data. The ubiquity of data as well as the complexity and scale of these data drives a need for a workforce that can safely and securely store, maintain and provide data; that can access data from a variety of sources and prepare it for analysis; that can find meaningful patterns in large and complex data sets and communicate these findings along with the data limitations to diverse communities; and that can scale algorithms for data discovery, classification, and prediction. Equally important is the growth of a citizenry that is aware not only of the role that data play in a democracy, but also of the need to maintain and protect security and privacy.
As of the summit in 2018, the report identified 11 certificate programs at two-year colleges, one "direct to workplace" associate degree program, and six associate degree programs intended for transfer to four-year colleges or universities.
The report makes seven recommendations, many of which align closely with BioQUEST's history of engagement in quantitative biology and with current conversations about the importance of data science for undergraduate biology education:
Recommendation 1: Create courses that provide students with a modern and compelling introduction to statistics that, in addition to traditional topics in inferential statistics, includes exploratory data analysis, the use of simulations, randomization-based inference, and an introduction to confounding and causal inference.
Recommendation 2: Ensure that students have ample opportunities to engage with realistic problems using real data so that they see statistics as an important investigative process useful for problem solving and decision-making.
Recommendation 3: Explore ways of reducing mathematics as a barrier to studying data science while addressing the needs of the target student populations and ensuring appropriate mathematical foundations. Consider a "math for data science" sequence which emphasizes applications and modeling.
Recommendation 4: Design courses so that students solve problems that require both algorithmic and statistical thinking. This includes frequent exposure to realistic problems that require engaging in the entire statistical investigative process and are based on real data.
Recommendation 5: All programs should (a) expose students to technology tools for reproducibility, collaboration, database query, data acquisition, data curation, and data storage; (b) require students to develop fluency in at least one programming language used in data science and encourage learning a second language.
Recommendation 6: Ethical issues and approaches should be infused throughout the curriculum in any program of data science.
Recommendation 7: Whenever possible, classroom pedagogy should foster active learning and use real data in realistic contexts and for realistic purposes. Programs should consider portfolios as summative and formative assessment tools that both improve and evaluate student learning.