Using DataCamp to help teach data science in a biostatistics class
A large part of science can be described as the process of extracting information from the world, through observation and measurement, in the form of data. Even with in-silico models, we derive analytic information through mathematical analysis or record "measurements" from the resulting simulations. Statistics or machine learning can then be applied to these results to extract the relevant information.
I have been teaching an Introduction to Biostatistics course at the College of William and Mary for the past 4 years. The focus of my course is "data", which allows me to organize the course around the following questions:
- How do we design experiments that yield data about the real world that serve as the most effective evidence for our specific hypothesis or question?
- With this data in hand, what skills do we need to learn in order to store, manipulate and visualize the data?
- What statistical tools/techniques should we use to extract the maximal amount of relevant information from our data?
These questions can be visualized as three points on the following "data science triangle":
It is important to mention here that the image above is not meant to imply that these three areas are siloed or carried out in isolation. For example, an effective practicing scientist will have deep knowledge of the statistical procedures they will use and the data they will collect while they are designing their experiments. All of this leads me to the following statement: data is central to science and statistics, and leaving out two of the three points of the data science triangle (design and basic data handling skills) in a biostatistics course does our students a disservice.
There are many interesting opportunities and potential challenges associated with incorporating more experimental design and data skills into a biostatistics course (stay tuned!). For example, should a biostatistics course have a separate lab for data skills? Should students design their own experiments and collect their own data? Sidestepping these extremely important discussions for now, let's focus on the following question: where can faculty go for resources and training in data skills, and how can those skills be brought into the classroom? Two entities that immediately come to mind are Data Carpentry and DataCamp. There are, of course, plenty of others; feel free to mention your favorites in the comments! The remainder of this post introduces DataCamp and its offerings, but I encourage everyone to also take a look at Data Carpentry!
DataCamp is an online learning platform with courses that teach data science in R and Python. These languages are easily the front-runners of the data science world, and knowing at least one of them is, in my opinion, a huge asset to your students' future success and employability. DataCamp runs on a freemium model: there are two entirely free courses, Intro to Python for Data Science and Introduction to R, and the first chapter of many (if not all) of their premium courses is free. Access to all course content costs $29/month (or $300/year). This can be pricey for a student, especially if access is needed for an entire semester. While lab fees can be used to offset this cost, DataCamp has done something pretty awesome: all of its content is free for educational use! To learn more about their educational offerings, see DataCamp's website. One aspect that might be problematic is a stated minimum of 10 students per class, although I am unsure how strictly this threshold is enforced. If you have fewer than 10 students in your class, I would not hesitate to contact DataCamp support and ask. I've interacted with their support team on multiple occasions and they are fantastic!
So what courses do my students take? Below are clickable DataCamp badges corresponding to each course. Remember, the first chapter of each course is free, so go check them out!
What has been your experience with DataCamp? Do you teach data management skills? What else would you like to know or share about integrating experimental design, data science and biostatistics in a single course? Leave us a comment below!