
Today’s investigation

Sampling subsets of populations is a basic practice in biology; after all, we can’t measure every member of a population. Biologists are constrained by logistics, time, and funding, and thus rely on measuring samples. Because the features of a sample are not identical to those of the whole population, measurements made from a sample are affected by who gets sampled and who does not. Even more challenging, measurements from samples also depend on who is doing the sampling (us!). Today, we will use digital image analysis of carnivore skulls from the CSU Long Beach CNSM Vertebrate Collections to explore some of the factors that give rise to sampling error and, consequently, reduce measurement accuracy and precision.

Introduction

In this lab, we will explore the difference between measurement accuracy and precision while getting some hands-on experience in digital image analysis. A fundamental task for biologists is to minimize both sampling error (the difference between our estimate and the true population parameter due to chance) and bias (a systematic discrepancy between our estimate and the true population parameter). One set of estimates can have high accuracy but low precision, while another can be inaccurate but very precise. As you might guess, we need both high accuracy and high precision to get close to the truth. So, how can we determine the accuracy and precision of our measurements and avoid error and bias?

Today, we will sample carnivore skulls and measure canine length as a linear distance on digital images to quantify measurement accuracy and precision, and to contrast sampling error with bias in measurement recording (Figure 1). Digital image analysis can increase precision (e.g., through high resolution), but it can also be a source of bias (e.g., curvature of the visual field caused by the lens). The imaging process is not the only thing that affects our estimates: we ourselves can be a source of error and bias through poor scaling or other visual limitations. So, let’s first explore and quantify the accuracy and precision of our own estimates; then we will analyze the entire class’s data to investigate how sample size affects accuracy and precision.

Figure 1. Coyote skull (left) and upper right canine (right), CSU Long Beach CNSM Vertebrate Collection.

Upon completion of this lab, you should be able to:

Quantify the accuracy of measurements;
Quantify the precision of measurements;
Compare and contrast sampling error and bias in measurement recording.

Worked example

To get started, let’s review the definitions of sample, true value, accuracy, and precision:

sample - a subset of individuals drawn from a population
true value - the actual population value that we would obtain if we measured every individual in the population with no error or bias
accuracy - the closeness of the measurements to a particular value, such as the true value
precision - the closeness of the measurements to each other

By definition, accuracy and precision are independent of each other, but both affect the conclusions of any investigation. Say that you recorded nine measurements (Figure 2, orange dots) ranging from 7.1 to 11.2 units, with a mean value of 8.3 units. The difference between the maximum and minimum values (the range) is an estimate of precision, which in this case is 11.2 - 7.1 = 4.1 units (Figure 2, blue line). We could therefore report a mean of 8.3 units with a range of 4.1 units. This says nothing about your measurement accuracy. Here, the true value is 3 units (Figure 2, black dot), which is 4.1 units away from the minimum value (7.1 units), 8.2 units away from the maximum value (11.2 units), and 5.3 units away from the mean value (8.3 units). So, with the available information, we can argue that the measurements are relatively precise but highly inaccurate (no single measurement is close to the true value).
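The arithmetic in this example can be reproduced in R. This is a minimal sketch that uses only the summary values quoted above (minimum, maximum, mean, and true value); the nine individual measurements are not listed in the text, so they are not reproduced here. The object name true_value is chosen so it does not clash with the TV object used later in the lab.

```r
# Summary values from the worked example (Figure 2)
min_value  <- 7.1   # smallest measurement
max_value  <- 11.2  # largest measurement
mean_value <- 8.3   # mean of the nine measurements
true_value <- 3     # true value (black dot in Figure 2)

# Precision: the range (max - min); a smaller range means higher precision
precision <- max_value - min_value
precision

# Distances from the true value; smaller distances mean higher accuracy
abs(min_value - true_value)   # minimum vs. true value
abs(max_value - true_value)   # maximum vs. true value
abs(mean_value - true_value)  # mean vs. true value (our accuracy estimate)
```

Note that the range (4.1 units) is smaller than the distance between the mean and the true value (5.3 units), which is why the measurements are described as relatively precise but inaccurate.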

Figure 2. Measurement accuracy and precision relative to the true value.

Materials and Methods

R and RStudio
ImageJ
Spreadsheet (e.g., Excel, Google Sheets)
Digital images of carnivore skulls (coyote, bobcat, kangaroo rat)

Today’s activity, Sampling carnivore canines, is organized into two main exercises exploring accuracy and precision in measurements. First, you will generate data by analyzing digital images of carnivore skulls. Second, you will analyze the data to determine accuracy and precision and discuss potential sources of sampling error and bias.

Sampling carnivore canines

1. Digital image analysis

You will be assigned a set of images of one of three species (bobcat, coyote, or kangaroo rat) for digital analysis.

A. If you have not done so already, download and install ImageJ.

B. Open a new spreadsheet and name it LastnameFirstname_Species.

Step 1: Create the following header of 4 columns: “image_number” “specimen_id” “tooth_side” “length_cm”.
Step 2: Fill in the information in each row. The information for each column (except length_cm) can be found in the image file name. For example, the name of the first image of the bobcat specimen is “1-BCat_L21_left”. Thus, the image_number is “1”, the specimen_id is “BCat_L21”, and the tooth_side is “left”. The length_cm column will hold the canine length measurement you will make later.
Step 3: Save this file and keep it open; you will use it to record all measurements. Make sure it is saved as a comma-separated values file with the extension “.csv”.
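Because every column except length_cm can be read from the file name, the spreadsheet can also be started in R. This is a minimal sketch, assuming every file name follows the pattern of the example above (number, hyphen, specimen ID, underscore, tooth side) with no file extension; the second file name and the temporary output path are hypothetical placeholders.

```r
# File names following the pattern "<image_number>-<specimen_id>_<tooth_side>"
# (the second name is hypothetical, for illustration only)
files <- c("1-BCat_L21_left", "2-BCat_L21_right")

# Regular expression with three capture groups:
# leading digits, then everything up to the last underscore, then left/right
pattern <- "^([0-9]+)-(.+)_(left|right)$"

canine_sheet <- data.frame(
  image_number = as.integer(sub(pattern, "\\1", files)),
  specimen_id  = sub(pattern, "\\2", files),
  tooth_side   = sub(pattern, "\\3", files),
  length_cm    = NA_real_  # filled in after measuring in ImageJ
)
canine_sheet

# Save as a .csv file without R's row names
# (replace tempfile() with your own file name, e.g. "LastnameFirstname_Species.csv")
out_file <- tempfile(fileext = ".csv")
write.csv(canine_sheet, out_file, row.names = FALSE)
```

If your file names include an extension such as .jpg, remove it first (for example with tools::file_path_sans_ext()) so the pattern still matches.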

C. Open ImageJ. You will repeat the following steps for each image.

Step 1: Select the Straight line tool in the ImageJ toolbar.

D. Select an image for analysis. In the main menu bar, go to File -> Open and select an image. You should analyze a total of 6 images. These images are duplicates, so you need to measure them in the order given (1 to 6) to avoid introducing additional bias. Note that you can zoom in and out of the image using standard shortcuts or Image -> Zoom.

Step 1: Set the image scale. Using the ruler in the image, drag the cursor from one hatch mark to the next so that the line spans 1 cm (see image below).
Step 2: Go to Analyze -> Set Scale. In “Known distance”, type “1”, and in “Unit of length”, type “cm”. Leave the other fields as they are and click “OK”. This step is very important: you are assigning the reference that the software will use to convert your measurements from pixels to centimeters. You have to repeat Steps 1 and 2 of Part D for each image. You are now ready to measure!
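Behind Set Scale is a simple proportion: the length of the reference line in pixels divided by the known distance gives pixels per centimeter, and every later length in pixels is divided by that number. The sketch below mimics this idea with hypothetical pixel counts (px_to_cm() is an illustrative helper, not an ImageJ function). It also shows why a carelessly drawn 1 cm reference line biases every measurement from that image in the same direction.

```r
# Convert a length in pixels to centimeters, given a reference line of
# known length (mirrors the idea behind Analyze -> Set Scale)
px_to_cm <- function(length_px, ref_px, ref_cm = 1) {
  px_per_cm <- ref_px / ref_cm  # pixels per centimeter
  length_px / px_per_cm         # length in centimeters
}

# Hypothetical numbers: a canine spanning 300 pixels
canine_px <- 300

# Reference line drawn exactly over 1 cm (120 pixels in this example)
px_to_cm(canine_px, ref_px = 120)

# Reference line drawn slightly too short (110 pixels):
# the same tooth now appears longer, a systematic error (bias)
px_to_cm(canine_px, ref_px = 110)
```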

E. Measure the canine length in cm in each image.

Step 1: Drag the cursor to draw a line from the tip of the tooth to the uppermost point of its base, where it attaches to the skull.
Step 2: With the line drawn, go to Analyze -> Measure. A Results window should appear with multiple measurements, including “Length” (see figure below). Length is the only measurement of interest for this exercise.
Step 3: Go to your spreadsheet and record the length measurement from each image. You should finish with a total of six measurements. Don’t forget to include the specimen ID and tooth side you just measured!

Info-Box! Before any data analysis, there is a very important step: data management. Through data management, we arrange our dataset appropriately for analysis. Here are some general rules for good data management.

Rules for good spreadsheets:

Variables are assigned to columns
Observations (individuals) are assigned to rows
Avoid empty rows, columns, or cells
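Once the spreadsheet is imported into R, empty cells can be spotted quickly. This is a minimal sketch with a hypothetical helper, count_empty(), and a small made-up data frame. When reading a .csv, read.csv() turns blank cells in numeric columns into NA but leaves blank cells in text columns as empty strings (""), so the helper counts both.

```r
# Count empty cells (NA or "") in each column of a data frame
count_empty <- function(df) {
  empty <- is.na(df) | df == ""  # logical matrix: TRUE where a cell is empty
  colSums(empty)                 # number of empty cells per column
}

# Hypothetical example: one blank tooth side and one missing length
example <- data.frame(
  image_number = c(1, 2),
  specimen_id  = c("BCat_L21", "BCat_L21"),
  tooth_side   = c("left", ""),
  length_cm    = c(2.5, NA)
)
count_empty(example)
```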

2. Data analysis

Although we only have a sample size of three (i.e., three images per canine), let’s practice calculating summary statistics in R. Note: the example below uses the column headers suggested in Part 1, Step B.

A. Import your data into RStudio. For simplicity, let’s call the data object “canine”. Hint: review the Chapter 1 script!

# Importing data (replace "canine.csv" with the name of your file, e.g., LastnameFirstname_Species.csv)
canine <- read.csv("canine.csv", header = TRUE)

# viewing the data
canine

Questions:

What does each column in “canine” represent?
What does each row in “canine” represent?

B. Summarize each variable in your data. For that, we can use the function summary().

# summarizing the data
s <- summary(canine)
s

Questions:

What does each column in your summary represent?
Do you have the expected sample size per canine?

C. Summarize the data across both teeth. As in any language, there are many ways to “say the same thing” in R; that is, other functions can produce the same data summary. For instance, to get the minimum value of a column in your data, you can use the function min(), but you need to explicitly indicate the column of interest with the $ operator (read as “within”):

# minimum length value 
m1 <- min(canine$length_cm)
m1

# maximum length value
m2 <- max(canine$length_cm)
m2

# mean length value
m3 <- mean(canine$length_cm)
m3

# median length value
m4 <- median(canine$length_cm)
m4
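To check that these functions “say the same thing” as summary(), you can compare their results on a single column. This is a minimal sketch using a hypothetical vector of three lengths in place of canine$length_cm; the summary is stored as s_len so it does not overwrite the object s created in Step B.

```r
# Hypothetical canine lengths (cm); in the lab, use canine$length_cm instead
length_cm <- c(2.41, 2.38, 2.45)

s_len <- summary(length_cm)
s_len

# Each element of the summary matches the corresponding function
all.equal(s_len[["Min."]],   min(length_cm))
all.equal(s_len[["Median"]], median(length_cm))
all.equal(s_len[["Mean"]],   mean(length_cm))
all.equal(s_len[["Max."]],   max(length_cm))
```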

D. Estimating accuracy and precision. Suppose we know the true value of mean canine length in the entire population; call it “TV” (provided by the instructor). The accuracy of our estimate can be approximated as the absolute value of the difference between our mean estimate and TV; the smaller this difference, the more accurate the estimate. Precision, on the other hand, is calculated as the difference between the maximum and minimum values (the range); the smaller the range, the more precise the measurements.
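Written as formulas, with $\bar{x}$ the mean of your measurements and $x_{\min}$, $x_{\max}$ the smallest and largest values, the two quantities computed below are:

$$
\text{accuracy} = \lvert \bar{x} - TV \rvert, \qquad \text{precision} = x_{\max} - x_{\min}
$$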

# accuracy (TV must already exist: first assign the true value from your instructor, e.g., TV <- value)
a <- abs(m3 - TV)
a

# precision
p <- m2-m1
p

Questions:

Are your measurements accurate?
Are your measurements precise?

E. Estimating accuracy and precision for the left canine. To avoid errors due to natural differences between the left and right teeth, let’s now repeat the data summary for the left canine only. This involves a bit of data management. Let’s use the tidyverse package to filter by tooth side and estimate the summary. Hint: check out the Chapter 1 exercise script!

# loading the package
library(tidyverse)

# viewing the data
canine

# filtering by the left canine (the column is named tooth_side)
left <- filter(canine, tooth_side == "left")
left

# summary
summary(left)

# precision
max(left$length_cm) - min(left$length_cm)

# accuracy
abs(mean(left$length_cm)-TV)
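The same subset can be made without tidyverse, another case of “saying the same thing” in R. This is a minimal base-R sketch using a small hypothetical data frame (example_canine, with made-up lengths) that follows the suggested column names, so it does not overwrite your own canine object.

```r
# Hypothetical data with the suggested column names (values made up)
example_canine <- data.frame(
  image_number = 1:4,
  specimen_id  = "BCat_L21",
  tooth_side   = c("left", "right", "left", "right"),
  length_cm    = c(2.41, 2.52, 2.38, 2.49)
)

# Base-R alternatives to dplyr::filter(): logical indexing or subset()
left_base   <- example_canine[example_canine$tooth_side == "left", ]
left_subset <- subset(example_canine, tooth_side == "left")

# Both keep only the two left-canine rows
nrow(left_base)
nrow(left_subset)

# Precision (range) of the left-canine lengths
max(left_base$length_cm) - min(left_base$length_cm)
```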

Questions:

How precise were your length measurements of the upper left canine? Hint: When referring to precision, we refer to the spread of measurements. That is, how much variation was there between measurements, and how closely do the measurements agree with each other?
How accurate was your mean length measurement of the upper left canine? Hint: When referring to accuracy, we refer to the closeness of the measurements to the true value.
What factors or processes do you think could have influenced accuracy and precision? Hint: Think about the logistics of digital image analysis and ImageJ, potential biases, and how objective the measurement recording was.

Stop, Think, Do: Now it is your turn to estimate the descriptive summary for the upper right canine and quantify accuracy and precision. Stop and review the code in Steps C, D, and E of Part 2. Think about how you could modify this code to do the same analysis for the right canine. Hint: give appropriate names to the new objects you create for the right canine. These names should not overwrite previous ones in your script. Do the analysis and be ready to present it!

Discussion questions:

Can highly accurate measurements have low precision?
If you estimate a difference between the maximum and minimum values of 5 cm and your classmate estimates a difference of 2 cm, whose measurements are more precise?
Name two potential sources of sampling error or bias in this activity.

Great Work!

Data sampling, accuracy, and precision

Chapter 2