Data skills are at the core of ecological research and many other professions. The data we collect ourselves or gather from public data repositories, networks, and other sources often requires organisation before it can be analysed. Knowing how to structure data and which data formats you might encounter is a crucial skill! This lab builds skills for data management, spreadsheet organisation, and analysis. In this lab you will examine the occurrence of small mammals (ANIMALS!) in two habitat types using mark-recapture data from the National Ecological Observatory Network (NEON).
The National Ecological Observatory Network is a US-wide project funded by the National Science Foundation (NSF) to collect thirty years of key ecological data across the major ecosystem types in the US. The purpose of NEON is to provide the scientific community with standardized data on a subset of important ecological indicators. All data is available to the public and NEON has invested in tools and training to develop essential skills for large-scale and integrated ecological studies. Learn more at the NEON website and in this intro video.
Small mammals are widespread and important parts of ecosystems, even though we often think of them as pests when they live unchecked in human environments. Ecologically, small mammals play a role as grazers, predators, and insectivores. They can keep pest populations in check by eating grubs, they cycle nutrients, and they disperse seeds. They are also an essential foundation of many food webs as a food source for many larger animals such as hawks, owls, snakes, foxes, coyotes, and wolves. Small mammals have rapid lifecycles and respond quickly to environmental changes, making them useful indicators of general ecosystem health. Small mammals are also an important link in the zoonotic disease cycle, so their health and pathogen load have implications for many other animals. These two videos discuss trapping methods and reasons to care about small mammals.
National Park Service. From Field to Lab: Small Mammal Monitoring in Denali National Park: (1:32 - 2:30 highlights small mammal trapping/handling techniques)
University of Oxford. The Laboratory with Leaves (Part 10): Small Mammals: (This video provides context for why small mammal monitoring is important to ecology in general).
Don't try this at home. Animal research is always done with animal welfare review and permission to ensure safe and humane treatment of animals and to keep harm to the minimum necessary for the research!
This lab focuses on skills for:
Metadata is the data that explains the data. It is critical to well-documented data sets because it enhances the shareability, interoperability, longevity, and preservation of the data.
In short, metadata allows someone else who has never worked with your dataset to understand the data without needing you to be there.
Good metadata describes what each data column contains, what abbreviations stand for, measurement units, how missing data values are recorded, the time interval and location of measurement, the methods for data collection, instrument calibration or accuracy, who collected the data, and whether data collection is ongoing or completed.
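As a minimal sketch of what such metadata can look like in practice, the snippet below builds a small data dictionary for a few columns of the formatted small mammal table used later in this lab. The column names come from that table; the descriptions, units, and missing-value code are illustrative assumptions, not the official documentation of the dataset.

```r
# A hypothetical data dictionary: one row per data column.
# Descriptions and units are illustrative assumptions.
data_dictionary <- data.frame(
  column = c("Field_Season", "Date_Collected", "Species", "Weight_grams"),
  description = c(
    "Year of the field season",
    "Date the animal was captured",
    "Species abbreviation",
    "Body mass of the animal"
  ),
  units = c("year", "date", NA, "grams"),
  missing_value_code = "NA",
  stringsAsFactors = FALSE
)

# Store the dictionary next to the data so it travels with the file
# (a temporary folder is used here so the sketch does not touch your project)
dictionary_file <- file.path(tempdir(), "data_dictionary.csv")
write.csv(data_dictionary, dictionary_file, row.names = FALSE)
print(data_dictionary)
```

Keeping the dictionary as its own small table means anyone opening the data can read it without needing R.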
The data is based on McNeil and Jones 1 and the data skills build on a Data Carpentry module by Bahlai and Teal 2. This lab has been adapted for R from Hernández-Pacheco 2018 3.
small_mammal_community.xls – Subset of the small mammal data from southern Arizona addressing the effects of rodents and ants on the plant community. This .xls contains two years of small mammal community data with multiple data table formats. This file should be used to identify common errors in formatting data tables and re-organize the data as recommended.
Abbreviated NEON Small Mammal Trapping Protocols.docx - An abbreviated version of the Small Mammal Trapping Protocol to highlight the methods used to trap, record, mark, and release the animals.
NEONSmallMammal_SCBI_BlankDataSheet.pdf - Field data collection sheet.
NEON.D02.SCBI.DP1.10072.001_variables.csv – Metadata file for NEON small mammal data (DP1.10072.001) describing the variable names.
NEON.D02.SCBI.DP1.10072.001.readme.txt – Metadata file for the NEON small mammal data (DP1.10072.001) providing more information on the data product.
NEON.D02.SCBI.DP1.10072.001.mam_pertrapnight.072014to052015.csv – This file is a NEON small mammal trapping data file from July 2014 to May 2015 at the SCBI Site which can be found on the NEON Field Sites list and here.
Do the following:
Important: Do not forget the first piece of advice: create a new file (or tab) for the cleaned data, never modify your original (raw) data.
Quality Assurance: techniques and processes that ensure data are collected in a correct way.
Quality Control: techniques and processes that ensure the collected data are up to standard and suitable for analysis.
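One way to put quality control into practice is to write small automated checks. The sketch below uses a made-up vector of weights (not the lab data) and flags entries that are not plain numbers, such as values typed with a unit.

```r
# Hypothetical weight entries as they might be typed into a spreadsheet
weights_raw <- c("33", "40g", NA, "48", "abc")

# An entry passes the check if it is missing or consists only of digits
# (optionally with a decimal part)
is_valid <- is.na(weights_raw) | grepl("^[0-9]+(\\.[0-9]+)?$", weights_raw)

# Entries that need attention before analysis
flagged <- weights_raw[!is_valid]
print(flagged)
```

Missing values are allowed to pass here on the assumption that they are recorded deliberately; a stricter check could flag them too.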
readxl, dplyr, ggplot2, lubridate
library(readxl)
library(dplyr)
library(ggplot2)
library(lubridate)
If using read.csv(), specify the NA values with na.strings=, and if using read_excel(), specify them with na=. Remember to put the character in quotes, e.g. na="NA" or na.strings="NA".
read_excel() will format the date column as a date automatically; if the dates are in mm/dd/yy format in Excel, they should be interpreted correctly when importing.
dat.format <- read_excel("formatted_small_mammal_comm-1707.xlsx", na="NA")
dat.format %>% glimpse
## Rows: 88
## Columns: 7
## $ Field_Season <dbl> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013…
## $ Date_Collected <dttm> 2013-07-16, 2013-07-16, 2013-07-16, 2013-07-16, 201…
## $ Species <chr> "DM", "DM", "DM", "DM", "DM", "DM", "DM", "DM", "DM"…
## $ Plot <dbl> 2, 7, 3, 1, 3, 7, 4, 4, 7, 7, 8, 7, 4, 6, 8, 3, 3, 1…
## $ Sex <chr> "F", "M", "M", "M", "M", "M", "F", "F", "M", "F", "F…
## $ Weight_grams <chr> NA, "33g", NA, NA, "40g", "48g", "29g", "46g", "36g"…
## $ Calibrated_Scale <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Ye…
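To see what the NA arguments do, here is a small sketch that reads a few made-up rows from text instead of a file. The values and the "-999" missing-data code are assumptions chosen for illustration.

```r
# Made-up CSV content; "NA", a blank cell, and -999 all stand for missing values
csv_text <- "Species,Weight_grams
DM,33
DS,NA
DO,
PF,-999"

# With the defaults, only "NA" and blank numeric cells become NA
default_read <- read.csv(text = csv_text)

# List every missing-value code in na.strings so they are all read as NA
clean_read <- read.csv(text = csv_text, na.strings = c("", "NA", "-999"))

print(sum(is.na(default_read$Weight_grams)))
print(sum(is.na(clean_read$Weight_grams)))
```

The same idea carries over to read_excel(), where the argument is called na= instead of na.strings=.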
Use arrange() in a pipe to sort the data by the Weight_grams column.
The summary() command is also very handy for quick data checks. Use summary() on the data.
dat.format %>% arrange(Weight_grams)
## # A tibble: 88 x 7
## Field_Season Date_Collected Species Plot Sex Weight_grams
## <dbl> <dttm> <chr> <dbl> <chr> <chr>
## 1 2013 2013-11-13 00:00:00 DS 4 F 107
## 2 2013 2013-11-13 00:00:00 DS 14 F 113
## 3 2013 2013-11-12 00:00:00 DS 20 M 115
## 4 2013 2013-11-13 00:00:00 DS 4 F 115
## 5 2014 2015-07-08 00:00:00 <NA> 3 F 115
## 6 2013 2013-11-12 00:00:00 DS 9 F 117
## 7 2013 2013-11-13 00:00:00 DS 17 F 118
## 8 2013 2013-11-12 00:00:00 DS 9 F 120
## 9 2013 2013-11-12 00:00:00 DS 1 F 121
## 10 2013 2013-11-13 00:00:00 DS 11 F 122
## # … with 78 more rows, and 1 more variable: Calibrated_Scale <chr>
summary(dat.format)
## Field_Season Date_Collected Species
## Min. :2013 Min. :1978-01-08 00:00:00 Length:88
## 1st Qu.:2013 1st Qu.:2013-07-18 00:00:00 Class :character
## Median :2014 Median :2013-11-27 00:00:00 Mode :character
## Mean :2014 Mean :2010-01-02 03:16:21
## 3rd Qu.:2014 3rd Qu.:2014-01-11 18:00:00
## Max. :2014 Max. :2015-07-08 00:00:00
## Plot Sex Weight_grams Calibrated_Scale
## Min. : 1.000 Length:88 Length:88 Length:88
## 1st Qu.: 2.000 Class :character Class :character Class :character
## Median : 3.000 Mode :character Mode :character Mode :character
## Mean : 4.511
## 3rd Qu.: 4.500
## Max. :20.000
Answer:
Notice that Species, Sex, and Calibrated_Scale are characters and so summary doesn't really give us any information other than the class and length.
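Weight_grams is also stored as character in the output above, because some entries carry a unit (for example "33g"). Text sorts alphabetically rather than numerically, so it can help to strip the unit and convert to numbers before sorting or plotting. A minimal sketch on a made-up vector, assuming the only stray text is a trailing "g":

```r
# Made-up weights in the mixed style seen in the glimpse() output
weight_chr <- c("33g", "107", NA, "40g", "9")

# As text, "107" sorts before "33g" and "9" because values are compared
# character by character
print(sort(weight_chr))

# Remove a trailing "g" (assumed to be the only unit present), then convert
weight_num <- as.numeric(sub("g$", "", weight_chr))
print(sort(weight_num))
```

If other stray text is present, as.numeric() turns those entries into NA with a warning, which is itself a useful quality check.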
Use as.factor(VARIABLE), mutate(), and a pipe to change the column formats of Species, Sex, and Calibrated_Scale; keep the column names the same.
Use summary(dat.format) again.
Answer:
What additional information do you get from changing Species, Sex, and Calibrated_Scale to a factor?
dat.format <- dat.format %>%
  mutate(Species = factor(Species),
         Sex = factor(Sex),
         Calibrated_Scale = factor(Calibrated_Scale))
summary(dat.format)
## Field_Season Date_Collected Species Plot
## Min. :2013 Min. :1978-01-08 00:00:00 DM :37 Min. : 1.000
## 1st Qu.:2013 1st Qu.:2013-07-18 00:00:00 DS :16 1st Qu.: 2.000
## Median :2014 Median :2013-11-27 00:00:00 DO :13 Median : 3.000
## Mean :2014 Mean :2010-01-02 03:16:21 OT : 6 Mean : 4.511
## 3rd Qu.:2014 3rd Qu.:2014-01-11 18:00:00 PF : 4 3rd Qu.: 4.500
## Max. :2014 Max. :2015-07-08 00:00:00 (Other): 4 Max. :20.000
## NA's : 8
## Sex Weight_grams Calibrated_Scale
## F :49 Length:88 No : 4
## M :34 Class :character Yes:84
## NA's: 5 Mode :character
##
##
##
##
Use the levels() command to check the levels of a factor. The levels() command is from base R and takes a single factor rather than a data frame, so you cannot simply pipe the data frame into it. Instead, reference the intended column using the data$column syntax. Look at the levels like this:
levels(dat.format$Species)
If everything looks good, great! If you noticed errors then you would have to either fix them in the clean data sheet (NOT THE RAW) or come up with code to do it in R.
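If you choose to fix level errors in R, one option is to rename the offending level directly. The sketch below uses a made-up factor in which "dm" is assumed to be a typo for "DM"; the actual errors in your data, if any, may differ.

```r
# Made-up species codes; "dm" is a hypothetical data-entry typo for "DM"
species <- factor(c("DM", "dm", "DS", "DM", "DO"))
print(levels(species))

# Rename the mistyped level; the two levels merge into a single "DM"
levels(species)[levels(species) == "dm"] <- "DM"
print(levels(species))
print(table(species))
```

Doing the fix in code leaves the raw file untouched and records exactly what was changed.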
Use geom_boxplot() to graph the weights for each species captured.
Answer:
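One possible shape for this plot, sketched on made-up data rather than the lab file. The data frame toy and its values are hypothetical; note that the weights need to be numeric for the boxplot to make sense.

```r
library(ggplot2)

# Made-up data: species codes with numeric weights in grams
toy <- data.frame(
  Species = c("DM", "DM", "DM", "DS", "DS", "DS"),
  Weight_grams = c(33, 40, 48, 107, 113, 115)
)

# One box per species; weights must be numeric for a continuous y-axis
p <- ggplot(toy, aes(x = Species, y = Weight_grams)) +
  geom_boxplot() +
  labs(x = "Species", y = "Weight (g)")
print(p)
```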
Answer the following:
* How long are the traps deployed during each sampling bout?
* How often are the traps checked during each sampling bout?
* What are three pieces of information that get recorded for each animal captured?
* With the combination of protocols, field collection sheet, and metadata of the digital files, do you feel you have a good enough basis to join the field sampling team and help them collect data? Why or why not? (Of course, in reality you'd also need some training on how to safely handle small mammals because they BITE)
* What do you find helpful or confusing about the metadata files (e.g. variable descriptions, units, enough background)?
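Use read.csv() with na.strings=c("","NA") to tell R that cells with 'NA' or blank cells should all be read as NA.
# stringsAsFactors=TRUE reads text columns as factors (the default before R 4.0), which matches the <fct> columns in the output and is needed for levels() later
dat.neon <- read.csv("NEON.D02.SCBI.DP1.10072.001.mam_pertrapnight.072014to052015.csv",
                     na.strings=c("","NA"), stringsAsFactors=TRUE)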
dat.neon %>% glimpse
## Rows: 8,503
## Columns: 53
## $ uid <fct> 4516862d-da43-467c-9acf-c3285afa7f72, fd59ad…
## $ nightuid <fct> C8BBC2C9367C44FEBE9A9E4214626B80, C8BBC2C936…
## $ namedLocation <fct> SCBI_004.mammalGrid.mam, SCBI_004.mammalGrid…
## $ domainID <fct> D02, D02, D02, D02, D02, D02, D02, D02, D02,…
## $ siteID <fct> SCBI, SCBI, SCBI, SCBI, SCBI, SCBI, SCBI, SC…
## $ plotID <fct> SCBI_004, SCBI_004, SCBI_004, SCBI_004, SCBI…
## $ trapCoordinate <fct> A2, A10, H1, H2, H3, H4, H5, H6, H7, J10, H9…
## $ plotType <fct> distributed, distributed, distributed, distr…
## $ nlcdClass <fct> deciduousForest, deciduousForest, deciduousF…
## $ decimalLatitude <dbl> 38.89804, 38.89804, 38.89804, 38.89804, 38.8…
## $ decimalLongitude <dbl> -78.14542, -78.14542, -78.14542, -78.14542, …
## $ geodeticDatum <fct> WGS84, WGS84, WGS84, WGS84, WGS84, WGS84, WG…
## $ coordinateUncertainty <dbl> 45.3, 45.3, 45.3, 45.3, 45.3, 45.3, 45.3, 45…
## $ elevation <dbl> 307.9, 307.9, 307.9, 307.9, 307.9, 307.9, 30…
## $ elevationUncertainty <dbl> 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3,…
## $ trapStatus <fct> 6 - trap set and empty, 2 - trap disturbed/d…
## $ trapType <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ collectDate <fct> 7/21/14, 7/21/14, 7/21/14, 7/21/14, 7/21/14,…
## $ tagID <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ taxonID <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ scientificName <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ taxonRank <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ identificationQualifier <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ identificationReferences <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ nativeStatusCode <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ sex <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ recapture <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ fate <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ replacedTag <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ lifeStage <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ testes <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ nipples <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ pregnancyStatus <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ vagina <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ hindfootLength <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ earLength <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ tailLength <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ totalLength <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ weight <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ larvalTicksAttached <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ nymphalTicksAttached <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ adultTicksAttached <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ bloodSampleID <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ bloodSampleMethod <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ fecalSampleID <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ fecalSampleCondition <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ earSampleID <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ hairSampleID <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ hairSampleContents <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ voucherSampleID <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ measuredBy <fct> esNjjubcI51hdInmRHEEIrFR4+oIJtuN, esNjjubcI5…
## $ recordedBy <fct> vG946HPdyCak4YNcquSpnhDyztG1jnbiCP4A7p87kck=…
## $ remarks <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
Which lubridate function matches the format of collectDate: ymd(), mdy(), or dmy()?
Use mutate() to create a NEW date column; don't overwrite collectDate.
With mutate(), also create a month and year column from date (hint: we've used the functions called month() and year() before to extract that information from a date-object formatted date column).
Use glimpse() to assure yourself it worked (check column type and glimpse the new column content).
dat.neon <- dat.neon %>%
  mutate(date = mdy(collectDate),
         month = month(date),
         year = year(date))
dat.neon %>% glimpse
## Rows: 8,503
## Columns: 56
## $ uid <fct> 4516862d-da43-467c-9acf-c3285afa7f72, fd59ad…
## $ nightuid <fct> C8BBC2C9367C44FEBE9A9E4214626B80, C8BBC2C936…
## $ namedLocation <fct> SCBI_004.mammalGrid.mam, SCBI_004.mammalGrid…
## $ domainID <fct> D02, D02, D02, D02, D02, D02, D02, D02, D02,…
## $ siteID <fct> SCBI, SCBI, SCBI, SCBI, SCBI, SCBI, SCBI, SC…
## $ plotID <fct> SCBI_004, SCBI_004, SCBI_004, SCBI_004, SCBI…
## $ trapCoordinate <fct> A2, A10, H1, H2, H3, H4, H5, H6, H7, J10, H9…
## $ plotType <fct> distributed, distributed, distributed, distr…
## $ nlcdClass <fct> deciduousForest, deciduousForest, deciduousF…
## $ decimalLatitude <dbl> 38.89804, 38.89804, 38.89804, 38.89804, 38.8…
## $ decimalLongitude <dbl> -78.14542, -78.14542, -78.14542, -78.14542, …
## $ geodeticDatum <fct> WGS84, WGS84, WGS84, WGS84, WGS84, WGS84, WG…
## $ coordinateUncertainty <dbl> 45.3, 45.3, 45.3, 45.3, 45.3, 45.3, 45.3, 45…
## $ elevation <dbl> 307.9, 307.9, 307.9, 307.9, 307.9, 307.9, 30…
## $ elevationUncertainty <dbl> 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3,…
## $ trapStatus <fct> 6 - trap set and empty, 2 - trap disturbed/d…
## $ trapType <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ collectDate <fct> 7/21/14, 7/21/14, 7/21/14, 7/21/14, 7/21/14,…
## $ tagID <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ taxonID <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ scientificName <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ taxonRank <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ identificationQualifier <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ identificationReferences <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ nativeStatusCode <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ sex <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ recapture <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ fate <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ replacedTag <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ lifeStage <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ testes <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ nipples <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ pregnancyStatus <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ vagina <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ hindfootLength <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ earLength <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ tailLength <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ totalLength <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ weight <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ larvalTicksAttached <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ nymphalTicksAttached <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ adultTicksAttached <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ bloodSampleID <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ bloodSampleMethod <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ fecalSampleID <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ fecalSampleCondition <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ earSampleID <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ hairSampleID <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ hairSampleContents <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ voucherSampleID <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ measuredBy <fct> esNjjubcI51hdInmRHEEIrFR4+oIJtuN, esNjjubcI5…
## $ recordedBy <fct> vG946HPdyCak4YNcquSpnhDyztG1jnbiCP4A7p87kck=…
## $ remarks <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ date <date> 2014-07-21, 2014-07-21, 2014-07-21, 2014-07…
## $ month <dbl> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,…
## $ year <dbl> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 20…
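The lubridate parsers are named after the order of the date components. A small sketch on a single date string, written the way dates appear in collectDate, shows why the choice matters:

```r
library(lubridate)

# A date string in month/day/two-digit-year order, like "7/21/14" in collectDate
date_text <- "7/21/14"

# mdy() reads month, then day, then year
parsed <- mdy(date_text)
print(parsed)

# Pull out the parts used for grouping later on
print(month(parsed))
print(year(parsed))

# A parser with the wrong component order cannot make sense of the string:
# there is no month 21, so dmy() returns NA (with a warning, silenced here)
wrong <- suppressWarnings(dmy(date_text))
print(is.na(wrong))
```

A wrong parser does not always fail this visibly: a date such as "7/8/14" parses under both mdy() and dmy(), just to different days, so it is worth checking a few parsed values against the raw column.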
levels(dat.neon$trapStatus)
## [1] "1 - trap not set"
## [2] "2 - trap disturbed/door closed but empty"
## [3] "3 - trap door open or closed w/ spoor left"
## [4] "5 - capture"
## [5] "6 - trap set and empty"
Use filter() to select only the trapStatus '5 - capture'.
dat.neon.filter <- dat.neon %>%
  filter(trapStatus == "5 - capture")
ggplot(dat.neon.filter,aes(decimalLongitude,decimalLatitude,colour=factor(nlcdClass)))+
geom_label(aes(label=plotID))+
labs(title="Plot Sampling by Land Cover",x="Longitude (decimal degrees)",y="Latitude (decimal degrees)")+
xlim(-78.2,-78.1)
Use levels() to display the scientific names of all sampled species.
Use levels() to also display the four letter abbreviation (taxonID).
Answer:
* Which species is denoted by each four letter abbreviation?
* What is the common name of each species?
levels(dat.neon.filter$scientificName)
## [1] "Blarina brevicauda" "Microtus pennsylvanicus"
## [3] "Mus musculus" "Peromyscus leucopus"
## [5] "Peromyscus maniculatus" "Sorex cinereus"
levels(dat.neon.filter$taxonID)
## [1] "BLBR" "MIPE" "MUMU" "PELE" "PEMA" "SOCI"
To list the sampling dates, use unique() the same way.
Answer:
* Write the range of days for each sample year and month.
unique(dat.neon$date)
## [1] "2014-07-21" "2014-07-22" "2014-07-23" "2014-07-24" "2014-07-25"
## [6] "2014-07-26" "2014-08-18" "2014-08-19" "2014-08-20" "2014-08-21"
## [11] "2014-08-22" "2014-08-23" "2014-09-17" "2014-09-18" "2014-09-19"
## [16] "2014-09-22" "2014-09-23" "2014-09-24" "2014-10-14" "2014-10-15"
## [21] "2014-10-16" "2015-04-14" "2015-04-15" "2015-04-16" "2015-04-17"
## [26] "2015-04-18" "2015-05-12" "2015-05-13" "2015-05-14" "2015-05-15"
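Reading the ranges off the printed dates works, but the same summary can also be computed. The sketch below uses a handful of the dates printed above rather than the full NEON file; the names toy_dates, first_day, and last_day are chosen for illustration.

```r
library(dplyr)
library(lubridate)

# A few of the sampling dates from the output above
toy_dates <- data.frame(
  date = as.Date(c("2014-07-21", "2014-07-26", "2014-08-18",
                   "2014-08-23", "2015-04-14", "2015-04-18"))
)

# First and last sampling day within each year and month
date_ranges <- toy_dates %>%
  mutate(year = year(date), month = month(date)) %>%
  group_by(year, month) %>%
  summarise(first_day = min(date), last_day = max(date), .groups = "drop")

print(date_ranges)
```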
In two steps:
1.
* Use filter() to remove all tagIDs that are NA. (hint: is.na() allows you to search for NA values in R and ! indicates 'not', so !is.na() can be used to exclude NA values)
* Then use distinct(year,month,plotID,tagID,.keep_all=TRUE) to remove all individuals that were recaptured in each sample month and at each plot
* Name this filtered dataframe dat.catch
2.
* Then, use dat.catch to calculate the number of unique individuals caught for each year, month, taxonID, and nlcdClass
* Use group_by() and summarise(count=n()) to calculate the number of individuals. For summarise(count=n()), the () inside n() remains empty.
* Call this summarised dataframe dat.numbers
# Remove rows where tagID is NA, then keep one row per individual in each year, month, and plot. Keep all data columns.
dat.catch <- dat.neon.filter %>%
  filter(!is.na(tagID)) %>%
  distinct(year, month, plotID, tagID, .keep_all=TRUE)
# Calculate the number of unique individuals captured for each year, month, species, and habitat
dat.numbers <- dat.catch %>%
group_by(year,month,taxonID,nlcdClass) %>%
summarise(count=n())
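To see what distinct() and the grouped count do, here is a sketch on a made-up capture table. The tag IDs, plot, and species are hypothetical; the point is that an animal recaptured within the same month and plot is counted once, while the same animal caught in a later month counts again for that month.

```r
library(dplyr)

# Made-up captures: tag "A1" is caught twice in July and once in August
toy_catch <- data.frame(
  year    = c(2014, 2014, 2014, 2014, 2014),
  month   = c(7, 7, 7, 8, 8),
  plotID  = c("P1", "P1", "P1", "P1", "P1"),
  tagID   = c("A1", "A1", "B2", "A1", NA),
  taxonID = c("PELE", "PELE", "PELE", "PELE", NA)
)

# Drop rows without a tag, then keep one row per animal, plot, and month
toy_unique <- toy_catch %>%
  filter(!is.na(tagID)) %>%
  distinct(year, month, plotID, tagID, .keep_all = TRUE)

# Count unique individuals per year, month, and species
toy_numbers <- toy_unique %>%
  group_by(year, month, taxonID) %>%
  summarise(count = n(), .groups = "drop")

print(toy_numbers)
```

Using distinct() on tagID alone would keep only the first capture of each animal across the whole study, so "A1" would disappear from the August count.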
Use geom_col(position="dodge",width=0.5) to make a bar graph.
Use facet_grid() to show the habitat types in rows and the two sample years in columns.
ggplot(dat.numbers, aes(month, count, fill=taxonID))+
  geom_col(position="dodge",width=0.5)+
  facet_grid(nlcdClass~year)+
  labs(title="Species abundance by habitat type at NEON SCBI",x="Month",y="Count of unique individuals")
Use scales="free_y" inside the facet_grid() command to allow the y-axes to scale to the maximum value in each row of panels. Just be aware that the axes of the panels then differ, which can be deceptive at a quick glance!
ggplot(dat.numbers, aes(month, count, fill=taxonID))+
  geom_col(position="dodge",width=0.5)+
  facet_grid(nlcdClass~year, scales="free_y")+
  labs(title="Species abundance by habitat type at NEON SCBI (note y-axis scale)",x="Month",y="Count of unique individuals")
Answer:
1. McNeil, J., Jones, M. A. (2018). Data Management using NEON Small Mammal Data with Accompanying Lesson on Mark Recapture Analysis. NEON - National Ecological Observatory Network, QUBES. doi:10.25334/Q4XH5S
2. Bahlai, C., Teal, T. (eds.) (2017). Data Carpentry: Data Organization in Spreadsheets Ecology Lesson. Version 2017.04.0, April 2017, http://www.datacarpentry.org/spreadsheet-ecology-lesson/, https://doi.org/10.5281/zenodo.570047
3. Hernández-Pacheco, R. H. (2018). More In Depth Spreadsheet Management Adaptation of Data Management using NEON Small Mammal Data. NEON Faculty Mentoring Network, QUBES Educational Resources. doi:10.25334/Q44X4D