A Critical Guide to the UniProtKB Flat-file Format
Author(s): Teresa Attwood¹, GOBLET Foundation
¹ The University of Manchester, Manchester, UK
Description
This Critical Guide briefly presents the need for biological databases and for a standard format for storing and organising biological data. Web-based interfaces have made databases more user-friendly, but knowledge of the underlying file format offers a deeper understanding of how to navigate and mine the information they contain, so that humans and machines can get the most out of them. This Guide explores the file format that underpins one of today’s most popular protein sequence databases – UniProtKB.
Specifically, this Guide introduces the concept of database ‘flat-files’, and examines features of the UniProtKB flat-file format. On reading this Guide, users will be able to:
- identify key fields within UniProtKB/Swiss-Prot and /TrEMBL flat-files;
- explain what these fields mean, what information they contain and what the information is used for;
- analyse the information in different fields and infer structural and functional features of a sequence;
- examine and investigate the provenance of annotations; and
- compare annotations at different time-points and evaluate the likely impact of annotation changes.
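As a rough illustration of what identifying fields in a flat-file can look like in practice, the sketch below groups the lines of one entry by the two-letter line code at the start of each line. It is a minimal sketch, not the Guide's own method: the sample entry is invented and abbreviated (its accession, names and checksum values are placeholders), and the function name `group_by_line_code` and the `"seq"` key are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical, heavily abbreviated entry for illustration only.
# It is NOT a real UniProtKB record; all values are placeholders.
SAMPLE_ENTRY = """\
ID   EXAMPLE_HUMAN           Reviewed;          12 AA.
AC   X00000;
DE   RecName: Full=Example protein;
OS   Homo sapiens (Human).
SQ   SEQUENCE   12 AA;  0 MW;  0000000000000000 CRC64;
     MKTAYIAKQR QI
//
"""


def group_by_line_code(entry_text):
    """Group the lines of one flat-file entry by their two-letter line code.

    Assumes the usual layout: a two-letter code, three spaces, then the data.
    Sequence lines carry no code (they start with spaces) and are collected
    under the made-up key "seq". Parsing stops at the "//" terminator.
    """
    fields = defaultdict(list)
    for line in entry_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("//"):
            break  # end of entry
        code = line[:2]
        content = line[5:].strip()
        if not code.strip():
            code = "seq"  # uncoded lines hold the sequence itself
        fields[code].append(content)
    return dict(fields)


fields = group_by_line_code(SAMPLE_ENTRY)
# Join the sequence chunks and drop the spaces that separate residue blocks.
sequence = "".join("".join(fields["seq"]).split())

print(sorted(fields))
print(fields["ID"][0])
print(sequence)
```

Keeping every line under its code, rather than parsing each field in detail, makes it easy to inspect one kind of line at a time (for example, only the `DE` description lines) before deciding how much of its internal structure needs to be interpreted.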
Cite this work
Researchers should cite this work as follows:
- Attwood, T., GOBLET Foundation (2020). A Critical Guide to the UniProtKB Flat-file Format. Network for Integrating Bioinformatics into Life Sciences Education, QUBES Educational Resources. doi:10.25334/ZQRR-1577