A Critical Guide to the UniProtKB Flat-file Format

Author(s): Teresa Attwood¹, GOBLET Foundation

¹ The University of Manchester, Manchester, UK

Summary:
This Critical Guide briefly presents the need for biological databases and for a standard format for storing and organising biological data.

Licensed under CC Attribution-ShareAlike 4.0 International

Version 1.0, published on 05 Dec 2020, doi:10.25334/ZQRR-1577

Description


This Critical Guide briefly presents the need for biological databases and for a standard format for storing and organising biological data. Web-based interfaces have made databases more user-friendly, but knowledge of the underlying file format offers a deeper understanding of how to navigate and mine the information they contain, so that humans and machines can get the most out of them. This Guide explores the file format that underpins one of today’s most popular protein sequence databases – UniProtKB.

Specifically, this Guide introduces the concept of database ‘flat-files’, and examines features of the UniProtKB flat-file format. On reading this Guide, users will be able to:

  •     identify key fields within UniProtKB/Swiss-Prot and /TrEMBL flat-files;
  •     explain what these fields mean, what information they contain and what the information is used for;
  •     analyse the information in different fields and infer structural and functional features of a sequence;
  •     examine and investigate the provenance of annotations; and
  •     compare annotations at different time-points and evaluate the likely impact of annotation changes.
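
To make the idea of a flat-file concrete, the following is a minimal sketch (not taken from the Guide itself) of how such a file can be read by a program. It relies on two conventions of the UniProtKB flat-file format: each annotation line begins with a two-letter line code (such as ID, AC or DE) with the data starting at the sixth character, and each entry ends with a `//` line. The entry text, the accession `X00000` and the function name `parse_entry` are made up for illustration; a real entry contains many more line types.

```python
from collections import defaultdict

# Abbreviated, made-up entry for illustration only; not a real UniProtKB record.
SAMPLE_ENTRY = """\
ID   EXAMPLE_HUMAN           Reviewed;          10 AA.
AC   X00000;
DE   RecName: Full=Hypothetical example protein;
OS   Homo sapiens (Human).
SQ   SEQUENCE   10 AA;  0 MW;  0000000000000000 CRC64;
     MKTAYIAKQR
//
"""


def parse_entry(text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    sequence = []
    for line in text.splitlines():
        if line.startswith("//"):      # entry terminator
            break
        code, data = line[:2], line[5:]
        if code.strip():               # annotation line such as ID, AC, DE
            fields[code].append(data.strip())
        else:                          # sequence lines begin with blanks
            sequence.append(data.replace(" ", ""))
    return dict(fields), "".join(sequence)


fields, sequence = parse_entry(SAMPLE_ENTRY)
print(fields["AC"])
print(sequence)
```

Grouping lines by code in this way is what lets both people and software pick out individual fields, which is the skill the first learning objective describes. For real work, an established parser is likely a better choice than this sketch.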

Cite this work

Researchers should cite this work as follows: