
Nomenclature, Genes, proteins, species
Evolving terminology for emerging technologies

Mary Chitty, Library Director & Taxonomist MSLS
Last revised January 09, 2020

K. Tipton and S. Boyce, in “History of the enzyme nomenclature system,” write that “ambiguities in the words used for common objects or actions have been the basis for many, more-or-less memorable jokes, they can also cause a great deal of confusion … in the sciences ... many groups have stressed the need for standardized, universally accepted systems of nomenclature in chemistry, genetics, enzymology, etc. However, it is the universal acceptance that usually causes the problem. It is rare to find people who will admit that they find nomenclature to be an interesting subject, but many who profess contempt for it will get very excited if it is suggested that their pet nomenclature should be changed in the interest of clarity or uniformity.” Bioinformatics 16(1): 34-40, Jan 2000

Biology & Chemistry term index. Related glossaries include Functional genomics and Model & other organisms.

Carbohydrate nomenclature
IUPAC-IUBMB Joint Commission on Biochemical Nomenclature, Nomenclature of Carbohydrates, Recommendations

IUPAC-IUB Joint Commission on Biochemical Nomenclature (JCBN), Nomenclature of glycoproteins, glycopeptides and peptidoglycans, Recommendations 1985.

Chemicals: InChI Chemical Identifier: The IUPAC International Chemical Identifier (InChI™) is a non-proprietary identifier for chemical substances that can be used in printed and electronic data sources, thus enabling easier linking of diverse data compilations.

Clone nomenclature: Standardized clone names, NCBI

drug nomenclature, clinical:  RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, MediSpan, Gold Standard, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary.

Enzyme Nomenclature: A classification according to the Enzyme Commission (EC) of the IUBMB (International Union of Biochemistry and Molecular Biology). Enzymes are allocated four numbers, the first of which defines the type of reaction catalyzed, the next two define the substrates, and the fourth is a catalogue number. Categories of enzymes are EC 1, Oxidoreductases; EC 2, Transferases; EC 3, Hydrolases; EC 4, Lyases; EC 5, Isomerases; EC 6, Ligases (Synthetases). IUPAC Bioinorganic

Enzyme Nomenclature, Nomenclature Committee of the International Union of Biochemistry and Molecular Biology
Links to individual enzyme documents, includes citations for published versions.
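
The four-number scheme described in the Enzyme Nomenclature entry can be illustrated with a short parser. This is only a minimal sketch: the function name `parse_ec_number` and the field names are hypothetical, and the class table contains just the six categories listed in the entry above.

```python
import re

# Top-level EC classes as listed in the glossary entry above.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases (Synthetases)",
}

# Accepts "EC 1.1.1.1" or "1.1.1.1"; all four numeric fields are required.
_EC_PATTERN = re.compile(r"^(?:EC\s*)?(\d+)\.(\d+)\.(\d+)\.(\d+)$")


def parse_ec_number(text):
    """Split an EC number into its four fields (hypothetical helper)."""
    match = _EC_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a complete four-part EC number: {text!r}")
    ec_class, subclass, sub_subclass, serial = (int(g) for g in match.groups())
    return {
        "class": ec_class,
        # Classes outside the table above are reported as unknown, not rejected.
        "class_name": EC_CLASSES.get(ec_class, "unknown class"),
        "subclass": subclass,
        "sub_subclass": sub_subclass,
        "serial": serial,  # the catalogue number within the sub-subclass
    }


if __name__ == "__main__":
    print(parse_ec_number("EC 1.1.1.1"))
```

Keeping the class table separate from the parser means the pattern does not need to change if the list of top-level categories is extended.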

Gene nomenclature - integrating
While the Gene Ontology™ Consortium (see Functional genomics glossary) does not deal specifically with gene nomenclature, its efforts at integrating terminology are an important step toward being able to compare genes in any number of organisms (plants, model organisms, and animals including humans).

Genes with multiple aliases seem to be the rule, rather than the exception, whereas genes that have no functional relationship with each other can often bear the same names. As biologists strive to make sense of the growing wealth of genomic information, this messy nomenclature is becoming a bugbear. ... Attempts to impose standard names across the board are meeting stiff resistance, and approaches that would give genes unique ID numbers seem unlikely to take off unless journals enforce the system. But a coalition of leading geneticists may have the answer. The Gene Ontology (GO) Consortium is sidestepping the naming issue by developing 'controlled vocabularies'. These will allow software to scan the genomic databases and link related genes to one another using terms that consistently describe their functions, regardless of what the genes are called. Helen Pearson "Biology's name game" Nature 411: 631-632, 7 June 2001

Human gene nomenclature: For each known human gene we approve a gene name and symbol (short-form abbreviation). All approved symbols are stored in the HGNC database. Each symbol is unique and we ensure that each gene is only given one approved gene symbol. … We have already approved almost 33,000 symbols; the vast majority of these are for protein-coding genes, but also include symbols for pseudogenes, non-coding RNAs, phenotypes and genomic features. HUGO Gene Nomenclature Committee, About the HGNC

The Human Gene Nomenclature Committee (HGNC) is part of the Human Genome Organization (HUGO), and has established official, unique nomenclature for human genes. If a gene lacks official nomenclature, the research community is encouraged to use the Nomenclature Committee web form to submit a proposed gene symbol and name. Guidelines for naming are provided. (09/20/07) SNP FAQ Archives
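
The alias problem and the one-approved-symbol rule described above can be sketched as a lookup from any name to the approved symbol or symbols it might refer to. This is an illustrative sketch only: the gene records are hypothetical placeholders rather than real HGNC data, and the helper names are assumptions. Matching is case-insensitive here for simplicity, which would not suit species whose symbols are case-sensitive.

```python
from collections import defaultdict

# Hypothetical toy records: approved symbol -> known aliases.
# "p99" is deliberately shared by two unrelated genes.
TOY_GENES = {
    "GENEA": ["p99", "XYZ1"],
    "GENEB": ["p99"],
    "GENEC": ["abc3"],
}


def build_alias_index(genes):
    """Map every approved symbol and alias (upper-cased) to approved symbols."""
    index = defaultdict(set)
    for symbol, aliases in genes.items():
        index[symbol.upper()].add(symbol)
        for alias in aliases:
            index[alias.upper()].add(symbol)
    return index


def resolve(name, index):
    """Return the sorted approved symbols a name could refer to.

    One result means the name is unambiguous, several mean the alias is
    shared, and an empty list means the name is unknown.
    """
    return sorted(index.get(name.upper(), set()))


if __name__ == "__main__":
    idx = build_alias_index(TOY_GENES)
    for query in ("xyz1", "p99", "GENEC", "unknown"):
        print(query, "->", resolve(query, idx))
```

Returning a list rather than a single symbol keeps ambiguous aliases visible instead of silently picking one gene.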

HUGO Gene Nomenclature Committee 
HGNC Gene Grouping Family Nomenclature
Nomenclature for the Description of Sequence Variations, Human Genome Variation Society. 2007
Standardized Human Gene Nomenclature, NCBI News, Fall/ Winter 2000

Human gene mutation nomenclature
Mutations: "Recommendations for a nomenclature system for human gene mutations," Nomenclature Working Group, Human Mutation 11(1): 1-3, 1998

JT den Dunnen, SE Antonarakis, "Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion," Human Mutation 15(1): 7-12, 2000. While a codified mutation nomenclature system for simple DNA lesions has now been adopted broadly by the medical genetics community, it is inherently difficult to represent complex mutations in a unified manner. In this article, suggestions are presented for reporting just such complex mutations.
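
The contrast between simple and complex mutation descriptions can be illustrated with a small parser. This is a minimal sketch, assuming the `c.76A>T` style for single-nucleotide substitutions used in the Human Genome Variation Society recommendations. The names `Substitution` and `parse_substitution` are hypothetical, and anything other than a simple substitution is deliberately left unparsed.

```python
import re
from typing import NamedTuple, Optional


class Substitution(NamedTuple):
    ref_type: str   # "c" (coding DNA) or "g" (genomic) reference sequence
    position: int   # nucleotide position in that reference
    ref: str        # reference nucleotide
    alt: str        # substituted nucleotide


# Matches only simple substitutions such as "c.76A>T" or "g.1234G>C".
_SUBSTITUTION = re.compile(r"^([cg])\.(\d+)([ACGT])>([ACGT])$")


def parse_substitution(description: str) -> Optional[Substitution]:
    """Parse a simple substitution, or return None for anything else."""
    match = _SUBSTITUTION.match(description.strip())
    if match is None:
        return None  # complex or unsupported description
    ref_type, position, ref, alt = match.groups()
    return Substitution(ref_type, int(position), ref, alt)


if __name__ == "__main__":
    print(parse_substitution("c.76A>T"))
    print(parse_substitution("c.76_78del"))  # not a simple substitution
```

A single regular expression is enough for the simple case. Complex mutations, which the article above discusses, would need a fuller grammar.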

Model organisms: The Gene Ontology™ project is a collaboration between the Arabidopsis, C. elegans, Drosophila, mouse and Saccharomyces communities. Gene definitions. See also Model organisms glossary.

C. elegans Nomenclature
FlyBase Drosophila Nomenclature
Mouse Genome Nomenclature, Jackson Lab, US
Rat Nomenclature Guidelines
Saccharomyces cerevisiae Nomenclature: SGD Gene Nomenclature Conventions
Zebrafish Nomenclature Guidelines

post-genomic nomenclature: Ideally, our formalized system of nomenclature is supposed to improve communication among biologists. In reality, it seems to be a major obstacle, especially when misapplied. Although the problem is evident in the literature, it is most severe in the sequence databases, which now serve as the principal source and repository of data used in comparative biology. Moreover, the sequence databases tend to propagate such errors for a variety of reasons. As biological data proliferates and interconnects, it depends increasingly on software infrastructure, and it becomes increasingly obvious that biological names do not meet the requirements of a good identifier, in strict computing terms. A good identifier should be unique and persistent. As an outgrowth of my current DOE funded project, we have been exploring a practical and workable solution that we believe will help solve the problem in a future-proof fashion. Dr. George Garrity, “Carolus Linnaeus in the postgenomic era,” Contractor Grantee Workshop, DOE Genomes to Life, US, Feb. 9-12, 2003

Protein nomenclature
UniProt - Swiss-Prot Protein Knowledgebase, List of nomenclature related references for proteins, 2007
Nomenclature of glycoproteins, glycopeptides and peptidoglycans, Recommendations, IUPAC-IUB Joint Commission on Biochemical Nomenclature (JCBN)
Nomenclature and Symbols for Amino Acids and Peptides, Recommendations, IUPAC-IUB Joint Commission on Biochemical Nomenclature (JCBN), 1983
See also Enzyme Nomenclature, Sequences, DNA & beyond: splice variants

Species: Taxonomy Browser, NCBI, US. The browser has direct links to some of the organisms commonly used in molecular research projects. See also Phylogenomics glossary: species

Taxonomic databases are rather controversial since the soundness of the taxonomic classifications done by one taxonomist will be directly questioned by the next taxonomist! Various efforts are going on to create a taxonomy resource (e.g. "The Tree of Life" project, "Species 2000", International Organization for Plant Information, Integrated Taxonomic Information System, etc.). The most generally useful taxonomic database is that maintained by the NCBI [see above]. This hierarchical taxonomy is used by the Nucleotide Sequence Databases, SWISS-PROT and TrEMBL, and is curated by an informal group of experts. [Introduction to Molecular Biology Databases, R. Apweiler, R. Lopez, B. Marx, 1999]

uBio: an initiative within the science library community to join international efforts to create and utilize a comprehensive and collaborative catalog of known names of all living (and once-living) organisms. The Taxonomic Name Server (TNS) catalogs names and classifications to enable tools that can help users find information on living things using any of the names that may be related to an organism 

Nomenclature resources
HUGO, Gene Nomenclature Committee  2008 Update 

Gene Family index

International Union of Biochemistry and Molecular Biology, Recommendations on Biochemical & Organic Nomenclature, Symbols & Terminology etc., prepared by G. P. Moss

IUPAC and IUBMB, Biochemical Nomenclature Committees,  2013  

Sequence Variant Nomenclature

IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.
