
Protein structures glossary & taxonomy
Evolving terminology for emerging technologies

Mary Chitty MSLS
Last revised October 23, 2018 

Sequencing of the human genome has increased the number of candidate proteins for clinical development and therapeutic use.  Efforts are under way to identify and understand biological mechanisms that exist between proteins and to get information on the structure of proteins as they exist within biological complexes.  A major challenge is to understand how proteins fold and how protein structure relates to protein function.

Biology & chemistry term index. Related glossaries include: Proteomics; Protein informatics. Technologies: Protein Technologies; Mass Spectrometry, NMR & X-ray Crystallography; Sequencing.
Biology: Biomolecules; Expression; Protein categories; Proteins; Sequences, DNA & beyond.

3D protein structures: The conformation into which a protein “folds.” For proteins consisting of only one polypeptide chain, it is the tertiary structure that is usually referred to by the term “the 3D structure of a protein.” Related term: protein structures

aggregation: Hopelessly tangled and completely amorphous masses of protein fibers. W. Thomasson “Unraveling the mystery of protein folding” FASEB 1997  Related term: misfolding

alpha-helix, alpha-helices: See secondary protein structure, tertiary protein structure.

amino acid motifs: Commonly observed structural components of proteins formed by simple combinations of adjacent secondary structures. A commonly observed structure may be composed of a CONSERVED SEQUENCE which can be represented by a CONSENSUS SEQUENCE. MeSH, 2000  Related term: consensus sequence See also motifs.
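The relationship between a conserved sequence and its consensus sequence can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length and uses a simple majority rule per column; the function name, the 0.5 threshold, the "X" placeholder and the example peptides are all hypothetical choices for illustration, not part of the MeSH definition.

```python
from collections import Counter

def consensus_sequence(aligned, min_fraction=0.5, ambiguous="X"):
    """Return a majority-rule consensus for equal-length aligned sequences.

    A column contributes its most common residue when that residue occurs in
    at least `min_fraction` of the sequences; otherwise `ambiguous` is used.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned):  # iterate over alignment columns
        residue, count = Counter(column).most_common(1)[0]
        if count / len(aligned) >= min_fraction:
            consensus.append(residue)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)

# Hypothetical aligned peptides, for illustration only.
example = ["GKSGT", "GKTGS", "GKSGS", "AKSGS"]
print(consensus_sequence(example))
```

Real consensus building usually works from a multiple sequence alignment with gaps and may weight sequences; this sketch ignores both.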

beta-sheets: See secondary protein structure, tertiary protein structure


Blue Gene Project: IBM, Blue Gene Project

comparative protein structure modeling: Protein informatics

conformation: See protein conformation.

crystallomics: Omes & omics

denaturation: The process of partial or total alteration of the native structure of a macromolecule resulting from the loss of tertiary or tertiary and secondary structure that is a consequence of the disruption of stabilizing weak bonds. Denaturation can occur when proteins and nucleic acids are subjected to elevated temperature or to extremes of pH, or to non-physiological concentrations of salt, organic solvents, urea or other chemical agents. IUPAC Biotech

domain: An independently folded unit within a protein, often joined by a flexible segment of the polypeptide chain. IUPAC Bioinorganic

A discrete portion of a protein assumed to fold independently of the rest of the protein and possessing its own function. [NCBI Bioinformatics]

A region of a protein’s amino acid sequence that has evolutionary, structural, or functional significance. The amino acid sequence of a domain determines a protein’s 3D structure. ... The stated goal of structural genomics, as a field, involves generating a set of structures representative of most of the possible folds for specific protein domains and then solving the structures for new proteins based on known fold-structure relationships. Pharmaceutical researchers are most interested in domains because these determine the “active” or “binding” sites of molecules. Related terms: mosaic proteins, multi-domain proteins, protein families, target selection

fold recognition: See threading

intrinsically disordered proteins (IDPs): A protein that lacks a fixed or ordered three-dimensional structure. IDPs cover a spectrum of states from fully unstructured to partially structured and include random coils, (pre-)molten globules, and large multi-domain proteins connected by flexible linkers. They constitute one of the main types of protein (alongside globular, fibrous and membrane proteins). The discovery of IDPs has challenged the traditional protein structure paradigm, that protein function depends on a fixed three-dimensional structure. This dogma has been challenged over the 2000s and 2010s by increasing evidence from various branches of structural biology, suggesting that protein dynamics may be highly relevant for such systems. Despite their lack of stable structure, IDPs are a very large and functionally important class of proteins. In some cases, IDPs can adopt a fixed three-dimensional structure after binding to other macromolecules. Overall, IDPs are different from structured proteins in many ways and tend to have distinct properties in terms of function, structure, sequence, interactions, evolution and regulation. Wikipedia accessed 2018 Feb 20

misfolding: Protein misfolding and protein aggregation have been shown to be involved in a number of diseases, particularly neurodegenerative ones.
Related terms: fold alignment, fold recognition, protein folding, foldedness

molecular chaperones: A family of cellular proteins that mediate the correct assembly or disassembly of other polypeptides, and in some cases their assembly into oligomeric structures, but which are not components of those final structures. It is believed that chaperone proteins assist polypeptides to self-assemble by inhibiting alternative assembly pathways that produce nonfunctional structures. Some classes of molecular chaperones are the nucleoplasmins, the CHAPERONINS and HEAT-SHOCK PROTEINS. MeSH, 1995

motif: A short conserved region in a protein sequence. Motifs are frequently highly conserved parts of domains. [NCBI Bioinformatics] See also amino acid motifs
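Because a sequence motif is a short conserved pattern, it can be searched for with a regular expression. The sketch below uses the well-known N-glycosylation sequon, written in PROSITE style as N-{P}-[ST]-{P}, translated by hand into a Python regex; the helper name and the test sequence are hypothetical, and a pattern match only flags a candidate site, not a confirmed functional one.

```python
import re

# N-glycosylation sequon N-{P}-[ST]-{P}: Asn, any residue except Pro,
# Ser or Thr, any residue except Pro. The lookahead allows overlapping hits.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")

def find_motif(sequence, pattern=N_GLYCOSYLATION):
    """Return (1-based start position, matched residues) for each hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]

# Hypothetical sequence, for illustration only.
hits = find_motif("MNGSAANPTKNATL")
print(hits)
```

Positions are reported 1-based, as is conventional for protein sequences. The asparagine at position 7 is skipped because it is followed by proline.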

multi-domain proteins: Most proteins are multi-domain. Structure determination is easiest for single-domain proteins (and these are many of the ones that have been solved). The interactions between a protein's domains can be complex and can be very significant for protein function and for drug discovery.

multimeric: See under protein conformation.

native state: For proteins or nucleic acids, the conformation adopted in the intact cell; the final folded configuration.

oligomeric proteins: Proteins composed of two or more polypeptide chains.  

peptide receptors: Cell surface receptors that bind peptide messengers with high affinity and regulate intracellular signals which influence the behavior of cells. MeSH, 1994

peripheral proteins: See under membrane proteins

protein conformation: The characteristic 3-dimensional shape of a protein, including the secondary, supersecondary (motifs), tertiary (domains) and quaternary structure of the peptide chain. Quaternary protein structure describes the conformation assumed by multimeric proteins (aggregates of more than one polypeptide chain). MeSH, 1972 

The spatial arrangement of the atoms affording distinction between stereoisomers which can be interconverted by rotations about formally single bonds. Some authorities extend the term to include inversion at trigonal pyramidal centres and other polytopal rearrangements. IUPAC Stereo

protein domains: Wikipedia. See also domain. Broader term? Any non-protein domains?

protein family: Related terms: protein superfamily, protein subfamilies

protein folding: A rapid biochemical reaction involved in the formation of proteins. It begins even before a protein has been completely synthesized and proceeds through discrete intermediates (primary, secondary, and tertiary structures) before the final structure (quaternary structure) is developed. MeSH, 1993

Related terms: misfolding, protein folds, protein folding problem, refolding;  Narrower term: high-throughput protein refolding

protein folding problem: The protein folding problem is the question of how a protein’s amino acid sequence dictates its three-dimensional atomic structure. The notion of a folding “problem” first emerged around 1960, with the appearance of the first atomic-resolution protein structures. Some form of internal crystalline regularity was previously expected (117), and α-helices had been anticipated by Linus Pauling and colleagues (180, 181), but the first protein structures, of the globins, had helices that were packed together in unexpected irregular ways. Since then, the protein folding problem has come to be regarded as three different problems: (a) the folding code: the thermodynamic question of what balance of interatomic forces dictates the structure of the protein, for a given amino acid sequence; (b) protein structure prediction: the computational problem of how to predict a protein’s native structure from its amino acid sequence; and (c) the folding process: the kinetics question of what routes or pathways some proteins use to fold so quickly. We focus here only on soluble proteins and not on fibrous or membrane proteins. Dill KA, Ozkan SB, Shell MS, Weikl TR. The Protein Folding Problem. Annual Review of Biophysics. 2008;37:289-316. doi:10.1146/annurev.biophys.37.092707.153558.

protein folds: The core 3D structure of a domain is called a fold. It is estimated that there are only a few thousand distinct folds. Related terms: misfolding, refolding

protein sequence: Can this be related to protein structure?  Lots of people have been trying to find out for a long time. Related terms: protein folding, sequence homology.

protein structure: Determining the biomolecular structure of proteins is of high importance in drug development. Biophysical properties such as protein dynamics, conformation, self-association, aggregation and particulate formation affect the quality attributes of protein therapeutics. Detailed knowledge and characterization of the underlying proteins and their behavior thus enables assessment of how protein structure is affected by manufacturing, storage, handling and delivery; and, in turn, allows researchers to better determine the impact on safety and efficacy.

The 3D structure of a protein determines how the chemical groups that make up the binding site of a ligand, the active site of an enzyme, or the binding site for another protein come together. These binding sites or active sites are key to understanding the function of a protein in the cell, or to understanding how particular molecular targets (which are, in most cases, proteins) interact with drugs. Furthermore, knowledge of the 3D structure of a protein is also key to understanding how binding of a ligand (including drugs) changes the behavior of that protein. This knowledge can also aid the understanding of how particular mutations or variations in the gene that encodes a particular protein lead to changes in the protein’s behavior that can result in disease or in differences in drug interactions among different individuals. ... The 3D conformation of a target will be critical in determining whether the target is even druggable, and, if it is, which compounds will have the best fit based on this conformation. 

A greater ability to work with three-dimensional structures and to look for similarities in these structures (between the products of different genes) is expected to yield improved functional information. Related terms: high-throughput protein structure determination, protein structure prediction, protein structure technologies, structural genomics; Narrower terms: quaternary protein structure, secondary protein structure, tertiary protein structure.

protein structure data: Protein Data Bank (PDB)

protein subfamilies: Many proteins belong to large families, as suggested by Dayhoff [1]. Such families are often composed of subfamilies related to each other by gene duplication events. ... subfamilies often differ in their biological functionality yet still exhibit a high degree of sequence similarity. [Christian M. Zmasek, Sean R. Eddy, RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs, BMC Bioinformatics 2002; 3(1): 14] Related terms: protein family, protein superfamilies

protein superfamily: Margaret O. Dayhoff introduced the term protein superfamily in 1974. Since that time, the sequences in the PIR - International Protein Sequence Database have been classified into protein superfamilies. Prior to about 1990, the superfamily classification permitted a sequence to be assigned to a single superfamily only. The recognition of mosaic, multidomain proteins, whose component domains appear to have had separate evolutionary histories, has made this approach no longer effective. Moreover, the term superfamily has come into common usage and its meaning is no longer well defined. Although originally defined as a group of evolutionarily related proteins, it also has been used in the published literature to refer to a group of structurally or functionally related proteins not necessarily of common evolutionary origin. [David George, "Proposal for the Definition of 'Protein Superfamily'", Aug. 18, 1993, PIR database]

The organization of proteins into superfamilies based primarily on their sequences is introduced: examples are given of the methods used to cluster the related sequences and to elucidate the evolutionary history of the corresponding genes within each superfamily. MO Dayhoff, The origin and evolution of protein superfamilies, Federation Proceedings 35(10): 2132-2138, Aug. 1976. Related terms: protein family, protein subfamilies

quaternary protein structure: The defined organization of two or more macromolecules with tertiary structure such as a protein that are held together by hydrogen bonds and van der Waals and coulombic forces.  IUPAC Compendium

The characteristic 3-dimensional shape and arrangement of  multimeric proteins (aggregates of more than one polypeptide chain). MeSH, 2000

secondary protein structure: The conformational arrangement (α-helix, β-pleated sheet, etc.) of the backbone segments of a macromolecule such as a polypeptide chain of a protein without regard to the conformation of the side chains or the relationship to other segments. IUPAC Compendium

The level of protein structure in which regular hydrogen-bond interactions within contiguous stretches of polypeptide chain give rise to alpha helices, beta strands (which align to form beta sheets) or other types of coils. This is the first folding level of protein conformation. MeSH, 1993 Related term: motif.

superfamily: See protein superfamily

tertiary protein structure: The spatial organization (including conformation) of an entire protein molecule or other macromolecule consisting of a single chain. [IUPAC Compendium]

The level of protein structure in which combinations of secondary protein structures (alpha helices, beta sheets, loop regions, and motifs) pack together to form folded shapes called domains. … Small proteins usually consist of only one domain but larger proteins may contain a number of domains connected by segments of  polypeptide chain which lack regular secondary structure. MeSH, 1993

Protein structure resources

NCBI Domains and Structures
UniProt Knowledgebase keywords, 2017. Swiss Institute of Bioinformatics, Geneva, Switzerland; European Bioinformatics Institute, Hinxton, UK; PIR Protein Information Resource


IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.

