SCOPE NOTE:
Protein informatics is a newer name for an already existing discipline. It
encompasses the techniques used in bioinformatics and molecular modeling that
are related to proteins. While bioinformatics is mainly concerned with the
collection, organization, and analysis of biological data, molecular modeling is
devoted to representation and manipulation of the structure of proteins.
Karl-Heinz Zimmermann, An Introduction to Protein Informatics, Springer, 2003
https://www.springer.com/us/book/9781402075780
ab initio:
From the beginning (Latin).
ab initio protein modeling:
Predict 3D structure from sequence without using a homologous model/template; this
technology is not at the stage of being broadly applicable to drug discovery.
CHI Structural proteomics report
Ab initio methods use the physicochemical properties of the amino acid sequence of
a protein to literally calculate a 3D structure (lowest energy model) based
on protein folding. As opposed to determining the structure of an entire
protein, ab initio methods are typically used to predict and model
protein folds (domains). This method is gaining considerably, in part due
to the development of novel mathematical approaches, a boost in available
computational resources (for example, tera- and petaFLOPS supercomputers),
and considerable interest from researchers investigating protein-ligand
(or drug) interactions. Christopher Smith "Bioinformatics,
Genomics, and Proteomics" Scientist 14[23]:26, Nov. 27, 2000
Related terms: protein structure prediction
ab initio protein structure prediction:
Prediction of a protein's structure based on amino acid sequence alone, that is, without
mapping the structure to structures of known sequences.
Broader term: protein structure prediction
ab initio quantum mechanical methods:
Methods of quantum
mechanical calculations independent of any experiment other than the determination
of fundamental constants. The methods are based on the use of the
full Schrödinger equation to treat all the electrons of a chemical
system. In practice, approximations are necessary to restrict the complexity
of the electronic wave function and to make its calculation possible. (Synonymous
with non- empirical quantum mechanical methods.) IUPAC Computational
ab initio quantum mechanical modeling: The application
of ab initio modelling across diverse fields such as condensed matter
physics, materials science and chemistry has been demonstrated over the past 10 years.
... The recent completion of the Human Genome Project will offer an unprecedented
number of protein receptors and enzymes as
targets for pharmacological
intervention in disease processes. However, before this wealth of information
can be used to develop pharmaceuticals, an understanding of the biochemistry
of the newly identified proteins and their interactions must be obtained.
First principles quantum mechanical modelling will play an important role
in this process. [Matthew Segall, Ursula Röthlisberger, Paolo Carloni, CECAM/Psi-k Workshop: Ab Initio
Modelling in the Biological Sciences, Lyon, France, 11-13 June 2001]
http://www.tcm.phy.cam.ac.uk/~mds21/Workshop2001/Scientific/node1.html#SECTION00010000000000000000
annotation protein - dictionary-driven:
For many years, computational methods seeking to automatically determine the
properties (functional, structural, physicochemical, etc.) of a protein directly
from sequence have been the focus of numerous research groups, including ours.
By general admission, this is a difficult problem and the methods that have been
proposed over the years typically concentrated on the analysis of individual
genes. With the advent of advanced sequencing methods and systems, the number of
amino acid sequences and fragments being deposited in the public databases has
been increasing steadily. This in turn generated a renewed demand for automated
approaches that can quickly, exhaustively and objectively annotate individual
sequences as well as complete genomes. In this paper, we present one such
approach. The approach is centered around and exploits the Bio-Dictionary, an
exhaustive collection of amino acid patterns (referred to as seqlets)
that completely covers the natural sequence space of proteins to the extent that
this space is sampled by the currently available public databases. Isidore Rigoutsos,
Tien Huynh, Laxmi P. Parida, Daniel E. Platt, Aris Floratos, Dictionary
Driven Protein Annotation, Nucleic Acids Research, 30 (17): 3901-3916, 2002
CASP:
Critical Assessment of Techniques for Protein Structure Prediction.
Protein Structure Prediction Center
http://predictioncenter.org/ Links to CASP meeting results
https://en.wikipedia.org/wiki/CASP
comparative modeling: See homology modeling
comparative proteomics:
The C. elegans proteome was used
as an alignment template to assist in novel human gene identification …
Among the available 18,452 C. elegans protein sequences, our results
indicate that at least 83% had human homologous genes, with 7954 records
of C. elegans proteins matching known human gene transcripts. CH
Lai et al "Identification of Novel Human Genes Evolutionarily Conserved
in Caenorhabditis elegans by Comparative Proteomics" Genome Research
10(5): 703-713 May 2000
Related terms: Functional Genomics comparative genomics, evolutionary genomics.
computational biophysics:
Activities of the Theoretical and Computational Biophysics
Group center on the structure and function of supramolecular systems in the
living cell, and on the development of new algorithms and efficient computing
tools for structural biology. The Resource brings the most advanced
molecular modeling, bioinformatics, and computational technologies to bear on
questions of biomedical relevance. Theoretical and Computational Biophysics
Group, Univ. of Illinois Urbana Champaign, About the Group
http://www.ks.uiuc.edu/Overview/intro.html
Our
research focuses on the modeling of large macromolecular systems in realistic
environments. These efforts have produced insight into biomolecular processes
coupled to mechanical force, bioelectronic processes in metabolism and vision,
and the function and mechanism of membrane proteins. Theoretical and
Computational Biophysics Group, Univ. of Illinois Urbana Champaign,
Emerging Studies,
http://www.ks.uiuc.edu/Research/Recent/
contextual data:
While proteomic studies
initially focused largely on expression and protein identification, progress in
these areas drove the demand for more detailed types of proteomic data. Now
researchers want information about where specific proteins are expressed, both
in terms of tissues and localization within the cell. Information relating
proteins to function requires additional details of post-translational
modification, and studies of protein interactions have moved beyond just looking
at binary interactions to studies of protein complexes. For both genomics and proteomics, this
shift can be characterized as an interest in more contextual data. Enhanced
insight into biological context is essential for obtaining a better
understanding of how biology actually works, and thus there is now an emphasis
to move from genomic and proteomic snapshots to time series data of expression.
Such context is of particular value if biological studies are to be translated
into medical advances, because of the importance of being able to predict the
impact of potential treatments. The integration of genomic and proteomic data
with medical conditions, treatment and outcomes becomes another critical type of
contextual information. Christina Lingham, Beyond Genome: Thinking Globally,
Cambridge Healthtech
docking:
Computational simulation of a candidate ligand binding to a receptor. Wikipedia
docking glossary accessed 2018 Aug 26
https://en.wikipedia.org/wiki/Docking_(molecular)
Narrower term: pharmacophore based docking
docking studies:
Computational techniques for the exploration
of the possible binding modes of a substrate to a given receptor, enzyme
or other binding site. IUPAC Computational Related terms:
drug design, QSAR
domain shuffling:
Creating new proteins by bringing domains together.
It is thought that this is a major way that new proteins have arisen during
evolution. Thus, mining of databases for homology by domains, rather than
by whole proteins (which are not as evolutionarily conserved), is important
in obtaining clues to functionality.
A protein sequence can have more than one domain. Related term: multi-domain proteins.
energy function:
Computationally, a shape is assigned to a protein
sequence based on an empirical energy function. The lower the energy of
a given structure, the more likely it is to be the correct fold. The structure
prediction challenge is therefore divided into two: (1) The first challenge
is the creation of many plausible folds or a set of structures that will
include the native shape. The creation of the appropriate set depends on
existing databases (such as the Protein Data Bank) or on the design of
automated algorithms (using physical or statistical information) to generate
plausible folds. Once the set is available, a selection procedure is used
to "fish" out the correct fold. (2) The "fishing" of the plausible
native shapes critically depends on the quality of the energy function.
The value of the energy function must be the lowest for the native structure.
Opportunities in Molecular Biomedicine in the Era of
Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource
for Macromolecular Modeling and Bioinformatics Beckman Institute
for Advanced Science and Technology, University of Illinois at Urbana-Champaign
Molecular Biomedicine in the Era of Teraflop Computing - DDDAS.org
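The two-step scheme described in this entry (generate a set of plausible folds, then select the one with the lowest energy) can be sketched as follows. This is only a toy illustration: the contact-counting energy, the hydrophobic residue set, the function names and the coordinates are all assumptions made for the example, not the energy function discussed at the cited workshop.

```python
import math

# Assumed hydrophobic residue set for this toy model only.
HYDROPHOBIC = set("AVILMFWC")

def toy_energy(sequence, coords, cutoff=6.0):
    """Toy energy: -1 for each non-adjacent hydrophobic pair closer than cutoff."""
    energy = 0.0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):
            if sequence[i] in HYDROPHOBIC and sequence[j] in HYDROPHOBIC:
                if math.dist(coords[i], coords[j]) < cutoff:
                    energy -= 1.0
    return energy

def select_fold(sequence, candidates):
    """Step 2 ("fishing"): return the name of the lowest-energy candidate."""
    return min(candidates, key=lambda name: toy_energy(sequence, candidates[name]))

# Step 1 stand-in: two hypothetical candidate structures (one point per residue).
sequence = "LAKL"
candidates = {
    "extended": [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)],
    "hairpin": [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0)],
}
best = select_fold(sequence, candidates)
```

The selection step is only as good as the energy function: if the toy score did not rank the native-like candidate lowest, no amount of candidate generation would recover it.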
fold alignment:
A critical step in homology modeling,
because it provides the key structures for the model. If suitably
matched folds cannot be identified, a type of fold assignment known as
protein threading can be used.
fold recognition:
Methods of protein fold recognition attempt
to detect similarities between protein 3D structures that are not accompanied
by any significant sequence similarity. There are many approaches, but
the unifying theme is to try and find folds that are compatible with a
particular sequence. Unlike sequence-only comparison, these methods take
advantage of the extra information made available by 3D structure information. In effect,
they turn the protein folding problem on its head: rather than predicting
how a sequence will fold, they predict how well a fold will fit a sequence.
Robert B. Russell, Guide to Structure Prediction "Fold recognition
methods and links" Sept. 1999
http://www.sbg.bio.ic.ac.uk/people/rob/CCP11BBS/foldrec.html
Related terms: threading;
Protein structure: protein folding, protein folds
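The idea of scoring how well a fold fits a sequence can be illustrated with a deliberately simplified sketch. Everything here is an assumption for illustration: each fold is reduced to a string of buried ("b") and exposed ("e") positions, the hydrophobic residue set is arbitrary, and the fold names are hypothetical. Real fold recognition methods also align the sequence to the fold with gaps and use far richer scoring.

```python
# Assumed hydrophobic residue set for this toy model only.
HYDROPHOBIC = set("AVILMFWC")

def fit_score(sequence, environments):
    """Score a sequence against a fold given as 'b' (buried) / 'e' (exposed) positions.

    +1 when a hydrophobic residue sits in a buried position or a polar residue
    in an exposed one, -1 otherwise. No gaps are allowed in this sketch.
    """
    if len(sequence) != len(environments):
        raise ValueError("sequence and fold must have the same length")
    score = 0
    for residue, env in zip(sequence, environments):
        buried = env == "b"
        score += 1 if (residue in HYDROPHOBIC) == buried else -1
    return score

def rank_folds(sequence, fold_library):
    """Return fold names ordered from best to worst fit for the sequence."""
    return sorted(fold_library,
                  key=lambda name: fit_score(sequence, fold_library[name]),
                  reverse=True)

fold_library = {"foldA": "bebebe", "foldB": "eeebbb"}  # hypothetical folds
ranking = rank_folds("LKVEIK", fold_library)
```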
foldedness:
Methods for analyzing "foldedness" of expressed
proteins include NMR and circular dichroism spectroscopies.
Hidden Markov Model (HMM):
Wikipedia
https://en.wikipedia.org/wiki/Hidden_Markov_model Useful for insights
into protein structure, sequence and function.
Related term: simulated annealing
homeomorphic superfamilies:
Protein families are clustered into "homeomorphic superfamilies". Sequences are homeomorphic if they can be aligned from
end-to-end. In practice, we allow the amino and carboxyl ends to be ragged and moderate internal length variations (represented as
gaps in the sequences). However, all members of the superfamily should have the same overall domain architecture, i.e., the same
domains in the same order (except for domains missing due to alternative splicing or very recent genetic events). It is assumed, although in most cases this has not been investigated in detail, that the molecules in a homeomorphic superfamily share a common evolutionary history since the acquisition of their constituent domains. Thus, it should be valid to construct an evolutionary tree from the members of a homeomorphic superfamily. If two groups of proteins with the same architecture are shown to have come to that structure independently, they are appropriately separated into two homeomorphic superfamilies.
PIR Classification Terminology, Georgetown Univ, revised 1998
http://pir.georgetown.edu/pirwww/aboutpir/doc/short_sf_def.html
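The "same domains in the same order" criterion can be sketched as a grouping of proteins by their ordered domain list. This is a simplified illustration, not the PIR procedure: the protein and domain names are hypothetical, the end-to-end alignment requirement is not checked, and the exceptions mentioned above (domains missing through alternative splicing, architectures that arose independently) are ignored.

```python
def group_by_architecture(proteins):
    """Group protein names by their ordered tuple of domain names."""
    groups = {}
    for name, domains in proteins.items():
        groups.setdefault(tuple(domains), []).append(name)
    return groups

# Hypothetical proteins with hypothetical domain assignments.
proteins = {
    "protA": ["SH3", "SH2", "kinase"],
    "protB": ["SH3", "SH2", "kinase"],
    "protC": ["SH2", "SH3", "kinase"],  # same domains, different order
}
groups = group_by_architecture(proteins)
```

Here protA and protB fall into one candidate group, while protC, with the same domains in a different order, does not.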
homology:
Genomic informatics
homology domains:
Many types of domains have been found in diverse proteins. In common use, the term "immunoglobulin superfamily" refers to the collection of all proteins that contain an
immunoglobulin-like domain. We call such a group a "homology domain superfamily". Any given protein sequence will be assigned to only one homeomorphic superfamily, but it may contain sequence segments belonging to several homology domain superfamilies.
PIR Classification Terminology, Georgetown Univ, revised 1998
http://pir.georgetown.edu/pirwww/aboutpir/doc/short_sf_def.html
homology model: A model of a protein, whose three-dimensional
structure is unknown, built from, e.g., the X-ray coordinate data of similar
proteins or using alignment techniques and homology arguments.
IUPAC Computational
Related terms: Sequencing alignment
homology modeling:
Also known as comparative modeling of proteins, refers to constructing an
atomic-resolution model of the "target" protein from its amino acid
sequence and an experimental three-dimensional structure of a related
homologous protein (the "template"). Homology modeling relies on the
identification of one or more known protein structures likely to resemble the
structure of the query sequence, and on the production of an alignment that
maps residues in the query sequence to residues in the template sequence. It has
been shown that protein structures are more conserved than protein sequences
amongst homologues, but sequences falling below a 20% sequence identity can have
very different structure.
Evolutionarily related proteins have similar sequences and naturally occurring
homologous proteins have similar protein structure. Wikipedia accessed 2018 Oct 23
https://en.wikipedia.org/wiki/Homology_modeling
A computational method for determining the
structure of a protein based on its similarity to known structures. The accuracy
of structures determined by homology modeling depends largely on the amount of
homology between the unknown and the known protein sequence. The most successful tool for prediction of
protein structure from sequence, but with significant room for improvement.
Related terms: structural homology;
Sequencing glossary sequence homology;
Proteins glossary hypothetical protein;
In silico & Molecular Modeling. Compare with similarity
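Because the reliability of a homology model depends on how similar the query is to its template, a first check is often the percent identity of the query-template alignment. The sketch below is one possible way to compute it, assuming a pre-computed pairwise alignment with "-" for gaps and counting only columns where both sequences have a residue (other denominators are also in use). The sequences are invented, and the 20% threshold is the figure quoted in the entry above.

```python
LOW_IDENTITY_THRESHOLD = 20.0  # percent, as quoted in the entry above

def percent_identity(aligned_query, aligned_template):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = matches = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # skip gap columns
        aligned += 1
        if q == t:
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0

# Invented example alignment (11 columns, 2 of them gapped).
query = "MKT-AYIAKQR"
template = "MKSLAYLAK-R"
identity = percent_identity(query, template)
template_is_risky = identity < LOW_IDENTITY_THRESHOLD
```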
interologs:
Protein interaction maps have provided insight into the
relationships among the predicted proteins of model organisms for which a genome
sequence is available. These maps have been useful in generating potential
interaction networks, which have confirmed the existence of known
complexes and pathways and have suggested the existence of new complexes
and/or crosstalk between previously unlinked pathways. However, the generation
of such maps is costly and labor intensive. Here, we investigate the extent to
which a protein interaction map generated in one species can be used to predict
interactions in another species. LR Matthews "Identification
of potential interaction networks using sequence-based searches for conserved
protein-protein interactions or 'Interologs'" Genome Research 11 (12):
2120-2126, Dec. 2001
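The transfer of an interaction map from one species to another can be sketched as a lookup through an ortholog table: an interaction is predicted in the target species only when both partners have a mapped ortholog. This is a minimal illustration under stated assumptions, not the procedure of the cited paper. The protein identifiers and the ortholog mapping are hypothetical, and in practice the mapping would come from sequence-based searches.

```python
def predict_interologs(source_interactions, orthologs):
    """Map source-species interactions onto target-species ortholog pairs.

    source_interactions: iterable of (protein_a, protein_b) pairs.
    orthologs: dict from source-species protein to its target-species ortholog.
    Returns a set of unordered pairs (frozensets) predicted in the target species.
    """
    predicted = set()
    for a, b in source_interactions:
        if a in orthologs and b in orthologs:
            predicted.add(frozenset((orthologs[a], orthologs[b])))
    return predicted

# Hypothetical data: "y*" are source-species proteins, "w*" target-species ones.
source_interactions = [("yA", "yB"), ("yB", "yC")]
orthologs = {"yA": "wA", "yB": "wB"}  # yC has no known ortholog
predicted = predict_interologs(source_interactions, orthologs)
```

Only the yA-yB interaction is transferred; yB-yC is dropped because yC has no ortholog in the table.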
location proteomics: Seeks
to provide automated, objective high-resolution descriptions of protein location
patterns within cells. Methods have been developed to group proteins into
statistically indistinguishable location patterns using automated analysis of
fluorescence microscope images. ... Preliminary work suggests the feasibility of
expressing each unique pattern as a generative model that can be incorporated
into comprehensive models of cell behaviour. RF Murphy, Location
proteomics: a systems approach to subcellular location, Biochem Society
Transactions, 33 (Pt 3): 535-538, June 2005
membrane proteins: Drug Targets
ontologies - proteomics:
A principal aim of post-genomic biology is elucidating the structures, functions
and biochemical properties of all gene products in a genome. However, to
adequately comprehend such a large amount of information we need new
descriptions of proteins that scale to the genomic level. In short, we need a
unified ontology for proteomics. Much progress has been made towards this end,
including a variety of approaches to systematic structural and functional
classification and initial work towards developing standardized, unified
descriptions for protein properties. In relation to function, there is a
particularly great diversity of approaches, involving placing a protein in
structured hierarchies or more-generalized networks and a recent approach based
on circumscribing a protein's function through systematic enumeration of
molecular interactions. N Lan, GT Montelione, M. Gerstein, Ontologies for
proteomics: towards a systematic definition of structure and function that
scales to the genome level, Current Opinion in Chemical Biology 7(1): 44-54,
Feb. 2003
phylogenetic profiles:
Phylogenomics
Can be used to hypothesize protein function.
post-translational modification identification:
ExPASy Proteomics Tools
https://www.expasy.org/proteomics
lists a number of tools for prediction of post-translational modification, as do
other websites. Identification of these modifications may provide important
structural-functional information.
protein analysis
sequencing: A process that includes the determination of AMINO ACID SEQUENCE
of a protein (or peptide, oligopeptide or peptide fragment) and the information
analysis of the sequence. MeSH 2000
protein array analysis:
Ligand-binding assays that
measure protein-protein, protein-small molecule or protein-nucleic acid
interactions using a very large set of capturing molecules, i.e., those attached
separately on the solid support, to measure the presence or interaction of
target molecules in the sample. MeSH 2003
protein bioinformatics:
Tools for Protein Informatics
• sequence and structure comparison
• multiple alignments
• phylogenetic tree construction
• composition/pI/mass analysis
• motif/pattern identification
• 2° structure prediction/threading
• TMD prediction/hydrophobicity analysis
• homology modeling
• visualization
A Very very very short introduction to protein bioinformatics, Patricia Babbitt 2003
http://pga.lbl.gov/Workshop/May2003/lectures/Babbitt.pdf
See also protein informatics. Is there a difference?
protein databases:
Protein location can be determined by such genome-wide
techniques as green fluorescent protein (GFP) tagging, and protein-protein
interactions can be determined by affinity chromatography,
immunoprecipitation and yeast two-hybrid experiments. Databases resulting from
these methods are beginning to emerge, but they are of uncertain accuracy.
Defining the Mandate of Proteomics in the Post- Genomics Era, Board on
International Scientific Organizations, National Academy of Sciences, 2002
http://www.nap.edu/books/NI000479/html/R1.html
Dr. Stanley Fields, Professor of Genetics and
Medicine at the Univ. of Washington and developer of the yeast two-hybrid system,
writes that protein databases "will need to become much more sophisticated
if they are to help scientists make sense of the staggering number of experimental
measurements that will soon emerge. ... protein
data will need to be integrated with results from expression profiling, genome-wide
mutation or antisense analyses, and polymorphism detection.
As proteomic data accumulate, we will become better at triangulating from
multiple disparate bits of information to gain a bearing on what a protein
does in the cell." S. Fields "Proteomics in Genomeland" Science
291: 1221-1224 Feb. 16, 2001
Related terms: protein identification, protein localization;
Expression expression profiling
Protein databases Databases & software directory
protein dynamics: Certain parts of a particular protein will
be rigid, but others may be flexible and change their shape, even when
bound. ... NMR has the unique ability to characterize protein fluctuations
quantitatively, much more so than crystallography can. Understanding the function of a protein is fundamental for gaining insight
into many biological processes. Proteins are stable mechanical constructs
that allow certain internal motions to enable their biological function.
Structural properties of a protein can be obtained with X-ray
crystallography or NMR acquisition techniques. Molecular dynamics
(MD) simulations at pico/nanosecond time scales output one or more
trajectory files which describe the coordinates of each individual atom
over time. The main problem with animating these trajectories is one of
temporal scale. Taking large time steps will destroy the impression of
smooth motion, while small time steps will result in the camouflage of
interesting motions. Henk Huitema, Robert van Liere "Interactive Visualization
of Protein Dynamics" ERCIM [European Research Consortium for Informatics
and Mathematics] News No. 44 - January 2001
http://www.ercim.org/publication/Ercim_News/enw44/van_liere.html
protein expression mapping:
Maps, genetic & genomic
protein expression profiling:
Expression
protein folding problem:
Protein structures. See also protein structure prediction
protein function:
The focus of the group is the understanding
of protein function and evolution using genomic, structural and proteomic data.
Central to this question is the concept of the domain: a structurally conserved,
genetically mobile unit. When viewed at the three-dimensional level of protein
structure, a domain is a compact arrangement of secondary structures connected
by linker polypeptides. It usually folds independently and possesses a
relatively hydrophobic core. The importance of domains is that they cannot be divided
into smaller units; they represent a fundamental building block that can be used
to understand the evolution and function of proteins... The advent of
complete genomic sequences, including more and more eukaryotes, is leading to a
fundamental change in protein domain analysis. Having characterised most of the
domain families and having developed tools to predict them, we can now start to
analyse their function and evolution on a higher level. Protein
Function Analysis Group, Max Planck Institute for Molecular Genetics, Germany
http://protfunc.molgen.mpg.de/
Function is not a fixed property for many, if not
most proteins. There are many ways that gene products can be altered to elicit
modified or completely new functions. For example, there exist:
- alternative splicing, which may affect as many as ¼ or more of the genes in a higher
eukaryote and can alter biochemical function either drastically or subtly,
producing truncated proteins and proteins with different compositions
- post-translational modification, such as phosphorylation and glycosylation
(which can occur on numerous sites on the same protein)
- pre-enzymes made for secretion and pro-enzymes that are activated by cleavage
- acylation and ubiquitination
- non-enzymatic modifications like oxidation, so a given protein
exists in the cell in different oxidized states.
Defining the Mandate of
Proteomics in the Post- Genomics Era, Board on International Scientific
Organizations, National Academy of Sciences, 2002
http://www.nap.edu/books/NI000479/html/R1.html
More systematic attempts have been made to place
proteins within a hierarchy of standard functional categories or to connect them
in overlapping networks of varying types of associations. These networks
can obviously include protein- protein interactions ... More broadly, they can
include pathways, regulatory systems and signaling cascades... Perhaps, in the
future, the systematic combination of networks may provide for a truly rigorous
definition of protein function. Mark Gerstein et al. "Integrating
Interactomes" Science 295 (5553): 284, Jan. 2002
A biologically useful definition of the function of a protein requires a description at several different levels. To the biochemist, function means the biochemical role of an individual protein: if it is an
enzyme, function refers to the reaction catalyzed; if it is a signaling protein, function refers to the interactions that the protein makes. To the geneticist or cell biologist, function includes these roles but will also encompass the cellular roles of the protein, such as the
phenotype of its deletion, the pathway in which it operates, among others. A physiologist or developmental biologist may have an even broader view of function, including tissue specificity and
expression during the life cycle of the organism.
Gregory A Petsko, Dagmar Ringe "Overview: The Structural Basis of Protein
Function" from Chapter 2 of Protein Structure and Function: New Science
Press, 1991-2001
In the expanded view of protein function, a
protein is defined as an element in the network of its interactions. Various
terms have been coined for this expanded notion of function, such as ‘contextual
function’ or ‘cellular function’ … Whatever the term, the idea is that
each protein in living matter functions as part of an extended web of interacting
molecules … Often it is possible to understand the cellular functions of
uncharacterized proteins through their linkages to characterized proteins.
In broader terms, the networks of linkages offer a new view of the meaning
of protein function, and in time should offer a deepened understanding
of the function of cells. David Eisenberg et al "Protein function in the post-genomic
era" Nature 405: 823-826, 15 June 2000
The principal problem facing the post-genome era.
Walter Blackstock & Malcolm Weir "Proteomics" Trends in Biotechnology: 121-134 Mar
1999
Related terms:
Protein categories interaction proteomics;
Functional genomics gene function, Gene Ontology™;
Maps cell mapping
protein identification:
The analytical method used most commonly to
visualize and identify large numbers of proteins is 2D-gel electrophoresis.
One can theoretically visualize changes in protein production, both
qualitatively and quantitatively, from two individual samples (e.g., a
control preparation and a treated preparation). Furthermore, one can potentially
accomplish protein identification by "picking" proteins from the 2D-gel
and subjecting the highly purified protein to MALDI-TOF mass spectrometry.
protein informatics:
Computational biological research has become an essential component of biological research. The great quantity
and diversity of the data being generated by different technologies is daunting,
and impossible to organize or oversee without computational assistance. In
functional genomics, a great deal of effort has been devoted to developing
community-based standards for reporting gene expression data to allow others to
replicate experiments. The same will need to be done for proteomics to validate
across the different technologies. Perhaps never before has a bioinformatics
problem of this magnitude been approached. Without effective and integrated
databases to store and retrieve these data and advanced computational methods
such as pattern recognition and other machine learning approaches to analyze and
interpret them, the full implications of these data will not be realized.
Defining the Mandate of Proteomics in the Post- Genomics Era, Board on
International Scientific Organizations, National Academy of Sciences, 2002
http://www.nap.edu/books/NI000479/html/R1.html
Although mining of protein
structure homology data is a relatively small field now, it is likely to
experience dramatic growth and to become pivotal in the ultimate exploitation of
genomic data and tools.
Related terms: proteoinformatics; Algorithms; protein bioinformatics; In Silico & molecular modeling
protein interactions:
Narrower terms: protein DNA interactions, protein protein interactions, protein RNA
interactions
Related terms: annotation- proteins, binary
interaction, interaction proteomics, protein networks;
-Omes & -omics interactome
protein interaction mapping:
Maps genomic & genetic
protein linkage maps:
Maps genomic & genetic
protein & mRNA data:
Although the relationship between
mRNA and
protein levels is vague for individual genes, some of the statistics for broad
categories of protein properties are much more robust... In contrast to the
differences between mRNA and protein data for individual genes, the broad
categories show that the transcriptome and translatome populations are
remarkably similar; both contain roughly the same proportions of secondary
structure and functional categories. Moreover, this contrasts the difference
with the genome, which appears to have a distinctly different composition of
functional categories. This illustrates that we get a more consistent picture
when we average across the population, i.e. there is broad similarity between
the characteristics of highly expressed mRNA and highly abundant proteins.
Dov Greenbaum, Mark Gerstein et al. "Interrelating Different Types of
Genomic Data" Dept. of Biochemistry and Molecular Biology, Yale Univ. 2001
http://bioinfo.mbb.yale.edu/e-print/omes-genomeres/text.pdf
Related terms:
Expression;
Genomics genome data;
functional genomics data
-Omes & -omics transcriptome, translatome
protein networks:
The individual steps in signal
transduction pathways involve protein interactions with target molecules that
may be other proteins, small molecules or DNA. Identifying all of the proteins
that take part in a given class of interactions, on a genome-wide scale, remains
an extremely challenging task. We propose to apply mRNA display (1,
2) technology to this problem, with the goal of developing
databases of protein-ligand interactions that will add value to the existing and
growing sequence databases. PI Jack Szostak, Definition of Protein Networks
using mRNA display, ParaBioSYs, MGH, HMS, BU
http://pga.mgh.harvard.edu/Parabiosys/projects/protein_networks_rna_display.php
protein sequence:
A process that includes the determination of an amino acid
sequence of a protein (or peptide, oligopeptide or peptide fragment) and the
information analysis of the sequence. MeSH, 2002 See
also amino acid sequence.
protein sequence space: [J.] Maynard-Smith's (1970. Natural Selection and the concept of a protein space. Nature 225: 563-564) concept of a "protein
sequence space" in which each site in an alignment is represented on its own axis and the number
of axes required to represent all conceivable variants for a protein is equal to the number of sites
in its sequence. Each sequence occupies a unique point in this space; variants differing at one site
are adjacent (Hamming) neighbours. The collection of all viable sequence variants for a
particular protein forms a localized interconnected "neighbourhood" of points within the space.
This representation has proved conceptually intuitive and analytically powerful
...
In protein sequence space, constraints are reflected in the multidimensional shape of the
cluster of points that make up the "neighbourhood" of variants viable for a specific protein. The
boundary defining the edge of this neighbourhood is characteristic of the protein's function and
can be thought of as its functional "signature". Gavin JP Naylor,
"Measuring Shifts In Function and Evolutionary Opportunity Using
Variability Profiles: A Case Study of the Globins" also Journal of
Molecular Evolution 51 (3): 223-233 Sept. 2000
http://bioinfo.mbb.yale.edu/e-print/protspace-jme/text.pdf
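The notion of adjacent (Hamming) neighbours in sequence space can be made concrete with a short sketch. It assumes the 20 standard amino acids and equal-length sequences, and the function names are illustrative. A sequence with L sites has 19 × L neighbours at Hamming distance 1.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def neighbours(sequence):
    """Yield every sequence that differs from the input at exactly one site."""
    for i, residue in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa != residue:
                yield sequence[:i] + aa + sequence[i + 1:]

adjacent = list(neighbours("MKV"))  # 3 sites x 19 alternatives each
```

Which of these neighbours are viable for a given protein is an empirical question; the sketch only enumerates the points of the space, not the boundary of the functional "neighbourhood".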
protein sorting signals:
Amino acid sequences found in transported proteins that selectively guide the distribution of the proteins to specific cellular compartments.
MeSH, 2001
Protein Spotlight, Swiss-Prot
http://au.expasy.org/spotlight/
One month, one protein
protein structure prediction:
Involves primary sequence alignment,
secondary and tertiary structure prediction and homology modelling.
Narrower term: ab initio protein structure prediction
Related term: CASP
protein taxonomy:
A Protein Taxonomy Based on Secondary Structure. T. Przytycka, R. Aurora, GD Rose, Nature Structural Biology 6 (7), 1999.
protein threading: See threading
proteogenomics:
The systematic study of annotated genomic information to global protein
expression in order to determine the relationship between genomic
sequences and both expressed proteins and predicted protein sequences.
MeSH Year introduced: 2017
proteome informatics:
Peer Bork and David Eisenberg, "Genome and
proteome informatics" Current Opinion in Structural Biology 10 (3):
341-342, 2000
Proteome Informatics group
is part of the Swiss
Institute of Bioinformatics (SIB). It is in charge of research and
development in the fields of bioinformatics, molecular
imaging
and the use of Internet for biomedical applications. Current Projects and
People, ExPASy, Swiss Institute of Bioinformatics
http://au.expasy.org/people/pig/
proteome map:
Maps, genomic & genetic
proteome mining:
We
present the development and application of a new machine-learning approach to
exhaustively and reliably identify major histocompatibility complex class I
(MHC-I) ligands among all 20^8 octapeptides and in genome-derived proteomes of Mus musculus, influenza A H3N8, and vesicular stomatitis virus (VSV).
Exhaustive Proteome Mining
for Functional MHC-I Ligands
ACS
Chem. Biol., 2013, 8 (9),
pp 1876–1881 DOI: 10.1021/cb400252t
http://pubs.acs.org/doi/abs/10.1021/cb400252t
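To give a sense of scale for "exhaustive" mining, the sketch below counts the octapeptide space and enumerates the overlapping octapeptides of a sequence. It assumes the 20 standard amino acids; the function names and the input string are illustrative, and the machine-learning scoring step of the cited paper is not reproduced.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def peptide_space_size(length: int) -> int:
    """Number of distinct peptides of the given length."""
    return len(AMINO_ACIDS) ** length


def kmers(protein: str, k: int = 8):
    """Yield every overlapping k-residue peptide in a protein sequence."""
    for start in range(len(protein) - k + 1):
        yield protein[start:start + k]


# Toy input string, used only to show the enumeration.
candidates = list(kmers("SIINFEKLTEWT"))
print(peptide_space_size(8), len(candidates), candidates[0])
```

A 12-residue sequence yields only five octapeptides, against 25,600,000,000 possible ones, which is why scanning real proteomes is far cheaper than scanning the full peptide space.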
proteomic analysis:
Systematic and
quantitative analysis of the properties that define protein activity and
functions within a defined context, essential for biology and medicine. Ruedi
Aebersold quoted in Defining the Mandate of Proteomics in the Post-Genomics
Era, National Academies Press, 2002
http://www.nap.edu/books/NI000479/html/R1.html
A systematic analysis of proteins for their identity, quantity, and function. J Peng
and Steven Gygi, Proteomics: the move to mixtures, Journal of Mass Spectrometry
36: 1083-1091, 2001
Proteomic
Standards Initiative PSI:
The HUPO Proteomics Standards Initiative (PSI)
defines community standards for data representation in proteomics to facilitate
data comparison, exchange and verification. Proteomic Standards Initiative,
HUPO http://www.psidev.info/
regulatory homology:
Quantitative analysis of protein expression data
obtained by high-throughput methods has led us to define the concept of
"regulatory homology" and use it to begin to elucidate the basic
structure of gene expression control in vivo. N. Leigh Anderson, Norman
G. Anderson "Proteome and proteomics; New technologies, new concepts, and
new words" Electrophoresis 19(11):1853-61 August 1998
RNA structural genomics:
Structural genomics, the systematic determination of all
macromolecular structures represented in a genome, is focused at present
exclusively on proteins. It is clear, however, that RNA molecules play a variety
of significant roles in cells, including protein synthesis and targeting,
many forms of RNA processing and splicing, RNA editing and modification,
and chromosome end maintenance. To comprehensively understand the biology of a
cell, it will ultimately be necessary to know the identity of all encoded RNAs,
the molecules with which they interact and the molecular structures of these
complexes. This report focuses on the feasibility of structural genomics of RNA,
approaches to determining RNA structures and the potential usefulness of an RNA
structural database for both predicting folds and deciphering biological
functions of RNA molecules. Jennifer A. Doudna "Structural Genomics of
RNA" Nature Structural Biology 7 (11) supp: 954-956 (Nov. 2000)
Rosetta stone method: A way of looking at the correlation of
protein
domains across species. Some proteins have homologs that are fused
in other species, yielding clues as to the proteins with which they might
interact. In addition, proteins that have been identified in particular
complexes and pathways hint at the location and function of their homologs
in other species. S. Spengler “Bioinformatics in the information age”
Science 287 (5451): 221- 223 Feb. 18, 2000 Related term:
Phylogenomics phylogenetic profiles
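The fusion logic of the Rosetta stone method can be sketched as a set comparison over domain annotations. This is a minimal sketch under stated assumptions: each protein is represented as a set of domain identifiers, and all protein and domain names below are hypothetical.

```python
from itertools import combinations


def rosetta_stone_pairs(query, reference):
    """Predict interacting pairs in `query` from fused proteins in `reference`.

    Both arguments map a protein name to the set of domain identifiers it
    contains. Two separate query proteins are linked when a single
    reference protein carries the domains of both.
    """
    hits = []
    for a, b in combinations(sorted(query), 2):
        doms_a, doms_b = query[a], query[b]
        if not doms_a or not doms_b or doms_a & doms_b:
            continue  # skip unannotated proteins and pairs sharing a domain
        for fused in sorted(reference):
            if doms_a <= reference[fused] and doms_b <= reference[fused]:
                hits.append((a, b, fused))
    return hits


# Hypothetical annotations for two species.
species_1 = {"protA": {"D1"}, "protB": {"D2"}, "protC": {"D3"}}
species_2 = {"fusedAB": {"D1", "D2"}, "loneC": {"D3"}}
links = rosetta_stone_pairs(species_1, species_2)
print(links)
```

Here protA and protB are predicted to interact because their domains occur together in fusedAB; protC has no fusion partner and is not linked to anything.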
sequence homology, amino acid:
The degree of similarity between sequences of amino acids. This information is useful for the understanding of genetic relatedness of certain species.
MeSH, 1993
sequence similarity searching:
a method of searching sequence databases by using alignment to a query sequence.
By statistically assessing how well database and query sequences match one can
infer homology and transfer information to the query sequence.
European Bioinformatics Institute
https://www.ebi.ac.uk/Tools/sss/
Sequence similarity searching, typically with BLAST (units 3.3, 3.4), is
the most widely used, and most reliable, strategy for characterizing newly
determined sequences. Sequence similarity searches can identify
“homologous” proteins or genes by detecting excess similarity –
statistically significant similarity that reflects common ancestry.
Pearson WR. An Introduction to Sequence Similarity (“Homology”)
Searching. Current protocols in bioinformatics / editorial board,
Andreas D Baxevanis [et al.]. 2013; Chapter 3: Unit 3.1.
doi:10.1002/0471250953.bi0301s42.
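The core idea, ranking database sequences by how well they align to a query, can be illustrated with a small Smith-Waterman local alignment score. This is not BLAST and it omits the statistical assessment (E-values) that the definitions above rely on; the match, mismatch and gap values and the toy database are arbitrary assumptions.

```python
def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, s in enumerate(subject, start=1):
            diagonal = prev[j - 1] + (match if q == s else mismatch)
            score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def search(query, database):
    """Rank database entries by local alignment score, best first."""
    scored = [(local_alignment_score(query, seq), name)
              for name, seq in database.items()]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Hypothetical three-entry database.
database = {"a": "GGACDEGG", "b": "WWWW", "c": "ACDF"}
ranking = search("ACDE", database)
print(ranking)
```

Production tools replace the simple match/mismatch scheme with substitution matrices and use heuristics to avoid filling the full dynamic-programming table for every database entry.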
structural bioinformatics:
Involves the process of determining
a protein's three-dimensional structure using comparative primary sequence
alignment,
secondary and tertiary structure prediction methods, homology modeling,
and crystallographic diffraction pattern analyses. Currently, there is
no reliable de novo predictive method for protein 3D-structure determination.
Over the past half-century,
protein structure has been determined by purifying
a protein, crystallizing it, then bombarding it with X-rays. The X-ray
diffraction pattern from the bombardment is recorded electronically and
analyzed using software that creates a rough draft of the 3D structure.
Biological scientists and crystallographers then tweak and manipulate the
rough draft considerably. The resulting spatial coordinate
file can be examined using modeling-structure software to study the gross
and subtle features of the protein's structure. Christopher Smith "Bioinformatics,
Genomics, and Proteomics" Scientist 14[23]:26, Nov. 27, 2000 Related terms
Algorithms,
In silico & Molecular
Modeling.
structural genomics:
Focuses on the physical aspects of the genome through the
construction and comparison of gene maps and sequences, as well as gene
discovery, localization, and characterization. Brush up on your 'omics, Chemical
& Engineering News, 81(49): 20, Dec. 2003
http://pubs.acs.org/cen/coverstory/8149/8149genomics1.html
Involves quickly determining the 3D structures of large numbers of
proteins
(or other complex biological molecules, such as nucleic acids), ultimately
accounting for an organism’s entire
proteome. Footnote: As traditionally
defined, the term structural genomics referred to the use of sequencing
and mapping technologies, with bioinformatic support, to develop complete
genome maps (genetic, physical, and transcript maps) and to elucidate genomic
sequences for different
organisms, particularly humans. Now, however, the
term is increasingly used to refer to high-throughput methods for determining
protein structures.
Many of the criticisms leveled at the Human Genome Project in
the mid-1980s have been redirected toward structural genomics. Unlike
high-throughput genome sequencing, it is not a simple matter to decide
when a structural genomics effort has reached completion. SK Burley et
al “Structural genomics: beyond the Human Genome Project” Nature Genetics
23: 151 Oct. 1999 Related term: structural proteomics
A good
explanation of structural genomics Joint
Center for Structural Genomics
http://www.jcsg.org/help/robohelp/Definitions/Structural_Genomics.htm
Human Proteomics Initiative, Swiss Institute of Bioinformatics, European
Bioinformatics Institute
http://us.expasy.org/sprot/hpi/
A major project to annotate all known human sequences
according to the quality standards of Swiss-Prot. This means providing, for
each known protein, a wealth of information that includes the description of its
function, its domain structure, subcellular location, post-translational
modifications, variants, similarities to other proteins, etc.
structural homology: Identify
3D structures of proteins or domains in the same family as a sequence of
interest. Related terms: homology (Functional genomics), homology modeling
(Molecular modeling)
structure based design:
A design
strategy for new chemical entities based on the three-dimensional (3D) structure of the target obtained by X-ray or nuclear
magnetic resonance (NMR) studies, or from protein homology models. IUPAC
Computational
structure from sequence:
See protein structure prediction,
structural homology
structure prediction problem:
The protein secondary structure
prediction problem has become a classic, challenging problem for the
artificial-intelligence and machine learning community. Virtually every conceivable
computational technique in these fields (e.g., information theory [6, 12, 13], artificial
neural networks [15, 20, 22], cascaded networks [18, 19, 27], hybrid systems
[28], nearest neighbor methods [21], hidden Markov chains [4], machine
learning [17, 25], mutual information [26]) has been applied in the context of
protein structure prediction. The reason for this attention is well-founded and
clear: If protein structure, even secondary structure, can be accurately
predicted from the now abundantly available
gene
and protein sequences, such
sequences become immensely more valuable for the understanding of drug
design, the genetic basis of disease, the role of protein
structure in its enzymatic, structural, and signal transduction
functions, and basic physiology from molecular to cellular, to fully systemic
levels. In short, the solution of the protein structure prediction problem (and
the related protein folding problem) will bring on the second phase of
the revolution. Peter Munson et al. "Protein Secondary Structure
Prediction", NIH, 1994
http://abs.cit.nih.gov/reprints/text3.html
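Many of the techniques listed above treat secondary structure prediction as per-residue classification, with each residue described by a window of its sequence neighbours. The sketch below shows one common way to build such input features, a one-hot sliding window. The window width, the padding slot and the function name are assumptions made for illustration; they are not taken from the cited report, and no classifier is trained here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_windows(sequence, half_width=6):
    """Return one feature vector per residue.

    Each vector concatenates a 21-slot one-hot code for every position in a
    window of 2 * half_width + 1 residues centred on the residue. The 21st
    slot marks positions beyond the chain ends or non-standard residues.
    """
    slots = len(AMINO_ACIDS) + 1
    features = []
    for centre in range(len(sequence)):
        vector = []
        for pos in range(centre - half_width, centre + half_width + 1):
            one_hot = [0] * slots
            if 0 <= pos < len(sequence) and sequence[pos] in INDEX:
                one_hot[INDEX[sequence[pos]]] = 1
            else:
                one_hot[-1] = 1  # padding or unknown residue
            vector.extend(one_hot)
        features.append(vector)
    return features


feats = encode_windows("ACD", half_width=1)
print(len(feats), len(feats[0]))
```

Each vector would be paired with the observed state of the central residue (for example helix, strand or coil) to train any of the classifiers mentioned in the entry.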
SWISS-PROT:
Databases &
software directory
threading: In this approach, a target sequence is “threaded”
through a library of 3D folds to try to find a match. This method
is used when no sequence is clearly related to the target sequence.
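A heavily simplified sketch of the idea: each fold in the library is reduced to a per-position burial pattern, and a sequence is scored by how well hydrophobic residues land on buried positions. Real threading programs use gapped alignment and far richer scoring functions. The fold library, the 'B'/'E' encoding and the function names are hypothetical; the hydropathy values are the Kyte-Doolittle scale.

```python
# Kyte-Doolittle hydropathy index (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def thread_score(sequence, burial):
    """Score an ungapped placement of a sequence onto a fold.

    `burial` has one character per position: 'B' for buried, 'E' for
    exposed. Hydrophobic residues are rewarded at buried positions and
    penalised at exposed ones.
    """
    if len(sequence) != len(burial):
        raise ValueError("sequence and fold must have the same length")
    total = 0.0
    for residue, environment in zip(sequence, burial):
        hydropathy = KYTE_DOOLITTLE[residue]
        total += hydropathy if environment == "B" else -hydropathy
    return total


def best_fold(sequence, fold_library):
    """Name of the library fold with the highest threading score."""
    return max(fold_library, key=lambda name: thread_score(sequence, fold_library[name]))


# Hypothetical two-fold library.
library = {"fold1": "BBEE", "fold2": "EEBB"}
print(best_fold("LVKD", library))
```

The sequence LVKD matches fold1 because its two hydrophobic residues fall on the buried positions and its two charged residues on the exposed ones.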
Protein informatics resources
Joint Center for
Structural Genomics Technologies
http://www.jcsg.org/scripts/prod/technologies1.html
IUPAC definitions are
reprinted with the permission of the International Union of Pure and Applied
Chemistry.