An understanding of the behavior of biological systems
at each level of their organization can only be achieved by careful study of the
complex dynamical interactions between the components of these systems. For this
understanding to be quantitative it is necessary to develop structurally,
biochemically and biophysically detailed mathematical models. Once developed,
these models can be simulated, analyzed, and visualized through application of
modern engineering and computational approaches. IBM, Functional Genomics
and Systems Biology Overview http://www.research.ibm.com/FunGen/
Related glossaries include
Applications: Drug discovery & development, Pharmacogenomics, Sequencing, Structural genomics
Informatics: Algorithms, Cheminformatics, Computers & computing, Information management & interpretation, Databases & software directory
Biology: Pharmaceutical biology, Protein Structure
3D protein structure prediction: See protein structure prediction
3D-QSAR Three-Dimensional Quantitative Structure-Activity Relationships:
Involves the analysis of the quantitative relationship between the
biological activity of a set of compounds and their three-dimensional properties
using statistical correlation methods. [IUPAC Computational]
Broader terms: QSAR; Drug discovery & development SAR Structure Activity Relationship
Narrower terms: Algorithms CoMFA Comparative Molecular Field Analysis
Related term: Drug discovery & development drug design
ab initio:
From the Latin: from the beginning. In modeling
refers to models devised without experimental data?
ab initio
calculations:
Quantum chemical calculations
using exact equations with no approximations which involve the whole
electronic population of the molecule. [IUPAC Computational]
ab initio
gene prediction:
Traditionally, gene prediction
programs that rely only on the statistical qualities of exons have
been referred to as performing ab initio predictions. Ab initio
prediction of coding sequences is an undeniable success by the standards
of the machine-learning algorithm field, and most of the widely used gene
prediction programs belong to this class of algorithms. It is impressive
that the statistical analysis of raw genomic sequence can detect around 77-98% of the genes present ... This is, however, little consolation
to the bench biologist, who wants the complete sequences of all genes present,
with some certainty about the accuracy of the predictions involved. As
Ewan Birney (European Bioinformatics Institute, UK) put it, what looks
impressive to the computer scientist is often simply wrong to the biologist.
[Meeting report "Gene prediction: the end of the beginning" Colin Semple,
Genome Biology 2000 1(2): reports 4012.1-4012.3] http://www.genomebiology.com/2000/1/2/reports/4012/
All ab initio gene prediction programs have to balance sensitivity
against accuracy.
Broader term: gene
prediction.
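The trade-off mentioned above can be made concrete at the nucleotide level. The sketch below is an illustration only, not a method taken from the cited meeting report. It assumes the convention common in gene-prediction evaluations, where sensitivity is TP/(TP+FN) and "specificity" is TP/(TP+FP); the function name and the 0/1 encoding of coding positions are hypothetical.

```python
def nucleotide_accuracy(true_coding, predicted_coding):
    """Compare two equal-length 0/1 sequences that mark coding positions.

    Returns (sensitivity, specificity), with specificity taken as
    TP / (TP + FP), an assumed convention for this sketch.
    """
    if len(true_coding) != len(predicted_coding):
        raise ValueError("annotations must have the same length")
    pairs = list(zip(true_coding, predicted_coding))
    tp = sum(1 for t, p in pairs if t and p)        # coding, predicted coding
    fn = sum(1 for t, p in pairs if t and not p)    # coding, missed
    fp = sum(1 for t, p in pairs if p and not t)    # non-coding, predicted coding
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity


truth = [0, 1, 1, 1, 1, 0, 0, 0]
prediction = [0, 0, 1, 1, 1, 1, 1, 0]
sn, sp = nucleotide_accuracy(truth, prediction)
```

A program that predicts more aggressively will usually raise the first number and lower the second, which is the balance the entry refers to.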
ab initio
molecular dynamics:
The Parrinello group has applied ab initio Molecular Dynamics
(MD) in which all forces were computed quantum-chemically to chemical reactions
in general and to biological systems in particular, with results that compared
favorably with experiment and older force field methods. The ab initio
method was found to be of "useful accuracy" for simulations of biomolecules
... With a 1000 times faster computer (relative
to 32 processors on a Cray T3E) the dynamics of a quantum-chemical system
consisting of up to 10 atoms could be simulated for 10 s. [Opportunities in Molecular Biomedicine in the Era of Teraflop
Computing: Report on a Meeting Held March 3 & 4, 1999 in Rockville,
MD, Organized by the NIH Resource for Macromolecular Modeling and Bioinformatics
Beckman Institute for Advanced Science and Technology, University of Illinois
at Urbana-Champaign]
http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html
ab initio
protein structure prediction:
See Structural
genomics glossary
ab initio
quantum mechanical methods:
Methods of quantum
mechanical calculations independent of any experiment other than the determination
of fundamental constants. The methods are based on the use of the
full Schrödinger equation to treat all the electrons of a chemical
system. In practice, approximations are necessary to restrict the complexity
of the electronic wave function and to make its calculation possible. (Synonymous
with non- empirical quantum mechanical methods.) [IUPAC Computational]
ab initio
quantum mechanical modeling: The application
of ab initio modelling across diverse fields such as condensed matter
physics, materials science and chemistry has been demonstrated over the past 10 years.
... The recent completion of the Human Genome Project will offer an unprecedented
number of protein receptors and enzymes as targets for pharmacological
intervention in disease processes. However, before this wealth of information
can be used to develop pharmaceuticals, an understanding of the biochemistry
of the newly identified proteins and their interactions must be obtained.
First principles quantum mechanical modelling will play an important role
in this process. [Matthew
Segall, Ursula Röthlisberger, Paolo Carloni, CECAM/Psi-k Workshop: Ab Initio
Modelling in the Biological Sciences Lyon, France 11-13 June 2001] http://www.tcm.phy.cam.ac.uk/~mds21/Workshop2001/Scientific/node1.html#SECTION00010000000000000000
alignment: Sequencing
glossary
binding site: Pharmaceutical
biology glossary
biocomplexity, biological complexity: Genomics glossary
biomimetic synthesis: Combinatorial
libraries & synthesis glossary
CADD: See Computer Assisted Drug Design
CAMD: See Computer Aided Molecular Design, Computer Assisted Molecular Design
CAMM: See Computer Assisted Molecular Modeling
cancer- computer simulation: Cancer
genomics glossary
chemical similarity: Cheminformatics glossary
ClogP values:
Calculated 1-octanol/water partition coefficients,
frequently used in Structure-Property Correlation (SPC)
or quantitative structure-activity relationship (QSAR) studies
(Leo, 1993). [IUPAC Computational]
Logarithm of the partition coefficient.
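As a reminder of what the quantity means, the sketch below computes log P as the base-10 logarithm of the ratio of a solute's concentration in 1-octanol to its concentration in water. This is only the definition applied to measured concentrations; ClogP itself is calculated from structure by dedicated software, which this sketch does not attempt. The function name is hypothetical.

```python
import math


def log_p(conc_octanol, conc_water):
    """log10 of the 1-octanol/water partition coefficient.

    Both concentrations must be positive and in the same units.
    """
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


# A solute 100 times more concentrated in octanol than in water.
example = log_p(100.0, 1.0)
```

Positive values indicate a lipophilic compound, negative values a hydrophilic one.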
Comparative Molecular Field Analysis CoMFA:
A 3D-QSAR
method that uses statistical correlation techniques for the analysis of the
quantitative relationship between the biological activity of a set of compounds
with a specified alignment, and their three-dimensional electronic and steric
properties. Other properties such as hydrophobicity and hydrogen bonding can
also be incorporated into the analysis. (See also Three-dimensional
Quantitative Structure-Activity Relationship [3D-QSAR]). [IUPAC
Medicinal Chemistry]
Uses statistical correlation techniques for the analysis of the quantitative
relationship between the biological activity of a set of compounds with
a specified alignment, and their three-dimensional electronic and steric
properties. Other properties, such as hydrophobicity and H-bonding
can also be incorporated into the analysis (Cramer et al., 1988; Kubinyi,
1993b). [IUPAC Computational]
Narrower term: topomeric CoMFA
computational biology:
Bioinformatics
glossary
computational
biophysics: Activities of the Theoretical and Computational Biophysics
Group center on the structure and function of supramolecular systems in the
living cell, and on the development of new algorithms and efficient computing
tools for structural biology. The Resource brings the most advanced
molecular modeling, bioinformatics, and computational technologies to bear on
questions of biomedical relevance. Theoretical and Computational Biophysics
Group, Univ. of Illinois Urbana Champaign, About the Group http://www.ks.uiuc.edu/Overview/intro.html Our
research focuses on the modeling of large macromolecular systems in realistic
environments. These efforts have produced insight into biomolecular processes
coupled to mechanical force, bioelectronic processes in metabolism and vision,
and the function and mechanism of membrane proteins. Theoretical and
Computational Biophysics Group, Univ. of Illinois Urbana Champaign,
Emerging Studies, http://www.ks.uiuc.edu/Research/Recent/
computational chemistry: Chemistry
& biology glossary See also Cheminformatics
Related terms: binding site, molecular graphics, Van der Waals
computational gene recognition:
Interpreting nucleotide sequences
by computer, in order to provide tentative annotation on the location,
structure and functional class of protein- coding genes. [JW Fickett 1996]
Gene recognition is much more difficult in higher eukaryotes than in
prokaryotes, as coding regions (exons) are often interrupted by non-coding
regions (introns) and genes are highly variable in size. This
is particularly so for human genes. As someone remarked recently, people have
non-coding regions occasionally interrupted by genes.
Broader terms: gene recognition, molecular recognition.
computational genomics:
Our laboratory develops new machine learning
techniques and algorithms to model the transcriptional regulatory networks that
control gene expression programs in living cells. We have a very productive
interdisciplinary collaboration with leading biologists that has allowed us to
tackle extraordinarily difficult and interesting problems that underlie cellular
function and development. Computational Genomics Research
Group, CSAIL, MIT http://www.psrg.csail.mit.edu/
Google = about 5,670 July 19, 2002;
about 19,500 July 26, 2004, about 454,000 May 7, 2007
Computational analysis of microarray data,
John Quackenbush, Nature Reviews Genetics 2, 418-427,
June 2001 http://www.nature.com/cgi-taf/DynaPage.taf?file=/nrg/journal/v2/n6/full/nrg0601_418a_fs.html
Related terms:
Expression glossary, Microarrays glossary
computational modeling:
See ab initio modeling, homology
modeling, molecular modeling.
computational
physiology: The International Union of Physiological
Sciences (IUPS) Physiome Project is an internationally collaborative
open-source project to provide a public domain framework for computational
physiology, including the development of modeling standards, computational tools
and web-accessible databases of models of structure and function at all spatial
scales [1,2,3]. It aims to develop an infrastructure for linking models of
biological structure and function across multiple levels of spatial organization
and multiple time scales. The levels of biological organisation, from genes to
the whole organism, include gene regulatory networks, protein-protein and
protein-ligand interactions, protein pathways, integrative cell function,
tissue and whole heart structure-function relations. The whole heart models
include the spatial distribution of protein expression. Keynote: Peter J.
Hunter, Univ of Auckland, International Society of Computational Biology,
Detroit, MI, 2005 http://www.iscb.org/ismb2005/keynotes.html
computational quantum chemistry: Chemistry
& biology glossary
computational video: Computers & computing
glossary
Computer Aided Molecular Design (CAMD): Involves all computer-assisted
techniques used to discover, design and optimize compounds with desired
structure and properties. [IUPAC Combinatorial]
Also known as molecular modeling or computational chemistry,
uses computers to analyze and model the physicochemical properties of a
molecule. CAMD programs allow integrated molecular design to take drug
discovery to a new level by using a more cross-functional team approach
to drug research and development. [Oxford Molecular]
Computer-Assisted Drug Design CADD:
Involves all computer-assisted
techniques used to discover, design and optimize biologically active compounds
with a putative use as drugs. [IUPAC Computational]
Broader term: Drug
discovery & development glossary drug design
Computer-Assisted Molecular Design CAMD:
Involves all computer-assisted
techniques used to discover, design and optimize compounds with desired
structure and properties. [IUPAC Computational]
Computer-Assisted Molecular Modeling CAMM:
The investigation
of molecular structures and properties using computational chemistry and
graphical visualization techniques. [IUPAC Computational]
conformational analysis:
Consists of the exploration of energetically
favorable spatial arrangements (shapes) of a molecule (conformations) using
molecular mechanics, molecular dynamics, quantum chemical
calculations or analysis of experimentally determined structural
data, e.g., NMR or crystal structures.
Molecular mechanics and quantum chemical methods are employed to compute
conformational energies, whereas systematic and random searches,
Monte Carlo, molecular dynamics, and distance geometry are methods
(often combined with energy minimization procedures) used to explore the
conformational space. [IUPAC Computational]
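A systematic search of the kind mentioned above can be sketched for a single rotatable bond. Everything specific here is an assumption made for illustration: the one-term cosine torsional potential, the barrier height of 2.9 (a roughly ethane-like value, in kcal/mol), the threefold periodicity, and the function names. A real conformational analysis would scan many torsions and use a full molecular mechanics or quantum chemical energy.

```python
import math


def torsion_energy(phi_deg, barrier=2.9, periodicity=3, phase_deg=0.0):
    """Toy one-term torsional potential: (V/2) * (1 + cos(n*phi - phase))."""
    phi = math.radians(phi_deg)
    phase = math.radians(phase_deg)
    return 0.5 * barrier * (1.0 + math.cos(periodicity * phi - phase))


def systematic_search(energy, step_deg=10):
    """Scan the dihedral angle on a regular grid and return the grid angles
    that are local energy minima (lower than both periodic neighbours)."""
    angles = list(range(0, 360, step_deg))
    energies = [energy(a) for a in angles]
    n = len(energies)
    minima = []
    for i, angle in enumerate(angles):
        left = energies[i - 1]          # index -1 wraps around to 350 degrees
        right = energies[(i + 1) % n]
        if energies[i] < left and energies[i] < right:
            minima.append(angle)
    return minima


staggered = systematic_search(torsion_energy)
```

With these settings the scan finds the three staggered conformations; a random or Monte Carlo search would sample the same angle stochastically instead of on a grid.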
decoys:
Potential energy functions to fold proteins are usually
designed by a learning approach. A learning algorithm is presented with
a large set of wrong shapes [decoys] and a few native sequences. The energy
function is trained on the set to recognize the few correct folds and is
used and tested on other proteins that were not included in the training set. [Opportunities in Molecular Biomedicine in the Era of Teraflop Computing:
March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling
and Bioinformatics Beckman Institute for Advanced Science and Technology,
University of Illinois at Urbana-Champaign] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html
docking: Three-dimensional molecular structure is one of the
foundations of structure-based drug design. Often, data are available
for the shape of a protein and a drug separately, but not for the two together.
The program AutoDock was originally written in FORTRAN-77 in 1990 by David
S. Goodsell here in Arthur J. Olson's laboratory. It performs automated
docking of ligands (small molecules like a candidate drug) to their macromolecular
targets (usually proteins, sometimes DNA). [Garrett B. Morris, “Molecular
docking web”, Scripps, Dec. 2000] http://www.scripps.edu/pub/olson-web/people/gmm/index.html
Wikipedia http://en.wikipedia.org/wiki/Docking_%28molecular%29
Narrower term: pharmacophore based docking
docking programs:
Programs for evaluating lead compounds against
target proteins; these programs are “informed” by structure data. [CHI
Structural proteomics report]
Traditional ligand-docking programs - such as DOCK, developed by Irwin
Kuntz at the University of California at Berkeley; MacroModel, developed
by Clark Still at Columbia University; and GOLD from MSI (now part of
Pharmacopeia) - give information about potential ligands for a known protein structure.
These programs select molecules predicted to be highly complementary to
the receptor structure and can screen many of these ligands against the
protein. This type of virtual screening technology has already been incorporated into many
major pharmaceutical companies’ discovery programs and offers the ability
to screen many more compounds at once than the traditional laboratory-based
method. [CHI Structural proteomics report]
docking studies:
Computational techniques for the exploration
of the possible binding modes of a substrate to a given receptor, enzyme
or other binding site. [IUPAC Computational] Related terms: drug design, QSAR
Pharmaceutical
biology glossary.
drug design: See structure-based drug design Drug
discovery & development glossary Related terms: 3D
QSAR, QSAR Algorithms glossary and Data
& information management glossary.
dynamic modeling:
Mathematical
approaches to studying biological variation have changed little in several
decades. There is a need to develop new dynamic models to illuminate how systems
interact and evolve. Just as important, it is critical to study the nature of
biological and mathematical assumptions of models and statistics. Tools for
analyzing and interpreting data on the architecture of complex phenotypes should
be developed in the context of real biological information. Genetic
Architecture, Biological Variation and Complex Phenotypes, PA-02-110, May 29,
2002- June 5, 2005 http://grants1.nih.gov/grants/guide/pa-files/PA-02-110.html
dynamic programming methods:
Sequencing glossary
energy function:
Computationally, a shape is assigned to a protein
sequence based on an empirical energy function. The lower the energy of
a given structure, the more likely it is to be the correct fold. The structure
prediction challenge is therefore divided into two: (1) The first challenge
is the creation of many plausible folds or a set of structures that will
include the native shape. The creation of the appropriate set depends on
existing databases (such as the Protein Data Bank) or on the design of
automated algorithms (using physical or statistical information) to generate
plausible folds. Once the set is available, a selection procedure is used
to "fish" out the correct fold. (2) The "fishing" of the plausible
native shapes critically depends on the quality of the energy function.
The value of the energy function must be the lowest for the native structure. [Opportunities in Molecular Biomedicine in the Era of
Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource
for Macromolecular Modeling and Bioinformatics Beckman Institute
for Advanced Science and Technology, University of Illinois at Urbana-Champaign]
http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html
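The two-step scheme described above (generate candidate folds, then select the one with the lowest energy) can be sketched with a toy model. The energy function used here is the simple HP lattice model, swapped in purely for illustration: each pair of hydrophobic (H) residues that are lattice neighbours but not adjacent in sequence contributes -1. It is not one of the energy functions discussed in the cited report, and the function names and candidate folds are hypothetical.

```python
def hp_energy(sequence, coords):
    """Toy HP-lattice energy: -1 per non-bonded H-H contact on a 2D grid.

    sequence is a string of 'H' and 'P'; coords is a list of (x, y)
    lattice positions, one per residue.
    """
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):   # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                dx = abs(coords[i][0] - coords[j][0])
                dy = abs(coords[i][1] - coords[j][1])
                if dx + dy == 1:                # adjacent lattice sites
                    energy -= 1
    return energy


def pick_fold(sequence, candidates):
    """Return the name of the candidate structure with the lowest energy."""
    return min(candidates, key=lambda name: hp_energy(sequence, candidates[name]))


candidates = {
    "extended": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "compact": [(0, 0), (1, 0), (1, 1), (0, 1)],
}
best = pick_fold("HPPH", candidates)
```

The selection step is only as good as the energy function: if a decoy scores lower than the native shape, the wrong fold is returned.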
exon parsing:
Identifying precisely the 5' and 3' boundaries of
genes
(the transcription unit) in metazoan genomes, as well as the correct sequences
of the resulting mRNA ("exon parsing") has been a major challenge of
bioinformatics for years. Yet, the current program performances are still
totally insufficient for a reliable automated annotation (Claverie 1997;
Ashburner 2000). It is interesting to recapitulate quickly the research in this
area to illustrate the essential limitation plaguing modern bioinformatics.
Encoding a protein imposes a variety of constraints on nucleotide sequences,
which do not apply to noncoding regions of the genome. These constraints induce
statistical biases of various kinds, the most discriminant of which was soon
recognized to be the distribution of six-nucleotide-long "words" or
hexamers (Claverie and Bougueleret 1986; Fickett and Tung 1992). [JM
Claverie "From Bioinformatics to Computational
Biology" Genome Res 10: (9) 1277-1279 Sept. 2000]
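The hexamer statistic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any program cited here: hexamers are counted in overlapping windows, and a sequence window is scored by a log-likelihood ratio between hexamer frequencies from a coding training set and a non-coding training set. The function names, the add-one pseudocount, and the tiny training sequences are all hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq, k=6):
    """Count overlapping k-letter words (hexamers when k=6)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def log_odds_score(window, coding_counts, noncoding_counts, k=6, pseudo=1.0):
    """Sum of log(P_coding / P_noncoding) over the hexamers in window.

    Positive scores mean the window looks more like the coding set.
    A pseudocount keeps unseen words from producing log(0).
    """
    n_words = 4 ** k
    coding_total = sum(coding_counts.values()) + pseudo * n_words
    noncoding_total = sum(noncoding_counts.values()) + pseudo * n_words
    score = 0.0
    for word, n in kmer_counts(window, k).items():
        p_coding = (coding_counts[word] + pseudo) / coding_total
        p_noncoding = (noncoding_counts[word] + pseudo) / noncoding_total
        score += n * math.log(p_coding / p_noncoding)
    return score


# Toy training data; real use would need large annotated sequence sets.
coding = kmer_counts("ATGATGATGATG")
noncoding = kmer_counts("TTTTTTTTTTTT")
coding_like = log_odds_score("ATGATGATG", coding, noncoding)
noncoding_like = log_odds_score("TTTTTTTT", coding, noncoding)
```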
exon prediction:
Since prokaryotes don't have introns,
exon prediction implies working with eukaryotes. Is exon prediction
equivalent to gene prediction in prokaryotes? Related terms: ab
initio gene prediction; GRAIL Sequencing
glossary
flexible ligands: See under protein flexibility modeling:
force field:
A set of functions and parametrization used in molecular
mechanics calculations. [IUPAC Computational]
Long-time simulations will pose a challenging benchmark for the force
fields employed in molecular modeling. One question is, how will proteins
and DNA that were described by the available force fields (and remained
stable over nanosecond periods) behave in microsecond simulations? The
high cost of long-time simulations will require that the issue is addressed
in a systematic way by providing standard cases against which simulations
can be tested. [Opportunities in Molecular Biomedicine in the Era of Teraflop
Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular
Modeling and Bioinformatics Beckman Institute for Advanced Science
and Technology, University of Illinois at Urbana-Champaign] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html
Related term: van der
Waals
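Two terms that commonly appear in molecular mechanics force fields are sketched below: a harmonic bond-stretch term and a Lennard-Jones term for van der Waals interactions. The functional forms are standard, but the factor of 1/2 in the bond term is a convention that differs between force fields, and the function and parameter names are hypothetical. No parameters from any published force field are used.

```python
def bond_energy(r, k_bond, r0):
    """Harmonic bond stretch: 0.5 * k * (r - r0)^2.

    Some force fields omit the factor 0.5 and absorb it into k.
    """
    return 0.5 * k_bond * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy: 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# The Lennard-Jones minimum lies at r = 2^(1/6) * sigma with depth -epsilon.
r_min = 2.0 ** (1.0 / 6.0)
well_depth = lennard_jones(r_min, epsilon=1.0, sigma=1.0)
```

A full force field sums terms like these over all bonds, angles, torsions, and non-bonded atom pairs, with one parameter set per atom or bond type.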
gene finding programs:
http://cmgm.stanford.edu/classes/genefind/
Bioinformatics Resource, Center for Molecular and Genetic Medicine, Stanford
Univ. School of Medicine. List of programs has been compiled and updated from
James W. Fickett, "Finding genes by computer: the state of the art"
Trends in Genetics, August 1996, 12 (8) 316-320
gene identification:
Using marker SNPs to home in on otherwise hard-to-find genes.
The effectiveness of finding genes by similarity
to a given sequence segment is determined by a much simpler statistic,
the total coverage of the genome by the collective set of sequence
contigs. As the overall coverage of the genome is virtually complete (>90%),
there is a strong likelihood that every gene is represented, at least
in part, in the data. Thus, finding any gene by sequence similarity
searches using sufficient sequence to ensure significance is almost always
possible using the data published this week. Caution must be exercised,
however, as the identification of the gene may still be ambiguous. This
is because a highly similar sequence from a receptor gene from Drosophila,
for example, could be found in several different, homologous genes,
which may have similar or entirely different functions or are nonfunctioning pseudogenes. In other words, common domains or motifs can be present
in many different genes. The use of the approximate similarity search tool
BLAST is probably still the best way to find similar sequences. [David
Galas "Making Sense of the Sequence" Science 291: 1257-1260 Feb. 16, 2001]
Genes (and their corresponding mRNAs and proteins) are identified by aligning reference sequences
(RefSeq), GenBank, mRNAs, and ESTs to the genome sequence using a program called Acembly.
Acembly takes advantage of paired EST reads, measured clone lengths, and polyA tails. Transcript
models are reconstructed by attempting to settle disagreements between individual sequence
alignments without using an a priori model (such as codon usage, initiation, or polyA signals). In
practice, there is an initial low stringency analysis followed by a clean up procedure
which keeps the
best hits. ... An obvious challenge in using alignments to annotate
genes is the treatment of sequence differences
between the mRNA and genomic sequence. These differences could represent sequencing errors,
assembly errors, naturally occurring polymorphisms, or paralogs. It is difficult to resolve these
differences automatically; therefore the default treatment is to provide the mRNA and
protein sequence
that corresponds to the genomic sequence. The only exception is where a sequence difference changes
the reading frame relative to the supporting mRNA and EST data; then the genomic sequence is
frameshifted to provide the protein product that corresponds to the mRNA data.
[NCBI Contig Assembly and Annotation Process, 2001] http://www.ncbi.nlm.nih.gov/genome/guide/build.html#contig
There are two basic approaches to gene identification: homology-based
and ab initio.
gene parsing:
Initial gene parsing methods were then simply
based on word frequency computation, eventually combined with the detection of
splicing consensus motifs. The next generation of software implemented the same
basic principles into a simulated neural network architecture (Uberbacher and
Mural 1991). Finally, the last generation of software, based on Hidden Markov
Models, added a further refinement by computing the likelihood of the
predicted gene architectures (e.g., favoring human genes with an average of
seven coding exons, each 150 nucleotides long) (Kulp et al. 1996; Burge
and Karlin 1997). These ab initio methods are used in conjunction with a
search for sequence similarity with previously characterized genes or expressed
sequence tags (EST). [JM Claverie "From
Bioinformatics to Computational Biology" Genome
Res 10: (9) 1277-1279, Sept. 2000] http://igs-server.cnrs-mrs.fr/igs/abstract/an2000/abstract13.html
gene prediction:
Wikipedia http://en.wikipedia.org/wiki/Gene_finding
One of the first useful products from the human genome will
be a set of predicted genes. Besides its intrinsic scientific interest, the
accuracy and completeness of this data set is of considerable importance for
human health and medicine. Though progress has been made on computational gene
identification during the past decade, the accuracy of gene prediction tools is
not sufficient to locate the genes reliably in higher eukaryotic genomes. Thus,
while the precise sequence of the human genome is increasingly deciphered, gene
number estimations are becoming increasingly variable. ... In 1996 we published
a comprehensive
evaluation of the accuracy of gene prediction programs (Burset and Guigó, 1996).
... Recently we have published a revised
version of this evaluation (Guigó et al., 2000). This revised evaluation
suggests that though gene prediction will improve with every new protein that is
discovered and through improvements in the current set of tools, we still have a
long way to go before we can decipher the precise exonic structure of every gene
in the human genome using purely computational methodology. Genome
Bioinformatics Research Lab, Center for Genomic Regulation (Centre de Regulació
Genòmica - CRG), Barcelona, 2004 http://genome.imim.es/research/eval.html
Many methods for predicting genes are based
on compositional signals that are found in the DNA sequence. These methods
detect characteristics that are expected to be associated with genes, such
as splice sites and coding regions, and then piece this information together
to determine the complete or partial sequence of a gene. Unfortunately,
these ab initio methods tend to produce false positives, leading
to overestimates of gene numbers, which means that we cannot confidently
use them for annotation. They also do not work well with unfinished sequence
that has gaps and errors, which may give rise to frameshifts, when the
reading frame of the gene is disrupted by the addition or removal of bases.
... The most effective algorithms integrate gene-prediction methods with
similarity comparisons. ... The
most powerful tool for finding genes may be other vertebrate genomes. Comparing
conserved sequence regions between two closely related organisms will enable
us to find genes and other important regions in both genomes with no previous
knowledge of the gene content of either. [Ewan Birney et al. "Mining
the draft human genome" Nature 409: 827-828 15 Feb. 2001]
Sadly, it is often claimed that matching back cDNA to genomic sequences is
the best gene identification protocol; hence, admitting that the best way to
find genes is to look them up in a previously established catalog! Thus, the two
main principles behind state-of-the-art gene prediction software are (1) common
statistical regularities and (2) plain sequence similarity. From an
epistemological point of view, those concepts are quite primitive. [JM
Claverie "From Bioinformatics to Computational
Biology" Genome Res 10: (9) 1277-1279,
Sept. 2000] http://igs-server.cnrs-mrs.fr/igs/abstract/an2000/abstract13.html
Algorithms
have been developed and are combined to recognize gene structural components.
Narrower/synonymous? term: ab initio gene prediction Related term: comparative
genomics
gene prediction validation:
gene recognition:
Principally used for finding open reading
frames, tools of this type also recognize a number of features of
genes, such as regulatory regions, splice junctions, transcription and
translation stops and starts, GC islands, and polyadenylation sites.
[Laura De Francesco "Some things considered" Scientist 12[20]:18, Oct.
12, 1998]
http://www.the-scientist.com/yr1998/oct/profile1_981012.html
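The simplest of the features listed above, the open reading frame, can be located with a short scan. The sketch below is a deliberately reduced illustration, not any of the tools the article profiles: it searches only the forward strand, uses the standard start codon ATG and stop codons TAA, TAG and TGA, and ignores introns, so it is closer to prokaryotic gene finding. The function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of forward-strand ORFs.

    Each ORF runs from an ATG to the first in-frame stop codon;
    end is exclusive and includes the stop codon. min_codons is the
    minimum number of codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # open a reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                    # close it at the stop
    return orfs


dna = "CCATGAAATGGTAACC"
orfs = find_orfs(dna)
```

Here the single ORF found is `dna[2:14]`, that is `ATGAAATGGTAA`. A practical tool would also scan the reverse complement and apply a much larger length threshold.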
granularity: Information
management & interpretation glossary
Hidden Markov Models HMM:
Searching a protein sequence database
for homologues is a powerful tool for discovering the structure and function
of a sequence. Amongst the algorithms and tools available for this task,
Hidden Markov model (HMM)-based search methods improve both the sensitivity
and selectivity of database searches by employing position-dependent scores
to characterize and build a model for an entire family of sequences. HMMs have been used to analyze proteins using two complementary strategies.
In the first, a sequence is used to search a collection of protein families,
such as Pfam, to find which of the families it matches. In the second approach
an HMM for a family is used to search a primary sequence database to identify
additional members of the family. The latter approach has yielded insights
into proteins involved in both normal and abnormal human pathology. [Lawrence Berkeley Lab, US "Advanced
Computational Structural Genomics"] http://cbcg.lbl.gov/ssi-csb/Meso.html
A widely used probabilistic model for data that are observed in a sequential
fashion (e.g., over time). An HMM makes two primary assumptions. The first
assumption is that the observed data arise from a mixture of K
probability distributions. The second assumption is that there is a discrete-time
Markov chain with K states, which is generating the observed
data by visiting the K distributions in Markov fashion. The
"hidden" aspect of the model arises from the fact that the state
sequence is not directly observed. Instead, one must infer the state
sequence from a sequence of observed data using the probability model. Although
the model is quite simple, it has been found to be very useful in a variety of
sequential modeling problems, most notably in speech
recognition (Rabiner 1989) and more recently in other disciplines such as computational
biology (Krogh et al. 1994). [MITECS Online MIT Encyclopedia of the
Cognitive Sciences] http://cognet.mit.edu/MITECS/Entry/pearl.html
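Inferring the hidden state sequence from the observed data, as described above, is commonly done with the Viterbi algorithm. The sketch below is a minimal version in log space, assuming a non-empty observation sequence and probabilities supplied as nested dictionaries. The two-state model (an AT-rich and a GC-rich region of DNA) and all of its probabilities are invented for illustration and do not come from any of the cited sources.

```python
import math


def _log(p):
    """Natural log that maps zero probability to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for the observations."""
    best = [{s: _log(start_p[s]) + _log(emit_p[s][observations[0]])
             for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for ending in state s at this position.
            prev = max(states, key=lambda r: best[-1][r] + _log(trans_p[r][s]))
            scores[s] = (best[-1][prev] + _log(trans_p[prev][s])
                         + _log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


states = ("AT", "GC")
start_p = {"AT": 0.5, "GC": 0.5}
trans_p = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
emit_p = {
    "AT": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}
path = viterbi("AATTGGCC", states, start_p, trans_p, emit_p)
```

Gene-finding HMMs use the same idea with states such as exon, intron and intergenic region in place of the two toy states here.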
homology model, homology modeling: Structural
genomics glossary
immersive virtual reality: Cheminformatics
glossary
in silico:
Literally "in the computer".
In a white
paper I wrote for the European Commission in 1988 I advocated the funding of
genome programs, and in particular the use of computers. In this endeavour I
coined "in silico" following "in vitro" and "in
vivo". I think that the first public use of the word is in the following
paper: A. Danchin, C. Médigue, O. Gascuel, H. Soldano, A. Hénaut, From
data banks to data bases. Res. Microbiol. (1991) 142: 913-916. You
can find a developed account of this story in my book The
Delphic Boat, Harvard University Press, 2003 [personal communication Antoine
Danchin, Institut Pasteur, 2003]
Narrower terms: in silico biology, in silico modeling, in silico
proteomics, in silico screening, in silico target discovery; Cell biology virtual cells
in silico
Related terms: Chemoinformatics glossary rules of five
in silico biology:
The
considerable "algorithmic complexity" of biological systems requires a
huge amount of detailed information for their complete description. Although far
from being complete, the overwhelming quantity of small pieces of information
gathered for all kinds of biological systems at the molecular and cellular level
requires computational tools to be adequately stored and interpreted.
Interpretation of data means to abstract them as much as allowed to provide a
systematic, an integrative view of biology.
Most of the presently available scientific journals focus either on accumulating
more data from elaborate experimental approaches, or on presenting new
algorithms for the interpretation of these data. Both approaches are
meritorious. However, since both communities do not interact much with each
other, neither the experimental nor the computational biologists really apply
the theoretical tools to that extent which would be possible and desirable to
achieve that progress of research which is already feasible. ["Aims and
Scope" In Silico Biology: An international journal of computational
biology] http://www.bioinfo.de/isb/aims.html Related
terms: in silico, virtual cells
in silico modeling: Modeling of biological pathways and other biological
processes for drug discovery and development. Given the enormous increase in
genetic and molecular data, such models will continue to improve and are
predicted to become an essential tool for evaluating hypotheses, with only the
more promising ones being subjected to empirical testing.
in silico proteomics:
Prediction of protein
structure and function. [Gareth W. Roberts and Jonathan Swinton "In Silico
Proteomics: Playing by the rules" Current Drug Discovery 5: Aug. 1, 2001] http://www.current-drugs.com/CDD/CDD/CDDPDF/issue%205/Roberts.pdf
in silico screening: See also virtual screening
Google = about 1,780 Mar. 1, 2004; about 14,500 Aug 12, 2008
in silico transcriptomics:
Omes & -omics
glossary
ligand binding:
One of the biggest challenges in computational drug design is the accurate calculation of the free energy of binding of small ligands. Currently, typical errors in these calculations make them unusable to distinguish between strong binders (which would potentially make good drugs) and
non-specific binders (which wouldn't). We are using distributed computing methods to greatly increase the accuracy of such calculations.
[Vijay Pande, Pande Group Projects, Stanford Univ. US] http://www.stanford.edu/group/pandegroup/projects.html#ligandbinding
Related terms: Drug discovery &
development: drug design,
molecular design; Pharmaceutical
biology glossary binding site, ligand
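The link between binding strength and binding free energy can be sketched numerically. The snippet below is a minimal illustration, not the Pande group's method: it assumes the standard thermodynamic relation dG = R*T*ln(Kd / 1 M), and the function name and example Kd values are hypothetical.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = R*T*ln(Kd / 1 M); a smaller Kd gives a more negative dG.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT * temperature_k * math.log(kd_molar)


# Hypothetical example values: a weak (millimolar) and a strong (nanomolar) binder.
weak = binding_free_energy(1e-3)
strong = binding_free_energy(1e-9)
print(f"millimolar binder: {weak:.1f} kcal/mol")
print(f"nanomolar binder:  {strong:.1f} kcal/mol")
```

At room temperature each tenfold change in Kd corresponds to roughly 1.4 kcal/mol, so a calculation with errors of a few kcal/mol cannot reliably separate strong binders from weak ones.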
ligand design: Drug discovery &
development:
ligand docking: See under docking.
molecular dynamics: A simulation procedure consisting of the
computation of the motion of atoms in a molecule or of individual
atoms or molecules in solids, liquids and gases, according to Newton's
laws of motion. The forces acting on the atoms, required to simulate their
motions, are generally calculated using molecular mechanics force
fields. [IUPAC Computational]
Narrower term: ab initio molecular dynamics
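As a rough illustration of the definition above, the following sketch integrates Newton's equation of motion for a single particle. It assumes the velocity Verlet scheme (one common integrator, not named in the IUPAC definition) and a one-dimensional harmonic potential standing in for a real molecular mechanics force field; all names and parameter values are hypothetical.

```python
def harmonic_force(x, k=1.0):
    """Force from the stand-in potential U = 0.5*k*x**2."""
    return -k * x


def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Integrate m*a = F(x) with the velocity Verlet scheme."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass  # half-step velocity update
        x = x + dt * v_half               # full-step position update
        f = force(x)                      # force at the new position
        v = v_half + 0.5 * dt * f / mass  # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy for the harmonic stand-in."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


traj = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
e_start = total_energy(*traj[0])
e_end = total_energy(*traj[-1])
print(f"energy drift over {len(traj) - 1} steps: {abs(e_end - e_start):.2e}")
```

Checking that the total energy stays nearly constant is a simple sanity test for the integrator and time step.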
molecular geometry:
http://www.ics.uci.edu/~eppstein/gina/molmod.html
molecular graphics:
A technique for the visualization
and manipulation of molecules on a graphical display device. [IUPAC Computational]
molecular mechanics:
The calculation of molecular conformational
geometries and energies using a combination of empirical force fields (Burkert
and Allinger, 1982).
Method of calculation of geometrical and energy characteristics
of molecular entities on the basis of empirical potential functions
(see force field) the form of which is taken from classical
mechanics. The method implies transferability of the potential functions
within a network of similar molecules. An assumption is made on "natural"
bond lengths and angles, deviations from which result in bond and angle
strain respectively. Repulsive or attractive van der Waals and electrostatic
forces between nonbonded atoms are also taken into account. Synonymous
with force field method. [IUPAC Computational]
Related terms: decoys, energy function, force fields
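The energy terms named in the definition can be written down as simple functions. This is a sketch under assumptions: harmonic forms for bond and angle strain, a 12-6 Lennard-Jones form for van der Waals interactions, and a Coulomb term with a conversion factor of about 332 kcal*Angstrom/(mol*e^2). Real force fields differ in functional forms and parameters, and the function names here are hypothetical.

```python
def bond_energy(r, r0, k_bond):
    """Harmonic bond strain for a deviation from the 'natural' length r0."""
    return 0.5 * k_bond * (r - r0) ** 2


def angle_energy(theta, theta0, k_angle):
    """Harmonic angle strain for a deviation from the 'natural' angle theta0 (radians)."""
    return 0.5 * k_angle * (theta - theta0) ** 2


def lennard_jones(r, epsilon, sigma):
    """12-6 van der Waals term: repulsive at short range, attractive at long range."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2, coulomb_constant=332.06):
    """Electrostatic term; the constant converts e^2/Angstrom to kcal/mol (assumed units)."""
    return coulomb_constant * q1 * q2 / r


# Hypothetical two-atom example: one stretched bond plus a nonbonded pair.
strain = bond_energy(r=1.60, r0=1.53, k_bond=600.0)
nonbonded = lennard_jones(r=4.0, epsilon=0.1, sigma=3.4) + coulomb(4.0, 0.2, -0.2)
print(f"bond strain: {strain:.3f}  nonbonded: {nonbonded:.3f}")
```

Summing such terms over all bonds, angles, and nonbonded pairs gives the total molecular mechanics energy for one conformation.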
molecular mimicry: Drug discovery & development glossary
molecular modeling, molecular modelling:
A technique for the investigation of molecular
structures and properties using computational chemistry and graphical
visualization techniques in order to provide a plausible three-dimensional
representation under a given set of circumstances. [IUPAC Medicinal
Chemistry, IUPAC Computational]
The scope note for the Journal of Molecular Modeling
includes the following subjects: computer-aided molecular design, rational
drug design, de novo ligand design and receptor modeling,
application of computational and modeling methods in the field of medical
chemistry, protein and peptide modeling, quantum chemistry, application of
semi-empirical, DFT and ab initio calculations, prediction of biological
activities (QSAR) and physico-chemical properties (QSPR), molecular
mechanics/dynamics simulation of polymers and biopolymers, genetic
algorithms and neural nets, modeling of catalysts, advanced
materials, and stationary phases in separation science, enhanced desktop
computational tools for the life sciences visualisation, classification and
handling of chemical data. http://link.springer.de/link/service/journals/00894/aims.htm
Molecular modeling applications use falls into two broad categories:
interactive visualization and computational analyses. ... Three of the most prominent uses of modern molecular
modeling applications are structure analysis, homology modeling,
and docking ... in essence, objective modeling revolves around three
different approaches (each based on different underlying physical and chemical
theories): molecular dynamics, molecular mechanics, and quantum
mechanics. All of these are concerned with developing a
unique solution to what is referred to as the "protein folding" problem
- designing and testing algorithms and applications that will reliably
predict 3-D structure from primary sequence. [Christopher Smith "Molecular
Modeling - Seeing the Whole Picture with Modeling Software Packages" Scientist
12[17]:0, Aug. 31, 1998] http://www.the-scientist.com/yr1998/august/profile2_980831.html
Molecular modeling software includes AMBER, DOCK, MODELER, RasMol and
many other programs.
Related terms: computational chemistry,
Computer Assisted Drug Design;
molecular graphics, molecular dynamics, molecular mechanics.
molecular models:
Models used experimentally or theoretically
to study molecular shape, electronic properties, or interactions; includes
analogous molecules, computer generated graphics, and mechanical structures.
MeSH, 1984
molecular recognition: Drug discovery
and development glossary
Monte Carlo technique:
A simulation procedure consisting of randomly
sampling the conformational space of a molecule. [IUPAC Computational]
Broader
term: simulation
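A minimal sketch of random conformational sampling follows. The IUPAC definition only specifies random sampling; the Metropolis acceptance rule used here is a common choice that is assumed for illustration, and the single-torsion energy function, step size, and kT value are hypothetical.

```python
import math
import random


def torsion_energy(phi, barrier=2.0):
    """Hypothetical threefold torsional potential (kcal/mol) for one dihedral angle."""
    return 0.5 * barrier * (1.0 + math.cos(3.0 * phi))


def metropolis_sample(n_steps, kT=0.593, max_step=0.5, seed=0):
    """Sample one torsion angle with random moves and Metropolis acceptance."""
    rng = random.Random(seed)
    phi = 0.0
    energy = torsion_energy(phi)
    samples = []
    for _ in range(n_steps):
        trial = phi + rng.uniform(-max_step, max_step)
        trial = (trial + math.pi) % (2.0 * math.pi) - math.pi  # wrap into [-pi, pi)
        trial_energy = torsion_energy(trial)
        delta = trial_energy - energy
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            phi, energy = trial, trial_energy
        samples.append(phi)
    return samples


samples = metropolis_sample(5000)
mean_energy = sum(torsion_energy(p) for p in samples) / len(samples)
print(f"mean sampled energy: {mean_energy:.2f} kcal/mol")
```

The walk starts at an energy maximum (phi = 0) and quickly settles into the low-energy wells, which is why the average sampled energy falls well below the barrier height.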
ORF prediction: Related terms: exon prediction, gene prediction,
gene recognition.
ORF recognition:
ESTs provide candidate genes, useful in positional
cloning (during walks and for recognizing ORFs) and for ORF recognition in
cloning of insertion sites. [Report from the Workshop on Genomic and Genetic
Tools for the Zebrafish May 10-11, 1999, Trans-NIH Zebrafish Initiative] http://www.nih.gov/science/models/zebrafish/reports/genomic-genetic.html
parsing: Algorithms glossary
Narrower terms: exon parsing, gene parsing, protein structure domain parsing
pathway &
disease modeling: Expression glossary
peptidomimetic: Drug discovery & development
glossary
phenomics: -Omes & -omics glossary
prediction: Narrower
terms: exon prediction, gene prediction,
ORF prediction, protein sequence prediction; Structural
genomics glossary protein structure prediction; Related terms:
recognition
protein structure prediction:
Structural
genomics glossary
Quantitative Structure-Activity Relationships QSAR: Mathematical relationships linking chemical structure and pharmacological
activity in a quantitative manner for a series of compounds. Methods which
can be used in QSAR include various regression and
pattern recognition
techniques. QSAR is often taken to be equivalent to chemometrics or multivariate
statistical data analysis. It is sometimes used in a more limited
sense as equivalent to Hansch analysis. QSAR is a subset of the more general
term SPC. [IUPAC Computational]
The building of structure – biological activity
models by using regression analysis with physicochemical constants,
indicator variables or theoretical calculations. The term has been extended
by some authors to include chemical reactivity, i.e. activity is regarded
as synonymous with reactivity. This extension is, however, discouraged. Related
term: correlation analysis. [IUPAC Compendium]
A quantitative prediction of the biological, ecotoxicological or
pharmaceutical activity of a molecule. It is based upon structure and activity
information gathered from a series of similar compounds. MeSH, 2001
QSARs
attempt to correlate chemical structure with activity using statistical
approaches. The QSAR models are useful for various purposes including the
prediction of activities of untested chemicals. Quantitative structure-activity
relationships and other related approaches have attracted broad scientific
interest, particularly in the pharmaceutical industry for drug discovery and in
toxicology and environmental science for risk assessment. An assortment of new
QSAR methods have been developed during the past decade, most of them focused on
drug discovery. Besides advancing our fundamental knowledge of QSARs, these
scientific efforts have stimulated their application in a wider range of
disciplines, such as toxicology, where QSARs have not yet gained full
appreciation.
Related terms: Algorithms glossary SAR Structure Activity Relationship; Hansch
analysis; Drug discovery &
development drug design; Pharmacogenomics
toxicogenomics
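The regression idea behind QSAR can be sketched with a one-descriptor least-squares fit. Everything below is illustrative: the descriptor (a logP-like value), the activity values, and the function names are invented for the example, and real QSAR models typically use many descriptors and validated statistics.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted activity for an untested compound with descriptor value x."""
    return slope * x + intercept


# Synthetic training series (not measured data): descriptor vs. activity.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [4.5, 5.0, 5.5, 6.0]

slope, intercept = fit_line(descriptor, activity)
untested = predict(2.5, slope, intercept)
print(f"activity = {slope:.2f} * descriptor + {intercept:.2f}")
print(f"predicted activity at descriptor 2.5: {untested:.2f}")
```

The fitted relationship is only as trustworthy as the series of similar compounds it was derived from; predictions for compounds outside that series should be treated with caution.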
QSPR: Quantitative Structure Property
Relationship
Quantitative 13C NMR Spectrometric
Data-Activity Relationships Modeling QSDAR: NMR &
X-ray crystallography glossary
quantum chemical calculations:
Molecular property calculations
based on the Schrödinger equation, which take into account the interactions
between electrons in the molecule. [IUPAC Computational]
quantum mechanics:
The laws of physics that apply on very small scales. The essential feature is that energy, momentum and angular momentum as well as charge come in discrete amounts called quanta.
[SLAC Glossary, Stanford Linear Accelerator Center, Stanford Univ. US] http://www2.slac.stanford.edu/vvc/glossary.html#sectQ
Narrower terms: ab initio quantum mechanical methods, ab initio quantum mechanical
modeling, semi-empirical quantum mechanical methods
RNA computational molecular archaeology: Our
long-term intellectual interest is in identifying novel structural and catalytic RNAs. The "RNA world" hypothesis asserts that an ecosphere of
RNA-based life
preceded protein/DNA based life, and it is widely argued that many of the
RNA genes (tRNA, rRNA, catalytic introns) that we see today are ancient relics of the RNA world. We hope that we might be able to learn something about the origins of life by identifying new RNA genes and studying their evolutionary history. Screening for new RNA genes is
non-trivial; classical genetics can identify new genes based on their functional phenotype, but not based on what material their product is made of. We think that the best way to discover novel RNA genes is to look for them directly in genome sequence data using computational genetics and algorithmic screens.
[Sean Eddy Lab, Washington Univ. St. Louis, US 2001] http://www.genetics.wustl.edu/eddy/
receptor mapping:
The technique used to describe the geometric and/or
electronic features of a binding site when insufficient structural data for this
receptor or enzyme
are available. Generally the active site cavity is defined by comparing the
superposition of active to that of inactive molecules. [IUPAC Medicinal
Chemistry, IUPAC Compendium]
Over the past ten to fifteen years [before
1987], receptor mapping has expanded from a very minor technique, besieged
by problems and limited in its approach, to one that is widespread, extended
beyond receptors and applied to clinical problems and populations with
modern imaging and scanning techniques. [MJ Kuhar "Imaging receptors for
drugs in neural tissue" Neuropharmacology 1987 Jul. 26 (7B): 911-6]
recognition:
Narrower terms: computational gene recognition, gene recognition, molecular recognition.
recognition site: Pharmaceutical
biology glossary
SAR Structure Activity Relationship: Algorithms
glossary Narrower terms 3D-QSAR, QSAR
SBML Systems Biology Markup Language:
SPC Structure-Property Correlations:
All statistical mathematical
methods used to correlate any molecular property (intrinsic, chemical or
biological) to any other property, using statistical regression or pattern
recognition techniques (Van de Waterbeemd, 1992). QSAR is a
subset of the more general term SPC. [IUPAC Computational]
Narrower terms: 3D QSAR, QSAR
scoring methods: Sequencing
glossary
semi-empirical methods:
Molecular orbital calculations using
various degrees of approximation and
using only valence electrons. [IUPAC Computational]
semi-empirical quantum mechanical methods:
Use parameters derived
from experimental data to simplify computations.
The simplification may occur at various levels: simplification of the Hamiltonian
(e.g. as in the Extended Hückel method), approximate evaluation of
certain molecular integrals (see, for example, zero differential
overlap), simplification of the wave function (for example, use of π electron
approximation as in Pariser-Parr-Pople). [IUPAC Computational]
simulated annealing SA:
A procedure used in molecular dynamics
simulations, in which the system is allowed to equilibrate at high temperatures,
and then cooled down slowly to remove kinetic energy and to permit trajectories
to settle into local minimum energy conformations. [IUPAC Computational]
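The heat-then-cool idea can be sketched in a few lines. Note the substitution: the IUPAC definition describes annealing within molecular dynamics, whereas this sketch uses Metropolis Monte Carlo moves with a geometric cooling schedule, which is simpler to show. The energy function, schedule, and step size are all hypothetical.

```python
import math
import random


def rugged_energy(x):
    """Hypothetical one-dimensional energy surface with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


def simulated_annealing(energy, x0, t_start=5.0, t_end=0.01,
                        cooling=0.99, steps_per_t=20, seed=1):
    """Start hot, cool slowly, and return the lowest-energy point visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            trial = x + rng.gauss(0.0, 0.5)
            e_trial = energy(trial)
            # Uphill moves are accepted often when hot, rarely when cold.
            if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / t):
                x, e = trial, e_trial
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling  # geometric cooling schedule
    return best_x, best_e


best_x, best_e = simulated_annealing(rugged_energy, x0=4.0)
print(f"lowest energy found: {best_e:.3f} at x = {best_x:.3f}")
```

Cooling too quickly tends to trap the system in whichever minimum is nearest, which is the motivation for the slow cooling described in the definition.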
simulations:
Up until now, biomolecular simulations in drug design
have been of limited use because of the short time scales, long turnaround
times (implying poor sampling), the limited accuracy of simulations alluded
to above, and the relatively small size of systems simulated when one wishes
to account for proper inclusion of the physiological environment like membranes
and solvent. Developing a new drug goes beyond finding binding compounds
and must rely on good properties from the outset: activity, absorption,
distribution, metabolism, excretion. Pharmacological researchers would
like to predict these properties first, before one optimizes activity as
conventionally done, and before analogs are made. ... When sufficient resources
are available, simulations can determine the relative free energy values
of drugs passing through membranes. These values are required to estimate
the bioavailability of drugs. Opportunities in Molecular Biomedicine
in the Era of Teraflop Computing March 3 & 4, 1999, Rockville,
MD, NIH Resource for Macromolecular Modeling and Bioinformatics Beckman
Institute for Advanced Science and Technology, University of Illinois at
Urbana-Champaign
http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html
small molecule ligands:
In my talk I will present SMoG, a highly
versatile, fast, and accurate algorithm to design small-molecule ligands for
proteins of known structure. The statistical-mechanical derivation of highly
accurate knowledge-based scoring function will be presented, as well as an
example of a successful application of SMoG where it was used to design novel
ligand for carbonic anhydrase with record potency of 30 pM dissociation constant.
Dr. Eugene Shakhnovich, Harvard University "Focused Combinatorial
Chemistry in Silico: SMoG Algorithm and Its Use to Design Novel Picomolar
Inhibitor for a Known Enzyme" Structure-Based Drug Design
Apr. 18-19, 2002 Cambridge MA
Related term: Microarrays glossary small
molecule microarrays
spatio
temporal dynamics:
Bioinformatics
glossary
structural homology: Structural
genomics glossary
Structure Activity Relationship SAR: Drug
discovery & development glossary
structure analysis:
The integration of gene identification and
promoter recognition programs will be very important point for a complete analysis. [HGMP training course notes: "Gene Structure Prediction"
Luciano Milanesi, I.T.B.A-CNR, Italy, 1998] http://www.hgmp.mrc.ac.uk/Courses/GeneProteinID/milanesi/milanesi.htm
structure-based design: Drug discovery
& development glossary
structure prediction problem: Structural
genomics
synthetic
biology: A) the design and construction of new
biological parts, devices, and systems, and B) the re-design of existing,
natural biological systems for useful purposes. http://syntheticbiology.org/
Life Reinvented,
Wired on synthetic biology, Jan 2005 http://www.wired.com/wired/archive/13.01/mit.html?pg=1
systems biology: Genetic
manipulation & disruption glossary
three dimensional: See 3D
VRML Virtual Reality Modeling Language:
An open language under
development. [Web3D Consortium] http://www.web3d.org/vrml/vrml.htm
VRML was supposed to be the standard language for V[irtual]
R[eality], but VRML browsers and plug-ins tend to be large.
XML (Extensible Markup Language) is emerging as the most likely
alternative to or fix for VRML. [Mike Hurwicz "Virtual Reality in
VRML or XML?" Web Developer's Journal June 21, 2000] http://www.webdevelopersjournal.com/articles/virtual_reality.html
van der Waals forces: The attractive or repulsive forces
between molecular entities (or between groups within the same molecular
entity) other than those due to bond formation or to the electrostatic
interaction of ions or of ionic groups with one another or with neutral
molecules. ... The term is sometimes used loosely for the totality of nonspecific
attractive or repulsive forces. [IUPAC Compendium]
virtual cancer
patient: Cancer genomics glossary
Virtual
Cell Program:
Jeremy Gunawardena, Harvard Medical
School http://vcp.med.harvard.edu/home.html
virtual cells in silico: Rapid accumulation of biological data from
genome, proteome,
transcriptome and metabolome projects can bring us to the point where it is no longer purely speculative to discuss how to construct virtual cells
in silico. This article describes attempts to construct whole cell models. The E-CELL project has completed a couple of virtual cell models, and computer simulations have revealed some biological surprises.
M. Tomita, "Whole-cell simulation: a grand challenge of the 21st
century" Trends in Biotechnology 19 (6): 205-210, June 2001.
Related terms: -Omes & -omics glossary metabolome,
transcriptome
Virtual Cell, Dept of Plant Biology, Univ. of Illinois at Urbana-Champaign, US http://www.life.uiuc.edu/plantbio/cell/
virtual genomes: A distributed computing project to use protein design
to generate new "virtual genomes." Our project, Genome@home,
studies real genomes and proteins directly, by designing new sequences for
existing 3-D protein structures, which come from real genomes. The protein
structure files that are sent out as work contain the Cartesian atomic
coordinates of a protein. This data was obtained experimentally through X-ray
crystallography or NMR techniques. Note that this was not done by us;
thousands of scientists have spent decades compiling this data, which is
generously made freely available to the public. By designing new sequences that
could form these specific protein structures, we're setting the stage to attack
a number of significant contemporary issues in structural biology, genetics, and
medicine. [Vijay Pande, Pande Group Projects, Stanford Univ. US] http://www.stanford.edu/group/pandegroup/projects.html#design
virtual library: Chemoinformatics
glossary
virtual patient:
See virtual cancer patient: Cancer
genomics glossary
virtual proteomics: See in silico proteomics
virtual screening:
Selection of compounds by evaluating their
desirability in a computational model. Also termed in silico
screening. IUPAC Combinatorial Chemistry
A strategy for
bringing a more focused approach to HTS by using computational analysis to
select a subset of compounds considered to be appropriate for a given receptor.
Clearly, this strategy implies that some information is available regarding
either the nature of the receptor binding site or the type of ligand that is
expected to bind productively, or both. It should be stressed that virtual
screening encompasses a variety of computational screens. B. Waszkowycz, T. D.
J. Perkins, R. A. Sykes, J. Li, Large-scale virtual screening for discovering
leads in the postgenomic era, IBM Systems Journal 40(2) 2001 http://www.research.ibm.com/journal/sj/402/waszkowycz.html
Wikipedia http://en.wikipedia.org/wiki/Virtual_screening
Google = about 9,480
Mar. 1, 2004, about 55,000 March 11, 2005, about 148,000 Aug 12, 2008
Narrower terms: grid
based virtual screening, high throughput virtual screening
Related terms: docking; Pharmaceutical
biology & chemistry ligands, receptors;
Combinatorial libraries &
synthesis glossary
visualization: Algorithms glossary
Bibliography
Molecular
modeling, Folding@home Education@home,
Stanford Univ. http://www.stanford.edu/group/pandegroup/folding/education/molmodel.html
SLAC Glossary, Stanford Linear Accelerator Center,
Stanford Univ. US, 2002, 300 definitions. http://www2.slac.stanford.edu/vvc/glossary.html
Tollenaere
JP, EE Moret, Hyperglossary of [Molecular Modelling in Drug Design] Terminology,
Utrecht University, 1996. 150+ definitions. http://wwwcmc.pharm.uu.nl/webcmc/glossary.html
not working 11/17/2006
IUPAC definitions are reprinted with the permission of the International
Union of Pure and Applied Chemistry.
|