
In Silico & molecular drug modeling glossary & taxonomy
Evolving terminology for emerging technologies

Suggestions? Comments? Questions? Mary Chitty  mchitty@healthtech.com
Last revised August 12, 2008 



An understanding of the behavior of biological systems at each level of their organization can only be achieved by careful study of the complex dynamical interactions between the components of these systems. For this understanding to be quantitative it is necessary to develop structurally, biochemically and biophysically detailed mathematical models. Once developed, these models can be simulated, analyzed, and visualized through application of modern engineering and computational approaches.  IBM, Functional Genomics and Systems Biology Overview  http://www.research.ibm.com/FunGen/

Informatics  Map: Finding guide to terms in these glossaries  Site Map
Related glossaries include

Applications Drug discovery & development,   Pharmacogenomics,   Sequencing,   Structural genomics
Informatics
Algorithms,   Cheminformatics,   Computers & computing,   Information management & interpretation,   Databases & software directory 
Biology Pharmaceutical biology,   Protein Structure 

3D protein structure prediction: See protein structure prediction

3D-QSAR Three-Dimensional Quantitative Structure-Activity Relationships:  Involves the analysis of  the quantitative relationship between the biological activity of a set of compounds and their three- dimensional properties using statistical correlation methods. [IUPAC Computational]  

Broader terms: QSAR; Drug discovery & development: SAR Structure Activity Relationship. Narrower terms: Algorithms: CoMFA Comparative Molecular Field Analysis. Related term: Drug discovery & development: drug design

ab initio: From the Latin: from the beginning. In modeling, refers to models built from first principles rather than fitted to experimental data.

ab initio calculations: Quantum chemical calculations using exact equations with no  approximations which involve the whole electronic population of the molecule. [IUPAC Computational]

ab initio gene prediction: Traditionally, gene prediction programs that rely only on the statistical qualities of exons have been referred to as performing ab initio predictions. Ab initio prediction of coding sequences is an undeniable success by the standards of the machine- learning algorithm field, and most of the widely used gene prediction programs belong to this class of  algorithms. It is impressive that the statistical analysis of raw genomic sequence can detect around 77- 98% of the genes present ...  This is, however, little consolation to the bench biologist, who wants the complete sequences of all genes present, with some certainty about the accuracy of the predictions involved. As Ewan Birney (European Bioinformatics Institute, UK) put it, what looks impressive to the computer scientist is often simply wrong to the biologist. [Meeting report "Gene prediction: the end of the beginning" Colin Semple, Genome Biology 2000 1(2): reports 4012.1-4012.3]    http://www.genomebiology.com/2000/1/2/reports/4012/

All ab initio gene prediction programs have to balance sensitivity (finding every real gene) against specificity (avoiding false predictions).

Broader term: gene prediction.
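The trade-off can be made concrete by scoring a prediction at the nucleotide level. The sketch below is illustrative only: the function name and the toy coordinates are assumptions, and it reports sensitivity alongside the fraction of predicted positions that are correct, which is one of several conventions in use.

```python
def score_prediction(true_coding, predicted_coding):
    """Return (sensitivity, precision) for sets of coding positions.

    sensitivity = fraction of annotated coding positions that were predicted
    precision   = fraction of predicted positions that are annotated as coding
    """
    true_set = set(true_coding)
    pred_set = set(predicted_coding)
    true_pos = len(true_set & pred_set)
    sensitivity = true_pos / len(true_set) if true_set else 0.0
    precision = true_pos / len(pred_set) if pred_set else 0.0
    return sensitivity, precision


# Hypothetical annotation: one coding region at positions 100-199.
annotated = range(100, 200)

# A cautious predictor calls only part of the region; a greedy one over-calls.
cautious_scores = score_prediction(annotated, range(120, 160))
greedy_scores = score_prediction(annotated, range(50, 300))

print("cautious:", cautious_scores)
print("greedy:  ", greedy_scores)
```

The cautious prediction misses most of the region but makes no false calls, while the greedy one recovers the whole region at the cost of many false positives.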

ab initio molecular dynamics: The Parrinello group has applied ab initio Molecular Dynamics (MD), in which all forces were computed quantum-chemically, to chemical reactions in general and to biological systems in particular, with results that compared favorably with experiment and older force field methods. The ab initio method was found to be of "useful accuracy" for simulations of biomolecules ... With a 1000 times faster computer (relative to 32 processors on a Cray T3E) the dynamics of a quantum-chemical system consisting of up to 10 atoms could be simulated for 10 s. [Opportunities in Molecular Biomedicine in the Era of Teraflop Computing: Report on a Meeting Held March 3 & 4, 1999 in Rockville, MD, Organized by the NIH Resource for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html

ab initio protein structure prediction: See Structural genomics glossary

ab initio quantum mechanical methods: Methods of quantum mechanical calculations independent of any experiment other than the determination of  fundamental constants. The methods are based on the use of the full Schrödinger equation to treat all the electrons of a chemical system. In practice, approximations are necessary to restrict the complexity of the electronic wave function and to make its calculation possible. (Synonymous with non- empirical quantum mechanical methods.) [IUPAC Computational]

ab initio quantum mechanical modeling: The application of ab initio modelling across diverse fields such as condensed matter physics, materials science and chemistry has been demonstrated over the past 10 years. ... The recent completion of the Human Genome Project will offer an unprecedented number of protein receptors and enzymes as targets for pharmacological intervention in disease processes. However, before this wealth of information can be used to develop pharmaceuticals, an understanding of the biochemistry of the newly identified proteins and their interactions must be obtained. First principles quantum mechanical modelling will play an important role in this process. [Matthew Segall, Ursula Röthlisberger, Paolo Carloni, CECAM/Psi-k Workshop: Ab Initio Modelling in the Biological Sciences Lyon, France 11-13 June 2001] http://www.tcm.phy.cam.ac.uk/~mds21/Workshop2001/Scientific/node1.html#SECTION00010000000000000000

alignment: Sequencing glossary

binding site:  Pharmaceutical biology glossary

biocomplexity, biological complexity: Genomics glossary

biomimetic synthesis: Combinatorial libraries & synthesis glossary

CADD: See Computer Assisted Drug Design

CAMD: See Computer Aided Molecular Design, Computer Assisted Molecular Design

CAMM: See Computer Assisted Molecular Modeling

cancer- computer simulation: Cancer genomics glossary

chemical similarity: Cheminformatics glossary

ClogP values: Calculated 1-octanol/water partition coefficients, frequently used in Structure-Property Correlation (SPC) or quantitative structure-activity relationship (QSAR) studies (Leo, 1993). [IUPAC Computational]

Log P is the logarithm of the partition coefficient; the C indicates a calculated rather than a measured value.
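As a reminder of what the quantity means, the sketch below computes log P from two equilibrium concentrations. This shows only the definition of the logarithm of the partition coefficient; ClogP programs estimate the value from chemical structure instead, and that calculation is not shown. The function name and the example concentrations are assumptions for illustration.

```python
import math


def log_p(conc_octanol, conc_water):
    """Base-10 logarithm of the 1-octanol/water partition coefficient.

    Both concentrations must be in the same units and measured at equilibrium.
    """
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


# Hypothetical solute that is 100 times more concentrated in octanol than in water.
lipophilic = log_p(100.0, 1.0)

# Hypothetical solute that prefers water tenfold.
hydrophilic = log_p(1.0, 10.0)

print(lipophilic, hydrophilic)
```

A positive value indicates a preference for the octanol phase (lipophilic), a negative value a preference for water.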

Comparative Molecular Field Analysis CoMFA: A 3D-QSAR method that uses statistical correlation techniques for the analysis of the quantitative relationship between the biological activity of a set of compounds with a specified alignment, and their three-dimensional electronic and steric properties. Other properties such as hydrophobicity and hydrogen bonding can also be incorporated into the analysis. (See also Three-dimensional Quantitative Structure-Activity Relationship [3D-QSAR]). [IUPAC Medicinal Chemistry]

Uses statistical correlation techniques for the analysis of the quantitative relationship between the biological activity of a set of compounds with a specified alignment, and their three- dimensional electronic and steric properties. Other properties, such as  hydrophobicity and H-bonding can also be incorporated into the analysis (Cramer et al., 1988; Kubinyi, 1993b).  [IUPAC Computational]

Narrower term: topomeric CoMFA

computational biology: Bioinformatics glossary

computational biophysics:  Activities of the Theoretical and Computational Biophysics Group center on the structure and function of supramolecular systems in the living cell, and on the development of new algorithms and efficient computing tools for structural biology.  The Resource brings the most advanced molecular modeling, bioinformatics, and computational technologies to bear on questions of biomedical relevance. Theoretical and Computational Biophysics Group, Univ. of Illinois Urbana Champaign,  About the Group  http://www.ks.uiuc.edu/Overview/intro.html 

Our research focuses on the modeling of large macromolecular systems in realistic environments. These efforts have produced insight into biomolecular processes coupled to mechanical force, bioelectronic processes in metabolism and vision, and the function and mechanism of membrane proteins. Theoretical and Computational Biophysics Group, Univ. of Illinois Urbana Champaign,  Emerging Studies,  http://www.ks.uiuc.edu/Research/Recent/ 

computational chemistry: Chemistry & biology glossary   See also Cheminformatics
Related terms: binding site, molecular graphics, Van der Waals

computational gene recognition: Interpreting nucleotide sequences by computer, in order to provide tentative annotation on the location, structure and functional class of protein- coding genes. [JW Fickett 1996]  

Gene recognition is much more difficult in higher eukaryotes than in prokaryotes, as coding regions (exons) are often interrupted by non-coding regions (introns) and genes are highly variable in size. This is particularly so for human genes. As someone remarked recently, people have non-coding regions occasionally interrupted by genes.

Broader terms: gene recognition, molecular recognition.

computational genomics: Our laboratory develops new machine learning techniques and algorithms to model the transcriptional regulatory networks that control gene expression programs in living cells. We have a very productive interdisciplinary collaboration with leading biologists that has allowed us to tackle extraordinarily difficult and interesting problems that underlie cellular function and development. Computational Genomics Research Group, C SAIL, MIT  http://www.psrg.csail.mit.edu/ 

Google = about 5,670 July 19, 2002; about 19,500 July 26, 2004, about 454,000 May 7, 2007

Computational analysis of microarray data, John Quackenbush, Nature Reviews 2, 418- 427, June 2001 http://www.nature.com/cgi-taf/DynaPage.taf?file=/nrg/journal/v2/n6/full/nrg0601_418a_fs.html

Related terms: Expression glossary, Microarrays glossary

computational modeling: See ab initio modeling, homology modeling, molecular modeling.

computational physiology: The International Union of Physiological Sciences (IUPS) Physiome Project is an internationally collaborative open- source project to provide a public domain framework for computational physiology, including the development of modeling standards, computational tools and web-accessible databases of models of structure and function at all spatial scales [1,2,3]. It aims to develop an infrastructure for linking models of biological structure and function across multiple levels of spatial organization and multiple time scales. The levels of biological organisation, from genes to the whole organism, includes gene regulatory networks, protein- protein and protein- ligand interactions, protein pathways, integrative cell function, tissue and whole heart structure- function relations. The whole heart models include the spatial distribution of protein expression. Keynote: Peter J. Hunter, Univ of Auckland, International Society of Computational Biology, Detroit, MI, 2005 http://www.iscb.org/ismb2005/keynotes.html 

computational quantum chemistry: Chemistry & biology glossary

computational video: Computers & computing glossary

Computer Aided Molecular Design (CAMD): Involves all computer-assisted techniques used to discover, design and optimize compounds with desired structure and properties.  [IUPAC Combinatorial]

Also known as molecular modeling or computational chemistry, uses computers to analyze and model the physicochemical properties of a molecule. CAMD programs allow integrated molecular design to take drug discovery to a new level by using a more cross-functional team approach to drug research and development.  [Oxford Molecular]

Computer-Assisted Drug Design CADD: Involves all computer- assisted techniques used to discover, design and optimize biologically active compounds with a putative use as drugs.   [IUPAC Computational]

Broader term: Drug discovery & development glossary  drug design 

Computer-Assisted Molecular Design CAMD: Involves all computer-assisted techniques used to discover, design and optimize compounds with desired structure and properties.  [IUPAC Computational]

Computer-Assisted molecular modeling CAMM:  The investigation of molecular structures and properties using computational chemistry and graphical visualization techniques.  [IUPAC Computational]

conformational analysis: Consists of the exploration of energetically favorable spatial arrangements (shapes) of a molecule (conformations) using molecular mechanics, molecular dynamics, quantum chemical calculations or analysis of experimentally determined structural data, e.g., NMR or crystal structures.

Molecular mechanics and quantum chemical methods are employed to compute conformational energies, whereas systematic and random searches, Monte Carlo, molecular dynamics, and distance geometry are methods (often combined with energy minimization procedures) used to explore the conformational space. [IUPAC Computational]
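A systematic search can be sketched for the simplest possible case, a single rotatable bond. The threefold cosine torsion term and the barrier height below are illustrative assumptions (a generic molecular-mechanics-style term, not a parameter set from any named force field), and the function names are hypothetical.

```python
import math


def torsion_energy(phi_deg, barrier=2.9, periodicity=3):
    """Cosine torsion term: maxima at 0, 120, 240 degrees for periodicity 3.

    barrier is an illustrative height in arbitrary energy units.
    """
    phi = math.radians(phi_deg)
    return 0.5 * barrier * (1.0 + math.cos(periodicity * phi))


def systematic_search(step_deg=10):
    """Evaluate the energy on a regular grid of torsion angles (0 to <360)."""
    return [(phi, torsion_energy(phi)) for phi in range(0, 360, step_deg)]


def local_minima(profile):
    """Return grid angles whose energy is below both neighbours (periodic)."""
    n = len(profile)
    minima = []
    for i, (phi, energy) in enumerate(profile):
        prev_energy = profile[i - 1][1]          # index -1 wraps around
        next_energy = profile[(i + 1) % n][1]
        if energy < prev_energy and energy < next_energy:
            minima.append(phi)
    return minima


favorable = local_minima(systematic_search(10))
print(favorable)
```

Real conformational searches scan many torsions at once, which is why random, Monte Carlo and molecular dynamics sampling are used when a full grid becomes too large.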

decoys: Potential energy functions to fold proteins are usually designed by a learning approach. A learning algorithm is presented with a large set of wrong shapes [decoys] and a few native sequences. The energy function is trained on the set to recognize the few correct folds and is used and tested on other proteins that were not included in the training set.  [Opportunities in Molecular Biomedicine in the Era of  Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics  Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana- Champaign] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html

docking: Three- dimensional molecular structure is one of the foundations of structure- based drug design. Often, data are available for the shape of a protein and a drug separately, but not for the two together.  The program AutoDock was originally written in FORTRAN-77 in 1990 by David S. Goodsell here in Arthur J. Olson's laboratory.  It performs automated docking of ligands (small molecules like a candidate drug) to their macromolecular targets (usually proteins, sometimes DNA) [Garrett B. Morris, “Molecular docking web”, Scripps, Dec. 2000] http://www.scripps.edu/pub/olson-web/people/gmm/index.html

Wikipedia http://en.wikipedia.org/wiki/Docking_%28molecular%29 

Narrower term: pharmacophore based docking

docking programs: Programs for evaluating lead compounds against target proteins; these programs are “informed” by structure data. [CHI Structural proteomics report]

Traditional ligand- docking programs - such as DOCK, developed by Irwin Kuntz at the University of California at Berkeley; MacroModel, developed by Clark Still at Columbia University; and GOLD from MSI (now part of Pharmacopeia) - give information about potential ligands for a known protein structure.  These programs select molecules predicted to be highly complementary to the receptor structure and can screen many of these ligands against the protein.  This type of virtual screening technology  has already been incorporated into many major pharmaceutical companies’ discovery programs and offers the ability to screen many more compounds at once than the traditional laboratory- based method.  [CHI Structural proteomics report]

docking studies: Computational techniques for the exploration of the possible binding modes of a substrate to a given receptor, enzyme or other binding site. [IUPAC Computational] Related terms: drug design, QSAR Pharmaceutical biology glossary.

drug design: See structure-based drug design Drug discovery & development glossary    Related terms: 3D QSAR, QSAR Algorithms glossary and Data & information management glossary.

dynamic modeling: Mathematical approaches to studying biological variation have changed little in several decades. There is a need to develop new dynamic models to illuminate how systems interact and evolve. Just as important, it is critical to study the nature of biological and mathematical assumptions of models and statistics. Tools for analyzing and interpreting data on the architecture of complex phenotypes should be developed in the context of real biological information. Genetic Architecture, Biological Variation and Complex Phenotypes, PA-02-110, May 29, 2002- June 5, 2005 http://grants1.nih.gov/grants/guide/pa-files/PA-02-110.html

dynamic programming methods:  Sequencing glossary

energy function: Computationally, a shape is assigned to a protein sequence based on an empirical energy function. The lower the energy of a given structure, the more likely it is to be the correct fold. The structure prediction challenge is therefore divided into two: (1) The first challenge is the creation of many plausible folds or a set of structures that will include the native shape. The creation of the appropriate set depends on existing databases (such as the Protein Data Bank) or on the design of automated algorithms (using physical or statistical information) to generate plausible folds. Once the set is available, a selection procedure is used to "fish" out the correct fold. (2) The "fishing" of the plausible native shapes critically depends on the quality of the energy function. The value of the energy function must be the lowest for the native structure. [Opportunities in Molecular Biomedicine in the Era of Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html
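The selection step can be sketched with a toy contact energy. Everything here is an assumption made for illustration: the two-letter hydrophobic/polar alphabet, the energy table, the candidate contact lists and their names. Real empirical energy functions are far richer and are trained as the entry on decoys describes.

```python
# Hypothetical pairwise contact energies (arbitrary units).
# H = hydrophobic residue, P = polar residue; only H-H contacts are rewarded.
CONTACT_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.0}


def contact_energy(sequence, contacts):
    """Sum the table energy over all residue pairs (i, j) in contact."""
    total = 0.0
    for i, j in contacts:
        pair = tuple(sorted((sequence[i], sequence[j])))
        total += CONTACT_ENERGY[pair]
    return total


def pick_fold(sequence, candidate_folds):
    """Return the name of the candidate fold with the lowest energy."""
    return min(
        candidate_folds,
        key=lambda name: contact_energy(sequence, candidate_folds[name]),
    )


sequence = "HPPHHPH"

# Each candidate fold is described only by its list of residue contacts.
candidate_folds = {
    "decoy_a": [(0, 5), (1, 6)],
    "decoy_b": [(1, 4), (2, 5)],
    "native_like": [(0, 3), (0, 4), (3, 6)],
}

best = pick_fold(sequence, candidate_folds)
print(best, contact_energy(sequence, candidate_folds[best]))
```

If the energy function is poor, a decoy can score below the native shape, which is why the quality of the function is the limiting factor.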

exon parsing: Identifying precisely the 5' and 3' boundaries of genes (the transcription unit) in metazoan genomes, as well as the correct sequences of the resulting mRNA ("exon parsing") has been a major challenge of bioinformatics for years. Yet, the current program performances are still totally insufficient for a reliable automated annotation (Claverie 1997; Ashburner 2000). It is interesting to recapitulate quickly the research in this area to illustrate the essential limitation plaguing modern bioinformatics. Encoding a protein imposes a variety of constraints on nucleotide sequences, which do not apply to noncoding regions of the genome. These constraints induce statistical biases of various kinds, the most discriminant of which was soon recognized to be the distribution of six-nucleotide-long "words" or hexamers (Claverie and Bougueleret 1986; Fickett and Tung 1992). [JM Claverie "From Bioinformatics to Computational Biology" Genome Res 10: (9) 1277-1279 Sept. 2000]
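The hexamer bias can be sketched as a log-likelihood ratio between word frequencies learned from coding and noncoding training sequence. This is a minimal illustration, not the method of any cited program: the function names, the pseudocount and the tiny repetitive training strings are all assumptions.

```python
import math
from collections import Counter


def hexamer_counts(seq):
    """Count every overlapping six-nucleotide word in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 6] for i in range(len(seq) - 5))


def hexamer_score(seq, coding_counts, noncoding_counts, pseudocount=1.0):
    """Log-likelihood ratio: positive values look more like the coding set."""
    n_words = 4 ** 6  # number of possible hexamers over A, C, G, T
    coding_total = sum(coding_counts.values()) + pseudocount * n_words
    noncoding_total = sum(noncoding_counts.values()) + pseudocount * n_words
    score = 0.0
    for word, n in hexamer_counts(seq).items():
        p_coding = (coding_counts.get(word, 0) + pseudocount) / coding_total
        p_noncoding = (noncoding_counts.get(word, 0) + pseudocount) / noncoding_total
        score += n * math.log(p_coding / p_noncoding)
    return score


# Toy training data standing in for real coding and noncoding sequence.
coding_model = hexamer_counts("ATGGCC" * 20)
noncoding_model = hexamer_counts("TATATA" * 20)

coding_like = hexamer_score("ATGGCCATGGCC", coding_model, noncoding_model)
noncoding_like = hexamer_score("TATATATATATA", coding_model, noncoding_model)
print(coding_like, noncoding_like)
```

In practice the counts would come from large annotated training sets, and the score would be computed in a sliding window along the genome.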

exon prediction:  Since prokaryotes don't have introns, exon prediction implies working with eukaryotes. Is exon prediction equivalent to gene prediction in prokaryotes?  Related terms: ab initio gene prediction; GRAIL Sequencing glossary

flexible ligands: See under protein flexibility modeling

force field: A set of functions and parametrization used in molecular mechanics  calculations.  [IUPAC Computational]

Long-time simulations will pose a challenging benchmark for the force fields employed in molecular modeling. One question is, how will proteins and DNA that were described by the available force fields (and remained stable over nanosecond periods) behave in microsecond simulations? The high cost of long- time simulations will require that the issue is addressed in a systematic way by providing standard cases against which simulations can be tested [Opportunities in Molecular Biomedicine in the Era of  Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics  Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana- Champaign] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html

Related term: van der Waals
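Two of the functions typically found in a force field can be sketched as follows: a harmonic bond-stretch term and a Lennard-Jones term for van der Waals interactions. The functional forms are standard textbook ones; the parameter values in the example are illustrative assumptions and do not come from any published parameter set.

```python
def bond_energy(r, r0, k):
    """Harmonic bond stretch: zero at the reference length r0."""
    return 0.5 * k * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones term: well depth epsilon, zero crossing at sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Illustrative parameters in arbitrary but consistent units.
epsilon, sigma = 0.2, 3.4
r_min = 2.0 ** (1.0 / 6.0) * sigma  # separation at the bottom of the well

print(bond_energy(1.6, r0=1.5, k=300.0))
print(lennard_jones(r_min, epsilon, sigma))
```

A full force field adds angle, torsion and electrostatic terms and sums all of them over the molecule; the parameter set is as much a part of the force field as the functions.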

gene finding programs: http://cmgm.stanford.edu/classes/genefind/ Bioinformatics Resource, Center for Molecular and Genetic Medicine, Stanford Univ. School of Medicine. List of programs has been compiled and updated from James W. Fickett, "Finding genes by computer: the state of the art" Trends in Genetics, August 1996, 12 (8) 316- 320

gene identification: Using marker SNPs to home in on otherwise hard-to-find genes.

The effectiveness of finding genes by similarity to a given sequence segment is determined by a much simpler statistic, the total coverage of the genome by the collective set of sequence contigs. As the overall coverage of the genome is virtually complete (> 90%), there is a strong likelihood that every gene is represented, at least in part, in the data. Thus, finding any gene by sequence similarity searches using sufficient sequence to ensure significance is almost always possible using the data published this week. Caution must be exercised, however, as the identification of the gene may still be ambiguous. This is because a highly similar sequence from a receptor gene from Drosophila, for example, could be found in several different, homologous genes, which may have similar or entirely different functions or are nonfunctioning pseudogenes. In other words, common domains or motifs can be present in many different genes. The use of the approximate similarity search tool BLAST is probably still the best way to find similar sequences. [David Galas "Making Sense of the Sequence" Science 291: 1257-1260 Feb. 16, 2001]

Genes (and their corresponding mRNAs and proteins) are identified by aligning reference sequences (RefSeq), GenBank, mRNAs, and ESTs to the genome sequence using a program called Acembly. Acembly takes advantage of paired EST reads, measured clone lengths, and polyA tails. Transcript models are reconstructed by attempting to settle disagreements between individual sequence alignments without using an a priori model (such as codon usage, initiation, or polyA signals). In practice, there is an initial low stringency analysis followed by a clean up procedure which keeps the best hits.  ... An obvious challenge in using alignments to annotate genes is the treatment of sequence differences between the mRNA and genomic sequence. These differences could represent sequencing errors, assembly errors, naturally occurring polymorphisms, or paralogs. It is difficult to resolve these differences automatically; therefore the default treatment is to provide the mRNA and protein sequence that corresponds to the genomic sequence. The only exception is where a sequence difference changes the reading frame relative to the supporting mRNA and EST data; then the genomic sequence is frameshifted to provide the protein product that corresponds to the mRNA data. [NCBI Contig Assembly and Annotation Process, 2001]  http://www.ncbi.nlm.nih.gov/genome/guide/build.html#contig

There are two basic approaches to gene identification: by homology and ab initio approaches.

gene parsing: Initial gene parsing methods were then simply based on word frequency computation, eventually combined with the detection of splicing consensus motifs. The next generation of software implemented the same basic principles into a simulated neural network architecture (Uberbacher and Mural 1991). Finally, the last generation of software, based on Hidden Markov Models, added a further refinement by computing the likelihood of the predicted gene architectures (e.g., favoring human genes with an average of seven coding exons, each 150 nucleotides long) (Kulp et al. 1996; Burge and Karlin 1997). These ab initio methods are used in conjunction with a search for sequence similarity with previously characterized genes or expressed sequence tags (EST). [JM Claverie "From Bioinformatics to Computational Biology" Genome Res 10: (9) 1277-1279 Sept. 2000] http://igs-server.cnrs-mrs.fr/igs/abstract/an2000/abstract13.html

gene prediction: Wikipedia http://en.wikipedia.org/wiki/Gene_finding 

One of the first useful products from the human genome will be a set of predicted genes. Besides its intrinsic scientific interest, the accuracy and completeness of this data set is of considerable importance for human health and medicine. Though progress has been made on computational gene identification during the past decade, the accuracy of gene prediction tools is not sufficient to locate the genes reliably in higher eukaryotic genomes. Thus, while the precise sequence of the human genome is increasingly deciphered, gene number estimations are becoming increasingly variable. ... In 1996 we published a comprehensive evaluation of gene prediction programs accuracy (Burset and Guigó, 1996). ... Recently we have published a revised version of this evaluation (Guigó et al., 2000). This revised evaluation suggests that though gene prediction will improve with every new protein that is discovered and through improvements in the current set of tools, we still have a long way to go before we can decipher the precise exonic structure of every gene in the human genome using purely computational methodology. Genome Bioinformatics Research Lab, Center for Genomic Regulation (Centre de Regulació Genòmica - CRG), Barcelona, 2004 http://genome.imim.es/research/eval.html

Many methods for predicting genes are based on compositional signals that are found in the DNA sequence. These methods detect characteristics that are expected to be associated with genes, such as splice sites and coding regions, and then piece this information together to determine the complete or partial sequence of a gene. Unfortunately, these ab initio methods tend to produce false positives, leading to overestimates of gene numbers, which means that we cannot confidently use them for annotation. They also do not work well with unfinished sequence that has gaps and errors, which may give rise to frameshifts, when the reading frame of the gene is disrupted by the addition or removal of bases. ... The most effective algorithms integrate gene-prediction methods with similarity comparisons. ... The most powerful tool for finding genes may be other vertebrate genomes. Comparing conserved sequence regions between two closely related organisms will enable us to find genes and other important regions in both genomes with no previous knowledge of the gene content of either. [Ewan Birney et al. "Mining the draft human genome" Nature 409: 827-828 15 Feb. 2001]

Sadly, it is often claimed that matching back cDNA to genomic sequences is the best gene identification protocol; hence, admitting that the best way to find genes is to look them up in a previously established catalog! Thus, the two main principles behind state- of- the- art gene prediction software are (1) common statistical regularities and (2) plain sequence similarity. From an epistemological point of view, those concepts are quite primitive. [JM Claverie "From Bioinformatics to Computational Biology" Genome Res 10: (9) 1277- 1279.Sept. 2000]  http://igs-server.cnrs-mrs.fr/igs/abstract/an2000/abstract13.html  

Algorithms have been developed and are combined to recognize gene structural components.

Narrower/synonymous? term: ab initio gene prediction. Related term: comparative genomics

gene prediction validation: 

gene recognition: Principally used for finding open reading frames, tools of this type also recognize a number of features of  genes, such as regulatory regions, splice junctions, transcription and  translation stops and starts, GC islands, and poly adenylation sites. [Laura De Francesco "Some things considered" Scientist 12[20]:18, Oct. 12, 1998] http://www.the-scientist.com/yr1998/oct/profile1_981012.html
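The most basic of these tasks, finding open reading frames, can be sketched in a few lines. This is a deliberately simplified illustration: it scans only the forward strand, uses ATG as the sole start codon, and ignores splicing and every other feature listed above. The function name and the `min_codons` parameter are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) index pairs of forward-strand ORFs.

    Each ORF runs from the first ATG in a frame to the next in-frame stop
    codon, inclusive; end is exclusive, so seq[start:end] is the ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


dna = "CCATGAAATTTTAGGG"
orfs = find_orfs(dna)
for start, end in orfs:
    print(start, end, dna[start:end])
```

For eukaryotic genes an ORF scan alone is not enough, since coding regions are split across exons; that is where splice-site and compositional signals come in.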

granularity: Information management  & interpretation glossary

Hidden Markov Models HMM: Searching a protein sequence database for homologues is a powerful tool for discovering the structure and function of a sequence. Amongst the algorithms and tools available for this task, Hidden Markov model (HMM)-based search methods improve both the sensitivity and selectivity of database searches by employing position-dependent scores to characterize and build a model for an entire family of sequences. HMMs have been used to analyze proteins using two complementary strategies. In the first, a sequence is used to search a collection of protein families, such as Pfam, to find which of the families it matches. In the second approach an HMM for a family is used to search a primary sequence database to identify additional members of the family. The latter approach has yielded insights into proteins involved in both normal and abnormal human pathology. [Lawrence Berkeley Lab, US "Advanced Computational Structural Genomics"] http://cbcg.lbl.gov/ssi-csb/Meso.html

A widely used probabilistic model for data that are observed in a sequential fashion (e.g., over time). An HMM makes two primary assumptions. The first assumption is that the observed data arise from a mixture of K probability distributions. The second assumption is that there is a discrete-time Markov chain with K states, which is generating the observed data by visiting the K distributions in Markov fashion. The "hidden" aspect of the model arises from the fact that the state-sequence is not directly observed. Instead, one must infer the state-sequence from a sequence of observed data using the probability model. Although the model is quite simple, it has been found to be very useful in a variety of sequential modeling problems, most notably in SPEECH RECOGNITION (Rabiner 1989) and more recently in other disciplines such as computational biology (Krogh et al. 1994). [MITECS Online MIT Encyclopedia of the Cognitive Sciences] http://cognet.mit.edu/MITECS/Entry/pearl.html
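Inferring the hidden state-sequence from the observations is commonly done with the Viterbi algorithm, which the definition above does not name but which is the standard dynamic-programming solution. The sketch below assumes a toy two-state model (K = 2) that emits nucleotides; the state names and all probabilities are invented for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state path (log-space Viterbi)."""
    if not observations:
        return []
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
               for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        scores.append({})
        back.append({})
        for s in states:
            # Best predecessor state for arriving in s at time t.
            best_prev = max(
                states,
                key=lambda p: scores[t - 1][p] + math.log(trans_p[p][s]),
            )
            scores[t][s] = (scores[t - 1][best_prev]
                            + math.log(trans_p[best_prev][s])
                            + math.log(emit_p[s][observations[t]]))
            back[t][s] = best_prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical model: two compositional regimes that rarely switch.
states = ["AT_rich", "GC_rich"]
start_p = {"AT_rich": 0.5, "GC_rich": 0.5}
trans_p = {"AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
           "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9}}
emit_p = {"AT_rich": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
          "GC_rich": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4}}

path = viterbi("AATTGGCC", states, start_p, trans_p, emit_p)
print(path)
```

Profile HMMs used for protein families follow the same principle, with match, insert and delete states for each alignment column in place of the two states used here.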

homology model, homology modeling: Structural genomics glossary

immersive virtual reality: Cheminformatics glossary

in silico: Literally "in the computer". 

In a white paper I wrote for the European Commission in 1988 I advocated the funding of genome programs, and in particular the use of computers. In this endeavour I coined "in silico" following "in vitro" and "in vivo". I think that the first public use of the word is in the following paper: A. Danchin, C. Médigue, O. Gascuel, H. Soldano, A. Hénaut, From data banks to data bases. Res. Microbiol. (1991) 142: 913-916. You can find a developed account of this story in my book The Delphic Boat, Harvard University Press, 2003. [personal communication Antoine Danchin, Institut Pasteur, 2003]

Narrower terms: in silico biology, in silico modeling, in silico proteomics, in silico screening, in silico target discovery; Cell biology virtual cells in silico; Related terms: Chemoinformatics glossary rules of five 

in silico biology: The considerable "algorithmic complexity" of biological systems requires a huge amount of detailed information for their complete description. Although far from being complete, the overwhelming quantity of small pieces of information gathered for all kind of biological systems at the molecular and cellular level requires computational tools to be adequately stored and interpreted. Interpretation of data means to abstract them as much as allowed to provide a systematic, an integrative view of biology. Most of the presently available scientific journals focus either on accumulating more data from elaborate experimental approaches, or on presenting new algorithms for the interpretation of these data. Both approaches are meritorious. However, since both communities do not interact much with each other, neither the experimental nor the computational biologists really apply the theoretical tools to that extent which would be possible and desirable to achieve that progress of research which is already feasible. ["Aims and Scope" In Silico Biology: An international journal of computational biology] http://www.bioinfo.de/isb/aims.html

Related terms: in silico, virtual cells 

in silico modeling: Modeling of biological pathways and other biological processes for drug discovery and development. Given the enormous increase in genetic and molecular data, such models will continue to improve and are predicted to become an essential tool for evaluating hypotheses, with only the more promising ones being subjected to empirical testing. 

in silico proteomics: Prediction of protein structure and function. [Gareth W. Roberts and Jonathan Swinton "In Silico Proteomics: Playing by the rules" Current Drug Discovery 5: Aug. 1, 2001] http://www.current-drugs.com/CDD/CDD/CDDPDF/issue%205/Roberts.pdf

in silico screening: See also virtual screening 

Google = about 1,780 Mar. 1, 2004; about 14,500 Aug 12, 2008

in silico transcriptomics:  Omes & -omics glossary

ligand binding: One of the biggest challenges in computational drug design is the accurate calculation of the free energy of binding of small ligands. Currently, typical errors in these calculations make them unusable to distinguish between strong binders (which would potentially make good drugs) and non-specific binders (which wouldn't). We are using distributed computing methods to greatly increase the accuracy of such calculations. [Vijay Pande, Pande Group Projects, Stanford Univ. US]  http://www.stanford.edu/group/pandegroup/projects.html#ligandbinding

Related terms: Drug discovery & development: drug design, molecular design;  Pharmaceutical biology glossary binding site, ligand
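
The quantity these calculations aim to reproduce is tied to the experimentally measured dissociation constant by the standard thermodynamic relation ΔG° = RT ln(Kd / c°). The short Python sketch below applies that relation; the function name, the 1 M standard state and the default temperature are assumptions made for illustration and do not come from the Pande Group.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)


def binding_free_energy_kj(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kJ/mol from a dissociation constant in mol/L.

    Uses dG = R * T * ln(Kd / c0) with c0 = 1 M (an assumption of this sketch).
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R * temperature_k * math.log(kd_molar) / 1000.0


# Compare a nanomolar binder with a millimolar (weak) binder.
strong = binding_free_energy_kj(1e-9)
weak = binding_free_energy_kj(1e-3)
print(f"1 nM: {strong:.1f} kJ/mol, 1 mM: {weak:.1f} kJ/mol")
```

At room temperature each factor of ten in Kd corresponds to about 5.7 kJ/mol, which gives a sense of how small the calculation error has to be before strong and weak binders can be told apart.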

ligand design: Drug discovery & development glossary

ligand docking: See under docking.

molecular dynamics: A simulation procedure consisting of the computation of the motion of  atoms in a molecule or of individual atoms or molecules in solids, liquids and gases, according to Newton's laws of motion. The forces acting on the atoms, required to simulate their motions, are generally calculated using molecular mechanics force fields.  [IUPAC Computational]

Narrower term: ab initio molecular dynamics
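
As an illustration of "computation of the motion of atoms according to Newton's laws", the sketch below integrates a single particle on a harmonic spring with the velocity Verlet scheme, one commonly used integrator. The force function, units and parameter values are assumptions chosen for brevity; a real simulation would take its forces from a molecular mechanics force field.

```python
def harmonic_force(x, k=1.0, x0=0.0):
    """Restoring force of a spring with constant k and rest position x0."""
    return -k * (x - x0)


def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Integrate Newton's equations of motion for one particle in 1-D."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v, mass, k=1.0):
    """Kinetic plus potential energy of the harmonic oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


traj = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
e_start = total_energy(*traj[0], mass=1.0)
e_end = total_energy(*traj[-1], mass=1.0)
print(f"energy drift over 1000 steps: {abs(e_end - e_start):.2e}")
```

Checking that the total energy stays nearly constant is a common sanity test for the chosen time step.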

molecular geometry:  http://www.ics.uci.edu/~eppstein/gina/molmod.html

molecular graphics: A technique for the visualization and manipulation of molecules on a graphical display device. [IUPAC Computational]

molecular mechanics: The calculation of  molecular conformational geometries and energies using a combination of empirical force fields (Burkert and Allinger, 1982).

Method of calculation of geometrical and energy characteristics of molecular entities on the basis of empirical potential functions (see force field) the form of which is taken from classical mechanics. The method implies transferability of the potential functions within a network of similar molecules. An assumption is made on "natural" bond lengths and angles, deviations from which result in bond and angle strain respectively. Repulsive or attractive van der Waals and electrostatic forces between nonbonded atoms are also taken into account. Synonymous with force field method. [IUPAC Computational]

Related terms: decoys, energy function, force fields
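
The terms named in the IUPAC definition (bond strain relative to a "natural" length, van der Waals and electrostatic interactions between nonbonded atoms) can be sketched as a toy energy function. The functional forms below (harmonic bonds, 12-6 Lennard-Jones, Coulomb) are common choices rather than the definition of any particular force field, and all parameter values and coordinates are hypothetical.

```python
import math

COULOMB_CONSTANT = 1389.35  # kJ*Angstrom/(mol*e^2), for charges in units of e


def bond_energy(r, r0, k_bond):
    """Harmonic bond strain; zero at the 'natural' bond length r0."""
    return 0.5 * k_bond * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """12-6 van der Waals term: repulsive at short range, attractive beyond."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2):
    """Electrostatic interaction between two point charges."""
    return COULOMB_CONSTANT * q1 * q2 / r


def total_energy(coords, bonds, nonbonded):
    """Sum bonded and nonbonded terms over explicit atom-pair lists."""
    energy = 0.0
    for i, j, r0, k_bond in bonds:
        energy += bond_energy(math.dist(coords[i], coords[j]), r0, k_bond)
    for i, j, epsilon, sigma, qi, qj in nonbonded:
        r = math.dist(coords[i], coords[j])
        energy += lennard_jones(r, epsilon, sigma) + coulomb(r, qi, qj)
    return energy


# Three atoms in a row (Angstrom); parameters are made up for illustration.
coords = [(0.0, 0.0, 0.0), (1.6, 0.0, 0.0), (5.0, 0.0, 0.0)]
bonds = [(0, 1, 1.5, 2500.0)]                 # i, j, r0, k_bond
nonbonded = [(0, 2, 0.5, 3.4, 0.2, -0.2)]     # i, j, epsilon, sigma, q_i, q_j
print(f"total energy: {total_energy(coords, bonds, nonbonded):.2f} kJ/mol")
```

Angle and torsion terms are omitted to keep the sketch short; they would be added to the bonded sum in the same way.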

molecular mimicry: Drug discovery & development glossary

molecular modeling, molecular modelling: A technique for the investigation of molecular structures and properties using computational chemistry and graphical visualization techniques in order to provide a plausible three-dimensional representation under a given set of circumstances. [IUPAC Medicinal Chemistry, IUPAC Computational]

The scope note for the Journal of Molecular Modeling includes the following subjects: computer-aided molecular design, rational drug design, de novo ligand design and receptor modeling, application of computational and modeling methods in the field of medical chemistry, protein and peptide modeling, quantum chemistry, application of semi-empirical, DFT and ab initio calculations, prediction of biological activities (QSAR) and physico-chemical properties (QSPR), molecular mechanics/dynamics simulation of polymers and biopolymers, genetic algorithms and neural nets, modeling of catalysts, advanced materials, and stationary phases in separation science, enhanced desktop computational tools for the life sciences, visualisation, classification and handling of chemical data.  http://link.springer.de/link/service/journals/00894/aims.htm

Molecular modeling applications use falls into two broad categories: interactive visualization and computational analyses. ... Three of the most prominent uses of modern molecular modeling applications are structure analysis, homology modeling, and docking ... in essence, objective modeling revolves around three different approaches (each based on different underlying physical and chemical theories): molecular dynamics, molecular mechanics, and quantum mechanics. All of these are concerned with developing a unique solution to what is referred to as the "protein folding" problem - designing and testing algorithms and applications that will reliably predict 3-D structure from primary sequence. [Christopher Smith "Molecular Modeling - Seeing the Whole Picture with Modeling Software Packages" Scientist 12[17]:0, Aug. 31, 1998] http://www.the-scientist.com/yr1998/august/profile2_980831.html

Molecular modeling software includes AMBER, DOCK, MODELLER, RasMol and many other programs.

Related terms: computational chemistry, Computer Assisted Drug Design; molecular graphics,  molecular dynamics, molecular mechanics.

molecular models: Models used experimentally or theoretically to study molecular shape, electronic properties, or interactions; includes analogous molecules, computer generated graphics, and mechanical structures. MeSH, 1984

molecular recognition: Drug discovery and development glossary

Monte Carlo technique: A simulation procedure consisting of randomly sampling the conformational space of a molecule. [IUPAC Computational] 

Broader term: simulation
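
A common way to carry out such random sampling is the Metropolis scheme, in which trial moves are accepted with a Boltzmann probability. The sketch below samples a single torsion angle on a hypothetical three-fold potential; the energy function, barrier height, temperature factor and step size are all illustrative assumptions.

```python
import math
import random


def torsion_energy(phi_deg, barrier=12.0):
    """Three-fold torsional potential with minima at 60, 180 and 300 degrees."""
    return 0.5 * barrier * (1.0 + math.cos(math.radians(3.0 * phi_deg)))


def metropolis_sample(energy, n_steps, kt=2.5, max_step=30.0, seed=0):
    """Randomly sample a torsion angle with the Metropolis acceptance rule."""
    rng = random.Random(seed)
    phi = rng.uniform(0.0, 360.0)
    e = energy(phi)
    samples = []
    for _ in range(n_steps):
        trial = (phi + rng.uniform(-max_step, max_step)) % 360.0
        e_trial = energy(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kt):
            phi, e = trial, e_trial
        samples.append(phi)  # a rejected move counts the current state again
    return samples


samples = metropolis_sample(torsion_energy, n_steps=20000)
near_minimum = sum(
    1 for p in samples if min(abs(p - m) for m in (60.0, 180.0, 300.0)) < 30.0
)
print(f"fraction within 30 degrees of a minimum: {near_minimum / len(samples):.2f}")
```

Low-energy conformations are visited far more often than high-energy ones, which is what makes the sampled set useful for estimating thermodynamic averages.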

ORF prediction: Related terms: exon prediction, gene prediction, gene recognition.

ORF recognition: ESTs provide candidate genes, useful in positional cloning (during walks and for recognizing ORFs) and for ORF recognition in cloning of insertion sites. [Report from the Workshop on Genomic and Genetic Tools for the Zebrafish May 10-11, 1999, Trans-NIH Zebrafish Initiative] http://www.nih.gov/science/models/zebrafish/reports/genomic-genetic.html

parsing: Algorithms glossary Narrower terms: exon parsing, gene parsing, protein structure domain parsing

pathway & disease modeling: Expression glossary

peptidomimetic: Drug discovery & development glossary

phenomics: -Omes & -omics glossary 

prediction: Narrower terms: exon prediction, gene prediction, ORF prediction, protein sequence prediction;  Structural genomics glossary protein structure prediction; Related terms: recognition

protein structure prediction:  Structural genomics glossary

Quantitative Structure-Activity Relationships QSAR: Mathematical relationships linking chemical structure and pharmacological activity in a quantitative manner for a series of compounds. Methods which can be used in QSAR include various regression and pattern recognition techniques. QSAR is often taken to be equivalent to chemometrics or multivariate statistical data analysis. It is sometimes used in a more limited sense as equivalent to Hansch analysis. QSAR is a subset of the more general term SPC. [IUPAC Computational]

The building of structure – biological activity models by using regression analysis with physicochemical constants, indicator variables or theoretical calculations. The term has been extended by some authors to include chemical reactivity, i.e. activity is regarded as synonymous with reactivity. This extension is, however, discouraged. Related term: correlation analysis. [IUPAC Compendium]

A quantitative prediction of the biological, ecotoxicological or pharmaceutical activity of a molecule. It is based upon structure and activity information gathered from a series of similar compounds. MeSH, 2001

QSARs attempt to correlate chemical structure with activity using statistical approaches. The QSAR models are useful for various purposes including the prediction of activities of untested chemicals. Quantitative structure-activity relationships and other related approaches have attracted broad scientific interest, particularly in the pharmaceutical industry for drug discovery and in toxicology and environmental science for risk assessment. An assortment of new QSAR methods have been developed during the past decade, most of them focused on drug discovery. Besides advancing our fundamental knowledge of QSARs, these scientific efforts have stimulated their application in a wider range of disciplines, such as toxicology, where QSARs have not yet gained full appreciation.

Related terms: Algorithms glossary SAR Structure Activity Relationship;  Hansch analysis; Drug discovery & development drug design; Pharmacogenomics  toxicogenomics
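
In its simplest regression form, a QSAR relates one physicochemical descriptor to activity with a fitted straight line. The sketch below does an ordinary least-squares fit in plain Python; the descriptor and activity values are synthetic numbers invented for illustration, not measurements, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic training series: descriptor (e.g. log P) versus activity (e.g. log 1/C).
log_p = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
activity = [2.3, 2.4, 2.8, 3.0, 3.2, 3.6]

slope, intercept = fit_line(log_p, activity)
fit_quality = r_squared(log_p, activity, slope, intercept)

# Predict the activity of an untested compound from its descriptor alone.
predicted = slope * 1.8 + intercept
print(f"slope={slope:.2f}, R^2={fit_quality:.2f}, predicted activity={predicted:.2f}")
```

Practical QSAR work uses many descriptors and validates the model on compounds held out of the fit; the single-descriptor case only shows the basic idea.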

QSPR: Quantitative Structure Property Relationship 

Quantitative 13C NMR Spectrometric Data-Activity Relationships Modeling QSDAR: NMR & X-ray crystallography glossary

quantum chemical calculations: Molecular property calculations based on the Schrödinger equation, which take into account the interactions between electrons in the molecule. [IUPAC Computational]

quantum mechanics: The laws of physics that apply on very small scales. The essential feature is that energy, momentum and angular momentum as well as charge come in discrete amounts called quanta.  More... [SLAC Glossary, Stanford Linear Accelerator Center, Stanford Univ. US] http://www2.slac.stanford.edu/vvc/glossary.html#sectQ

Narrower terms: ab initio quantum mechanical methods, ab initio quantum mechanical modeling, semi-empirical quantum mechanical methods

RNA computational molecular archaeology: Our long-term intellectual interest is in identifying novel structural and catalytic RNAs. The "RNA world" hypothesis asserts that an ecosphere of RNA-based life preceded protein/DNA based life, and it is widely argued that many of the RNA genes (tRNA, rRNA, catalytic introns) that we see today are ancient relics of the RNA world. We hope that we might be able to learn something about the origins of life by identifying new RNA genes and studying their evolutionary history. Screening for new RNA genes is non-trivial; classical genetics can identify new genes based on their functional phenotype, but not based on what material their product is made of. We think that the best way to discover novel RNA genes is to look for them directly in genome sequence data using computational genetics and algorithmic screens. [Sean Eddy Lab, Washington Univ. St. Louis, US 2001]  http://www.genetics.wustl.edu/eddy/

receptor mapping: The technique used to describe the geometric and/or electronic features of a binding site when insufficient structural data for this receptor or enzyme are available. Generally the active site cavity is defined by comparing the superposition of active to that of inactive molecules. [IUPAC Medicinal Chemistry, IUPAC Compendium]

 Over the past ten to fifteen years [before 1987], receptor mapping has expanded from a very minor technique, besieged by problems and limited in its approach, to one that is widespread, extended beyond receptors and applied to clinical problems and populations with modern imaging and scanning techniques. [MJ Kuhar "Imaging receptors for drugs in neural tissue"  Neuropharmacology 1987 Jul. 26 (7B): 911-6]

recognition: Narrower terms: computational gene recognition, gene recognition, molecular recognition.

recognition site: Pharmaceutical biology glossary

SAR Structure Activity Relationship: Algorithms glossary  Narrower terms 3D-QSAR, QSAR

SBML Systems Biology Markup Language:   

SPC Structure-Property Correlations: All statistical mathematical methods used to correlate any molecular property (intrinsic, chemical or biological) to any other property, using statistical regression or pattern recognition techniques (Van de Waterbeemd, 1992). QSAR is a subset of the more general term SPC.  [IUPAC Computational]

Narrower terms: 3D QSAR, QSAR

scoring methods: Sequencing glossary

semi-empirical methods: Molecular orbital calculations using various degrees of  approximation and using only valence electrons. [IUPAC Computational]

semi-empirical quantum mechanical methods: Use parameters derived from experimental data to simplify computations. The simplification may occur at various levels: simplification of the Hamiltonian (e.g. as in the Extended Hückel method), approximate evaluation of certain molecular integrals (see, for example, zero differential overlap), simplification of the wave function (for example, use of π electron approximation as in Pariser-Parr-Pople). [IUPAC Computational]
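
To give a feel for how far such simplifications can go, the sketch below uses the simple Hückel π-electron model, which is cruder than the Extended Hückel and Pariser-Parr-Pople methods named above. For a linear chain of n conjugated carbons the model has the closed-form orbital energies E_k = α + 2β cos(kπ/(n+1)). Energies are reported with α = 0 and β = -1 as illustrative units, and the function names are hypothetical.

```python
import math


def huckel_levels(n_carbons, alpha=0.0, beta=-1.0):
    """Pi orbital energies of a linear polyene in the simple Hueckel model."""
    return sorted(
        alpha + 2.0 * beta * math.cos(k * math.pi / (n_carbons + 1))
        for k in range(1, n_carbons + 1)
    )


def pi_electron_energy(n_carbons, alpha=0.0, beta=-1.0):
    """Total pi energy with one electron per carbon, filling the lowest levels."""
    remaining = n_carbons
    energy = 0.0
    for level in huckel_levels(n_carbons, alpha, beta):
        occupancy = min(2, remaining)  # at most two electrons per orbital
        energy += occupancy * level
        remaining -= occupancy
        if remaining == 0:
            break
    return energy


ethylene = pi_electron_energy(2)
butadiene = pi_electron_energy(4)
# Stabilization of butadiene relative to two isolated double bonds, in units of |beta|.
delocalization = 2 * ethylene - butadiene
print(f"butadiene pi energy: {butadiene:.3f}, delocalization: {delocalization:.3f}")
```

In practice α and β are treated as empirical parameters fitted to experiment, which is what makes the approach semi-empirical.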

simulated annealing SA: A procedure used in molecular dynamics simulations, in which the system is allowed to equilibrate at high temperatures, and then cooled down slowly to remove kinetic energy and to permit trajectories to settle into local minimum energy conformations.  [IUPAC Computational]   
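
The heat-then-cool idea can be sketched without a full molecular dynamics engine. The example below swaps in a Metropolis Monte Carlo walker on a toy one-dimensional energy surface, so it illustrates the cooling schedule rather than the molecular dynamics procedure in the IUPAC definition. The energy function, temperatures and step size are hypothetical.

```python
import math
import random


def rugged_energy(x):
    """Toy 1-D energy surface with several local minima (illustrative only)."""
    return 0.05 * x * x - math.cos(3.0 * x)


def simulated_annealing(energy, x0, t_start=5.0, t_end=0.01,
                        n_steps=5000, max_step=0.5, seed=1):
    """Metropolis walk whose temperature is lowered geometrically each step."""
    rng = random.Random(seed)
    cooling = (t_end / t_start) ** (1.0 / n_steps)
    x, e, t = x0, energy(x0), t_start
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_step, max_step)
        e_trial = energy(trial)
        # Hot: uphill moves are often accepted, so barriers can be crossed.
        # Cold: uphill moves are rarely accepted, so the walker settles in a well.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / t):
            x, e = trial, e_trial
        t *= cooling
    return x, e


start_x = 8.0
start_e = rugged_energy(start_x)
final_x, final_e = simulated_annealing(rugged_energy, start_x)
print(f"start energy {start_e:.2f}, final x {final_x:.2f}, final energy {final_e:.2f}")
```

Slower cooling generally gives the system more chances to escape shallow minima before it freezes, at the cost of more steps.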

simulations: Up until now, biomolecular simulations in drug design have been of limited use because of the short time scales, long turnaround times (implying poor sampling), the limited accuracy of simulations alluded to above, and the relatively small size of systems simulated when one wishes to account for proper inclusion of the physiological environment like membranes and solvent. Developing a new drug goes beyond finding binding compounds and must rely on good properties from the outset: activity, absorption, distribution, metabolism, excretion. Pharmacological researchers would like to predict these properties first, before one optimizes activity as conventionally done, and before analogs are made. ... When sufficient resources are available, simulations can determine the relative free energy values of drugs passing through membranes. These values are required to estimate the bioavailability of drugs. [Opportunities in Molecular Biomedicine in the Era of Teraflop Computing, March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html

small molecule ligands: In my talk I will present SMoG, a highly versatile, fast, and accurate algorithm to design small-molecule ligands for proteins of known structure. The statistical-mechanical derivation of a highly accurate knowledge-based scoring function will be presented, as well as an example of a successful application of SMoG where it was used to design a novel ligand for carbonic anhydrase with a record potency of 30 pM dissociation constant. [Dr. Eugene Shakhnovich, Harvard University "Focused Combinatorial Chemistry in Silico: SMoG Algorithm and Its Use to Design Novel Picomolar Inhibitor for a Known Enzyme" Structure-Based Drug Design Apr. 18-19, 2002 Cambridge MA]

Related term: Microarrays glossary small molecule microarrays

spatio temporal dynamics:  Bioinformatics glossary

structural homology: Structural genomics glossary

Structure Activity Relationship SAR: Drug discovery & development glossary

structure analysis: The integration of gene identification and promoter recognition programs will be very important point for a complete  analysis. [HGMP training course notes: "Gene Structure Prediction" Luciano Milanesi, I.T.B.A-CNR,  Italy, 1998] http://www.hgmp.mrc.ac.uk/Courses/GeneProteinID/milanesi/milanesi.htm

structure-based design: Drug discovery & development glossary

structure prediction problem: Structural genomics

synthetic biology: A) the design and construction of new biological parts, devices, and systems, and B) the re-design of existing, natural biological systems for useful purposes. http://syntheticbiology.org/
Life Reinvented, Wired on synthetic biology, Jan 2005 http://www.wired.com/wired/archive/13.01/mit.html?pg=1

systems biology: Genetic manipulation & disruption glossary

three dimensional: See 3D

VRML Virtual Reality Modeling Language: An open language under development [Web3D Consortium] http://www.web3d.org/vrml/vrml.htm

VRML was supposed to be the standard language for V[irtual] R[eality], but VRML browsers and plug-ins tend to be large. XML (Extensible Markup Language) is emerging as the most likely alternative to or fix for VRML. [Mike Hurwicz "Virtual Reality in VRML or XML?" Web Developer's Journal June 21, 2000]  http://www.webdevelopersjournal.com/articles/virtual_reality.html

van der Waals forces:  The attractive or repulsive forces between molecular entities (or between groups within the same molecular entity) other than those due to bond formation or to the electrostatic interaction of ions or of ionic groups with one another or with neutral molecules. ... The term is sometimes used loosely for the totality of nonspecific attractive or repulsive forces. [IUPAC Compendium]

virtual cancer patient: Cancer genomics glossary

Virtual Cell Program:  Jeremy Gunawardena, Harvard Medical School   http://vcp.med.harvard.edu/home.html 

virtual cells in silico: Rapid accumulation of biological data from genome, proteome, transcriptome and metabolome projects can bring us to the point where it is no longer purely speculative to discuss how to construct virtual cells in silico. This article describes attempts to construct whole cell models. The E-CELL project has completed a couple of virtual cell models, and computer simulations have revealed some biological surprises. [M. Tomita, "Whole-cell simulation: a grand challenge of the 21st century" Trends in Biotechnology 19 (6): 205-210, June 2001]

Related terms: -Omes & -omics glossary metabolome, transcriptome

Virtual Cell, Dept of Plant Biology, Univ. of Illinois at Urbana-Champaign, US http://www.life.uiuc.edu/plantbio/cell/

virtual genomes: A distributed computing project to use protein design to generate new "virtual genomes."  Our project, Genome@home, studies real genomes and proteins directly, by designing new sequences for existing 3-D protein structures, which come from real genomes. The protein structure files that are sent out as work contain the Cartesian atomic coordinates of a protein. This data was obtained experimentally through X-ray crystallography or NMR techniques. Note that this was not done by us; thousands of scientists have spent decades compiling this data, which is generously made freely available to the public. By designing new sequences that could form these specific protein structures, we're setting the stage to attack a number of significant contemporary issues in structural biology, genetics, and medicine. [Vijay Pande, Pande Group Projects, Stanford Univ. US]  http://www.stanford.edu/group/pandegroup/projects.html#design

virtual library: Chemoinformatics glossary

virtual patient: See virtual cancer patient: Cancer genomics glossary

virtual proteomics: See in silico proteomics

virtual screening: Selection of compounds by evaluating their desirability in a computational model. Also termed in silico screening. IUPAC Combinatorial Chemistry

A strategy for bringing a more focused approach to HTS by using computational analysis to select a subset of compounds considered to be appropriate for a given receptor. Clearly, this strategy implies that some information is available regarding either the nature of the receptor binding site or the type of ligand that is expected to bind productively, or both. It should be stressed that virtual screening encompasses a variety of computational screens. [B. Waszkowycz, T. D. J. Perkins, R. A. Sykes, J. Li, Large-scale virtual screening for discovering leads in the postgenomic era, IBM Systems Journal 40(2) 2001] http://www.research.ibm.com/journal/sj/402/waszkowycz.html

Wikipedia http://en.wikipedia.org/wiki/Virtual_screening 

Google = about 9,480 Mar. 1, 2004, about 55,000 March 11, 2005, about 148,000 Aug 12, 2008

Narrower terms: grid based virtual screening, high throughput virtual screening
Related terms: docking; Pharmaceutical biology & chemistry ligands, receptors; Combinatorial libraries & synthesis glossary 
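
A minimal sketch of "selecting a subset of compounds by evaluating their desirability in a computational model" is shown below. It first applies a property filter based on Lipinski's rule of five (here allowing at most one violation, a common convention), then ranks the survivors by a precomputed score. The compound names, property values and scores are invented for illustration, and the score stands in for whatever docking or ligand-based model a real screen would use.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float   # molecular weight in Da
    log_p: float        # octanol/water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors
    score: float        # hypothetical predicted binding score, lower is better


def passes_rule_of_five(c, max_violations=1):
    """Count rule-of-five violations and compare with the allowed number."""
    violations = sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])
    return violations <= max_violations


def virtual_screen(library, top_n):
    """Filter the library by properties, then keep the best-scoring compounds."""
    candidates = [c for c in library if passes_rule_of_five(c)]
    return sorted(candidates, key=lambda c: c.score)[:top_n]


# Made-up library entries for illustration only.
library = [
    Compound("cmpd_A", 320.0, 2.1, 2, 5, -8.2),
    Compound("cmpd_B", 612.0, 6.3, 3, 9, -9.5),
    Compound("cmpd_C", 450.0, 4.0, 1, 7, -7.1),
    Compound("cmpd_D", 280.0, 1.0, 6, 4, -6.0),
]

hits = virtual_screen(library, top_n=2)
print([c.name for c in hits])
```

Filtering before scoring is a design choice that keeps the expensive model away from compounds unlikely to become drugs; cmpd_B has the best score in this toy library but is removed by the property filter.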

visualization: Algorithms glossary

Bibliography
Molecular modeling, Folding@home Education@home, Stanford Univ. http://www.stanford.edu/group/pandegroup/folding/education/molmodel.html
SLAC Glossary, Stanford Linear Accelerator Center, Stanford Univ. US, 2002, 300 definitions.   http://www2.slac.stanford.edu/vvc/glossary.html
Tollenaere JP, EE Moret, Hyperglossary of [Molecular Modelling in Drug Design] Terminology, Utrecht University, 1996. 150+ definitions. http://wwwcmc.pharm.uu.nl/webcmc/glossary.html  not working 11/17/2006 


IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.

 
