Bioinformatics, cheminformatics and beyond
Bioinformatics in drug discovery & development (not being updated)
Mary Chitty mchitty@healthtech.com
781 972 5416
Bioinformatics is inextricably intertwined with the biological,
chemical and medical resources in all the other sections.
What is Bioinformatics?
Many definitions exist, and agreement among them is difficult to reach.
The field of science in which biology, computer science, and information
technology merge into a single discipline. The ultimate goal of the field is
to enable the discovery of new biological insights as well as to create a
global perspective from which unifying principles in biology can be discerned.
There are three important sub-disciplines within bioinformatics: the
development of new algorithms and statistics with which to assess
relationships among members of large data sets; the analysis and
interpretation of various types of data including nucleotide and amino acid
sequences, protein domains, and protein structures; and the development and
implementation of tools that enable efficient access and management of
different types of information.
"Education" NCBI, 2003 http://www.ncbi.nlm.nih.gov/Education/index.html
The definition of bioinformatics is not universally agreed upon. Generally
speaking, we define it as the creation and development of advanced information
and computational technologies for problems in biology, most commonly molecular
biology (but increasingly in other areas of biology). As such, it deals with
methods for storing, retrieving and analyzing biological data, such as nucleic
acid (DNA/RNA) and protein sequences, structures, functions, pathways and
genetic interactions. Some people construe bioinformatics more narrowly, and include only those
issues dealing with the management of genome project sequencing data. Others
construe bioinformatics more broadly and include all areas of computational
biology, including population modeling and numerical simulations. Russ
Altman "What is bioinformatics?" Stanford Univ. 2002
http://smi-web.stanford.edu/people/altman/bioinformatics.html
Roughly, bioinformatics describes any use of computers to handle biological
information. In practice the definition used by most people is narrower;
bioinformatics to them is a synonym for "computational molecular
biology" - the use of computers to characterise the molecular components of
living things. Damian Counsell, bioinformatics.org FAQ http://bioinformatics.org/faq/#whatIsBioinformatics
Conceptualizing biology in terms of molecules (in the sense of physical
chemistry) and then applying "informatics" techniques (derived from
disciplines such as applied math, CS [computer science] and statistics) to
understand and organize the information associated with these molecules on a
large scale. Mark Gerstein "What is Bioinformatics?" MB&B 474b3, 2001
http://bioinfo.mbb.yale.edu/what-is-it.html
Research, development, or application of computational tools
and approaches for expanding the use of biological, medical, behavioral or
health data, including those to acquire, store, organize, archive, analyze, or
visualize such data. NIH, BISTIC Biomedical Information Science and
Technology Initiative, 2005 http://www.bisti.nih.gov/
Computational biology
A field of biology concerned with the
development of techniques for the collection and manipulation of biological
data, and the use of such data to make biological discoveries or predictions.
This field encompasses all computational methods and theories applicable to
MOLECULAR BIOLOGY and areas of computer-based techniques for solving biological
problems including manipulation of models and datasets. [MeSH, 1997]
Computational biology maps to bioinformatics in PubMed.
Computational biology
FAQ, Robert D. Phair, US, 2000 http://www.bioinformaticsservices.com/bis/resources/faq/faq.html
I find that
people use "computational biology" when discussing that subset of
bioinformatics (in the broadest sense) closest to the field of classical general
biology. Computational biologists interest themselves more with
evolutionary, population and theoretical biology rather than cell and molecular
biomedicine. It is inevitable that molecular biology is profoundly important in
computational biology, but it is certainly not what computational biology is all
about ... Richard Durbin, Head of Informatics at the Wellcome Trust Sanger
Institute, expressed an interesting opinion on this distinction in an
interview: "I do not think all biological computing is
bioinformatics, e.g. mathematical modelling is not bioinformatics, even when
connected with biology-related problems. In my opinion, bioinformatics has to
do with management and the subsequent use of biological information, particular
genetic information." [Damian Counsell, bioinformatics.org FAQ, 2001]
https://bioinformatics.org/faq/#definitionOfCompbiol
Bioinformatics
Overviews & introductions
NCBI, NLM, NIH: Science Primer http://www.ncbi.nlm.nih.gov/About/primer/index.html
Bioinformatics and molecular modeling
Very very very short introduction to protein
bioinformatics, Patricia Babbitt et al., 57 pages http://baygenomics.ucsf.edu/education/workshop1/lectures/w1.color2.pdf
What is Informatics?
Informatics according to the OED
[translation of Russian informatika, from information; see -ICS.] (See
quotation 1967.) Cf. information science. 1967
FID News Bull. XVII 73/2: Informatics is the discipline of science which
investigates the structure and properties (not specific content) of scientific
information, as well as the regularities of scientific information activity, its
theory, history, methodology and organization. Oxford English Dictionary,
2nd edition.
According to NIH's Office of Rare Diseases
The study of the application of computer and statistical techniques to the
management of information. In genome projects, informatics includes the
development of methods to search databases quickly, to analyse DNA sequence
information, and to predict protein sequence and structure from DNA sequence
data. ORD Office of Rare Diseases, NIH glossary. http://ord.aspensys.com/asp/resources/glossary_f-m.asp#I
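The last task named in this definition, predicting protein sequence from DNA sequence data, can be illustrated with a minimal sketch. It assumes the standard genetic code, a single forward reading frame starting at the first base, and a coding sequence without introns; the names CODON_TABLE and translate are illustrative and do not come from the ORD glossary or any particular library.

```python
from itertools import product

# Standard genetic code, written compactly: codons are enumerated with
# bases in T, C, A, G order (third base varying fastest), and the string
# below lists the matching one-letter amino acids ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acids.

    Assumptions (for illustration only): reading frame 0, forward strand,
    translation stops at the first stop codon, and any codon containing
    a character other than A, C, G, T is reported as 'X'.
    """
    dna = dna.upper()
    protein = []
    # Step through complete codons only; trailing 1-2 bases are ignored.
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[start:start + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```

Real gene-prediction tools must also find the correct reading frame and strand and handle introns, so this sketch only shows the final codon-lookup step.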
It is interesting that the OED definition specified
domain-independent information, while the ORD NIH definition is very
domain-specific. While "ontologies" offer the hope of cross-domain
interoperability, much effort is still being devoted to facilitating
communication within domains. While pharmaceutical research is increasingly
interdisciplinary and NIH has come out with new initiatives such as the NIH
Roadmap http://nihroadmap.nih.gov/index.asp
there are still many obstacles to truly interdisciplinary research.
From the Dept. of Biopharmaceutical Sciences, UCSF: Bioinformatic and
experimental analysis of protein superfamilies for
understanding protein structure-function relationships and developing
strategies for protein engineering. Using superfamily analysis to understand how
protein sequence and structure determine protein function. Our computational
approach begins with identifying the sets of divergently related proteins that
comprise enzyme superfamilies and then attempts to correlate their conserved and
variable structural features to similarities and differences in their functions.
This work also requires the development of new tools in protein
bioinformatics to identify and evaluate distant relationships and to distinguish
those elements of structure that provide common function
from those that determine specificity. Designed to take advantage of the huge
volumes of data coming out of the genome projects, this approach provides a much
more contextual picture of the structure-function paradigm than can be achieved
by studying a single protein at a time. This work has been successfully applied
to such problems as the prediction of function for unknown reading frames and
elucidation of enzyme mechanisms. Patricia Babbitt, Dept. of Biopharmaceutical
Sciences, Univ. of California San Francisco, US http://www.ucsf.edu/dbps/faculty/pages/babbitt.html
Introductions
to protein bioinformatics
Protein bioinformatics
Some (many?) of the above definitions specifically include
proteins.
[Protein] Structural bioinformatics
Involves the process of determining a protein's three-dimensional structure
using comparative primary sequence alignment, secondary and tertiary
structure prediction methods, homology modeling, and crystallographic
diffraction pattern analyses. Currently, there is no reliable de novo
predictive method for protein 3D-structure determination. Over the past
half-century, protein structure has been determined by purifying a protein,
crystallizing it, then bombarding it with X-rays. The X-ray diffraction
pattern from the bombardment is recorded electronically and analyzed using
software that creates a rough draft of the 3D structure. Biological scientists
and crystallographers then tweak and manipulate the rough draft considerably.
The resulting spatial coordinate file can be examined using modeling-structure
software to study the gross and subtle features of the protein's structure.
Christopher Smith "Bioinformatics, Genomics, and Proteomics"
Scientist 14[23]:26, Nov. 27, 2000
See also systems
bioinformatics
What are cheminformatics, chemoinformatics, chemi-informatics?
The terminology is even less standardized here.
Google hits for:          Dec. 11, 2003     Oct. 14, 2005
cheminformatics           about 16,300      about 168,000
chemoinformatics          about 8,670       about 85,300
"chemical informatics"    about 3,300       about 2,100,000
chemiinformatics          about 35          about 168
Cheminformatics definitions
Cheminformatics:
Going by the literature
Mixing of information technology and management to transform data into
information and information into knowledge for the intended purpose of making
better decisions faster in the arena of drug lead identification and
optimization. In Chemoinformatics there are really only two [primary]
questions: 1.) what to test next and 2.) what to make next. The main processes
within drug discovery are lead identification, where a lead is something that
has activity in the low micromolar range, and lead optimization, which is the
process of transforming a lead into a drug candidate. Frank Brown,
"Chemoinformatics: What is it and How does it Impact Drug Discovery"
Annual Reports in Medicinal Chemistry 33: 375-384, 1998
Increasingly incorporates "compound registration into databases, including
library enumeration; access to primary and secondary scientific literature;
QSAR (Quantitative Structure Activity Relationships) and similar tools for
relating activity to structure; physical and chemical property calculations;
chemical structure and property databases, chemical library design and analysis;
structure-based design and statistical methods. Because these techniques have
traditionally been considered the realms of scientists from different
disciplines, differences in computer systems and terminology provide a barrier
to effective communication. This is probably the single most challenging problem
that chemoinformatics must solve." M Hann and R Green
"Chemoinformatics a new name for an old problem?" Current
Opinion in Chemical Biology 3:379-383, 1999
Many people view
chemoinformatics as an extension of chemical information, which is a well
established concept covering many areas that employ chemical structures, data
storage and computational methods, such as compound registration databases,
on-line chemical literature, SAR analysis and molecule-property calculation.
Timothy Ritchie "Chemoinformatics; manipulating chemical information to
facilitate decision-making in drug discovery" Drug Discovery Today 6(16):
813-814, Aug. 2001
Chemical informatics
Variously known as
chemoinformatics, cheminformatics, or even chemiinformatics, chemical
informatics is the application of computer technology to chemistry in all of its
manifestations. Much of the current use of cheminformatics techniques is in the
drug industry. Indeed, one definition of chemical informatics is "the
mixing of information resources to transform data into information and
information into knowledge, for the intended purpose of making decisions faster
in the arena of drug lead identification and optimization." Now chemical
informatics is being applied to problems across the full range of chemistry. Gary D. Wiggins, "What is Chemical
Informatics?" Indiana Univ., US, 2006 http://www.chembiogrid.org/resources/whatis.html
Cheminformatics overviews &
introductions
25 Years of Research in
Cheminformatics: A Portrait of the Research Group of Prof. Johann Gasteiger,
Computer Chemie Centrum and Institute of Organic Chemistry, Univ. of
Erlangen-Nurnberg, 2001 http://www2.chemie.uni-erlangen.de/presentations/symposium/torvs_e.pdf
Cheminformatics
and beyond
Drug discovery and development is in the
midst of a critical transition, from a discipline dominated by empirical tests
and brute force to one in which biological and chemical structural knowledge are
exploited intelligently, using computational assistance. Cheminformatics
combines chemical synthesis, biological screening, and data mining
approaches to guide drug discovery and development. Its important elements
include tools that allow for the rational selection of designed compounds with
drug-like properties from an almost infinite number of synthetic possibilities,
the building of smarter focused libraries for virtual and high-throughput
screening, and the exploitation of previously obtained discovery data to guide
lead optimization efforts.
There are many sources of chemical data: registered
chemical structures with stereochemistry, synthesis records, spectral data
including NMR (Nuclear Magnetic Resonance), purity determinations, not
to mention the volume of data generated by HTS (High Throughput Screening),
SAR (Structure Activity Relationship) studies and the calculation of
physicochemical properties. Accessibility, manipulation, and data mining of
chemical information translate to knowledge for smarter drug development.
Chemoinformatic tools for storage, design and mining of chemical databases and
information have had success in lead identification and optimization.
Chemoinformatics is about presenting and integrating a vast and complex array of
information so that people who make the decisions in drug discovery can make
better choices (relatively) quickly and easily.
Molecular
modeling and systems biology
Many people include these concepts under
chemical informatics.
Molecular
modeling:
A technique for the investigation of molecular
structures and properties using computational chemistry and graphical visualization
techniques in order to provide a plausible three-dimensional representation
under a given set of circumstances. IUPAC Medicinal Chemistry
in silico: Literally "in the computer" (as
contrasted with "in vitro" (in glass) or "in vivo"
(in life)). Can be used to screen out compounds which are not druggable.
In a white paper I wrote for the European Commission in 1988 I advocated the
funding of genome programs, and in particular the use of computers. In this
endeavour I coined "in silico" following "in vitro"
and "in vivo". I think that the first public use of the word is
in the following paper: A. Danchin, C. Médigue, O. Gascuel, H. Soldano, A.
Hénaut, From data banks to data bases. Res. Microbiol.
(1991) 142: 913-916. You can find a developed account of this story in my
book The Delphic Boat, Harvard University Press, 2003.
Personal communication, Antoine Danchin, Institut Pasteur, 2003
Mapping and modeling networks and pathways
The experimental task of mapping genetic regulatory
networks using genetic footprinting and [yeast] two-hybrid techniques
is well underway, and the kinetics of these networks is being generated at an
astounding rate. ... If the promise of the genome projects and the structural
genomics effort is to be fully realized, then predictive simulation methods must
be developed to make sense of this emerging experimental data.
There are three bottlenecks in the numerical analysis of biochemical reaction
networks. The first is the multiple time scales involved. Since the time between
biochemical reactions decreases exponentially with the total probability of a
reaction per unit time, the number of computational steps to simulate a unit of
biological time increases roughly exponentially as reactions are added to the
system or rate constants are increased. The second bottleneck derives from the
necessity to collect sufficient statistics from many runs of the Monte Carlo
simulation to predict the phenomenon of interest. The third bottleneck is a
practical one of model building and testing: hypothesis exploration, sensitivity
analyses, and back calculations, will also be computationally intensive.
Lawrence Berkeley Lab "Advanced Computational Structural Genomics"
Glossary, c. 1999
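The glossary entry above does not name a specific algorithm. One widely used Monte Carlo scheme for biochemical reaction networks is Gillespie's direct method, sketched here as an illustration only. In this method the waiting time to the next reaction is drawn from an exponential distribution whose mean is the reciprocal of the total reaction propensity, which is why adding reactions or raising rate constants increases the number of steps per unit of simulated time. The toy network (A converting reversibly to B), its rate constants, and all function names are assumptions made for this example.

```python
import random


def gillespie(initial_state, reactions, t_end, rng):
    """Simulate one stochastic trajectory with Gillespie's direct method.

    initial_state: dict mapping species name to molecule count
    reactions: list of (propensity_function, change) pairs, where
        propensity_function(state) returns the reaction probability per
        unit time and change maps species name to its count change
    Returns a list of (time, state) snapshots, one per reaction event.
    """
    t = 0.0
    state = dict(initial_state)
    trajectory = [(t, dict(state))]
    while True:
        propensities = [propensity(state) for propensity, _ in reactions]
        total = sum(propensities)
        if total <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next event: exponential with mean 1 / total.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        threshold = rng.random() * total
        cumulative = 0.0
        for value, (_, change) in zip(propensities, reactions):
            cumulative += value
            if threshold < cumulative:
                for species, delta in change.items():
                    state[species] += delta
                break
        trajectory.append((t, dict(state)))
    return trajectory


# Hypothetical toy network: A -> B at rate 1.0 per molecule,
# B -> A at rate 0.5 per molecule (values invented for illustration).
EXAMPLE_REACTIONS = [
    (lambda s: 1.0 * s["A"], {"A": -1, "B": +1}),
    (lambda s: 0.5 * s["B"], {"A": +1, "B": -1}),
]


def mean_final_count(species, runs, seed=0):
    """Average the final count of one species over many independent runs."""
    rng = random.Random(seed)
    finals = []
    for _ in range(runs):
        trajectory = gillespie({"A": 100, "B": 0}, EXAMPLE_REACTIONS, 5.0, rng)
        finals.append(trajectory[-1][1][species])
    return sum(finals) / runs


if __name__ == "__main__":
    print(mean_final_count("B", runs=200))
```

The helper mean_final_count reflects the second bottleneck in the passage: a single trajectory is noisy, so statistics have to be collected over many runs.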
more on Networks and pathways
Systems biology
There are two opinions on what systems biology is supposed to
be. One group sees systems biology as another level of combining data
from different levels (like DNA, RNA and protein level) (see [Leroy] HOOD). Another
group wants to combine classical molecular and cell biology with
systems theory and focus on the new forms of behavior that emerge when systems
of genes and proteins are studied in a holistic way. For this they need data
from all those different levels as well, of course. That is why they see systems
biology as a cooperative effort, with systems theory providing a theoretical
framework and a new view on things for biologists, along with lots of experience
with complex systems, and biology providing in-depth knowledge of the field of
application as well as practical handling experience. This data is the basis for
developing the kind of detailed models that are necessary for such studies of
systemic properties and behavior. For both groups, the goal is to reach a new
level of understanding of biological systems often referred to as 'systems
level' understanding. A glossary for Systems Biology, Systems Biology Group,
Stuttgart http://www.sysbio.de/projects/glossary/SYSTEMS_BIOLOGY.shtml#systems_biology
Institute for Systems Biology, Seattle WA http://www.systemsbiology.org/
Lee Hood's group.
Systems bioinformatics
With the completion of the Human Genome Project, the scientific community is
now faced with the even greater challenge of analyzing the resulting data from
this and other large-scale genome projects to better understand the networks
underlying biological function. Second International Computational Systems
Bioinformatics Conference To be Held August 11-14, 2003 at Stanford University,
IEEE CS Bioinformatics Technical Chair via BizWire http://quickstart.clari.net/qs_se/webnews/wed/bx/Bca-ieee-cs_csb2003.RMsB_DuP.html
Drug
discovery and development informatics
Pharmainformatics
The multidisciplinary informatics needs of the pharmaceutical industry (HTS High
Throughput Screening data, Computational Chemistry, Combinatorial Chemistry,
ADME Informatics, Cheminformatics, Toxicology, Metabolic Modeling,
Bioinformatics in Drug Discovery and Metabolism, etc.), including information
access and communication between departments such as development and discovery.
Yahoo Groups Pharmainformatics http://health.groups.yahoo.com/group/pharmainformatics/
Pharmaceutical bioinformatics
Bioinformatics and structure-aided drug design are really part of the same
continuum. Bioinformatics offers a means to get to a structure through sequence;
while structure-aided drug design offers a means to get to a drug through
structure. We plan to combine innovative computational techniques with
biochemical and structural expertise to bring bioinformatics and
structure-aided drug design even closer together. In particular, we intend to blend
computational chemistry with computational biology to create software that will
aid protein chemists in understanding, evaluating and predicting the structure,
function and activity of medically and industrially important proteins. My
laboratory is currently involved in three "bioinformatics" projects.
These include: (1) the development of novel methods to identify remote
sequence/structure relationships; (2) the creation of a compact, relational database with
advanced bioinformatics functionality; and (3) the development of novel methods
for predicting and evaluating protein secondary and tertiary structure. David
Wishart, Wishart Pharmaceutical Research Group, Univ. of Alberta, Canada http://redpoll.pharmacy.ualberta.ca/projects/bioinfo.html
Research informatics
The explosion of genomic information, from sequences and gene
expression to SNPs and protein structures, is of limited value for
pharmaceutical researchers without powerful software capable of interpretation
and comparisons. Data mining, multiple location data sharing, and computational
enhancements of biological and chemistry projects, as well as integration of
these efforts need various approaches for overcoming the problems of legacy
information systems, the very different language and perspectives of chemists
and biologists, and the organizational issues of compartmentalization and information
silos.
Laboratory informatics
The specialized application of information technology to maximize laboratory
operations. Laboratory informatics encompasses data acquisition, data
processing, laboratory information management system (LIMS), laboratory
automation, scientific data management (including data analysis and long-term
archiving), and electronic laboratory notebooks. Focus is on the application of
this technology in analytical, production, and R&D laboratories.
Graduate Programs: Laboratory Informatics, Indiana Univ. School of Informatics,
US http://www.informatics.iupui.edu/Academics/graduate/laboratory_informatics/index.php
Toxicoinformatics
Toxicogenomics
Toxicoinformatics
An emerging scientific discipline that
integrates approaches from multidisciplinary fields of bioinformatics,
chemoinformatics, computational toxicology, informatics technologies and
physiologically-based pharmacokinetic modeling with the objective of knowledge
discovery and the elucidation of mechanisms of toxicity. NCTR's Center for
Toxicoinformatics, National Center for Toxicological Research, FDA, 2003 http://www.fda.gov/nctr/science/centers/toxicoinformatics/
In the end,
successfully moving research from the laboratory into the clinic is the ultimate target validation.
While new technologies may be helpful and/or necessary, the challenges of
scaling, automating (both for cost effectiveness and reproducibility),
standardizing and simplifying are equally, if not more, important.
Medical Bioinformatics
Covers haplotyping, genotyping and population genomics, and gene expression
profiling, particularly for use in diagnosis, prognosis and therapeutic
stratification of patients. Most of this work is being done first in oncology.
Medical informatics
Medical informatics has many different contexts.
The field of information science concerned with
the analysis and dissemination of medical data through the application of
computers to various aspects of health care and medicine. [MeSH 1987]
Medical informatics has
to do with all aspects of understanding and promoting the effective
organization, analysis, management, and use of information in health care. While
the field of medical informatics shares the general scope of these interests
with some other health care specialties and disciplines, medical informatics has
developed its own areas of emphasis and approaches that have set it apart from
other disciplines and specialties. For one, a common thread through medical
informatics has been the emphasis on technology as an integral tool to help
organize, analyze, manage, and use information. In addition, as professionals
involved at the intersection of information and technology and health care,
those in medical informatics have historically tended to be engaged in the
research, development, and evaluation side of things, and in studying and
teaching the theoretical and methodological underpinnings of data applications
in health care. However, today medical informatics also counts among its
profession many whose activities are focussed on dimensions that include the
administration and everyday collection and use of information in health
care. FAQ, American Medical Informatics Association, 2003 http://www.amia.org/about/faqs/f7.html
Health information data
Includes:
Clinical data captured during the process of diagnosis and treatment.
Epidemiological databases that aggregate data about a population.
Demographic data used to identify and communicate with and about an individual.
Financial data derived from the care process or aggregated for an organization
or population.
Research data gathered as a part of care and used for research or
gathered for specific research purposes in clinical trials.
Reference data that interacts with the care of the individual or with the
healthcare delivery systems, like a formulary, protocol, care plan, clinical
alerts or reminders, etc.
Coded data that is translated into a standard nomenclature or
classification so that it may be aggregated, analyzed, and compared.
Health Information Management; Professional definitions, Committees on
Professional Development, American Health Information Management Association,
1999, 2000 http://www.ahima.org/infocenter/definitions/HIMprofessionaldefinition.htm
Public health informatics
The systematic application of information and computer sciences to public
health practice, research, and learning. It is the discipline that integrates
public health with information technology. The development of this field and
dissemination of informatics knowledge and expertise to public health
professionals is the key to unlocking the potential of information systems to
improve the health of the nation. www.nlm.nih.gov/pubs/cbm/phi2001.html
[MeSH 2003]
Emerging
medical informatics specialties
Social informatics
An important and often ignored piece of the puzzle.
A serviceable working conception of "social informatics" is that
it identifies a body of research that examines the social aspects of
computerization. A more formal definition is "the interdisciplinary study
of the design, uses and consequences of information technologies that takes into
account their interaction with institutional and cultural contexts." ...
Social informatics has been a subject of systematic analytical and critical
research for the last 25 years. Unfortunately, social informatics studies are
scattered in the journals of several different fields, including computer
science, information systems, information science and some social sciences. Each
of these fields uses somewhat different nomenclature. This diversity of
communication outlets and specialized terminologies makes it hard for many
non-specialists (and even specialists) to locate important studies. Rob Kling, What
is social informatics and why does it matter? D-Lib 5(1): Jan. 1999 http://www.dlib.org/dlib/january99/kling/01kling.html
Social informatics
HomePage
http://www.slis.indiana.edu/SI/
Information overload
Biomedical literature growth http://www.ncbi.nih.gov/About/tools/restable_stat_pubmed.html
Trying to read (and think) faster just doesn't scale. We need new ways of
managing and interpreting information and data, and of balancing the competing
and conflicting demands.