Related glossaries include
Applications: Drug Discovery & Development, Functional genomics
Informatics: Algorithms & data analysis, Chemoinformatics, Computers & computing, Information management & interpretation, Databases & software directory, In silico & molecular Modeling
Technologies: Sequencing
Biology: DNA, Expression, Proteins, Sequences, DNA & beyond
annotated databases: Databases may contain a combination of amino acid sequences, comments, literature references and notes on known post-translational modifications to the sequence. A database that contains all of these elements is referred to as "annotated". Other databases only contain the sequence, an accession number and a descriptive title. Annotation of each entry is obviously very time-consuming and difficult to maintain without errors. Therefore annotated databases usually have many fewer sequence entries than non-annotated ones. Annotation also implies that some functional or structural information is known about the mature protein, as opposed to a sequence that is known only from the translation of a stretch of nucleotide sequence. Even the best annotated databases now include large numbers of entries that have very little real information about the mature protein other than some reference to who sequenced and translated the nucleotide sequence. Annotated databases are technically superior for many purposes, because they contain information about the true form of the mature protein.
[Biopolymer Markup Language — BIOML Working Draft Proposal,
1999] http://www.rdcormia.com/COIN79/b_chpt1.htm
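As a rough illustration of the distinction drawn above, the following Python sketch models a bare sequence entry and an annotated one. The class names, field names and example records are hypothetical; they are not taken from BIOML or from any real database schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SequenceEntry:
    """Minimal entry: sequence, accession number and descriptive title."""
    accession: str
    title: str
    sequence: str


@dataclass
class AnnotatedEntry(SequenceEntry):
    """Entry that also carries comments, references and modification notes."""
    comments: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    modifications: List[str] = field(default_factory=list)

    def is_fully_annotated(self) -> bool:
        # "Annotated" in the sense above: all three kinds of note are present.
        return bool(self.comments and self.references and self.modifications)


# Hypothetical records, invented for illustration only.
bare = SequenceEntry("X00001", "hypothetical protein", "MKTAYIAKQR")
rich = AnnotatedEntry(
    "X00002", "example kinase", "MSTNPKPQRK",
    comments=["Mature form lacks the initiator methionine."],
    references=["Placeholder citation (hypothetical)"],
    modifications=["phosphoserine at position 2"],
)
```

An entry with an empty annotation list still fits the richer schema, which mirrors the remark that even annotated databases contain entries with very little real information.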
annotation:
The annotation process identifies sequence features on the
contigs - such as variation,
sequence tagged sites, FISH mapped clone regions, known and predicted genes, and gene models. This stage
provides contig, mRNA, and protein records with added feature annotation. [NCBI Contig Assembly and Annotation Process,
2001] http://www.ncbi.nlm.nih.gov/genome/guide/build.html#contig
The value of a genome
is only as good as its annotation. At the Sanger Institute, we are providing
high quality manual curation in addition to automated prediction provided by
Ensembl. Finished genomic sequence is analysed on a clone by clone basis using a
combination of similarity searches against DNA and protein databases as well as
a series of ab initio gene predictions. Manual Curation of the Human
Genome, Wellcome Trust, Sanger Institute, 2003 http://www.sanger.ac.uk/HGP/havana/
Each fragment of DNA contains unique features. A DNA fragment may encode a
portion of a gene or a gene control sequence, or the fragment may be a portion
of a genome that has no apparent function. Bioinformaticists perform detailed
analysis of DNA fragments, comparing new DNA sequence with previously annotated DNA
sequences, identifying common characteristics, and assigning known or
putative functions to the DNA sequence. Cross-species DNA sequence
comparison is quite common and can reveal common genes shared between organisms.
A bioinformatic study may also require peptide-to-peptide comparisons, allowing
common structural features of proteins to define the function of a DNA fragment
encoding a specific protein or enzyme. [CHI High Throughput Genomics report, 2001]
The elucidation and description of biologically relevant features in the sequence
is essential in order for genome data to be useful. The quality
with which annotation is done will have direct impact on the value of the
sequence. At a minimum, the data must be annotated to indicate the existence
of gene coding regions and control regions. Further annotation activities
that add value to a genome include finding simple and complex repeats,
characterizing the organization of promoters and gene families, the distribution
of G + C content, and tying together evidence for functional motifs and homologs.
[Lawrence Berkeley Lab, US "Advanced Computational Structural Genomics"]
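One of the annotation activities listed above, the distribution of G + C content, is simple enough to sketch. The Python below is a minimal illustration using a fixed-size sliding window; the function names, window size and toy sequence are assumptions made for the example, not part of any cited pipeline.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq: str, window: int, step: int) -> list:
    """(start, G + C fraction) for each full window along the sequence."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence, invented for illustration.
example = "ATGCGCGCATATATATGGCC"
for start, frac in gc_windows(example, window=10, step=5):
    print(start, round(frac, 2))
```

Real annotation pipelines typically use much larger windows and handle ambiguous bases, which this sketch ignores.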
Explanatory notes, comments, analysis and commentaries added to a database.
May refer to sequence data or protein structures and includes predictions, characterizations,
summaries, and other detailed information, including gene function. Annotation can be manual (as in SWISS-PROT) or automated (as in TrEMBL).
Since annotation is highly skilled and labor intensive, efforts are being
made to automate the process, at least for preliminary data.
Related terms: annotated databases, curated databases, comparative genome annotation,
distributed annotation system, genome annotation; SNPs
& genetic variations Genetic Annotation Initiative
Narrower terms: baseline annotation, computational annotation, distributed sequence
annotation; Proteomics: annotation - proteins
annotation - proteins: Proteomics
bacterial bioinformatics:
Antibiotic resistance
amongst virulent species is on the increase, causing major concern worldwide.
... It is evident that some current therapies are no longer effective, and
accordingly, novel antimicrobials will need to be developed. To help counteract
these problems, advances in technology can be used to hasten the hunt for new
drug and vaccine targets. One obvious advantage of using computer-based
screening techniques (bioinformatics) to scan newly sequenced pathogen
genomes is the speed at which identification of novel targets can be carried
out. "Bacteriology for the Bioinformatician", Edward
Jenner Institute for Vaccine Research, UK http://www.jenner.ac.uk/BacBix3/BACforBIX.htm
Bacterial Bioinformatics
http://www.jenner.ac.uk/BacBix3/Welcomehomepage.htm
baseline annotation:
As good as possible,
computational only annotation. [Project Ensembl, Wellcome Trust, Sanger
Institute, EBI, UK, 2001] http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/ScienceDocumentation.html
BioConductor:
An open source and open development software project to provide tools for the
analysis and comprehension of genomic data (bioinformatics). http://www.bioconductor.org/
biocorba.org:
Provides an object-oriented, language-neutral,
platform-independent method for describing and solving bioinformatic problems.
BioCORBA's mission is to leverage the code of the other Bio projects in
a simple and easy to use fashion. For example, the language-neutral environment
allows users to write programs using BioPython and access BioPerl modules
through the CORBA server. http://www.biocorba.org/
bioinformatics:
The interdisciplinary fields of
Bioinformatics and Computational Biology are locked in a high stakes race with
analytical instrument developers and innovators. The pace and scope of change in
many fields of biomedical research rivals what we once associated only with
semiconductor devices. This report explores the interlocking challenges facing
instrumentation advances, computational demands and our evolving systems biology
knowledge. Key challenges presented in this report include:
- Instrumentation capable of generating terabytes of raw data daily
- Storage requirements for human gene sequences
- Need for cross platform data analysis standards
- Appropriateness of analysis & modeling applications
- Database data quality and annotation protocols
Insight Pharma Reports, Bioinformatics
& Computational Biology, 2009
Carole Goble, Seven
Deadly Sins of Bioinformatics, 2007 presentation http://www.slideshare.net/dullhunk/the-seven-deadly-sins-of-bioinformatics
Roughly, bioinformatics describes any use of computers to handle biological information. In practice the definition used by most people is narrower; bioinformatics to them is a synonym for "computational molecular biology" - the use of computers to characterise the molecular components of living things.
[Damian Counsell, bioinformatics.org FAQ] http://bioinformatics.org/faq/#whatIsBioinformatics See
the above bioinformatics.org FAQ for tight and loose definitions of bioinformatics,
and information on how long the term has been used.
The definition of bioinformatics is not universally agreed upon. Generally speaking,
we define it as the creation and development of advanced information and
computational technologies for problems in biology, most commonly molecular
biology (but increasingly in other areas of biology). As such, it deals with
methods for storing, retrieving and analyzing biological data, such as nucleic
acid (DNA/RNA) and protein sequences, structures, functions, pathways and
genetic interactions. Some people
construe bioinformatics more narrowly, and include only those issues dealing
with the management of genome project sequencing data. Others construe
bioinformatics more broadly and include all areas of computational biology,
including population modeling and numerical simulations. Biomedical
informatics is a slightly broader umbrella that includes not only
bioinformatics, but other areas of informatics in biology, medicine and
health-care. They are closely
related. Russ Altman, "Guide to
informatics at Stanford University", 2006 http://www-helix.stanford.edu/people/altman/bioinformatics.html
Research, development or application of computational tools and approaches
for expanding the use of biological, medical, behavioral or health data,
including those to acquire, store, organize, archive, analyze, or visualize such
data. Biomedical Information Science and Technology Initiative BISTI
Bioinformatics at the NIH, 2000 http://www.bisti.nih.gov/
The earliest Medline reference I've found to bioinformatics is William Bain's
"Bioinformatics in Europe - the federation strikes back" in
Trends in Biotechnology 11(6): 217-218 June 1993.
We have coined the term Bioinformatics for the study of informatic processes
in biotic systems. Our Bioinformatic approach typically involves spatial,
multi-leveled models with many interacting entities whose behavior is determined
by local information. [Theoretical Biology Group, Univ. of Utrecht, Netherlands,
Paulien Hogeweg Director] http://www-binf.bio.uu.nl/
Original definition was “the study of informatic processes in biotic
systems” Paulien Hogeweg MIRROR beyond MIRROR, puddles of LIFE, in Artificial
Life, ed. C.G. Langton, Addison Wesley, 297-316, 1988 [Nick Saville's
homepage, Theoretical Biology and Bioinformatics, Utrecht Univ., Netherlands,
1997]
Narrower terms:
bacterial bioinformatics, comparative bioinformatics, functional bioinformatics,
glycobioinformatics, medical bioinformatics, molecular bioinformatics, pharmaceutical bioinformatics,
protein bioinformatics; Structural genomics structural bioinformatics
Related terms: European Bioinformatics Institute EBI, Open Bioinformatics Foundation; Algorithms data mining
bioinformatics visualization: Special issue of
Informatics Visualisation, vol. 4 no. 3, Sept. 2005 guest editors Chris North
& Theresa-Marie Rhyne http://people.cs.vt.edu/~north/BioVisCFP.html
biojava.org: An open-source project dedicated to providing Java
tools for processing biological data. This will include objects for manipulating
sequences, file parsers, CORBA interoperability, access to ACeDB,
dynamic programming, and simple statistical routines. The BioJava library
is useful for automating those daily and mundane bioinformatics tasks.
http://www.biojava.org/
BioLisp.org: A public resource supporting scientists who use Lisp to develop intelligent
applications in the biological sciences.
biological computing: Computers & computing
biological databases:
Biological databases have inherent complications stemming from
the nature of the information they contain and the dependence of computational
methods on these data. Most biological data are not digital, making
machine-readability of the data (for automated data-mining) impossible. In addition,
the lack of standardized nomenclature and ontology, the use of protein aliases
(leading to ambiguity), the lack of interoperability across databases, and the
presence of errors in database annotations have hindered and complicated the use
of computational methods. Defining the Mandate of Proteomics in the
Post-Genomics Era, Board on International Scientific Organizations, National Academy
of Sciences, 2002 http://www.nap.edu/books/NI000479/html/R1.html
biomedical computing: Information
management & interpretation
bioMOBY:
An international group of biological data hosts, biological
data service providers, and coders whose aim is to set standards for biological
data representation, distribution, and discovery. http://biomoby.org/
BIONLP.org:
Natural language processing of biology text. [Bob
Futrelle, Computer Science, Northeastern Univ., US, 2002] http://www.ccs.neu.edu/home/futrelle/bionlp/
BioPax:
Biological Pathways Exchange. A collaborative effort to
create a data exchange format for biological pathway data. http://www.biopax.org/
Related terms: metabolic
pathways
bioperl.org:
An international association of developers of open
source Perl tools for bioinformatics, genomics and life science research.
We work closely with our friends and colleagues at biojava.org, biopython.org
and
bioxml.org. The Bioperl server provides an online resource for modules,
scripts, and web links for developers of Perl-based software for
life science research. http://bio.perl.org/
biopython.org:
An international association of developers of
freely available Python tools for computational molecular biology. biopython.org
provides an online resource for modules, scripts, and web links for developers
of Python-based software for life science research.
http://www.biopython.org/
biosemiotics:
http://www.gypsymoth.ento.vt.edu/~sharov/biosem/biosem.html#topics
BioWidget Consortium Home Page, Computational Biology & Informatics
Lab, Univ. of Pennsylvania, US. The bioWidgets toolkit is a collection
of Java Beans (used for development of graphics applications and/or applets
in the genomics domain). http://www.cbil.upenn.edu/bioWidgets/
bioxml.org:
This site was created to be a center for development for
open source biological
DTDs. http://www.xml.com/pub/r/1118
BISTI Consortium:
Established in May 2000 to serve as the focus
of biomedical computing issues at the NIH and to facilitate implementation
of the BISTI recommendations. The Consortium is composed of senior-level
representatives from the NIH centers and institutes and representatives
of other Federal agencies concerned with bioinformatics and computational
applications. The mission of the BISTI Consortium is to make optimal use
of computer science and technology to address problems in biology and medicine
by fostering new basic understandings, collaborations, and transdisciplinary
initiatives between the computational and biomedical sciences. http://www.bisti.nih.gov/bistic2.cfm
CORBA: Computers & computing
cellular bioinformatics:
The less developed branch of bioinformatics that focuses on the understanding
of the functioning living cell. As such it has to integrate DNA, mRNA, protein
and metabolic data. Because of the complexity of the problem, it also needs to
invoke mathematical modeling. ... The branch of cellular bioinformatics that
focuses on understanding on the basis of all the known experimental data is also
called computational
biochemistry. Hans Westerhoff, Vrije Universiteit, Netherlands http://www.bio.vu.nl/hwconf/papers/cellbioinf.html
comparative bioinformatics: The
genome sequences from several chordates are being completed; the bioinformatics
largely exists in the research community to discover the protein-coding
potential of those genomes. However, the bioinformatics to elucidate gene
regulation encoded in genomes and gene regulatory networks is not so developed.
New bioinformatics, new model organism resources, new experimental approaches,
and new collaborations are needed if the community is to understand the gene
networks that help create phenotypes of interest. A research team at ORNL and
the University of Tennessee are developing some needed bioinformatics. The
overall projects include 1) supplying several web services and collaborative
bioinformatics that supports large consortia of experimental researchers and 2)
developing comparative bioinformatics and new data mining environments that can
ultimately help understand the nature and evolution of gene regulatory networks.
J. Snoddy et al., Univ. of Tennessee, ORNL, International Mammalian Genome 17
Nov. 2002 http://imgs.org/abstracts/2002abstracts/file192.htm
comparative genome annotation:
The major immediate interests of
the genome projects are in the identification of protein coding regions.
However, a complete description of gene structure necessitates identification of
the associated sites which signal the different processes in the gene to protein
pathway. Such sites include promoters, transcription start and end
points, poly-adenylation sites, splice sites, and translation
start and stop sites. In addition, regulatory regions form an important
functional component of gene structure. Indeed, gene regulation may utilise
alternatives in promoters, splice sites and translation start sites. Accurate
identification of coding regions is aided by the identification of such sites,
and vice versa. Identification of
regulatory sites is more accurate when they are viewed in the context of other
surrounding elements. ["Briefings in Bioinformatics" special
issue, proceedings from the symposium on "Genome Based Gene Structure
Determination" conducted at the EMBL European Bioinformatics Institute
(EBI) during June 1-2, 2000] http://industry.ebi.ac.uk/~thanaraj/BIB_Editorial.htm
Broader term: genome annotation
Related term: Functional genomics comparative genomics
computational annotation:
The workshop began with a series of presentations on computational annotation
and experimental approaches to biological confirmation of functional elements in
the genomes of both model organisms and the human. Subsequent to those
discussions, NHGRI outlined its proposal for a pilot project to exhaustively
determine all functional elements in a small fraction (~1 percent) of the human
genome. Initial Inventory of Functional Elements to Identify: The participants
recommended that both protein-coding genes and non-protein-coding genes need
to be identified. For each of these, the complete (full-length) coding sequence
and all variants, as well as the transcriptional regulatory elements (e.g.,
promoters and enhancers) and post-transcriptional regulatory elements (e.g.,
cis-acting RNA elements) should be described. All pseudogenes should be
identified. A number of global sequence features, such as sites of methylation,
sequence variation, evolutionary history of sequence blocks and repetitive
elements were suggested for inclusion, as were a number of chromosomal elements,
such as origins of replication, nuclease hypersensitive sites, matrix attachment
sites and histone modifications. Workshop on the Comprehensive Extraction of
Biological Information from Genomic Sequence, Bethesda, Md. July 23-24, 2002, http://www.genome.gov/10005568
computational annotation technologies: Several
‘wet bench’ technologies and resources were discussed. These included DNA
array studies, RT-PCR/cDNAs, in situ hybridization, chromatin
immunoprecipitation, RNAi, knockout mice, and antibody analysis of protein
function. A broad range of computational approaches were also considered to be
critical for inclusion. These included both comparative sequence analysis of
multiple genomic sequences to identify conserved elements and automated
prediction of functional elements, including coding sequences, promoters,
alternative splice variants and other highly conserved regions. The importance
of ensuring close collaboration between experimental and computational
approaches was stressed. Workshop on the Comprehensive Extraction of Biological
Information from Genomic Sequence, Bethesda, Md. July 23-24, 2002, http://www.genome.gov/10005568
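The comparative approach mentioned above, identifying conserved elements across multiple genomic sequences, can be sketched in a few lines of Python. This is only a toy: it assumes the sequences have already been aligned to equal length, treats a column as conserved only when every sequence has the same base, and uses a hypothetical minimum block length. It is not the method used by any project cited here.

```python
def conserved_blocks(aligned, min_len=4):
    """Return (start, end) half-open ranges where every column is identical
    across all aligned sequences and contains no gap character."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be pre-aligned to equal length")

    blocks, run_start = [], None
    # Iterate one position past the end so a trailing run is flushed.
    for i in range(length + 1):
        column = {s[i] for s in aligned} if i < length else set()
        conserved = len(column) == 1 and "-" not in column
        if conserved and run_start is None:
            run_start = i
        elif not conserved and run_start is not None:
            if i - run_start >= min_len:
                blocks.append((run_start, i))
            run_start = None
    return blocks


# Toy alignment, invented for illustration ("-" marks a gap).
alignment = [
    "ACGTTGCA-TATAAAGG",
    "ACGTAGCAGTATAAACG",
    "ACGTCGCA-TATAAATG",
]
print(conserved_blocks(alignment))
```

Production tools score conservation statistically rather than demanding perfect identity, so this should be read as an illustration of the idea only.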
computational biology: The development and application of data-analytical
and theoretical methods, mathematical modelling and computational
simulation techniques to the study of biological, behavioral, and social
systems. Biomedical Information Science and Technology
Initiative BISTI Bioinformatics at the NIH, 2000 http://www.bisti.nih.gov/
I find that people use "computational biology" when discussing that subset of bioinformatics (in the broadest sense) closest to the field of classical general biology.
Computational biologists interest themselves more with evolutionary, population and theoretical biology rather than cell and molecular biomedicine. It is inevitable that molecular biology is profoundly important in computational biology, but it is certainly not what computational biology is all about (see next paragraph). In these areas of computational biology it seems that computational biologists have tended to prefer statistical models for biological phenomena over physico-chemical ones. This is often wise...
One computational biologist (Paul J Schulte) did object to the above and makes the entirely valid point that this definition derives from a popular use of the term, rather than a correct one. Paul works on water flow in plant cells and points out that biological fluid dynamics is a field of computational biology in
itself - and this, like any application of computing to biology, can be described as computational biology...
Where we disagree, perhaps, is in his conclusion from
this - which I reproduce in full: "Computational biology is not a "field", but an "approach" involving the use of computers to study biological processes and hence it is an area as diverse as biology itself."
Richard Durbin, Head of Informatics at the Wellcome Trust Sanger Institute, expressed an interesting opinion on this distinction in an interview:
"I do not think all biological computing is bioinformatics, e.g. mathematical modelling is not bioinformatics, even when connected with biology-related problems. In my opinion, bioinformatics has to do with management and the subsequent use of biological information, particular genetic information."
[Damian Counsell, bioinformatics.org FAQ,
2001] https://bioinformatics.org/faq/#definitionOfCompbiol
A field of biology concerned with
the development of techniques for the collection and manipulation of
biological data, and the use of such data to make biological discoveries
or predictions. This field encompasses all computational methods and theories
applicable to molecular biology and areas of computer-based techniques
for solving biological problems including manipulation of models and datasets.
MeSH, 1997
Computational biology FAQ, Robert D. Phair, US, 2000 http://www.bioinformaticsservices.com/bis/resources/faq/faq.html
conceptual biology: As we see it, is not a distinct type of science, but rather
it has a different source: the information in databases... By logical, critical
analysis of existing facts and models, one can generate a hypothesis in which
predictions are formulated in testable terms, and then search for relevant
information among published reports of experiments that may have had a different
purpose altogether. MG Blagosklonny and AB Pardee, Unearthing the gems:
Conceptual Biology, Nature 416 (6879): 373, 28 March 2002
The iterative process
of analysing existing facts and models available in published literature to
generate new hypotheses. Julie C. Barnes, Conceptual
biology: a semantic issue and more, Nature 417(6889): 587-588, 6 June 2002
Related terms: Research
meta-analyses, meta-analysis
controlled vocabulary: Information
management & interpretation
curated databases:
Often less complete than primary databases, but
they have less redundancy and the added value of scientific annotation;
therefore, a biologically significant sequence should be easier to find in such
a database and of greater value. Naturally, the degree of redundancy and
annotation in such a database depends on the experience, skills, aims, and
devotion of its curators. ... The only proper way to curate databases is the way groups like those that
developed OMIM [Online Mendelian Inheritance
in Man], SWISS-PROT and most commercial databases have done it, that
is, through making scientific judgments as data are cleaned up and merged. [CHI Bioinformatics
report]
Under the supervision of a curator. Other curated databases include LocusLink,
RefSeq, & SGD (Saccharomyces cerevisiae Genome Database) and
data mining: Algorithms
& data analysis
databases:
Collections of data in machine-readable form, which
can be manipulated by software to appear in varying arrangements and subsets.
[CHI Bioinformatics report]
Genetic information is stored in different ways in
different databases, which makes it hard to compare their holdings. So
while computational biologists are trying to improve the quality of the
databases, they are also working to build bridges between them. So
far, they have had only limited success … each database has its own Web
site with unique navigation tools and data storage formats that make such
searching difficult … programs can’t easily recognize data that are not
stored in a uniform way. [Elizabeth Pennisi “Seeking Common language in a Tower
of Babel” Science: 449 Oct. 15 1999]
How can the
databases be made most useful?
Science Functional Genomics Weblog, 2004 http://sciencemag.blogs.com/sfgblog/2004/10/how_can_the_dat.html
How do we fund the databases? Science Functional Genomics Weblog, 2004 http://sciencemag.blogs.com/sfgblog/2004/10/how_do_we_fund_.html
Databases
& software directory describes and provides links
to around 200 databases and about 30 software tools.
Narrower terms: annotated databases, curated databases, federated databases, integrated databases,
interoperability, non-redundant databases, proprietary databases, redundant
databases, relational databases, flat files, indexed flat files.
distributed sequence annotation:
The pace of human genomic sequencing has
outstripped the ability of sequencing centers to annotate and understand the
sequence prior to submitting it to the archival databases. Multiple third-party
groups have stepped into the breach and are currently annotating the human
sequence with a combination of computational and experimental methods. Their
analytic tools, data models, and visualization methods are diverse, and it is
self-evident that this diversity enhances, rather than diminishes, the value of
their work. Lincoln Stein et al., Distributed Sequence Annotation, 2000 http://biodas.org/documents/rationale.html
distributed annotation system: A client-server system in
which a single client integrates information from multiple servers. It allows a
single machine to gather up genome annotation information from multiple distant
web sites, collate the information, and display it to the user in a single view.
Little coordination is needed among the various information providers.
[Biodas.org] http://biodas.org/
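The client-side collation described above can be sketched as follows. This Python example is only a schematic: the "servers" are in-memory functions standing in for remote annotation sources, and the feature fields, function names and coordinates are hypothetical. It does not implement the actual DAS protocol or its message formats.

```python
from typing import Callable, Dict, List

# A "server" here is any callable returning annotation features for a region.
# These in-memory stand-ins are hypothetical and involve no network access.
Feature = Dict[str, object]
Server = Callable[[str, int, int], List[Feature]]


def gene_server(seq_id, start, end):
    features = [{"type": "gene", "start": 100, "end": 900, "label": "geneA"}]
    return [f for f in features if f["start"] <= end and f["end"] >= start]


def repeat_server(seq_id, start, end):
    features = [
        {"type": "repeat", "start": 50, "end": 120, "label": "rep1"},
        {"type": "repeat", "start": 2000, "end": 2100, "label": "rep2"},
    ]
    return [f for f in features if f["start"] <= end and f["end"] >= start]


def collate(servers: Dict[str, Server], seq_id: str, start: int, end: int):
    """Query every server independently and merge the results into one view."""
    merged = []
    for name, query in servers.items():
        for feature in query(seq_id, start, end):
            # Tag each feature with its origin so the single view stays traceable.
            merged.append({**feature, "source": name})
    return sorted(merged, key=lambda f: (f["start"], f["end"]))


view = collate({"genes": gene_server, "repeats": repeat_server}, "chr1", 1, 1000)
for f in view:
    print(f["start"], f["end"], f["type"], f["label"], f["source"])
```

The servers never communicate with one another; the only shared convention is the coordinate system, which reflects the point that little coordination is needed among information providers.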
EBI: European Bioinformatics Institute,
Hinxton, Cambridge, UK.
An EMBL outstation. http://www.ebi.ac.uk/
Ensembl:
A joint project between EMBL-EBI and the Sanger Centre
(UK) to develop a software system which produces and maintains automatic
annotation
on eukaryotic genomes. Human data are available now; they hope to add mouse data
soon. http://www.ensembl.org/index.html
federated databases:
An integrated repository of data from multiple, possibly heterogeneous, data sources presented with consistent and
coherent semantics. They do not usually contain any summary data, and all of the data resides only at the data source (i.e. no local storage).
[Lawrence Berkeley Lab "Advanced Computational
Structural Genomics" Glossary]
Related term: Information management
& interpretation semantic data integration
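A minimal Python sketch of the idea in the definition above: several sources with different field names are presented under one consistent schema, and nothing is stored locally. The class, the field mappings and the two sources are hypothetical and greatly simplified.

```python
class FederatedView:
    """Presents several sources under one schema without storing their data."""

    def __init__(self):
        self._sources = []  # holds (fetch function, field mapping) pairs only

    def register(self, fetch, mapping):
        # mapping: source field name -> common field name
        self._sources.append((fetch, mapping))

    def query(self, predicate):
        # Rows are pulled from each source at query time and never cached.
        for fetch, mapping in self._sources:
            for row in fetch():
                common = {mapping[k]: v for k, v in row.items() if k in mapping}
                if predicate(common):
                    yield common


# Two hypothetical sources using different field names for the same concepts.
def source_a():
    return [{"acc": "A1", "organism": "human"}, {"acc": "A2", "organism": "mouse"}]


def source_b():
    return [{"id": "B7", "species": "human"}]


fed = FederatedView()
fed.register(source_a, {"acc": "accession", "organism": "species"})
fed.register(source_b, {"id": "accession", "species": "species"})
human = list(fed.query(lambda r: r["species"] == "human"))
print(human)
```

The field mappings are where the "consistent and coherent semantics" would have to be worked out in practice, and that is the hard part this sketch glosses over.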
federated information systems:
Their main characteristic is that they
are constructed as an integrating layer over existing legacy applications and
databases. They can be broadly classified in three dimensions: the degree of
autonomy they allow in integrated components, the degree of heterogeneity
between components they can cope with, and whether or not they support
distribution. Whereas the communication and interoperation problem has come into
a stage of applicable solutions over the past decade, semantic data integration
has not become similarly clear. Susanne Busse et al., "Federated
Information Systems: Concepts, Terminology and Architecture"
Computergestützte Informations Systeme CIS, Berlin, Germany 1999 http://citeseer.ist.psu.edu/busse99federated.html
flat files:
Pure text documents that are totally unstructured. This
type of file generally does not provide very specific search answers, but it is
the most popular type of file on the Web and is now a bit easier to search,
thanks to the use of hyperlinks.
Narrower term: indexed
flat files. Related term: relational databases
functional bioinformatics: The emerging field of functional
bioinformatics focuses on the development of ontologies or concept
classifications fed into algorithms
used to perform computations of the functions
of biomolecules. ["About
bioinformatics" George Washington Univ. Medical Center, 2002] http://www.gwumc.edu/bioinformatics/about/bioinfo.htm
more... Ontologies & taxonomies
Related terms: Functional genomics,
Metabolic Engineering
functional informatics:
Gene Ontology™ (GO): Functional genomics
Broader term: Information management & interpretation ontology
genome annotation:
It is now apparent that the bottleneck in genomics
is no longer in sequencing the genomes, but lies in their annotation.
Large-scale annotation efforts require handling massive amounts of genome data
through automated pipelines, with a need to combine diverse sources of data and
methods. In addition, it requires visualisation tools to manually examine the
automatic annotation, since integration of human expertise to assess the
validity and authenticity of all computational results goes a long way to
improve the quality of gene annotation. The "Annotation Jamboree", a
collaboration between Celera, the Berkeley Drosophila Genome Project, and
a team of experts on the annotation of the Adh region of Drosophila,
is an exemplary attempt on how to transform the process of manual annotation
into a high-throughput operation. [Paradigm Shifts in the Approaches for Gene
Annotation, a special issue of "Briefings in Bioinformatics" which
reports on the proceedings from the recently concluded symposium on "Genome
Based Gene Structure Determination" conducted at the EMBL European
Bioinformatics Institute (EBI) during June 1-2, 2000.] http://industry.ebi.ac.uk/~thanaraj/BIB_Editorial.htm
Narrower term: comparative genome annotation
Genome Annotation Data Warehouse: Databases
& software directory
genome browser, genomic data: Genomics
glycobioinformatics, glycoinformatics: Glycosciences
high-throughput bioinformatics:
Bioinformatics is currently undergoing dramatic changes, as
high-throughput laboratory methods lead to changes in key approaches, including
sequence analysis,
gene expression analysis, protein expression analysis, and
protein structure prediction and
modeling. [CHI Bioinformatics
report press release]
Related terms: Assays & screening throughput Functional genomics; systems biology Structural
genomics structural proteomics
I2B2 Informatics for Integrating Biology
& the Bedside: An NIH-funded National
Center for Biomedical Computing based at Partners HealthCare System.
[Boston] http://www.i2b2.org/
indexed flat files IFFs:
Partially structured databases, which may
include a thesaurus (adding the ability to search synonyms) or other basic
search tools. ... IFFs, meanwhile, allow users to interactively navigate among
entries in several different databases by means of hypertext links. IFFs do not,
however, allow true database integration, and gathering information from these
types of files is often haphazard: Because the data are not really structured,
researchers may end up with many incorrect matches to their queries. The
principal advantage of this technology is that it is cheap and easy to
understand. [CHI Bioinformatics
report]
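The behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the `|` delimiter, the entry IDs and the thesaurus contents are all hypothetical, and real IFF systems are considerably more elaborate.

```python
from collections import defaultdict

# Hypothetical flat-file records: one free-text entry per line, "ID|text".
FLAT_FILE = """\
E1|tumor protein p53 regulates the cell cycle
E2|TP53 mutations are common in human cancers
E3|tumor necrosis factor signals inflammation
E4|hemoglobin carries oxygen in blood
"""

# Hypothetical thesaurus: each term maps to its group of synonyms.
THESAURUS = {
    "p53": {"p53", "tp53"},
    "tp53": {"p53", "tp53"},
}


def build_index(text):
    """Map every lowercase word to the set of entry IDs that contain it."""
    index = defaultdict(set)
    for line in text.splitlines():
        entry_id, body = line.split("|", 1)
        for word in body.lower().split():
            index[word].add(entry_id)
    return index


def search(index, term, use_thesaurus=True):
    """Return sorted entry IDs matching the term (and, optionally, its synonyms)."""
    term = term.lower()
    terms = THESAURUS.get(term, {term}) if use_thesaurus else {term}
    hits = set()
    for t in terms:
        hits |= index.get(t, set())
    return sorted(hits)


INDEX = build_index(FLAT_FILE)
```

With the thesaurus, `search(INDEX, "p53")` finds both E1 and E2; without it, only E1. A search for "tumor" returns E1 and E3 even though the two entries concern different proteins, which is the kind of incorrect match that arises because the text is indexed but not really structured.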
integrated databases:
Integration [of databases] typically is
accomplished by creating small, object-oriented software elements, or “wrappers”
that let a single overlaying, often browser-like, desktop application interact
with all the pieces. The original separate systems are intact and
functional, and new ones can be added, while the underlying complexity
is transparent to users. There are still many challenges … but computing
environments are becoming more unified, flexible and expandable. [A. Thayer
“Bioinformatics for the Masses” Chemical & Engineering News 78(6):
19-32 Feb. 7, 2000]
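A minimal Python sketch of the wrapper idea follows. Every class name, field and record here is a hypothetical stand-in; the point is only that each wrapper hides one system's storage quirks behind a shared `lookup` method, so that the overlaying application can treat all systems alike and new ones can be registered without touching the old ones.

```python
from typing import Protocol


class Wrapper(Protocol):
    """Common interface that every wrapped system exposes to the overlay."""
    name: str

    def lookup(self, gene: str) -> dict: ...


class SequenceDbWrapper:
    """Wraps a hypothetical legacy store keyed by upper-case gene symbol."""
    name = "sequence"

    def __init__(self):
        self._store = {"ADH": "ATGTCGTTTACT"}

    def lookup(self, gene):
        seq = self._store.get(gene.upper())
        return {"sequence": seq} if seq else {}


class LiteratureDbWrapper:
    """Wraps a hypothetical row-based store keyed by lower-case gene symbol."""
    name = "literature"

    def __init__(self):
        self._rows = [("adh", "Ref A"), ("adh", "Ref B")]

    def lookup(self, gene):
        refs = [ref for g, ref in self._rows if g == gene.lower()]
        return {"references": refs} if refs else {}


class Overlay:
    """Single front end; the wrapped systems stay intact and separate."""

    def __init__(self):
        self._wrappers = []

    def register(self, wrapper):
        self._wrappers.append(wrapper)

    def query(self, gene):
        # One call fans out to every registered wrapper.
        return {w.name: w.lookup(gene) for w in self._wrappers}
```

Usage: create an `Overlay`, `register` both wrappers, and call `query("Adh")`; the caller never needs to know that one store uses upper-case keys and the other lower-case rows.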
Linking information in OMIM [Online Mendelian Inheritance in Man] with the published working draft of the International
Human Genome Sequencing Consortium (Nature 15 Feb. 2001) has been facilitated
by ties to NCBI's RefSeq and LocusLink databases. Are there other good
examples of integrated databases?
Related terms: Bio-Ontology Standards
Group, Data Model Standards Group; Functional
genomics Gene Ontology
integration:
Integration of the various types of large-scale data is currently receiving
much attention. There appears, however, to be little agreement on what exactly
is meant by "integration", not to mention how to achieve it. The word
"integration" is being attached to almost any analysis that involves
the combined use of two or more large datasets. [Lars J.
Jensen, Peer Bork, Quality analysis and integration of large-scale molecular
data sets. Drug Discovery Today: Targets, 3(2): 51-56]
integration (of databases):
Allows researchers to increase the value
they get from the data, because it increases the base of information they can
access and allows for more robust searching. [CHI Bioinformatics
report]
Related terms: Computers & computing middleware, Object Oriented modeling OOM, object
protocol model OPM; Maps genomic & genetic memory mapped data structures
interoperability: Information
management & interpretation
Interoperable Informatics Infrastructure Consortium I3C:
Is this still active, or even still in existence? http://www.consortiuminfo.org/links/detail.php?ID=186
LSID Life Sciences Identifiers:
Cover Pages http://xml.coverpages.org/lsid.html
medical bioinformatics:
Linking clinical data to patient gene profiling. Covers haplotyping,
genotyping, population genomics and gene expression profiling,
particularly for use in diagnosis, prognosis and therapeutic stratification of
patients. Google = about 512, Oct. 15, 2003
Related terms: Biomarkers, Expression, Microarrays and protein chips
memory-mapped data structures: Computers & computing
metadata: Information management & interpretation
middleware, modularity: Computers & computing
molecular bioinformatics:
Conceptualizing biology in terms of
molecules (in the sense of physical chemistry) and then applying
"informatics" techniques (derived from disciplines such as applied
math, CS [computer science] and statistics) to understand and organize the
information associated with these molecules on a large scale. [Mark Gerstein
"What is Bioinformatics?" MB&B 474b3, 2001] http://bioinfo.mbb.yale.edu/what-is-it.html
myExperiment:
A collaborative environment where scientists can safely publish their workflows
and experiment plans, share them with groups and find those of others.
Workflows, other digital objects and collections (called Packs) can now
be swapped, sorted and searched like photos and videos on the Web. ...
myExperiment makes it really easy for the next generation of scientists to
contribute to a pool of scientific workflows, build communities and form
relationships. It enables scientists to share, reuse and repurpose workflows and
reduce time-to-experiment, share expertise and avoid reinvention.
http://www.myexperiment.org/
myGrid:
The myGrid team produce and use a suite of tools designed to
“help e-Scientists get on with science and get on with scientists”. The
tools support the creation of e-laboratories and have been used in domains
as diverse as systems biology, social science, music, astronomy, multimedia
and chemistry.
http://www.mygrid.org.uk/
NCBI National Center for Biotechnology Information:
Established
in 1988 as a national resource for molecular biology information,
NCBI creates public databases, conducts research in computational
biology, develops software tools for analyzing genome data, and disseminates
biomedical information - all for the better understanding of molecular
processes affecting human health and disease. Part of NIH. http://www.ncbi.nlm.nih.gov
non-redundant databases:
Researchers at the National Center for
Biotechnology Information (NCBI) coined the term "nr" database
(nonredundant database) to refer to a database in which the obviously
redundant entries have been merged. These entries are typically those that are
100%, character-by-character identical, and algorithms exist that can remove
such redundancy. Although such a database has less redundancy than a primary
database, a substantial amount of redundancy remains, and it can be removed only
by a curator using scientific judgment. [CHI Bioinformatics
report]
Many databases try to be “non-redundant”.
Unfortunately, biological data is too complex to fit a simple definition
of redundancy … Each “non-redundant” database has its own definition of
redundancy. [George Church Lab, Harvard Medical School, US] http://arep.med.harvard.edu/seqanal/db.html
Examples of non-redundant databases include UniGene and SWISS-PROT,
while DDBJ/EMBL/GenBank are redundant databases.
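The purely algorithmic step described in this entry, merging entries whose sequences are character-by-character identical, might look like the sketch below. The accession numbers and sequences are invented for illustration, and this is not a description of how NCBI actually builds its "nr" database.

```python
def merge_identical(records):
    """Merge records whose sequences are exactly identical.

    records: iterable of (accession, sequence) pairs.
    Returns a list of (accessions, sequence) pairs in first-seen order.
    """
    merged = {}
    for accession, sequence in records:
        # Exact string equality is the only redundancy test applied here.
        merged.setdefault(sequence, []).append(accession)
    return [(accessions, sequence) for sequence, accessions in merged.items()]


# Hypothetical entries: A1 and B7 are identical; C3 differs by one residue;
# D9 is the same protein with one extra residue.
RECORDS = [
    ("A1", "MKTAYIAK"),
    ("B7", "MKTAYIAK"),
    ("C3", "MKTAYIAR"),
    ("D9", "MKTAYIAKQ"),
]

NR = merge_identical(RECORDS)
```

Only A1 and B7 are merged. C3 and D9 survive as separate entries although they are near-duplicates, which illustrates the residual redundancy that, as noted above, needs a curator's judgment (or a database-specific definition of redundancy) to resolve.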
OMG Object Management Group: Computers & computing
Object-oriented modeling OOM: Computers & computing
ontology: Ontologies & Taxonomies
Open Bioinformatics Foundation OPEN-BIO:
The purpose of the foundation is to act as an umbrella
organization for the various bio*.org projects that grew out of the original BioPerl
project. The goal of the foundation is to provide financial, administrative and
technical assistance for our various open source life science projects. http://open-bio.org/
Narrower terms: biojava.org, bioperl.org, biopython.org, bioxml.org
Related term: biocorba.org
pharmaceutical bioinformatics:
Bioinformatics
and structure-aided drug design are really part of the same continuum.
Bioinformatics offers a means to get to a structure through sequence; while
structure-aided drug design offers a means to get to a drug through structure.
We plan to combine innovative computational techniques with biochemical and
structural expertise to bring bioinformatics and structure-aided drug design
even closer together. In particular, we intend to blend computational chemistry
with computational biology to create software that will aid protein chemists in
understanding, evaluating and predicting the structure, function and activity of
medically and industrially important proteins. My laboratory is currently
involved in three "bioinformatics" projects. These include: (1) the
development of novel methods to identify remote sequence/structure
relationships; (2) the creation of a compact, relational database with advanced
bioinformatics functionality; and (3) the development of novel methods for
predicting and evaluating protein secondary and tertiary structure. David
Wishart, Wishart Pharmaceutical Research Group, Univ. of Alberta, Canada http://redpoll.pharmacy.ualberta.ca/projects/bioinfo.html
private databases: See under proprietary databases
proprietary databases: Fee-based, copyrighted databases
(in contrast to public databases such as those at DDBJ/EMBL/GenBank).
Examples include Incyte's LifeSeq and Gene Logic's GeneExpress
databases. Some databases charge subscription fees to commercial
organizations, with other arrangements available to non-profits. Also
referred to as private databases.
Compare: public databases
protein bioinformatics:
Bioinformatic and experimental analysis of protein superfamilies for
understanding protein structure-function relationships and developing
strategies for protein engineering. Using superfamily analysis to understand how
protein sequence and structure determine protein function. Our computational
approach begins with identifying the sets of divergently related proteins that
comprise enzyme superfamilies and then attempts to correlate their conserved and
variable structural features to similarities and differences in their functions.
This work also requires the development of new
tools in protein bioinformatics to identify and evaluate distant relationships
and to distinguish those elements of structure that provide common function from
those that determine specificity. Designed to take advantage of the huge volumes
of data coming out of the genome projects, this approach provides a much more
contextual picture of the structure-function paradigm than can be achieved by
studying a single protein at a time. This work has been successfully applied to
such problems as the prediction of function for unknown reading frames and
elucidation of enzyme mechanisms. Patricia Babbitt, Dept. of Biopharmaceutical
Sciences, Univ. of California San Francisco, US http://www.ucsf.edu/dbps/faculty/pages/babbitt.html
Very very very short introduction to protein bioinformatics, Patricia
Babbitt et al. http://baygenomics.ucsf.edu/education/workshop1/lectures/w1.color2.pdf
See also Proteomics protein informatics. Is there a difference?
Google = about 690 April 1, 2003
public databases: Freely
accessible databases such as GenBank/EMBL/DDBJ, ArrayExpress or BLOCKS.
There has been much debate about public vs. proprietary databases.
redundant databases:
When sequence databanks were first created,
primary [redundant] databases had the advantage of being more comprehensive than
curated databases and more likely to contain recently discovered sequences.
However, redundancy is no longer much of an advantage. In a highly redundant
database, biologically significant results are more likely to be hidden among
large numbers of irrelevant reported matches. [CHI Bioinformatics
report]
Related term: non-redundant databases
relational databases: Most or all of the data are structured. These
files are the hardest to set up and maintain, and require specific knowledge by
a searcher, but they are the easiest to use when doing analysis or integration.
Data is categorized by specific fields, and so, by knowing the fields one should
be able to capture all the relevant data, quite easily. The searchability of a
relational database is totally dependent on how well the database has been
structured. [CHI Bioinformatics report]
schema: Algorithms & data analysis
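To illustrate the relational databases entry above, here is a small sketch using Python's standard `sqlite3` module. The table name, field names and rows are hypothetical. It shows both halves of the claim: the fields must be defined up front and known to the searcher, but once they are, a query retrieves exactly the relevant rows.

```python
import sqlite3

# In-memory database with an illustrative schema: every value lives in a named field.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE protein (
        accession TEXT PRIMARY KEY,
        organism  TEXT NOT NULL,
        length    INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO protein VALUES (?, ?, ?)",
    [
        ("X001", "Drosophila melanogaster", 256),
        ("X002", "Homo sapiens", 393),
        ("X003", "Drosophila melanogaster", 120),
    ],
)


def accessions_for(organism, min_length):
    """Return accessions for one organism with at least min_length residues."""
    rows = conn.execute(
        "SELECT accession FROM protein "
        "WHERE organism = ? AND length >= ? ORDER BY accession",
        (organism, min_length),
    )
    return [row[0] for row in rows]
```

A searcher who knows the `organism` and `length` fields can call `accessions_for("Drosophila melanogaster", 200)` and get precisely the matching entries, with none of the stray hits a free-text search might produce.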
spatio temporal dynamics: Local interactions in space can give rise to
large-scale spatio-temporal patterns (e.g. (spiral) waves, spatio-temporal
chaos (turbulence), stationary (Turing-type) patterns and transitions
between these modes). Their occurrence and properties are largely
independent of the precise interaction structure. They are indeed seen to
occur at many organizational levels of biotic systems. Space can be either
'real' space or a state space, e.g. 'phenotype space' in models of
speciation or 'shape space' in immunological models of shape-based
receptor interactions. We show that such spatio-temporal patterns have
important consequences for fundamental bioinformatic processes. Paulien
Hogeweg, Overview of Research 1993-1998, Utrecht University, Netherlands,
1999 http://www-binf.bio.uu.nl/overview/node3.html
standards:
Related terms: Bio-ontology Standards Group, CORBA, Data
Model Standards Group, object protocol model OPM. EBI [European Bioinformatics
Institute] is also working on standards. Microarrays MAML, MGED, MIAME
structural bioinformatics: Structural
genomics
structured data: The complex and richly structured data from genomics
can be viewed as the greatest encoding problem of all time (e.g. genome →
organism). From this perspective, the sequencing of the human and other genomes
can be viewed as one of the all-time great opportunities for theorists
interested in information, its structure and analysis. [UCLA Bioinformatics
Institute] http://www.bioinformatics.ucla.edu/index/mission.htm
Related terms: indexed flat files, memory mapped data structures, relational
databases, unstructured data
systems bioinformatics: With the completion of the Human
Genome Project, the scientific community is now faced with the even greater
challenge of analyzing the resulting data from this and other large-scale genome
projects to better understand the networks underlying biological function. [Second
International Computational Systems Bioinformatics Conference To be Held August
11-14, 2003 at Stanford University, IEEE CS Bioinformatics Technical Chair via
BizWire] http://quickstart.clari.net/qs_se/webnews/wed/bx/Bca-ieee-cs_csb2003.RMsB_DuP.html
Google = about 1,230 Sept. 2, 2003; about 8,240 May 25, 2005
systems biology: Genetic manipulation & disruption
taxonomies: Ontologies & Taxonomies
translational bioinformatics: AMIA refers to translational
bioinformatics as the development of storage, analytic, and interpretive methods
to optimize the transformation of increasingly voluminous biomedical data, and
genomic data in particular, into proactive, predictive, preventive, and
participatory health. Translational bioinformatics includes research on the
development of novel techniques for the integration of biological and clinical
data and the evolution of clinical informatics methodology to encompass
biological observations. The end product of translational bioinformatics is
newly found knowledge from these integrative efforts that can be disseminated to
a variety of stakeholders, including biomedical scientists, clinicians, and
patients. Issues relating to database management, administration, or policy will
be coordinated through the Clinical Research Informatics domain. American
Medical Informatics Association, AMIA Strategic Plan, 2007 http://www.amia.org/inside/stratplan/
unstructured data: Information management
& interpretation
wrappers: See under integrated databases
Bibliography
Bioinformatics and Genomics Gateway, BioMedCentral http://www.biomedcentral.com/gateways/bioinformaticsgenomics/
Hightower, Christy, Bioinformatics, Univ. of California Santa Cruz http://library.ucsc.edu/science/subjects/bioinformatics/
Kahn, Charles E, Jr, editor, Bioinformatics Glossary, Medical College of Wisconsin, 2005, 3000+ terms http://big.mcw.edu/
|