Genomic Informatics glossary & taxonomy
Evolving Terminology for Emerging Technologies
Comments? Questions?
Revisions? Mary Chitty MSLS
mchitty@healthtech.com
Last revised
July 09, 2019
Related glossaries include: Drug discovery & Development, Drug Targets, Molecular Diagnostics
Informatics: Drug discovery informatics, Bioinformatics, Cheminformatics, Ontologies & Taxonomies, Protein Informatics
Technologies: Microarrays, PCR, Sequencing
Biology: Genetic variations covers both technologies for detecting and informatics for interpreting genetic variants.
ab initio
gene prediction:
Traditionally, gene prediction
programs that rely only on the statistical qualities of exons have
been referred to as performing ab initio predictions. Ab initio
prediction of coding sequences is an undeniable success by the standards
of the machine-learning algorithm field, and most of the widely used gene
prediction programs belong to this class of algorithms. It is impressive
that the statistical analysis of raw genomic sequence can detect around 77-98% of the genes present ... This is, however, little consolation
to the bench biologist, who wants the complete sequences of all genes present,
with some certainty about the accuracy of the predictions involved. As
Ewan Birney (European Bioinformatics Institute, UK) put it, what looks
impressive to the computer scientist is often simply wrong to the biologist.
Meeting report "Gene prediction: the end of the beginning" Colin Semple,
Genome Biology 2000 1(2): reports 4012.1-4012.3
All ab initio gene prediction programs have to balance sensitivity
against accuracy. Broader term: gene
prediction.
AI for genomics:
Personalizing treatments and cures. 2018 April 16-18, Boston MA. The role of
computer science in modeling cells, analyzing and mapping data networks,
and incorporating clinical and pathological data to determine how diseases
arise from mutations is becoming more important in genomic medicine. We
need to understand where the disease starts and how artificial
intelligence delivers genes and pathways for drug targets and diagnostics.
The Inaugural AI for Genomics track explores case studies that apply deep
learning, machine learning, and artificial intelligence to genomic
medicine. We will discuss data curation techniques, text mining
approaches, and statistical analytics that utilize deep machine learning
to support AI efforts. This will help to integrate omics approaches to
discover disease or drug response pathways and identify personalized and
focused treatments and cures.
http://www.bio-itworldexpo.com/ai-genomics
alignment:
The process of lining up two or more sequences to achieve maximal levels of
identity (and conservation, in the case of amino acid sequences) for the purpose of assessing the degree of similarity and the possibility of homology.
NCBI BLAST Glossary
assembled:
The term used to describe the process of using a computer
to join up bits of sequence into a larger whole. Peer Bork, Richard Copley
"Filling in the gaps" Nature 409: 818-820, 15 Feb. 2001
Related terms: contig assembly,
genome assembly
biocomputing: Biocomputing
could be defined as the construction and use of computers which function like
living organisms or contain biological components, so-called biocomputers (Kaminuma,
1991). Biocomputing could, however, also be defined as the use of computers
in biological research and it is this definition which I am going to use in this
essay. With this interpretation of biocomputing the complicated ethical
questions connected with concepts like artificial life and intelligence are not
dealt with.
Peter Hjelmström, Ethical issues in biocomputing http://www.techfak.uni-bielefeld.de/bcd/ForAll/Ethics/welcome.html
biological computing:
Simson Garfinkel
"Biological computing" Technology Review, May/ June 2000 http://www.technologyreview.com/articles/garfinkel0500.asp
Related terms: biocomputing, DNA computing
BLAST (Basic Local Alignment Search Tool): Software program from
NCBI for searching public databases for homologous sequences or proteins.
Designed to explore all available sequence databases regardless of whether
query is protein or DNA. http://www.ncbi.nlm.nih.gov/BLAST/
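BLAST itself is a sophisticated heuristic, but its first stage, finding exact short word matches ("seeds") shared by the query and a database sequence before extending them into local alignments, is easy to illustrate. A minimal sketch (the word size and toy sequences are illustrative assumptions, not BLAST defaults):

```python
def word_seeds(query: str, subject: str, w: int = 3):
    """First stage of a BLAST-like search: index every w-letter word of the
    subject sequence, then report exact word matches (seeds) shared with the
    query. Real BLAST then extends each seed into a scored local alignment."""
    index = {}
    for i in range(len(subject) - w + 1):
        index.setdefault(subject[i:i + w], []).append(i)
    # each seed is a (query_position, subject_position) pair
    return [(q, s)
            for q in range(len(query) - w + 1)
            for s in index.get(query[q:q + w], [])]

print(word_seeds("ACGTAC", "TTACGTT"))  # [(0, 2), (1, 3), (3, 1)]
```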
comparative genome annotation:
Recent advances in genome sequencing technology and
algorithms have made it possible to determine the sequence of a whole genome
quickly in a cost-effective manner. As a result, there are more than 200
completely sequenced genomes. However, annotation of a genome is still a
challenging task. One of the most effective methods to annotate a newly
sequenced genome is to compare it with well-annotated and closely related
genomes using computational tools and databases. Comparing genomes requires use
of a number of computational tools and produces a large amount of output, which
should be analyzed by genome annotators. Because of this difficulty, genome
projects are mostly carried out at large genome sequencing centers. To alleviate
the requirement for expert knowledge in computational tools and databases, we
have developed a web-based genome annotation system, called CGAS (a comparative
genome annotation system; http://platcom.org/CGAS).
CGAS:
a comparative genome annotation system.
Choi K, Yang Y, Kim S. Methods Mol Biol.
2007;395:133-146 Broader term: genome annotation
Related term: Functional genomics
comparative genomics
complex genomes:
Is there a specific definition of complex
genomes? Or is it a more general category (beyond viral, bacterial,
microbial?)
computational gene recognition:
Interpreting nucleotide sequences
by computer, in order to provide tentative annotation on the location,
structure and functional class of protein-coding genes. JW Fickett 1996
Gene recognition is much more difficult in higher eukaryotes than in
prokaryotes, as coding regions (exons) are often interrupted by
non-coding regions (introns) and genes are highly variable in size. This
is particularly so for human genes. As someone remarked some time ago, people have
non-coding regions occasionally interrupted by genes.
Broader terms: gene recognition, molecular recognition.
computational genomics:
(often referred to as Computational Genetics) refers to the use of computational
and statistical analysis to decipher biology from genome sequences and
related data, including both DNA and RNA sequence
as well as other "post-genomic" data (i.e., experimental data obtained with
technologies that require the genome sequence, such as genomic DNA
microarrays).
In combination with computational and statistical approaches to
understanding the function of the genes and statistical association analysis,
this field is also often referred to as Computational
and Statistical Genetics/genomics.
As such, computational genomics may be regarded as a subset of bioinformatics and computational
biology,
but with a focus on using whole genomes (rather than individual genes) to
understand the principles of how the DNA of a species controls its biology at
the molecular level and beyond. Wikipedia accessed 2018 March 21
https://en.wikipedia.org/wiki/Computational_genomics
Related terms:
Expression, Microarrays
concordance: Similarity
of results between different microarray platforms. Related terms: discordance, mismatches
consensus sequence:
A theoretical representative nucleotide or
amino acid sequence in which each nucleotide or amino acid is the one
that occurs most frequently at that site in the different forms which
occur in nature. The phrase also refers to an actual sequence that approximates
the theoretical consensus. A known CONSERVED SEQUENCE set is represented by a consensus sequence. Commonly observed supersecondary protein structures (AMINO ACID MOTIFS) are often formed by
conserved sequences. MeSH, 1991
A sequence of DNA, RNA, protein or carbohydrate derived from a number
of similar molecules, which comprises the essential features for a particular
function. IUPAC Bioinorganic
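The "most frequent at each site" rule is straightforward to compute for a set of already-aligned sequences. A minimal sketch (the example sequences are invented):

```python
from collections import Counter

def consensus(aligned):
    """Consensus of equal-length aligned sequences: the residue that occurs
    most frequently at each column."""
    assert len({len(s) for s in aligned}) == 1, "sequences must be pre-aligned"
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))

print(consensus(["ATGCA", "ATGCT", "ATGGA", "TTGCA"]))  # ATGCA
```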
conserved sequence:
A sequence of amino acids in a polypeptide or of nucleotides in DNA or RNA that is similar across multiple species. A known set of conserved sequences is represented by a
CONSENSUS SEQUENCE. AMINO ACID MOTIFS are
often composed of conserved sequences. MeSH, 1993
A "highly conserved sequence" is a DNA sequence that is very similar
in several different kinds of organisms. Scientists regard these cross
species similarities as evidence that a specific gene performs some basic
function essential to many forms of life and that evolution has therefore
conserved its structure by permitting few mutations to accumulate in it.
NHGRI
contig:
A contig is a contiguous stretch of DNA sequence
without gaps that has been assembled solely based on direct sequencing
information. Short sequences (reads) from a fragmented genome are compared
against one another, and overlapping reads are merged to produce one long
sequence. This merging process is iterative: overlapping reads are added to the
merged sequence whenever possible and so the merged sequence becomes even
longer. When no further reads overlap the long merged sequence, then this
sequence - called a contig - has reached its maximum length.
Ensembl Glossary
http://useast.ensembl.org/info/website/glossary.html
Published genome sequence has many gaps and interruptions. Concept of
"contig" is crucial to our understanding of current limitations. David Galas
"Making sense of the sequence" Science 291 (5507):
1257, Feb. 16, 2001
Wikipedia http://en.wikipedia.org/wiki/Contig
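The iterative merging the Ensembl definition describes can be sketched as a toy greedy assembler. The reads and minimum-overlap length below are illustrative assumptions; real assemblers must also handle sequencing errors, reverse complements, and repeats:

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a matching a prefix of b (>= min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def assemble(reads, min_len: int = 3) -> str:
    """Greedily merge the pair of reads with the largest overlap, repeating
    until no reads overlap: the merged sequence is then a maximal contig."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no further reads overlap: contig has reached maximum length
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads[0]

print(assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG"]))
# ATTAGACCTGCCGGAA
```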
contig assembly:
One of the most difficult and critical functions
in DNA sequence analysis is putting together fragments from sets of overlapping
segments. Some programs do this better than others, particularly when dealing
with sequences containing gaps. Laura De Francesco "Some things considered"
Scientist 12[20]: 18, Oct. 12, 1998
DDBJ DNA DataBank of Japan:
Shares information daily with EMBL
and GenBank. http://www.ddbj.nig.ac.jp/
deep homology: The principle of homology is central to
conceptualizing the comparative aspects of morphological evolution. The
distinctions between homologous or non-homologous structures have become
blurred, however, as modern evolutionary developmental biology (evo-devo) has
shown that novel features often result from modification of pre-existing
developmental modules, rather than arising completely de novo. With this
realization in mind, the term ‘deep homology’ was coined, in recognition of the
remarkably conserved gene expression during the development of certain animal
structures that would not be considered homologous by previous strict
definitions. At its core, it can help to formulate an understanding of deeper
layers of ontogenetic conservation for anatomical features that lack any clear
phylogenetic continuity. "Deep homology in the age of next-generation sequencing"
Patrick Tschopp, Clifford J. Tabin, published 19 December
2016. DOI: 10.1098/rstb.2015.0475
http://rstb.royalsocietypublishing.org/content/372/1713/20150475
discordance: Lack
of concordance in results among microarray experiments. Related terms: concordance, mismatches
distributed sequence annotation:
The pace of human genomic sequencing has
outstripped the ability of sequencing centers to annotate and understand the
sequence prior to submitting it to the archival databases. Multiple third-party
groups have stepped into the breach and are currently annotating the human
sequence with a combination of computational and experimental methods. Their
analytic tools, data models, and visualization methods are diverse, and it is
self-evident that this diversity enhances, rather than diminishes, the value of
their work. Lincoln Stein, et al. Distributed Sequence Annotation, 2000 http://biodas.org/documents/rationale.html
DNA
computers:
Seeks to use biological molecules such as DNA and RNA to solve basic
mathematical problems. Fundamentally, many of these experiments recapitulate
natural evolutionary processes that take place in biology, especially during the
early evolution of life and the creation of genes. Laura Landweber, "DNA
Computing" Princeton Univ. Freshman Seminar, 1999. http://www.princeton.edu/~lfl/FRS.html
DNA computing:
An interdisciplinary field that draws together molecular
biology, chemistry, computer science and mathematics. There are currently
several research disciplines driving towards the creation and use of DNA
nanostructures for both biological and non-biological applications. These
converging areas are: The miniaturization of biosensors and biochips into
the nanometer scale regime; The fabrication of nanoscale objects that can be
placed in intracellular locations for monitoring and modifying cell function;
The replacement of silicon devices with nanoscale molecular-based computational
systems, and The application of biopolymers in the formation of novel
nanostructured materials with unique optical and selective transport properties
DNA Computing & Informatics at Surfaces, Univ. of Wisconsin- Madison, June
1-4 2003. http://books.google.com/books?id=B6eUAXmBj8IC&pg=PR5&lpg=PR5&dq=dna+computing+university+of+wisconsin+interdisciplinary&s
Wikipedia
http://en.wikipedia.org/wiki/DNA_computing
Related terms: molecular computing, quantum computing
Or
are these the same/overlapping?
Ensembl:
A joint project between EMBL-EBI and the Sanger Centre
(UK) to develop a software system which produces and maintains automatic
annotation
on eukaryotic genomes. http://www.ensembl.org/index.html
exon parsing:
Identifying precisely the 5' and 3' boundaries of
genes
(the transcription unit) in metazoan genomes, as well as the correct sequences
of the resulting mRNA ("exon parsing") has been a major challenge of
bioinformatics for years. Yet, the current program performances are still
totally insufficient for a reliable automated annotation (Claverie 1997;
Ashburner 2000). It is interesting to recapitulate quickly the research in this
area to illustrate the essential limitation plaguing modern bioinformatics.
Encoding a protein imposes a variety of constraints on nucleotide sequences,
which do not apply to noncoding regions of the genome. These constraints induce
statistical biases of various kinds, the most discriminant of which was soon
recognized to be the distribution of six-nucleotide-long "words" or
hexamers (Claverie and Bougueleret 1986; Fickett and Tung 1992). JM
Claverie "From Bioinformatics to Computational
Biology" Genome Res 10(9): 1277-1279, Sept. 2000
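The hexamer statistic Claverie describes is simply the distribution of overlapping six-letter words in a sequence, and counting it is a one-liner. A minimal sketch (the example sequence is invented; real gene finders compare such distributions against models trained on known coding and noncoding DNA):

```python
from collections import Counter

def kmer_freqs(seq: str, k: int = 6):
    """Frequency of each overlapping k-mer (hexamer by default). Coding and
    noncoding regions show different biases in this distribution."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {mer: n / total for mer, n in counts.items()}

freqs = kmer_freqs("ATGGCCATGGCCATGGCC")
print(freqs["ATGGCC"])  # 3 of the 13 overlapping hexamers
```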
exon prediction:
Since prokaryotes don't have introns,
exon prediction implies working with eukaryotes. Is exon prediction
equivalent to gene prediction in prokaryotes? Related terms: ab
initio gene prediction; GRAIL Sequencing
exon shuffling theory: Contends that
introns act as spacers where breaks for genetic recombination occur. Under this
scenario, exons - which usually contain instructions for building a protein
subunit - remain intact when shuffled during recombination. In this way,
proteins with new functional repertoires can evolve. Peter Schmidt, "Shuffling,
Recombination, and the Importance of ...Nonsense" Swarthmore College
www.swarthmore.edu/Humanities/pschmid1/array/Gnarl3/exon.html
Wikipedia http://en.wikipedia.org/wiki/Exon_shuffling
Related
terms: DNA shuffling, domain shuffling, gene shuffling, protein shuffling
extreme phenotype selection studies: Systematic
collection of phenotypes and their correlation with molecular data has
been proposed as a useful method to advance in the study of disease.
Although some databases for animal species are being developed, progress
in humans is slow, probably due to the multifactorial origin of many human
diseases and to the intricacy of accurately classifying phenotypes, among
other factors. An alternative approach has been to identify and to study
individuals or families with very characteristic, clinically relevant
phenotypes. This strategy has shown increased efficiency to identify the
molecular features underlying such phenotypes. While on most occasions the
subjects selected for these studies presented harmful phenotypes, a few
studies have been performed in individuals with very favourable
phenotypes. The consistent results achieved suggest that it seems logical
to further develop this strategy as a methodology to study human disease,
including cancer. The identification and the study with high-throughput
techniques of individuals showing a markedly decreased risk of developing
cancer or of cancer patients presenting either an unusually favourable
prognosis or striking responses following a specific treatment, might be
promising ways to maximize the yield of this approach and
to reveal the molecular causes that explain those phenotypes and thus
highlight useful therapeutic targets.
Selection of extreme phenotypes; the role of clinical
observation in translational research
José Luis Pérez-Gracia
Clinical and Translational Oncology 2010 Mar;12(3):174-80.
Broader term: phenotype
false negative:
The chance
of declaring an expression change (e.g., in gene expression) to be insignificant
when in fact a change has occurred. The opposite situation is the false
positive.
false positive:
The chance
of declaring an expression change to be significant when in fact no change has
occurred. This tends to be a more pressing concern than false negatives in
microarray experiments.
FGED: The Functional Genomics Data Society works with
other organizations to accelerate and support the effective sharing and
reproducibility of functional genomics data. We facilitate the creation
and use of standards and software tools that allow researchers to annotate
and share their data easily. We promote scientific discovery that is
driven by genome-wide and other biological research data integration
and meta-analysis.
http://fged.org/
Founded as MGED. FGED-defined standards:
· MIAME
· MINSEQE
· MAGE-TAB
· MAGE
filtering:
A process whose
aim is to reduce a microarray dataset to a more manageable size, by getting rid of genes
that show no significant expression changes across the experiment or that are
uninteresting for biological reasons.
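One common filtering rule keeps only genes whose expression varies enough across conditions. A minimal sketch under an assumed max/min fold-change criterion (the threshold and data are invented; real pipelines typically use statistical tests rather than a bare ratio):

```python
def filter_genes(expr, fold_threshold: float = 2.0):
    """Keep genes whose max/min expression across conditions meets the
    threshold, i.e. genes showing some change across the experiment."""
    kept = {}
    for gene, values in expr.items():
        lo, hi = min(values), max(values)
        if lo > 0 and hi / lo >= fold_threshold:
            kept[gene] = values
    return kept

data = {"geneA": [10, 10, 11], "geneB": [5, 20, 15]}
print(sorted(filter_genes(data)))  # ['geneB']
```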
finished sequence - human:
Sequence in which bases are identified to
an accuracy of no more than 1 error in 10,000 and are placed in the right
order and orientation along a chromosome with almost no gaps. "History
of the Human Genome Project: A Genome Glossary" Science 291: pullout chart,
Feb. 16, 2001
"At some level it’s a little arbitrary when you declare a sequence essentially
complete," says NHGRI Director Francis Collins. … "The definition
of finished is evolving. Our definition today is different from
10 years ago. Ten years ago we didn’t even think at the level of genomes,"
says Laurie Goodman, editor of Genome Research. "I think the community
at large should define done. Not everyone is going to agree, but
when you’re using the word you should define what it means." Francis Collins
says "You’re done when you’ve exhausted the standard methods for closing
the gaps. There should be some biological reason why those last bits of
sequence eluded you – not because you just didn’t bother." "Are we there
yet?" The Scientist: 12, July 19, 1999
fold change:
A way of
describing how much larger or smaller one number is compared with another. When
the first number is larger than the second, it is simply the ratio of the first
to the second. When the first number is smaller than the second, it is the ratio
of the second to the first with a minus sign in front. When the numbers are
equal, it is 1. For example, the fold change of 50 versus 10 is 50/10 = 5, while
the fold change of 10 versus 50 is -5.
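The definition translates directly into code:

```python
def fold_change(a: float, b: float) -> float:
    """Signed fold change of a versus b: a/b when a is larger, -(b/a) when
    a is smaller, and 1 when the two numbers are equal."""
    if a == b:
        return 1.0
    return a / b if a > b else -(b / a)

print(fold_change(50, 10))  # 5.0
print(fold_change(10, 50))  # -5.0
```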
gap:
A space introduced into an alignment to compensate for insertions and
deletions in one sequence relative to another. To prevent the accumulation of too many gaps in an alignment, introduction of a gap causes the deduction of a fixed amount (the gap score) from the alignment score. Extension of the gap to encompass additional nucleotides or amino acids is also penalized in the
scoring of an alignment. NCBI BLAST Glossary
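The open-versus-extend penalty scheme the NCBI glossary describes can be illustrated by scoring a ready-made alignment, with '-' marking gap positions. The score values below are illustrative assumptions, not BLAST's defaults:

```python
def score_alignment(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap_open: int = -2, gap_extend: int = -1) -> int:
    """Score a pre-made pairwise alignment (gaps as '-') with an affine gap
    penalty: opening a gap costs more than extending an existing one."""
    assert len(a) == len(b), "aligned sequences must have equal length"
    score, in_gap = 0, False
    for x, y in zip(a, b):
        if x == '-' or y == '-':
            score += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if x == y else mismatch
            in_gap = False
    return score

print(score_alignment("ACG-TA", "ACGTTA"))  # 3
```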
GenBank:
Located at NCBI, shares information daily with DDBJ
and EMBL. NIH genetic sequence database, an annotated collection of all
publicly available DNA sequences. http://www.ncbi.nlm.nih.gov/Genbank/index.html
Now accommodates > 10^10 nucleotides
and more than doubles in size every year. David Roos "Bioinformatics --
Trying to Swim in a Sea of Data" Science 291: 1260-1261, Feb. 16, 2001
GenBank and WGS Statistics
https://www.ncbi.nlm.nih.gov/genbank/statistics/
gene finding programs:
http://cmgm.stanford.edu/classes/genefind/
Bioinformatics Resource, Center for Molecular and Genetic Medicine, Stanford
Univ. School of Medicine. List of programs has been compiled and updated from
James W. Fickett, "Finding genes by computer: the state of the art"
Trends in Genetics, August 1996, 12(8): 316-320
gene identification:
The effectiveness of finding genes by similarity
to a given sequence segment is determined by a much simpler statistic,
the total coverage of the genome by the collective set of sequence
contigs. As the overall coverage of the genome is virtually complete (>
90%), there is a strong likelihood that every gene is represented, at least
in part, in the data. Thus, finding any gene by sequence similarity
searches using sufficient sequence to ensure significance is almost always
possible using the data published this week. Caution must be exercised,
however, as the identification of the gene may still be ambiguous. This
is because a highly similar sequence from a receptor gene from Drosophila,
for example, could be found in several different, homologous genes,
which may have similar or entirely different functions or are nonfunctioning pseudogenes. In other words, common domains or motifs can be present
in many different genes. The use of the approximate similarity search tool
BLAST is probably still the best way to find similar sequences. David
Galas "Making Sense of the Sequence" Science 291: 1257-1260, Feb. 16, 2001
There are two basic approaches to gene identification: by homology
and ab initio. Marker SNPs can be used to home in on otherwise hard-to-find genes.
gene parsing:
Initial gene parsing methods were then simply
based on word frequency computation, eventually combined with the detection of
splicing consensus motifs. The next generation of software implemented the same
basic principles into a simulated neural network architecture (Uberbacher and
Mural 1991). Finally, the last generation of software, based on Hidden Markov
Models, added an additional refinement by computing the likelihood of the
predicted gene architectures (e.g., favoring human genes with an average of
seven coding exons, each 150 nucleotides long) is added (Kulp et al. 1996; Burge
and Karlin, 1997)). These ab initio methods are used in conjunction with a
search for sequence similarity with previously characterized genes or expressed
sequence tags (EST). JM Claverie "From
Bioinformatics to Computational Biology" Genome
Res 10: (9) 1277- 1279.Sept. 2000 http://genome.cshlp.org/content/10/9/1277.full
gene prediction:
Wikipedia http://en.wikipedia.org/wiki/Gene_finding
One of the first useful products from the human genome will
be a set of predicted genes. Besides its intrinsic scientific interest, the
accuracy and completeness of this data set is of considerable importance for
human health and medicine. Though progress has been made on computational gene
identification during the past decade, the accuracy of gene prediction tools is
not sufficient to locate the genes reliably in higher eukaryotic genomes. Thus,
while the precise sequence of the human genome is increasingly deciphered, gene
number estimations are becoming increasingly variable. ... In 1996 we published
a comprehensive
evaluation of the accuracy of gene prediction programs (Burset and Guigó, 1996).
... Recently we have published a revised
version of this evaluation (Guigó et al., 2000). This revised evaluation
suggests that though gene prediction will improve with every new protein that is
discovered and through improvements in the current set of tools, we still have a
long way to go before we can decipher the precise exonic structure of every gene
in the human genome using purely computational methodology. Genome
Bioinformatics Research Lab, Center for Genomic Regulation (Centre de Regulació
Genòmica - CRG, Barcelona, 2004 http://genome.imim.es/research/eval.html
Many methods for predicting genes are based
on compositional signals that are found in the DNA sequence. These methods
detect characteristics that are expected to be associated with genes, such
as splice sites and coding regions, and then piece this information together
to determine the complete or partial sequence of a gene. Unfortunately,
these ab initio methods tend to produce false positives, leading
to overestimates of gene numbers, which means that we cannot confidently
use them for annotation. They also do not work well with unfinished sequence
that has gaps and errors, which may give rise to frameshifts, when the
reading frame of the gene is disrupted by the addition or removal of bases.
... The most effective algorithms integrate gene-prediction methods with
similarity comparisons. ... The
most powerful tool for finding genes may be other vertebrate genomes. Comparing
conserved sequence regions between two closely related organisms will enable
us to find genes and other important regions in both genomes with no previous
knowledge of the gene content of either. Ewan Birney et al. "Mining
the draft human genome" Nature 409: 827-828, 15 Feb. 2001 http://www.nature.com/nature/journal/v409/n6822/full/409827a0.html
Sadly, it is often claimed that matching back cDNA to genomic sequences is
the best gene identification protocol; hence, admitting that the best way to
find genes is to look them up in a previously established catalog! Thus, the two
main principles behind state-of-the-art gene prediction software are (1) common
statistical regularities and (2) plain sequence similarity. From an
epistemological point of view, those concepts are quite primitive. JM
Claverie "From Bioinformatics to Computational
Biology" Genome Res 10(9): 1277-1279, Sept. 2000 http://genome.cshlp.org/content/10/9/1277.full
Algorithms
have been developed and are combined to recognize gene structural components.
Narrower/synonymous? term: ab initio gene prediction Related term: comparative
genomics
gene recognition:
Principally used for finding open reading
frames, tools of this type also recognize a number of features of
genes, such as regulatory regions, splice junctions, transcription and
translation stops and starts, GC islands, and polyadenylation sites. Laura De Francesco "Some things considered" Scientist 12[20]: 18, Oct.
12, 1998
genetic
association studies:
The analysis of a sequence such as a region of a
chromosome, a haplotype, a gene, or an allele for its involvement in controlling
the phenotype of a specific trait, metabolic pathway, or disease. MeSH
2010 See also Genome Wide Association Studies GWAS
genetic models:
Theoretical
representations that simulate the behavior or activity of genetic processes or
phenomena. They include the use of mathematical equations, computers, and other
electronic equipment. MeSH 1980
genome annotation:
is a multi-level process that includes
prediction of protein-coding genes, as well as other functional genome units
such as structural RNAs, tRNAs, small RNAs, pseudogenes, control regions, direct
and inverted repeats, insertion sequences, transposons and other mobile
elements. NCBI Prokaryotic Genome Annotation Pipeline
https://www.ncbi.nlm.nih.gov/genome/annotation_prok/
Narrower term: comparative genome
annotation
genome assembly:
simply the genome
sequence produced
after chromosomes have been fragmented, those fragments have been
sequenced, and the resulting sequences have been put back together.
Ensembl FAQ
https://www.ensembl.org/Help/Faq?id=216
genome
misassembly:
We present the first collection of tools aimed at automated genome
assembly validation. This work formalizes several mechanisms for detecting
mis-assemblies, and describes their implementation in our automated
validation pipeline. "Genome assembly
forensics: finding the elusive mis-assembly" Adam M Phillippy, Michael C Schatz, Mihai Pop,
Genome Biology 2008, 9(3): R55
https://genomebiology.biomedcentral.com/articles/10.1186/gb-2008-9-3-r55
genome informatics:
Genome informatics is the field in which computer and statistical techniques are
applied to derive biological information from genome sequences. Genome
informatics includes methods to analyse DNA sequence information and to
predict protein sequence and structure. Nature latest research and news
https://www.nature.com/subjects/genome-informatics
genomic
computing: A genomic computing
network is a variant of a neural network for which a genome encodes all aspects,
both structural and functional, of the network. The genome is evolved by a
genetic algorithm to fit particular tasks and environments. The genome has three
portions: one for specifying links and their initial weights, a second for
specifying how a node updates its internal state, and a third for specifying how
a node updates the weights on its links. Preliminary experiments demonstrate
that genomic computing networks can use node internal state to solve POMDPs more
complex than those solved previously using neural networks. Association for
Computing Machinery, ACM Digital Library, Guide to Computing Literature http://portal.acm.org/citation.cfm?id=1143997.1144037&coll=&dl=&type=series&idx=1143997&part=Proce
genomic data:
The strength of genomic studies lies in the global comparisons between
biological systems rather than detailed examination of single genes or proteins.
Genomic information is often misused when applied exclusively to individual
genes. If one is interested only in one particular gene, there are many more
conclusive experiments that should be consulted before using the results from
genomic datasets. Therefore, genomic data should not be used in lieu of
traditional biochemistry, but as initial guidelines to identify areas for
deeper investigation and to see how those results fit in with the rest of the
genome. Moreover, most genomics datasets give relative rather than
absolute information, which means that information about a single gene has
little meaning in isolation. Dov Greenbaum, Mark Gerstein et al. "Interrelating Different Types of
Genomic Data" Dept. of Biochemistry and Molecular Biology, Yale Univ., 2001
http://bioinfo.mbb.yale.edu/e-print/omes-genomeres/text.pdf
Related terms:
Expression genes & proteins -Omes
& -Omics interactome; Proteomics
genomic
datasets:
The Integrative
Genomics Viewer (IGV) is
a high-performance visualization tool for interactive exploration of
large, integrated genomic datasets. It supports a wide variety of data
types, including array-based and next-generation sequence data, and
genomic annotations.
Broad Institute, Integrative Genomics Viewer
https://software.broadinstitute.org/software/igv/home
GRAIL:
Gene Recognition and Assembly Internet Link:
Major GRAIL Site updates, Broad Institute
https://software.broadinstitute.org/mpg/grail/faq.html
The
GRAILexp FAQ [no longer on web?]
with references to Perceval, an exon prediction program; Galahad, a gene message
alignment program and Gawain, a gene assembly program clearly has scientific and
literary finesse. Does this name relate in any way to Walter Gilbert's description of the Human
Genome Project as the "Holy Grail" of molecular biology? I
should investigate further
global normalization
or mean scaling: The standard solution
for errors that affect entire arrays is to scale the data so that the average
measurement is the same for each array (and each color). The scaling is
accomplished by computing the average expression level for each array,
calculating a scale factor equal to the desired average divided by the actual
average, and multiplying every measurement from the array by that scale factor.
The desired average can be arbitrary, or computed from the average of a group of
arrays.
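The scaling procedure described above can be sketched in a few lines. This is a minimal illustration, not a production normalization routine; the function name and the example intensities are made up for the demonstration.

```python
# Global normalization (mean scaling): scale each array so its average
# measurement equals a common target average, as described above.

def global_normalize(arrays, target_mean=None):
    """Scale each array (a list of intensity measurements) so that its
    mean equals target_mean. If target_mean is None, use the average
    of the per-array means, as the definition above allows."""
    means = [sum(a) / len(a) for a in arrays]
    if target_mean is None:
        target_mean = sum(means) / len(means)
    scaled = []
    for a, m in zip(arrays, means):
        factor = target_mean / m        # scale factor = desired / actual average
        scaled.append([x * factor for x in a])
    return scaled

# Illustrative example: two arrays with different overall brightness
red = [100.0, 200.0, 300.0]     # mean 200
green = [50.0, 100.0, 150.0]    # mean 100
norm_red, norm_green = global_normalize([red, green])
# After scaling, both arrays share the same mean (here 150)
```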
GWAS Genome Wide
Association Study: An
analysis comparing the allele frequencies of all available (or a whole GENOME
representative set of) polymorphic markers in unrelated patients with a specific
symptom or disease condition, and those of healthy controls to identify markers
associated with a specific disease or condition. MeSH 2009
A genome-wide association study (GWAS)
is an approach used in genetics research to associate specific genetic
variations with particular diseases. The method involves examining genetic
variations (genotypes) across the complete sequences of DNA, or genomes, of many
different people to find genetic variants associated with a disease or trait
(phenotypes). Researchers can use the information to better understand how
genetic variation affects the normal function of genes, in addition to helping
develop better prevention and treatment strategies. US NIH, Genome Wide
Association Studies GWAS Policy
https://report.nih.gov/nihfactsheets/ViewFactSheet.aspx?csid=28
Pronounced gee-wahs Related term: next generation sequencing
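The per-marker comparison of allele frequencies described above reduces, in its simplest form, to a 2x2 contingency test. The sketch below assumes hypothetical allele counts and uses the standard chi-square shortcut formula and odds ratio; real GWAS analyses add quality control, multiple-testing correction, and population-structure adjustment.

```python
# One marker, cases vs. controls: 2x2 chi-square test on allele counts.
# All counts are illustrative, not real data.

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the table [[a, b], [c, d]]:
    rows = cases/controls, columns = risk allele / other allele.
    Shortcut formula, equivalent to summing (O - E)^2 / E over cells."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def odds_ratio(a, b, c, d):
    """Allelic odds ratio: how much more common the risk allele is in cases."""
    return (a * d) / (b * c)

# Hypothetical marker: risk allele on 240/400 case chromosomes
# vs. 180/400 control chromosomes
chi2 = chi_square_2x2(240, 160, 180, 220)   # roughly 18 for these counts
oratio = odds_ratio(240, 160, 180, 220)     # roughly 1.83
```

A chi-square this large on one degree of freedom would be nominally significant, but genome-wide studies test hundreds of thousands of markers, so far stricter thresholds (e.g. 5e-8) are conventional.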
high throughput
nucleotide sequencing: [analysis] Techniques of nucleotide sequence analysis that
increase the range, complexity, sensitivity, and accuracy of results by greatly
increasing the scale of operations and thus the number of nucleotides, and the
number of copies of each nucleotide sequenced. The sequencing may be done by
analysis of the synthesis or ligation products, hybridization to preexisting
sequences, etc. MeSH 2011
homologue, homologous:
Used by geneticists in two different senses: (1) one member of a
chromosome pair in diploid organisms, and (2) a
gene from
one species -- for example, the mouse -- that has a common origin and
functions the same as a gene from another species -- for example, humans,
Drosophila, or yeast. [NHLBI] Related terms:
Phylogenomics
lateral genomics, ortholog, orthologous, paralog, paralogous, synologous,
xenolog, xenologous;
Model organisms;
Protein
informatics homology
modeling
homology: This
is different from homologue as defined in the
Pharmaceutical biology glossary.
homology: The relationship among sequences due to descent from a common
ancestral sequence. An important organizing principle for genomic studies
because structural and functional similarities tend to change together
along the structure of homology relationships. When applied to nucleotide
or protein sequences, means relationship due to descent from a common
ancestral sequence. Two DNA molecules (or regions thereof) are homologous
if they both "descended" through a series of replication from a single DNA
strand … The terms "homology" and "similarity" are often, incorrectly,
used interchangeably. Homology has been used by various people with
different meanings, even though similarity was a common denominator among
these meanings. The two most important of these meanings related homology
to similar structures and/ or to similar functions. By structures I mean
both molecular
sequences
and morphology. Life would have been simple had phylogenetic homology
necessarily implied structural homology or either of them necessarily
implied functional homology. However, they map onto each other imperfectly
and my definition of homology includes all forms of characters. We could
reduce confusion by always indicating the kind of homology we are
referring to when using the term. Walter Fitch "Homology: a personal view
on some of the problems" Trends in Genetics 16 (5): 227-231 May 2000
Note that homology
can be genic, structural, functional or behavioral. Related
terms: Drug targets
target homology
Phylogenomics
evolutionary homology, orthology, paralogy, similarity;
Proteomics;
regulatory homology;
Narrower terms:
deep homology,
Sequencing sequence homology, sequence
homology- nucleic acid; Related terms homolog (homologue), similarity, ortholog,
paralog, xenology Homology Site
Guide, NCBI
https://www.ncbi.nlm.nih.gov/guide/homology/
Wikipedia
http://en.wikipedia.org/wiki/Homology_%28biology%29
International Nucleotide Database:
Composed of DDBJ, EMBL
and
GenBank.
local alignment:
The alignment of some portion of two nucleic acid or protein sequences.
NCBI BLAST glossary
Best alignment method for sequences for which
no evolutionary relatedness is known. See Smith-Waterman alignment.
Compare global alignment.
log ratios: DNA
microarray assays typically compare two biological samples and present the
results of those comparisons gene-by-gene as the logarithm base two of the ratio
of the measured expression levels for the two samples. The limits of log ratios,
Vasily Sharov, Ka Yin Kwong, Bryan Frank,
Emily Chen, Jeremy Hasseman, Renee Gaspard,
Yan Yu, Ivana Yang, and John Quackenbush BMC
Biotechnology 4, 2004 doi: 10.1186/1472-6750-4-3. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=400743
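The log base-two ratio described above is a one-line computation; a quick sketch with made-up intensities:

```python
# Per-gene log2 ratio for a two-sample microarray comparison.
import math

def log_ratio(sample_intensity, reference_intensity):
    """Base-2 log of the expression ratio: positive means higher in the
    sample, negative means higher in the reference; zero means no change."""
    return math.log2(sample_intensity / reference_intensity)

up = log_ratio(400.0, 100.0)     # 4-fold up   -> 2.0
down = log_ratio(100.0, 400.0)   # 4-fold down -> -2.0
```

The base-2 log makes up- and down-regulation symmetric: a 4-fold change in either direction is the same distance from zero.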
MAGE Microarray
and Gene Expression: See under FGED Standards
MAML Microarray Markup Language:
MAML (Microarray
Markup Language) is no longer supported by MGED and has been replaced by
MAGE-ML. Broader term: standards;
Related terms: data analysis - microarray, MGED, MIAME
MGED
Microarray Gene Expression Database group:
The MGED group was a grass-roots
movement whose goal was to facilitate the adoption of standards for DNA- array
experiment annotation and data representation, as well as the introduction of
standard experimental controls and data normalization methods. The group was
founded at the Microarray Gene Expression Database meeting MGED1
(November, 1999, Cambridge, UK).
MIAME
Minimum Information About a Microarray Experiment:
See under FGED standards
MIAME/MAGE-OM: See under FGED Standards
microarray
analysis techniques:
Wikipedia http://en.wikipedia.org/wiki/Microarray_analysis_techniques
microarrays - data analysis:
Microarrays have revolutionized molecular biology. The
numbers of applications for microarrays are growing as quickly as their probe
density. Paradoxically, microarray data still contains a large number of
variables and a small number of replicates, creating unique data analysis challenges.
Still, the first and most important goal is to design microarray experiments
that yield statistically defensible results. Related terms: image analysis - microarrays; standards; cluster analysis,
pattern recognition Algorithms & data
management glossary
microarrays
image
analysis: Although the visual image of a microarray panel is alluring, its information
content, per se, is minimal without significant image processing. To mine
its lode effectively, quantitative signal must be determined optimally,
which means subtracting background, calculating confidence intervals -
outside of which a difference in signal ratio is deemed to be significant - and calibrating.
Editorial “Getting hip to the chip” Nature Genetics
18(3): 195- 197 March 1998
This process starts with the image of a microarray
that is produced in the laboratory and produces intensity information indicating
the amount of light emitted by each probe. In particular, after the array has
been hybridized, it is scanned to obtain an image that shows the amount of light
emitted across the surface of the microarray. The image is then analyzed to
identify the "spots" (i.e., the parts of the image corresponding to
the DNA probes on the microarray) and the amount of light that can be attributed
to target molecules bound to each probe. Related term: normalization
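The core quantification step described above, background subtraction per spot, can be sketched as follows. The pixel values are invented, and the use of medians (rather than means) is one common robust choice, not the only one.

```python
# Spot quantification: signal = foreground intensity minus local background.
import statistics

def spot_signal(foreground_pixels, background_pixels):
    """Background-subtracted spot intensity. Medians are used for
    robustness against bright outlier pixels (e.g. dust specks)."""
    fg = statistics.median(foreground_pixels)
    bg = statistics.median(background_pixels)
    return max(fg - bg, 0.0)    # clamp: a spot cannot emit negative signal

# Hypothetical spot: bright pixels inside the spot, dim pixels around it
signal = spot_signal([510.0, 495.0, 530.0, 505.0], [60.0, 55.0, 70.0, 65.0])
```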
microarray informatics:
The microarray field is experiencing an
overwhelming push toward robust statistics and mathematical analytic methods
that go far beyond the simple fold analysis and basic clustering that were once
the mainstays of researchers in this area. This push toward better statistics is
also driving the recognition of the need for more replication of experiments.
These stronger analytical techniques also help researchers identify problem
areas in the technology and laboratory processes, and these improvements, in
turn, greatly improve the quality of results that can be provided. Related
terms microarray analysis, microarray data analysis
mismatches: Gene
expression microarray data is notoriously subject to high signal variability.
Moreover, unavoidable variation in the concentration of transcripts applied to
microarrays may result in poor scaling of the summarized data which can hamper
analytical interpretations. This is especially relevant in a systems biology
context, where systematic biases in the signals of particular genes can have
severe effects on subsequent analyses. Conventionally it would be necessary to
replace the mismatched arrays, but individual time points cannot be rerun and
inserted because of experimental variability. It would therefore be necessary to
repeat the whole time series experiment, which is both impractical and
expensive. Correction of scaling mismatches in oligonucleotide microarray data,
Martino Barenco, Jaroslav Stark, Daniel Brewer,
Daniela Tomescu, Robin Callard and Michael Hubank
BMC Bioinformatics 2006, 7:251 doi:10.1186/1471-2105-7-251 http://www.biomedcentral.com/1471-2105/7/251
molecular
sequence annotation: The addition of descriptive information about the
function or structure of a molecular sequence to its MOLECULAR SEQUENCE DATA
record. MeSH 2011
noise
characterization: Noise is a big problem in analyzing gene expression
microarray data. Of course noise is a problem with biological data in
general.
normality:
Related term:
Molecular Medicine normal
normalization, microarray: Underlying
every microarray experiment is an experimental question that one would like to
address. Finding a useful and satisfactory answer relies on careful experimental
design and the use of a variety of data-mining tools to explore the
relationships between genes or reveal patterns of expression. …
this review focuses on the much more mundane
but indispensable tasks of 'normalizing' data from individual hybridizations to
make meaningful comparisons of expression levels, and of 'transforming' them to
select genes for further analysis and data mining.
Microarray data normalization and transformation
John Quackenbush
Nature Genetics volume 32, pages 496–501 (2002)
doi:10.1038/ng1032 Published online: 01 December 2002
The conversion of intensity
information (from image analysis) into estimates of gene expression
levels. For researchers who are using statistical methods, this process also
characterizes the uncertainty in the measurements. The goal of normalization is
to convert the intensity measurements generated by image analysis into estimates
of gene expression levels in the original biological source. Concretely,
the challenge is to compensate for as many sources of error as possible. Related terms: fold changes, image analysis, log
ratios; See also normalization: Algorithms
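One widely used normalization in the spirit of the Quackenbush review quoted above is to center each hybridization's log ratios at zero, on the assumption that most genes are not differentially expressed. This is a sketch of that one technique (median centering), not the full range of methods the review covers; the data are illustrative.

```python
# Median-center the log2 ratios from one hybridization so that the
# bulk of (unchanged) genes sit at log-ratio 0.
import statistics

def median_center(log_ratios):
    """Subtract the per-array median log ratio from every gene."""
    m = statistics.median(log_ratios)
    return [x - m for x in log_ratios]

# Four unremarkable genes plus one strongly up-regulated outlier;
# the whole array carries a systematic offset of about +0.5
centered = median_center([0.4, 0.6, 0.5, 2.5, 0.45])
# The offset is removed; the outlier keeps its relative +2.0 signal
```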
oligonucleotide array sequence analysis:
Hybridization of a nucleic acid sample to a very large set of
oligonucleotide probes, which are attached to a solid support, to determine sequence or to detect variations in a gene
sequence or
expression or for gene mapping.
MeSH, 1999
Useful to know this MeSH heading for microarrays, but use free-text as
well to search PubMed.
ORF prediction:
Related terms: exon prediction, gene prediction,
gene recognition.
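A bare-bones illustration of ORF finding, one ingredient of the gene prediction methods cross-referenced above: scan each forward reading frame for an ATG followed by an in-frame stop codon. Real predictors handle both strands, splicing, and statistical scoring; this sketch, with its invented example sequence, shows only the core idea.

```python
# Minimal forward-strand ORF finder: ATG ... first in-frame stop codon.
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of open reading frames on the
    forward strand, end-exclusive and including the stop codon."""
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i + 3
                # advance codon by codon until a stop (or the sequence ends)
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOPS:
                    j += 3
                if j + 3 <= len(seq) and (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                    i = j               # resume scanning after this ORF
            i += 3
    return orfs

orfs = find_orfs("CCATGAAATTTTAACC")    # finds ATG AAA TTT TAA at (2, 14)
```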
Phred:
Base calling program for DNA sequence traces;
... developed by Drs. Phil Green and Brent Ewing, and is distributed under
license from the University of Washington. http://www.phrap.org/
Phred base calling: http://en.wikipedia.org/wiki/Phred_base_calling
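Phred reports each base call with a quality score Q = -10 * log10(P), where P is the estimated probability that the call is wrong; Q20 means a 1-in-100 error chance, Q30 a 1-in-1000 chance. A sketch of the conversion, plus the Sanger-style FASTQ ASCII encoding (offset 33) in which such scores are commonly stored:

```python
# Phred quality score conversions.
import math

def phred_quality(error_prob):
    """Quality score from an estimated per-base error probability."""
    return -10.0 * math.log10(error_prob)

def error_prob(q):
    """Inverse: error probability implied by a quality score."""
    return 10.0 ** (-q / 10.0)

def encode_fastq(qualities, offset=33):
    """Encode integer qualities as the ASCII string used in FASTQ files."""
    return "".join(chr(q + offset) for q in qualities)

q30 = phred_quality(0.001)            # 1-in-1000 error -> Q30
line = encode_fastq([30, 40, 20])     # three bases of varying confidence
```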
reverse
transfection:
a technique for the
transfer of genetic
material into cells.
As DNA is printed on a glass slide for the transfection process
(the deliberate introduction of nucleic
acids into cells) to occur before the
addition of adherent cells, the order of addition of DNA and adherent cells is
reversed from that of conventional transfection.[1] Hence,
the word “reverse” is used. Wikipedia accessed 2018 Aug 29
https://en.wikipedia.org/wiki/Reverse_transfection
Forward and reverse transfection protocols each
have their significant uses in research. The main protocol difference
between forward and reverse transfection is whether or not the cells are
plated the day before transfection (as in forward transfection) or seeded
at the same time as the transfection. Forward transfection is commonly
used in situations where the cells need to be already attached and in a
growth phase before the nucleic acid + transfection reagent complex is
applied. In contrast, reverse transfection is the process in which the
nucleic acid + transfection reagent complex is assembled in the tissue
culture plate and then the cells are seeded into the wells. There are
many benefits to the reverse transfection method, including:
- The method is ideal for high-throughput screening, since reverse transfection is compatible with automated robots
- Not needing to pre-plate cells saves a day
- The high efficiency of reverse transfection decreases the amount of nucleic acid used
- Unlike forward transfection, the transfection reagent can remain in contact with the cells for 24-72 hours
Altogen Biosystems, Forward transfection or reverse transfection?:
https://altogen.com/forward-transfection-reverse-transfection/
RNA
sequence analysis:
A multistage process that includes cloning, physical mapping, subcloning,
sequencing, and information analysis of an RNA SEQUENCE. MeSH 1993
scaffolds:
A series of contigs that are in the right order but are not necessarily
connected in one continuous stretch of sequence. History of the Human
Genome Project, "A Genome Glossary" Science 291: pullout chart Feb. 16, 2001
Contig sequences
separated by gaps NCBI Whole Genome Shotgun Submissions http://www.ncbi.nlm.nih.gov/genbank/wgs.html
The definition of a scaffold appears to be quite different in the Science
and Nature draft published sequences. David Galas "Making sense of sequence"
Science 291: 1257- Feb. 16, 2001 This is also different from the scaffold defined in Drug
discovery and development.
sequence alignment:
The arrangement of two or more amino acid or base
sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or
homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms.
MeSH, 1991 Broader term? alignments.
sequence homology:
The degree of similarity between sequences. Studies of
amino acid and nucleotide sequences provide useful
information about the genetic relatedness of certain species. MeSH, 1993 Broader
term Functional genomics homology;
Related terms Functional
genomics evolutionary homology; Proteomics
regulatory homology;
sequence homology - nucleic acid:
The sequential correspondence
of nucleotide triplets in a nucleic acid molecule which permits nucleic
acid hybridization. Sequence homology is important in the study of mechanisms
of oncogenesis and also as an indication of the evolutionary relatedness
of different organisms. The concept includes viral homology. MeSH, 1991 Broader term sequence homology
Sequence Ontology
Project:
The Sequence Ontology is a set of terms and
relationships used to describe the features and attributes of biological
sequence.
http://www.sequenceontology.org/
sequencing algorithms: See
BLAST, FASTA, Needleman-Wunsch,
Smith-Waterman
similarity search: BLAST, FASTA
and Smith-Waterman are examples of similarity search
algorithms.
Smith-Waterman alignment:
https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm
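The defining feature of Smith-Waterman, compared with global (Needleman-Wunsch) alignment, is that dynamic-programming cell scores are clamped at zero, so the algorithm finds the best-scoring local region rather than forcing an end-to-end alignment. A compact score-only sketch, with simple illustrative match/mismatch/gap penalties (real tools use substitution matrices and affine gaps, and also perform traceback):

```python
# Score-only Smith-Waterman local alignment.

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    cols = len(b) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0,                  # local: scores never go negative
                          prev[j - 1] + s,    # diagonal: match or mismatch
                          prev[j] + gap,      # gap in b
                          curr[j - 1] + gap)  # gap in a
            best = max(best, curr[j])
        prev = curr
    return best

# The shared substring "ACG" gives 3 matches * 2 = 6
score = smith_waterman_score("ACGT", "TACG")
```

Note the contrast with global alignment: here the unmatched flanking bases cost nothing, because the zero clamp lets the alignment start and end anywhere.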
Genomic informatics resources
Ensembl glossary
https://www.ensembl.org/Multi/Help/Glossary?db=core
NHGRI (National Human Genome Research Institute),
Talking Glossary of Genetic Terms, 100+ definitions.
https://www.genome.gov/genetics-glossary Includes
extended audio definitions.
Schlindwein, Birgid, Hypermedia Glossary of
Genetic Terms, 2006, 670 definitions.
http://www.weihenstephaon.de/~schlind/index.html
Informatics
Conferences
http://www.healthtech.com/conferences/upcoming.aspx?s=NFO
BioIT World Expo
http://www.bio-itworldexpo.com/
Molecular Medicine Tri Conference
http://www.triconference.com/
Informatics Short courses
http://www.healthtech.com/Conferences_Upcoming_ShortCourses.aspx?s=NFO
BioIT World magazine
http://www.bio-itworld.com/
BioIT World archives
http://www.bio-itworld.com/BioIT/BioITArchive.aspx
How
to look for other unfamiliar terms
IUPAC definitions are reprinted with the permission of the International
Union of Pure and Applied Chemistry.