
Information management & interpretation glossary & taxonomy
Evolving Terminologies for Emerging Technologies
Comments? Questions? Revisions?
Mary Chitty MSLS
Last revised January 09, 2020

The dividing line between this glossary and Algorithms & data analysis is very fuzzy. In general, this one focuses primarily on unstructured data (or a combination of structured and unstructured), while Algorithms centers on structured data. Finding guide to terms in these glossaries: Informatics term index. Informatics includes Bioinformatics, Clinical informatics, Drug discovery informatics, and IT infrastructure. Ontologies & Taxonomies are subsets of, and critical tools for, Information management & interpretation. Technologies: Microarrays & protein chips, Sequencing

3D technologies: Visual communications are pervasive in information technology and are a key enabler of most new emerging media. In this context, the NRC Institute for Information Technology (NRC-IIT) performs research, development and technology transfer activities to enable access to 3D information of the real world. Research in the 3D Technologies program focuses on three main areas: Virtualizing Reality and Visualization, Collaborative Virtual Environments, and 3D Data Mining and Management. Institute for Information Technology, National Research Council, Canada, 3D Technologies

artificial intelligence: Data science & machine learning 

bias: One of the two components of measurement error (the other one being variance). Bias is a systematic error that causes the measurement to differ from the correct value. Since bias is systematic, it affects all experimental replicates in the same way. 

collaborative filtering: Drawing upon the collective navigation and purchasing behavior of users creates a highly distributed, adaptive solution. Amazon is the reigning champion, featuring People who bought this item also bought and Purchase Circles. Other examples include Microsoft's Top Downloads and the Weekly Bottom 40. Peter Morville, Innovation Architecture
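The "people who bought this item also bought" idea can be sketched as simple co-purchase counting. A minimal Python illustration; the basket contents and item names are invented for the example:

```python
from collections import Counter

# Hypothetical purchase histories; item names are illustrative only.
baskets = [
    {"genome_atlas", "pcr_primer_guide", "lab_notebook"},
    {"genome_atlas", "pcr_primer_guide"},
    {"genome_atlas", "statistics_primer"},
    {"pcr_primer_guide", "lab_notebook"},
]

def also_bought(item, baskets, top_n=2):
    """Count items co-purchased with `item` across all baskets."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            counts.update(basket - {item})
    return [name for name, _ in counts.most_common(top_n)]

print(also_bought("genome_atlas", baskets))
```

Real recommenders add normalization and similarity weighting on top of raw co-occurrence counts, but the distributed, adaptive character Morville describes is already visible here: the recommendations change as users' baskets change.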

collaborative metadata: A robust increase in both the amount and quality of metadata is integral to realizing the Semantic Web. The research reported on in this article addresses this topic of inquiry by investigating the most effective means for harnessing resource authors' and metadata experts' knowledge and skills for generating metadata. Jane Greenberg, W. Davenport Robertson, Semantic web construction: An Inquiry of Authors' Views on Collaborative Metadata Generation, International Conference DC 2002, Metadata for e-Communities, Oct. 13- 17, 2003, Florence Italy 

computational linguistics: Computational Linguistics, or Natural Language Processing (NLP), is not a new field. As early as 1946, attempts have been undertaken to use computers to process natural language. These attempts concentrated mainly on Machine Translation ... the limited performance of these systems made it clear that the underlying theoretical difficulties of the task had been grossly underestimated, and in the following years and decades much effort was spent on basic research in formal linguistics. Today, a number of Machine Translation systems are available commercially, although there still is no system that produces fully automatic high-quality translations (and probably there will not be for some time). Human intervention in the form of pre- and/or post-editing is still required in all cases. Another application that has become commercially viable in the last years is the analysis and synthesis of spoken language, i.e. speech understanding and speech generation. ... An application that will become at least as important as those already mentioned is the creation, administration, and presentation of texts by computer. Even reliable access to written texts is a major bottleneck in science and commerce. The amount of textual information is enormous (and growing incessantly), and the traditional, word-based information retrieval methods are getting increasingly insufficient, as either precision or recall is always low (i.e. you get either a large number of irrelevant documents together with the relevant ones, or else you fail to get a large number of the relevant ones in the collection). Linguistically based retrieval methods, taking into account the meaning of sentences as encoded in the syntactic structure of natural language, promise to be a way out of this quandary. Computational Linguistics FAQ; Linguistics, natural language, and computational linguistics Meta-Index, Stanford Univ., US 

data cleaning, data integration: Algorithms & data analysis 

data management methods: Algorithms & data analysis covers automated methods; the methods in this glossary generally combine human and automated approaches.

data mapping: Wikipedia 

data management: Each new generation of DNA sequencers, mass spectrometers, microscopes, and other lab equipment produces a richer, more detailed set of data. We’re already way beyond gigabytes (GB): a single next-generation sequencing experiment can produce terabytes (TB) of data in a single run. As a result, any organization running hundreds of routine experiments a month or year, or trying to handle the output of next-generation sequence instruments, quickly finds itself with a massive data management problem.  Data Management: The Next Generation, Salvatore Salamone, BioIT World, Oct 2007  

data mining: Nontrivial extraction of implicit, previously unknown and potentially useful information from data, or the search for relationships and global patterns that exist in databases. W. Frawley, G. Piatetsky-Shapiro and C. Matheus, "Knowledge Discovery in Databases: An Overview," AI Magazine, 213-228, Fall 1992. Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns or rules. Berry MJA, Data Mining Techniques for Marketing, Sales and Customer Support, John Wiley & Sons, New York, 1997, cited in Nature Genetics 21(15): 51-55, ref 11, 1999

data quality:  A vital consideration for data analysis and interpretation.  While people are still reeling from the vast amount of data becoming available, they need to brace themselves to both discard low quality data and handle much more at the same time.  
Data quality glossary, Graham Rind, GRC Data Intelligence,  6,700 terms. 

data science: Data science & machine learning

Data Visualization & Exploration Tools: Omics, Drug Discovery, and Clinical Development, 2020 April 21-23, Boston MA. With a sharp increase in the volume and complexity of big data sets for research and drug discovery labs, data visualization is needed to clearly express the complex patterns. It is more important than ever to develop data visualization and exploration tools alongside the rest of the analytics, as opposed to later in the game. The Data Visualization & Exploration Tools track will address ways to not only develop, design, and implement visualization tools in genomics, drug discovery, clinical development, and translational research, but also address real-world case studies where these tools have been successfully used.

The classical definition of visualization is as follows: the formation of mental visual images, the act or process of interpreting in visual terms or of putting into visual form. A new definition is a tool or method for interpreting image data fed into a computer and for generating images from complex multi-dimensional data sets. Definitions and Rationale for Visualisation, D. Scott Brown, SIGGRAPH, 1999. Related term: information visualization; Broader term: visualization

databases: Bioinformatics;  Databases & software directory

deep web:  Related term:  invisible web

description logic: Has existed as a field for a few decades, yet only somewhat recently has appeared to transform from an area of academic interest to an area of broad interest. This paper provides a brief historical perspective of description logic developments that have impacted DL usability to include communities beyond universities and research labs. Deborah L. McGuinness, "Description Logics Emerge from Ivory Towers," Stanford Knowledge Systems Laboratory Technical Report KSL-01-08, 2001. In the Proceedings of the International Workshop on Description Logics, Stanford, CA, August 2001.

The main effort of the research in knowledge representation is providing theories and systems for expressing structured knowledge and for accessing and reasoning with it in a principled way. Description Logics are considered the most important knowledge representation formalism unifying and giving a logical basis to the well known traditions of Frame- based systems, Semantic Networks and KL- ONE-like languages, Object- Oriented representations, Semantic data models, and Type systems. Description Logic Knowledge Representation

digitization, less commonly digitalization: The process of converting information into a digital (i.e. computer-readable) format, in which the information is organized into bits. The result is the representation of an object, image, sound, document or signal (usually an analog signal) by generating a series of numbers that describe a discrete set of its points or samples. The result is called digital representation or, more specifically, a digital image, for the object, and digital form, for the signal. In modern practice, the digitized data is in the form of binary numbers, which facilitate computer processing and other operations, but, strictly speaking, digitizing simply means the conversion of analog source material into a numerical format; the decimal or any other number system can be used instead. Digitization is of crucial importance to data processing, storage and transmission, because it "allows information of all kinds in all formats to be carried with the same efficiency and also intermingled". Unlike analog data, which typically suffers some loss of quality each time it is copied or transmitted, digital data can, in theory, be propagated indefinitely with absolutely no degradation. This is why it is a favored way of preserving information for many organisations around the world. Wikipedia accessed 2019 July 8
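The sampling-and-quantization process described above can be illustrated in a few lines of Python. The signal (a sine tone), sample rate, and bit depth below are arbitrary choices for the example:

```python
import math

def digitize(signal, duration_s, sample_rate_hz, bits):
    """Sample an analog signal and quantize each sample to `bits` bits."""
    levels = 2 ** bits
    n_samples = int(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz
        x = signal(t)                              # analog value in [-1.0, 1.0]
        q = round((x + 1.0) / 2.0 * (levels - 1))  # map to integer 0..levels-1
        samples.append(q)
    return samples

# A 1 Hz sine wave, sampled at 8 Hz with 3-bit (8-level) quantization.
wave = digitize(lambda t: math.sin(2 * math.pi * t), 1.0, 8, 3)
print(wave)
```

Each continuous amplitude is replaced by the nearest of 8 discrete levels; this rounding is exactly the (lossy) step that turns an analog signal into a stream of numbers, after which the copies are loss-free.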

disambiguate: Make less ambiguous, clarify, elucidate. 

domain expertise: Wikipedia 

Dublin Core:  The Dublin Core Metadata Initiative, or "DCMI", is an open organization supporting innovation in metadata design and best practices across the metadata ecology. DCMI's activities include work on architecture and modeling, discussions and collaborative work in DCMI Communities and DCMI Task Groups, global conferences, meetings and workshops, and educational efforts to promote widespread acceptance of metadata standards and best practices.

evolvability: Defined by Tim Berners-Lee; see also Wikipedia and under interoperability

federated databases: An integrated repository of data from multiple, possibly heterogeneous, data sources, presented with consistent and coherent semantics. They do not usually contain any summary data, and all of the data resides only at the data source (i.e. no local storage). Lawrence Berkeley Lab "Advanced Computational Structural Genomics" Glossary 

federated information systems: Their main characteristic is that they are constructed as an integrating layer over existing legacy applications and databases. They can be broadly classified in three dimensions: the degree of autonomy they allow in integrated components, the degree of heterogeneity between components they can cope with, and whether or not they support distribution. Whereas the communication and interoperation problem has come into a stage of applicable solutions over the past decade, semantic data integration has not become similarly clear. Susanne Busse et al., "Federated Information Systems: Concepts, Terminology and Architecture," Computergestützte Informations Systeme CIS, Berlin, Germany

fractal nature of the web: Tim Berners-Lee, Commentary on architecture, Fractal nature of the web, first draft. This article was originally entitled "The Fractal nature of the web". Since then, I have been assured that while many people seem to use fractal to refer to a Zipf (1/f) distribution, it should really only be used in spaces of finite dimension, like the two-dimensional planes of Mandelbrot sets. The correct term for the Web, then, is scale-free.

Society has to be fractal - people want to be involved on a lot of different levels. The need for things that are local and special will create enclaves. And those will give us the diversity of ideas we need to survive. Tim Berners Lee, in "The father of the web", Evan Schwartz, Wired Mar. 1997

granularity: <jargon, parallel> The size of the units of code under consideration in some context. The term generally refers to the level of detail at which code is considered, e.g. "You can specify the granularity for this profiling tool". The most common computing use is in parallelism, where "fine grain parallelism" means individual tasks are relatively small in terms of code size and execution time, and "coarse grain" is the opposite. You talk about the "granularity" of the parallelism. The smaller the granularity, the greater the potential for parallelism and hence speed-up, but the greater the overheads of synchronisation and communication. FOLDOC

The extent to which a system contains separate components (like granules). The more components in a system - or the greater the granularity - the more flexible it is. Webopedia

Level of detail seems to be the essence of granularity.  

informatics: a branch of information engineering. It involves the practice of information processing and the engineering of information systems, and as an academic field it is an applied form of information science. The field considers the interaction between humans and information alongside the construction of interfaces, organisations, technologies and systems. As such, the field of informatics has great breadth and encompasses many subspecialties, including disciplines of computer science, information systems, information technology and statistics. Since the advent of computers, individuals and organizations increasingly process information digitally. This has led to the study of informatics with computational, mathematical, biological, cognitive and social aspects, including study of the social impact of information technologies.

Narrower terms: bioinformatics; cheminformatics; clinical informatics; molecular informatics; Biomaterials: matinformatics; research informatics; Drug discovery & development: life sciences informatics; Intellectual property & legal: patinformatics; Molecular imaging: image informatics; pharmacoinformatics; protein informatics 

information architecture:  The art and science of organizing information to help people effectively fulfill their information needs. Information architecture involves investigation, analysis, design and implementation. Top-down and bottom-up are the two main approaches to developing information architectures; these approaches inform each other and are often developed simultaneously. See also: bottom-up information architecture, top-down information architecture, user investigation.  Information architecture glossary, Kat Hagedorn, Argus Associates, 2000, 60 + definitions

information ecology: Wikipedia 

information extraction: Automated ways of extracting structured information from unstructured or partially structured machine-readable files. Compare with information retrieval. Related terms: natural language processing, term extraction
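A minimal sketch of information extraction using regular expressions, the simplest of the automated approaches this entry describes. The accession-number pattern and the sample sentence are invented for illustration, not a real standard:

```python
import re

# Free text containing structured facts we want to pull out.
text = (
    "Sequence AB123456 was deposited in 1999; a related entry, "
    "CD654321, followed in 2003."
)

# Hypothetical accession format: two capital letters, six digits.
accessions = re.findall(r"\b[A-Z]{2}\d{6}\b", text)

# Four-digit years starting 19xx or 20xx (non-capturing group so
# findall returns the whole match, not just the prefix).
years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", text)]

print(accessions, years)
```

The output is structured data (lists of identifiers and years) recovered from unstructured prose, which is what distinguishes information extraction from information retrieval, where the unit returned is a whole document.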

information harvesting: See under Knowledge Discovery in Databases KDD 

information integration: Our research group is developing intelligent techniques to enable rapid and efficient information integration. The focus of our research has been on the technologies required for constructing distributed, integrated applications from online sources. This research includes: Information Extraction: Machine learning techniques for extracting information from online sources; Source Modeling: Constructing a semantic model of wrapped sources so that they can be automatically integrated with other sources; Record Linkage: Learning how to align records across sources; Data Integration: Generating plans to automatically integrate data across sources; Plan Execution: Representing, defining, and efficiently executing integration plans in the Web environment; Constraint-based Integration  Interactive constraint-based planning and integration for the Web environment. Information Integration Research Group, Intelligent Systems Division, Information Sciences Institute (ISI), University of Southern California 

information overload: Biomedicine is in the middle of revolutionary advances. Genome projects, microarray methods like DNA chips, advanced radiation sources for crystallography and other instrumentation, as well as new imaging methods, have exceeded all expectations, and in the process have generated a dramatic information overload that requires new resources for handling, analyzing and interpreting data. Delays in the exploitation of the discoveries will be costly in terms of health benefits for individuals and will adversely affect the economic edge of the country. Opportunities in Molecular Biomedicine in the Era of Teraflop Computing, March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign

Many of today's problems stem from information overload and there is a desperate need for innovative software that can wade through the morass of information and present visually what we know.  The development of such tools will depend critically on further interactions between the computer scientists and the biologists so that the tools address the right questions, but are designed in a flexible and computationally efficient manner.  It is my hope that we will see these solutions published in the biological or computational literature.  Richard J. Roberts, The early days of bioinformatics publishing, Bioinformatics 16 (1): 2-4, 2000

"Information overload" is not an overstatement these days. One of the biggest challenges is to deal with the tidal wave of data, filter out extraneous noise and poor quality data, and assimilate and integrate information on a previously unimagined scale   

Where's my stuff? Ways to help with information overload, Mary Chitty, SLA presentation June 10, 2002, Los Angeles CA

information retrieval:  Wikipedia 

information visualization:  Wikipedia    Related term: data visualization; Broader term: visualization

invisible web: 
Unlike pages on the visible Web (that is, the Web that you can access from search engines and directories), information in databases is generally inaccessible to the software spiders and crawlers that create search engine indexes. Users are able to access most of this information, but only through specific searches that unlock where this information lives.
Invisible web medical information  2017  Related terms: deep web, semantic web

knowledge integration:
Wikipedia   Related terms: ontologies, semantics

knowledge management:  Systematic approach to acquiring, analyzing, storing, and disseminating information related to products, manufacturing processes, and components ICH Q10   Related terms: ontologies, paraphrase problem, taxonomies 
Virtual Library: Knowledge Management, May 2000 Definition, articles, white papers, interviews, business and technology library, periodicals and publications, “out of box thinking”, “movers and shakers”, “think tank”, calendar of events, emerging topics.

lexical semantics: 

lexicon: A machine-readable dictionary that may contain a good deal of additional information about the properties of the words, notated in a form that parsers can utilize. Bob Futrelle, A Brief Introduction to NLP, Computer Science, Northeastern Univ., US, 2002

A linguistics term (words and their definitions), an artificial intelligence term.  Sometimes a synonym for glossary or dictionary. 

linked data:  Linked Data is about using the Web to connect related data that wasn’t  previously linked, or using the Web to lower the barriers to linking data currently linked using other methods.  

Linked data glossary  

machine-readable: See under metadata 
machine-understandable: See under metadata 

markup languages: Computers & computing 


metadata: Ontologies & Taxonomies

NISO Journal Article Tag Suite (JATS): This project at NCBI is a continuation of the work to create and support the "Archiving and Interchange Tag Suite" or the "NLM DTDs". The current version (NISO JATS 1.0) reflects the changes made to v0.4 based on public comments and is fully backward compatible with NLM version 3.0.

precision: Percentage of the material retrieved by a specific query or search statement that is actually relevant. Related terms: Genetic testing analytical specificity, clinical specificity. Compare recall  

query contraction: Needed when a search engine retrieves thousands of citations. May consist of adding terms (Boolean AND) or substituting different, more specific terms. 

query expansion: Adding new and/or different terms to a search statement (particularly when a search engine or database retrieves no hits). Often uses Boolean OR. Related terms: ontologies, taxonomies
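Thesaurus-based expansion with Boolean OR can be sketched in a few lines. The synonym table below is a hand-built illustration; real systems draw alternatives from ontologies or taxonomies, as the related terms suggest:

```python
# Illustrative thesaurus mapping a query term to its alternatives.
SYNONYMS = {
    "tumor": ["tumour", "neoplasm"],
    "heart": ["cardiac"],
}

def expand(terms):
    """Join each term with its synonyms via OR, and the clauses via AND."""
    clauses = []
    for term in terms:
        alternatives = [term] + SYNONYMS.get(term, [])
        clauses.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(clauses)

print(expand(["heart", "tumor"]))
# (heart OR cardiac) AND (tumor OR tumour OR neoplasm)
```

Expanding with OR raises recall (more relevant documents match some alternative) at the likely cost of precision, which is the trade-off running through the neighboring entries.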

recall: The percentage of applicable material retrieved by a specific query or search statement. Compare precision. Related term: Genetic testing  sensitivity 

relevance: Percentage of truly related material retrieved by a specific query or search statement. Related terms: precision; Genetic testing & diagnostics: analytical specificity, clinical specificity. Compare recall 
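The precision and recall definitions above reduce to simple set arithmetic. A small Python sketch, with made-up document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved)
    recall = len(hits) / len(relevant)
    return precision, recall

# 3 of the 4 retrieved documents are relevant (precision 0.75),
# but only 3 of the 6 relevant documents were found (recall 0.5).
p, r = precision_recall(
    {"d1", "d2", "d3", "d9"},
    {"d1", "d2", "d3", "d4", "d5", "d6"},
)
print(p, r)
```

The example makes the tension concrete: a narrower query tends to raise precision but lower recall, and vice versa, which is why query contraction and query expansion exist as separate tactics.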

remembrance agents:  Bradley Rhodes, The Remembrance Agent (RA) is a program which augments human memory by displaying a list of documents which might be relevant to the user's current context. Unlike most information retrieval systems, the RA runs continuously without user intervention. Its unobtrusive interface allows a user to pursue or ignore the RA's suggestions as desired.
Related terms: collaborative filtering, just in time information

schema: (databases) A formal description of the structure of a database: the names of the tables, the names of the columns of each table, and the data type and other attributes of each column. (markup languages) A formal description of data, data types, and data file structures, such as XML schemas for XML files.  Wiktionary

Schema may refer to: SCHEMA (bioinformatics), an algorithm used in protein engineering; Schema (genetic algorithms), a set of programs or bit strings that have some genotypic similarity. Wikipedia disambiguation accessed 2018 Oct 27
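A concrete instance of the database sense of schema (table names, column names, and column types), using Python's bundled sqlite3 module; the table and column names are invented for the example:

```python
import sqlite3

# Define a schema: one table with three typed columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sequence (
        id        INTEGER PRIMARY KEY,
        accession TEXT NOT NULL,
        length_bp INTEGER
    )
""")

# Read the schema back: PRAGMA table_info yields one row per column,
# with the column name at index 1 and the declared type at index 2.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(sequence)")]
print(columns)
```

The formal description (names plus types) exists independently of any data in the table, which is the point of the Wiktionary definition: a schema describes structure, not content.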

semantic: Ontologies & taxonomies

social informatics: Wikipedia

soft computing:
The principal aim of soft computing is to exploit the tolerance for imprecision and uncertainty to achieve tractability, robustness and low solution cost. At this juncture, the principal constituents of soft computing (SC) are fuzzy logic (FL), neural network theory (NN) and probabilistic reasoning (PR), with the latter subsuming genetic algorithms, belief networks, chaotic systems, and parts of learning theory. In the triumvirate of SC, FL is concerned in the main with imprecision, NN with learning and PR with uncertainty. In large measure, FL, NN and PR are complementary rather than competitive. It is becoming increasingly clear that in many cases it is advantageous to employ FL, NN and PR in combination rather than exclusively. Zadeh L.A. (1993) The role of fuzzy logic and soft computing in the conception and design of intelligent systems. In: Klement E.P., Slany W. (eds) Fuzzy Logic in Artificial Intelligence. FLAI 1993. Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence), vol 695. Springer, Berlin, Heidelberg  

SOAP Simple Object Access Protocol:

Software Applications & Services: Tools that Best Utilize Data to Drive Scientific Decision Making, April 21-23, 2020, Boston MA. As data generation increases, there is a need for workflows that are reproducible across infrastructures and able to empower scientists and researchers to apply cutting-edge analysis methods. A main challenge is that scientific data is not centralized or standardized and is fragmented, from instrumentation to clinical research to legacy software. This track explores how biopharma companies are utilizing software tools to leverage data platforms and advance data strategies. Case studies will focus on data analytics approaches, data methods and standards, transparency, efficiency, security, and cost-effective solutions.

syntactic, syntax: Ontologies & taxonomies

term extraction: Robert Futrelle, Northeastern Univ., 2001  See related information extraction

term mining:  Term Mining in Biomedicine, Sophia Ananiadou - University of Manchester, 2007 

text categorisation: See Algorithms & data analysis under support vector machines 

text mining: Text mining is the process of analyzing collections of textual materials in order to capture key concepts and themes and uncover hidden relationships and trends without requiring that you know the precise words or terms that authors have used to express those concepts. Although they are quite different, text mining is sometimes confused with information retrieval. While the accurate retrieval and storage of information is an enormous challenge, the extraction and management of quality content, terminology, and relationships contained within the information are crucial and critical processes.

Using data mining on unstructured data, such as the biomedical literature.  Related terms:  natural language processing; Algorithms & data analysis: support vector machines 
Text Mining Glossary, ComputerWorld, 2004   Includes Categorization, clustering, extraction, keyword search, natural language processing, taxonomy, and visualization.
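One elementary text-mining step, surfacing the frequent content words in a document collection, can be sketched as follows; the stopword list and the two sample documents are illustrative only:

```python
import re
from collections import Counter

# Illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"the", "of", "and", "in", "to", "a", "is"}

docs = [
    "Text mining of the biomedical literature is growing.",
    "Mining the literature helps extract biomedical relationships.",
]

# Tokenize, lowercase, drop stopwords, and count across the collection.
counts = Counter(
    word
    for doc in docs
    for word in re.findall(r"[a-z]+", doc.lower())
    if word not in STOPWORDS
)
print(counts.most_common(3))
```

This is only term counting; the concept- and relationship-level analysis the definition describes layers natural language processing and statistical models on top of token statistics like these.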

unstructured data: Generally free text, natural language.  Related term: natural language processing. Compare structured.   

variance: One of the two components of measurement error (the other one being bias). Variance results from uncontrolled (or uncontrollable) variation that occurs in biological samples, experimental procedures, and arrays themselves. 
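The distinction drawn here and under bias can be simulated directly: bias shifts every replicate the same way, while variance scatters them. The true value, bias, and noise level below are arbitrary choices for the sketch:

```python
import random
import statistics

random.seed(42)           # fixed seed so the sketch is reproducible
TRUE_VALUE = 10.0
BIAS = 0.5                # systematic error, identical across replicates
NOISE_SD = 0.2            # random error, differs per replicate

# Simulate 1000 replicate measurements of the same quantity.
replicates = [TRUE_VALUE + BIAS + random.gauss(0, NOISE_SD) for _ in range(1000)]

estimated_bias = statistics.mean(replicates) - TRUE_VALUE
estimated_variance = statistics.variance(replicates)
print(estimated_bias, estimated_variance)
```

Averaging many replicates shrinks the contribution of variance but leaves the bias untouched, which is why systematic error cannot be fixed by simply repeating the experiment.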

visualization: A method of computing by which the enormous bandwidth and processing power of the human visual (eye-brain) system becomes an integral part of extracting knowledge from complex data. It utilizes graphics and imaging techniques as well as knowledge of both data management and the human visual system. Lloyd Treinish, Visualization for Deep Thunder, IBM Research, 2002

Use of computer-generated graphics to make information more accessible and interactive. Related term: data mining. Narrower terms: data visualization, information visualization; Algorithms & data analysis: dendrogram, heat map, profile chart. Definitions and Rationale for Visualisation, D. Scott Brown, SIGGRAPH, 1999 

W3C World Wide Web Consortium: Develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential. W3C is a forum for information, commerce, communication, and collective understanding.

web: The genome community was an early adopter of the Web, finding in it a way to publish its vast accumulation of data, and to express the rich interconnectedness of biological information. The Web is the home of primary data, of genome maps, of expression data, of DNA and protein sequences, of X-ray crystallographic structures, and of the genome project's huge outpouring of publications. ... However the Web is much more than a static repository of information. The Web is increasingly being used as a front end for sophisticated analytic software. Sequence similarity search engines, protein structural motif finders, exon identifiers, and even mapping programs have all been integrated into the Web. Java applets are adding rapidly to Web browsers' capabilities, enabling pages to be far more interactive than the original click-fetch-click interface. Lincoln D. Stein, "Introduction to Human Genome Computing via the World Wide Web," Cold Spring Harbor Lab, 1998. Related terms: fractal nature of the web, weblike. Narrower terms: semantic web, web portals, web services  

web services: The goal of the Web Services Activity is to develop a set of technologies in order to bring Web services to their full potential. W3C Web Services Activity
Web services glossary

web services interoperability: Web services technology has the promise to provide a new level of interoperability between software applications. It should be no wonder then that there is a rush by platform providers, software developers, and utility providers to enable their software with SOAP, WSDL, and UDDI capabilities.

webizing: "Webizing Existing Systems" Tim Berners-Lee, last updated 2001

weblike: Tim Berners-Lee, Ralph Swick, Semantic Web, Amsterdam, 2000 May 16

Tim Berners-Lee writes in Weaving the Web, his account of coming up with the idea of the web, about "learning to think in a weblike way".

I don't know that I can claim to approach this yet, but the more that I write and research this glossary on and for the web, the more insight I'm getting into what he might mean. Metaphors like "shooting at a moving target" and like Wayne Gretzky "skating to where the puck is going to be" are helpful images.   

workflows:  A collaborative environment where scientists can safely publish their workflows and experiment plans, share them with groups and find those of others. Workflows, other digital objects and collections (called Packs) can now be swapped, sorted and searched like photos and videos on the Web. ...  myExperiment makes it really easy for the next generation of scientists to contribute to a pool of scientific workflows, build communities and form relationships. It enables scientists to share, reuse and repurpose workflows and reduce time-to-experiment, share expertise and avoid reinvention. myExperiment 

Information Management Resources
Jinfo blog
W3C Glossary and Dictionary 
Webopedia Information Technology encyclopedia. About 3,000 + definitions.

IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.

How to look for other unfamiliar  terms
