
Biopharmaceutical information management & interpretation glossary & taxonomy
Evolving Terminologies for Emerging Technologies
Comments? Questions? Revisions? Mary Chitty 
mchitty@healthtech.com
Last revised April 16, 2010


The dividing line between this glossary and Algorithms & data analysis is very fuzzy. In general, this one focuses on unstructured data (or a combination of structured and unstructured), while Algorithms centers on structured data.
Finding guide to terms in these glossaries: Informatics Map; Site Map
Informatics includes Bioinformatics; Computers & computing; In silico & Molecular Modeling. Ontologies & Taxonomies are subsets of, and critical tools for, Information management & interpretation.
Technologies: Microarrays & protein chips; Sequencing

Advances in biology and new high-throughput technologies are generating massive amounts of data that overwhelm the current information technology infrastructure. The challenge is to build a common capability that enables a more efficient translation of data into knowledge that leads to new and effective treatments. caBIG™ and Molecular Medicine, NCI, NIH http://cabig.cancer.gov/molecular/overview.asp

Google = "data analysis" about 1,420,000 as of July 23, 2002; about 4,480,000 as of Sept. 23, 2004; about 16,000,000 Nov 18, 2009
"data interpretation" about 58,200 July 23, 2002; about 147,000 as of Sept. 23, 2004; about 928,000 Nov 18, 2009

3D technologies: Visual communications are pervasive in information technology and are a key enabler of most new emerging media. In this context, the NRC Institute for Information Technology (NRC-IIT) performs research, development and technology transfer activities to enable access to 3D information of the real world. Research in the 3D Technologies program focuses on three main areas: Virtualizing Reality and Visualization, Collaborative Virtual Environments, 3D Data Mining and Management [Institute for Information Technology, National Research Council, Canada, 3D Technologies] 

artificial intelligence: Algorithms & data analysis

Google = about 1,120,000 July 19, 2002; about 3,040,000 Oct. 22, 2004

BIRN Biomedical Informatics Research Network: http://www.nbirn.net/ 

bias: One of the two components of measurement error (the other one being variance). Bias is a systematic error that causes the measurement to differ from the correct value. Since bias is systematic, it affects all experiment replicas the same way. 
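The split between bias and variance can be checked numerically. The sketch below is illustrative only: the replica values and the "correct" value are made up, and it assumes error is summarized as mean squared error, which equals squared bias plus (population) variance.

```python
from statistics import mean, pvariance


def bias_and_variance(measurements, true_value):
    """Return (bias, variance) for a set of replicate measurements.

    bias     = systematic offset of the average from the correct value
    variance = spread of the replicas around their own average
    """
    bias = mean(measurements) - true_value
    variance = pvariance(measurements)
    return bias, variance


def mean_squared_error(measurements, true_value):
    """Average squared distance of each replica from the correct value."""
    return mean((m - true_value) ** 2 for m in measurements)


# Hypothetical replicas of one measurement whose correct value is 10.0.
replicas = [10.4, 10.6, 10.5, 10.3, 10.7]
true_value = 10.0

bias, variance = bias_and_variance(replicas, true_value)
mse = mean_squared_error(replicas, true_value)
# Every replica is shifted the same way (bias), while the
# replica-to-replica scatter is captured by the variance.
```

Averaging more replicas shrinks the contribution of variance but leaves the bias untouched, which is why a systematic error has to be removed by calibration rather than repetition.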

bibliomining: The combination of data mining, bibliometrics, statistics, and reporting tools used to extract patterns of behavior-based artifacts from library systems. Scott Nicholson, Bibliomining: Data Mining for Libraries, Syracuse Univ. US http://www.bibliomining.com/

bioinformatics visualization: Bioinformatics

biomedical computing: Computers & computing

Google = about 11,800 July 19, 2002; about 20,900 Oct. 22, 2004

biomedical informatics:  caBIG® stands for the cancer Biomedical Informatics Grid®. caBIG® is an information network enabling all constituencies in the cancer community – researchers, physicians, and patients – to share data and knowledge.  The components of caBIG® are widely applicable beyond cancer as well. The mission of caBIG® is to develop a truly collaborative information network that accelerates the discovery of new approaches for the detection, diagnosis, treatment, and prevention of cancer, ultimately improving patient outcomes. National Cancer Institute, NIH, US About caBig, 2008  https://cabig.nci.nih.gov/overview/ 

Google = about 66,600 Oct. 22, 2004; about 388,000 Nov 18, 2009

BIONLP.org: Bioinformatics

biopharmaceutical informatics: Drug companies go through a very arduous and regulated discovery, applied research, and development process, typically spanning five years of laboratory research and ten years of clinical studies ... multinational clinical studies, which need to be done with tremendous precision over a very long period of time. The study parameters must be identical for every patient (many times numbering 10,000 patients, followed for five or more years), and all the participating hospitals essentially have to behave in exactly the same way for the trial to be valid. ... The life science industry is conservative by nature, and therefore it is a late-adopting industry. It is very sensitive to standards because of the legacy according to which these companies have to maintain data and information. Major pharmaceutical companies typically adopt a 100-year minimum document retention policy, ... each of the industry's four industrial sectors (the pharmaceutical, the biotech, the medical device, and the diagnostics sector) has a different set of needs and desires, as well as its own requirements for unique IT solutions. ...

Life science companies are dealing with very large computational data sets. Some are now approaching half terabyte sizes and upward. Life science companies also immensely concern themselves with security, because their data represent their crown jewels. Other major concerns expressed by this industry include the stability, scalability, and security of an operating environment. Life science companies and regulatory bodies such as the FDA are more concerned than ever with operating environments that decay with use: When under computational stress, these fragile operating systems have a habit of crashing, and when these systems crash, they tend to corrupt data. ...

Post-genomic, proteomic, chemical information, and other data sets have created a major appetite for solutions to deal with this tremendous amount of data. Scientists are now asking their IT professionals for the ability to better conceptualize and interpret the meaning of this vast information. To do this, scientists need tools for 3D visualization with a tremendous degree of high definition and accuracy. The next step is to take disparate data sets, render them into 3D values, see the DNA and RNA interface, watch protein folds, and then put a therapeutic small molecule in there and see how it relates within a virus that environmentally influences a different process. Scientists Are Demanding Solutions for Dealing with the Post-Genomic, Proteomic, and Chemical Data Deluge: An Interview with Howard Asher, Director, Global Life Sciences Group, Sun Microsystems, CHI GenomeLink 30 http://www.healthtech.com/newsarticles/issue30_1.asp 

Biosemantics Group: http://www.biosemantics.org/  Addresses concept identification and disambiguation algorithms, meta-analysis and visualization techniques, and biological applications [interconnect genes and proteins, semi-automated annotations of protein functions.] Medical Informatics department of the Erasmus MC University Medical Center of Rotterdam and the Center for Human and Clinical Genetics of the Leiden University Medical Center

blog: Wikipedia http://en.wikipedia.org/wiki/Blog 

Related terms: blogging, blogosphere, microcontent, nanopublishing, weblog

blogging: In the beginning - say 1994 - the phenomenon now called blogging was little more than the sometimes nutty, sometimes inspired writing of online diaries. These days, there are tech blogs and sex blogs and drug blogs and onanistic teenage blogs. But there are also news blogs and commentary blogs, sites packed with links and quips and ideas and arguments that only months ago were the near-monopoly of established news outlets. Poised between media, blogs can be as nuanced and well-sourced as traditional journalism, but they have the immediacy of talk radio. Andrew Sullivan, "The blogging revolution" Wired Magazine, May 2002 http://www.wired.com/wired/archive/10.05/mustread.html?pg=2

CML Chemical Markup Language: Chemoinformatics

classification: Involves the development and use of a scheme for the systematic organization of knowledge. (Taylor p 576) Arlene Taylor identified three approaches to classification: enumerative, hierarchical, and analytico-synthetic. Enumerative classification attempts to assign headings for every subject and alphabetically enumerates them. Hierarchical classification uses a more philosophical approach based on the inherent organization of the subject being classified, and establishes logical rules for dividing topics into classes, divisions, and subdivisions. Analytico-synthetic classification assigns terms to individual concepts and provides rules for the local cataloger to use in constructing headings for composite subjects. Traditional classification systems in this country are basically enumerative, though many contain some elements of hierarchy and faceting. (Taylor pp 319-321) Amanda Maple, "Faceted Access: A Review of the Literature" Working Group on Faceted Access to Music, Music Library Association Annual Meeting, 10 February 1995 http://theme.music.indiana.edu/tech_s/mla/facacc.rev

Indexing in the library and information management sense. See also classification, classifiers

classification: Can be done manually by human experts or automatically by software of many different types. However, the term as used in the microarray field has a more specific meaning: It always refers to automatic methods, and usually means automatic methods in which the classifier is built by adjusting parameters of a general model. These methods are sometimes called supervised computer-learning methods, in contrast to unsupervised methods, such as clustering.

classifier: A decision procedure that categorizes data into two or more predefined groups. Classifiers are also called predictors. Classifiers usually emit a score that can be interpreted as the likelihood that the data fall into a certain category, rather than just a binary yes/no answer. In many applications it is necessary to convert this likelihood into a yes/no answer, or perhaps a yes/no/maybe answer, typically through a simple thresholding scheme.
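A simple thresholding scheme of the kind described can be sketched as follows. The function name and the cut-off values 0.3 and 0.7 are illustrative assumptions, not taken from any particular classifier; in practice the cut-offs would be tuned on validation data.

```python
def threshold_decision(score, low=0.3, high=0.7):
    """Convert a classifier score in [0, 1] into "yes", "no", or "maybe".

    Scores at or above `high` become "yes", scores at or below `low`
    become "no", and anything in between is left as "maybe".
    Setting low == high collapses the scheme to a binary yes/no answer.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if score >= high:
        return "yes"
    if score <= low:
        return "no"
    return "maybe"


# Three-way answer with the default (hypothetical) cut-offs.
decisions = [threshold_decision(s) for s in (0.9, 0.5, 0.1)]

# Binary answer: a single cut-off at 0.5.
binary = threshold_decision(0.5, low=0.5, high=0.5)
```

Widening the gap between the two cut-offs sends more cases to "maybe", trading fewer wrong calls for more cases that need human review.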

collaborative filtering: Tools that leverage user preferences, patterns, and purchasing behavior to customize organization and navigation systems. [Peter Morville "Software for Information Architects" Argus Center for Information Architecture, 2000]  http://argus-acia.com/strange_connections/current_article.html 

Amazon's recommendations based on what other buyers of a specific title are buying is a familiar example of collaborative filtering.  
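The "buyers of this title also bought" idea can be sketched with simple co-occurrence counts. This is a toy illustration under stated assumptions: the purchase data and titles are invented, and real recommender systems use far more elaborate weighting than raw counts.

```python
from collections import Counter


def recommend(purchases, title, top_n=3):
    """Suggest titles most often bought together with `title`.

    `purchases` maps each buyer to the set of titles they bought.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for basket in purchases.values():
        if title in basket:
            counts.update(item for item in basket if item != title)
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase histories.
purchases = {
    "ann": {"Genomes", "Data Mining", "Proteomics Primer"},
    "bob": {"Genomes", "Data Mining"},
    "cy": {"Genomes", "Statistics"},
    "dee": {"Statistics", "Data Mining"},
}

suggestions = recommend(purchases, "Genomes", top_n=2)
```

Only the behavior of other buyers is used here; nothing about the content of the titles themselves enters the calculation, which is what distinguishes collaborative filtering from content-based approaches.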

Google = about  21,600 July 19, 2002; about 49,300 Oct. 22, 2004 

collaborative metadata: A robust increase in both the amount and quality of metadata is integral to realizing the Semantic Web. The research reported on in this article addresses this topic of inquiry by investigating the most effective means for harnessing resource authors' and metadata experts' knowledge and skills for generating metadata. Jane Greenberg, W. Davenport Robertson, Semantic web construction: An Inquiry of Authors' Views on Collaborative Metadata Generation, International Conference DC 2002, Metadata for e-Communities, Oct. 13-17, 2002, Florence Italy  http://www.bncf.net/dc2002/program/ft/paper5.pdf

Google = about 116 Apr. 24, 2003; about 377 Oct. 22, 2004

communications standards: Pharmacogenomics

communities of practice:  Alliances

competitive intelligence: Business of biopharmaceuticals

computational linguistics: Computational Linguistics, or Natural Language Processing (NLP), is not a new field. As early as 1946, attempts have been undertaken to use computers to process natural language. These attempts concentrated mainly on Machine Translation ... the limited performance of these systems made it clear that the underlying theoretical difficulties of the task had been grossly underestimated, and in the following years and decades much effort was spent on basic research in formal linguistics. Today, a number of Machine Translation systems are available commercially although there still is no system that produces fully automatic high-quality translations (and probably there will not be for some time). Human intervention in the form of pre- and/or post-editing is still required in all cases. Another application that has become commercially viable in the last years is the analysis and synthesis of spoken language, i.e. speech understanding and speech generation. ... An application that will become at least as important as those already mentioned is the creation, administration, and presentation of texts by computer. Even reliable access to written texts is a major bottleneck in science and commerce. The amount of textual information is enormous (and growing incessantly), and the traditional, word-based, information retrieval methods are getting increasingly insufficient as either precision or recall is always low (i.e. you get either a large number of irrelevant documents together with the relevant ones, or else you fail to get a large number of the relevant ones in the collection). Linguistically based retrieval methods, taking into account the meaning of sentences as encoded in the syntactic structure of natural language, promise to be a way out of this quandary. [Computational Linguistics FAQ, Univ. of Zurich, Switzerland, 2001] http://www.ifi.unizh.ch/groups/CL/CL_FAQ.html
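Precision and recall, as used in the passage above, can be computed directly from the set of documents a search returned and the set that were actually relevant. The document identifiers below are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = fraction of retrieved documents that are relevant
    recall    = fraction of relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four documents returned, five truly relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5", "d6", "d7"]

precision, recall = precision_recall(retrieved, relevant)
```

In this toy query half of what came back is irrelevant (low precision) and more than half of the relevant material was missed (low recall), which is the two-sided failure the FAQ attributes to word-based retrieval.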

Google = about  97,100 July 19, 2002, about 283,000 Oct. 22, 2004 

Linguistics, natural language, and computational linguistics Meta- Index, Stanford Univ. US  http://www-nlp.stanford.edu/links/linguistics.html

configurable: Many out-of-the-box solutions claim to be easy to "customize," when in fact they are referring to configuration options, not true customizability.  Manufacturers have distinct challenges, some which can be addressed out of the box, but many of which cannot. Manufacturers also need the ability to capitalize on changing dynamics in the marketplace before their competitors do. That's why it's imperative to understand the differences between configuration and customization and the value of selecting a CRM system that offers the flexibility to adapt and model specific manufacturing business processes.  Why you need to know the difference between Customizable and Configurable CRM, CDC Software podcast, Intelligent Enterprise,  2006  http://www.blogmanno.com/?q=node/33 

Customized implies programming and expense.  Configurable gives users the chance to modify options, without expensive programming. 

contextual data: While proteomic studies initially focused largely on expression and protein identification, progress in these areas drove the demand for more detailed types of proteomic data. Now researchers want information about where specific proteins are expressed, both in terms of tissues and localization within the cell. Information relating proteins to function requires additional details of post-translational modification, and studies of protein interactions have moved beyond just looking at binary interactions to studies of protein complexes.

For both genomics and proteomics, this shift can be characterized as an interest in more contextual data. Enhanced insight into biological context is essential for obtaining a better understanding of how biology actually works, and thus there is now an emphasis to move from genomic and proteomic snapshots to time series data of expression. Such context is of particular value if biological studies are to be translated into medical advances, because of the importance of being able to predict the impact of potential treatments. The integration of genomic and proteomic data with medical conditions, treatment and outcomes becomes another critical type of contextual information. Christina Lingham, Beyond Genome: Thinking Globally, Cambridge Healthtech http://www.beyondgenome.com/download/editorial.pdf

controlled vocabularies: Robin Cover's XML Cover Pages is described as "a collection of references on matters of Subject Classification, Taxonomies, Ontologies, Indexing, Metadata, Metadata Registries, Controlled Vocabularies, Terminology, Thesauri, Business Semantics", 2003 http://xml.coverpages.org/classification.html

A limited number of words or phrases used in an indexing system (subject headings) or database, to ensure reliable, consistent retrieval. Controlled vocabularies have long been used to enhance retrievability and consistency. Ontologies and/or taxonomies certainly sound sexier than "controlled vocabularies", but the three continue to have a good deal in common. Taxonomies add hierarchies, while ontologies make information "machine-understandable" as well as machine-readable.
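The way a controlled vocabulary enforces consistent retrieval can be sketched as a lookup from variant terms to one preferred term. The tiny vocabulary below is hypothetical and chosen only for illustration; a production vocabulary would follow a standard such as ANSI/NISO Z39.19.

```python
# Hypothetical mini-vocabulary: each variant maps to one preferred term.
PREFERRED_TERMS = {
    "cheminformatics": "cheminformatics",
    "chemoinformatics": "cheminformatics",
    "chemical informatics": "cheminformatics",
    "blog": "blog",
    "weblog": "blog",
}


def normalize(term):
    """Return the preferred form of `term`, or None if it is not in the vocabulary."""
    key = " ".join(term.lower().split())  # ignore case and extra whitespace
    return PREFERRED_TERMS.get(key)


def index_terms(raw_terms):
    """Split raw index terms into accepted preferred terms and rejected terms."""
    accepted, rejected = set(), []
    for term in raw_terms:
        preferred = normalize(term)
        if preferred is None:
            rejected.append(term)
        else:
            accepted.add(preferred)
    return sorted(accepted), rejected


accepted, rejected = index_terms(
    ["Chemoinformatics", "chemical  informatics", "Weblog", "folksonomy"]
)
```

Because every variant collapses to a single heading, a search on the preferred term retrieves records that were originally described with any of the variants; rejected terms are candidates for review by a human indexer.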

Google = about 39,700 July 19, 2002; about 85,300 Oct. 22, 2004; about 496,000 Nov 18, 2009

Controlled vocabularies Standards, NISO ANSI/NISO Z39.19-2005 http://www.niso.org/kst/reports/standards/kfile_download?id%3Austring%3Aiso-8859-1=Z39-19-2005.pdf&pt=RkGKiXzW643YeUaYUqZ1BFwDhIG4-24RJbcZBWg8uE4vWdpZsJDs....

Broader terms: ontology, taxonomy Related terms: RDF, semantic web

customizable: Quite labor intensive and can be very expensive.  Compare configurable.

DAML DARPA Agent Markup Language: The goal of the DAML effort is to develop a language and tools to facilitate the concept of the semantic web. http://www.daml.org/  Related term: OIL

DAML + OIL http://www.w3.org/TR/daml+oil-walkthru/

data cleaning, data integration: Algorithms & data analysis

Google = "data cleaning" about 12,200; about 22,500 July 3, 2003
"data integration" about 175,000 July 19, 2002; about 306,000 July 3, 2003; about 817,000 Mar. 22, 2004; about 2,940,000 June 22, 2007

data conversion:   Originally data conversion was primarily a matter of moving text and database files from one medium to another, one hardware platform to another, one operating system environment to another. But as text and database representations became more sophisticated it became apparent that application interoperability was going to be the overriding issue of concern. Company History, Data Conversion Lab  http://www.dclab.com/company_history.asp 

Glossary, DCL Labs http://www.dclab.com/glossary.asp 30+ definitions

data management methods: Algorithms & data analysis covers automated methods; methods in this glossary generally combine human and automated approaches.

data management vocabulary: Ontologies & taxonomies

data mart, data mining, data pipelining, data reduction methods, data warehouse: Algorithms & data analysis

data visualization:  The classical definition of visualization is as follows: the formation of mental visual images, the act or process of interpreting in visual terms or of putting into visual form. A new definition is a tool or method for interpreting image data fed into a computer and for generating images from complex multi-dimensional data sets (1987). Definitions and Rationale for Visualisation, D. Scott Brown, SIGGRAPH, 1999 http://www.siggraph.org/education/materials/HyperVis/visgoals/visgoal2.htm   includes information on data visualization.

Related term: information visualization; Broader term: visualization

databases: Bioinformatics; Databases & software directory

deep web:  Most of the Web's information is buried far down on dynamically generated sites, and standard search engines never find it.  The deep Web is qualitatively different from the surface Web. Deep Web sources store their content in searchable databases that only produce results dynamically in response to a direct request. But a direct query is a "one at a time" laborious way to search.  [Michael K. Bergman "The deep web: surfacing hidden value" White Paper, BrightPlanet, 2000-2002] http://www.brightplanet.com/deepcontent/tutorials/DeepWeb/index.asp  Another version at  http://www.press.umich.edu/jep/07-01/bergman.html

Google = about 10,200 Aug. 17, 2002; about 42,900 Oct. 22, 2004

Related term:  invisible web

description logic: Has existed as a field for a few decades yet only somewhat recently has appeared to transform from an area of academic interest to an area of broad interest. This paper provides a brief historical perspective of description logic developments that have impacted DL usability to include communities beyond universities and research labs. Deborah L. McGuinness. "Description Logics Emerge from Ivory Towers". Stanford Knowledge Systems Laboratory Technical Report KSL-01-08 2001. In the Proceedings of the International Workshop on Description Logics. Stanford, CA, August 2001. http://www.ksl.stanford.edu/people/dlm/papers/dls-emerge-abstract.html

The main effort of the research in knowledge representation is providing theories and systems for expressing structured knowledge and for accessing and reasoning with it in a principled way. Description Logics are considered the most important knowledge representation formalism unifying and giving a logical basis to the well known traditions of Frame-based systems, Semantic Networks and KL-ONE-like languages, Object-Oriented representations, Semantic data models, and Type systems. [Description Logic Knowledge Representation] http://dl.kr.org/

Description Logics Home Page, Patrick Lambrix, Linkoping Univ. Sweden http://www.ida.liu.se/labs/iislab/people/patla/DL/index.html

digital libraries: International digital libraries research is intended to contribute to the fundamental knowledge required to create information systems that can operate in multiple languages, formats, media, and social and organizational contexts. International collaborative research can bring complementary approaches, resources and perspectives to bear on common needs and information technology research challenges. International digital libraries applications testbeds are intended to build operational prototypes for globally distributed, internet-based resources, and to implement these in a variety of applications contexts. The testbeds are expected to advance technologies across the digital libraries lifecycle, focus collective work on organizing domain-specific content, and engage researchers, scholars, students and teachers in enhancing research and knowledge resources in a variety of subject domains. [National Science Foundation, International Digital Libraries Collaborative Research & Applications Testbeds program solicitation, 2002] http://www.nsf.gov/pubs/2002/nsf02085/nsf02085.html

Google = about 197,000 July 19, 2002; about 1,480,000 Oct. 22, 2004 

Directed Acyclic Graph DAG: Ontologies & Taxonomies

disambiguate: Make less ambiguous, clarify, elucidate. 

Google = about  33,100 July 19, 2002; about 65,300 Oct. 22, 2004, about 340,000 Nov 18, 2009

domain expertise: Wikipedia  http://en.wikipedia.org/wiki/Domain_expert   http://en.wikipedia.org/wiki/Domain_knowledge 

Google = about 25,500 Dec. 18, 2002; about 68,500 Oct. 22, 2004; about 785,000 June 22, 2007; about 1,120,000 Nov 18, 2009

drug discovery informatics:

Dublin Core Metadata Initiative: An open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. The original workshop for the Initiative was held in Dublin, Ohio [OCLC] in 1995. http://dublincore.org/

evolvability: Defined by Tim Berners-Lee at http://www.w3.org/Talks/1998/0415-Evolvability/slide3-1.htm
Wikipedia http://en.wikipedia.org/wiki/Evolvability

Google = evolvability  about 8,210  July 19, 2002; about 21,400 Oct. 22, 2004; about 51,000 Nov 18, 2009

See also under interoperability

facet, faceted: Ontologies & taxonomies   

fractal nature of the web: http://www.w3.org/DesignIssues/Fractal.html Tim Berners-Lee, Commentary on architecture, Fractal nature of the web, first draft

Society has to be fractal - people want to be involved on a lot of different levels. The need for things that are local and special will create enclaves. And those will give us the diversity of ideas we need to survive. Tim Berners-Lee, in "The father of the web", Evan Schwartz, Wired Mar. 1997 http://www.wired.com/wired/archive/5.03/ff_father_pr.html

GIS Geographic Information Systems: Maps have traditionally been used to explore the Earth and to exploit its resources. GIS technology is an expansion of cartographic science. Geographic information systems (GIS) technology can be used for scientific investigations, resource management, and development planning. It has enhanced the efficiency and analytic power of traditional mapping. GIS technology is becoming an essential tool in the effort to understand the process of global change. [Is GIS in your future? Boston Chapter, Special Libraries Association meeting, Mar. 12, 2002] http://www.sla.org/chapter/cbos/meetings/fy02/sci_tech.htm

Good Informatics Practices Guidance Document (GIP): A newly drafted comprehensive body of information of regulatory requirements in the form of existing (GLP, GMP, GCP and Part 11) and currently used standards compiled in one reference guide for an IT system of a life science or healthcare environment. http://www.lsit.org/initiatives/gip.php

GUI Graphical User Interface: Computers & computing

granularity: Wikipedia http://en.wikipedia.org/wiki/Granularity 

<jargon, parallel> The size of the units of code under consideration in some context. The term generally refers to the level of detail at which code is considered, e.g. "You can specify the granularity for this profiling tool". The most common computing use is in parallelism where "fine grain parallelism" means individual tasks are relatively small in terms of code size and execution time, "coarse grain" is the opposite. You talk about the "granularity" of the parallelism. The smaller the granularity, the greater the potential for parallelism and hence speed-up but the greater the overheads of synchronisation and communication. FOLDOC 1997  http://www.swif.uniba.it/lei/foldop/foldoc.cgi?granularity
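The fine-grain versus coarse-grain distinction can be illustrated by cutting the same workload into tasks of different sizes. The function name and grain sizes below are arbitrary choices for the sketch; no actual parallel execution is performed.

```python
def split_into_tasks(items, grain_size):
    """Cut a workload into consecutive tasks of at most `grain_size` items each."""
    if grain_size < 1:
        raise ValueError("grain_size must be at least 1")
    return [items[i:i + grain_size] for i in range(0, len(items), grain_size)]


work = list(range(12))  # twelve hypothetical units of work

fine = split_into_tasks(work, 2)    # many small tasks
coarse = split_into_tasks(work, 6)  # few large tasks

# More tasks means more units that could run at the same time,
# but also more hand-offs to schedule and synchronise.
task_counts = (len(fine), len(coarse))
```

With the fine grain, up to six workers could be kept busy at once; with the coarse grain only two could, but there would be a third as many task hand-offs to coordinate.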

The extent to which a system contains separate components (like granules). The more components in a system - or the greater the granularity - the more flexible it is. [Webopedia] http://www.webopedia.com/TERM/g/granularity.html

Choosing different levels of granularity, i.e., imposing different quality criteria on models built by homology from representative, experimentally determined [protein] structures, leads to different numbers of family representatives as targets. NIGMS Structural Genomics Targets Workshop February 11-12, 1999  http://www.nigms.nih.gov/news/meetings/structural_genomics_targets.html

Level of detail seems to be the essence of granularity.

Google = about  250,000 July 19, 2002; about 454,000 Oct. 22, 2004; about 2,170,000 Nov 18, 2009

health information data: Includes clinical data captured during the process of diagnosis and treatment; epidemiological databases that aggregate data about a population; demographic data used to identify and communicate with and about an individual; financial data derived from the care process or aggregated for an organization or population; research data gathered as a part of care and used for research or gathered for specific research purposes in clinical trials; reference data that interacts with the care of the individual or with the healthcare delivery systems, like a formulary, protocol, care plan, clinical alerts or reminders, etc.; and coded data that is translated into a standard nomenclature or classification so that it may be aggregated, analyzed, and compared. [Health Information Management; Professional definitions, Committees on Professional Development, American Health Information Management Association, 1999, 2000] http://www.ahima.org/infocenter/definitions/HIMprofessionaldefinition.htm

health information management: Health information management improves the quality of healthcare by ensuring that the best information is available to make any healthcare decision. Health information management professionals manage healthcare data and information resources. The profession encompasses services in planning, collecting, aggregating, analyzing, and disseminating individual patient and aggregate clinical data. It serves the healthcare industry including: patient care organizations, payers, research and policy agencies, and other healthcare-related industries. [Health Information Management; Professional definitions, Committees on Professional Development, American Health Information Management Association, 1999, 2000] http://www.ahima.org/infocenter/definitions/HIMprofessionaldefinition.htm

Google = about 56,700  Jan. 2, 2003; about 145,000 Oct. 22, 2004; about 980,000 Nov 18, 2009

heterogeneous data:

informatics: The study of the application of computer and statistical techniques to the management of information. In genome projects, informatics includes the development of methods to search databases quickly, to analyse DNA sequence information, and to predict protein sequence and structure from DNA sequence data. ORD Office of Rare Diseases, NIH glossary http://ord.aspensys.com/asp/resources/glossary_a-e.asp#A 

Narrower terms: bioinformatics; cheminformatics; Computers & computing: clinical informatics, molecular informatics; Biomaterials: matinformatics; research informatics; Drug discovery & development: life sciences informatics; Intellectual property & legal: patinformatics; Molecular imaging: image informatics; pharmacoinformatics, pharmainformatics; Proteomics: protein informatics

information -- how much?  How Much Information 2003, School of Information Science and Systems, Univ. of California, Berkeley, 2003 http://www.sims.berkeley.edu/research/projects/how-much-info-2003/index.htm 

information architecture: "Involves the design of organization, labeling, navigation, and searching systems to help people find and manage information more successfully." [Lou Rosenfeld, Peter Morville interview quoted in Mark Hurst, "About Information Architecture", Apr. 3, 2000] http://www.goodexperience.com/columns/040300infoarch.html

Google = about 132,000 July 19, 2002; about 258,000 July 3, 2003; about 622,000 Oct. 22, 2004; about 5,760,000 Nov 18, 2009

Information architecture glossary, Kat Hagedorn, Argus Associates, 2000, 60 + definitions http://argus-acia.com/white_papers/iaglossary.html

information ecology: Wikipedia http://en.wikipedia.org/wiki/Information_ecology 

The Information Ecology group (formerly the Physical Language Workshop) explores ways to connect our physical environments with information resources. Through the use of low-cost, ubiquitous technologies, we are creating seamless and pervasive ways to interact with our information—and with each other. We focus on projects that harness the ecology of consumer electronics and sensor devices—present and future—to more smoothly mediate the boundaries between the physical and information worlds we inhabit.  MIT Media Lab Design Ecology/Information Ecology  2009   http://eco.media.mit.edu/ 

Google = about 11,100 Oct. 22, 2004; about 70,200 Nov 18, 2009

information extraction: Automated ways of extracting unstructured or partially structured information from machine readable files. Compare with information retrieval.
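As a small illustration, the partially structured "Google =" hit-count lines used throughout this glossary can be turned into records automatically. The regular expression below is an assumption about how those lines are laid out and will miss variants it does not anticipate.

```python
import re

# Matches fragments such as "about 43,100 July 19, 2002" or
# "about 147,000 as of Sept. 23, 2004" (count, optional "as of", date).
COUNT_PATTERN = re.compile(
    r"about\s+(\d[\d,]*)\s+(?:as of\s+)?([A-Z][a-z]+\.?\s+\d{1,2},\s+\d{4})"
)


def extract_counts(line):
    """Return a list of (hit_count, date_text) records found in `line`."""
    records = []
    for count_text, date_text in COUNT_PATTERN.findall(line):
        records.append((int(count_text.replace(",", "")), date_text))
    return records


sample = "Google = about 43,100 July 19, 2002; about 590,000 Nov 18, 2009"
records = extract_counts(sample)
```

Information retrieval would stop at finding the lines that mention hit counts; extraction goes one step further and pulls the individual values out into a form that can be sorted, compared, or charted.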

Google = about 43,100 July 19, 2002; about 590,000 Nov 18, 2009

Related terms: natural language processing, term extraction

information harvesting: See under Knowledge Discovery in Databases KDD

Google = about 871 July 19, 2002; about 1,230 July 3, 2003; about 1,730 Oct. 22, 2004; about 1,140,000 June 22, 2007

information integration: Our research group is developing intelligent techniques to enable rapid and efficient information integration. The focus of our research has been on the technologies required for constructing distributed, integrated applications from online sources. This research includes: Information Extraction: Machine learning techniques for extracting information from online sources; Source Modeling: Constructing a semantic model of wrapped sources so that they can be automatically integrated with other sources; Record Linkage: Learning how to align records across sources; Data Integration: Generating plans to automatically integrate data across sources; Plan Execution: Representing, defining, and efficiently executing integration plans in the Web environment; Constraint-based Integration: Interactive constraint-based planning and integration for the Web environment. Information Integration Research Group, Intelligent Systems Division, Information Sciences Institute (ISI), University of Southern California http://www.isi.edu/integration/

Google = about 4,430,000 July 3, 2003; about 1,080,000 June 22, 2007; about 1,160,000 Nov 18, 2009

information management:  Information services of various kinds are fundamental to the discovery, development and use of medicines. Within the pharmaceutical industry, often regarded as the epitome of the 'information intensive' industry, research information units provide both external and internal information provision and management to discovery and development programmes, while medical information units provide in- depth information on the company's products to external doctors, pharmacists, etc., and commercial information units handle information on competitors, marketing data, etc. Additionally, information personnel are involved in activities such as records management and archiving, regulatory affairs, data administration, IT support, and many more. Within the NHS [National Health Service, UK] , Drug Information Pharmacists provide information services on effective use of medicines to all healthcare professions, and are also involved in databases compilation, records management, current awareness etc. The move towards evidence- based medicine, with consequent need for evaluation and presentation of information, is of obvious importance to this group. Other sectors with a heavy reliance on the handling pharmaceutical information and knowledge include publishing, database production, software services, and consultancy of varied kinds.  [MSc in Pharmaceutical Information Management, City Univ. London, UK, Dept of Information Science,  Introduction, 2002 ]http://www.soi.city.ac.uk/organisation/is/teaching/pim/

Narrower term: health information management

Google = about 1,470,000 Jan. 2, 2003; about 4,200,000 Oct. 22, 2004; about 19,300,000 Nov 18, 2009

information overload: Biomedicine is in the middle of revolutionary advances. Genome projects, microassay methods like DNA chips, advanced radiation sources for crystallography and other instrumentation, as well as new imaging methods, have exceeded all expectations, and in the process have generated a dramatic information overload that requires new resources for handling, analyzing and interpreting data. Delays in the exploitation of the discoveries will be costly in terms of health benefits for individuals and will adversely affect the economic edge of the country. [Opportunities in Molecular Biomedicine in the Era of Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana- Champaign] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html

Many of today's problems stem from information overload and there is a desperate need for innovative software that can wade through the morass of information and present visually what we know.  The development of such tools will depend critically on further interactions between the computer scientists and the biologists so that the tools address the right questions, but are designed in a flexible and computationally efficient manner.  It is my hope that we will see these solutions published in the biological or computational literature.  Richard J. Roberts, The early days of bioinformatics publishing, Bioinformatics 16 (1): 2-4, 2000

"Information overload" is not an overstatement these days. One of the biggest challenges is to deal with the tidal wave of data, filter out extraneous noise and poor quality data, and assimilate and integrate information on a previously unimagined scale

Wikipedia http://en.wikipedia.org/wiki/Information_overload 

Google = about  118,000 July 19, 2002; about 249,000 Oct. 22, 2004; about 1,480,000 Nov 18, 2009

Where's my stuff? Ways to help with information overload, Mary Chitty, SLA presentation June 10, 2002, Los Angeles CA

information retrieval:  Wikipedia http://en.wikipedia.org/wiki/Information_retrieval 

information theory: Algorithms & data analysis

information visualization: The direct visualization of a representation of selected features or elements of complex multi- dimensional data. Data that can be used to create a visualization includes text, image data, sound, voice, video - and of course, all kinds of numerical data. Our visual analysis systems also provide the tools to interact with the data that has been visualized so that users can explore, discover and learn. Users do not look at static images, but can subset the data, run queries, do time sequence studies and create categories and correlations of data type. [Pacific Northwest National Lab, About Visualization at PNNL, 1999] http://www.pnl.gov/infoviz/

Wikipedia http://en.wikipedia.org/wiki/Information_visualization 

Google = about 28,100 July 19, 2002; about 94,200 Oct. 22, 2004; about 1,330,000 Nov 18, 2009

Information visualization resources on the web, 2002 http://graphics.stanford.edu/courses/cs348c-96-fall/resources.html

Related term: data visualization; Broader term: visualization

institutional repositories: A new strategy that allows universities to apply serious, systematic leverage to accelerate changes taking place in scholarship and scholarly communication, both moving beyond their historic relatively passive role of supporting established publishers in modernizing scholarly publishing through the licensing of digital content, and also scaling up beyond ad-hoc alliances, partnerships, and support arrangements with a few select faculty pioneers exploring more transformative new uses of the digital medium. Clifford Lynch, Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age, ARL Bimonthly Report 226, Feb. 2003 http://www.arl.org/newsltr/226/ir.html

DSpace, MIT http://www.dspace.org/

integration: Bioinformatics

interoperability: Ontologies & taxonomies

invisible web:  For this study, we have avoided the term "invisible Web" because it is inaccurate. The only thing "invisible" about searchable databases is that they are not indexable nor able to be queried by conventional search engines. http://www.brightplanet.com/deepcontent/tutorials/DeepWeb/index.asp

Those parts of the web which are inaccessible to current search engines. A straightforward example was PubMed/Medline (until Google started indexing it). You still can't usually access proprietary (fee-based) databases such as Thomson Dialog or Lexis-Nexis except directly. Until fairly recently PDF documents and PowerPoint slides were inaccessible to search engines.

Google = about 17,300 July 19, 2002; about 278,000 Oct. 22, 2004; about 802,000 Nov 18, 2009

Invisible or Deep Web: What it is, How to find it, and Its inherent ambiguity  http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html

Related terms: deep web, semantic web

just in time information: 90,200 websites were found with this phrase by Google on May 23, 2007. An increasing need as we are deluged with information and data -- and still need time to reflect, discuss and think about what all these mean.

Google = about 2,900 March 14, 2002, about 3,400 July 19, 2002; about 51,600 Feb. 21, 2006; about 88,400 May 7, 2007; about 781,000 Nov 18, 2009

Just-In-Time Information Retrieval. Bradley J. Rhodes. Ph.D. Dissertation, MIT Media Lab, May 2000. Just in time retrieval agents Bradley J. Rhodes http://www.research.ibm.com/journal/sj/393/part2/rhodes.html

Related terms: information overload, remembrance agents; Bioinformatics modularity

Knowledge Discovery in Databases (KDD): Algorithms & data analysis

knowledge integration: Wikipedia http://en.wikipedia.org/wiki/Knowledge_integration 

Related terms: ontologies, semantics

knowledge management:  An organization's collective knowledge - and the ability to access it - comprises a key corporate asset. Smart organizations know that to maintain competitive advantage, they need to manage their data, information, and knowledge effectively and systematically. Knowledge management involves much more than compiling data and retrieving information. It should be seen as an overarching concept that combines a management philosophy with data warehousing, workflow strategies, database management, and knowledge distribution in a network computing environment. [William A. Woods "Knowledge Management Needs Effective Search Technology" Sun Journal] http://www.sun.com/dot-com/sunjournal/V2N1/03_feat2a.html

Wikipedia http://en.wikipedia.org/wiki/Knowledge_management 

Google = about 826,000 July 19, 2002; about 3,520,000 Oct. 22, 2004; about 11,000,000 Nov 18, 2009

Knowledge Management, FDA, 2004 http://www.fda.gov/cdrh/strategic/km.html 

Virtual Library: Knowledge Management, May 2000   http://www.brint.com/km/ Definition, articles, white papers, interviews, business and technology library, periodicals and publications, “out of box thinking”, “movers and shakers”, “think tank”, calendar of events, emerging topics.

Knowledge Management definitions, Charlie Matthews, VisualInterconnections, 2002 http://www.visualinterconnections.com/CEM/definitions.htm

KM Glossary, GOTCHA, Univ. of California Berkeley, 1999  About 50 terms. http://sims.berkeley.edu/courses/is213/s99/Projects/P9/web_site/glossary.htm

Related terms: ontologies, paraphrase problem, taxonomies

knowledge risk: Business of biopharmaceuticals

laboratory informatics: A relatively new field that aims to expedite the exchange of laboratory data via electronic data exchange. Laboratory informatics specialists design standards and systems to support the acquisition, retrieval and communication of test results and other laboratory data.  Information systems are as critical to public health laboratories as instrumentation and reagents. Association of Public Health Laboratories, 2008  http://www.aphl.org/aphlprograms/informatics/Pages/defofinformatics.aspx 

Related term: Drug discovery & development  LIMS

Google = about 1,250 Dec. 31, 2002; about 3,000 Oct. 22, 2004; about 31,900 Nov 18, 2009

lexical semantics: http://en.wikipedia.org/wiki/Lexical_semantics 

lexicon: A machine-readable dictionary that may contain a good deal of additional information about the properties of the words, notated in a form that parsers can utilize. [Bob Futrelle, A brief introduction to NLP, BIONLP.org, Computer Science, Northeastern Univ., US, 2002]  http://www.ccs.neu.edu/home/futrelle/bionlp/intro.html

A linguistics term (words and their definitions), an artificial intelligence term.  Sometimes a synonym for glossary or dictionary.

Google = about 768,000 July 19, 2002; about 1,960,000 Oct. 22, 2004

life sciences informatics: Informatics are essential at every step of genomics- based drug discovery and development. The commercial landscape of life sciences information technology has changed dramatically in the last few years. Bioinformatics, in particular, has gone through a dramatic boom/bust. While IT companies are looking to the drug discovery and development arena as a new market opportunity, pharmaceutical companies  are faced with rising pressure to reduce (or at least control) costs, and have a growing need for new informatics tools to help manage the influx of data from genomics, and turn that data into tomorrow's drugs. Key IT tools, such as high- performance computing, Web services, and grids, are being used to improve the speed and efficiency of drug discovery and development. True breakthroughs are still lacking, particularly in key areas such as gene prediction, data mining, protein structure modeling and prediction, and modeling of complex biological systems. However, most experts agree that IT and bioinformatics are essential to reaching the improved productivity the pharmaceutical industry craves.  

linked data:  Linked Data is about using the Web to connect related data that wasn’t  previously linked, or using the Web to lower the barriers to linking data currently linked using other methods.   http://linkeddata.org/

Linked data glossary http://linkeddata.org/glossary  

machine-readable: See under metadata

Google = about 303,000 July 19, 2002; about 535,000 Oct. 22, 2004

machine-understandable: See under metadata

Google = about 3,730 July 19, 2002; about 8,950 July 14, 2004

markup languages: Computers & computing

Google = about 639,000 Aug. 9, 2002; about 170,000 Oct. 22, 2004

mash-up: Wikipedia http://en.wikipedia.org/wiki/Mashup_(web_application_hybrid

Google = about 22,100,000 Oct. 27, 2006; about 20,400,000 Nov 18, 2009

Medbiquitous Consortium: Technology standards based on XML and web services.  http://www.medbiq.org/index.html 

medical informatics: The field of information science concerned with the analysis and dissemination of medical data through the application of computers to various aspects of health care and medicine. MeSH, 1987

Medical informatics has to do with all aspects of understanding and promoting the effective organization, analysis, management, and use of information in health care. While the field of medical informatics shares the general scope of these interests with some other health care specialties and disciplines, medical informatics has developed its own areas of emphasis and approaches that have set it apart from other disciplines and specialties. For one, a common thread through medical informatics has been the emphasis on technology as an integral tool to help organize, analyze, manage, and use information. In addition, as professionals involved at the intersection of information and technology and health care, those in medical informatics have historically tended to be engaged in the research, development, and evaluation side of things, and in studying and teaching the theoretical and methodological underpinnings of data applications in health care. However, today medical informatics also counts among its profession many whose activities are focused on dimensions that include the administration and everyday collection and use of information in health care. What is Medical Informatics? History of Medical Informatics, AMIA American Medical Informatics Association http://www.amia.org/history/what.html

Google = about 163,000 July 19, 2002; about 479,000 Oct. 22, 2004; about 696,000 Oct. 3, 2005; about 1,690,000 Nov 18, 2009

metadata: Could elevate the status of the web from machine- readable to something we might call machine- understandable. Metadata is "data about data" or specifically in our current context "data describing web resources." The distinction between "data" and "metadata" is not an absolute one; it is a distinction created primarily by a particular application ("one application's metadata is another application's data"). [W3C, "Introduction to RDF Metadata" 1997] http://www.w3.org/TR/NOTE-rdf-simple-intro

Metadata is machine understandable information for the web. The W3C Metadata Activity addressed the combined needs of several groups for a common framework to express assertions about information on the Web, and was superseded by the W3C Semantic Web Activity.  [W3C, Metadata and Resource Description, W3C Technology and Society Domain, 2001] http://www.w3.org/Metadata/

Information about data that enables intelligent, efficient access and management of data. … metadata is always less than the data. [Robyne M. Sumpter “Whitepaper on Data Management” Lawrence Livermore National Laboratory, February 10, 1994] http://www.llnl.gov/liv_comp/metadata/papers/whitepaper-draft.html  

Google = about  1,640,000 July 19, 2002; about 4,850,000 Oct. 22, 2004; about 25,600,000 May 9, 2005;  about 62,700,000 May 7, 2007

Narrower terms: Dublin Core Metadata Initiative, faceted metadata; Related terms: interoperability, RDF, semantic web

micro-theories: An ontology about a specific domain, that fits within, and for the most part is consistent with, an ontology with a broader scope. For example, structural biology fits within the larger context of biology. Structural biology will have its own terminology and specific algorithms that apply within the specific domain, but may not be useful or identical to, for example, the genome community. [Lawrence Berkeley Lab "Advanced Computational Structural Genomics" Glossary]

Google = about 953 July 19, 2002; about 8,670 Oct. 22, 2004

modularity: Bioinformatics

molecular informatics:  Molecular Informatics presents methodological innovations that will lead to a deeper understanding of ligand-receptor interactions, macromolecular complexes, molecular networks, design concepts and processes that demonstrate how ideas and design concepts lead to molecules with a desired structure or function, preferably including experimental validation.

The journal's scope includes but is not limited to the fields of drug discovery and chemical biology, protein and nucleic acid engineering and design, the design of nanomolecular structures, strategies for modeling of macromolecular assemblies, molecular networks and systems, pharmaco- and chemogenomics, computer-assisted screening strategies, as well as novel technologies for the de novo design of biologically active molecules.  Molecular Informatics, Wiley 2010 forward, was QSAR & Combinatorial Science http://www.wiley-vch.de/publish/en/journals/alphabeticIndex/7777/?jURL=http://www.wiley-vch.de:80/vch/journals/2022/molinf/index.html 

Google = about  2,580 July 19, 2002; about 4,410 Oct. 22, 2004; about 342,000 Nov 18, 2009

molecular information theory: Algorithms & data analysis

nanopublishing: The word nanopublishing was coined by Jeff Jarvis, creative director of the US company Advance Publications Inc. Jarvis first used the term after being shown Gawker, a New York media gossip weblog launched by Nick Denton in December 2002. MacMillan Word of the Week Archive 2005  http://www.macmillandictionaries.com/wordoftheweek/archive/050207-nanopublishing.htm 

START Natural Language Question Answering System, InfoLab Group, Computer Science and Artificial Intelligence Lab, MIT  http://www.ai.mit.edu/projects/infolab/start-system.html 

OIL Ontology Inference Layer: A proposal for a web- based representation and inference layer for ontologies, which combines the widely used modelling primitives from frame- based languages with the formal semantics and reasoning services provided by description logics. It is compatible with RDF Schema (RDFS), and includes a precise semantics for describing term meanings (and thus also for describing implied information). http://www.ontoknowledge.org/oil/

object based ontologies:  Ontologies & taxonomies

Office of the National Coordinator for Health Information Technology (ONC): Provides leadership for the development and nationwide implementation of an interoperable health information technology infrastructure to improve the quality and efficiency of health care and the ability of consumers to manage their care and safety. http://www.hhs.gov/healthit/

ontology, ontologies:  The word "ontology" seems to generate a lot of controversy in discussions about AI  [artificial intelligence]. It has a long history in philosophy, in which it refers to the subject of existence. ... In the context of knowledge sharing, I use the term ontology to mean a specification of a conceptualization. That is, an ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents. This definition is consistent with the usage of ontology as set-of-concept-definitions, but more general. And it is certainly a different sense of the word than its use in philosophy. What is important is what an ontology is for. My colleagues and I have been designing ontologies for the purpose of enabling knowledge sharing and reuse. In that context, an ontology is a specification used for making ontological commitments. ... Notes: 1) Ontologies are often equated with taxonomic hierarchies of classes, class definitions, and the subsumption relation, but ontologies need not be limited to these forms. Tom Gruber, Stanford Univ. "What is an ontology?", 2001 http://www-ksl.stanford.edu/kst/what-is-an-ontology.html     More in Ontologies & Taxonomies

Narrower terms: bottom- up ontologies, biomedical ontologies, common ontology, descriptive ontology, domain ontology, dynamic ontology, heavyweight ontologies, lightweight ontologies, logic based ontologies, micro- theories, middle ontologies, mixed ontologies,  taxonomies, natural language ontologies, navigational ontology, object based ontologies, orthogonal ontologies, pure ontologies, reusable ontologies, shared ontologies, simple ontologies, structured ontology, top- down ontology, upper ontologies; Functional genomics Gene OntologyTM GO;  

Related terms: interoperability, metadata, OIL Ontology Inference Layer, ontological commitment, ontology annotation tools, ontology editors, ontology evolution, ontology interoperability, RDF, semantic web, web ontology language; Microarrays Ontology Working Group

organizational informatics: A field which studies the development and use of computerized information systems and communication systems in organizations. It includes social studies of their conception, design, effective implementation within organizations, maintenance, use, organizational value, conditions that foster risks of failures, and their effects for people and an organization's clients. It is an intellectually rich and practical research area. "Social Informatics" Indiana Univ, School of Library & Information Science  http://www.slis.indiana.edu/SI/oi1.html

Narrower term: social informatics

Google = about 153 July 19, 2002; about 211 Oct. 22, 2004

Related term:  knowledge management

pattern, pattern language:  Patterns, discussion FAQ http://g.oswego.edu/dl/pd-FAQ/pd-FAQ.html 

portal: An entry or starting point on the web, with a mixture of content and services, usually capable of personalization.

Narrower term: web portal

precision: Percentage of the material retrieved by a specific query or search statement that is actually related to it; a high-precision search excludes most unrelated material.

Related terms: Genetic testing analytical specificity, clinical specificity. Compare recall.

query contraction: Needed when a search engine retrieves thousands of citations. May consist of additional terms (Boolean AND) or different terms (Boolean OR).

Google = about  26 July 19, 2002; about 130 Oct. 22, 2004

query expansion: Adding new and/or different terms to a search statement (particularly when a search engine or database retrieves no hits). Often uses Boolean OR.

Google = about 7,500 July 19, 2002; about 21,300 Oct. 22, 2004

Related terms: ontologies, taxonomies
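
A minimal sketch of query expansion with Boolean OR follows. The synonym table is a hypothetical stand-in for a thesaurus or taxonomy, and the function name is made up for this example; an actual search system would draw its alternatives from a controlled vocabulary.

```python
# Hypothetical synonym table standing in for a thesaurus or taxonomy.
SYNONYMS = {
    "cancer": ["neoplasm", "tumor"],
    "drug": ["pharmaceutical"],
}


def expand_query(terms, synonyms=SYNONYMS):
    """OR each term with its synonyms, then AND the groups together."""
    groups = []
    for term in terms:
        alternatives = [term] + synonyms.get(term, [])
        if len(alternatives) > 1:
            # Several spellings of one concept: any of them may match.
            groups.append("(" + " OR ".join(alternatives) + ")")
        else:
            groups.append(term)
    return " AND ".join(groups)


expanded = expand_query(["cancer", "drug", "trial"])
print(expanded)
```

Terms without an entry in the table pass through unchanged, so the expanded query is never narrower than the original.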

RDF Resource Description Framework: Integrates a variety of web- based metadata activities including sitemaps, content ratings, stream channel definitions, search engine data collection (web crawling), digital library collections, and distributed authoring, using XML as an interchange syntax. The RDF specifications provide a lightweight ontology system to support the exchange of knowledge on the Web. [W3C, Semantic Web Activity: Resource Description Framework (RDF) Mar. 2001] http://www.w3.org/RDF/

Related term: knowledge management
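
RDF statements are commonly described as subject, predicate, object triples. The sketch below models a few such statements as plain Python tuples to show the idea. It does not use an RDF library or the XML interchange syntax; the property names come from the Dublin Core element namespace, and the particular statements are illustrative assumptions.

```python
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Each statement is (subject, predicate, object). Values are illustrative.
triples = [
    ("http://www.w3.org/RDF/", DC + "title",
     "Resource Description Framework (RDF)"),
    ("http://www.w3.org/RDF/", DC + "publisher", "W3C"),
    ("http://www.w3.org/Metadata/", DC + "publisher", "W3C"),
]


def objects(subject, predicate, store=triples):
    """Values asserted for one property of one resource."""
    return [o for s, p, o in store if s == subject and p == predicate]


def subjects(predicate, obj, store=triples):
    """Resources that carry a given property value."""
    return [s for s, p, o in store if p == predicate and o == obj]


print(objects("http://www.w3.org/RDF/", DC + "title"))
print(subjects(DC + "publisher", "W3C"))
```

Because every statement has the same three-part shape, descriptions from different sources can be pooled into one list and queried together.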

RSS [Really Simple Syndication] feeds: A Web content syndication format based on XML. Cathleen Moore, Search engines target weblogs, InfoWorld, Mar. 17, 2003 http://www.infoworld.com/article/03/03/17/HNblogs_1.html

Newsreaders http://directory.google.com/Top/Reference/Libraries/Library_and_Informat...e

RSS 2.0 specifications, Dave Winer http://blogs.law.harvard.edu/tech/rss/
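
Because RSS is XML, a feed can be read with a general-purpose XML parser. The sketch below uses Python's standard xml.etree.ElementTree module on a made-up RSS 2.0 document; the feed content and the function name are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed used only for this example.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.org/</link>
    <description>Made-up feed for illustration</description>
    <item>
      <title>First item</title>
      <link>http://example.org/1</link>
    </item>
    <item>
      <title>Second item</title>
      <link>http://example.org/2</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return (title, link) for every item in the feed's channel."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


items = read_items(SAMPLE_FEED)
print(items)
```

A newsreader does essentially this for each subscribed feed, then shows the items it has not seen before.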

recall: The percentage of applicable material retrieved by a specific query or search statement. 

Compare precision. Related term: Genetic testing  sensitivity 
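
Precision and recall can be computed from the set of documents a search retrieved and the set that is actually relevant. The sketch below uses the usual information-retrieval formulas, returned as fractions rather than percentages; the document identifiers are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) as fractions between 0 and 1."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents retrieved, three relevant overall, two in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p, r)
```

Here half of what was retrieved is relevant (precision 2/4) and two thirds of the relevant material was found (recall 2/3). Broadening a query tends to raise recall at the cost of precision, and narrowing it does the reverse.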

regulated information systems: Drug approvals

relevance: Percentage of truly related material retrieved by a specific query or search statement. 

Related terms: precision; Genetic testing & diagnostics analytical specificity, clinical specificity. Compare recall

remembrance agents: A set of applications that watch over a user's shoulder and suggest information relevant to the current situation. While query- based memory aids help with direct  recall, remembrance agents are an augmented associative memory. [Bradley Rhodes, Remembrance Agents Because serendipity is too important to be left to chance..., 2001]  http://rhodes.www.media.mit.edu/people/rhodes/RA/

Google = about 673 July 19, 2002; about 549 Oct. 22, 2004

Related terms: collaborative filtering, just in time information

research informatics: Research

resourceome: -Omes & -Omics

Rosetta: A systems- level design language developed to address requirements specification for systems- on- chip designs. Rosetta specifically addresses problems associated with heterogeneity and complexity in current systems. Specifically, Rosetta allows designers to develop and integrate specifications written in multiple semantic models to provide language and semantic support for concurrent engineering of electronic systems.  Accellera Rosetta Standards Committee Homepage, EDA Industry Working Groups, 2002  http://www.eda.org/slds-rosetta/

SOAP Simple Object Access Protocol:  A lightweight protocol for exchange of information in a decentralized, distributed environment. [SOAP, W3C 1.1, work in progress] http://www.w3.org/TR/SOAP/

semantic: Ontologies & taxonomies

social informatics:  Social Informatics (SI) refers to the body of research and study that examines social aspects of computerization, including the roles of information technology in social and organizational change, the uses of information technologies in social contexts, and the ways that the social organization of information technologies is influenced by social forces and social practices http://rkcsi.indiana.edu/ 

The term "Social Informatics" emerged from a series of lively conversations in February and March 1996 among scholars with an interest in advancing critical scholarship about the social aspects of computerization, including Phil Agre, Jacques Berleur, Brenda Dervin, Andrew Dillon, Rob Kling, Mark Poster, Karen Ruhleder, Ben Shneiderman, Leigh Star and Barry Wellman. As the conversation developed, it became clear that labels that could energize scholars in one sub- community could readily turn off participants in other communities. Various participants preferred different labels; a sufficient consensus emerged around "Social Informatics" that it can serve as a working label.  ["Conceptions of social informatics" Indiana Univ., School of Library and Information Science, 2002] http://www.slis.indiana.edu/SI/concepts.html

A serviceable working conception of "social informatics" is that it identifies a body of research that examines the social aspects of computerization. A more formal definition is "the interdisciplinary study of the design, uses and consequences of information technologies that takes into account their interaction with institutional and cultural contexts." ... Social informatics has been a subject of systematic analytical and critical research for the last 25 years. Unfortunately, social informatics studies are scattered in the journals of several different fields, including computer science, information systems, information science and some social sciences. Each of these fields uses somewhat different nomenclature. This diversity of communication outlets and specialized terminologies makes it hard for many non- specialists (and even specialists) to locate important studies. [Rob Kling, What is social informatics and why does  it matter? D-Lib 5(1): Jan. 1999] http://www.dlib.org/dlib/january99/kling/01kling.html 

Social informatics HomePage http://www.slis.indiana.edu/SI/

Red Rock Eater News Service, Phil Agre, UCLA, US  http://polaris.gseis.ucla.edu/pagre/rre.html 

structure:  In a biological or anatomical context, the term structure is associated with two distinct concepts (meanings): 1. a material object generated as a result of coordinated gene expression, which necessarily consists of parts (e.g., hemoglobin molecule, cell, heart, human body); and 2. the manner of organization or interrelation of the parts that constitute a structure specified by the first definition (i.e., the structure of a structure). Both definitions emphasize the critical need for declaring the principles according to which units of organization can be defined in order to be able to state what is 'whole' and what is 'part'. Specifying the manner in which parts interrelate must satisfy two requirements: 1. to determine the kinds of parts of which various structures may be constituted; and 2. to state the manner of spatial organization of parts by describing their boundaries, continuities and attachments, as well as their location, orientation and spatial adjacencies in terms of qualitative coordinates (in addition to the quantitative geometric coordinates, which are embedded in the Visible Human data sets). [Cornelius Rosse, et al., Visible Human, Know Thyself: The Digital Anatomist Dynamic Structural Abstraction, National Library of Medicine, US] http://www.nlm.nih.gov/research/visible/vhpconf2000/AUTHORS/ROSSE/TEXTINDX.HTM

Related terms: Cell biology, Expression. Compare unstructured.

subsumption: http://ai.eecs.umich.edu/cogarch0/subsump/ 

Google = about 30,800 July 19, 2002; about 80,500 Oct. 22, 2004; about 159,000 May 2, 2005

syntactic, syntax: Ontologies & taxonomies

taxonomies, taxonomy:  Frustrations with search engines and information retrieval (and information overload) have led to increased interest in specialized taxonomies. A taxonomy is a form of controlled vocabulary with hierarchical relationships (broader terms, narrower terms), which provide additional suggestions for browsing, as do lateral relationships (related terms) and preferred terms. While there is theoretical interest in natural language processing, a very small percentage of web search engine queries actually use natural language processing successfully. More in Ontologies & Taxonomies

Narrower terms: bottom-up taxonomies, controlled vocabularies, descriptive taxonomies, domain taxonomies, dynamic taxonomies, integrated taxonomy, lightweight taxonomies, morphological taxonomies, navigational taxonomies, orthogonal taxonomies, shared taxonomies, top- down taxonomy; Cancer genomics , diagnostics & Therapeutics molecular taxonomies Phylogenomics molecular taxonomy, phylogenetic taxonomy; 

Related terms: classifiers, query expansion; Broader/narrower? term: ontologies

See also FAQ question # 4 which has more about taxonomies.
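
The broader term / narrower term relationships of a taxonomy can be held in a simple mapping. The sketch below uses a few narrower-term links that appear in this glossary; the data structure and function names are assumptions for illustration, not a description of how this site is built.

```python
# Narrower-term links taken from entries in this glossary.
NARROWER = {
    "information management": ["health information management"],
    "organizational informatics": ["social informatics"],
    "ontologies": ["taxonomies", "micro-theories"],
    "taxonomies": ["controlled vocabularies"],
}


def all_narrower(term, narrower=NARROWER):
    """Collect every term below `term`, depth first."""
    found = []
    for child in narrower.get(term, []):
        found.append(child)
        found.extend(all_narrower(child, narrower))
    return found


def broader(term, narrower=NARROWER):
    """Terms that list `term` directly as a narrower term."""
    return [parent for parent, children in narrower.items()
            if term in children]


print(all_narrower("ontologies"))
print(broader("social informatics"))
```

Walking down the hierarchy yields browsing suggestions, and the same walk supplies candidate terms for query expansion.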

term extraction: Robert Futrelle, Northeastern Univ., 2001 http://www.ccs.neu.edu/home/futrelle/bionlp/psb2001/psb01-tutorial-bib1.htm

Google = about 49,900 Nov 18, 2009

Related term: information extraction

term mining:  Term Mining in Biomedicine, Sophia Ananiadou - University of Manchester, 2007 http://talks.cam.ac.uk/talk/index/6769 

Google = about 1,990 June 16, 2003; about 2,980 Oct. 22, 2004; about 40,100 June 22, 2007

text categorisation: See Algorithms & data analysis under support vector machines

Google = about 902; "text categorization" about 9,220 July 19, 2002; about 27,100 Oct. 22, 2004

text mining:  Usually data mining technologies mine knowledge from data with well-formed schemes such as relational tables. But, text data don't have such scheme, and information is described freely in the documents. Therefore, we focus on Natural Language Processing (NLP) technologies to extract such information. Using NLP technologies, documents are transformed into a collection of concepts, described using terms discovered in the text.

Usually, "text mining" is used to indicate a text search technique. But, we think of text mining as having more functions. Text mining technologies extract more information than just picking up keywords from texts: facts, author's intentions, their expectations, and their claims.  Tokyo Research Lab, IBM, Text Mining  http://www.trl.ibm.com/projects/textmining/index_e.htm 

Using data mining on unstructured data, such as the biomedical literature.  
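
As a rough illustration of the first step described above, turning documents into collections of terms found in the text, here is a minimal sketch. It covers only simple term counting, not the richer extraction of facts, intentions and claims that the IBM description mentions. The stop-word list, the function names and the sample sentences are all assumptions made for this example.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "with"}


def extract_terms(text):
    """Lower-case the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z][a-z0-9-]*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_profile(documents):
    """Represent each document as a Counter of the terms found in it."""
    return [Counter(extract_terms(doc)) for doc in documents]


# Made-up sentences standing in for biomedical abstracts.
docs = [
    "BRCA1 mutations are associated with breast cancer.",
    "The protein p53 is mutated in many cancers; p53 regulates the cell cycle.",
]

profiles = term_profile(docs)
for profile in profiles:
    print(profile.most_common(3))
```

Term profiles of this kind are a common starting point for the categorization and clustering functions listed in the ComputerWorld glossary cited below.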

Text Mining Glossary, ComputerWorld, 2004 http://www.computerworld.com/databasetopics/businessintelligence/story/0,10801,93967,00.html Includes Categorization, clustering, extraction, keyword search, natural language processing, taxonomy, and visualization.

Related terms:  natural language processing; Algorithms & data analysis: support vector machines

Google = about 20,600 July 19, 2002; about 39,300 July 3, 2003; about 113,000 Oct. 22, 2004; about 1,110,000 June 22, 2007

thesaurus, thesauri: See under controlled vocabulary

Google = thesaurus about 2,760,000, thesauri about 448,000 July 19, 2002; thesaurus about 6,270,000 Oct. 22, 2004

NISO Z39.19 Standard for Structure and Organization of Information Retrieval Thesauri  http://www.niso.org/standards/resources/Z39-19.html

UDDI: Business of biopharmaceuticals

UMLS Unified Medical Language System: In 1986, the National Library of Medicine (NLM) began a long-term research and development project to build a Unified Medical Language System® (UMLS®). The purpose of the UMLS is to aid the development of systems that help health professionals and researchers retrieve and integrate electronic biomedical information from a variety of sources and to make it easy for users to link disparate information systems, including computer-based patient records, bibliographic databases, factual databases, and expert systems. The UMLS project develops "Knowledge Sources" that can be used by a wide variety of applications programs to overcome retrieval problems caused by differences in terminology and the scattering of relevant information across many databases.  [UMLS FactSheet, National Library of Medicine, NIH, US, 2002] http://www.nlm.nih.gov/pubs/factsheets/umls.html

unstructured data: Today, transforming unstructured data into a structured form is primarily a manual process; it is time consuming and costly. However, all leading software applications must leverage structured data to be effective. [About Mohomine] http://www.mohomine.com/about/index.asp

Generally free text, natural language.
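
To make the idea of transforming unstructured data into a structured form concrete, here is a minimal sketch that pulls dosing mentions out of a free-text note with a regular expression. The pattern, the function name structure_notes and the sample note are hypothetical; real free text is far messier, and production systems generally need more than a single pattern.

```python
import re

# Hypothetical pattern: a drug name followed by a dose such as "50 mg".
DOSE_PATTERN = re.compile(
    r"(?P<drug>[A-Za-z]+)\s+(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>mg|g|mcg)\b"
)


def structure_notes(note):
    """Turn free-text dosing mentions into a list of dict records."""
    records = []
    for match in DOSE_PATTERN.finditer(note):
        records.append({
            "drug": match.group("drug").lower(),
            "amount": float(match.group("amount")),
            "unit": match.group("unit"),
        })
    return records


# A made-up note used only to exercise the pattern.
note = "Patient started on aspirin 81 mg daily; later switched to ibuprofen 200mg as needed."
records = structure_notes(note)
print(records)
```

Each resulting record has the same fields, so it could be loaded into a relational table, which is the kind of structured form the definition above refers to.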

Related term: natural language processing. Compare structured. 

Google = about  21,200 July 19, 2002

variance: One of the two components of measurement error (the other one being bias). Variance results from uncontrolled (or uncontrollable) variation that occurs in biological samples, experimental procedures, and arrays themselves.
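
One common way to make the two components concrete is the standard decomposition of mean squared error into squared bias plus variance. The sketch below checks that identity on made-up replicate measurements; the function name, the replicate values and the assumed true value are illustrative only and do not come from any array data set.

```python
from statistics import mean, pvariance


def bias_and_variance(measurements, true_value):
    """Return bias, variance and mean squared error for replicate measurements."""
    bias = mean(measurements) - true_value
    variance = pvariance(measurements)  # population variance of the replicates
    mse = mean((m - true_value) ** 2 for m in measurements)
    return bias, variance, mse


# Made-up replicate intensities for one spot, with an assumed true value of 10.0.
replicates = [10.5, 11.0, 9.5, 11.0]
bias, variance, mse = bias_and_variance(replicates, true_value=10.0)

# The mean squared error equals squared bias plus variance.
print(bias, variance, mse, bias ** 2 + variance)
```

In practice the true value is rarely known, so bias is usually estimated against controls or reference samples, while variance can be estimated from replicates alone.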

versioning:

visualization:   A method of computing by which the enormous bandwidth and processing power of the human visual (eye-brain) system becomes an integral part of extracting knowledge from complex data.  It utilizes graphics and imaging techniques as well as knowledge of both data management and the human visual system.  [Lloyd Treinish, Visualization for Deep Thunder, IBM Research, 2002] http://www.research.ibm.com/weather/vis/w_vis.htm

Use of computer-generated graphics to make the information more accessible and interactive. Related term: data mining

Narrower terms: data visualization, information visualization; Algorithms & data analysis: dendrogram, heat map, profile chart

visualisation: As the quantity of data produced by simulations grows, so does the difficulty of extracting useful information. It is now clear that in many applications visual methods are the only practical way of extracting information from the data. Computer graphics and scientific visualisation techniques have become more important in the last few years with the increased availability of computing resource and of visualisation tools.  Visualisation is becoming one of the key tools for problem solving both in traditional areas such as visualisation of complex flow and in new applications areas like the planning of surgical operations using 3-D reconstruction of anatomical sites using diagnostic images or the development of highly-realistic aeroplane simulators for pilot training.  DIRECT Development of an Interdisciplinary Roundtable for Emerging Computer Technologies,  Edinburgh University, Scotland  http://www.epcc.ed.ac.uk/DIRECT/vect.html 

Definitions and Rationale for Visualisation, D. Scott Brown, SIGGRAPH, 1999 http://www.siggraph.org/education/materials/HyperVis/visgoals/visgoal2.htm 

W3C World Wide Web Consortium: Develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential. W3C is a forum for information, commerce, communication, and collective understanding. http://www.w3.org/

web: The genome community was an early adopter of the Web, finding in it a way to publish its vast accumulation of data, and to express the rich interconnectedness of biological information. The Web is the home of primary data, of genome maps, of expression data, of DNA and protein sequences, of X-ray crystallographic structures, and of the genome project's huge outpouring of publications. ... However the Web is much more than a static repository of information. The Web is increasingly being used as a front end for sophisticated analytic software. Sequence similarity search engines, protein structural motif finders, exon identifiers, and even mapping programs have all been integrated into the Web. Java applets are adding rapidly to Web browsers' capabilities, enabling pages to be far more interactive than the original click-fetch-click interface. [Lincoln D. Stein "Introduction to Human Genome Computing via the World Wide Web", Cold Spring Harbor Lab, 1998]  

Related terms: fractal nature of the web, weblike. Narrower terms: semantic web, web portals, web services

web harvesting: A Web site is usually viewed as a collection of individual pages interconnected by simple URL links. This is the common basis for Web harvesting engines, where these pages are harvested, indexed, and the search results made available to end users. As Web sites become increasingly large and sophisticated, it is worthwhile to see how prevalent simple linking is, or if other Web page navigation techniques are replacing the simple linking model. [Web Characterization Project, OCLC, 2001] http://wcp.oclc.org/pubs/rn2-navigation.html
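
The simple linking model described above can be sketched as a breadth-first walk over the links found in each page. This is a minimal illustration using Python's standard html.parser module on a made-up site held in memory. The LinkExtractor class, the harvest function and the three sample pages are hypothetical, and a real harvesting engine would also fetch pages over the network, respect robots.txt and index the content.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def harvest(site, start):
    """Breadth-first walk over simple links, returning pages in visit order."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        parser = LinkExtractor()
        parser.feed(site.get(url, ""))
        for link in parser.links:
            # Follow only links to known pages that have not been queued yet.
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# A made-up three-page site held in memory, so no network access is needed.
site = {
    "/index.html": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "/a.html": '<a href="/b.html">B</a> <a href="/index.html">Home</a>',
    "/b.html": "<p>No outgoing links.</p>",
}

visited = harvest(site, "/index.html")
print(visited)
```

Pages reachable only through other navigation techniques, such as scripts or forms, would never enter the queue, which is the limitation the OCLC note raises.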

Google = about 536 July 19, 2002; about 3,000 Oct. 22, 2004

weblogs:   Wikipedia http://en.wikipedia.org/wiki/Weblogs 

A history and a perspective http://www.rebeccablood.net/essays/weblog_history.html
Bob's Weblog Backgrounder, Bob Stepno http://radio.weblogs.com/0106327/stories/2002/12/14/bobsWeblogBackgrounder.html 

Related terms: blog, blogging, blogosphere, microcontent, nanopublishing

web portals: 2.1 Web Portals, W3C, Requirements for a web ontology language, work in progress  http://www.w3.org/TR/webont-req/#usecase-portal

Google = about 74,600 ("web portal" about 738,000) July 19, 2002

Web search glossary, Google http://www.google.com/support/bin/answer.py?answer=50187 60 definitions

web service interoperability: Web services technology has the promise to provide a new level of interoperability between software applications. It should be no wonder then that there is a rush by platform providers, software developers, and utility providers to enable their software with SOAP, WSDL, and UDDI capabilities.  http://www-106.ibm.com/developerworks/webservices/library/ws-inter.html

Google = "web service interoperability" about  412 "web services interoperability" about 9,620 July 19, 2002; about 283,000 Nov 17, 2006

web services:  The goal of the Web Services Activity is to develop a set of technologies in order to bring Web services to their full potential.  W3C, "Web Services Activity", 2002  http://www.w3.org/2002/ws/

Google = about 2,110,000 July 19, 2002; about 122,000,000 Nov 17, 2006

Web services glossary, W3C, http://www.w3.org/TR/ws-gloss/

webizing: "Webizing Existing Systems" Tim Berners-Lee, last updated 2001 http://www.w3.org/DesignIssues/Webize

weblike: [Tim Berners-Lee, Ralph Swick, Semantic web Amsterdam, 2000 May 16] http://www.w3.org/2000/Talks/0516-sweb-tbl/slide3-1.html

In Weaving the Web, his account of coming up with the idea of the web, Tim Berners-Lee writes about "learning to think in a weblike way". I don't know that I can claim to approach this yet, but the more that I write and research this glossary on and for the web, the more insight I'm getting into what he might mean. Metaphors like "shooting at a moving target" and, like Wayne Gretzky, "skating to where the puck is going to be" are helpful images.

Google = about  3,020 July 19, 2002; about 5,510 Oct. 22, 2004; about 75,700 Nov 17, 2006 
"web like" about 788,000,000 Nov 17, 2006 

Wiki collaborative software: Allows users to post and edit content remotely. An exciting (and free) way to build and manage content. Wiki Web sites  allow all users to add and edit content. While it might sound like a free-for-all, the authors suggest such Web sites have been used successfully in research, business, and education to document project designs, for brainstorming, and for otherwise creating content in a collaborative fashion.  Bo Leuf, Ward Cunningham, The Wiki Way: Collaboration and sharing on the internet,  2001 

wild cards and Google http://www.google.com/support/bin/answer.py?answer=3178&ctx=sibling Yes you can.

XML: Computers & computing

Bibliography
Barnes, Ken et al., Microsoft Lexicon or Microspeak made easier, 1995-1998, 150+ terms.  http://www.cinepad.com/mslex.htm
FOLDOC Free On-line Dictionary of Computing, Denis Howe, 2007. 14,400+ terms.  http://foldoc.org/ 
Glossary of Ontology Terms, Stanford Univ., 2001, 24 terms. http://www-ksl-svc.stanford.edu:5915/doc/frame-editor/glossary-of-terms.html
Information Resource Management Glossary, Government of British Columbia, Canada, 2001 http://www.cio.gov.bc.ca/other/daf/IRM_Glossary.htm
Lycos Tech Glossary 2002 http://webopedia.lycos.com/
Schneider, Tom and Karen Lewis, Glossary for Molecular Information Theory and the Delila System, Lab of Computational and Experimental Biology, NCI Frederick, US, 2004. 100+ definitions.  http://www.lecb.ncifcrf.gov/~toms/glossary.html
W3C Glossary and Dictionary http://www.w3.org/2003/glossary/ 
Web search glossary, Google http://www.google.com/support/bin/answer.py?answer=50187 60 definitions
Web services glossary, W3C, http://www.w3.org/TR/ws-gloss/
Webopedia  http://www.webopedia.com/
whatis.com Information Technology encyclopedia. About 3,000 + definitions.   http://whatis.techtarget.com/
XML Glossary http://www.softwareag.com/xml/about/glossary.htm 

IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.