SLA presentation, June 10, 2002

Where's My Stuff? 
Taxonomy and Lexicon as Keys to Access

The process and art of creating taxonomies has long been a core competency of information professionals, but one that seemed to have lost its glamour. Recently, the sheer volume of available content has made the problem overwhelming. Consequently, the effective categorizing of data is back in vogue.
Speakers: Mary Chitty, Cambridge Healthtech Institute; Mary Corcoran, Outsell Inc

Working draft. Last revised June 7, 2002

Mary Chitty  mchitty@healthtech.com  Library Director, Cambridge Healthtech Institute 
MSLS (UNC-Chapel Hill)

Special Libraries Association (SLA) 2002 Annual Meeting 
Pharmaceutical and Health Technology Division   June 10, 2002, 1:30-3:00 pm, Los Angeles, California


Information retrieval is inherently messy. 
Peter Morville "Little Blue Folders" Argus Associates, 2000
  http://argus-acia.com/strange_connections/strange003.htm

Information overloaded? 
Overworked and underappreciated?
Know anyone who isn't?

Making any progress?
Going in the right direction?

Biggest challenge - Productivity?
Working harder for less certain results.  
What changes would make a noticeable difference?

Reorganization of biology at the molecular and biochemical levels adds to information overload.

"The first layer of the semantic Web consists of ontologies and taxonomies ... A huge amount of this is being done very desperately in the realm of biotech, for the human genome and new drug development." Tim Berners-Lee, August 30, 2001, keynote at Software Development East in Boston. Alexandra Weber Morales, "Web founder seeks simplicity," Show Daily Online, 2001 http://www.sdgnews.com/sd2001es_006/sd2001es_006.htm

Does broader and narrower begin to cover the interrelationships of genes and proteins, genomics and proteomics?

Do we even know what a gene is anymore? 

The promise of genomics 

Taxonomies and ontologies can help 

Time and cost-effective
Customer support 30x more costly than web self-service (Forrester Research "Tier Zero Customer Support" 1999).
Identify overlapping, duplicate projects.

Information overload
Aid navigation, query expansion and contraction, context, accessibility. 

Improve communication among diverse interdisciplinary, geographically scattered work groups.  

Interoperability among databases.  Text mining of scientific literature increasing.

Bibliography: How taxonomies can help
Bibliography: How to get started on taxonomies?

Taxonomy definitions
Controlled vocabularies and thesauri share many similarities.

taxonomy: Adds hierarchical relationships (broader terms, narrower terms) and related terms  to controlled vocabularies for improved information retrieval (preferred terms collect synonyms and near-synonyms).

Directories (Yahoo, Open Directory Project) can be called taxonomies. 

navigational taxonomies: Improve web navigation for intuitive browsing and query expansion, by careful choice of top-level categories and sub-categories. Focus on user behavior and mental models. More... Information analysis & interpretation glossary

top-down, bottom-up taxonomies

Narrower terms: controlled vocabularies, descriptive taxonomies, molecular taxonomies, morphological taxonomies, orthogonal taxonomies, phylogenetic taxonomies
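
The taxonomy definition above (preferred terms that collect synonyms, plus broader, narrower and related terms) can be sketched as a small data structure. This is only an illustrative sketch, not a description of how the genomic glossaries are built: the Term and Taxonomy classes, their method names, and the sample entries (including "proteome analysis" as a synonym) are all assumptions made for the example. It also shows one simple form of query expansion, where a search on a preferred term is widened to its synonyms and narrower terms.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term in a small, hypothetical taxonomy."""
    preferred: str
    synonyms: set = field(default_factory=set)  # variants collected by the preferred term
    broader: set = field(default_factory=set)   # BT: more general preferred terms
    related: set = field(default_factory=set)   # RT: associated preferred terms


class Taxonomy:
    def __init__(self):
        self.terms = {}

    def add(self, preferred, synonyms=(), broader=(), related=()):
        self.terms[preferred] = Term(preferred, set(synonyms), set(broader), set(related))

    def resolve(self, label):
        """Map a preferred term or synonym to its preferred term (None if unknown)."""
        wanted = label.lower()
        for term in self.terms.values():
            if wanted == term.preferred.lower() or wanted in {s.lower() for s in term.synonyms}:
                return term.preferred
        return None

    def narrower(self, preferred):
        """NT: terms that list `preferred` as one of their broader terms."""
        return sorted(t.preferred for t in self.terms.values() if preferred in t.broader)

    def expand(self, label):
        """Query expansion: preferred term, then its synonyms, then its narrower terms."""
        preferred = self.resolve(label)
        if preferred is None:
            return [label]  # unknown label: search on it unchanged
        term = self.terms[preferred]
        return [preferred] + sorted(term.synonyms) + self.narrower(preferred)


# Illustrative sample entries only (assumed for this sketch)
tax = Taxonomy()
tax.add("-omics")
tax.add("genomics", broader=["-omics"], related=["proteomics"])
tax.add("proteomics", synonyms=["proteome analysis"], broader=["-omics"], related=["genomics"])
tax.add("functional genomics", broader=["genomics"])
tax.add("structural genomics", broader=["genomics"])

print(tax.resolve("Proteome analysis"))  # proteomics
print(tax.expand("genomics"))  # ['genomics', 'functional genomics', 'structural genomics']
```

Query contraction would go the other way: replacing a broad term with one chosen narrower term. Storing only the broader-term links and deriving narrower terms from them keeps the two directions consistent, at the cost of a scan on each lookup.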

Ontology definitions

ontology: Can make unstructured (or semi-structured) information machine-understandable as well as machine-readable, and amenable to logic and reasoning. Needs unambiguous term definitions. Comes from philosophy and artificial intelligence.

Narrower terms: common ontology, dynamic ontology, heavyweight ontologies, lightweight ontologies, logic-based ontologies, micro-theories, taxonomies, natural language ontologies, object-based ontologies

More... Information management & interpretation glossary  Related terms: metadata, RDF, semantic web, XML 

Bibliography: What are taxonomies?

 

Where I am coming from

Biotechnology librarian:  Cambridge Healthtech Institute 
 - 40+ meetings/year on emerging technologies, genomics & informatics. My Genomic glossaries & taxonomies have been on the web 2 years, in the works 3 years.

CHI is in the "information overload" business. But we get overloaded too! 

Pharmacy librarian: Sheppard Library, Mass College of Pharmacy, Boston MA 
Air Pollution Technical Information Center, EPA Library, Research Triangle Park, NC

Why am I doing this?
Talk about glossaries: Forcing me to articulate issues I've been thinking about and wrestling with for a long time.

Compiling my glossaries: Be able to talk about highly technical, complex subjects for more than 30 seconds.

The more I know, the more I can admit not knowing.
Questions I most want to ask? Ones where the answer will surprise me.

Putting puzzle pieces together to see how they fit (often in unexpected ways).

In-house indexing, information retrieval, content management, integration, understanding, knowledge management. MeSH headings don't always cover emerging technologies.

Users find it hard to articulate what they want?  I know I do.
Taxonomies are better at finding information when you don't know what to call it, or whether it exists. Search engines are best at retrieving known items.

Change and uncertainty 
I like the image of shooting at a moving target 
I just skate to where the puck is going to be –  hockey great Wayne Gretzky

Old ways seem less productive
Unclear what changes will give better results.
New interdisciplinary, unfamiliar influences.
Hard to know what is safe to ignore as irrelevant anymore.

What's next?
Very EXCITING times in BIOLOGY and the life sciences.  
Still trying to solve REALLY HARD problems.
Pharmaceutical industry increasingly SCIENCE DRIVEN and INFORMATION INTENSIVE 

Opportunities - as well as threats

Bibliography: Cautious optimism

Biggest challenge? Integration

Many disciplines relevant to pharmaceutical and biotechnology research.

analytical chemistry, biochemistry, bioengineering, bioinformatics, biomaterials, biomechanics, biophysics, biotechnology, cell biology, clinical and research medicine, computer sciences, developmental and structural biology, electrochemistry, electronics, engineering, enzymology, epidemiology, imaging, immunology, mathematics, microbiology, molecular biology, optics, pharmacology, public health, statistics, toxicology, virology and aspects of business, chaos theory, ethics and law are all relevant.

How do/can different disciplines communicate and collaborate?
How many of us (if any) can be expert in all these areas?

My taxonomy methodology

Assess user needs
Information overloaded?
Structure unstructured environment?

Measure project progress
Quantitative: web metrics
Qualitative: reviews, user feedback

Share best practices!

Genomic glossaries - quick tour 

Project evolution
-Started as simple glossary. 
-Terms not easily found in dictionaries.
-Incremental changes, developed standard templates and formats.
-Realized some definitions (like gene) are in flux.
-More emphasis on bibliography/ directory of glossaries.
-More firmly focused on emerging terms.

Always more terms to add:  How to look for other unfamiliar terms glossary methodology 

Scope notes and history About genomic glossaries & taxonomies 

How does this relate to your projects?

Best practices

Assess user needs
Get support from the top, buy-in from users.

-Start small and low-key
-because you're going to make changes
-Aim to make browsing intuitive
-Shorter is better (but often takes more time).
-80/20 rule

-Plan for ongoing change
-Fast growth AS LONG AS business models support it

Don’t reinvent the wheel
Reuse models, templates, scaffolds (give credit, ask permission if appropriate)
See what works well, resonates with people.  

Best practices for knowledge management, intranets, extranets, portals are inextricably intertwined with taxonomies.

Bibliography: Don't reinvent the wheel Clinical (and more general) vocabularies, taxonomies, thesauri, ontologies

Lessons learned

Modularity - Reusability 
Mix and match, faster (eventually), time and cost-effective.

Descriptive - not prescriptive definitions
"The Oxford English Dictionary is not an arbiter of proper usage, despite its widespread reputation to the contrary. The Dictionary is intended to be descriptive, not prescriptive. In other words, its content should be viewed as an objective reflection of English language usage, not a subjective collection of usage ‘dos’ and ‘don'ts’." http://dictionary.oed.com/public/help/OED_guide/overview.htm
-Terminology evolves over time.
-Organizations take years to reach consensus when compiling official definitions.

Preferred terms: New variants keep evolving, and it is hard to say which will prevail. Would clarifying the degree to which terms are synonymous (or nearly so) justify the difficulty in reaching consensus on variant meanings? FAQ Question #3: How do you determine the relative prevalence/popularity of variant terms? Google helps.
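
One way to approximate the prevalence question on a collection you control is to count how many documents mention each variant. The sketch below is an assumption-laden illustration, not the method described in the FAQ: it swaps web search hit counts for counts over a local list of texts, the function name variant_prevalence is hypothetical, and the sample sentences are made up for the example.

```python
import re
from collections import Counter


def variant_prevalence(documents, variants):
    """Count how many documents mention each variant term at least once."""
    counts = Counter({v: 0 for v in variants})
    for text in documents:
        for variant in variants:
            # Whole-phrase, case-insensitive match (not a substring of a longer word)
            pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[variant] += 1
    # Most frequently mentioned variant first
    return counts.most_common()


# Made-up sample sentences, for illustration only
sample_docs = [
    "Pharmacogenomics applies genomic methods to drug response.",
    "Early pharmacogenetics studies looked at single genes.",
    "Pharmacogenomics and toxicogenomics are emerging fields.",
]
print(variant_prevalence(sample_docs, ["pharmacogenomics", "pharmacogenetics"]))
# [('pharmacogenomics', 2), ('pharmacogenetics', 1)]
```

Counting documents rather than raw occurrences keeps one long, repetitive text from dominating the result. In keeping with descriptive rather than prescriptive definitions, the counts suggest which variant is currently more common, not which is correct.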

Bibliography: Dynamic taxonomies

Human vs. computer indexing
Combination often better than either in isolation

Bibliography: Human vs. automated classification

Granularity
Taxonomies inherently get more and more granular.

Aim first for highly visible results
Pick your problems (and projects) as carefully as a graduate student or postdoc does

Relevance is inherently subjective.
Information user needs differ 
What do your information users value most?

What do information users really want?

Ongoing challenges -- for future consideration
Maintenance and upkeep

Web usability
Search engines
Taxonomies complement search, provide alternative access points.

Web data analysis AND interpretation

Integration

ROI (return on investment)
Best practices?  Lessons learned?  [still working on this part] 

Make drug discovery and development move faster? Get successful drugs to market earlier?

Information sharing
Reusable, shared ontologies
Fit with biopharma culture of proprietary, closely-held information?
Fragmentation or fractal society?

Points to remember in my opinion
Enormous ferment right now
No clear-cut best answers, few (if any) quick fixes

Tradeoffs and balancing acts

Pharmaceutical and biotech companies are on the bleeding edge, the cutting edge.

Ongoing process: incremental changes, periodic major restructuring. 

Monitor scalability. 
Think in terms of industrializing information retrieval and extraction.

It's not just about the technology.
It's how people use the technology
Collect, manage, interpret, integrate data and information
Communicate (or don't) across traditional disciplines and boundaries.

Plan for ongoing change!
Web is a key enabling technology.
How will/do process(es) scale?
What can I stop doing? Automate?

Taxonomies and ontologies sound sexier than thesauri or controlled vocabularies.  

Information packaging and delivery is important.

Take home messages
FOCUS on

Tools to help people save time, find information fast. 
Eliminate stuff people don’t want to do (including us).
Share best practices and lessons learned. 
Find metrics to measure progress (so you know when you've made some).

Aim to be a pragmatic visionary
Pick challenging (but not impossible) problems, and don't try to do them alone.

Librarians (pharmaceutical, biotech and others) are smart, knowledgeable people with terrific interdisciplinary expertise, good at classification, organization and content management, with great networks of colleagues and friends.

What can we learn from each other?

Bibliography: Community building


Mary Chitty mchitty@healthtech.com   http://www.genomicglossaries.com  Cambridge Healthtech Institute http://www.healthtech.com
