Artificial Intelligence, Data Science & Machine Learning Glossary & Taxonomy
SCOPE NOTE: Data Science includes artificial intelligence, big data, data lakes, data quality, data stewardship, data storytelling, data swamp, data visualization, deep learning, deep machine learning, FAIR data, Hadoop, heavy quants / light quants, heuristic, machine learning, neural networks (Neural Networks, Artificial), open science, supervised machine learning, Support Vector Machine (SVM), unsupervised machine learning
"But where machine learning shines is in handling enormous numbers of
predictors — sometimes, remarkably, more predictors than observations —
and combining them in nonlinear and highly interactive ways.1 This
capacity allows us to use new kinds of data, whose sheer volume or
complexity would previously have made analyzing them unimaginable….
Machine learning has become ubiquitous and indispensable for solving
complex problems in most sciences …. In biomedicine, machine learning can
predict protein structure and function from genetic sequences and discern
optimal diets from patients’ clinical and microbiome profiles. The same
methods will open up vast new possibilities in medicine. … Clinical
medicine has always required doctors to handle enormous amounts of data,
from macro-level physiology and behavior to laboratory and imaging studies
and, increasingly, “omic” data. …. Machine learning will become an
indispensable tool for clinicians seeking to truly understand their
patients.”
Predicting the Future — Big Data, Machine Learning, and Clinical Medicine
Ziad
Obermeyer, MD & Ezekiel J. Emanuel, MD, PhD NEJM Catalyst Oct. 10, 2016
https://catalyst.nejm.org/big-data-machine-learning-clinical-medicine/
artificial intelligence (AI): Theory and development of COMPUTER SYSTEMS which perform tasks that normally require human intelligence. Such tasks may include speech recognition, LEARNING, VISUAL PERCEPTION, MATHEMATICAL COMPUTING, reasoning, PROBLEM SOLVING, DECISION-MAKING, and translation of language. Year introduced: MeSH 1986
Or, as some people have noted, AI is laboriously trying to get computers to do what people do intuitively, without great effort. Conversely, there are things computers can do (relatively) effortlessly, such as massive numbers of error-free calculations. The most promising applications seem to involve combining computer-aided consideration of many possibilities with human judgment.
Narrower terms: artificial general intelligence, artificial narrow intelligence, cellular automata, expert systems, fuzzy logic, genetic algorithms, neural nets. Related term: training sets.
Artificial Intelligence for Early Drug Discovery: How to Best Use AI & Machine Learning for Identifying and Optimizing Compounds and Drug Combinations, April 15-16, 2020, San Diego CA. This conference brings together experts from chemistry, target discovery, DMPK and toxicology to talk about the increasing use of computational tools, AI models, machine learning algorithms and data mining in drug design and lead optimization. Introductory-level talks bring attendees up to speed with how AI is being applied in drug discovery, followed by talks introducing advanced concepts using relevant case studies and research findings. https://www.drugdiscoverychemistry.com/Artificial-Intelligence/
Artificial Intelligence in Clinical Research, 2020 Feb 20-21, Orlando FL. Artificial intelligence (AI) and machine learning (ML) have propelled many industries toward a new, highly functional and powerful state. Now they are starting to make their way into the clinical research realm. Many pharmaceutical companies and larger CROs are starting projects involving some elements of AI, ML and robotic process automation in clinical trials.
AI Trends: Business and Technology of Enterprise Artificial Intelligence https://www.aitrends.com/
AIWorld Conference & Expo, September 29 - October 1, 2020, Boston MA https://aiworld.com/ There is no shortage of opinions on the potential for AI technologies in business. However, the current round of solutions is often viewed as expensive, proprietary, and complex to deploy and manage. When will AI solutions scale enterprise- and industry-wide? Is it possible to measure ROI for automation? How does AI rank against other corporate initiatives?
AIWorld Government, June 22-24, 2020, Washington DC https://www.aiworldgov.com/ With AI technology at the forefront of our everyday lives, data-driven government services are now possible from federal, state, and local agencies. This has led to the rapid rise in availability and use of intelligent automation solutions.
big data: data sets that are so voluminous and complex that traditional data processing application software are inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy. … The term has been in use since the 1990s, with some giving credit to John Mashey for coining or at least making it popular.[14][15] Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time.[16] Big data philosophy encompasses unstructured, semi-structured and structured data, however the main focus is on unstructured data.[17] Big data "size" is a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data.[18] Big data requires a set of techniques and technologies with new forms of integration to reveal insights from datasets that are diverse, complex, and of a massive scale.[19] Wikipedia https://en.wikipedia.org/wiki/Big_data
data-driven decision making: "Not everyone was embracing data-driven decision making. In fact, we found a broad spectrum of attitudes and approaches in every industry. But across all the analyses we conducted, one relationship stood out: The more companies characterized themselves as data-driven, the better they performed on objective measures of financial and operational results. In particular, companies in the top third of their industry in the use of data-driven decision making were, on average, 5% more productive and 6% more profitable than their competitors. This performance difference remained robust after accounting for the contributions of labor, capital, purchased services, and traditional IT investment. It was statistically significant and economically important and was reflected in measurable increases in stock market valuations." Big Data: The Management Revolution, Andrew McAfee and Erik Brynjolfsson, Harvard Business Review, 2012 Oct https://hbr.org/2012/10/big-data-the-management-revolution
data lake: The idea of a data lake is to have a single store of all data in the enterprise, ranging from raw data (which implies exact copy of source system data) to transformed data which is used for various tasks including reporting, visualization, analytics and machine learning. The data lake includes structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and even binary data (images, audio, video), thus creating a centralized data store accommodating all forms of data. Wikipedia accessed June 2017 https://en.wikipedia.org/wiki/Data_lake
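To make the contrast between structured and semi-structured inputs concrete, here is a minimal Python sketch (all file, column, and record values are invented for illustration) showing how tabular and JSON records might sit side by side in one ad hoc store; a real data lake would typically live on distributed or object storage rather than in memory.

```python
# Minimal sketch: structured (tabular) and semi-structured (JSON) records kept
# side by side in one ad hoc store. All names and values are invented examples;
# a real data lake would sit on distributed storage (e.g. HDFS or object storage).
import io
import pandas as pd

# Structured data: rows and columns, as exported from a relational database
patients = pd.read_csv(io.StringIO("patient_id,age,diagnosis\n1,54,A\n2,61,B\n"))

# Semi-structured data: one JSON object per line; fields can vary by record
lab_events = pd.read_json(
    io.StringIO('{"patient_id": 1, "test": "glucose", "value": 5.4}\n'
                '{"patient_id": 2, "test": "hba1c", "value": 41}\n'),
    lines=True,
)

# "Schema on read": raw copies are stored as-is, and each consumer reshapes
# the data for its own reporting, visualization or machine learning task.
lake = {"patients/raw": patients, "lab_events/raw": lab_events}

# Example downstream use: join the two sources only when a task needs it
merged = lake["patients/raw"].merge(lake["lab_events/raw"], on="patient_id", how="left")
print(merged)
```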
data quality: A vital consideration for data analysis and interpretation. While people are still reeling from the vast amount of data becoming available, they need to brace themselves to both discard low quality data and handle much more at the same time.
Dr. [John] Sulston lived by one of his favorite dictums: “There is no point
in wasting good thoughts on bad data.” New York Times
https://www.nytimes.com/2018/03/15/obituaries/john-e-sulston-75-dies-found-clues-to-genes-in-a-worm.html
data science: also known as data-driven science, is an interdisciplinary field of scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured,[1][2] similar to data mining. … It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning, classification, cluster analysis, data mining, databases, and visualization. … is now often applied to business analytics,[7] or even arbitrary use of data, or used as a sexed-up term for statistics.[8] While many university programs now offer a data science degree, there exists no consensus on a definition or curriculum contents. Wikipedia accessed 2018 Jan 23 https://en.wikipedia.org/wiki/Data_science
An interdisciplinary field involving processes, theories, concepts, tools, and technologies, that enable the review, analysis, and extraction of valuable knowledge and information from structured and unstructured (raw) data. MeSH 2019
Data science is an integral component of modern biomedical research. It is the interdisciplinary field of inquiry in which quantitative and analytical approaches, processes, and systems are developed and used to extract knowledge and insights from increasingly large and/or complex sets of data. Data science has increased in importance for biomedical research over the past decade and NIH expects that trend to continue. In order to capitalize on the opportunities presented by advances in data science, and overcome key challenges, the NIH is developing a Strategic Plan for Data Science. This plan describes NIH's overarching goals, strategic objectives, and implementation tactics for promoting the modernization of the NIH-funded biomedical data science ecosystem. The complete draft plan is available at: https://grants.nih.gov/grants/rfi/NIH-Strategic-Plan-for-Data-Science.pdf. Request for Information (RFI): Soliciting Input for the National Institutes of Health (NIH) Strategic Plan for Data Science, Notice Number: NOT-OD-18-134, March 2018 https://grants.nih.gov/grants/guide/notice-files/NOT-OD-18-134.html
DataScience@NIH https://datascience.nih.gov/community
data scientist: a high-ranking professional with the training and curiosity to make discoveries in the world of big data. The title has been around for only a few years. (It was coined in 2008 by one of us, D.J. Patil, and Jeff Hammerbacher, then the respective leads of data and analytics efforts at LinkedIn and Facebook.) … More than anything, what data scientists do is make discoveries while swimming in data. It's their preferred method of navigating the world around them. At ease in the digital realm, they are able to bring structure to large quantities of formless data and make analysis possible. They identify rich data sources, join them with other, potentially incomplete data sources, and clean the resulting set. … As they make discoveries, they communicate what they've learned and suggest its implications for new business directions. Often they are creative in displaying information visually and making the patterns they find clear and compelling. … Data scientists' most basic, universal skill is the ability to write code. … More enduring will be the need for data scientists to communicate in language that all their stakeholders understand—and to demonstrate the special skills involved in storytelling with data, whether verbally, visually, or—ideally—both. … Data scientists want to be in the thick of a developing situation, with real-time awareness of the evolving set of choices it presents. Data Scientist: The Sexiest Job of the 21st Century, Thomas H. Davenport and D.J. Patil, Harvard Business Review, Oct 2012 http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/5
data stewardship: Beyond proper collection, annotation, and archival, data stewardship includes the notion of 'long-term care' of valuable digital assets, with the goal that they should be discovered and re-used for downstream investigations, either alone, or in combination with newly generated data. The outcomes from good data management and stewardship, therefore, are high quality digital publications that facilitate and simplify this ongoing process of discovery, evaluation, and reuse in downstream studies. The FAIR Guiding Principles for Scientific Data Management and Stewardship, Mark D. Wilkinson (Madrid, Spain), Michel Dumontier (Stanford CA), Barend Mons (Leiden, Netherlands) et al., Scientific Data, 2016 https://www.nature.com/articles/sdata201618
data storytelling: https://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs/#6b7ff3b952ad Related terms: heavy quants, light quants

data swamp:
The data lake has been labeled as a raw data reservoir or a hub for ETL [Extract, Transform, Load] offload. The data lake has been defined as a central hub for self-service analytics. The concept of the data lake has been overloaded with meanings, which puts the usefulness of the term into question.[16] Data in a data lake should not be retained indefinitely, or the lake degrades into a data swamp; most companies that manage data lakes define effective data archival or data removal techniques and procedures to keep the pond within controllable limits. Wikipedia accessed 2018 Jan 24 https://en.wikipedia.org/wiki/Data_lake
data translators: Translators are neither data architects nor data engineers. They're not even necessarily dedicated analytics professionals, and they don't possess deep technical expertise in programming or modeling. Instead, translators play a critical role in bridging the technical expertise of data engineers and data scientists with the operational expertise of marketing, supply chain, manufacturing, risk, and other frontline managers. In their role, translators help ensure that the deep insights generated through sophisticated analytics translate into impact at scale in an organization. At the outset of an analytics initiative, translators draw on their domain knowledge to help business leaders identify and prioritize their business problems, based on which will create the highest value when solved. These may be opportunities within a single line of business (e.g., improving product quality in manufacturing) or cross-organizational initiatives (e.g., reducing product delivery time). Translators then tap into their working knowledge of AI and analytics to convey these business goals to the data professionals who will create the models and solutions. Finally, translators ensure that the solution produces insights that the business can interpret and execute on, and, ultimately, communicates the benefits of these insights to business users to drive adoption. Analytics Translator: The New Must-Have Role, Nicolaus Henke, Jordan Levine, and Paul McInerney, McKinsey, 2018 Feb https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/analytics-translator
data visualization: The classical definition of visualization is as follows: the formation of mental visual images, the act or process of interpreting in visual terms or of putting into visual form. A new definition is a tool or method for interpreting image data fed into a computer and for generating images from complex multi-dimensional data sets (1987). Definitions and Rationale for Visualisation, D. Scott Brown, SIGGRAPH, 1999 http://www.siggraph.org/education/materials/HyperVis/visgoals/visgoal2.htm includes information on data visualization. Related term: information visualization; Broader term: visualization

deep learning: "Deep learning" – another hot topic buzzword – is simply machine learning which is derived from "deep" neural nets. These are built by layering many networks on top of each other, passing information down through a tangled web of algorithms to enable a more complex simulation of human learning. Due to the increasing power and falling price of computer processors, machines with enough grunt to run these networks are becoming increasingly affordable. What is Machine Learning: A complete beginner's guide in 2017, Bernard Marr, Forbes, 2017 May
Supervised or unsupervised machine learning methods that use multiple layers of data representations generated by nonlinear transformations, instead of individual task-specific ALGORITHMS, to build and train neural network models. MeSH 2019
Distributed (Deep) Machine Learning Community https://github.com/dmlc
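As an illustration of the "layering" idea (not drawn from any of the sources above), here is a minimal Python/NumPy sketch of a two-hidden-layer feed-forward pass: each layer applies a linear transformation followed by a nonlinear activation, and stacking such layers is what makes the network "deep". All sizes and weights are placeholders invented for demonstration.

```python
# Minimal sketch of a "deep" (two hidden layer) feed-forward pass in NumPy.
# Weights are random placeholders; a real model would learn them by
# backpropagation on a training set.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Nonlinear activation: without it, stacked layers collapse to one linear map
    return np.maximum(0.0, z)

# Hypothetical sizes: 20 input features, hidden layers of 16 and 8 units, 1 output
W1, b1 = rng.normal(size=(20, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(8, 1)), np.zeros(1)

def forward(x):
    h1 = relu(x @ W1 + b1)                            # first layer of representation
    h2 = relu(h1 @ W2 + b2)                           # second, more abstract layer
    return 1.0 / (1.0 + np.exp(-(h2 @ W3 + b3)))      # sigmoid output, e.g. a probability

x = rng.normal(size=(5, 20))                          # 5 example inputs
print(forward(x))                                     # 5 predicted probabilities
```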
FAIR data—Findable, Accessible, Interoperable, Reusable: Meeting the FAIR principles
Principle A: Accessible. The principle of Accessibility speaks to the ability to retrieve data or metadata based on its identifier, using an open, free, and universally implementable standardized protocol. The protocol must support authentication and authorization if necessary, and the metadata should be accessible "indefinitely," and independently of the data, such that identifiers can be interpreted/understood even if the data they identify no longer exists.
Principle I: Interoperable. The Interoperability Principle states that (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation; that vocabularies themselves should follow FAIR principles; and that the (meta)data should include qualified references to other (meta)data.
Principle R: Reusable. The FAIR Reusability principle requires that meta(data) have a plurality of accurate and relevant attributes; provide a clear and accessible data usage license; associate data and metadata with their provenance; and meet domain-relevant community standards for data content.
Publishing FAIR Data: An Exemplar Methodology Utilizing PHI-Base, Frontiers in Plant Science, 2016 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4922217/
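To make the Accessibility and Reusability principles concrete, the following is a small, purely illustrative Python sketch of a metadata record kept separately from the data it describes. Every field value (identifier, URLs, license, vocabulary) is a hypothetical example, not taken from the PHI-Base paper or the FAIR specification itself.

```python
# Illustrative sketch: a metadata record kept independently of the data it
# describes, so the identifier stays interpretable even if the data disappear.
# All identifiers, URLs and license names below are hypothetical examples.
import json

metadata = {
    "identifier": "https://doi.org/10.9999/example.dataset.1",   # persistent, globally unique
    "title": "Example microbiome profiles",
    "access_protocol": "https",                                  # open, free, standardized protocol
    "access_url": "https://repository.example.org/datasets/1",
    "license": "CC-BY-4.0",                                      # clear, accessible usage license (Reusable)
    "provenance": {
        "creator": "Example Lab",
        "derived_from": "https://doi.org/10.9999/example.rawdata.7",
    },
    "vocabulary": "schema.org/Dataset",                          # shared language for knowledge representation
}

# Metadata can be published and indexed on its own (Findable / Accessible)
print(json.dumps(metadata, indent=2))
```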
FAIR findability: the ease with which information contained on a website can be found, both from outside the website (using search engines and the like) and by users already on the website.[1] Although findability has relevance outside the World Wide Web, the term is usually used in that context. Most relevant websites do not come up in the top results because designers and engineers do not cater to the way ranking algorithms work currently.[2] Its importance can be determined from the first law of e-commerce, which states "If the user can't find the product, the user can't buy the product."[3] Wikipedia https://en.wikipedia.org/wiki/Findability accessed 2017 Oct 28
ACCESSIBILITY: https://en.wikipedia.org/wiki/Accessibility
REUSABILITY: In computer science and software engineering, reusability is the use of existing assets in some form within the software product development process. Assets are products and by-products of the software development life cycle and include code, software components, test suites, designs and documentation. Leverage is modifying existing assets as needed to meet specific system requirements. Wikipedia https://en.wikipedia.org/wiki/Reusability accessed 2017 Oct 28
Hadoop: https://www.sas.com/en_us/insights/big-data/hadoop.html http://hadoop.apache.org/

heavy quants, light quants: A "light quant" is someone who knows something about analytical and data management methods, and who also knows a lot about specific business problems. The value of the role comes, of course, from connecting the two. Of course it would be great if "heavy quants" also knew a lot about business problems and could apply their heavy quantitative skills to them, but acquiring deep quantitative skills tends to force out other types of training and experience. The "analytical translator" may also have some light quant skills, but this person is also extremely skilled at communicating the results of quantitative analyses, both light and heavy. … Organizations need people of all quantitative weights and skills. If you want to have analytics and big data used in decisions, actions, and products and services, you may well benefit from light quants and translators. Thomas Davenport, "In praise of 'light quants' and 'analytical translators'", 2015 https://www2.deloitte.com/us/en/pages/deloitte-analytics/articles/in-praise-of-light-quants-and-analytical-translators.html Related term: data storytelling

heuristic: Tools such as genetic algorithms or neural networks employ heuristic methods to derive solutions which may be based on purely empirical information and which have no explicit rationalization. IUPAC Combinatorial Chemistry. Trial and error methods. (A minimal hill-climbing sketch follows the information overload entry below.)

I2B2 Informatics for Integrating Biology & the Bedside: An NIH-funded National Center for Biomedical Computing based at Partners HealthCare System [Boston]. http://www.i2b2.org/

information overload: Biomedicine is in the middle of revolutionary advances. Genome projects, microassay methods like DNA chips, advanced radiation sources for crystallography and other instrumentation, as well as new imaging methods, have exceeded all expectations, and in the process have generated a dramatic information overload that requires new resources for handling, analyzing and interpreting data. Delays in the exploitation of the discoveries will be costly in terms of health benefits for individuals and will adversely affect the economic edge of the country. Opportunities in Molecular Biomedicine in the Era of Teraflop Computing, March 3 & 4, 1999, Rockville MD, NIH Resource for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign
Many of today's problems stem from information overload and there is a desperate need for innovative software that can wade through the morass of information and present visually what we know. The development of such tools will depend critically on further interactions between the computer scientists and the biologists so that the tools address the right questions, but are designed in a flexible and computationally efficient manner. It is my hope that we will see these solutions published in the biological or computational literature. Richard J. Roberts, The early days of bioinformatics publishing, Bioinformatics 16(1): 2-4, 2000
"Information overload" is not an overstatement these days. One of the biggest challenges is to deal with the tidal wave of data, filter out extraneous noise and poor quality data, and assimilate and integrate information on a previously unimagined scale.
Where's my stuff? Ways to help with information overload, Mary Chitty, SLA presentation, June 10, 2002, Los Angeles CA
Information in OMIM [Online Mendelian Inheritance in Man] and the published working draft of the International Human Genome Sequencing Consortium (Nature 15 Feb. 2001) has been facilitated by ties to NCBI's RefSeq and LocusLink databases. Are there other good examples of integrated databases? Related terms: Bio-Ontology Standards Group, Data Model Standards Group; Functional genomics Gene Ontology
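Referenced from the heuristic entry above: a minimal, self-contained Python sketch (not from IUPAC) of a heuristic search, plain hill climbing with random restarts on a toy objective. It illustrates deriving a usable solution empirically, with no guarantee or explicit rationalization that the result is optimal; the objective function and all parameters are invented for demonstration.

```python
# Minimal sketch of a heuristic search: hill climbing with random restarts.
# The objective function and all parameters are toy examples.
import random

def objective(x):
    # Toy function to maximize; a real application might score a compound or a model
    return -(x - 3.7) ** 2 + 10

def hill_climb(start, step=0.1, iterations=1000):
    best_x, best_score = start, objective(start)
    for _ in range(iterations):
        candidate = best_x + random.uniform(-step, step)   # small empirical perturbation
        score = objective(candidate)
        if score > best_score:                             # keep any improvement found
            best_x, best_score = candidate, score
    return best_x, best_score

# Random restarts reduce (but do not eliminate) the risk of a poor local optimum
results = [hill_climb(random.uniform(-10, 10)) for _ in range(5)]
print(max(results, key=lambda r: r[1]))                    # best solution found, no optimality proof
```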
Integration of data: integration of the various types of large-scale data is currently receiving much attention. There appears, however, to be little agreement on what exactly is meant by "integration", not to mention how to achieve it. The word "integration" is being attached to almost any analysis that involves the combined use of two or more large datasets. Lars J. Jensen, Peer Bork, Quality analysis and integration of large-scale molecular data sets, Drug Discovery Today: TARGETS, 3(2): 51-56, April 2004 https://www.sciencedirect.com/science/article/abs/pii/S1741837204024089?via%3Dihub Integration allows researchers to increase the value they get from the data, because it increases the base of information they can access.
just in time information: http://www.wordstream.com/blog/ws/2013/10/02/just-in-time-information-hacks
Just-In-Time Information Retrieval, Bradley J. Rhodes, Ph.D. Dissertation, MIT Media Lab, May 2000. Just in time retrieval agents, Bradley J. Rhodes http://alumni.media.mit.edu/~rhodes/Papers/rhodes-phd-JITIR.pdf
machine learning: At its most simple, machine learning is about teaching computers to learn in the same way we do, by interpreting data from the world around us, classifying it and learning from its successes and failures. In fact, machine learning is a subset, or better, the leading edge of artificial intelligence. How did machine learning come about? Building algorithms capable of doing this, using the binary "yes" and "no" logic of computers, is the foundation of machine learning – a phrase which was probably first used during serious research by Arthur Samuel at IBM during the 1950s. Samuel's earliest experiments involved teaching machines to learn to play checkers. … For example, in medicine, machine learning is being applied to genomic data to help doctors understand, and predict, how cancer spreads, meaning more effective treatments can be developed. What is Machine Learning: A complete beginner's guide in 2017, Bernard Marr, Forbes, 2017 May https://www.forbes.com/sites/bernardmarr/2017/05/04/what-is-machine-learning-a-complete-beginners-guide-in-2017/#33c58c2f578f
A type of ARTIFICIAL INTELLIGENCE that enables COMPUTERS to independently initiate and execute LEARNING when exposed to new data. Year introduced: MeSH 2016
Machine Learning and Artificial Intelligence, 2020 March 2-4, San Francisco CA. Applying AI and Machine Learning Techniques to Solve Drug Discovery Challenges. Machine learning, specifically for drug discovery, development, diagnostics and healthcare, is highly data-intensive, with disparate types of data being generated in what have historically been trial-and-error processes. Deep learning, machine learning (ML) and artificial intelligence (AI), coupled with correct data, have the potential to make these processes less error-prone and increase the likelihood of success from drug discovery to the real-world setting.
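As a concrete, deliberately tiny illustration of supervised machine learning, the sketch below uses Python with scikit-learn to fit a classifier on labeled examples and score it on held-out data. The bundled breast cancer dataset is used purely for convenience and is not referenced by any of the sources above.

```python
# Minimal supervised machine learning sketch with scikit-learn.
# The bundled breast cancer dataset is used purely for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)        # features and known labels

# Hold out data the model never sees during training to estimate generalization
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=5000)          # a simple, interpretable classifier
model.fit(X_train, y_train)                        # "learning" = estimating parameters from examples

print("held-out accuracy:", model.score(X_test, y_test))
```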
metadata: Taxonomies & Ontologies

neural networks: Communication between statisticians and neural net researchers is often hindered by the different terminology used in the two fields; there is a comparison of neural net and statistical jargon. Often uses fuzzy logic. Narrower terms: artificial neural networks, probabilistic neural networks; Related terms: artificial intelligence

open science:
According to the FOSTER taxonomy,[3] open science can often include aspects of open access, open data and the open source movement, whereby modern science requires software in order to process data and information.[12][13] Open research computation also addresses the problem of reproducibility of scientific results. The term "open science" does not have any one fixed definition or operationalization. On the one hand, it has been referred to as a "puzzling phenomenon".[14] On the other hand, the term has been used to encapsulate a series of principles that aim to foster scientific growth and its complementary access to the public. Two influential sociologists, Benedikt Fecher and Sascha Friesike, have created multiple "schools of thought" that describe the different interpretations of the term.[15] Wikipedia https://en.wikipedia.org/wiki/Open_science
predictive analytics: encompasses a variety of statistical techniques from modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events. … The core of predictive analytics relies on capturing relationships between explanatory variables and the predicted variables from past occurrences, and exploiting them to predict the unknown outcome. Wikipedia accessed April 2015
Predictive Model Markup Language, Data Mining Group http://www.dmg.org/

Python: a remarkably powerful dynamic programming language that is used in a wide variety of application domains. Python is often compared to Tcl, Perl, Ruby, Scheme or Java. About Python http://www.python.org/about/ Wikipedia http://en.wikipedia.org/wiki/Python_(programming_language)

R: a free (libre) programming language and software environment for statistical computing and graphics that is supported by the R Foundation for Statistical Computing.[6] The R language is widely used among statisticians and data miners for developing statistical software[7] and data analysis.[8] Wikipedia accessed 2018 Jan 24 https://en.wikipedia.org/wiki/R_(programming_language)
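To ground the idea of "capturing relationships between explanatory variables and the predicted variables" from the predictive analytics entry above, here is a small Python sketch that fits a model on past observations and uses it to predict an unseen outcome. The numbers are synthetic, invented solely for this example.

```python
# Minimal predictive analytics sketch: learn a relationship from historical
# records, then predict an unknown future outcome. Data are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

# Historical facts: explanatory variables (e.g. ad spend, price index) and observed outcome (e.g. sales)
X_history = np.array([[10, 1.0], [12, 1.1], [15, 0.9], [20, 1.2], [22, 1.0]])
y_history = np.array([100, 112, 140, 180, 205])

model = LinearRegression().fit(X_history, y_history)   # capture the past relationship

# Exploit that relationship to predict an outcome that has not happened yet
X_future = np.array([[25, 1.1]])
print("predicted outcome:", model.predict(X_future)[0])
```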
robust: A statistical test that yields approximately correct results despite the falsity of certain of the assumptions on which it is based. Oxford English Dictionary. Hence, can refer to a process which is relatively insensitive to human foibles and to variables in the way a procedure (for example, an assay) is carried out. Idiot-proof.
Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) - Discussion Paper and Request for Feedback: The Food and Drug Administration announced Tuesday that it is developing a framework for regulating artificial intelligence products used in medicine that continually adapt based on new data. The agency's outgoing commissioner, Scott Gottlieb, released a white paper that sets forth the broad outlines of the FDA's proposed approach to establishing greater oversight over this rapidly evolving segment of AI products. It is the most forceful step the FDA has taken to assert the need to regulate a category of artificial intelligence systems whose performance constantly changes based on exposure to new patients and data in clinical settings. These machine-learning systems present a particularly thorny problem for the FDA, because the agency is essentially trying to hit a moving target in regulating them. FDA developing new rules for artificial intelligence in medicine, STAT, 2019 April 2 https://www.statnews.com/2019/04/02/fda-new-rules-for-artificial-intelligence-in-medicine/
stochastic: "Aiming,
proceeding by guesswork" (Webster's Collegiate Dictionary). Term which is
often applied to combinatorial processes involving true random sampling,
such as selection of beads from an encoded library, or certain methods for
library design. IUPAC COMBINATORIAL CHEMISTRY Truly
random, based on probability.
Support Vector Machine (SVM): SUPERVISED MACHINE LEARNING algorithm which learns to assign labels to objects from a set of training examples. Examples are learning to recognize fraudulent credit card activity by examining hundreds or thousands of fraudulent and non-fraudulent credit card activity reports, or learning to make disease diagnosis or prognosis based on automatic classification of microarray gene expression profiles drawn from hundreds or thousands of samples. Year introduced: MeSH 2012
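A brief illustration (not from MeSH) of this supervised setup in Python with scikit-learn: an SVM is fit on labeled training examples and then assigns labels to objects it has not seen. The bundled iris dataset stands in for gene expression profiles or transaction records.

```python
# Minimal Support Vector Machine sketch with scikit-learn.
# The iris dataset is a stand-in for any set of labeled training examples.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf", C=1.0)       # kernel and C control the flexibility of the decision boundary
clf.fit(X_train, y_train)            # learn from labeled examples (supervised)

print("predicted labels:", clf.predict(X_test[:5]))   # assign labels to unseen objects
print("held-out accuracy:", clf.score(X_test, y_test))
```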
training set: An initial dataset for which the correct answers are known; the data and correct answers are fed into a training program that adjusts the parameters of the general model. The training program adjusts the model parameters so that the model works well on the given dataset. There are usually enough parameters so that this can be accomplished, provided the dataset is reasonably consistent. The training set usually has to be very large to produce a good classifier. Narrower terms: supervised training sets, unsupervised training sets
Data Sciences Resources
How to look for other unfamiliar terms
IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.