Taxonomy Basics
Copyright 2018, Faulkner Information Services. All Rights Reserved.
Docid: 00011543
Publication Date: 1804
Report Type: TUTORIAL
Preview
As the cost of capturing and storing business information continues to fall,
many businesses find themselves sitting on a potential goldmine of valuable
data – but without the tools to dig out the right information at the right time.
A taxonomy can be used to organize business information logically and consistently,
so users can find the information they need either by searching for it directly, or by browsing
to more specific, more inclusive, or
related topics. This report discusses taxonomies: What they are, how they are
used, and the enterprise software used to create and manage them consistently
and efficiently.
Report Contents:
- Executive Summary
- Description
- Current View
- Outlook
- Recommendations
- Web Links
- Related Faulkner Reports
Executive Summary
Enterprises collect enormous quantities of information that can
significantly improve all aspects of a business, from forecasting and decision-making to sales and customer service.
These benefits can be realized, however, only if people are
able to find and make sense of the right information when it is needed. Many
enterprises extract value from the business information they accumulate by organizing
the data logically and consistently into categories and subcategories, creating
a taxonomy that helps both customers and employees to better access and use
valuable
information. A taxonomy can also improve
search results by showing the levels immediately above, below, and adjacent to
the search term in the hierarchy, providing a meaningful context as well as ideas
for further exploration. Some taxonomies also list
synonyms or preferred terms and automatically expand searches to include
equivalent terms.
Multiple independent taxonomies, or facets, can be overlaid to provide
different views into the same data. Facets also allow information to be labeled and organized differently for
various groups, such as customers, sales staff, support staff, and scientists.
Several tasks are involved in maintaining a taxonomy
of business information. A business must first determine a suitable structure
for the data it has accumulated or will accumulate and then assign each piece of content a
place in the structure. To help users make the best use of this information,
the taxonomy must then be integrated with other business systems. Continued
maintenance will also be necessary, both updating the taxonomy to keep it
relevant and classifying new information as it is added.
Many businesses do not catalog their data consistently, either because the
process is too time-consuming or because more immediately urgent tasks
intervene. Even when a business does address this issue, it can be
difficult to ensure that all the data is included and categorized consistently
across business units in a way that will be logical and understandable to its
intended users.
Taxonomy management software can be used to reduce the time, labor, and
potential inconsistencies involved in creating, implementing, and maintaining a taxonomy. With such software, a business can import, convert,
merge, and modify existing taxonomies and also automatically generate
taxonomies to custom-fit its data. Taxonomy software can analyze a text and
automatically assign it to a place in the taxonomy, with the option for users
to manually override or modify the resulting classification. Taxonomy software
can also integrate with or send output to content management, portal, and other
enterprise management systems. It can even streamline workflow within a
business by enabling automatic routing and responding for documents, e-mails,
and customer interactions based on their content or other characteristics.
Taxonomy management software is increasing in power and complexity. Many
packages create a thesaurus or ontology rather than just a
taxonomy. Thesauri and ontologies are similar
to taxonomies; the main difference is their addition of non-hierarchical
associations or explicitly described relationships, such as "written
by."
A topic map expresses a taxonomy, thesaurus, or ontology in
computer-understandable language. This facilitates topic map interchange,
merging, and portability. It also enables business opportunities related to the
creation and sale of proprietary topic maps that extract knowledge from
multiple information pools. A dedicated retrieval and query language simplifies
and stimulates application development by making it easier to extract
information from topic maps.
As intelligent search technologies mature, it may become possible to achieve the
benefits of a well-organized taxonomy without the labor-intensive setup and maintenance. With such a system, a company
would still need to create a taxonomy, but a far simpler and less costly one.
Description
Just as a library would be of little use if it failed to organize and
catalog its books, so accumulated business information provides little value to
an enterprise unless it is organized to allow efficient retrieval and analysis.
Poor information management reduces productivity. As reported by KM World:
- Knowledge workers spend 15 to 35 percent of their time searching for
  information.
- 40 percent cannot find the information they need on their corporate
  intranets.
- 15 percent of their time is spent duplicating information that exists but
  cannot be found.
A taxonomy
enables users to find what they need by starting with a general topic at a high
level, and then working down through subcategories to find more specialized
information. They can also use the taxonomy to explore by moving from a
specific topic up to a more inclusive topic or sideways to related topics,
even if they are not sure what they are looking for. In addition, a taxonomy
makes searching for information easier and more effective because search
results can show the levels immediately above, below, and adjacent to the
search term in the hierarchy. This provides context as well as ideas for further
exploration. Some taxonomies use a numerical index, which is shorter and easier
to work with than category names. The Dewey Decimal system is a numerically
indexed taxonomy; the US Library of Congress Classification uses a comparable
alphanumeric index.
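The navigation described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any vendor's product: the `Node` class, its `context` method, and the sample categories are all assumptions introduced here. It shows how a hierarchy can report the levels immediately above, below, and adjacent to a given term.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One category in a hierarchical taxonomy (hypothetical structure)."""
    name: str
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: list["Node"] = field(default_factory=list)

    def add(self, name: str) -> "Node":
        """Create a subcategory under this node and return it."""
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def context(self) -> dict:
        """Return the levels above, below, and adjacent to this node."""
        if self.parent is None:
            siblings = []
        else:
            siblings = [c.name for c in self.parent.children if c is not self]
        return {
            "broader": self.parent.name if self.parent else None,
            "narrower": [c.name for c in self.children],
            "adjacent": siblings,
        }


# Hypothetical sample categories.
root = Node("Products")
storage = root.add("Storage")
printers = root.add("Printers")
storage.add("Disk drives")
storage.add("Tape drives")

print(storage.context())
```

A search interface could attach the result of `context()` to each hit so that the user sees where the term sits in the hierarchy and can browse from there.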
Strictly speaking, a taxonomy encodes only category
information; however, some also store a list of synonyms or preferred terms so
that a search on one term will be automatically expanded to include equivalent
terms, increasing the amount of relevant information retrieved. Others provide
semantic or concept search capabilities, which decode queries by interpreting context
as well as the syntax and semantics of natural language. Another useful feature
is the ability to provide different views into the same data by overlaying
multiple independent taxonomies, or facets. For example, a database of music
could have separate facets organized by genre, year created, and record label.
Facets also allow information to be labeled and organized differently by and for
various groups, reflecting the different perspectives and needs of each (such as
customers, sales staff, support staff, and scientists). Thesauri and ontologies
are similar to taxonomies; the main difference is their addition of
non-hierarchical relationships between terms. These can be simple "see also"
associations, or explicitly described relationships, such as "written by."
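Two of the features above, synonym expansion and facets, can be illustrated with a short Python sketch. Everything named here (`SYNONYMS`, `ALBUMS`, `expand`, `filter_by_facets`, and the sample records) is a hypothetical stand-in chosen for illustration, following the music database example in the text.

```python
# Hypothetical synonym rings: each preferred term maps to its equivalents.
SYNONYMS = {
    "automobile": {"car", "auto"},
}

# Hypothetical music catalog tagged along three independent facets.
ALBUMS = [
    {"title": "Album A", "genre": "jazz", "year": 1959, "label": "Label X"},
    {"title": "Album B", "genre": "rock", "year": 1969, "label": "Label Y"},
    {"title": "Album C", "genre": "jazz", "year": 1969, "label": "Label Y"},
]


def expand(term: str) -> set[str]:
    """Return the term plus every equivalent term in its synonym ring."""
    term = term.lower()
    for preferred, variants in SYNONYMS.items():
        if term == preferred or term in variants:
            return {preferred} | variants
    return {term}


def filter_by_facets(items, **facets):
    """Keep only the items whose values match every requested facet."""
    return [
        item for item in items
        if all(item.get(name) == value for name, value in facets.items())
    ]


print(expand("car"))
print(filter_by_facets(ALBUMS, genre="jazz", year=1969))
```

Because each facet is an independent attribute, the same records can be narrowed by genre, by year, by label, or by any combination, without rearranging the underlying data.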
The Venn diagram in Figure 1 shows relatedness across different types of taxonomies.
Figure 1. Taxonomies Venn Diagram
Source: TaxoTips
Several tasks are involved in creating, applying, and maintaining a taxonomy of business information. A business must first
create the taxonomy framework by determining a suitable structure for the data
it has accumulated or will accumulate. This structure should capture the
relationships inherent in the body of information in an intuitive way and
reflect how the information fits into the overall structure of the
business. The taxonomy framework will then need to be updated regularly to
remain relevant and useful as new information is incorporated and as changes
occur in terminology, technology, and markets.
Once the categories have been determined, a business must populate the
taxonomy by assigning each piece of content a place where it belongs. After an
initial large-scale classification or reclassification of existing content,
there will be an ongoing need to classify each new piece of information. Classification
can be immensely time-consuming, not only because of the ever-increasing
volumes of information to be catalogued but also because of difficult and
possibly controversial decisions concerning where to place certain items.
Once a taxonomy is created and populated, it must
be integrated into the business to improve users’ ability to find and make
sense of the information they need. This often includes helping them find the
right information when they are not sure what they are looking for and perhaps
are not aware of what information is available. Another benefit of a
well-designed taxonomy is that when a search fails, users can be confident
that the information they wanted is really not there, so they can look elsewhere
instead of continuing a fruitless search.
Most businesses are aware of these benefits but nevertheless may not
catalog the data they accumulate in a consistent, timely, comprehensive manner.
This might be because the process is too time-consuming and labor-intensive or
because more immediately urgent tasks intervene. Even when a business does
make the effort to organize its information, it can be difficult to ensure that
all the data is included and categorized consistently across business units in
a way that will be logical and understandable to its intended users.
Current View
To reduce the time expenditure and improve the consistency of their
information management and classification processes, many businesses use taxonomy
management software, which can help with creating, implementing, and
maintaining taxonomies. While vendors label their products by different names –
including business semantics modeling, knowledge organization system,
controlled vocabulary, thesaurus, ontology, and metadata model – they have
enough similarity to all be categorized as enterprise taxonomy management
software. Table 1 lists some vendors of this software genre, along with their
products. Several taxonomy management software companies are included in
KMWorld’s 2018 list of “100 Companies That Matter in Knowledge Management,”
compiled with input from editors, analysts, experts, and users, with a focus on functionality,
success with clients, creativity, and innovation.[1] Capabilities offered by these companies to
enhance knowledge management include artificial intelligence, machine learning, and digital assistants.
Vendor | Products |
---|---|
Concept Searching | Information |
Coveo | Intelligent search technology for CRM, customer |
CuadraSTAR (Lucidea) | Automatic dynamic indexing of newly entered or modified |
Data Harmony | Taxonomy/thesaurus |
Expert System | Automatic classification and categorization of content. Customizable taxonomy creation. |
Mondeca | Ontology and taxonomy creation and management, terminology |
PoolParty | Taxonomy |
Smartlogic | Automatic content classification, text analytics and information |
Synaptica | Taxonomy |
Wordmap | Taxonomy creation and management. |
Importing Existing Taxonomies
Most taxonomy management software allows users to import, convert, and modify
existing taxonomies. These could be from databases or classifications that a
company already maintains or published third-party taxonomies, which can be
found for many business, medical, scientific, engineering, and public policy
topic areas. Some vendors of taxonomy management software, e.g. Data Harmony,
make available a selection of predefined taxonomies, which can then be
synchronized to create a single enterprise-wide taxonomy; once the data have
been imported and converted, responsibility for subsets of the taxonomy can be
distributed to relevant subject matter experts within the company for further
customization. Taxonomy Warehouse offers a directory of available taxonomies, as well as information
about taxonomy-related blogs, events, books, standards, software products, and consulting services.
Automatic Generation
Some software automatically generates a taxonomy by
using natural language processing and/or statistical clustering to analyze the
topics and subtopics found in a company’s documents without human analysis.
For example, Concept Searching and Wordmap offer text
mining and automated categorization tools. Taxonomy software can also review
existing categorized content and suggest new categories to add. The
suggested taxonomy may then be manually adjusted, but beginning with automatic
generation can provide a big head start on the process.
Automatic Classification
Once a taxonomy has been created, taxonomy software
can use natural language processing, semantic analysis, and/or statistical
pattern matching to analyze each body of text and then assign it to a place in
the taxonomy by attaching a metadata tag. Concept Searching, CuadraSTAR, and Smartlogic are
among the vendors providing automatic classification of content. An option is
always provided to override or modify the resulting classification, and
problematic documents are automatically set aside for manual classification.
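The classification flow described above (assign a tag, allow an override, set aside problematic documents) can be sketched as follows. Note the substitution: this sketch uses plain keyword counting as a stand-in for the natural language processing and statistical pattern matching that commercial products use, and every name in it (`CATEGORY_KEYWORDS`, `classify`, `tag`, the `min_hits` threshold) is an assumption made for illustration.

```python
import re
from typing import Optional

# Hypothetical category keywords; a real product would rely on NLP or
# statistical models rather than a fixed keyword list.
CATEGORY_KEYWORDS = {
    "Billing": {"invoice", "payment", "refund"},
    "Hardware": {"disk", "drive", "printer"},
}


def classify(text: str, min_hits: int = 2):
    """Return (category, needs_review) for a body of text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    tied = list(scores.values()).count(scores[best]) > 1
    # Weak or tied evidence: set the document aside for manual classification.
    if scores[best] < min_hits or tied:
        return None, True
    return best, False


def tag(doc: dict, override: Optional[str] = None) -> dict:
    """Attach a category metadata tag; a manual override always wins."""
    category, needs_review = classify(doc["text"])
    if override is not None:
        category, needs_review = override, False
    return {**doc, "category": category, "needs_review": needs_review}


print(tag({"id": 1, "text": "Please refund the payment on my invoice"}))
print(tag({"id": 2, "text": "The printer is broken"}))
print(tag({"id": 2, "text": "The printer is broken"}, override="Hardware"))
```

The threshold and the tie check are the design point: a document is tagged automatically only when the evidence is clear, and everything else is flagged for a person to review.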
Integration and Application
Taxonomy software can be either a standalone system or a module of a
complete information storage and retrieval system. Most standalone systems can
integrate with or send output to content management, portal, and other
enterprise management systems. Storing the taxonomy shell, or definition, separately
from the content allows various applications and groups to share it.
Searching and Browsing
A taxonomy can be integrated with a navigation
system to aid searching and browsing. With some databases, the user first searches
the system’s thesaurus to find the appropriate term for his or her search.
Other systems automatically expand each search to include alternative terms. If
a database has been indexed, the software can convert the user’s term to the
preferred term and search only on it, improving search speed. A taxonomy can
also improve searching by showing a term’s categories and subcategories and
allowing the user to browse the taxonomy from there.
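The indexed-search case above, in which the user's term is converted to the preferred term before a single lookup, might look like the following sketch. The `PREFERRED` mapping, the `INDEX`, and the document identifiers are hypothetical sample data, not drawn from any product.

```python
# Hypothetical mapping from entry terms to the preferred term used for indexing.
PREFERRED = {
    "car": "automobile",
    "auto": "automobile",
    "automobile": "automobile",
}

# Hypothetical index built at classification time, keyed by preferred term only.
INDEX = {
    "automobile": ["doc-1", "doc-7"],
    "bicycle": ["doc-3"],
}


def search(term: str) -> list[str]:
    """Convert the user's term to its preferred term, then do one lookup."""
    key = PREFERRED.get(term.lower(), term.lower())
    return INDEX.get(key, [])


print(search("Car"))
print(search("boat"))
```

Because content is indexed under one preferred term, the search needs only one lookup instead of one per synonym, which is the speed advantage the text refers to.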
Many taxonomy management vendors – including CuadraSTAR,
Mondeca, PoolParty, Smartlogic, and Synaptica –
enhance and simplify information retrieval using semantic search and
context-sensitive search algorithms that interpret queries in natural language.
Workflow Automation
The text analytics in taxonomy software can also be used to streamline
workflow within a business by enabling automatic filtering, routing,
notifying, and responding for documents, e-mails, and customer interactions,
based on their content and other characteristics.
Maintenance
Taxonomies require regular maintenance in order to remain relevant and
up-to-date. New content must be incorporated into existing or new categories
(if the taxonomy management software does not automatically categorize it). To
keep the taxonomy current in terminology and associations, subject matter
experts within the company can be assigned responsibility and security
authorization to maintain the sections where they have expertise. To help
maintain internal consistency, the software can automatically update the other
half of a reciprocal relationship when a change, addition, or deletion occurs.
Reports and graphical representations of hierarchies and maps (with drag and
drop interfaces) can also aid taxonomy maintenance. For example, reports can
show broken links, linkless nodes, and who made what
changes.
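Two of the maintenance aids mentioned above, automatic reciprocal updates and reports of broken links and linkless nodes, can be sketched in Python. The `Thesaurus` class, its method names, and the sample terms are assumptions introduced for illustration only.

```python
from collections import defaultdict

# Hypothetical relation types and their reciprocals.
RECIPROCAL = {"broader": "narrower", "narrower": "broader", "related": "related"}


class Thesaurus:
    def __init__(self):
        self.terms = set()
        self.links = defaultdict(set)  # (term, relation) -> set of target terms

    def add_term(self, term):
        self.terms.add(term)

    def relate(self, source, relation, target):
        """Add a link and the other half of the reciprocal relationship."""
        self.links[(source, relation)].add(target)
        self.links[(target, RECIPROCAL[relation])].add(source)

    def unrelate(self, source, relation, target):
        """Remove a link and the other half of the reciprocal relationship."""
        self.links[(source, relation)].discard(target)
        self.links[(target, RECIPROCAL[relation])].discard(source)

    def broken_links(self):
        """Report links whose target is no longer in the vocabulary."""
        return sorted(
            (source, relation, target)
            for (source, relation), targets in self.links.items()
            for target in targets
            if target not in self.terms
        )

    def linkless_terms(self):
        """Report terms that have no relationship to any other term."""
        linked = {source for (source, _), targets in self.links.items() if targets}
        return sorted(self.terms - linked)


th = Thesaurus()
for name in ["Clothing", "Jeans", "Denim", "Orphan"]:
    th.add_term(name)
th.relate("Jeans", "broader", "Clothing")
th.relate("Jeans", "related", "Denim")

print(th.linkless_terms())
th.terms.discard("Denim")  # simulate a deletion that bypassed the editor
print(th.broken_links())
```

Keeping both halves of a relationship in one operation means an editor never has to remember to update the second term by hand.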
Outlook
Ontologies and Topic Maps
New technology is constantly being developed to help users manage and make
sense of their vast and increasing information resources. Taxonomy management
software in particular is becoming more powerful as it gains the ability to
represent the complex, specific relationships of a thesaurus or ontology.
A thesaurus combines a taxonomy with a "see
also" relationship that can link terms across hierarchies. This allows terms
to be related even though they are not synonyms or in the same hierarchy: e.g.
denim and jeans. A thesaurus may also include scope notes that define each term
and its usage.
An ontology is similar to a thesaurus but far
richer and more general, enabling an unlimited number of user-defined
relationships. For example, a business can be "located in" a city
while a disk drive is a "component of" a computer. An identity
relationship can specify that two terms in different ontologies
refer to the same subject so that the two ontologies
can use each other’s information and, in effect, function as a single ontology.
In this way, an ontology supports knowledge reuse and
scalable knowledge construction and becomes a valuable resource in itself based on its
detailed representation of the knowledge in a company or a subject area.
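One common way to hold user-defined relationships like these is as subject, relationship, object triples. The sketch below assumes that representation; the `TRIPLES` data, the company name, and the `query` helper are hypothetical and are not taken from any particular ontology tool.

```python
# Hypothetical facts stored as (subject, relationship, object) triples.
TRIPLES = {
    ("Acme Corp", "located in", "Springfield"),
    ("disk drive", "component of", "computer"),
    ("keyboard", "component of", "computer"),
}


def query(triples, subject=None, relationship=None, obj=None):
    """Return the triples that match every part that is not None."""
    pattern = (subject, relationship, obj)
    return sorted(
        triple for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    )


print(query(TRIPLES, relationship="component of"))
print(query(TRIPLES, subject="Acme Corp"))
```

Unlike a taxonomy, which can answer only "what is above or below this term," a structure like this can answer questions phrased in terms of any relationship the business chooses to define.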
Topic maps express ontological concepts and relations in a standard,
computer-understandable language that contains the following defined terms:
- Each topic can belong to multiple topic types (such as "food" or
  "composition").
- Topics can have multiple base names, such as formal and informal, each with
  variants by context, such as language, style, or historical period.
- An occurrence of a topic can be categorized by type (such as
  "article" or "mention").
- Associations specify relationships between topics, such as "was influenced
  by" or "takes place in," but they do not specify direction:
  Who influenced whom is specified by the association role of each topic.
  Associations and association roles can be categorized by type.
Themes limit the scope in which names, occurrences, and associations are
assigned to topics. For example, the name "bank" could be assigned to
two topics with the themes "finance" and "river." This
disambiguation aids topic map merging, improves navigation, and greatly reduces
the irrelevant information that often overloads search results. Themes also
support multilingual information, e.g. assigning the name "livre" to the topic "book" with the scope
limited to the theme "French."
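The effect of themes can be shown with a small lookup keyed on both name and theme. This is an illustrative data structure only, not topic map syntax; the `SCOPED_NAMES` table, the topic identifiers, and both helper functions are assumptions.

```python
# Hypothetical fragment: (name, theme) -> topic identifier.
SCOPED_NAMES = {
    ("bank", "finance"): "topic:financial-institution",
    ("bank", "river"): "topic:riverbank",
    ("book", "English"): "topic:book",
    ("livre", "French"): "topic:book",
}


def resolve(name: str, theme: str):
    """Find the topic that a name refers to within a given theme."""
    return SCOPED_NAMES.get((name, theme))


def names_for(topic: str, theme: str) -> list[str]:
    """List the names assigned to a topic within a given theme."""
    return sorted(
        name for (name, scope), target in SCOPED_NAMES.items()
        if target == topic and scope == theme
    )


print(resolve("bank", "finance"))
print(resolve("bank", "river"))
print(names_for("topic:book", "French"))
```

The same name resolves to different topics under different themes, and the same topic carries different names under different language themes, which is the disambiguation and multilingual support described above.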
If two topic maps use different topics for the same subject (e.g.
"Italia" and "Italy"), they can be joined in one of three
ways: Both topics pointing to the same web address for their subjects; both
having the same base name in the same scope; or both pointing to the same
subject indicator. A subject indicator can be an official public document such
as an ISO standard, a definition within one of the topic maps, or a published
subject indicator set up to enable knowledge interchange and merging.
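The third joining method, a shared subject indicator, can be sketched as a merge keyed on that indicator. The dictionaries, the `example.org` addresses, and the `merge_topic_maps` function are hypothetical; the sketch covers only this one method and ignores the other two.

```python
# Hypothetical topic maps; the subject indicator addresses are placeholders.
MAP_A = [
    {"names": ["Italy"], "subject_indicator": "http://example.org/subjects/italy"},
]
MAP_B = [
    {"names": ["Italia"], "subject_indicator": "http://example.org/subjects/italy"},
    {"names": ["Francia"], "subject_indicator": "http://example.org/subjects/france"},
]


def merge_topic_maps(map_a, map_b):
    """Merge topics that share a subject indicator; keep the rest as they are."""
    merged = {}
    for topic in list(map_a) + list(map_b):
        key = topic["subject_indicator"]
        entry = merged.setdefault(key, {"subject_indicator": key, "names": set()})
        entry["names"] |= set(topic["names"])
    return list(merged.values())


for topic in merge_topic_maps(MAP_A, MAP_B):
    print(topic["subject_indicator"], sorted(topic["names"]))
```

After the merge, "Italy" and "Italia" are two names for a single topic, so information attached to either one in the source maps becomes reachable from both.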
The topic map is a relatively recent tool that is still being refined and
expanded. To facilitate topic map interchange, merging, and portability, OASIS
(a vendor consortium promoting open standards) has developed recommendations
for unique published subjects, which allow topic maps to overlay new
information sources. Topic map portability enables business opportunities
related to the creation and sale of proprietary topic maps, designed to overlay
and to extract knowledge from multiple information pools.
To ensure that each topic map remains consistent, valid, and logical, ISO
has created a draft of the Topic Maps Constraint Language (TMCL), a standard
constraint language that will formally restrict the allowed structure of a data
set. For example, it could be stipulated that only persons may be married.
Without such constraints, the creator of a topic map may use terms
inconsistently, use associations that don’t make sense, or leave out necessary
information.
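The kind of rule a constraint language enforces can be illustrated in plain Python. This is not TMCL syntax; it is a stand-in written for illustration, and the names `TOPIC_TYPES`, `ASSOCIATIONS`, `ALLOWED_TYPES`, and `violations`, along with the sample data, are all assumptions.

```python
# Hypothetical topic map content.
TOPIC_TYPES = {"Alice": "person", "Bob": "person", "Acme Corp": "company"}
ASSOCIATIONS = [
    ("married to", "Alice", "Bob"),
    ("married to", "Bob", "Acme Corp"),
]

# Hypothetical constraint: only persons may take part in "married to".
ALLOWED_TYPES = {"married to": {"person"}}


def violations(associations, topic_types, allowed):
    """Return the associations whose members have a type that is not allowed."""
    bad = []
    for kind, *members in associations:
        permitted = allowed.get(kind)
        if permitted is None:
            continue  # no constraint declared for this association type
        if any(topic_types.get(member) not in permitted for member in members):
            bad.append((kind, *members))
    return bad


print(violations(ASSOCIATIONS, TOPIC_TYPES, ALLOWED_TYPES))
```

A check like this, run whenever the map is edited, would catch associations that do not make sense before they reach users.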
In addition, to simplify and stimulate application development, ISO has
created a working draft of the Topic Maps Query Language (TMQL), a dedicated
retrieval and query language that will make it easier to extract information
from topic maps.
Intelligent Search
As intelligent search technologies mature, it may become possible to achieve the
benefits of a well-organized taxonomy without the labor-intensive setup and maintenance.
Coveo, for example, recently released a cloud-based, self-learning search service that is
designed to analyze search behavior patterns automatically and return the most relevant
results without the need for comprehensive metadata projects, deep taxonomies, and manual
search tuning. With such a system, a company would still need to create a taxonomy, but a
far simpler and less labor-intensive one.
Recommendations
There is no question that a business must keep its information organized and
accessible; however, implementing a software-based taxonomy management system
may not be appropriate in every case. A taxonomy is,
after all, really nothing more than a filing system. Some businesses may be
functioning well enough with their current filing methods that they would not
benefit enough from a taxonomy management system to justify the cost of
implementation. Nevertheless, given the accelerated pace at which information
currently accumulates, most companies would benefit from implementing at least
a basic electronic taxonomy.
Implementing a taxonomy can improve the navigation of a company’s Web site
and help customers find product information faster and more reliably, leading
to increased sales and better customer relations. A Forrester research report
found that "poorly architected retailing sites" sell only half as
much as better sites. And it is very important to help users find information
on the first try: In one study of users whose searches failed, 47 percent gave
up after just one search and only 23 percent tried three or more times.
Another study of e-commerce sites showed that users find desired information
only 34 percent of the time with a simple search but 54 percent of the time
using a taxonomy. Internally, a taxonomy can improve productivity and
customer service by helping employees find information faster. Finally, the taxonomy can itself
become a valuable resource representing the company’s accumulated knowledge.
Nevertheless, the efficiency gains from a taxonomy management system can in
some situations be offset by its labor-intensive setup. Also, taxonomies must
be frequently maintained and updated or they can be worse than useless: Users
may assume that information does not exist if the taxonomy does not reflect it.
Subject areas where content and terminology are changing fast will require
even more effort to keep a taxonomy up to date. Each
company should therefore assess both the potential benefits and the costs
associated with implementing and maintaining an electronic taxonomy. For some companies, it may
make more sense to keep their taxonomy simple and add intelligent search capabilities.
If a business does decide to implement a taxonomy management system, it
should not underestimate the effort that will be involved in creating and
maintaining a taxonomy. A system that offers labor-saving options such as importing existing taxonomies
and automatic taxonomy creation and document classification may be worth
paying for. And even if only a simple taxonomy is
currently needed, it would be wise to choose a system that has the flexibility
to grow with the business and that will be compatible over time with the trend
toward richer representations such as ontologies and
topic maps.
References
- [1] Sandra Haimila. “100 Companies that Matter in Knowledge Management 2018.” KMWorld (Volume 27, Issue 2). March 8, 2018.
Web Links
- Concept Searching: http://conceptsearching.com/
- Coveo: http://coveo.com/
- CuadraSTAR (Lucidea): http://cuadra.com/
- Data Harmony: http://www.dataharmony.com/
- Expert System: http://www.expertsystem.com
- ISO: http://www.iso.org/iso/home/standards.htm
- KMWorld: http://www.kmworld.com/
- Mondeca: http://www.mondeca.com/
- OASIS: http://www.oasis-open.org/
- PoolParty: http://poolparty.biz/
- Smartlogic: http://www.smartlogic.com/
- Synaptica: http://www.synaptica.com/
- Taxonomy Warehouse: http://www.taxonomywarehouse.com/
- TaxoTips: http://www.taxotips.com/
- Wordmap: http://www.wordmap.com/
About the Author
Betsy Walli is a licensed marriage and family therapist and an independent writer
and editor with experience in academic, technical, and marketing topics. Dr. Walli holds a
master's degree in counseling from California State University, Fullerton, and a Ph.D. in
linguistics from the Massachusetts Institute of Technology.