Simple Knowledge Organization System
Copyright 2018, Faulkner Information Services. All
Rights Reserved.
docid: 00011562
Publication Date: 1809
Report Type: TUTORIAL
Preview
It can be challenging and costly to
integrate and use business information from diverse corporate databases and from
external sources such as partners’ databases and the Internet. The Semantic Web
creates a universal medium for exchanging data, so information from different
formats can be automatically shared, processed, integrated, and reused by
disparate organizations and individuals. The Simple Knowledge Organization
System (SKOS) provides a low-cost migration path for porting existing databases
to the Semantic Web. Applications and services based on SKOS can greatly
improve the efficiencies of in-house operations, scientific research,
business-to-business interactions, and consumer services and applications.
Executive Summary
Many organizations find it nearly impossible
to access and integrate all the business information that is available to them
– data that is managed and stored in unconnected, incompatible, or obsolete
corporate databases and software programs, as well as in external sources such
as partners’ databases and the Internet.
The Semantic Web addresses this problem by
creating a universal medium for the exchange of data, so information in
different locations and formats can be automatically shared, processed,
integrated, and reused by disparate organizations and individuals. The ultimate
goal of the Semantic Web is to allow all the information a user has the right
to access, both within and outside the enterprise, to function as a single
virtual database. Although it is still being refined, the technologies and
standards already in place have allowed the development and implementation of
new applications and services that take advantage of the Semantic Web for
information sharing. These new approaches can improve the efficiencies of
in-house operations, scientific research, business-to-business interactions,
and consumer services and applications.
The Semantic Web organizes information into
an ontology, which can represent an unlimited number of user-defined
relationships and hence can express complex conceptual structures. For example,
an ontology could represent that a city is located in a country, while a steering
wheel is a component of a car.
However, much of the business information available in existing databases is in
the simpler form of a thesaurus, which can represent only a small, predefined
set of possible relationships between terms (broader than, narrower than,
and related to). Thesauri are
designed mainly for information retrieval, while ontologies
are designed for comprehensive representation of a body of knowledge.
Before the availability of the Simple
Knowledge Organization System (SKOS), these existing thesaurus-style databases
could not easily be converted for use on the Semantic Web. Conversion required
labor-intensive hands-on guidance because their less specific semantic
relations could map to numerous possible relations in the formal ontologies expressed in the Web Ontology Language (OWL).
SKOS provides a simpler, more easily
implemented way to make existing thesauri and concept schemes available to Semantic
Web applications, and to develop new knowledge organization systems. SKOS
provides a common data model for expressing the basic structure and content of
various concept schemes, so they can be read, shared, combined, linked, and
searched by the same software. By providing a low-cost migration path for
porting existing knowledge organization systems to the Semantic Web, SKOS
provides a bridge between different communities in library and information
sciences, and between these communities and the Semantic Web. SKOS vocabularies
can also be incorporated or extended into more complex vocabularies, including
OWL ontologies.
Description
A common problem faced by many organizations
today is the inability to easily access and integrate all the business
information that is available to them – data that is managed and stored in
disparate, incompatible, unconnected, and sometimes obsolete corporate
databases and software programs, as well as in external sources such as
partners’ databases and the Internet. The volume of such data is vast and
constantly growing, and it is often simply not feasible to integrate it
manually.
Semantic Web
The Semantic Web addresses this problem by
creating a universal medium for the exchange of data, so information can be
automatically shared, processed, integrated, and reused by disparate
organizations and individuals. The ultimate goal of the Semantic Web is to
allow all the information a user has the right to access, both within and
outside the enterprise, to function as a single virtual database. Although it
is still being refined, sufficient technologies and standards are in place to
allow the development and implementation of new applications and services that
can greatly improve the efficiency of in-house operations, scientific
research, business-to-business interactions, and consumer services and
applications.
The crucial feature of the Semantic Web is
that the interconnected network of data is understandable by machines, not just
by humans. It uses a common data model and data-sharing standards to define
and provide context for each piece of data in a corporate database, software
program, or Web site. Metadata attached to each piece of data describes its
meaning and its relationships to other data, and indicates when two sources
refer to the same thing.
The Semantic Web is built on the following
components:
- A common language for representing data. In the Resource Description Framework (RDF), each piece of data is identified by a unique Uniform Resource Identifier (URI). URIs can be assigned by standards organizations, communities, or individuals. (A Web address, or URL, is a special form of URI.) For example, the URI http://en.wikipedia.org/wiki/whale, which points to the Wikipedia entry for whale, could be used to represent the concept of whale. A link that specifies a type of relationship between two pieces of data is also identified by a URI, and connections between pieces of information are expressed as data-link-data triples (in RDF terms, subject-predicate-object), where each element is typically a URI. For example, an online reference to Shamu, a reference to the relationship is a, and a reference to the concept of whale could be joined in a triple. Such information about information is called metadata. Information encoded in RDF can be passed between computer applications and used in distributed, decentralized applications that harvest metadata from multiple sources.
- A means for translating information from different databases into common terms. Ontology languages allow individuals or groups to define frequently used terms and data within a subject area, and the relations among those items. The result can be a simple hierarchical taxonomy or a complex and rich ontology, depending on the types of relations expressed and the extent of the linking. The Web Ontology Language (OWL) is the standard ontology language for the Semantic Web and is itself expressed in RDF. The Simple Knowledge Organization System (SKOS) is a simpler vocabulary, also expressed in RDF, that represents the more limited relationships of a thesaurus.
- Rules for reasoning about information. Inference engines apply rules to reason about the information in ontologies and to find new relations among the terms and data in them. For example, given the triples Shamu is-a whale and whale is-a mammal, an inference engine could conclude that Shamu is a mammal.
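To make the triple idea concrete, here is a minimal Python sketch (an illustration, not part of any Semantic Web toolkit) that stores data-link-data triples as tuples of URIs and applies one simple inference rule, treating "is a" as transitive, to derive the Shamu example. The example.org URIs and the single isA predicate are hypothetical; standard RDF would express the same facts with rdf:type and rdfs:subClassOf.

```python
# Sketch: data-link-data triples as (subject, predicate, object) tuples of URIs,
# plus one naive inference rule that treats "is a" as transitive.
# All example.org URIs, including the isA predicate, are hypothetical.

EX = "http://example.org/"
IS_A = EX + "isA"

triples = {
    (EX + "Shamu", IS_A, EX + "whale"),
    (EX + "whale", IS_A, EX + "mammal"),
}


def infer_transitive(facts, predicate):
    """Add (a, p, c) whenever (a, p, b) and (b, p, c) exist; repeat until nothing new appears."""
    closed = set(facts)
    while True:
        new = {
            (a, predicate, c)
            for (a, p1, b) in closed
            if p1 == predicate
            for (b2, p2, c) in closed
            if p2 == predicate and b2 == b
        } - closed
        if not new:
            return closed
        closed |= new


inferred = infer_transitive(triples, IS_A)
print((EX + "Shamu", IS_A, EX + "mammal") in inferred)  # True
```

A real inference engine works from a declared rule set rather than one hard-coded rule, but the loop of "apply rules until no new triples appear" is the same basic idea.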
Simple Knowledge Organization System (SKOS)
An ontology can represent an unlimited number of user-defined
relationships and hence can express complex conceptual structures. A thesaurus,
on the other hand, includes only a small, closed set of possible relationships
between terms: broader than, narrower than, and related to. For example, a thesaurus might represent that city is a narrower term than country, and steering wheel is a narrower term than car, while an ontology would be more
specific, representing that a city is located
in a country, while a steering wheel is a component of a car. Figure 1 shows how relationships are expressed
in an ontology.
Figure 1. Ontology
Much of the business information available
in existing databases is in the simpler form of a thesaurus, with informal
hierarchical relationships and associations designed mainly for information
retrieval, rather than the comprehensive representation of a body of knowledge
possible with an ontology. For example, the Dewey Decimal Classification
and the Library of Congress Classification are, like thesauri, simple hierarchical schemes
for organizing large collections of objects for navigation and information
retrieval. Figure 2 shows the simple classification of information used in a
thesaurus.
Figure 2. Thesaurus
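The three thesaurus relations map directly onto SKOS properties. The following sketch uses the real SKOS namespace but hypothetical example.org concept URIs; it encodes the city and steering wheel examples and adds the triples SKOS implies, since skos:narrower is defined as the inverse of skos:broader and skos:related is symmetric. Nothing in the result says how a city relates to a country, which is exactly the detail an ontology would add.

```python
# Sketch: a tiny thesaurus expressed with the three SKOS semantic relations.
# The SKOS namespace is real; the example.org concept URIs are hypothetical.
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/concept/"

thesaurus = {
    (EX + "city", SKOS + "broader", EX + "country"),
    (EX + "steeringWheel", SKOS + "broader", EX + "car"),
    (EX + "car", SKOS + "related", EX + "road"),
}


def add_implied(triples):
    """Add the triples SKOS implies: narrower is the inverse of broader, related is symmetric."""
    implied = set(triples)
    for s, p, o in triples:
        if p == SKOS + "broader":
            implied.add((o, SKOS + "narrower", s))
        elif p == SKOS + "narrower":
            implied.add((o, SKOS + "broader", s))
        elif p == SKOS + "related":
            implied.add((o, SKOS + "related", s))
    return implied


for s, p, o in sorted(add_implied(thesaurus)):
    # Print short local names, e.g. "country narrower city".
    print(s.rsplit("/", 1)[-1], p.rsplit("#", 1)[-1], o.rsplit("/", 1)[-1])
```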
Before the availability of SKOS, these
existing knowledge structures could not easily be converted for use on the
Semantic Web. They cannot be converted automatically to formal OWL ontologies because their less specific semantic relations
could map to numerous possible relations in an ontology.
Such mapping must be guided by human intervention to prevent nonsensical
conclusions caused by inappropriate mappings of the limited thesaurus relations
to the formal semantics of the OWL ontology.
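The risk of automatic conversion can be shown in a few lines. This hypothetical sketch naively maps every skos:broader link to rdfs:subClassOf and then applies the RDFS subclass rule to an instance; the inferred statement that Paris is a country is the kind of nonsensical conclusion described above. A human reviewer would instead map this pair to a more specific relation, such as "located in."

```python
# Sketch: why blindly mapping skos:broader to rdfs:subClassOf goes wrong.
# The SKOS, RDF, and RDFS namespaces are real; the example.org URIs are hypothetical.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"

thesaurus = [(EX + "City", SKOS + "broader", EX + "Country")]
instances = [(EX + "Paris", RDF + "type", EX + "City")]

# Naive conversion: treat every "broader" link as a subclass link.
naive_ontology = [(s, RDFS + "subClassOf", o) for s, p, o in thesaurus if p == SKOS + "broader"]


def infer_types(type_triples, subclass_triples):
    """Apply the RDFS rule: if x has type A and A is a subclass of B, then x has type B."""
    types = set(type_triples)
    changed = True
    while changed:
        changed = False
        for x, _, a in list(types):
            for sub, _, sup in subclass_triples:
                if sub == a and (x, RDF + "type", sup) not in types:
                    types.add((x, RDF + "type", sup))
                    changed = True
    return types


for triple in sorted(infer_types(instances, naive_ontology)):
    print(triple)
# The inferred (Paris, type, Country) is nonsense: a city is located in a country, not a kind of country.
```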
SKOS was designed to provide a simpler, more
easily implemented way to make existing concept schemes available to Semantic Web
applications, and to serve as a lightweight, intuitive language for developing
new knowledge organization systems. SKOS provides a common data model for
expressing the basic structure and content of concept schemes such as thesauri,
classification schemes, subject heading lists, taxonomies, folksonomies,
and other types of controlled vocabularies, so they can be read, shared,
combined, linked, and searched by the same software. By providing a low-cost
migration path for porting existing knowledge organization systems to RDF, SKOS
provides a bridge between different communities in library and information
sciences, and between these communities and the Semantic Web. Figure 3 (created by PoolParty)
shows SKOS at the intersection of librarians and taxonomists, data
engineers and artificial intelligence, and computational linguists and information managers.
Figure 3. SKOS in the Intersection
SKOS vocabularies can also be incorporated or extended into more complex vocabularies, including
OWL ontologies. Figure 4 shows the thesaurus from
Figure 2 converted into an ontology format to allow integration with other
concept schemes.
Figure 4. Thesaurus Converted into Ontology Format
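As a sketch of the low-cost migration path, the following Python function ports a flat legacy thesaurus, given as hypothetical (term, broader term) rows, to SKOS in Turtle syntax. The base URI, the slug rule, and the use of English language tags are assumptions; a real migration would also carry over related terms and non-preferred terms (as alternative labels), and would guard against two terms producing the same URI.

```python
# Sketch: port a flat legacy thesaurus (term, broader term) to SKOS in Turtle syntax.
# The input rows, the base URI, and the scheme name are hypothetical.
import re

BASE = "http://example.org/thesaurus/"

rows = [
    ("Vehicles", None),
    ("Cars", "Vehicles"),
    ("Steering wheels", "Cars"),
]


def slug(term):
    """Turn a term into a URI-safe local name, e.g. 'Steering wheels' -> 'steering-wheels'."""
    return re.sub(r"[^a-z0-9]+", "-", term.lower()).strip("-")


def literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_turtle(rows, lang="en"):
    """Emit one skos:ConceptScheme plus one skos:Concept per row."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{BASE}scheme> a skos:ConceptScheme .",
    ]
    for term, broader in rows:
        lines.append("")
        lines.append(f"<{BASE}{slug(term)}> a skos:Concept ;")
        lines.append(f"    skos:prefLabel {literal(term)}@{lang} ;")
        if broader is not None:
            lines.append(f"    skos:broader <{BASE}{slug(broader)}> ;")
        lines.append(f"    skos:inScheme <{BASE}scheme> .")
    return "\n".join(lines)


print(to_turtle(rows))
```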
Current View
SKOS was developed by the
Semantic Web Deployment Working Group (SWDWG), with input from a broad community of interested
parties. The final W3C Recommendation was published in August 2009. SKOS
defines the following classes and properties to represent the common features
found in a standard thesaurus.
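Classes and Properties | Features |
---|---|
Uniform Resource Identifier (URI) | Each concept (concrete or abstract) is identified by a URI. |
Preferred Label | Each concept can be associated with a preferred lexical label, at most one per language. |
Alternative Label | Each concept can be labeled with multiple alternative lexical labels, such as synonyms, acronyms, and abbreviations. |
Hidden Label | Each concept can be labeled with hidden lexical labels, such as common misspellings, that are used in text searches but not displayed. |
Notes | Each concept can be documented with notes of various types, such as scope notes, definitions, examples, and history notes. |
Semantic Relations | Each concept can be linked to other concepts through semantic relations: broader, narrower, and related. |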
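The features in the table correspond to SKOS properties such as skos:prefLabel, skos:altLabel, skos:hiddenLabel, skos:scopeNote, and skos:broader. The sketch below holds one hypothetical concept as plain Python data and checks it against two integrity conditions from the SKOS Reference: a concept has at most one preferred label per language, and the same label may not be used as more than one kind of label. It is an illustration only, not a full SKOS validator.

```python
# Sketch: one SKOS concept held as plain Python data, checked against two integrity
# conditions from the SKOS Reference:
#   S13: prefLabel, altLabel and hiddenLabel are pairwise disjoint
#   S14: at most one prefLabel per language tag
# The concept data is hypothetical.
from collections import Counter

concept = {
    "uri": "http://example.org/concept/car",
    "prefLabel": [("car", "en"), ("voiture", "fr")],
    "altLabel": [("automobile", "en")],
    "hiddenLabel": [("automobil", "en")],  # a misspelling: searchable, never displayed
    "scopeNote": [("Passenger motor vehicles.", "en")],
    "broader": ["http://example.org/concept/vehicle"],
}


def integrity_problems(c):
    """Return human-readable descriptions of S13/S14 violations (empty list if none)."""
    problems = []
    # S14: count preferred labels per language tag.
    per_lang = Counter(lang for _, lang in c.get("prefLabel", []))
    problems += [f"more than one prefLabel for language '{lang}'"
                 for lang, n in per_lang.items() if n > 1]
    # S13: the same (text, language) label must not appear under two label properties.
    kinds = ("prefLabel", "altLabel", "hiddenLabel")
    for i, a in enumerate(kinds):
        for b in kinds[i + 1:]:
            for label in sorted(set(c.get(a, [])) & set(c.get(b, []))):
                problems.append(f"{label!r} is both {a} and {b}")
    return problems


print(integrity_problems(concept))  # []
```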
Concepts can be aggregated into distinct
concept schemes, which are structured sets of concepts, and each concept can
belong to more than one concept scheme. Thus the same concept can be
cross-referenced and, in effect, located in several places at once. This form
of classification offers more freedom than a physical library, where each book
must be shelved in exactly one section.
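Membership in a concept scheme is stated with skos:inScheme, and nothing prevents a concept from carrying several such statements. This small sketch, using hypothetical scheme and concept URIs, indexes concepts by the schemes they belong to; the whale concept appears in two schemes at once, unlike a book that can sit on only one shelf.

```python
# Sketch: one concept placed in two concept schemes with skos:inScheme.
# The SKOS namespace is real; scheme and concept URIs are hypothetical.
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/"

triples = [
    (EX + "concept/whale", SKOS + "inScheme", EX + "scheme/marine-life"),
    (EX + "concept/whale", SKOS + "inScheme", EX + "scheme/endangered-species"),
    (EX + "concept/kelp", SKOS + "inScheme", EX + "scheme/marine-life"),
]


def schemes_by_concept(triples):
    """Map each concept URI to the set of concept schemes it belongs to."""
    index = defaultdict(set)
    for s, p, o in triples:
        if p == SKOS + "inScheme":
            index[s].add(o)
    return index


index = schemes_by_concept(triples)
for scheme in sorted(index[EX + "concept/whale"]):
    print(scheme)
```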
Outlook
Various resources and tools are available to
help companies implement and benefit from SKOS technology, including programs
that convert existing thesauri into SKOS format, and publicly available SKOS
data sources. Many of these are listed on the wiki page maintained by the World
Wide Web Consortium. Available tools include the following:
- ThManager, an open-source application for creating, editing, visualizing, and managing SKOS vocabularies, including importing and exporting thesauri.
- The SKOS Primer, providing examples and guidance to help implementers represent and publish their concept schemes as SKOS data.
- The Intelligent Topic Manager from Mondeca, a SKOS-compliant tool that helps an enterprise create, maintain, and interlink complex knowledge structures.
- PoolParty, a SKOS-compliant platform that helps an enterprise integrate and derive value from heterogeneous and volatile data.
- Data Harmony (from Access Innovations, Inc.), which offers several SKOS-compliant tools for creating, maintaining, and exporting organized data.
- TemaTres, a tool for managing controlled vocabularies, taxonomies, and thesauri, including the ability to export to SKOS Core and other formats.
- Lexaurus from Knowledge Integration, a thesaurus management system that supports SKOS.
- TopBraid from TopQuadrant, a SKOS-compatible tool that helps develop and manage interconnected enterprise vocabularies, taxonomies, thesauri, and ontologies.
- Fluent Editor 2014 from Cognitum, an editor for working with ontologies in the SKOS format.
Several of these tools provide automatic classification of content. Figure 5
shows an example of automatic classification using TopQuadrant's TopBraid Tagger.
Figure 5. Automatic Classification
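The sketch below does not describe how TopBraid Tagger or any other product works; it is a naive, hypothetical illustration of the basic idea, tagging a text with every concept whose preferred, alternative, or hidden label appears in it as a whole word or phrase. Hidden labels earn their keep here: the misspelling "wale" still finds the whale concept. The vocabulary is invented for the example.

```python
# Sketch: naive automatic classification by matching vocabulary labels in text.
# Hypothetical vocabulary; this does not describe how any named product works.
import re

# Each concept URI maps to its preferred, alternative, and hidden labels.
vocabulary = {
    "http://example.org/concept/whale": ["whale", "whales", "wale"],  # "wale" is a hidden misspelling
    "http://example.org/concept/ocean": ["ocean", "sea"],
    "http://example.org/concept/car": ["car", "automobile"],
}


def classify(text, vocab):
    """Return, sorted, the concept URIs with any label occurring as a whole word or phrase."""
    hits = []
    for uri, labels in vocab.items():
        if any(re.search(r"\b" + re.escape(label) + r"\b", text, re.IGNORECASE) for label in labels):
            hits.append(uri)
    return sorted(hits)


print(classify("Two whales were sighted far out at sea.", vocabulary))
```

The word-boundary match keeps "car" from matching inside a word such as "Scarface"; production tools add stemming, disambiguation, and relevance scoring on top of this kind of lookup.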
These tools enable SKOS vocabularies to be exported, exposed, and/or published in several metadata schemas, including the following:
- SKOS Core (Simple Knowledge Organization System)
- BS 8723 (Structured Vocabularies for Information Retrieval)
- Dublin Core (ISO 15836:2003)
- MADS (Metadata Authority Description Schema)
- Topic Maps (ISO/IEC 13250:2003)
- IMS VDEX (Vocabulary Definition and Exchange)
- WXR (WordPress eXtended RSS)
- TXT
- SQL
- JSON and JSON-LD
- Zthes
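Of the formats above, JSON-LD is simple to produce by hand because it is ordinary JSON with a few reserved keys. This sketch, assuming hypothetical concept URIs, uses Python's standard json module to serialize one skos:Concept with multilingual preferred labels and a broader link; tools like those listed above would generate such output for a whole vocabulary.

```python
# Sketch: serialize one skos:Concept as JSON-LD using only the standard library.
# The concept URIs are hypothetical; "skos" is bound to the real SKOS namespace.
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"


def concept_to_jsonld(uri, pref_labels, broader=()):
    """Return a JSON-LD string; pref_labels maps a language tag to that language's label."""
    doc = {
        "@context": {"skos": SKOS},
        "@id": uri,
        "@type": "skos:Concept",
        "skos:prefLabel": [
            {"@value": text, "@language": lang} for lang, text in sorted(pref_labels.items())
        ],
    }
    if broader:
        doc["skos:broader"] = [{"@id": b} for b in broader]
    return json.dumps(doc, indent=2, ensure_ascii=False)


print(concept_to_jsonld(
    "http://example.org/concept/car",
    {"en": "car", "fr": "voiture"},
    broader=["http://example.org/concept/vehicle"],
))
```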
Large vocabularies in the SKOS format that
are already available in the public domain include the following:
- AGROVOC, a multilingual structured thesaurus covering all subject fields in agriculture, forestry, fisheries, food security, sustainable development, nutrition, land use, rural livelihoods, and related domains, used to index and search documents, Web pages, and digital objects. It was developed by the Food and Agriculture Organization of the United Nations (FAO) and the Commission of the European Communities. AGROVOC contains over 35,000 concepts in up to 29 languages and is aligned with 18 other multilingual knowledge organization systems.
- GEMET (GEneral Multilingual Environmental Thesaurus), a compilation of several multilingual vocabularies that aims to define a core general terminology for environmental themes, including air pollution and climate change mitigation; biological diversity; climate change impacts, vulnerability, and adaptation; waste and materials in a green economy (ETC WMGE); inland, coastal, and marine waters; and urban, land, and soil systems. Published and managed by the European Environment Information and Observation Network, it is available in 29 languages and contains over 6,000 descriptors.
- The UNESCO Thesaurus, a structured list of more than 7,000 general descriptors in English, Russian, French, and Spanish for indexing and retrieving literature in the fields of education, science, social and human sciences, culture, communication, information, politics, law, and economics. It also includes the names of countries and groupings of countries by political, economic, geographic, ethnic, religious, and linguistic categories.
- GeoNames, which contains over 10 million geographical names in various languages, corresponding to over 9 million unique features, categorized into nine feature classes and 645 feature codes, with data such as latitude, longitude, elevation, population, and postal codes.
- ISO 639-2, which contains codes for the names of the spoken languages of the world.
- BARTOC (Basel Register of Thesauri, Ontologies & Classifications), which aims to foster the sharing of knowledge by listing as many knowledge organization systems as possible and making them searchable in 20 European languages, with four search options: keyword, taxonomy term, map, and title. BARTOC currently indexes 2,869 vocabularies and 89 registries.
You can also take a course to learn Semantic Web technology, such as the courses offered by DATAVERSITY, PoolParty, or the Dublin Core Metadata Initiative (DCMI).
Recommendations
The grand promise of the Semantic Web as a
seamless world-wide virtual database may not be realized soon; nevertheless, in
limited domains and in projects with limited scopes, such as among specialized
communities and in intra-company projects, Semantic Web technologies have
proven practical and useful. With the development of SKOS as a low-cost bridge,
more companies will be able to migrate their data to the Semantic Web and begin
reaping its benefits. Following are some guidelines for implementing SKOS
within an enterprise:
- Begin with a small pilot project to test how the technology will work and to refine objectives before investing time, money, and resources at the enterprise level.
- Enlist the assistance of a Semantic Web expert, because the technology is complex and relatively new, and existing IT staff may not have the expertise to implement it without outside help.
- Take a course to learn Semantic Web technology.
- Keep in mind that even with Semantic Web technology, information can still be missed; relevant information may be available through other channels even if it is not reflected in the Semantic Web application.
- Because information that is aggregated and repackaged can appear more reliable and legitimate than it really is, choose Semantic Web tools that are transparent, providing a way for users to view the source of each piece of information in order to assess how reliable the data is.
Web Links
- AGROVOC: http://aims.fao.org/standards/agrovoc/
- BARTOC: http://bartoc.org/
- Cognitum: http://www.cognitum.eu/
- Data Harmony: http://www.dataharmony.com/
- DATAVERSITY: http://www.dataversity.net/
- Dublin Core Metadata Initiative (DCMI): http://www.dublincore.org
- GEMET: http://www.eionet.europa.eu/gemet
- GeoNames: http://www.geonames.org/
- ISO: http://www.iso.org/
- Knowledge Integration: http://www.k-int.com/products/lexaurusbank
- Mondeca: http://www.mondeca.com/
- PoolParty: http://www.poolparty.biz/
- SKOS: http://www.w3.org/2004/02/skos/
- SKOS Community Wiki: http://www.w3.org/2001/sw/wiki/SKOS
- SKOS Reference: http://www.w3.org/TR/skos-reference/
- TemaTres: http://www.vocabularyserver.com/
- ThManager: http://thmanager.sourceforge.net/
- TopQuadrant: http://www.topquadrant.com/
- UNESCO Thesaurus: http://databases.unesco.org/thesaurus/
- World Wide Web Consortium: http://www.w3.org/
About the Author
Betsy Walli is a licensed marriage and family therapist and an independent writer and editor
with experience in academic, technical, and marketing topics. Dr. Walli holds a master's degree in counseling from California
State University, Fullerton, and a Ph.D. in linguistics from the Massachusetts
Institute of Technology.