Simple Knowledge Organization System










by Betsy Walli

docid: 00011562

Publication Date: 1809

Report Type: TUTORIAL

Preview

It can be challenging and costly to
integrate and use business information from diverse corporate databases and from
external sources such as partners’ databases and the Internet. The Semantic Web
creates a universal medium for exchanging data, so information from different
formats can be automatically shared, processed, integrated, and reused by
disparate organizations and individuals. The Simple Knowledge Organization
System (SKOS) provides a low-cost migration path for porting existing databases
to the Semantic Web. Applications and services based on SKOS can greatly
improve the efficiencies of in-house operations, scientific research,
business-to-business interactions, and consumer services and applications.

Executive Summary


Many organizations find it nearly impossible
to access and integrate all the business information that is available to them
– data that is managed and stored in unconnected, incompatible, or obsolete
corporate databases and software programs, as well as in external sources such
as partners’ databases and the Internet.

The Semantic Web addresses this problem by
creating a universal medium for the exchange of data, so information in
different locations and formats can be automatically shared, processed,
integrated, and reused by disparate organizations and individuals. The ultimate
goal of the Semantic Web is to allow all the information a user has the right
to access, both within and outside the enterprise, to function as a single
virtual database. Although it is still being refined, the technologies and
standards already in place have allowed the development and implementation of
new applications and services that take advantage of the Semantic Web for
information sharing. These new approaches can improve the efficiencies of
in-house operations, scientific research, business-to-business interactions,
and consumer services and applications.

The Semantic Web organizes information into
an ontology, which can represent an unlimited number of user-defined
relationships and hence can express complex conceptual structures. For example,
an ontology could represent that a city is located in a country, while a steering
wheel is a component of a car.
However, much of the business information available in existing databases is in
the simpler form of a thesaurus, which can represent only a small, predefined
set of possible relationships between terms (broader than, narrower than,
and related to). Thesauri are
designed mainly for information retrieval, while ontologies
are designed for comprehensive representation of a body of knowledge.

Before the availability of the Simple
Knowledge Organization System (SKOS), these existing thesaurus-style databases
could not easily be converted for use on the Semantic Web. Conversion required
labor-intensive hands-on guidance because their less specific semantic
relations could map to numerous possible relations in the formal ontologies expressed in the Web Ontology Language (OWL).

SKOS provides a simpler, more easily
implemented way to make existing thesauri and concept schemes available to Semantic
Web applications, and to develop new knowledge organization systems. SKOS
provides a common data model for expressing the basic structure and content of
various concept schemes, so they can be read, shared, combined, linked, and
searched by the same software. By providing a low-cost migration path for
porting existing knowledge organization systems to the Semantic Web, SKOS
provides a bridge between different communities in library and information
sciences, and between these communities and the Semantic Web. SKOS vocabularies
can also be incorporated or extended into more complex vocabularies, including
OWL ontologies.

Description


A common problem faced by many organizations
today is the inability to easily access and integrate all the business
information that is available to them – data that is managed and stored in
disparate, incompatible, unconnected, and sometimes obsolete corporate
databases and software programs, as well as in external sources such as
partners’ databases and the Internet. The volume of such data is vast and
constantly growing, and it is often simply not feasible to integrate it
manually.

Semantic Web

The Semantic Web addresses this problem by
creating a universal medium for the exchange of data, so information can be
automatically shared, processed, integrated, and reused by disparate
organizations and individuals. The ultimate goal of the Semantic Web is to
allow all the information a user has the right to access, both within and
outside the enterprise, to function as a single virtual database. Although it
is still being refined, sufficient technologies and standards are in place to
allow the development and implementation of new applications and services that
are greatly improving the efficiencies of in-house operations, scientific
research, business-to-business interactions, and consumer services and
applications.

The crucial feature of the Semantic Web is
that the interconnected network of data is understandable by machines, not just
by humans. It uses a common data model and data sharing
standard to define and provide context for each bit of data in a corporate
database, software program, or Web site. Tags attached to bits of data show the
data’s meaning and relationships to other data, as well as indicating when two
sources are referring to the same thing.
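
As a rough illustration of why shared identifiers matter, the sketch below merges statements from two sources that use the same URI for the same thing. The source names, property names, URIs, and values are all hypothetical and chosen only for this example; real Semantic Web data would use RDF rather than plain Python tuples.

```python
from collections import defaultdict

# Hypothetical identifiers: both sources use the same URI for the same thing.
WHALE = "http://example.org/concept/whale"
SHIP = "http://example.org/concept/ship"

# Hypothetical statements of the form (uri, property, value).
catalog_db = [(WHALE, "label", "whale"), (SHIP, "label", "ship")]
partner_feed = [(WHALE, "habitat", "ocean")]


def merge_by_uri(*sources):
    """Group (uri, property, value) statements from all sources by URI."""
    merged = defaultdict(dict)
    for source in sources:
        for uri, prop, value in source:
            merged[uri][prop] = value
    return dict(merged)


combined = merge_by_uri(catalog_db, partner_feed)
print(combined[WHALE])
```

Because both sources agree on the identifier, no name matching or manual reconciliation is needed to see that the two statements describe the same concept.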

The Semantic Web is built on the following
components:

  • A common language for representing data.
    In the Resource Description Framework (RDF), each
    piece of data is identified by a unique Uniform Resource Identifier
    (URI). URIs can be assigned by standards organizations, communities, or
    individuals. (A web address is a special form of URI.) For
    example, a pointer to the Wikipedia entry for whale could be used to
    represent the concept of whale, using
    the URI http://en.wikipedia.org/wiki/whale.
    A link that specifies a type of relationship between two pieces of data is
    also identified by a URI, and connections between pieces of information
    are shown in triples of data-link-data, where each element of the triple
    is a URI. For example, an online reference to Shamu,
    a reference to the relationship "is a",
    and a reference to the concept of whale
    could be joined in a triple. Such information about information is called
    metadata. Information encoded in RDF can be passed between computer
    applications, and used in distributed, decentralized applications that
    harvest metadata from multiple sources.
  • A means for translating information from different databases into common
    terms.
    Ontology languages allow
    individuals or groups to define frequently used terms and data within a
    subject area, and the relations among those items. The result can be a
    simple hierarchical taxonomy or a complex and rich ontology, depending on
    the types of relations expressed and the extensiveness of the linking. The
    Web Ontology Language (OWL) is the standard ontology language for the
    Semantic Web, and is built on and compatible with RDF. The Simple Knowledge
    Organization System (SKOS) is a simpler vocabulary, also expressed in RDF, that
    represents the more limited relationships of a thesaurus.
  • Rules for reasoning about information.
    Inference engines provide rules for reasoning about the information in
    ontologies, and for finding new relations among the
    terms and data in them. For example, given the triples "Shamu - is a - whale"
    and "whale - is a - mammal", an inference engine could conclude that Shamu
    is a mammal.
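
The triple and inference ideas in the list above can be sketched in a few lines of Python. This is a minimal illustration, not how a production inference engine works: it uses short strings where real RDF would use URIs, and it treats the informal "is a" link from the example as a single transitive relation, which is a simplifying assumption.

```python
# Triples are (subject, link, object). In real RDF each element would be a
# URI; short strings are used here only to keep the sketch readable.
triples = {
    ("Shamu", "is a", "whale"),
    ("whale", "is a", "mammal"),
}


def infer_is_a(facts):
    """Repeatedly apply the rule: X is a Y and Y is a Z  =>  X is a Z."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for (x, link1, y) in list(known):
            for (y2, link2, z) in list(known):
                if link1 == link2 == "is a" and y == y2:
                    new_fact = (x, "is a", z)
                    if new_fact not in known:
                        known.add(new_fact)
                        changed = True
    return known


# Facts that were derived rather than stated.
inferred = infer_is_a(triples) - triples
print(inferred)  # {('Shamu', 'is a', 'mammal')}
```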

Simple Knowledge Organization System (SKOS)

An ontology can represent an unlimited number of user-defined
relationships and hence can express complex conceptual structures. A thesaurus,
on the other hand, includes only a small, closed set of possible relationships
between terms: broader than, narrower than, and related to. For example, a
thesaurus might represent that city is a narrower term than country, and
steering wheel is a narrower term than car, while an ontology would be more
specific, representing that a city is located in a country, while a steering
wheel is a component of a car. Figure 1 shows how relationships are expressed
in an ontology.

Figure 1. Ontology

Much of the business information available
in existing databases is in the simpler form of a thesaurus, with informal
hierarchical relationships and associations designed mainly for information
retrieval, rather than the comprehensive representation of a body of knowledge
possible with an ontology. For example, the Dewey Decimal
System and the Library of Congress classification system are simple thesauri
for organizing large collections of objects for navigation and information
retrieval. Figure 2 shows the simple classification of information used in a
thesaurus.

Figure 2. Thesaurus

Before the availability of SKOS, these
existing knowledge structures could not easily be converted for use on the
Semantic Web. They cannot be converted automatically to formal OWL ontologies because their less specific semantic relations
could map to numerous possible relations in an ontology.
Such mapping must be guided by human intervention to prevent nonsensical
conclusions caused by inappropriate mappings of the limited thesaurus relations
to the formal semantics of the OWL ontology.
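
The mapping problem can be made concrete with a small sketch. It uses the city/country and steering wheel/car pairs from the example earlier in this section. The function names and the fixed relation "is a kind of" are assumptions made for illustration; they stand in for whatever single formal relation an unguided converter might pick.

```python
# Thesaurus entries from the earlier example: (narrower term, broader term).
thesaurus = [("city", "country"), ("steering wheel", "car")]

# What each "narrower than" link actually means in the earlier example.
# A converter cannot know this without human guidance.
intended = {
    ("city", "country"): "is located in",
    ("steering wheel", "car"): "is a component of",
}


def naive_convert(pairs, relation="is a kind of"):
    """Map every thesaurus link to one fixed ontology relation (assumed)."""
    return [(narrow, relation, broad) for narrow, broad in pairs]


def guided_convert(pairs, choices):
    """Map each link to the relation a human editor selected for it."""
    return [(narrow, choices[(narrow, broad)], broad) for narrow, broad in pairs]


for triple in naive_convert(thesaurus):
    print("naive: ", " ".join(triple))
for triple in guided_convert(thesaurus, intended):
    print("guided:", " ".join(triple))
```

The naive output asserts that a city is a kind of country, which is exactly the sort of nonsensical conclusion that an inappropriate mapping invites once an inference engine starts reasoning over it.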

SKOS was designed to provide a simpler, more
easily implemented way to make existing concept schemes available to Semantic Web
applications, and to serve as a lightweight, intuitive language for developing
new knowledge organization systems. SKOS provides a common data model for
expressing the basic structure and content of concept schemes such as thesauri,
classification schemes, subject heading lists, taxonomies, folksonomies,
and other types of controlled vocabularies, so they can be read, shared,
combined, linked, and searched by the same software. By providing a low-cost
migration path for porting existing knowledge organization systems to RDF, SKOS
provides a bridge between different communities in library and information
sciences, and between these communities and the Semantic Web. Figure 3 (created by PoolParty)
shows SKOS at the intersection of librarians and taxonomists, data
engineers and artificial intelligence, and computational linguists and information managers.

Figure 3. SKOS in the Intersection

SKOS vocabularies can also be incorporated or extended into more complex vocabularies, including
OWL ontologies. Figure 4 shows the thesaurus from
Figure 2 converted into an ontology format to allow integration with other
concept schemes.

Figure 4. Thesaurus Converted into Ontology Format

Current View


SKOS was developed by the
Semantic Web Deployment Working Group (SWDWG), with input from a broad community of interested
parties. The final W3C Recommendation was published in August 2009. SKOS
defines the following classes and properties to represent the common features
found in a standard thesaurus.

Table 1. SKOS Standard Thesaurus Classes and Properties

| Classes and Properties | Features |
|---|---|
| Uniform Resource Identifier (URI) | Each concept (concrete or abstract) is uniquely identified by a URI. URIs should be the primary means of reference within computer systems because they provide a frame of reference that is outside any particular program, data set, or thesaurus. Therefore, each reference will remain unambiguous even as data from multiple sources are aggregated. Because each reference is universal, and understood by any program in any context, the system is open-ended and extensible. It is important that the URI be unique, so the same URI is not used as the primary reference for two different concepts. |
| Preferred Label | Each concept can be associated with a preferred lexical label in one or more natural languages. |
| Alternate Label | Each concept can be labeled with multiple alternative terms such as synonyms and acronyms, each associated with a particular natural language. |
| Hidden Label | Each concept can be labeled with hidden terms that will be available to computer search applications but not displayed in search results (e.g., common misspellings or mistypings). |
| Notes | Each concept can be documented with additional information in forms such as plain text, hypertext, images, and audio. These can include change notes, definitions, editorial notes, examples, history notes, and scope notes. |
| Semantic Relationships | Each concept can be linked to other concepts in informal hierarchies and association networks, using semantic relationships such as broader, narrower, or related – associations that are inherent in the meanings of the two concepts. Also available are transitive versions of the broader and narrower relationships. |

Concepts can be aggregated into distinct
concept schemes, which are structured sets of concepts, and each concept can
belong to more than one concept scheme. Thus the same concept can be
cross-referenced, virtually located in multiple places at once. There is more
freedom with this form of classification than is possible when books must be physically
located in one section of a library or another.
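
To make these features concrete, here is a minimal Python sketch that models a concept and writes it out in Turtle, one common text syntax for RDF. The property names (skos:prefLabel, skos:altLabel, skos:hiddenLabel, skos:scopeNote, skos:broader, skos:related, skos:inScheme) and the namespace come from the SKOS vocabulary; the Concept class, the example.org URIs, and the sample labels are hypothetical. The broader_transitive helper approximates what the transitive version of the broader relationship lets a reasoner derive.

```python
from dataclasses import dataclass, field

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/animals/"  # hypothetical namespace


@dataclass
class Concept:
    uri: str
    pref_labels: dict                                  # language -> label
    alt_labels: list = field(default_factory=list)     # (label, language)
    hidden_labels: list = field(default_factory=list)  # (label, language)
    scope_note: str = ""                               # assumed to be English
    broader: list = field(default_factory=list)        # URIs of broader concepts
    related: list = field(default_factory=list)        # URIs of related concepts
    schemes: list = field(default_factory=list)        # URIs of concept schemes

    def to_turtle(self):
        """Serialize as Turtle (labels are assumed to contain no quotes)."""
        parts = ["a skos:Concept"]
        parts += [f'skos:prefLabel "{t}"@{lang}' for lang, t in self.pref_labels.items()]
        parts += [f'skos:altLabel "{t}"@{lang}' for t, lang in self.alt_labels]
        parts += [f'skos:hiddenLabel "{t}"@{lang}' for t, lang in self.hidden_labels]
        if self.scope_note:
            parts.append(f'skos:scopeNote "{self.scope_note}"@en')
        parts += [f"skos:broader <{uri}>" for uri in self.broader]
        parts += [f"skos:related <{uri}>" for uri in self.related]
        parts += [f"skos:inScheme <{uri}>" for uri in self.schemes]
        body = " ;\n    ".join(parts)
        return f"@prefix skos: <{SKOS_NS}> .\n\n<{self.uri}>\n    {body} .\n"


def broader_transitive(uri, concepts):
    """Collect every ancestor reachable by following broader links."""
    seen, stack = set(), list(concepts[uri].broader)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            if current in concepts:
                stack.extend(concepts[current].broader)
    return seen


animal = Concept(BASE + "animal", {"en": "animal"}, schemes=[BASE + "scheme"])
mammal = Concept(BASE + "mammal", {"en": "mammal"}, broader=[animal.uri])
whale = Concept(
    BASE + "whale",
    {"en": "whale", "fr": "baleine"},
    alt_labels=[("cetacean", "en")],
    hidden_labels=[("wale", "en")],  # a misspelling, searchable but not shown
    scope_note="Large marine mammals.",
    broader=[mammal.uri],
    schemes=[BASE + "scheme", BASE + "marine-scheme"],  # two schemes at once
)
concepts = {c.uri: c for c in (animal, mammal, whale)}

print(whale.to_turtle())
print(sorted(broader_transitive(whale.uri, concepts)))
```

The whale concept belongs to two schemes at once, which mirrors the point above that a concept can be virtually located in multiple places.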

Outlook


Various resources and tools are available to
help companies implement and benefit from SKOS technology, including programs
that convert existing thesauri into SKOS format, and publicly available SKOS
data sources. Many of these are listed on the wiki page maintained by the World
Wide Web Consortium. Available tools include the following:

  • ThManager, an open-source application
    for creating, editing, visualizing, and managing SKOS vocabularies,
    including importing and exporting thesauri.
  • The SKOS Primer, providing examples and guidance to help implementers
    represent and publish their concept schemes as SKOS data.
  • The Intelligent Topic Manager from Mondeca, a SKOS-compliant
    tool that helps an enterprise create, maintain, and interlink complex
    knowledge structures.
  • PoolParty, a SKOS-compliant platform
    that helps an enterprise integrate and derive
    value from heterogeneous and volatile data.
  • Data Harmony (from Access Innovations, Inc.) offers several SKOS-compliant tools for creating, maintaining, and
    exporting organized data.
  • TemaTres, a tool for managing controlled
    vocabularies, taxonomies, and thesauri, including the ability to export to
    SKOS-core and other formats.
  • Lexaurus from Knowledge Integration, a thesaurus
    management system that supports SKOS.
  • TopBraid from TopQuadrant, a
    SKOS-compatible tool that helps develop and manage interconnected enterprise
    vocabularies, taxonomies, thesauri and ontologies.
  • Fluent Editor 2014 from Cognitum, an editor for working with
    ontologies in the SKOS format.

Several of these tools provide automatic classification of content. Figure 5
shows an example of automatic classification using TopQuadrant’s TopBraid Tagger.

Figure 5. Automatic Classification

These tools enable SKOS vocabularies to be exported, exposed, and/or published in several metadata schemas, including the following:

  • SKOS Core (Simple Knowledge Organization System)
  • BS 8723 (Structured Vocabularies for Information Retrieval)
  • Dublin Core (ISO 15836:2003)
  • MADS (Metadata Authority Description Schema)
  • TopicMaps (ISO/IEC 13250:2003)
  • IMS VDEX Scheme (Vocabulary Definition and Exchange)
  • WXP WordPress XML
  • TXT
  • SQL
  • JSON and JSON-LD
  • Zthes
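
As one example of such an export, the sketch below builds a JSON-LD description of a single concept using only Python's standard json module. The function name concept_to_jsonld and the example.org URIs are hypothetical; the SKOS namespace and the JSON-LD keywords (@context, @id, @type, @value, @language) are standard. The tools listed above may structure their own exports differently.

```python
import json

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def concept_to_jsonld(uri, pref_label, lang="en", broader=()):
    """Build a JSON-LD dictionary describing one SKOS concept."""
    doc = {
        "@context": {"skos": SKOS_NS},
        "@id": uri,
        "@type": "skos:Concept",
        "skos:prefLabel": {"@value": pref_label, "@language": lang},
    }
    if broader:
        # Links to other concepts are expressed as references by URI.
        doc["skos:broader"] = [{"@id": b} for b in broader]
    return doc


doc = concept_to_jsonld(
    "http://example.org/animals/whale",  # hypothetical URI
    "whale",
    broader=["http://example.org/animals/mammal"],
)
print(json.dumps(doc, indent=2))
```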

Large vocabularies in the SKOS format that
are already available in the public domain include the following:

  • AGROVOC, a multilingual structured thesaurus of all
    subject fields in agriculture, forestry, fisheries, food security,
    sustainable development, nutrition, land use, rural livelihoods, and
    related domains, used to index and search documents, Web pages, and
    digital objects. It was developed by the Food and
    Agriculture Organization of the United Nations (FAO) and the Commission of
    the European Communities. AGROVOC contains over 35,000 concepts in
    up to 29 languages. AGROVOC is aligned with 18 other multilingual knowledge organization systems.
  • GEMET (GEneral Multilingual Environmental Thesaurus), a
    compilation of several multilingual vocabularies that aims to define a core
    general terminology for themes related to the environment, including air
    pollution and climate change mitigation; biological diversity; climate change
    impacts, vulnerability and adaptation; ETC WMGE Waste and Material in Green
    Economy; Inland, Coastal and Marine waters; and Urban, Land and Soil systems.
    Published and managed by the European Environment Information and Observation
    Network, it is available in 29 languages and contains over 6,000 descriptors.
  • The UNESCO thesaurus, which includes a structured list of more than 7,000 general
    descriptors in English, Russian, French and Spanish for indexing and
    retrieving literature in the fields of education, science, social and
    human science, culture, communication, information, politics, law, and
    economics. It also includes the names of countries and groupings of
    countries by political, economic, geographic, ethnic, religious, and
    linguistic categories.
  • GeoNames, which contains over 10 million geographical names
    in various languages, corresponding to over 9 million unique features
    categorized into nine feature classes and 645 feature codes, with data that
    includes latitude, longitude, elevation, population, and postal codes.
  • ISO 639-2, which contains codes for
    the names of the spoken languages of the world.
  • BARTOC (The Basel Register of Thesauri, Ontologies & Classifications)
    aims to foster the sharing of knowledge by listing as many Knowledge
    Organization Systems as possible, and making them searchable in 20 European languages with four
    search options: keyword, taxonomy term, map, and title. BARTOC currently indexes 2,869 vocabularies and 89 registries.

You can also take a course to learn Semantic Web technology, such as the courses offered by DATAVERSITY, PoolParty, or the Dublin Core Metadata Initiative (DCMI).

Recommendations


The grand promise of the Semantic Web as a
seamless world-wide virtual database may not be realized soon; nevertheless, in
limited domains and in projects with limited scopes, such as among specialized
communities and in intra-company projects, Semantic Web technologies have
proven practical and useful. With the development of SKOS as a low-cost bridge,
more companies will be able to migrate their data to the Semantic Web and begin
reaping its benefits. Following are some guidelines for implementing SKOS
within an enterprise:

  • Begin with a small pilot project to test how the technology
    will work and to refine objectives before investing time, money, and resources
    on an enterprise-wide level.
  • Enlist the assistance of a Semantic Web expert, because the
    technology is complicated and new, and existing IT staff may not have the
    expertise to implement it without outside help.
  • Take a course to learn Semantic Web technology.
  • Keep in mind that even with Semantic Web technology,
    information can still be missed, and it may be available through other channels
    even if it is not reflected in the Semantic Web application.
  • Because information that is aggregated and repackaged can
    appear more reliable and legitimate than it really is, choose Semantic Web
    tools that are transparent, providing a way for users to view the source of
    each piece of information in order to ascertain how reliable the data is.

About the Author

Betsy Walli is a licensed marriage and family therapist and an independent writer and editor
with experience in academic, technical, and marketing topics. Dr. Walli holds a master's degree in counseling from California
State University, Fullerton, and a Ph.D. in linguistics from the Massachusetts
Institute of Technology.
