The Semantic Web
Copyright 2018, Faulkner Information Services. All Rights Reserved.
Docid: 00011470
Publication Date: 1809
Report Type: TUTORIAL
Preview
Development work on exposing
Web-based information to more sophisticated methods of
searching and organizing has reached the point that practical
applications can be created. Much of the behind-the-scenes technical
work is still ongoing, however, and completing a project requires
knowledge of a new array of specifications and languages. Organizations
are thus left with the choice of remaining on the sidelines and
potentially falling behind competitors or entering new territory where
there are a limited number of models of successful applications.
Report Contents:
- Executive Summary
- Description
- Current View
- Outlook
- Recommendations
- References
- Web Links
- Related Reports
Executive Summary
Today’s Web is composed mostly of unstructured data.
HTML
pages can be
searched via keyword queries, but this technology is limited. These
searches cannot identify the type of information on a page; for
instance, they cannot determine that a string of text is a person’s
name or that it is the price of a product. Therefore, unlike records in
a database, information on a Web page cannot be automatically related
to other information, for example to extract different pieces of data
about the same person and combine them into a single personal profile.
The vision of the semantic Web is to establish such capabilities.
The semantic Web
has progressed significantly in recent years, in part because of the
development of standards like SPARQL
and the Web Ontology
Language. This
progress has opened
the doors for developers to solve problems that could not be otherwise
addressed and to create applications that are not possible with
conventional Web technology. But for people who are not
developers, there is not an
easy way, such as a turnkey commercial product, to get started on
semantic work. And some skeptics continue to question whether the
semantic Web will expand beyond a narrow niche.
Description
All
content looks the same to conventional Web technology. But unlike
conventional Web pages, semantic
sites provide more than undifferentiated text. The name of
a product can be tagged to indicate its category (e.g., clothing,
furniture), for example, and a number displayed next to the product
name can be
identified as a price by a search utility. Computers
can then
use
this semantically meaningful content to automatically manipulate,
combine, compare, and sort information.
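As a rough illustration of the difference, the sketch below contrasts an undifferentiated string with the same content carrying labels. The field names (category, price) and the product records are invented for this example and are not drawn from any semantic Web vocabulary.

```python
# Hypothetical product records: each value carries a label saying what it is.
products = [
    {"name": "Oak Desk", "category": "furniture", "price": 249.00},
    {"name": "Wool Coat", "category": "clothing", "price": 120.00},
    {"name": "Pine Shelf", "category": "furniture", "price": 89.50},
]

# The same information as plain text: a program sees only characters and
# cannot tell which token is a name and which is a price.
plain_text = "Oak Desk 249.00 Wool Coat 120.00 Pine Shelf 89.50"


def cheapest_in_category(items, category):
    """Return the lowest-priced item in a category, or None if there is none."""
    matches = [item for item in items if item["category"] == category]
    if not matches:
        return None
    return min(matches, key=lambda item: item["price"])


def sort_by_price(items):
    """Return the items ordered from cheapest to most expensive."""
    return sorted(items, key=lambda item: item["price"])


if __name__ == "__main__":
    print(cheapest_in_category(products, "furniture")["name"])
    print([item["name"] for item in sort_by_price(products)])
```

Once the values are labeled, comparing and sorting take a few lines each; doing the same against plain_text would require guessing where each name ends and each price begins.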
The
concept of a semantic Web has been around for a long time. In
recent years, however, the idea has begun to attract more interest and
is now being implemented on a modest scale. These developments have
been
enabled by the establishment of some key standards. The descriptive
schema
that has been developed for the semantic Web is the Resource
Description Framework (RDF), a W3C standard that covers the labeling
and description of data. A central part of RDF is defining
various Web resources
so that any system can determine what that resource is. For instance, a
resource may be a particular product or it may be a price that applies
to a product. In order to apply these definitions, RDF uses uniform
resource identifiers (URIs). The term URI is unfamiliar to many people,
but
the concept is widely known because of the term “uniform resource
locator” (URL), which is synonymous with a Web page’s address. A URL is
one type of URI. It is intended specifically for finding files on the
Web. The Internet Society’s URI specification (RFC 3986) defines a URI
as “a compact sequence of characters that identifies an abstract or
physical resource.” The specification also defines the generic URI
syntax and a process for resolving URI references that might be in
relative form, along with guidelines and security considerations for
the use of URIs on the Internet.
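A minimal sketch of these ideas follows, using only Python's standard urllib.parse module. The base address, the product identifier, and the "terms#price" property are placeholders invented for illustration; a real project would take its property URIs from a published vocabulary.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base URI; example.com is a placeholder domain.
BASE = "http://example.com/catalog/"


def resolve(reference, base=BASE):
    """Resolve a possibly relative URI reference against a base URI."""
    return urljoin(base, reference)


# One RDF-style statement as (subject, predicate, object). The subject and
# predicate are URIs; the object here is a plain literal value.
statement = (
    resolve("products/desk-42"),  # the resource being described
    resolve("terms#price"),       # the property that applies to it
    "249.00",                     # the value of that property
)

if __name__ == "__main__":
    for part in statement:
        print(part)
    # A URI can be split back into its generic syntax components.
    print(urlsplit(statement[1]).fragment)
```

Because both the product and the property are named by URIs, any system that encounters the statement can tell which resource is being described and what kind of value is attached to it.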
There is a companion to RDF, called “OWL” (the Web Ontology Language),
that builds on RDF and is intended for use when more flexibility is
needed. Like RDF, OWL is a system for categorizing and defining
information, but it is more granular and flexible. Less mature than
RDF, it is envisioned for use in sub-communities of the semantic Web,
for cases in which RDF alone does not provide enough
flexibility.
A W3C
working group created OWL 2, which slightly changed and added to
OWL.1 A
review of the changes in OWL 2, along with other semantic Web
developments, is available in a talk given by the W3C’s Ivan Herman.2
OWL 2 has three
profiles
that
function as “trimmed down” sublanguages that are more efficient for
certain types of tasks:
- OWL 2 EL, for “ontologies that define very large numbers of classes and/or properties”
- OWL 2 QL, for applications with “very large volumes of instance data”
- OWL 2 RL, for “applications that require scalable reasoning”3
The sublanguages of
the
first
version of OWL – OWL Lite and OWL DL – did not see extensive use.
Whereas
databases are commonly searched using the query language SQL, there
is a query language specifically for the semantic Web: SPARQL.
It is a
W3C standard
that enables queries to be made across RDF data sources,
both data that is in itself stored in RDF form and data that is
viewed indirectly in RDF form. It also defines
the XML format in which query results will be returned. As
described by IBM, a
SPARQL query representing the English sentence “Find the URL of the
blog by the person named Jon Foobar” would be written as follows:4

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?url
    FROM <bloggers.rdf>
    WHERE {
      ?contributor foaf:name "Jon Foobar" .
      ?contributor foaf:weblog ?url .
    }
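To make the logic of the query concrete, the following sketch evaluates its two patterns against a small in-memory list of statements. It is a toy matcher written for illustration, not a SPARQL engine, and the data standing in for bloggers.rdf (including the "_:a" and "_:b" node labels and the blog addresses) is invented.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical contents of bloggers.rdf as (subject, predicate, object).
triples = [
    ("_:a", FOAF + "name", "Jon Foobar"),
    ("_:a", FOAF + "weblog", "http://foobar.example/blog"),
    ("_:b", FOAF + "name", "Jane Doe"),
    ("_:b", FOAF + "weblog", "http://doe.example/blog"),
]


def is_variable(term):
    """Pattern terms that start with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern, triple, bindings):
    """Extend bindings so the pattern equals the triple; None on failure."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_variable(term):
            # Bind the variable, or check it against an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(variable, patterns, data):
    """Join the patterns in order and return the values of one variable."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in data:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return [solution[variable] for solution in solutions]


# The two patterns from the WHERE clause of the query above.
patterns = [
    ("?contributor", FOAF + "name", "Jon Foobar"),
    ("?contributor", FOAF + "weblog", "?url"),
]

if __name__ == "__main__":
    print(select("?url", patterns, triples))
```

The shared variable ?contributor is what ties the two patterns together: the second pattern only matches statements about the same subject that the first pattern found.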
Yahoo! software
architect Dave
Beckett summarizes the structure of SPARQL as follows:5
Prologue (optional)
    BASE <iri>
    PREFIX prefix: <iri> (repeatable)
Query Result forms (required, pick 1)
    SELECT (DISTINCT) sequence of ?variable
    SELECT (DISTINCT) *
    DESCRIBE sequence of ?variable or <iri>
    DESCRIBE *
    CONSTRUCT { graph pattern }
    ASK
Query Dataset Sources (optional)
    Add triples to the background graph (repeatable):
        FROM <iri>
    Add a named graph (repeatable):
        FROM NAMED <iri>
Graph Pattern (optional, required for ASK)
    WHERE { graph pattern [ FILTER expression ] }
Query Results Ordering (optional)
    ORDER BY …
Query Results Selection (optional)
    LIMIT n, OFFSET m
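The ordering of these parts can be sketched as a small helper that assembles a SELECT query as text. The function name and parameters are hypothetical, it covers only the SELECT form, and it performs no escaping or validation, so it should be read as an outline of the structure rather than a tool for building queries from untrusted input.

```python
def build_select_query(variables, patterns, prefixes=None, sources=None,
                       order_by=None, limit=None, offset=None,
                       distinct=False):
    """Assemble a SPARQL SELECT query string, part by part.

    Hypothetical helper for illustration: no escaping or validation is done.
    """
    lines = []
    # Prologue (optional): one PREFIX line per declared prefix.
    for prefix, iri in (prefixes or {}).items():
        lines.append("PREFIX " + prefix + ": <" + iri + ">")
    # Query result form (required): SELECT, optionally DISTINCT.
    head = "SELECT DISTINCT" if distinct else "SELECT"
    lines.append(head + " " + " ".join(variables))
    # Query dataset sources (optional, repeatable).
    for iri in sources or []:
        lines.append("FROM <" + iri + ">")
    # Graph pattern.
    lines.append("WHERE {")
    for pattern in patterns:
        lines.append("  " + pattern + " .")
    lines.append("}")
    # Results ordering and selection (both optional).
    if order_by:
        lines.append("ORDER BY " + order_by)
    if limit is not None:
        lines.append("LIMIT " + str(limit))
    if offset is not None:
        lines.append("OFFSET " + str(offset))
    return "\n".join(lines)


query = build_select_query(
    variables=["?url"],
    patterns=['?contributor foaf:name "Jon Foobar"',
              "?contributor foaf:weblog ?url"],
    prefixes={"foaf": "http://xmlns.com/foaf/0.1/"},
    sources=["bloggers.rdf"],
    limit=10,
)

if __name__ == "__main__":
    print(query)
```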
In addition to the core
specifications of OWL, RDF, and SPARQL, the semantic Web is
also supported by the following:
- Jena – Jena is a framework, maintained by the Apache Software Foundation, that helps developers work with RDF.
- Simple Knowledge Organization System (SKOS) – SKOS is a group of specifications, created using RDF, that describe semantic Web “knowledge organization systems,” which include any type of classification system or vocabulary on the semantic Web. (For more information about these specifications, see the Faulkner Information Services report “Simple Knowledge Organization System.”)
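As a small illustration of what a knowledge organization system looks like as data, the sketch below links a handful of concepts with the SKOS "broader" property and walks up the hierarchy. The "ex:" concept names are invented, and the code assumes each concept has at most one broader concept, which is a simplification (SKOS itself allows several).

```python
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"

# Hypothetical vocabulary as (concept, property, broader concept).
triples = [
    ("ex:desks", SKOS_BROADER, "ex:furniture"),
    ("ex:furniture", SKOS_BROADER, "ex:products"),
    ("ex:coats", SKOS_BROADER, "ex:clothing"),
    ("ex:clothing", SKOS_BROADER, "ex:products"),
]


def broader_chain(concept, data):
    """Follow 'broader' links upward and return the concepts passed through."""
    parents = {s: o for s, p, o in data if p == SKOS_BROADER}
    chain = []
    seen = {concept}
    while concept in parents:
        concept = parents[concept]
        if concept in seen:  # guard against a cycle in the data
            break
        seen.add(concept)
        chain.append(concept)
    return chain


if __name__ == "__main__":
    print(broader_chain("ex:desks", triples))
```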
Current View
Until several years ago, the
semantic Web was
primarily in a research phase. The few implementations of it were
mainly to demonstrate the potential of the idea. While some of the
activity related to semantic technology still takes place within the
research community, there are now real-world examples to use as models.6
An early example was the
Norwegian National
Broadcaster’s effort to use
the semantic Web to store extensive metadata about its collection of
recordings, with the goal of making the archives highly searchable.7
One of the more familiar of today’s semantic Web initiatives is the
Friend of a Friend Project (FOAF), which is designed to make
information
on people’s personal homepages machine readable, and thus to enable
such data to be automatically interrelated. As with many aspects of the
semantic Web, FOAF builds on Web 2.0 concepts such as personalization.
FOAF, which uses RDF, provides a set of terms that can be used to
describe people in a standard way. For instance, it defines classes
with a syntax such as “foaf:Person.” (Note that FOAF is typically
talked about and used as a way to describe people, but it has the
potential to describe other entities, such as companies.)
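The kind of automatic interrelation that FOAF aims for can be sketched as follows. Two hypothetical sources each state a few facts about the same person using FOAF terms (name, weblog, homepage), and the facts are merged into one profile. The person identifier and the addresses are invented, and a real application would parse RDF documents rather than use hard-coded tuples.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"
PERSON = "http://example.org/people/jon#me"  # hypothetical identifier

# Statements about the same person published on two different pages.
source_a = [
    (PERSON, FOAF + "name", "Jon Foobar"),
    (PERSON, FOAF + "weblog", "http://foobar.example/blog"),
]
source_b = [
    (PERSON, FOAF + "name", "Jon Foobar"),
    (PERSON, FOAF + "homepage", "http://foobar.example/"),
]


def build_profile(subject, *sources):
    """Collect every property value stated about one subject across sources."""
    profile = defaultdict(set)
    for source in sources:
        for s, p, o in source:
            if s == subject:
                profile[p].add(o)  # the set drops repeated statements
    return {prop: sorted(values) for prop, values in profile.items()}


if __name__ == "__main__":
    for prop, values in build_profile(PERSON, source_a, source_b).items():
        print(prop, values)
```

The merge works only because both sources use the same identifier for the person and the same vocabulary for the properties, which is the role a shared set of terms such as FOAF is meant to play.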
The efforts described above
show the potential of
the semantic Web, but
they also make it clear that only a limited amount of its potential has
yet been realized. Some of today’s semantic Web projects have minimal
“curb appeal” and limited functionality and are not user friendly enough
to expand much beyond the crowd of highly technically literate people
now using them. Also, many of today’s semantic Web projects resemble
Web 2.0 services
that are already live and being heavily used. For instance, the
semantic Web concept of tagging is being used extensively by sites such
as Delicious, and the idea of extracting and re-combining data in new
ways is being put to use by application mash-ups. The semantic Web
could eventually go beyond Web 2.0 by creating a broader,
standardized system of interrelated information, but such a
development, if feasible, lies down the road.
Some
of the work now being done on semantic technology is demonstrated in
the
annual Semantic Web Challenge, hosted by analytics company Elsevier.
The competition pits leaders in the field against each other, and each
year focuses on a different aspect of the technology. For example, the
2017 contest
focused on knowledge graphs and was won by IBM.8
The results of these
yearly contests demonstrate the current state of the industry. In
addition, there are annual conferences in which research on semantic
technology is presented, such as the International Semantic Web
Conference and the Semantic Web Applications and Tools for Healthcare
and Life Sciences conference.
Outlook
Many
of the semantic Web’s components – such as SPARQL, RDF, and OWL – were
developed
years ago and
have been usable for a long time.9
Although semantic Web
applications have been ready to be built for a while, there is still
some technical work being done to add
capabilities and make improvements. For now, many developers may
approach semantic applications with caution, or stay on the sidelines
altogether. As touch-up work continues on semantic Web standards,
development will proceed on real-world projects that make use of
existing capabilities. The key question about the semantic Web is how
wide its scope
will be. In one vision of its future, it will remain a specialty
technology, with most of the
Web’s data being unstructured. In another vision, the Web itself will
become broadly and seamlessly semantic, with everything from social
networking to e-commerce sites using the technology.
Perhaps
the best
bellwether of the semantic Web’s prospects over the
next few years is the amount of work being done on commercial
offerings. By this measure, a significant expansion in real-world
applications does not appear to be on the horizon. There are some small
and mid-sized companies, like Cambridge Semantics and Veritas (through its 2018 acquisition of fluidOps), that are
heavily pushing the concept, but the major players in the industry have
not shown strong interest. Oracle has been one
of the biggest supporters, releasing Oracle Database 11g Semantic
Technologies and then its successor, the renamed Oracle Spatial and
Graph RDF Semantic Graph, which it bills as a platform to build and
manage semantic applications. Other industry trendsetters such as IBM
and Microsoft have the technology on their radars, but they have
largely confined their interest to their research departments. Until
several large companies release
semantic Web applications with potentially broad appeal, the pace of
the semantic
Web’s growth will likely remain measured.
There
are some companies, however, such as PoolParty, that are betting on the semantic
Web’s commercial potential and that are already offering products.
Likewise, Diffbot sells technology that creates semantic data from
ordinary Web content by scanning pages and decomposing them
into
elements such as article titles and photos.10 Diffbot’s
products are used, for example, to keep track of what people
are saying online about a particular company’s offerings and those of its
competitors. Customers include Adobe, Cisco, eBay, Microsoft, and Salesforce.11
A
likely future, at least for the next few years, is that the semantic
Web will continue to progress and have important commercial
applications, but only for niche purposes. “Semantic Technologies will
continue to see steady growth and adoption,
but will likely never be the rallying flag on their own,” says Michael
Bergman of Cognonto, a consulting company focused on semantic
technology.12 “I think we will continue to appreciate Semantic Technologies…in service to broader needs such as
Artificial Intelligence, Machine Learning, or data interoperability.
Semantic Technologies will come to assume their natural role as
essential enablers, but not as keystones on their own for major
economic and information change.”
Recommendations
Most
organizations have no
need to take action on the semantic Web at this time. In fact, there is
little they could do aside from casual, preliminary research. The
applications that are available today are narrow in scope, have limited
commercial potential, and in many cases can be more effectively
achieved using common Web 2.0 techniques such as asynchronous
JavaScript and XML (AJAX) development.
Some
types of organizations could benefit from
taking a more aggressive
approach toward semantic Web technology, however. These include search
technology developers, search services, e-commerce software providers,
and database software companies. Companies in these markets may find
that their competitors have already begun work. And,
since there are no simple, turnkey tools to get semantic Web
technology up and running quickly, organizations that determine that
they could usefully apply the technology must find their own approach.
The basic structure of this process is as follows:
- Identify a Specific Need – As W3C director Tim Berners-Lee has pointed out, data comes in many forms.13 His short list includes government data, enterprise data, news, scientific data, and personal data. Trying to convert all of this information into structured data that can be used semantically is a big task. In most cases, better results would be achieved by choosing just one domain of data and focusing on building a semantic Web project around it.
- Form a Semantic Web Team – A semantic Web team will be charged both with understanding the business aspects of the project and with determining how to translate the project’s business goals into technological reality. Although the semantic Web is often considered a developer’s concern, this team should include members who can define and direct the strategic aspects of the project.
- Select (or Build) a Model – In some cases, organizations that are building a semantic Web project will be able to model their work after one of the existing case studies published online. There are now more models to choose from than there were just a few years ago. But in most instances, an organization will need to sketch out its own design of a semantic Web application.
- Determine the Specifications to Use – Semantic Web specifications such as SPARQL and OWL give developers the foundation they need to build their projects, even if there is little guidance available about how to specifically combine these various technologies and languages into working parts. There are also some tools designed for creating semantic projects.14 Before embarking on a project designed for enterprise use, a semantic Web team would be prudent to employ these tools strictly in a test environment. Once a proof of concept is developed, the project can then move on to the creation of a production-ready architecture.
References
1 “Owl 2.” Semanticweb.org. October 27, 2009.
2 Ivan Herman. “A Year on the Semantic Web @ W3C.” W3C. June 17, 2010.
3 “OWL 2 Web Ontology Language: Profiles (Second Edition).” W3C. December 11, 2012.
4 Philip McCarthy. “Search RDF Data with SPARQL.” IBM. May 10, 2005.
5 Dave Beckett. “SPARQL RDF Query Language Reference v. 1.7.” 2005.
6 “Semantic Web Case Studies and Use Cases.” W3C.
7 “Case Study: A Digital Music Archive for the Norwegian National Broadcaster Using Semantic Web Techniques.” W3C. September 2007.
8 Elsevier. “Elsevier Announces the Winner of the 2017 Semantic Web Challenge.” Elsevier. November 13, 2017.
9 Paul Miller. “Sir Tim Berners-Lee: Semantic Web Is Open for Business.” ZDNet. February 26, 2008.
10 Barry Levine. “Data Extractor Diffbot Wants to Turn the Web into the Semantic Web.” Marketing Land. February 11, 2016.
11 Kyle Wiggers. “Diffbot Launches AI-Powered Knowledge Graph of 1 Trillion Facts About People, Places, and Things.” VentureBeat. August 30, 2018.
12 Jennifer Zaino. “Semantic Web and Semantic Technology Trends in 2018.” Dataversity. December 26, 2017.
13 Tim Berners-Lee. “Tim Berners-Lee on the Next Web.” TED. March 2009.
14 Semantic Web Wiki. “Tools.”
Web Links
Cambridge Semantics: http://www.cambridgesemantics.com/
Delicious: http://delicious.com/
Diffbot: https://www.diffbot.com/
Friend of a Friend Project: http://www.foaf-project.org/
International Semantic Web Applications and Tools for Healthcare and Life Sciences Conference: https://www.rd-alliance.org/
International Semantic Web Conference: http://swsa.semanticweb.org/content/international-semantic-web-conference-iswc/
Jena: https://jena.apache.org/
Oracle: http://www.oracle.com/
OWL 2: http://www.w3.org/TR/owl2-syntax/
PoolParty: https://www.poolparty.biz/
RDF: http://www.w3.org/RDF/
Semantic Web Applications and Tools for Healthcare and Life Science: https://www.rd-alliance.org/swat4hcls-semantic-web-applications-and-tools-healthcare-and-life-sciences/
Semantic Web Challenge: http://challenge.semanticWeb.org/
SPARQL: http://www.w3.org/TR/sparql11-overview/
Veritas: https://www.veritas.com/
World Wide Web Consortium (W3C): http://www.w3.org/
W3C’s Data Activity blog: http://www.w3.org/blog/data/
About the Author
Geoff
Keston is the author
of more than 250 articles that help organizations find
opportunities in business trends and technology. He also works directly
with clients to develop communications strategies that improve
processes and customer relationships. Mr. Keston has worked as a
project manager for a major technology consulting and services company
and is a Microsoft Certified Systems Engineer and a Certified Novell
Administrator.