Library Indexing and Classification Systems (Archived Report)



by Jerri L. Ledford

Docid: 00011005

Publication Date: 1601

Report Type: TUTORIAL


Information is only important if it can be used.
However, sometimes it is difficult to define what constitutes
information. Volumes of data, reams of paper, stacks of books, and billions of
Web pages contain bits of information that some find useful and others do not.
The problem with this wealth of information is how to classify and index it so
that the needed pieces can be quickly located and put to
use. The Internet and Internet-based companies complicate the issue with new
indexing and classification systems appearing regularly. Combine that with
a lack of standards in the industry and, even though strides are being made to
improve the situation, finding a solution has become even more difficult.


Executive Summary


Classifying and indexing
mountainous volumes of information has always been a challenge for libraries
of all sizes and types.

The popularity of the
Internet and the volume of information it generates only compound the
problem, especially for libraries that are already struggling to keep up with
all the material they receive.

To help deal with these
problems there are many different classification and indexing systems. From the
Dewey Decimal Classification System that most people are familiar with to more
obscure classification systems such as the Colon Classification System and
systems based on categorization and creative signing, library professionals
have struggled for years to develop classification and indexing systems that
keep pace with the growing demand for information.

As the Information Age
advances and the feeding frenzy for information continues to grow, however,
some of these existing systems are proving to be far too archaic to be
useful. The World Wide Web, with its billions of Web pages, presents an
especially tough challenge for classification and indexing. Because the
Internet is a dynamic collection of information, researchers are still
struggling to decide what parts of it should be included in classifications.
Furthermore, initiatives from Web-based companies like Google are changing the
way that people search for and find information.

In part, this problem is
exacerbated by an ongoing debate within the industry as to what constitutes
information, data, and knowledge. To date, no standards exist to clearly
define these terms and how – or even if – each should be classified. Until
these problems are solved, no single classification and indexing system can be
put into place.



In the
past, information was clearly defined and in printed form. Today, there is
no clear definition of information, and it can be printed or
digital. These changes in the way information is both viewed and used require
new methods of organization.

Most current organizational
methods are based on the outdated mode of information storage and delivery to
which people have become accustomed: paper. Those organization methods –
indexing and classification, specifically – have changed and matured to some
degree. Currently, there are a number of indexing and classification systems in
use that fall into one of three categories. Those three categories of
classification include:

  • Enumerative – An alphabetical list of subject headings, with a
    number assigned to each heading in alphabetical order.
  • Faceted – Also called analytico-synthetic, this type of
    classification system divides subjects into mutually exclusive, orthogonal
    facets.
  • Hierarchical – This system divides subjects in a
    hierarchy from most general to most specific.
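As a rough illustration, the three category styles can be contrasted in a short code sketch. Everything below is invented for illustration; the numbers, headings, and facet names are not drawn from any real scheme:

```python
# Illustrative only: toy data, not any real scheme's numbers or headings.

# Enumerative: a flat, pre-enumerated list of headings with assigned numbers.
enumerative = {"Astronomy": "520", "Physics": "530", "Botany": "580"}

# Hierarchical: subjects nest from most general to most specific.
hierarchical = {"Science": {"Physics": ["Mechanics", "Optics"]}}

# Faceted (analytico-synthetic): a class mark is synthesized on demand
# by combining one value from each mutually exclusive facet.
def synthesize_mark(subject, place, time):
    """Combine one value per facet into a single class mark."""
    return ":".join([subject, place, time])

print(enumerative["Physics"])                   # look up a fixed number: 530
print(synthesize_mark("Physics", "US", "20c"))  # build a mark: Physics:US:20c
```

The practical difference is visible in the last two lines: an enumerative scheme can only assign numbers that were enumerated in advance, while a faceted scheme composes a new, more specific mark for each work.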

Within these three
categories of classification, there exist several classification
systems. Some use only one type of classification, while others use
multiple types. A few are widely used, while some are
only very narrowly used by specialized organizations. Some of those
classification systems that are more widely used include:

  • Dewey Decimal Classification (DDC) System – The DDC was designed to
    facilitate the specific arrangement of books on library shelves. This
    system has been revised 22 times, with the most recent revision having
    taken place in 2004. It is a faceted classification system in which
    works are classified by subject, using extensions for subject
    relationships, place, time or type of material. The system dictates
    classification numbers of not less than three digits but which can be of
    indeterminate length with a decimal point after the third digit. While DDC
    is perhaps one of the most widely used classification systems, it is not
    specific enough to represent deep information, within a book, such as
    chapter topics. Neither can it represent the information in other types of
    works such as articles or Internet content.
  • Universal Decimal Classification (UDC) System – The UDC is built upon the
    same principles as the DDC. It never became as popular in the US as the
    DDC, however, and is used mainly by libraries outside North America.
  • Library of Congress Classification (LCC) System – The LCC system was developed
    by the Library of Congress and is used by most of the research and
    university libraries in the US. Some other countries
    also use the LCC. This classification system divides subjects into
    broad categories, but is still enumerative in nature.
  • Colon Classification – The term colon
    classification comes from the fact that this classification system uses
    colons to separate each facet of a subject into a class. A faceted
    classification system, it divides subjects by aspect and then class
    numbers are synthesized based on the classifications. This was one of
    the first faceted classification systems. Although the system results in
    very specific call numbers that are reflective of the exact content of a
    work, the call numbers assigned to each item are too long to be useful.
  • Cutter Expansive Classification – This classification system
    uses all letters and no numbers, and is considered one of the most logical
    and expansive American classification systems. Few libraries in the US,
    however, have adopted it. The founder’s untimely death, combined with the
    fact that he built no provisions for change into the system, makes it
    ill-suited to today’s information environment.
  • Bliss Bibliographic Classification (BC2) – This classification system,
    which was designed by Henry Bliss, has changed dramatically since the first
    publication of the system between 1940 and 1953. The current
    iteration is a fully faceted classification scheme that provides a
    detailed classification for information of all types.

Each of these different
classification systems uses a different approach to classifying information,
and some of the systems are designed to classify information of different
types: printed, audio, visual, and digital. The increasing popularity of
the Internet and the immense quantity of information that it generates
challenges even the best of these classification systems.



There are many challenges
to classifying and indexing mountainous volumes of information. Among the
top challenges is the lack of standards within the industry. Combine that
lack of standards with the dynamic nature of today’s information and a nearly
impossible problem presents itself: How do you know what information needs to
be classified or indexed for permanent, or at the very least, long-term,
inclusion in a collection? For example, the rise of blogs (short for "Web
logs," regularly updated journals published on the Web) is a particular area
of debate. Do blogs
provide information that should be classified or indexed, or should the blog be
omitted from a collection? Blogs are just one example of the dynamic nature of
information in today’s information-driven society.

That brings yet another
issue to light. The language by which information is defined is highly debated
within the industry. What constitutes information? When should information
be classified as knowledge? Can information be classified, or should
knowledge be classified instead? Again, this issue is more a matter of
standards than of semantics. Because there is no "standard" definition of what
should or should not be classified and indexed, each system approaches the
question differently.

For those organizations
that have decided on a single definition of the information that should be
classified, and how it should be classified, other challenges present
themselves. How do you classify or index information that is on the Web? In
answer to this challenge, some organizations are using metadata, or data about
data, to define what should be classified and where in the classification
system it would fit. Metadata is like a listing in a card catalog: the
information on the card is metadata about the contents of the book.

Much of the information on
the Web is also tagged with metadata. This information is used to ensure that
when users search for information on the Web, they find the right content.

It would make sense that if
metadata in some form is used both in a library classification system and on
the Web then it would be easy to classify and index the information on the Web.
That, however, is not the case. Not all of the information on the Web has
value, and what is more, not all of it is tagged with metadata.
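As a sketch of what such tagging looks like in practice, the snippet below uses Python's standard html.parser to pull name/content pairs from a page's meta tags. The Dublin Core-style names ("DC.title", "DC.subject") and the sample page are illustrative assumptions, not tags any particular page is guaranteed to carry:

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs and "content" in attrs:
                self.metadata[attrs["name"]] = attrs["content"]

# Hypothetical page carrying Dublin Core-style metadata.
page = """<html><head>
<meta name="DC.title" content="Library Indexing Systems">
<meta name="DC.subject" content="Classification">
</head><body>...</body></html>"""

parser = MetaExtractor()
parser.feed(page)
print(parser.metadata)        # what a crawler or classifier would see
print(bool(parser.metadata))  # untagged pages yield an empty dict
```

A page with no meta tags produces an empty dictionary, which is exactly the gap described above: such pages carry nothing for a metadata-driven classification system to work with.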

To overcome this challenge,
there are vendors that offer applications that will automatically tag digital
information (or content) with the proper metadata. Other software
applications automate the integration of this information into a classification
system or indexing system. The issue of standards still remains,
however. Not all libraries use the same classification and indexing
methodologies – notably, different types of libraries often use different
classification systems, so college libraries may use one classification system
while special collections use an entirely different, sometimes proprietary,
system. This standardization issue compounds existing problems and, until there
is standardization, it will be impossible to create an exhaustive
classification and indexing of all of the information on the Web.
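A minimal sketch of the idea behind such automatic tagging tools is to match document text against a controlled vocabulary and emit subject headings. The vocabulary and headings below are invented, and real vendor products use far more sophisticated analysis:

```python
# Hypothetical controlled vocabulary: keyword -> subject heading.
VOCABULARY = {
    "catalog": "Cataloging",
    "metadata": "Metadata",
    "classification": "Classification systems",
}

def auto_tag(text: str) -> list:
    """Return the subject headings whose keywords appear in the text."""
    lowered = text.lower()
    return sorted({heading for kw, heading in VOCABULARY.items() if kw in lowered})

doc = "A survey of metadata standards for library classification."
print(auto_tag(doc))  # ['Classification systems', 'Metadata']
```

Downstream software can then file the document under the returned headings in whatever classification or indexing system the library uses; the standards problem remains, since two libraries running this step against different vocabularies will file the same document differently.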

One recently released
standard is the next iteration of the Anglo-American Cataloging Rules,
Second Edition (AACR2). Titled Resource Description and Access (RDA), it is
the new standard for resource description and access designed for the digital
world. It is built on the foundations established by AACR2 and provides a
comprehensive set of guidelines and instructions on resource description and
access covering all types of content and media. The new system, designed by
the Joint Steering Committee for Revision of Anglo-American Cataloging Rules
(JSC), is said to be much more inclusive of the different types of media that
users might want to access. Additionally, the new system is designed to help
keep different types of documentation within a topic or category together,
saving users time and effort in finding all of the different media available.

RDA is designed to provide:

  • A flexible framework for describing all resources –
    analog and digital
  • Data that is readily adaptable to new and emerging
    database structures
  • Data that is compatible with existing records in online
    library catalogues

The downside of the new
RDA system is that it is still not a complete solution for indexing and
classification. It is, however, a step in the right direction—one that will
hopefully lead to more effective and efficient cataloging of library materials
and media in the future.



In the future, libraries
and the keepers of collections of information will likely see a maturing of the
classification and indexing systems that exist today. It is also possible that
new technologies will emerge that make the classification of information, both
in print and on the Web, easier and faster. Today, the magnitude of effort
involved in classifying and indexing all of the information in a given area or
country is nearly impossible to overcome when both print and digital
information is taken into consideration.

As the technologies for
indexing and classification improve and mature, standards will also be
developed. Once these standards are established, tested, and put into
effect, the challenges facing libraries and other information repositories will
become increasingly manageable. That may take some time. Standard
views of what constitutes information, what constitutes data, and what
constitutes metadata are still very much under debate. In time, however,
these issues will be resolved. When they are, the technology will be
sufficiently mature and functional to make classification of all types of
information quick and easy.

Until the industry standardizes on definitions, practices, and procedures,
individual organizations and groups of organizations are finding their own
solutions for classifying and indexing existing collections.

In the interim, one
project, the Digital Libraries Initiative (DLI), is striving to
digitize existing print resources and develop classification schemes that
integrate existing digital information. Funded by the National Science
Foundation (NSF), the Defense Advanced Research Projects Agency (DARPA), and
the National Aeronautics and Space Administration (NASA), the project supports
and funds research into methods for digitizing today’s library collections and
integrating them with existing digital information. This program, however,
like the classification systems mentioned above, does not offer a fully
developed classification and indexing solution. For now, DLI is simply a
research effort moving toward the solutions of the future.

Google Books Initiative

One effort that’s bound to
affect future library indexing and classification schemes is the Google Books
initiative. Right now, Google users can search over the full text of some seven
million books drawn from libraries and partners (publishers and authors)

  • The Library Project – Google has partnered with
    prominent libraries around the globe to include their collections in
    Google Book Search. For books that are still in copyright, Google results
    are like a card catalog, showing information about the book and, generally,
    a few snippets of text revealing a particular search term in context. For
    books that are out of copyright, Google users can read and download the
    entire book.
  • The Partner Program – Google has also partnered
    with more than 20,000 publishers and authors to make their books
    discoverable on Google. Google users can flip through a few preview pages
    of these books, just like they’d browse them at a bookstore or library.
    Users also see links to libraries and bookstores where they can borrow or
    buy the book.

Once all the legal
impediments are resolved, Google users will be able to purchase full online
access to millions of books, a feature especially useful for accessing
hard-to-find out-of-print volumes. This means users can read an entire book
from any Internet-connected computer, simply by logging in to their Book Search
accounts.

Classification By Value

Since traditional library
indexing and classification systems were built around books, they were
generally non-judgmental. There was a presumption that most books contained
vetted – that is, accurate – information. In the Internet world, however, where
everyone is a potential author and most information is not vetted, the ability
to discriminate between fact and opinion, and between reliable information and
misinformation, is becoming difficult, if not impossible. One of the challenges
for future library scientists will be developing a classification model based
on information accuracy and reliability – a highly controversial process. 



There are currently
billions of pages on the World Wide Web. Combine those sites with the billions
of printed books and periodicals that people want to access and you have a
massive information repository that is nearly impossible to navigate. If a
classification and indexing scheme can be developed that will allow users to
search all of this information from a single place, a powerful information tool
will exist.

The impact of a system that
could search across both printed and digital materials would be tremendous. It
could potentially make all types of research more streamlined and efficient,
and the amount of time it would save users is barely imaginable. This
classification and indexing solution, however, is still a vision for the future
and a reality only on a small scale. To reach the fullest potential of this
type of technology, increased funding and attention are needed, along with a
set of cross-industry (and possibly international) standards. At this time,
no such standards exist, and we estimate that it will be five to seven years
before they are in place.

Until those standards are
in place, research can and should continue, but the best possible solution will
not be reached.


About the Author


Jerri L. Ledford is a leading business technology
trainer and author. She has written 18 books, including The SEO Bible,
and has worked with major corporations to develop both in-house and
customer-facing technology training. In addition, Ms. Ledford leads training
online and
in a corporate setting, as well as leading workshops and speaking publicly
about technology issues.
