Digitizing Enterprise Content



by Faulkner Staff

Docid: 00011161

Publication Date: 2005

Report Type: TUTORIAL


The process of digitizing one’s enterprise content involves
converting business material into a digital format through imaging, scanning,
or other methods. This action – as well as the use of an EDM (electronic
document management) system, ECM (enterprise content management) software, or
ASP (application service provider) – offers several advantages in terms of
storage space, workforce allocation, and paper-related costs. Although the field is mature, it will likely
continue to experience
some improvements in technology. Transforming enterprise content into
digital format is a logical step for virtually any enterprise. This tutorial
looks at these and other considerations.


The process of digitizing content is one whereby traditional paper documents
and structured or unstructured electronic information are scanned,
categorized, and stored for future use. Common examples of these pieces
of content include:

  • Physical Docs – Letters, invoices, orders, checks, and structured
    business forms
  • E-Info – Faxes, e-mails, e-forms, images, database records,
    print streams, PDFs, CAD (computer-aided design) outputs, word processing and spreadsheet files, and
    voice and audiovisual content

Content digitization can exist within an EDM (electronic
document management) or ERM (electronic records management)
system, either of which is a potential subset of an ECM (enterprise content
management) system. Content can also be encrypted in the process of digitization.
The inputs for this process can include traditional computer content, as well as
data from call centers, kiosks, mobile documents, the Internet, sensors, social
media, rich media, and GPS signals. The most important technologies relevant to electronic
capture include:

  • Imaging, or the transformation of paper into an electronic
    image.
  • Scanning hand-written text into machine-readable
    characters.
  • Importing electronic data and documents such as database
    records, word processing files, spreadsheets, faxes, print
    streams, audio, and video.
  • Converting any of the above into unalterable images for
    records management security and retention requirements.

Digitizing content can provide significant savings and cost
reductions, including but not limited to:

  • Orders-of-magnitude storage space reductions.
  • Lower staffing levels due to the elimination of manual review
    and authorization processes.
  • A reduction in filing and re-filing.
  • A reduction in off-site document retrieval time.
  • The ability to find, retrieve, and share information
    throughout the business or supply chain quickly and easily on an
    “anything, anywhere, any time” basis.
  • Reduced paper, printer ink, and envelope usage, offering
    progress toward the “paperless office.”
  • Improved efficiency in retaining, archiving, destroying, and
    retrieving records in accordance with audit, legal, and
    regulatory compliance guidelines.
  • Improved communication of data, and improved e-mail system
    performance as e-mail messages are off-loaded to less expensive
    “near online” media.
  • Reduction in redundant information.
  • Simplification of steps to regulatory compliance.
  • Improved security, redundancy, and local access, leading to
    easier collaboration.
  • Enabling of Digital Asset Management, Digital Rights
    Management, and content repurposing.
  • Creating a structure for next-generation business activities,
    positioning the enterprise for future growth.
  • Expediting a "single viewing" of documents across platforms.
  • Improving business continuity / disaster recovery functions.

The best way to achieve these benefits is through either the
implementation of an ECM solution with fully integrated digitizing
capabilities or the use of an ASP (application service provider). Generally speaking,
the industry is regarded as mature, and no major technological leaps forward are
anticipated.



Content digitization does not exist in a vacuum; rather, it is part of an EDM (electronic document management) or ERM
(electronic records management) system that can in turn be part of a
more-encompassing ECM (enterprise content management) system.


The field is sometimes referred to as "Record and Information
Management," or "RIM." The word “document” normally refers to an
individual content item of any kind, and “record” normally refers
to a required and structured kind of document (for example, a
monthly report). A record can be and usually is a document, but
because of their informal nature, not all documents are
necessarily records. Often, the two terms are used interchangeably.

The “format diversity” that exists over
disparate systems, however, tends to lead to the conclusion that as many types of
content as possible should be digitized.


Although the paperless office may never be completely achieved,
and is not always the primary goal of digitized enterprise
content, the concept and related benefits of digitizing content
are compelling. Some of the most dramatic savings, cost
reductions, and cost avoidance due to digitizing are in
the areas of:

  • Orders-of-magnitude storage space reductions for digital media
    over paper and microfilm.
  • Lower staffing levels due to the elimination of manual review
    and authorization processes.
  • Reduced paper purchasing. This has the added benefit of
    positive environmental impact.

When properly designed and implemented, the electronic
digitization of content minimizes the cost of record and document
processing and adds value to the information, since it can be
stored, indexed, and retrieved as needed. This information can be
accessed by all qualified credentialed network users.


Generally speaking, a digitized document moves much more quickly throughout its
lifecycle than its paper equivalent. At the same time, it enables
more user interaction, is subject to a more rigidly controlled
rule set, and can automatically trigger related processes –
advantages far beyond the ability of non-digitized
content. The only mandatory human interaction is the scanning
of content into the appropriate system. Content can originate from traditional paper such as letters,
invoices, orders, checks, and other structured business forms, or
from electronic formats such as fax, e-mail, images, database
records, print streams, and word processor and spreadsheet files. No
matter what the source, the first priority is to scan it into a
relational or object database so it can be intelligently managed.

The process of digitizing content comprises several steps:
characterization, imaging, scanning, importing, and converting.

Characterization. Content must be categorized to determine the attributes and the
rules required to manage it. The most important enabling
technologies relevant to the electronic capture of content are:

  • Imaging (the transformation or “conversion” of paper to an
    electronic image).
  • Scanning of hand-written text into machine-readable
    characters.
  • Importing of electronic data and documents such as database
    records, word processing files, spreadsheets, faxes, print
    streams, audio, and video.
  • Converting any of the above into unalterable images to meet
    records management security and retention requirements.

Figure 1 depicts the methods for digitizing content.

Figure 1. Methods of Digitizing Content


Source: DigHist.org

In financial and business applications these technologies are
used as described below to decrease the manual keying of data, a
process that is slow, expensive, and error-prone. The term
“recognology” is sometimes used to refer to technologies that
recognize characters and marks from images in FPS (forms processing) and
DCS (data capture systems). OCR (optical character
recognition), meanwhile, includes all of these technologies, and is often
used in concert with Magnetic Ink Character Recognition (MICR) and
Symbolic Recognition (bar codes).

Imaging. Imaging is the conversion of paper to an electronic (raster)
image. Once the hardcopy is scanned, the source documents can be
destroyed or archived. Processing continues with the electronic
copy. This may involve optical character recognition (OCR),
intelligent character recognition (ICR), or handwriting recognition. Many data
elements are captured by custom applications for later use in
downstream processing, such as mailings, order processing, or
financial systems. In these cases the document must be accessible
at all times, but only retrieved when needed. This especially
applies to checks, legal documents, and financial documents.

Support for color or non-standard paper sizes, and for
high-speed and duplex scanning, are among the features
that can increase the price of a scanning solution.

Benefits of imaging include:

  • Savings in shipping, courier fees, handling, and time.
  • Significant cost savings, especially when dealing with
    documents pertaining to accounts payable and receivable.
  • Security, redundancy, and local access.
  • Easy accessibility and distribution savings. Images can
    be quickly found, displayed, printed, emailed, and faxed.
  • Forms can be shredded once scanned, thus saving on storage and

Scanning. Scanning is among the most widespread means of importing
enterprise content. Scanning utilizes intelligent character
recognition (ICR), the computer translation of hand-written text
into machine-readable characters. Characters are entered in a
printed form from an I/O device, and the captured data image is
analyzed and translated into machine-readable characters. ICR
is similar to optical character recognition (OCR) and is sometimes
used in combination with it. Before ICR and OCR technology, form
processing was performed by data entry clerks who manually keyed
data in from paper forms to various computer media.

Data entry through OCR is faster,
more accurate, and more efficient than keystroke data entry,
resulting in an easily retrievable and usable business document
stored in electronic form. OCR can recover valuable information
and format it in reusable form. Information can be gathered from
old paper files, resumes and applications, forms, and address
labels. An example of OCR applications is remittance processing,
where data on utility bills or other turnaround documents needs to
be collected and entered into a system.

There are two types of OCR scanners:

  • Text input devices scan pages or large portions of
    documents. The data can be fed by hand or automatically, read,
    sorted, and stacked. Text input allows data to be edited after it
    is scanned. 
  • Data Capture devices are designed to capture repetitive
    data and to format the data as it is being entered. This data cannot
    be edited later.

Benefits of scanning include:

  • Reducing data entry errors.
  • Consolidating data entry.
  • Handling large volumes/peak loads.
  • Readability by humans.
  • Compatibility with many printing techniques.

When evaluating scanners, it is important to consider the volume
and physical dimensions of the documents to be scanned, ease of
use, and the costs of purchase or rent, maintenance, and
upgrade. The ability to support a wide range of scanners is a
critical component of any EDM or ERM system, whether standalone or
as an integrated part of an ECM system.

In applications such as airline baggage handling, manufacturing, and
warehousing, which involve rough handling and harsh environments,
bar codes are a better option than OCR, although bar codes require more
label space: OCR-readable text can hold roughly six times as much
information as a standard bar code.

OCR Data Capture and Forms Processing are system components in
integrated Electronic Document and Record Management Systems, and
are used to digitize and recognize characters (data) from paper in
live business systems. Forms processing involves extracting
information from structured or customized forms, faxes, and
scanned images, and updating it on a range of outputs. The data is
extracted from handwritten forms or scanned images either
manually or through specialized OCR/ICR software, and then
delivered in various formats such as ASCII, CSV (comma-separated values), and
database files.
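
As a rough illustration of the delivery step, the sketch below writes already-extracted form fields to CSV text using Python's standard csv module. The field names (form_id, customer, amount) and the sample records are hypothetical, and the extraction itself (manual keying or OCR/ICR) is assumed to have happened upstream.

```python
import csv
import io

def fields_to_csv(records, fieldnames):
    """Serialize extracted form fields (a list of dicts) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

# Hypothetical records, standing in for the output of forms processing.
records = [
    {"form_id": "F-001", "customer": "Acme Corp", "amount": "125.50"},
    {"form_id": "F-002", "customer": "Smith, Jane", "amount": "80.00"},
]

csv_text = fields_to_csv(records, ["form_id", "customer", "amount"])
print(csv_text)
```

Values that contain a comma, such as the second customer name, are quoted automatically by the csv module, which is one reason to prefer it over joining strings by hand.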

Scanning is a manual process in several senses. Staff must select
the documents to be scanned, scan them properly, and also perform
quality control checking on the results of the scanning, since
even the best scanning equipment will not always translate even
printed text flawlessly into digital form.

Importing. This is the process of bringing existing electronic files such as
word processing documents, spreadsheets, graphics, audio clips, JPEG and GIF
pictures, or video files into a content management
system. Generally, they are stored in their original formats.
However, conversion of such files where appropriate may be

Converting. Conversion, in this context, is the process of transforming
existing electronic files into permanent images for storage within
a document or records management system. Many applications –
including Word, Excel, and AutoCAD – can transfer existing files into
unalterable document images, which are then typically stored in
TIFF (Tagged Image File Format), administered by Adobe. For
documents, the process grabs a text stream directly from the
document, without the need for OCR. This text file can then
be indexed for future retrieval. Conversion does not require
scanning, saves paper, ink, and printer maintenance costs, and
results in a cleaner image than scanning. This imaging method
is best suited for permanent and semi-permanent archiving.



Many enterprises attempt to ease the storage portion of the problem through the use
of off-site storage facilities or by using microfilm or microfiche. These approaches
are expensive, with total
costs sometimes running into seven figures, and they do not
solve the inconvenience, lost time, and decreases in
productivity that result from attempting to retrieve records stored
in this fashion.


The goals of any content digitization solution are to create
efficiencies, reduce costs, and increase productivity. It is
best to use a phased department-by-department implementation to
allow both end-users and MIS to adapt to the new
technology. In addition, the lessons learned in the early
phases will make later phases smoother. An ASP approach may reduce upfront capital investment and MIS
staff requirements. For some enterprises, this will be less
cost-effective in the long term. Technology and pricing will vary
according to the number of users and the types and volumes of
records to be digitized and managed. 

Cost Considerations

The cost for a content digitization solution (hardware, software and
implementation services) can range from one hundred thousand
dollars to over one million dollars, depending on the number of
users and scope of the project. Typically, businesses recover the
costs of digitizing content in less than two years, based on
hard savings produced by eliminating or reducing expenses
associated with paper, file folders, storage space, A/R days,
microfilm costs, phone expenses, and the reduction or reallocation of
existing full-time staff. Soft savings include increased
customer satisfaction from more complete and timely responses to
questions, elimination of lost or misfiled documents, and
efficiencies gained by providing simultaneous multi-user access to
a record or document.
Prior to Implementation

Unless the enterprise already has a significant investment in ECM
with strong ERM and EDM functionality, there are two options:
purchase an ECM system with suitable content digitizing
functionality, or use an ASP. In either case, when evaluating
potential vendors, enterprises should look for solutions that:

  • Allow for modular rather than “big bang” implementations to
    ensure the continuity of daily processing, as well as the
    successful training of and integration with the various
    departments involved in the project.
  • Allow for remote scanning and retrieval over the Web. This
    enables “anything, anywhere, any time” access, and reduces file
    storage and retrieval costs.
  • Are based on open standards. There is a wide range of choices
    for hardware and software, and selecting open systems that work
    on existing hardware and software platforms is crucial.
  • Are offered by vendors with a proven track record, a solid
    product, and a consultative approach.
  • Can track and manage both paper and electronic content.
    Paper-based records will continue to be necessary until users
    have grown accustomed to viewing documents online rather than in
    print. Note that users will still need to scan paper documents
    in order to digitize them.
  • Include “bundled” consulting services, since implementing an
    ECM system affects all departments and requires an adjustment in
    business processes. Trained content digitization and ECM
    consultants are specialists in the installation of the necessary
    scanning and imaging hardware and in building the required
    interfaces to seamlessly integrate the ECM, are experienced in
    change management and business processes, and can help ensure a
    smooth transition to the benefits of an electronic record.
  • Offer alternative financing, such as an ASP solution, which
    allows customers to pay-as-they-go rather than requiring a large
    capital investment in expensive hardware that may become
    obsolete. The ASP can be contractually required to keep
    up to date with the most current scanning and imaging
    technology.
Potential Deterioration

There is increasing awareness that digitized content can
deteriorate, as content on any storage medium can, depending on the nature of
the media and the conditions under which they are maintained. Digital
content storage must therefore be monitored both for physical decline and
for advances in technology.
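
One common way to monitor stored files for silent corruption is a checksum (fixity) check. This technique is not described in this report and is offered only as an illustrative sketch: a digest is recorded when the content is captured and recomputed later to confirm that the bytes are unchanged. The sample byte strings and function names below are hypothetical.

```python
import hashlib

def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_intact(data: bytes, recorded_digest: str) -> bool:
    """Check current content against the digest recorded at capture time."""
    return sha256_digest(data) == recorded_digest

# Hypothetical content standing in for a stored document image.
original = b"scanned invoice image bytes"
recorded = sha256_digest(original)  # stored alongside the document at capture

corrupted = b"scanned invoice image bytez"  # simulated storage corruption

print(is_intact(original, recorded))   # True
print(is_intact(corrupted, recorded))  # False
```

A check like this detects deterioration but does not repair it, so it is normally paired with redundant copies of the content.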



Content digitization, generally speaking, is a mature industry
in which no major technological leaps forward are expected. Devices will become faster, quality will increase,
and prices will drop, but neither the core approaches of imaging,
scanning, importing, and converting nor the core “recognologies”
will be replaced any time in the immediate future, except perhaps
in specialized areas such as medical imaging.

Most professionals now own or have access to several electronic
devices, and as a result, mobile applications for content
management are emerging.
The digitization of content allows devices to be inter-connected
and to share information. Almost certainly more types of currently
non-digital content will have to be converted to a wider variety
of electronic formats for use in an increasing number of
applications running on a broad array of devices.

In summary, the changes that do occur will be in the number of
input and output formats and devices supported, the ease of use of
tools and techniques, and the replacement – on
the enterprise level – of standalone content solutions with content digitization embedded within ECMS.



It is difficult to see why an enterprise of any appreciable size
would not want to digitize its content, given the number and
significance of the benefits to be realized. Even the Library of
Congress began the process of digitizing its contents, scanning
its 25,000th book in early 2009. It has now scanned
millions of “items” or separate documents, and offers its own
scanning service, Federal Scanning Service at the Library of
Congress (FedScan). The capturing of records in electronic form
enables ECM, including improvements in Records, Document, and
Knowledge management. Figure 2 summarizes the reasons that record
management through the digitizing of content is important for the
enterprise.

Figure 2. Importance of Digitizing Content

Source: i-SCOOP.eu (via Capgemini, MIT Digital Transformation Framework)

The optimal solution is one that efficiently manages both
electronic and paper-based documents and can be integrated with
other information systems, resulting in a complete electronic base
of data available online or from other legacy applications through
which enterprise-wide document and records management can be
accessed. Bottom-line benefits include:

  • A reduction in costs of filing and re-filing.
  • A reduction in off-site document retrieval costs.
  • The ability to share information throughout the business or
    supply chain quickly and easily.
  • Simultaneous multi-user access to a record or document.

One approach is to bring in an ASP (Application Service Provider)
with offerings specifically geared towards bridging the gap from
paper to electronic document management, especially those that
include modules to address:

  • Business office functions for scanning, viewing, and printing
    of scanned and electronic documents.
  • Computer Output to Laser Disc (COLD) capabilities for
    documents and print streams.
  • Workflow management modules that automate tasks and procedures
    using document images and data.

An ECMS-based content digitization solution would include all the
above plus:

  • Avoidance of costs and penalties (legal, audit, and
    regulatory) related to missing information.
  • Cost savings for supplies (paper, printer ink, envelopes),
    courier services, and physical storage space.
  • Improved cash flow due to faster access to more trustworthy
    accounts receivable data.
  • Improved customer service.
  • Improved efficiency (less time and lower cost) in scanning and
    classifying documents.
  • Improved efficiency in retrieving records for audits, legal,
    and proof of regulatory compliance.
  • Quicker, more efficient, and easier access to digitized
    content in general.
  • Reduction of staff supporting traditional paper filing and

An enterprise using an ASP solution should ask the
vendor what procedures are in place to arrest any physical
deterioration of digitized content, and to maintain and, where necessary,
upgrade the storage of such content when technological advances
require it.

While vendor issues are important, so are cultural issues within
the enterprise. Employees may not immediately take to content
digitization or recognize its benefits. Among the elements needed
for effective content digitization are:

  • Strong support from senior management
  • Strategic analysis of the document
    management needs of the enterprise, including a content audit
  • Discussions, communications, and
    training among staff
  • Identification and recognition of
    staff who can guide and train other users
  • A dedicated project group
