Darwin Information Typing Architecture
Copyright 2018, Faulkner Information Services. All
Publication Date: 1805
Report Type: TUTORIAL
The Darwin Information Typing Architecture (DITA) is an XML-based
approach to authoring and distributing topic-based materials. The approach
was originally developed by IBM and is a method for creating and
maintaining technical documentation and content. The current version of
DITA is v1.3, which was issued and accepted as an OASIS standard in 2015.
A simplified version called Lightweight DITA was released in late 2017. A steady increase in
available tools for DITA is assisting its growth. Any organization of
substantial size should at least consider a DITA implementation for its
In 2001, IBM published the first DITA release, and in 2004 IBM donated
the DITA technology to OASIS, an international open standards consortium,
which officially released DITA 1.0 in May of 2005. DITA 1.1 and 1.2
followed, and 1.3 was approved as an OASIS standard in late 2015. Lightweight DITA, a simplified version of DITA,
was made part of the official standard in 2017. DITA has the largest
membership of any OASIS technical committee.
DITA provides companies that produce complex products a way to create and
maintain substantial and complicated supporting documentation and content.
Recent studies suggest that the largest percentage of companies using DITA
are computer software companies, followed by IT technology and services
organizations, with telecommunications companies a distant third.
DITA can be extended and customized for specific industries and
applications, and several vendors (including Adobe, EasyDITA, DITA-OT
(open source), Dita Writer, Arbortext, Blast Radius/Ixiasoft, and Altova)
have released DITA products. According to one source, over 400 companies
have adopted DITA and used it for major documentation products, including
Amazon, Apple, Avaya, Boeing, CEDROM-SNi, Chevron, Cisco, Hitachi, Nokia,
Adobe, ATI Technologies, IBM, Kodak, Information Builders, McAfee,
Oracle, RIM, and SalesForce.
As DITA matures and adds new features in addition to general purpose and
vertical-specific DTDs, expect to see greater use in such areas as
collaborative authoring, integration with enterprise content management
systems, and customer support.
DITA, even with customization, is inappropriate for enterprises that do
not require reuse or modular delivery of their documentation or that have
no need to interchange modular information with other enterprises or
groups. For those who do have such needs, benefits can be realized in the
areas of cost reduction and time to market.
Value can be realized through the adoption of vendor-neutral approaches
to core business problems such as content reuse, specialization, and
globalization in product support documentation. Publishing a standard is
not enough, since standards require a demand from a critical mass of
users, a process for improving and maintaining the standard, and support
from both vendors and open-source developers.
The Darwin Information Typing Architecture (DITA) – the name Darwin was
chosen to indicate the importance of inheritance and specialization in
documentation as well as a reference to biological evolution – is an
XML-based approach to authoring and distributing topic-based materials. It
was first published in May 2005 as a standard by OASIS (the Organization
for the Advancement of Structured Information Standards). DITA has
typically been used to develop documentation and other product-support
content such as online help systems.
Companies that produce complex products need a way to quickly and easily
create and maintain substantial and complicated supporting documentation
and content. DITA provides:
- Content in a media-neutral format that enables repurposing in print,
online help, HTML, and other formats.
- A content structure with the ability to flexibly create and manage
reusable chunks of content.
Structurally, a topic starts with a title element, followed by a mix of
text and images. Topics are organized into sections, which can be nested.
The use of core modules, the hierarchical depth, and the ability to nest
modules provide the flexibility to service a broad range of technical
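The topic structure described above can be illustrated with a minimal sketch. The sample topic below is hypothetical (its id, title, and image file name are invented for illustration), and Python's standard XML parser stands in for a real DITA toolchain; it only shows how a title, body text, a section, and an image sit inside one topic.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample topic; the id, titles, and image file name are invented.
TOPIC_XML = """
<topic id="install-overview">
  <title>Installing the product</title>
  <body>
    <p>Run the installer and follow the prompts.</p>
    <section>
      <title>Before you begin</title>
      <p>Confirm that you have administrator rights.</p>
      <image href="setup-dialog.png"/>
    </section>
  </body>
</topic>
"""

def summarize_topic(xml_text):
    """Return the topic id, title, section titles, and image references."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        # findtext("title") only looks at direct children, so this is the topic title.
        "title": root.findtext("title"),
        "sections": [s.findtext("title") for s in root.iter("section")],
        "images": [img.get("href") for img in root.iter("image")],
    }

summary = summarize_topic(TOPIC_XML)
print(summary)
```

Because the markup carries structure rather than formatting, the same topic can be inspected, indexed, or published by different tools without change.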
DITA provides a mechanism for extending the core schema/DTD and framework
through what it calls specialization, which involves creating extensions
to the existing DTDs or schemas.
The initial set of DITA topic types is oriented toward the types of
information assumed to be most often created for product support:
- Task: used to create procedural materials such as step-by-step
instructions.
- Reference: used to create content that provides quick access to facts.
- Concept: used to provide general-purpose background information.
These three content types provide a comprehensive starting point for most
product-support content, and the chunking of content coupled with the XML
markup enables reuse. Clearly, if the content is properly authored as
stand-alone topics, these topics can be reused in different contexts, and
topics from multiple components can be integrated into new, unique
solutions. DITA can easily be extended and customized for more
Specialization enables DITA users to define structures that are specific
to their information requirements and business processes but that remain
mapped to the core DITA types, ensuring that all conforming
specializations of DITA base types still reflect the core DITA-defined
structures. As a result, specialized DITA information can be
interchanged with both users and standard DITA processors.
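One way to picture how a specialized type stays mapped to its base type is the DITA class attribute, which lists an element's ancestry from the most general type to the most specific. The sketch below is a simplified illustration, not a real DITA processor: the "apiref" specialization and the function names are hypothetical, and the fallback logic is an assumption about how a generic processor might choose a handler.

```python
def parse_class(class_attr):
    """Split a DITA class attribute into (module, element) pairs, base type first."""
    tokens = class_attr.split()
    # The first token is "-" or "+"; the remaining tokens are module/element pairs.
    return [tuple(tok.split("/", 1)) for tok in tokens[1:]]

def resolve_handler(class_attr, known_types):
    """Return the most specific element type in the ancestry that the processor knows."""
    for module, element in reversed(parse_class(class_attr)):
        if (module, element) in known_types:
            return element
    return None

# "apiref/apiref" is a hypothetical specialization of the reference type.
api_class = "- topic/topic reference/reference apiref/apiref "

# A generic processor that has never heard of "apiref".
generic_processor = {("topic", "topic"), ("reference", "reference")}

print(resolve_handler(api_class, generic_processor))
```

Here the generic processor treats the specialized API topic as an ordinary reference topic, which is the property that lets specialized content be interchanged with standard tools.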
One common use of DITA specialization is programming documentation. For
example, the documentation of an application programming interface (API)
usually requires the writer to create a specific reference document full
of sample code, processing rules, and other technical data, none of which
are explicit in the generic reference in the DITA DTD. The writer
either specializes the reference information type or creates a new API
reference based on the DITA structures, adding the additional required
structures as necessary.
The ability to modularize the supporting infrastructure enables direct
reuse of existing modules as well as the ability to locally and unilaterally
extend DITA processing. With a centrally defined documentation system, a
group with additional requirements doesn’t have to go back to the central
support group to request the added functionality—it can simply implement
the needed features locally. Since the cost of this type of extension can
be accurately estimated and measured, the group can make a rational
business decision about whether to do the specialization and to bear the
cost of implementing the additional features. A “silo” approach, on
the other hand, requires each group to bear the full cost of
developing its system(s) from scratch, with little or no opportunity for
higher-level oversight or cost control.
With DITA, specialization is incremental. Organizations can add to an
existing base, rather than starting a new, from-the-ground-up
implementation. To implement a specialization, organizations normally only
need to add to an existing DITA-based system, which makes the incremental
cost of satisfying new requirements much lower than it would otherwise
be.
The types of documents under DITA are generally long, with a mixture of
text, tables, charts, and graphics. Moreover, the documentation often must
be produced in different forms – for example, print, online Help sets, and
HTML. Translating such documents into multiple languages in support
of globalization efforts can be challenging.
When any complex products – for example, large manufacturing platforms,
enterprise software and hardware, medical equipment, consumer electronics,
or prescription drugs – are introduced to new global markets, the
globalization of that content is a major factor in successfully selling
and supporting them:
- An automobile manufactured in the US must have its manuals and other
supporting documents translated to the local language, and the
documentation also needs to reflect instrumentation and other features
that may be unique to that country or language.
- Software is typically published with supporting user manuals, reference
manuals, programming guides, and various kinds of online Help. The
supporting content needs to be translated and localized.
- For pharmaceutical companies, time to market is a key element of
profitability when launching a new drug in a new marketplace. To
introduce a new prescription drug into global markets, everything from
marketing materials to product labeling and clinical trial results must
be translated and localized.
In each of the above examples, the content is voluminous and must be
distributed through multiple channels – for example, print, online,
CD-ROM, and HTML. Organizations have begun to tackle this multichannel
challenge (multiple products, supported by multiple content types that
need to be provided in multiple languages and formats) with single-source
publishing that can be updated once, with the various content formats
later produced from the central source.
On a small scale, this kind of multichannel publishing can be done with
desktop tools, which can produce print, HTML, online Help and other
formats natively or with the support of add-on products. But true
single-source publishing assumes some kind of format-neutral encoding, and
increasingly organizations are looking at DITA as the encoding mechanism.
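The idea of format-neutral encoding can be sketched in a few lines: one source topic, two renderers. This is only an illustration under simplifying assumptions. The sample topic and the function names to_html and to_text are hypothetical, and a production pipeline would rely on a publishing toolkit rather than hand-written renderers.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single-source topic; the content is invented for illustration.
SOURCE = (
    "<topic id='reset'>"
    "<title>Resetting the device</title>"
    "<body>"
    "<p>Hold the power button for ten seconds.</p>"
    "<p>Release it when the light blinks.</p>"
    "</body>"
    "</topic>"
)

def to_html(topic):
    """Render the topic as a small HTML fragment."""
    title = escape(topic.findtext("title", default=""))
    paras = "".join(f"<p>{escape(p.text or '')}</p>" for p in topic.iter("p"))
    return f"<h1>{title}</h1>{paras}"

def to_text(topic):
    """Render the same topic as plain text with an underlined title."""
    title = topic.findtext("title", default="")
    lines = [title, "=" * len(title)]
    lines += [p.text or "" for p in topic.iter("p")]
    return "\n".join(lines)

topic = ET.fromstring(SOURCE)
html_out = to_html(topic)
text_out = to_text(topic)
print(html_out)
print(text_out)
```

The source is edited once; each output format is regenerated from it, which is the property that single-source publishing depends on.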
This kind of publishing model extends the benefits of having the content
in reusable components. By managing translation and localization of
content in a discrete, controlled, and automated manner, companies can
achieve major efficiencies and benefits of scale.
Lightweight DITA, now under development, will differ from the parent
product in important ways. It will contain 27 instead of 94 topic
elements; two mapping elements instead of ten; and six document types instead
of 23. The product will include the ability to configure nesting, and will
provide for easier specialization. All attributes will be managed as
History & Milestone Events
Reuse of documentation artifacts is a long-standing product-support
goal. In the 1980s, the Standard Generalized Markup Language (SGML)
was used to create specialized tag sets for the US Department of Defense,
the Air Transport Association, and the automotive industry. Boeing,
General Motors, and United Technologies created millions of pages of
technical documentation using SGML, although many of those pages have
migrated to XML.
These verticals required large capital investments but had the benefit of
an agreed-upon SGML tag set that was created just for them. The US DOD,
for example, had specific SGML Document Type Definitions
(DTDs). Other than these specific standards for monolithic verticals,
there was no usable way of working with SGML. The DocBook SGML DTD
was created in 1991 and later converted to an XML DTD. Though widely used,
it is not well-suited to topic-based authoring.
DITA is meant to be a more efficient, user-friendly, and appropriate
vehicle for doing the same thing: creating content in a neutral format
that can be quickly and easily re-purposed in multiple formats and
can create chunks of content to be reused in multiple products and
High-tech vendors such as IBM and HP create enormous volumes of product
support documentation. IBM created one of the earliest systems for
creating complex documentation, Bookmaster, and many of the key developers
of SGML also came out of IBM, which continues to be an innovative resource
for DITA growth, including the development of Lightweight DITA. Document
processing tools and approaches that were used at IBM became commercial
tools and standard practices in hundreds of other companies.
In 1999, IBM formed a cross-company workgroup to look at ways of better
leveraging XML in its product-support content. Ultimately the group
focused on how to best facilitate reuse, to allow a single source of
generically tagged modules of content to create multiple content products
that could be published in multiple formats. This led to the first DITA
release, which, after a year of internal testing and prototyping, was
published in March of 2001 on IBM’s developerWorks site.
In March of 2004, IBM donated the DITA technology to OASIS, and
simultaneously handed over governance of DITA to an OASIS Technical
Committee (TC). The TC put out a call for participation that same month.
Founding members of the group included IBM, Nokia and OASIS. The
formal standard was published in May of 2005, and the TC is currently working
on requirements and design for subsequent versions of the standard. The TC
has since grown to include companies from different industries and
additional countries, including Intel, Cisco Systems, Microsoft, Oracle,
RIM, and Boeing. IBM continues to be a major supporter and developer of
DITA.
In March of 2007, DITA v1.1 was opened for public review and
comments. The review period ended May 4, 2007. In August 2007
OASIS approved DITA v1.1 as an OASIS standard, and v1.2 was approved
in 2010, along with requests for comments for v1.3. At the same time,
planning began for a streamlined version of the product called Lightweight
DITA.
A draft of v1.3 was released in April 2015. DITA v1.3 was approved as
an OASIS standard on December 17, 2015. The next major milestone was the
release of Lightweight DITA in November 2017.
In technical writing circles DITA is generally considered to be a
significant development, surpassing equivalent approaches. The OASIS
Technical Committee (OAT) reviews requirements and design for new and
improved versions of DITA. The OAT now includes companies from different
industries, including IBM and Cisco.
IBM developed DITA, but the OASIS open standards consortium now oversees
the advancement and governance of DITA. Members include Adobe, RIM,
Cisco, Boeing, IBM, Microsoft, the Veterans Health Administration,
Intel, Sun, Alcatel-Lucent, Nokia, ArborText, Blast Radius, BMC Software,
the US Department of Defense, and Innodata.
In order to implement DITA, companies need to conduct thorough due
diligence to ensure that their workflows, resources, and technology are
ready to reap the full benefits of a DITA implementation. DITA is not a
casual approach; it requires top-level support, training, and coordination
within the enterprise.
As DITA matures and adds new features, along with general-purpose and
vertical-specific DTDs, expect to see greater use in areas such as:
- Increased integration with Enterprise Content Management Systems, in
effect creating a knowledge base with easy location and retrieval of
- Collaborative authoring, allowing the incorporation of useful input
from customers, call centers, product development groups, and others:
  - Product design
  - Teaching guides
- Customer support:
  - Contact centers, call centers
  - Chat rooms
  - Instant messaging
  - Issue logging, tracking, and resolution
DITA might not be a good choice for organizations that:
- Have existing information structures and practices that cannot be
easily redefined in terms of DITA’s base types. This is not unusual.
- Do not require reuse or modular delivery of their documentation.
- Have no need to interchange modular information with other enterprises
or groups within a larger enterprise.
On the other hand, enterprises for which DITA is appropriate should
consider the following items before implementing a DITA solution. Not
doing so could, at the very least, lead to time and cost overruns in the
implementation of DITA. When implementing DITA, the following should be
considered:
- Avoid replicating inefficient processes such as redundant
workflows. Also, content management applications should enable easy
identification of new and existing materials so that they can be
- Do not underestimate the initial effort involved in learning the
product, customizing templates, and setting up workflows. Consider
using a third party for training and for consulting support.
- In many organizations departmental or divisional solutions are not
integrated in a unified approach, leading to:
- Duplication of effort;
- Non-integrated technology systems;
- Time and cost overruns;
- “Silo’d” expertise that cannot be leveraged across departments or
DITA specialization can be used to provide a framework for avoiding these
problems. For example, DITA provides an architecture that supports
group-specific solutions that are nevertheless compatible with the overall
corporate solution and with other group-specific solutions. Another
approach is to implement pilot projects at the departmental level, while
remaining focused on the enterprise level’s needs. It is important to
evaluate DITA, not just in the context of technical writing, but in
relation to all aspects of the enterprise.
It is critical that the enterprise not underestimate the level of
expertise and effort involved in smoothly implementing DITA. As with
any packaged software application, there are vendors who claim to offer
“out of the box” solutions. Vendor claims notwithstanding, these
solutions will need to be customized to fit the specific requirements of
the purchasing organization. This means starting with a complete
requirements analysis to determine what additional specializations or
refinements must be added to the core DITA base types. Tools will help but
are not complete solutions to the DITA training curve. Organizations will
also need to:
- Perform format analysis and style sheet development.
- Engage in legacy data conversion.
- Integrate DITA modules with digital asset management systems, content
management systems, authoring support systems, and other tools.
While DITA has gained significant traction in the marketplace, it is
still a new technology, and best practices remain to be established. As
with many new technologies, there is a temptation among DITA vendors to
overstate its capabilities. Despite vendor claims to the contrary, DITA,
although a major step forward for companies considering the adoption of
XML, is not a panacea for every problem that organizations face in
creating technical documentation, online Help, and other product support
documentation. Among the conditions for successful
DITA implementation are the following:
- Adopting a content structure that provides the ability to flexibly
create and manage reusable chunks of content.
- Approaching the creation of the content from the fresh perspective of
developing the content as single units of information that can then be
used in a myriad of combinations and permutations.
The solution to the second item is topic-based authoring, which DITA
makes possible, but this is still a new concept for many organizations.
Some larger organizations such as Sybase, IBI, IBM, and Adobe are on this
path, but others are still training and providing reorientation to
staff. Still others have not yet decided to address the issue.
The same goes for DITA in general. Some large organizations are deeply
immersed in its use; others are well underway; some are just beginning to
prototype or pilot an effort. Although some organizations – IBM, for
instance – are using the core DITA DTD with little modification, many –
including Adobe and Autodesk – have tailored the DITA DTDs to meet their
unique requirements, finding that for their needs, many core DITA types
require specialization to be made useful.
The release of Lightweight DITA offers new opportunities for the use of
DITA, but its limitations must be considered, including a reduced feature
set. An advantage it offers is a new capability to publish across various
DITA is useful and powerful, but as the saying goes, “there is no free
ride.” Not all technical writers are willing or able to write in the
modular, context-free way that DITA tends to require, and using DITA can
require more sophisticated authoring and content management tools than are
typically required for creating non-modular books. In other words, DITA
may be a “bigger gun” than many organizations need.
In the case of legacy XML and SGML systems that cannot be easily mapped
to full-featured DITA, it may be better to begin by applying DITA concepts
of modularity and specialization without conforming to the specific DITA
structures. This approach provides many of the benefits of DITA with
little or no change to existing documents. The modularization approach can
be applied as needed, making a minimum of structural changes. Also,
even when legacy data structures can’t be directly mapped to DITA
structures, it is relatively easy to implement transformations to DITA to
supply DITA-conforming XML information to others as needed.
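A transformation of this kind can be sketched as follows. The legacy element names (chapter, heading, para) are hypothetical, as is the function name legacy_to_topic; the output is a simplified topic without a DOCTYPE declaration, so this is an illustration of the mapping idea rather than a conforming converter.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy markup; the element and attribute names are invented.
LEGACY = (
    "<chapter name='backup'>"
    "<heading>Backing up data</heading>"
    "<para>Copy the data folder to external storage.</para>"
    "<para>Verify the copy.</para>"
    "</chapter>"
)

def legacy_to_topic(legacy_xml):
    """Map a legacy chapter onto a DITA-style topic with a title and body."""
    old = ET.fromstring(legacy_xml)
    topic = ET.Element("topic", id=old.get("name", "topic"))
    ET.SubElement(topic, "title").text = old.findtext("heading", default="")
    body = ET.SubElement(topic, "body")
    # Each legacy paragraph becomes a <p> element in the topic body.
    for para in old.iter("para"):
        ET.SubElement(body, "p").text = para.text
    return topic

dita_topic = legacy_to_topic(LEGACY)
print(ET.tostring(dita_topic, encoding="unicode"))
```

The legacy source stays untouched; the DITA-style output is generated on demand for whoever needs to receive it.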
An out-of-the-box DITA tool kit can provide perhaps 80 percent of the
functionality required to use DITA in production, which means that
companies will still need to acquire or develop the remaining 20 percent.
This will likely account for much of the system’s implementation cost. As
a result, organizations should not expect DITA to immediately produce
dramatic savings, although in the long run most DITA-based systems will
pay dividends by enabling more cost-effective reuse,
interchange, and repurposing of information assets.
DITA is an opportunity for
enterprises to review their resource utilization in the area of
product-related documentation. Subject matter experts
(SMEs) often spend time formatting technical documents or performing other
tasks that can easily be offloaded to less scarce resources. Frequently
several groups within an organization may be creating similar content at
the same time, or maintaining reused content in multiple repositories, and
at other times companies may be using expensive resources to perform
copyediting and production functions that could be outsourced with a
minimal impact on the quality of the finished product.
DITA also requires indexing
content for search and retrieval, particularly descriptive metadata about
topics and elements. Many organizations will find that maintaining reused
content is quite difficult. Often, reused content modules require slight
content modifications based on context. A Ford Taurus and a Mercury Sable
may have identical features, but there are branding differences between
the two vehicles that must be adhered to when producing an owner’s manual.
The greater the volume of such exception management, the more
intricate DITA becomes.
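The branding example above can be sketched with DITA's product attribute, which marks content that applies only to certain products. The code below is a simplified stand-in for the filtering a real DITA processor performs: the sample topic, the attribute values, and the function name filter_for_product are invented for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical shared topic; the attribute values "taurus" and "sable" are invented.
SHARED = """<topic id="owner-intro">
  <title>Welcome</title>
  <body>
    <p>Read this manual before driving.</p>
    <p product="taurus">Thank you for choosing the Ford Taurus.</p>
    <p product="sable">Thank you for choosing the Mercury Sable.</p>
  </body>
</topic>"""

def filter_for_product(root, product):
    """Return a copy of the tree without elements tagged for other products."""
    result = copy.deepcopy(root)
    for parent in list(result.iter()):
        for child in list(parent):
            tagged = child.get("product")
            # Untagged elements are shared; tagged ones must list this product.
            if tagged is not None and product not in tagged.split():
                parent.remove(child)
    return result

source = ET.fromstring(SHARED)
sable = filter_for_product(source, "sable")
sable_text = [p.text for p in sable.iter("p")]
print(sable_text)
```

Each tagged variant is one more exception to track, which is why the maintenance effort grows with the number of context-specific differences.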
Lastly, using DITA does not change the fact that organizations that
can properly leverage its strengths will still require business-specific
authoring, content management, and publishing systems. While a rapidly
increasing number of tools provide some DITA support out of the box, this
support cannot satisfy all requirements for all users. Therefore, just as
with any other XML deployment activity, organizations must plan for the
development and maintenance of their authoring, management, and publishing
infrastructures. Strong management commitment is essential.
To review, one of the historical barriers to entry in using XML was
complexity, along with the need for organizations to do significant tool
development, data conversion, and system integration before realizing the
benefits of re-purposing and reuse. One successful approach to
accelerating adoption and reducing cost has been to provide industry
vocabularies for XML, especially in vertical industries like automotive,
aerospace, and telecommunications. The emergence of DITA as a horizontal,
extensible architecture for information development is significant for the
broader marketplace. However, DITA brings its own complexity and has
a significant learning curve.
The benefits and ROI for an organization that uses DITA are closely
connected to the volume of the content they produce, their need to reuse
and repurpose the content, and the need to translate the content into
different languages. The more the opportunity for modular construction of
documentation, the more useful DITA is likely to be. Additionally, DITA
provides a framework for best practices for globalization and content
management, which allows enterprises to think more strategically about how
to manage these complex and expensive processes.
About the Author
Kirk Woodward is a technical writer. In addition to
project management, Mr. Woodward’s areas of expertise include enterprise
software, hardware systems, and the use of Internet resources.