Natural Language Generation

by James G. Barr

Docid: 00021396

Publication Date: 2208

Publication Type: TUTORIAL


Natural language generation (NLG), also known as automated narrative
generation (ANG), is a technology that transforms enterprise data into
narrative reports by recognizing and extracting key insights contained
within the data, and translating those findings into plain English (or
another language). There are articles, for example, prepared by respected
news outlets like the Associated Press that are actually “penned” by
computers. While perhaps overly optimistic, some analysts have contended
that as much as 90 percent of news could be algorithmically generated by
the mid-2020s, much of it without human intervention.

Executive Summary

“News content has always
been limited by resources. More events happen in the world each day than
can be covered by available journalists.” Automated journalism
“[transcends] the limits of humans. Because the algorithms that write news
stories can, given an ample input of data, spit out an exponentially
greater number of news stories. A news story essentially becomes an
instantly attainable object, much like the results of a search engine

– Matt

Please note: This report was written by a real human being. This
acknowledgement may sound silly but some articles prepared by respected
news outlets like the Associated Press are actually “penned” by computers
– artificial intelligence programs practicing a new form of authorship
called natural language generation.

While we’ve long known that mechanical operations like automotive
manufacturing can be reduced to repeatable processes – which, in many
cases, robots can assimilate and perform with greater efficiency, fewer
defects, and lower costs than humans – it turns out that certain
intellectual activities like translating a company’s quarterly earnings
report into an article for investors are also “mechanical” – and can be
accomplished by computer.

This new reality was glimpsed a few years ago when application developers
started to produce programs that could sort through massive amounts of
evidentiary material accumulated through e-discovery orders and identify
items of interest to litigators. Instead of engaging an army of
associates and paralegals to analyze the documents, firm lawyers could
delegate the work to computers, which not only operate faster, but are
immune to the type of fatigue-based errors and omissions that might
characterize reviews by humans.

Natural language generation (NLG), also known as automated narrative
generation (ANG), is a technology that transforms enterprise data into
narrative reports by recognizing and extracting key insights contained
within the data, and translating those findings into plain English (or
another language). The automated narratives can be generated in multiple
forms, each tailored to a specific audience.

Eliminating Cognitive Bias

As emphasized by Deloitte analysts, one of the principal advantages of
NLG – and the reason for pursuing machine writing – is the elimination of
human cognitive bias. “Seeking out pre-conceived insights from a set of
data, as opposed to performing objective analysis, increases as humans
strive to make sense of increasingly complicated datasets and data models.
As data growth transcends volume and impacts variety and velocity of data,
the risk of cognitive [bias] and its limitations in the production of
valuable insights become more material.”2 In such situations,
natural language generation is, increasingly, preferred.

The NLG Market

Reflecting the emergence of NLG as a valuable – and viable –
technology, MarketsandMarkets reports that the natural language generation
marketspace, which was valued at $277.2 million in 2017, is expected to
reach $825.3 million by 2023, representing an impressive compound annual
growth rate (CAGR) of 20.8 percent during the forecast period.3


NLG and CL

For technological context, natural language generation (NLG) is generally
recognized as a branch of computational linguistics (CL), which also
encompasses natural language processing (NLP) and natural language
understanding (NLU).

Natural Language Processing

As the engineering arm of computational linguistics, natural language
processing (NLP) helps streamline and expedite business processes by:

  • Automating routine tasks, through chatbots and other digital
  • Improving searches, “disambiguating” word definitions based on context
    (carrier, for example, means something different in biomedical than in
    industrial contexts).
  • Enabling search engine optimization, improving an enterprise’s rank –
    and, thus, visibility – in online searches.
  • Analyzing and organizing large document collections, including
    corporate reports and scientific documents.
  • Advancing social media analytics, evaluating customer reviews and
    social media comments to make better sense of huge volumes of
  • Providing market insights, analyzing the language of customers to
    determine their needs and how to communicate with them.
  • Moderating user or customer content, [maintaining] quality and
    civility by analyzing not only the words, but also the tone and intent
    of comments.4
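
The “disambiguating” bullet above can be illustrated with a minimal sketch. It assumes a tiny, hand-written sense inventory for the word “carrier” and picks the sense whose cue words overlap most with the sentence. The cue words, the `SENSES` table, and the `disambiguate` function are hypothetical names chosen for illustration; production NLP systems typically rely on much larger lexicons or statistical models.

```python
import re

# Hypothetical sense inventory for the word "carrier"; the cue words are
# illustrative assumptions, not taken from any real lexicon.
SENSES = {
    "biomedical": {"gene", "disease", "infection", "mutation", "patient", "virus"},
    "industrial": {"freight", "shipment", "cargo", "truck", "logistics", "delivery"},
}


def disambiguate(sentence: str) -> str:
    """Return the sense whose cue words overlap most with the sentence."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(tokens & cues) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    # With no overlapping cue words the context gives no evidence either way.
    return best if scores[best] > 0 else "unknown"
```

For instance, `disambiguate("The patient is a carrier of the mutation.")` returns `"biomedical"`, while a sentence about a freight shipment returns `"industrial"`.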

Natural Language Understanding

Natural language understanding (NLU) is also called “Natural Language
Interpretation” (NLI). Through NLU analysis, “computers manage to
interpret … language and define a user’s intent.” Unlike simple speech
recognition, NLU “focuses on the determination of intent, sentiment, and

Commenting on the potential of NLU, analyst Bogdan Koretski predicts that
“Due to the deep and correct language interpretation, machine-to-human
communication will reach a completely new level. Business processes like
data collection and analysis, data vetting, and facts checking will be
automated and errors excluded.”5

Natural Language Generation

Natural language generation (NLG) is a technology that transforms
enterprise data into narrative reports by recognizing and extracting key
insights contained within the data, and translating these findings into
plain English (or another language). The automated narratives can be
generated in multiple forms, each tailored to a specific audience.

While the output of an NLG program is text, the input can take various forms:

  • Textual data;
  • Non-linguistic data, like sensor data; and, more recently,
  • Visual data, like images or videos.

NLG output can be:

  • In written form, like a report or press release; and
  • In spoken form, like data delivered by a chatbot.

As described by analyst Ivy Wigmore, the natural language generation
function is divided into six stages:

  1. Content analysis – Data is filtered to determine what should
    be included in the content produced at the end of the process.
  2. Data understanding – The data is interpreted, patterns are
    identified, and it’s put into context. Machine learning is often used at
    this stage.
  3. Document structuring – A document plan is created and a
    narrative structure chosen based on the type of data being interpreted.
  4. Sentence aggregation – Relevant sentences or parts of
    sentences are combined in ways that accurately summarize the topic.
  5. Grammatical structuring – Grammatical rules are applied to
    generate natural-sounding text. The program deduces the syntactical
    structure of the sentence. It then uses this information to rewrite the
    sentence in a grammatically correct manner.
  6. Language presentation – The final output is generated based on
    a template or format the user or programmer has selected.6
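
The six stages can be sketched as a small rule-based pipeline. This is only an illustration under stated assumptions: the input is a hypothetical set of quarterly revenue figures, every function name is invented for this sketch, and the “data understanding” stage uses simple arithmetic where a commercial product might apply machine learning.

```python
from statistics import mean

# Hypothetical input: quarterly revenue in millions of dollars.
DATA = {"company": "Acme Corp", "quarters": ["Q1", "Q2", "Q3", "Q4"],
        "revenue": [10.0, 12.0, 11.0, 15.0], "internal_notes": "draft"}


def analyze_content(data):
    """Stage 1: keep only the fields the narrative will use."""
    return {k: data[k] for k in ("company", "quarters", "revenue")}


def understand_data(content):
    """Stage 2: derive facts (trend, peak, average) from the raw numbers."""
    rev = content["revenue"]
    change = (rev[-1] - rev[0]) / rev[0] * 100
    peak = content["quarters"][rev.index(max(rev))]
    return {"company": content["company"], "change_pct": change,
            "peak_quarter": peak, "average": mean(rev)}


def structure_document(facts):
    """Stage 3: order the messages, headline fact first."""
    return [("trend", facts), ("peak", facts), ("average", facts)]


def aggregate_sentences(plan):
    """Stage 4: merge the peak and average messages into one sentence."""
    merged = [m for m in plan if m[0] == "trend"]
    merged.append(("peak_and_average", plan[0][1]))
    return merged


def realize(message):
    """Stage 5: make grammatical choices (verb selection, number formatting)."""
    kind, f = message
    if kind == "trend":
        verb = "rose" if f["change_pct"] >= 0 else "fell"
        return (f"{f['company']} revenue {verb} "
                f"{abs(f['change_pct']):.0f} percent over the year.")
    return (f"Revenue peaked in {f['peak_quarter']} and averaged "
            f"${f['average']:.1f} million per quarter.")


def present(sentences, fmt="paragraph"):
    """Stage 6: lay the sentences out in the selected format."""
    if fmt == "bullets":
        return "\n".join(f"- {s}" for s in sentences)
    return " ".join(sentences)


def generate(data, fmt="paragraph"):
    """Run all six stages in order."""
    facts = understand_data(analyze_content(data))
    plan = aggregate_sentences(structure_document(facts))
    return present([realize(m) for m in plan], fmt)
```

With the sample data, `generate(DATA)` returns “Acme Corp revenue rose 50 percent over the year. Revenue peaked in Q4 and averaged $12.0 million per quarter.” Passing `fmt="bullets"` produces the same two sentences as a bulleted list, which loosely mirrors the template choice in stage 6.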

Neural Text Generation

From a technological perspective, natural language generation is still in
its infancy. As analyst Robert Dale observes, “today’s commercial NLG
technology appears to be relatively simple in terms of how it works.
Nonetheless, there is a market for the results that these techniques can
produce, and much of the real value of the solutions on offer comes down
to how easy they are to use and how seamlessly they fit into existing

[In] terms of [the] underlying technology for generating language,
there is of course a new kid on the block: neural text generation, which
has radically revised the NLG research agenda.”7 Neural networks are
loosely modeled on the operation of the human brain, enabling computer
programs to recognize patterns and solve problems as complex – and often
as confounding – as natural language generation.


Much of the impetus behind the development of natural language generation
can be traced to Big Data.

Enabled by rapid advances in storage capacity and processing power,
companies are spending big on Big Data, hoping to gain vital business
intelligence from the mountains of data being created or extracted each

Big Data is useless, however, without the ability to reduce it to
actionable intelligence and, just as importantly, present that
intelligence in a digestible form. Applying the aphorism that “one
picture is worth a thousand words,” many information analysts have
utilized dashboards containing charts, graphs, “infographics,” and other
visual effects to summarize the meaning behind Big Data.

But as analysts at Automated Insights observe, “dashboards disappoint.”

“Everyone reads dashboards
differently. Without guidance, it can be easy to miss significant
attributes. Some users may miss elements whose meaning is obvious to
others – even though that insight may be of crucial business
importance. Charts require titles, notes, and, all too often,
in-person explanations from their creators. Dashboards can create
beautiful visualizations that leave CEOs and line-workers alike asking:
what does it mean? In other words, dashboards may help data
experts formulate insights. But data experts still have to use
narrative to impart those insights to others in an

To realize the full benefits of Big Data, natural language is needed to
explain, clearly and authoritatively:

  • What insights a particular Big Data set is offering, and
  • Why such insights are relevant to the enterprise and its business

NLG Use Cases

Natural language generation is commonly used to facilitate the following:

Narrative Generation – NLG can convert
charts, graphs, and spreadsheets into clear, concise text.

Chatbot Communications – NLG can craft
context-specific responses to user queries.

Industrial Support – NLG can give voice
to Internet of Things (IoT) sensors, improving equipment performance and

Language Translation – NLG can
transform one natural language into another.

Sentiment Analysis – First, NLU
determines which of several languages resonates with users; second, NLG
delivers messages in the users’ preferred languages.9

Speech Transcription – First, speech
recognition is employed to understand an audio feed; second, NLG turns the
speech into text.

Content Customization – NLG can create
marketing and other communications tailored to a specific group or even

Robot Journalism – NLG can write
routine news stories, like sporting event wrap-ups, or financial earnings

Emotional Content NLG

While most natural language generation is geared toward a “professional”
audience – and evinces a professional tone – some NLG scientists are
starting to embrace emotional content through the development of emotional
chatbots. According to analyst Bardia Eshghi, “The technology behind
emotional chatbots strives to make them grasp an understanding of the
user’s feelings and emotions and to then offer applicable advice or course
of action.”

Among the prospective users of emotional chatbots are:

  • Human resources representatives,
  • Customer services representatives, and
  • Psychologists or other mental health professionals.10

AI Story Generation

Perhaps the most promising application of natural language generation is
the creation of fictional works.

Some observers have remarked that there are only a finite number of plots
in all of literature and that all storylines are just variations on these
plots. Some suggest (only partly in jest) that watching network
television reinforces this hypothesis, with critics claiming that they
find many TV and movie scripts “derivative.”

One solution being pursued by NLG and AI researchers is “AI Story
Generation,” or “the use of [machine intelligence] to produce a fictional
story from a minimal set of inputs.” As analyst Mark Riedl explains,
“Aside from the grand challenge of an AI system that can write a book that
people would want to read,” storytelling offers other advantages. For example:

  • “Humans often find it easier to explain [situations or circumstances]
    via vignettes [or evocative descriptions or accounts], and are often
    able to more easily process complex procedural information via
  • “Telling and listening to stories is … a way that humans build


Select the Right Tool

To achieve optimum results, enterprise planners should exercise care in
selecting a natural language generation tool. Critical criteria include:

  • Summarization – How effectively – and completely –
    does the solution summarize target dataset contents?
  • Simplicity – Is the generated text easy to read and
  • English Proficiency – Does the generated text
    comply with standard rules of English grammar and syntax?
  • Multilingual Capabilities – Can the solution
    accommodate languages other than English?
  • Security – Can the solution be adopted without
    compromising current security standards?
  • Scalability – Can the solution grow with the
  • Platform – Can the solution be deployed on-premises or
    in the cloud (as desired)?12
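
One way to compare candidate tools against these criteria is a weighted scoring matrix. The sketch below is an assumption-laden example, not a method prescribed by the source: the weights, the 1-to-5 ratings, and the tool names are all made up, and each enterprise would substitute its own.

```python
# Criteria from the list above; the weights are illustrative assumptions.
WEIGHTS = {
    "summarization": 3, "simplicity": 2, "english_proficiency": 2,
    "multilingual": 1, "security": 3, "scalability": 2, "platform": 1,
}

# Hypothetical 1-5 ratings gathered during a product evaluation.
TOOLS = {
    "Tool A": {"summarization": 4, "simplicity": 5, "english_proficiency": 4,
               "multilingual": 2, "security": 3, "scalability": 3, "platform": 5},
    "Tool B": {"summarization": 5, "simplicity": 3, "english_proficiency": 4,
               "multilingual": 4, "security": 5, "scalability": 4, "platform": 3},
}


def weighted_score(ratings, weights=WEIGHTS):
    """Weighted average of the ratings; raises if a criterion is unrated."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    total = sum(weights[c] * ratings[c] for c in weights)
    return total / sum(weights.values())


# Highest weighted score first.
ranked = sorted(TOOLS, key=lambda name: weighted_score(TOOLS[name]), reverse=True)
```

With these sample ratings, Tool B ranks ahead of Tool A because it scores higher on the heavily weighted summarization and security criteria.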

Be Realistic About ROI

Understand the financial commitment associated with natural language
generation, and consider the return on investment.

Analyst Mark Kaput cautions that “NLG solutions, even basic ones,
typically require substantial time to set up. You also need to pay for a
solution, and possibly related NLG services. You’ll want to take a
realistic look at the technology, what it can do for you, and how much you
can scale using it.

“Start by analyzing how long reports, articles or narratives currently
take, then see how much time NLG can potentially shave off.

“Finally, apply those time savings to all staff whom NLG would affect. An
hour saved per week per employee may make financial sense for your

Document All Enterprise Processes

Finally, one area in which all enterprises – small to large – will
benefit from natural language generation is process documentation.

Understanding how enterprise operations are conducted on a daily basis
can promote process improvement, even encourage business reengineering –
reducing recurring costs and enhancing customer and employee satisfaction.


About the Author

James G. Barr is a leading business continuity analyst
and business writer with more than 30 years’ IT experience. A member
of “Who’s Who in Finance and Industry,” Mr. Barr has designed, developed,
and deployed business continuity plans for a number of Fortune 500
firms. He is the author of several books, including How to Succeed in
Business BY Really Trying, a member of Faulkner’s Advisory Panel, and a
senior editor for Faulkner’s Security Management Practices.