Big Data Technology
Copyright 2019, Faulkner Information Services. All Rights Reserved.
Docid: 00021369
Publication Date: 1910
Report Type: TUTORIAL
Preview
E-commerce, social networking, and other Internet practices have generated unprecedented amounts of data. Organizations in many industries are trying to turn this “Big Data” into actionable business information, and vendors are working to create tools and services to help them do so. The challenges are daunting, however, and the concept still has many potential uses that have not been fully explored.
Taking advantage of the insights that can be acquired through Big Data requires focusing on business goals and putting together the right combination of technologies and practices to achieve them.
Executive Summary
Using Big Data requires a range of technologies, most of which are sophisticated and require specialized skills to use.
For many companies, the bulk of the work is done by Hadoop, which distributes data-processing tasks across large groups of servers, and by NoSQL databases, which handle data storage and management. But additional tools are needed to collect and analyze data, and organizations bear the burden of selecting these tools and making them work together as part of an overall system. Furthermore, some of these tools are still immature, making it even harder to implement a Big Data initiative successfully.
Now that Big Data has been in practice for several years, many observers are emphasizing the importance of focusing on useful information, so-called “smart data.” To uncover the business value of data, organizations would be wise to recognize that they need to change their decision-making philosophy, and to understand that Big Data is best suited to large-scale applications and may not be the best approach for every analytical task.
Description
Big Data is generated and analyzed with a wide variety of tools, each of which plays a role in an overall system. The most widely used Big Data tool is Hadoop, from the Apache Project. Hadoop is open source software that enables large volumes of data to be processed on a distributed group of servers. Thousands of servers can be used, providing significant processing power and fault tolerance. Developers have improved the security of Hadoop, which was not originally expected to be used in as many mission-critical situations as it is today.1
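To make the fault-tolerance point concrete, the sketch below splits a file into fixed-size blocks and stores each block on several servers, loosely modeled on the way Hadoop’s file system (HDFS) keeps multiple copies of every block (three by default). The block size, node names, and simple round-robin placement are assumptions for illustration only; real HDFS placement is rack-aware and managed by a NameNode, and none of these functions are Hadoop APIs.

```python
# Hypothetical sketch of HDFS-style block replication; not Hadoop's real placement policy.

def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes using simple round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        first = block_id % len(nodes)
        # Consecutive nodes (wrapping around the list) hold the copies of this block.
        placement[block_id] = [nodes[(first + k) % len(nodes)] for k in range(replication)]
    return placement


def readable_after_failure(placement: dict[int, list[str]], failed: set[str]) -> bool:
    """True if every block still has at least one copy on a node that has not failed."""
    return all(any(node not in failed for node in copies) for copies in placement.values())


nodes = ["node1", "node2", "node3", "node4", "node5"]
blocks = split_into_blocks(b"x" * 1000, block_size=128)  # 8 blocks
placement = place_blocks(len(blocks), nodes, replication=3)
print(readable_after_failure(placement, {"node2"}))                    # True
print(readable_after_failure(placement, {"node1", "node2", "node3"}))  # False: block 0 lost
```

Because every block lives on three different servers, any single failure (or any two) leaves all data readable; only losing every node that holds a given block makes it unavailable.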
Big Data efforts also make wide use of what is known as NoSQL, an alternative to traditional relational databases. Whereas relational databases store information in tables, many NoSQL databases store it in documents (other NoSQL designs use key-value pairs, wide columns, or graphs). The NoSQL approach leads to redundancy, but its simplicity translates into flexibility, which is important when data is spread out across large numbers of often heterogeneous servers, as is the case in many Big Data environments.
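The trade-off can be illustrated with plain Python data structures standing in for both models. The sketch below uses a hypothetical customer-and-orders example, not drawn from any particular product: the same information is stored once as normalized rows that must be joined, and once as self-contained documents in which the customer details are copied into every order. The copying is the redundancy described above; the payoff is that each document can be read from one place without a join.

```python
# Relational style: two normalized tables linked by customer_id.
customers = [
    {"customer_id": 1, "name": "Acme Corp", "city": "Boston"},
]
orders = [
    {"order_id": 101, "customer_id": 1, "total": 250.0},
    {"order_id": 102, "customer_id": 1, "total": 75.5},
]


def join_orders(customers, orders):
    """Rebuild full order records by joining on customer_id (what a SQL JOIN does)."""
    by_id = {c["customer_id"]: c for c in customers}
    return [{**o, **by_id[o["customer_id"]]} for o in orders]


# Document style: each order carries its own copy of the customer data,
# so it can be read on its own, at the cost of duplication.
order_documents = [
    {"order_id": 101, "total": 250.0, "customer": {"name": "Acme Corp", "city": "Boston"}},
    {"order_id": 102, "total": 75.5, "customer": {"name": "Acme Corp", "city": "Boston"}},
]


def rename_customer_everywhere(docs, old, new):
    """With duplicated data, an update must touch every copy; returns how many changed."""
    changed = 0
    for doc in docs:
        if doc["customer"]["name"] == old:
            doc["customer"]["name"] = new
            changed += 1
    return changed


print(join_orders(customers, orders)[0]["city"])  # Boston
```

In the relational version, renaming the customer means updating one row; in the document version, `rename_customer_everywhere` has to visit every order that embeds a copy.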
But Hadoop and NoSQL perform just some of the tasks that are needed to effectively collect and analyze Big Data. Technology writer Peter Wayner said the following about his experience evaluating several Big Data tools:
“After wading through these products, it became clear that ‘Big Data’ was much bigger than any single buzzword. It’s not really fair to lump together products that largely build tables with those that attempt complicated mathematical operations.”2
Many of the other tools on the market work with Hadoop and NoSQL databases, providing functionality for:3
- Data management
- Number processing
- Reporting and visualization
- Integration with particular applications
- A platform for software development
And despite their popularity, Hadoop and NoSQL may not remain the top tools in the market. Perhaps the biggest challenge to Hadoop is Spark, which is also from the Apache Project. Spark is an alternative to Hadoop’s MapReduce component, the part of Hadoop that enables large volumes of data to be processed in parallel across many computers. Apache says that Spark can run workloads up to 100 times faster than MapReduce, and that it is easy to use and runs on a wide variety of platforms. And not only is Apache itself touting Spark over Hadoop, but influential companies like Cloudera4 and IBM5 support it. Still, Hadoop and Spark may be able to co-exist.
“[T]hey don’t really serve the same purposes,” says technology writer Katherine Noyes.6 “Hadoop is essentially a distributed data infrastructure: It distributes massive data collections across multiple nodes within a cluster of commodity servers, which means you don’t need to buy and maintain expensive custom hardware. It also indexes and keeps track of that data, enabling big-data processing and analytics far more effectively than was possible previously. Spark, on the other hand, is a data-processing tool that operates on those distributed data collections; it doesn’t do distributed storage.”
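The MapReduce model that Spark competes with can be sketched in a few lines of Python. The example below runs the classic word-count job in a single process, with the map, shuffle, and reduce phases written as separate functions. On a real cluster, each phase would run in parallel across many servers, and Spark would typically keep intermediate data in memory rather than writing it to disk between steps. The function names and sample input are hypothetical and are not Hadoop or Spark APIs.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word; each input split could be mapped on a different server."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Combine each key's values; different keys could be reduced on different servers."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits: list[list[str]]) -> dict[str, int]:
    # Map every split independently (the parallelizable step), then shuffle and reduce.
    mapped = (pair for split in splits for pair in map_phase(split))
    return reduce_phase(shuffle_phase(mapped))


splits = [
    ["big data needs big tools"],
    ["spark and hadoop process big data"],
]
print(word_count(splits)["big"])  # 3
```

Because each split is mapped without reference to the others, and each key is reduced without reference to other keys, the framework can spread both phases over as many machines as are available.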
Current View
The Big Data market is moving decisively in two directions. First, it is growing larger. IDC forecasts that the combined market for Big Data and business analytics solutions will grow at a compound annual rate of 13.2 percent through 2022, when revenues will reach $274.3 billion.7 Second, Big Data is growing in strategic significance. For example, it is being used for critical applications such as the following:
- To inform marketing campaigns8
- To optimize financial trading9
- To modernize power infrastructures10
- To provide data for healthcare delivery and research11
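The growth figure rests on the standard compound-growth relationship, end value = start value × (1 + r)^n. The sketch below applies it to the IDC numbers cited in this report: the $189.1 billion estimate for 2019 given in the title of the IDC release, and the $274.3 billion projected for 2022. Treating 2019 to 2022 as three compounding years is an assumption made here for illustration; under it, the implied rate comes out at about 13.2 percent.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by growing from start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding start_value at `rate` for `years` periods."""
    return start_value * (1 + rate) ** years


# Figures in billions of US dollars, taken from the IDC forecast cited in this report.
rate = cagr(189.1, 274.3, years=3)
print(f"{rate:.1%}")                      # 13.2%
print(f"{project(189.1, 0.132, 3):.1f}")  # 274.3
```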
And as organizations put Big Data to greater use, they are also scrutinizing the tools they use more closely. “The honeymoon days of big data experimentation are over,” writes Mary Shacklett, president of research and consulting firm Transworld Data.12 Organizations now expect Big Data systems to perform well, just as with any critical enterprise software. In particular, they measure performance based on the following:
- “how fast new applications are developed and deployed;”
- “how well the system is executing these new applications;” and
- “if the system is performing as economically and efficiently as possible.”13
Outlook
Smart Data and Smart Devices
One change that is already occurring is that the industry is focusing more on what some call “smart data.” “‘Big data’ isn’t as important as ‘smart data’ or the ‘right data,’” says Kimberly A. Whitler, Professor at the University of Virginia’s Darden School of Business. “Companies are getting excited over the notion of big data, but it’s ultimately only as good as the insights you get out of it. And in order to get actionable insights out of it, you have to combine big data with small data….The small data provides the context and calibration that big data can’t do on its own. When you combine the two, you get smart data.”14
This shift is pushing organizations to carefully consider what data to use. “To have meaning, and hence to be ‘smart,’ data needs to have context. It needs to be correlated with something,” says Oracle’s Jonathan Palmer.15 As an example, he describes the following scenario: “There is no point in collecting heart rate every second of a two year clinical trial, if the heart rate data cannot be aligned to the exact date and time that the investigational drug was taken.” Making a similar point, Jeremy Goldman, CEO of consulting firm Firebrand Group, says that “A company that focuses on smart data, rather than big data, is more likely to be concerned with those specific nuggets of information that will directly impact the business.”16 But Goldman also says that identifying smart data is difficult because “data scientists are expensive and in short supply.”
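Palmer’s clinical-trial example describes a concrete data operation: attaching each high-frequency measurement to the event that gives it meaning. The sketch below performs a simple “as-of” match, pairing each heart-rate reading with the most recent dose taken at or before it and computing the minutes elapsed since that dose. The timestamps, readings, and function name are hypothetical, chosen only to illustrate how raw readings gain context.

```python
from bisect import bisect_right
from datetime import datetime


def align_to_doses(readings, dose_times):
    """Pair each (timestamp, bpm) reading with the latest dose at or before it.

    `dose_times` must be sorted ascending. Readings taken before the first dose
    get None, because no dose context exists for them yet.
    """
    aligned = []
    for ts, bpm in readings:
        idx = bisect_right(dose_times, ts) - 1  # index of the last dose <= ts
        if idx < 0:
            aligned.append((ts, bpm, None))
        else:
            minutes = (ts - dose_times[idx]).total_seconds() / 60
            aligned.append((ts, bpm, minutes))
    return aligned


doses = [datetime(2019, 1, 1, 8, 0), datetime(2019, 1, 1, 20, 0)]
readings = [
    (datetime(2019, 1, 1, 7, 30), 72),
    (datetime(2019, 1, 1, 8, 45), 80),
    (datetime(2019, 1, 1, 20, 15), 78),
]
for ts, bpm, since_dose in align_to_doses(readings, doses):
    print(ts.time(), bpm, since_dose)  # minutes since dose: None, 45.0, 15.0
```

Binary search over the sorted dose times keeps each lookup logarithmic, which matters when, as in Palmer’s scenario, readings arrive every second for years.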
AI and Machine Learning
The development of Big Data is closely intertwined with the development of AI and machine learning.17 These technologies are expanding the range of applications for Big Data and enabling it to perform at higher speeds and on much larger sample sizes. “The convergence of big data with AI has emerged as the single most important development that is shaping the future of how firms drive business value from their data and analytics capabilities,” says Randy Bean, CEO of consulting firm NewVantage Partners.18 “The availability of greater volumes and sources of data is, for the first time, enabling capabilities in AI and machine learning that remained dormant for decades due to lack of data availability, limited sample sizes, and an inability to analyze massive amounts of data in milliseconds.” NewVantage links Big Data and AI closely, essentially discussing them as a single technology in its annual survey of executives.19
Big Data as a Service
One reason that Big Data has often been difficult to use fully is that organizations have needed to patch together many technologies, which they have been responsible for choosing, configuring, and interconnecting. But companies such as Google, Oracle, and SAP now offer cloud-based Big Data services, and Dell EMC has more recently introduced Big Data as a service that runs on premises.20 These services offer popular tools like Apache Hadoop, Spark, and Hive or the data analysis platform H2O. The service providers choose the tools to use (possibly with some input from customers), configure them to interoperate, and establish and monitor security measures.
Recommendations
Approach Projects Carefully
Successfully implementing Big Data takes good planning and the right type of environment. Some of the most common obstacles to using Big Data successfully are finding people who are skilled at implementing it, the cost, and issues that carry over from legacy data analysis systems.21 Overcoming these obstacles takes significant commitment, and any resources devoted to the effort need to be well selected. Organizations also need to get support from top executives and ensure that skilled staff members are on board. But initiatives cannot always rely solely on executives and the IT staff. Often, partners must be brought in, and a variety of departments within an organization may need to play a role because the data involved originates from, or is used by, various departments.22
Approach Big Data as a Conceptual Change
The decision to use Big Data represents a change in management philosophy. Making this transition is not easy, and in many cases it is best approached gradually. Adam Kleinberg provides guidance about some simple steps that organizations can take to realize some of the benefits of Big Data.23 He recommends using free tools like Google Trends, which identifies the top search topics on the Web, and Quantcast, which provides data about the visitors to a given website.
Focus on Business Goals
Big Data gives companies volumes of raw numbers. Translating these numbers into business knowledge is a separate process, but it is ultimately the most important one. In an article titled “The CEO/CMO Dilemma: So Much Data, So Little Impact,” marketing executive Frank Wheeler is quoted as saying, “Often, there are multiple data systems that need to be integrated, an analytical capability that needs to be developed, and an advanced technical expertise needed to transform the data into useful information. What’s missing is an end-to-end understanding of how to get the right data, understand it, and then turn it into a growth-driven plan.”24
“Data platforms are not ‘one size fits all,’” says an article published by consulting group Aberdeen.25 “You’ll need to create a data platform that complements your organization’s strengths and your existing technology footprint, and uses the most effective tools to meet your data ingest and analysis needs. Typically, this will be a dynamic combination of legacy and new technology, off-the-shelf and open source licensing, and static and fluid data access methods.”
Consider the Tools to Use
Organizations can analyze information in many ways without using what would typically be called “Big Data.” And this approach can often provide equally valuable information while costing less and requiring less effort. Enterprise architecture designer Tim O’Brien has made this point: “Big data is a necessity at scale: if you’re trying to listen to every transatlantic phonecall, you need to use MapReduce…. If you don’t you can probably scale with a database.”26
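O’Brien’s point can be illustrated with a conventional database. The sketch below uses SQLite, which ships with Python’s standard library, to aggregate a hypothetical table of call records with an ordinary GROUP BY query on a single machine. The table name, columns, and records are invented for illustration; the point is only that many analytical questions at moderate scale need no distributed framework.

```python
import sqlite3


def calls_per_country(records):
    """Count calls and total minutes per destination country with plain SQL."""
    conn = sqlite3.connect(":memory:")  # in-memory database; a file path would persist it
    try:
        conn.execute("CREATE TABLE calls (country TEXT, minutes REAL)")
        conn.executemany("INSERT INTO calls VALUES (?, ?)", records)
        rows = conn.execute(
            "SELECT country, COUNT(*), SUM(minutes) FROM calls "
            "GROUP BY country ORDER BY country"
        ).fetchall()
    finally:
        conn.close()
    return rows


records = [("FR", 12.5), ("UK", 3.0), ("FR", 7.5), ("US", 20.0), ("UK", 1.0)]
print(calls_per_country(records))
# [('FR', 2, 20.0), ('UK', 2, 4.0), ('US', 1, 20.0)]
```

The same query shape (group, count, sum) is what a MapReduce job would compute; the difference is whether the data fits comfortably on one machine.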
And Andrew C. Oliver of search technology company Lucidworks recommends that organizations consider dropping certain popular tools. “Some technologies may be holding you back. Remember, this is the fastest-moving area of enterprise tech — so much so that some software acts as a placeholder until better bits arrive….Those upgrades — or replacements — can make the difference between a successful big data initiative and one you’ll be living down for the next few years.”27
For some industries and applications, popular tools may not even be an option. As Mary Shacklett observes, while Hadoop is often “cheaper” and “easier,” organizations like “life sciences, weather, pharmaceutical, mining, medical, government, and academic companies and institutions” typically need high-performance computing systems to accommodate large data volumes.28
Many companies may choose to use the ecosystem of products that Apache has developed. These include not only the popular, longstanding Hadoop and its newer alternative Spark, but also Airflow, Beam, CarbonData, and Cassandra.29 The emergence of a multiple-tool ecosystem, such as exists in other technology domains, helps to alleviate some of the confusion that organizations might experience in trying to piece together various Big Data tools.
References
1 Kevin T. Smith. “Big Data Security: The Evolution of Hadoop’s Security Model.” InfoQ. August 14, 2013.
2 Peter Wayner. “7 Top Tools for Taming Big Data.” InfoWorld. April 18, 2012.
3 Derrick Harris. “A Programmer’s Guide to Big Data: 12 Tools to Know.” GigaOM. December 18, 2012.
4 “Overview of CDS 2 Powered by Apache Spark.” Cloudera. October 1, 2018.
5 “Apache Spark.” IBM Bluemix Catalog. IBM. September 21, 2017.
6 Katherine Noyes. “Five Things You Need to Know About Hadoop v. Apache Spark.” InfoWorld. December 11, 2015.
7 “IDC Forecasts Revenues for Big Data and Business Analytics Solutions Will Reach $189.1 Billion This Year with Double-Digit Annual Growth Through 2022.” IDC. April 4, 2019.
8 “Big Data, Bigger Marketing.” SAS.
9 Gary Eastwood. “3 Ways Big Data Is Changing Financial Trading.” NetworkWorld. June 20, 2017.
10 Reza Arghandeh and Yuxun Zhou. “Big Data Application in Power Systems.” Elsevier Science. 2017.
11 Rebecca Parker. “Big Data Analytics in Healthcare Market Growing Tremendously.” Market Report Gazette. September 28, 2019.
12 Mary Shacklett. “How Big Data Developers Use Automation Tools to Identify Mission-Critical Apps.” TechRepublic. March 14, 2017.
13 Ibid.
14 Kimberly A. Whitler. “Stop Focusing On Big Data And Start Focusing On Smart Data.” Forbes. August 20, 2019.
15 Jonathan Palmer. “From Big Data to Smart Data.” Oracle: Health Sciences Blog. October 6, 2016.
16 Jeremy Goldman. “Big Data Is So 2016. We Need Smart Data.” Inc. March 21, 2017.
17 Tony Baer. “2017 Trends to Watch: Big Data.” Ovum. November 21, 2016.
18 Randy Bean. “How Big Data Is Empowering AI and Machine Learning at Scale.” MIT Sloan Management Review. May 8, 2017.
19 NewVantage Partners. “Big Data and AI Executive Survey 2019.” NewVantage Partners. 2019.
20 Marc Ferranti. “Dell EMC Puts Big Data as a Service on Premises.” NetworkWorld. September 12, 2018.
21 Ari Amster. “Survey: State of Big Data Adoption.” Qubole. March 7, 2016.
22 Alfred Tat-Kei Ho and Bo McCall. “Ten Actions to Implement Big Data Initiatives.” IBM Center for the Business of Government. 2016.
23 Adam Kleinberg. “5 Small Ways to Use Big Data to Majorly Improve Business.” Entrepreneur. August 20, 2013.
24 Kimberly A. Whitler. “The CEO/CMO Dilemma: So Much Data, So Little Impact.” Forbes. July 18, 2012.
25 Aberdeen. “Best Practices for Implementing Big Data and Data Sciences for Analytics.” Aberdeen. February 24, 2017.
26 Jack Clark. “Big Data Tools Cost Too Much, Do Too Little.” The Register. February 28, 2013.
27 Andrew C. Oliver. “7 Big Data Tools to Ditch in 2017.” InfoWorld. October 6, 2016.
28 Mary Shacklett. “4 Steps to Implementing High-Performance Computing for Big Data Processing.” TechRepublic. February 20, 2018.
29 Shailna Patidar. “7 Emerging Big Data Technologies to Watch Out For.” Analytics India. October 3, 2018.
Web Links
- Apache Hadoop: http://hadoop.apache.org/
- Apache Spark: http://spark.apache.org/
- Dell EMC: https://www.dellemc.com/
- Google: http://www.google.com/
- Google Trends: http://www.google.com/trends/
- H2O: https://www.h2o.ai/
- Oracle: http://www.oracle.com/
- Quantcast: https://www.quantcast.com/
- SAP: http://www.sap.com/
About the Author
Geoff Keston is the author of more than 250 articles that help organizations find opportunities in business trends and technology. He also works directly with clients to develop communications strategies that improve processes and customer relationships. Mr. Keston has worked as a project manager for a major technology consulting and services company and is a Microsoft Certified Systems Engineer and a Certified Novell Administrator.