Deepfake and AI-Generated Security Threats
Copyright 2021, Faulkner Information Services. All Rights Reserved.
Docid: 00021098
Publication Date: July 2021
Report Type: TUTORIAL
Preview
We live in a period of history where the Internet has become
one of the most important tools available to humankind. It has enhanced
global communication by orders of magnitude, increased the knowledge
available to any given individual by incomprehensible amounts, and provided
on-demand entertainment that we could only dream of just a decade ago.
Unfortunately, as with any tool this powerful, it has also been used for
nefarious purposes. The most prevalent and well-publicized of these malicious
uses in recent years has been the proliferation of fake news and propaganda via
Web sites, social media, and online communications hubs. This trend has impacted
humanity at every level, causing suffering to innocent individuals, and swaying
the fates of entire nations by influencing the election of their top officials.
However, to this point, that influence has been tempered by the ability of
careful, thoughtful people to see through the lies and attempts at trickery
by simply using their own common sense and intuition. Now, that safeguard may be
taken away by emerging technologies capable of creating depictions of actual and fictional human beings so realistic, so convincing, that even a family member might have a hard time telling them apart from the real thing. This report details
the threat technologies like deepfake and AI-generated online personas pose to
personal privacy and security, the methods being used to produce these deceptive
images and videos, and the countermeasures available to protect against
the dystopian threat of a forthcoming information age in which no form of
electronic communication can be completely trusted.
Report Contents:
- Definition and Explanation
- Current AI-Based Security Threats
- Countermeasures
- Summary
- Related Reports
Related Faulkner Reports: The Internet's Fake News Problem
Definition and Explanation
Although this report focuses on the singular topic of how AI-based image and
video manipulation could pose a global security threat, there are two very
distinct forks to this possible threat. The first is the manipulation of images
by AI-based software for the purpose of changing the appearance of a real
person, or creating a completely fictional person that is indiscernible from one
of actual flesh and blood. The second branch is the usage of AI-based software
to alter video and audio to make it appear that a person is saying something
they did not say or doing something they did not do. Although both categories of software can be used toward similar ends, the products they create vary greatly in the purposes to which they can be put. For this reason,
this section – and some subsequent sections of the report – will be split into two
segments: image manipulation and video manipulation.
Image Manipulation
Image manipulation is by no means a new technology. Pictures have been getting "Photoshopped" since Adobe debuted the photo editing software in 1990, and non-electronic methods of
image manipulation had been in use for many years prior. However, this
report’s focus is specific to methods of manipulating and creating images that
employ artificial intelligence to accomplish their goals. While a skilled artist
could use Photoshop to make nearly anything seem real, people with that skillset
are rare, often requiring years of training to reach that level of aptitude.
But, thanks to AI, the same ends can now be produced by a novice with no
experience whatsoever. On the one hand, this may seem relatively benign,
including things like cute Instagram and Snapchat filters designed to add crowns to an image or make it look like a person is sticking out a giant dog's tongue.
However, a very similar technology can be used to make individuals appear old or
weak, or even to make it appear as if they are wearing blackface. Given the
constant flood of news the public is inundated with about which politician is
apologizing for which controversial blunder, it is easy to see
how such images could be employed by purveyors of fake news to sow discord and
misinformation.
In a similar vein, those same propagandists now also have tools in their
arsenal to create AI-generated images of photorealistic human beings to use as
the face for their influence campaigns. Need a young woman to prove that female
voters support your controversial candidate? Just cook one up using AI without
any of the pesky legal risks of using an image of an actual person who may or
may not agree with your point. Need a man to pose as a member of your opposition
while writing vile social media posts in order to damage their
credibility? Simply compile one using AI to serve as the straw man your
plan requires. The ability to create imaginary people on a whim is not, in and of itself, frightening, but the uses to which this option can be put
(covered in more detail later in the report) are truly terrifying.
Video and Audio Manipulation
While the aforementioned act of "Photoshopping" an image is common enough to
have entered the public vernacular, most people have been fairly confident, to
this point in history, that similarly convincing manipulation of video cannot be
accomplished by anything short of a big budget Hollywood film. Even then, special effects often fail to completely fool
the audience thanks to tell-tale signs of manipulation, computer generated
imagery, or camera trickery. However, this is now changing thanks to the ability
of newly developed AI-based software to manipulate video in ways that are far
more convincing than even the best Hollywood has to offer. Perhaps most
frightening of all is the fact that this can be accomplished with relatively few
tools, for relatively little cost, using consumer-level hardware many readers
likely already have in their homes. If manipulating an image is enough to
convince some fake news readers that a person has participated in an illicit
act, imagine the impact of a similarly doctored video of that person appearing
to do or say something completely unforgivable.
Although deepfake has become something of the face of this type of
technology, it is by no means the only tool to accomplish the task, nor is it a
monolithic piece of software produced by a single developer. Indeed, deepfake,
or some slight variation of the word, is already creeping toward filling the same linguistic purpose as "Photoshop": a catch-all verb used to refer to any AI-based manipulation of a video.
Alongside video manipulation, AI-based audio manipulation has also progressed
to the point where it is literally possible to put words into a person’s mouth.
Using technology similar to that powering our AI-based digital assistants, bad
actors can now take a database of voice clips of a public figure, process that
data, and create a tool that allows them to script anything they wish. The
result will be a convincing replica of the person in question actually reading
that script. Although no examples yet exist of this technology verifiably being
used in criminal or nefarious activity, there are already instances of it being
used in the entertainment industry to literally put words into the mouths of
individuals who have passed on, including a documentary on the late chef
Anthony Bourdain that includes quotes never actually spoken by the
man, but generated via AI following his death.1 While this use is relatively harmless, if arguably morbid, it could be combined with the aforementioned video manipulation tools to create a recipe for putting any person you choose into any number of
compromising, embarrassing, or illegal situations.
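To make the pipeline described above concrete, here is a minimal conceptual sketch in Python. The helper functions train_voice_model and synthesize are hypothetical placeholders standing in for whatever speech synthesis toolkit is actually used; they are not any real library's API.

    from pathlib import Path

    def train_voice_model(clips):
        """Hypothetical placeholder: fit a speaker-specific speech model to the clips."""
        raise NotImplementedError("Stands in for a real voice-cloning toolkit.")

    def synthesize(model, text):
        """Hypothetical placeholder: render arbitrary text in the modeled voice."""
        raise NotImplementedError("Stands in for a real text-to-speech engine.")

    def build_voice_clone(clip_dir, script, out_path):
        """The pipeline described above: gather clips, model the voice, read a script."""
        clips = sorted(Path(clip_dir).glob("*.wav"))   # 1. collect recordings of the target
        model = train_voice_model(clips)               # 2. build a speaker-specific voice model
        audio = synthesize(model, text=script)         # 3. generate speech for arbitrary text
        Path(out_path).write_bytes(audio)              # 4. save the fabricated recording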
Current AI-Based Security Threats
While the previous section laid the groundwork for the nature and
capabilities of the threats posed by AI-based image and video manipulation, this
section will dive deeper into the specific ways in which those threats can be
employed, with examples of how they have already been used to demonstrate
possible vectors of attack, as well as a few early examples of malicious parties
attempting to use the technology to fool the public.
Image Manipulation
As stated above, image manipulation has posed a danger to the trust we have placed
in photographs for several decades now. However, with AI-powered manipulation
added to the mix, images can now be changed more quickly, and often to a more convincing degree, than almost any human can match. In fact, one of the key
capabilities of AI to deceive is its ability to manipulate images in near real
time. Imagine a much more advanced version of the Instagram or Snapchat filters
mentioned above, one that alters the subject's appearance into something not cartoonish and whimsical but realistic and believable. While this may seem like an
advanced capability that should require either a lengthy processing time or
significant computing power, it has already reached the point where it can be
accomplished by a simple, freely available smartphone app. This reality has lawmakers greatly concerned.
The best available example of this technology so far is likely a mobile app called FaceApp. The app originally launched in 2017 but took more than a year to come into the spotlight, thanks to a viral surge of celebrities using an age-changing filter to show what they could look like as
older or younger versions of themselves. The surprisingly convincing results of
the app’s aging or de-aging efforts led many users to try it out for
themselves, giving the software and its developers access to detailed photos and
facial scans of millions of users, including some very important, very high
profile individuals. It is this information gathering capability that began to
concern lawmakers in mid-2019. The first official backlash came from the US
Democratic National Committee (DNC), which advised its members, particularly
those participating in active campaigns, to avoid using the software entirely.2
The primary reason for this cautionary statement seemed to
be the origin of the app, which was developed by a team located primarily in
Russia. The DNC’s well-known history of being hacked by Russian citizens makes
the precaution understandable, even without the nearly unfettered access the app
requests to the user’s uploaded photos.
Figure 1. FaceApp’s Age Progression Demo Image
Source: FaceApp.com
While the app is used to alter photos for the user’s own enjoyment,
those same images could be manipulated to a politician’s detriment. One need
only look at the history of FaceApp itself for a prime example. The app originally launched with "Black, White and Asian"
filters in its repertoire.3 Even prior to recent controversies surrounding public figures and their use
of or stance on "blackface,"
the backlash this caused was monumental. The filters were eventually removed
from the app and the developer later apologized. Imagine a scenario in which
the DNC’s fears prove true and the developers of FaceApp provide access to its
image database, knowingly or unknowingly, to Russia-based hackers. A cache of
celebrity and politician images could be mined for anyone that the malicious
actors want to target for character assassination. The image could then have
this "digital blackface" filter applied and be posted somewhere it could be
seen by millions of people. The subject of the image could protest its
existence, try to prove that it was altered without their knowledge or against
their will. But, recent history shows quite well how impossible it is for
even the most outlandish fake news stories to be pushed completely out of the
public consciousness, even after evidence disproving their claims has been found
and verified.
It is the danger posed by this level of access to images of public figures that led Senate Minority Leader Chuck Schumer to call on the FBI and Federal
Trade Commission (FTC) to investigate the app.4 Schumer wrote in his
letter to the FBI that "FaceApp’s location in Russia raises questions regarding
how and when the company provides access to the data of U.S. citizens to third
parties, including foreign governments."5 He went on to ask the FBI
to "assess whether the personal data uploaded by millions of Americans onto
FaceApp may be finding its way into the hands of the Russian government, or
entities with ties to the Russian government." Although FaceApp’s
developers vehemently deny the idea that they share users' data with any
unauthorized third parties, the accusations remain troubling and could very well
lead to foreign actors attempting to exploit similar repositories of these types
of images in the future, even if they have not already attempted it.
Image Generation
Manipulating the appearance of real-world individuals can undoubtedly be used
to damage their public image. But, what if no usable images are available? Or,
what if you, as a purveyor of libelous propaganda, wish to create a false
narrative that cannot be supported by any amount of image manipulation? How
could these new AI-based security threats benefit you? One extremely useful way
is by creating a digital straw man through which your narrative can be
spread via social media channels. Of course, this avenue of attack generally requires an accompanying online persona to back the opinions being pushed. One
could simply pilfer a real person’s image, but that brings the risk of
being accused of identity theft and having your operation shut down. What if,
instead of using a real person, you could create one from whole cloth, tailored
specifically to the needs of your political or social stance? This is already
quite possible.
Figure 2. An Artificial Headshot Generated by ThisPersonDoesNotExist.com
Source: ThisPersonDoesNotExist.com
The availability of this type of technology came into the public eye in 2019
when a Web site titled ThisPersonDoesNotExist.com was launched.6 The
Web site employs a technology known as "generative adversarial networks," or GANs, in an implementation developed by graphics technology maker Nvidia, to create photo-realistic images of human beings that, as the name would suggest, do not actually exist.7 Although the person in the image may slightly
resemble any of the actual people from which the various portions of their face
were drawn, the final result is a perfect digital Frankenstein’s Monster, almost
completely indiscernible from a real image of a real human being. One article on
the site’s launch called the images it generates "disturbingly convincing, a
warning against trusting images and a whisper of just how gnostically paranoid
everything is going to get."8 This description is an apt one that
gets to the heart of what this report is truly about: that technology is quickly
advancing to the point where it can alter the forms of media we rely on for the
truth to the point where any scenario can be made to seem real. While the
concept of a complete loss of trust in photographic images may seem
unrealistically dystopian, it is, nonetheless, coming to pass, and the
criminals and malefactors who could benefit from it are salivating at the ways
in which it could help their cause.
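To make the "adversarial" part of that term concrete, the sketch below shows the core training loop of a generative adversarial network, written here in PyTorch purely as an illustration: a generator learns to produce samples that a discriminator cannot distinguish from real data. It is a toy example operating on flat vectors, not a face generator; systems like the one behind ThisPersonDoesNotExist.com apply the same principle with vastly larger convolutional models and curated photo datasets.

    import torch
    import torch.nn as nn

    latent_dim, data_dim = 16, 64

    # Generator: maps random noise to a synthetic sample.
    generator = nn.Sequential(
        nn.Linear(latent_dim, 128), nn.ReLU(),
        nn.Linear(128, data_dim), nn.Tanh(),
    )
    # Discriminator: outputs the probability that a sample is real.
    discriminator = nn.Sequential(
        nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
        nn.Linear(128, 1), nn.Sigmoid(),
    )

    loss_fn = nn.BCELoss()
    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

    def real_batch(n):
        # Placeholder for real training data (e.g., flattened face images).
        return torch.randn(n, data_dim)

    for step in range(1000):
        real = real_batch(32)
        fake = generator(torch.randn(32, latent_dim))

        # Train the discriminator: real samples toward 1, generated samples toward 0.
        opt_d.zero_grad()
        d_loss = (loss_fn(discriminator(real), torch.ones(32, 1)) +
                  loss_fn(discriminator(fake.detach()), torch.zeros(32, 1)))
        d_loss.backward()
        opt_d.step()

        # Train the generator: try to make the discriminator score its fakes as real.
        opt_g.zero_grad()
        g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
        g_loss.backward()
        opt_g.step()

The two networks improve in lockstep, which is why the output of a well-trained generator becomes so difficult to distinguish from real photographs.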
Video Manipulation
Video of an event is often seen as the
ultimate arbiter of its veracity. Eyewitness accounts of something having
happened can be misremembered or an outright lie, and images can be
manipulated, as noted in detail above. But, video was thought to be immune to such
deception due to the difficulty that even movie studios have in making mundane, but untrue, events seem completely real.
Take, for example, the now-infamous case of Superman’s moustache. Actor Henry Cavill grew a moustache for his role in the sixth
film of the Mission:
Impossible franchise. Due to a contractual obligation requiring him
to maintain his public appearance for that role, Cavill was forced to refrain from
shaving the moustache during the period in which he was filming his scenes as
Superman in the Justice League movie. It was decided that the moustache would
have to be digitally removed from the film to conceal the very
un-Superman facial hair from appearing in Justice League. Despite the seeming
simplicity of this special effect and the massive budget available to the
film’s makers, the resulting digital alterations to Cavill’s face went down in
film history as one of the most embarrassing CGI (computer generated imagery)
failures ever to besmirch a blockbuster offering.9 While this rather lengthy aside may seem like a complete non sequitur, it is included to illustrate the extreme difficulty that even highly skilled professionals have when attempting to convincingly alter the human face while it is in motion in a video. The results, despite their limitless resources and best efforts, almost always seem slightly off, usually straying into what's known as the Uncanny Valley, a term referring to the tendency for imperfect attempts at artificial replications of human beings to create unease or disgust in the people viewing them.
Ironically, the artificial intelligence that we find so hard to convincingly humanize has surpassed us at replicating human motion, to the point where video can be produced
of a real, recognizable human being doing or saying something that he or she
never did or said. As with image manipulation, this new wave of technological
trickery can be best shown off by a single example of AI-based video
manipulation: deepfake. The term deepfake was created in 2017 when it
was demonstrated that the same generative adversarial networks that power FaceApp could be applied to video.10 While the technology has obvious
benign or even beneficial applications in the entertainment industry, it has
generally made the news not for the special effects it can produce but for the
danger it poses. The most obvious, and arguably greatest, of these is the ability
to put words into the mouth of a powerful politician or world
leader. This was demonstrated most famously by actor and filmmaker Jordan Peele
when he commissioned and helped produce a video of former President Barack Obama
presenting a public service announcement on the dangers of deepfake.11
In the video, Obama appears to lay out the possibilities deepfake offers,
references one of Peele’s own movies, and insults his successor, all before it is
revealed that the audio portion of the clip is actually being spoken by Peele
himself, with deepfake technology handling the task of nearly perfectly syncing
the video’s lips to his words. While Peele used the video to illustrate how
easily someone could be fooled by the first few moments of it, he did eventually
reveal it to be fake. If, instead, someone produced a similar video of Barack
Obama swearing, insulting the sitting president, or saying anything they chose
for him to say, the backlash and outrage would be monumental, with Obama’s political enemies likely latching onto the video as the proof they
have always wanted that he is a truly terrible human being.
Some might believe that even visually convincing videos such as the Barack
Obama example above would never be trotted out by political adversaries for fear
of discovery that the content was produced by fraudulent means. Those people would be verifiably wrong. In fact, a simple video
editing trick that could be accomplished by even the barest novice video editor
has already fooled many high-powered individuals, up to and including the President of the United States at the time, and has been used in a subsequent attempt to
politically weaponize its content against a political rival. The incident in
question, which did not even need to rely on deepfake technology, involved a
pair of videos of Speaker of the House Nancy Pelosi. In one of these
doctored films, the footage was slowed down by a significant amount to make it appear that the Speaker's speech was being slurred during a news conference.12 This was followed by a second video in which similar footage of the Speaker
was aired on the Fox News program Lou Dobbs Tonight.13 In this
segment, the video was edited in a similar fashion while also stringing together several verbal stumbles in order to make it appear that the Democratic
Representative was having severe difficulty speaking. The clip was portrayed by
Dobbs as evidence that Pelosi was in cognitive decline. Making matters worse is
the fact that former President Trump himself tweeted out that exact segment with text
saying "PELOSI STAMMERS THROUGH NEWS CONFERENCE."14
Whether Trump was actually fooled by the video or not is largely irrelevant
in this case. The true point of this example is simply to show the impact
something as simple as a misleading video edit can have on the news cycle.
Imagine, then, something as sophisticated as deepfake technology being applied
to the same video. Rather than simply appearing to be drunk or suffering from
cognitive difficulties, Pelosi could have been made to say whatever a political
rival wished. Indeed, the congresswoman could have been made to extol the
virtues of Satan or her love for Adolf Hitler, and the video would have been nearly indiscernible from genuine footage to many human eyes.
Readers may be wondering at this point about the accompanying audio required
for these media nightmare scenarios to play out. After all, if Pelosi had never
said the things mentioned in the theoretical example listed above, then there
would be no appropriate voice recording to provide the necessary phrases needed
to back the visual fakery with audio. Unfortunately for fans of the truth, audio
processing technology has also advanced to a point where it is able to produce
artificial speech that is nearly indistinguishable from an actual human voice. Two
standouts in this area are WaveNet, a Google-backed audio processing technology
able to "generate speech which mimics any human voice and which sounds more
natural than the best existing Text-to-Speech systems," and Adobe’s Project Voco,
a "Photoshop for voices" that can take as little as twenty minutes of audio speech
recordings from a subject and use it to artificially reproduce that person’s
speech uttering phrases they have never said in entire their life.15,16
Either of these software solutions, as well as several others currently being
developed, can be used to handle the speech portion of deepfake videos. Or, one
could take a less technical approach and simply find a talented individual to
mimic the person’s voice, an option which the aforementioned example of Jordan
Peele’s Barack Obama video showed can be eerily effective.
Countermeasures
Readers may feel that the world is doomed to descend into an apocalyptic confusion in which no recorded image or video can ever be trusted again. They may ask, "If this type of technology already exists, what will the news look like in ten years, or twenty? Will it all be AI-generated lies designed to advance the agenda of some corporation or politician?" While that outcome can,
unfortunately, not be entirely ruled out, it most definitely can be avoided.
This is because technology is ultimately a tool. In nearly any area of technological development where some humans use that tool for nefarious purposes, there are others using it to combat those malignant goals. These "good
guys" are essentially fighting fire with fire by employing the same
technological advancements being put to criminal uses in order to maintain law
and order, and, particularly in this case, to protect the truth. "White Hat"
hackers, cybersecurity experts, cryptographers, and many other professions have
all trod this road before, using fraudsters’ own tools against them. This
section will cover some of the most important tools currently available for combating image and video manipulation, while also illustrating
ways in which the average person can employ their own common sense to fight the
influx of AI-powered fake news.
Tech Tools for Fighting Image and Video Manipulation
There are two ways in which fake images and videos can be fought: by stopping
them before they are created or by detecting them after they are made. The first is perhaps the more cutting edge. In a paper titled "Hiding Faces in Plain Sight: Disrupting AI Face Synthesis with Adversarial Perturbations," a team of researchers suggests disrupting the generative adversarial networks (GANs) used by FaceApp, deepfake tools, and other AI-based image and video manipulation software by purposely injecting noise into published images.17 Essentially, they
suggest introducing randomized digital artifacts into images as a way to reduce
their usability by synthesis tools. These artifacts would corrupt the
information provided by source images, reducing the quality of the final product
to the point that the video or image would have obvious visual distortions or
artifacts of its own. While this option sounds promising in theory, the
application of such a technique could prove problematic. It would essentially
require every publicly available image of a given individual to be digitally
altered in order for this noise to be added. This might be feasible
for all images published via first party or official channels, but it becomes
less likely that all news outlets capturing images of the individual would
comply, and essentially impossible that all private citizens would, or could,
participate. Just imagine the number of high-quality selfies that Barack Obama
must have taken in his political career. Surely these would provide more than
enough information for image or video manipulation, and would never have had the
necessary noise added to prevent their use.
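As a very rough illustration of where such a protective step would sit in a publishing workflow, the sketch below adds low-amplitude random noise to an image before it is released. This is only a stand-in: the paper cited above crafts adversarial perturbations against specific synthesis models, which is far more effective than the random noise used here.

    import numpy as np
    from PIL import Image

    def add_protective_noise(in_path, out_path, strength=4.0):
        # Add barely visible random noise to an image before publication.
        # A crude stand-in for the adversarial perturbations described above;
        # real disruption methods optimize the perturbation against a
        # face-synthesis model rather than drawing it at random.
        img = np.asarray(Image.open(in_path).convert("RGB"), dtype=np.float32)
        noise = np.random.normal(0.0, strength, img.shape)   # small per-pixel offsets
        noisy = np.clip(img + noise, 0, 255).astype(np.uint8)
        Image.fromarray(noisy).save(out_path)

    # add_protective_noise("official_portrait.jpg", "official_portrait_protected.jpg")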
While preventative measures such as the one above do show promise, and they
may eventually reach the point where they can adequately protect a person’s
image from being used as a source in the first place, it is with the detection
of image and photo manipulation that the most immediately promising technology
now lies. One of the current leaders in this area is a professor at Dartmouth College named Hany Farid. The professor and one of his graduate students have published literature on a new method of detecting deepfakes that they call a "soft biometric signature."18 This technique uses a similar process to the technology
it is trying to counteract by processing hours upon hours of video of a given
individual. Rather than using this data to create a fraudulent representation of
them, it detects movements and expressions particular to them. This profile can
then be compared to a given video to detect any discrepancies with that person’s
typical behavior. The professor, once again using Barack Obama as an example,
explained to CNN that the former President has a tendency to frown and tilt his
head downward when delivering negative facts, while he tilts it upward when
delivering happy news. These tendencies, and others like them, can be compiled
into a comprehensive reference against which new videos of an individual can be
checked. Farid claims he is also working on other systems that use GANs
themselves to detect deepfakes, rather than create them. However, the professor
told CNN he is reluctant to go into too much detail on how such detection
systems work, due to the fact that that information could be employed by
fraudsters to improve their own techniques. A colleague of the professor, Siwei Lyu, director of the computer vision and machine learning lab at the University at Albany, SUNY, illustrated
such dangers in the same CNN article, referencing an instance in which he
publicly mentioned that deepfakes could be detected by examining the unusual
patterns of blinking by the individual in the video. Lyu noted that less than
one month after he made this statement someone generated a deepfake video with
realistic blinking.19
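A minimal sketch of the comparison step behind such a "soft biometric" approach might look like the following. It assumes a hypothetical extract_mannerism_features function that turns a video into per-frame measurements (head tilt, brow movement, and so on); the actual research uses far richer facial and head-pose features, and the threshold shown is purely illustrative.

    import numpy as np

    def mannerism_profile(feature_frames):
        # Summarize per-frame mannerism features into one reference vector
        # (per-feature mean and standard deviation).
        return np.concatenate([feature_frames.mean(axis=0), feature_frames.std(axis=0)])

    def consistency_score(reference, suspect):
        # Cosine similarity between a person's reference profile and the profile
        # of a suspect video; low similarity suggests the footage is out of character.
        return float(np.dot(reference, suspect) /
                     (np.linalg.norm(reference) * np.linalg.norm(suspect) + 1e-9))

    # Hypothetical usage, assuming extract_mannerism_features(path) returns an
    # array of shape (num_frames, num_features):
    # ref = mannerism_profile(extract_mannerism_features("hours_of_real_speeches.mp4"))
    # sus = mannerism_profile(extract_mannerism_features("suspect_clip.mp4"))
    # if consistency_score(ref, sus) < 0.8:   # threshold is illustrative only
    #     print("Video deviates from the subject's typical mannerisms.")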
If this battle between fraud and fraud detection seems like something of an arms race, that is because it very much is, and it is not even a particularly new one.
Going back to the earlier mention of Adobe’s Photoshop being one of the first
tools available to accomplish digital image manipulation, that company has now
begun employing AI to detect when someone has used its software to alter an
image.20 It does this by looking for edits made by Photoshop's popular Liquify tool, which allows image manipulators to turn a frown into a smile, change the position of a limb, or alter various other physical characteristics. The company fed an AI program numerous
pairs of images, one of which had been altered using the tool. The software
eventually learned to detect telltale signs of digital trickery, allowing it to
correctly spot alterations in 99 percent of subsequent test images. For
comparison, human test subjects were only able to detect the same alterations
with an accuracy of 53 percent.
Figure 3. An Example of Adobe’s Detection AI at Work
Source: Adobe
While this tool is not specifically
designed to detect the types of AI-generated images referenced throughout most of
this report, the technology powering it could certainly be applied to that
application. In fact, it is with this AI vs AI scenario that the most promising
research into combating deepfakes and AI-manipulated imagery lies. Artificial
intelligences can process millions upon millions of images to detect differences
so minute that they would be unrecognizable to the human eye. This ability will
prove absolutely paramount in the race against fakers and fraudsters. However,
now that the genie is out of the bottle, it will almost definitely be a nearly
endless battle between those building ever-better fakes and those trying to
detect and stop them.
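As a rough sketch of the supervised, AI-versus-AI detection approach discussed above, the following shows how a small binary classifier could be trained on images labeled as original or edited. Adobe has not published its detector at this level of detail, so the architecture and training loop here are assumptions for illustration only.

    import torch
    import torch.nn as nn

    # Tiny convolutional classifier: one logit estimating whether an image was edited.
    detector = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, 1),
    )

    loss_fn = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(detector.parameters(), lr=1e-4)

    def training_step(images, labels):
        # images: (N, 3, H, W) batch mixing originals and their edited versions;
        # labels: (N, 1) floats, 0.0 for original and 1.0 for edited.
        optimizer.zero_grad()
        loss = loss_fn(detector(images), labels)
        loss.backward()
        optimizer.step()
        return loss.item()

Trained on enough original/edited pairs, a detector along these lines can learn subtle warping signatures that human reviewers miss, which is what makes results like the 99 percent figure cited above plausible.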
What You Can Do to Detect Image and Video Manipulation
Unfortunately, as mentioned above,
humans are not the best at detecting when they are being duped. Deepfakes and
AI-generated images have already well outstripped the capabilities of most human
beings to detect them. That said, very few manipulations are completely perfect.
While some are very, very close, nearly all of them include some minor flaw
that, while easily written off as a photographic artifact, could instead be used
to out the image or video as a fake. Below is a list of telltale signs to look
for in images and videos that may betray the fact that they have been
manipulated.
Images
Figure 4. An Example of AI-Generated Imagery Creating Distortion via Incorrect Processing
Source: ThisPersonDoesNotExist.com
Distortion – While AI-based
manipulation of images does not employ Photoshop or any human-interface
software for that matter, it does employ similar techniques. One of these is
distorting an image to make it appear closer to the desired end result. The same
techniques used to make models appear impossibly perfect can also be employed to
overlay the face of an unwilling victim onto an image of someone else entirely.
However, the process of making that source image of the victim’s face fit into
the final image requires it to be distorted. If done well, this will take into
account everything from perspective to lighting sources and even barely
perceptible facets like the way in which light passes through the slightly
translucent surface of human skin. However, it is rare that such perfection can be
accomplished by humans or machines. Slight variations in the line of a silhouette,
an isolated discrepancy in the resolution of a certain area of an image, or a
facial or body feature that seems to be bent unnaturally are all examples of the
type of distortion that can be a dead giveaway. In the above image, this type
of flaw can be seen in the right lens of the woman’s glasses. Rather than
correctly rendering the background to match the rest of the image, the AI
produced a distorted, blurred spot that includes colors that shouldn’t be
present and bent, wavy representations of what should be a fairly clear image of
what is behind that lens.
Figure 5. An Example of an AI Attempting to Blur Portions of an Image to Match Features
Source: ThisPersonDoesNotExist.com
Blurring – Both humans and AI
can often use subtle blurring to mask portions of an image that have been
altered. In the above image, this can be seen in the man’s mouth. While the
majority of his face is in sharp focus, the image becomes inexplicably blurry
around the left side of his lips and teeth. This is likely due to the mouth
portion of the image having been sourced from a photo taken from a slightly
different perspective. The automated process of reshaping that mouth to fit the finished product resulted in the pixels within that part of the mouth becoming compressed or stretched to the point where the blur was introduced. A similar effect can be seen at the top of the subject's visible ear. Although
this may appear fairly obvious when pointed out, it is still a relatively subtle
effect, and one that could very easily be missed when looking at the often
low-resolution images used on social media.
Figure 6. An Example of a Somewhat Obvious Artifact within an AI-Generated Image
Source: ThisPersonDoesNotExist.com
Artifacts – Artifacts in
digital images can take many forms, such as strange blocks of
pixels, inexplicable rainbow colors where none should be, or simple black spots
within an image. In the above example, several artifacts can be seen. The most
obvious are the ones present in the woman's hat. The AI that generated this
image was unable to correctly replicate the texture of the knitted material. It
misconstrued or inadequately distorted its source imagery in several spots, producing white voids surrounded by inexplicable bright spots. This may have been due to the AI interpreting something within its source image as a reflection, or to a simple inability to knit (no pun intended) the gaps in the source
images together correctly. Another slightly more subtle example can be seen in
the subject’s smile. While all of the other visible teeth appear normal, the
last visible incisor on the left shows a strange bubble-like formation that
almost appears to overlap the woman’s lip, despite the tooth clearly being
behind the lip. This might be another example of reflectivity throwing off the
AI, suggesting that it has some difficulty with processing images using
strong light sources.
Putting it all together – These are just a few examples of how AI can essentially "mess up" when
attempting to create a fake image. Signs like these can appear in human
manipulations as well as AI-based manipulations of existing images and those
created from whole cloth. Unfortunately, not all fakes will be as imperfect as
the three example images provided. This can be seen in the fact that the same AI that produced these also produced the nearly flawless example in Figure 2 of this report. However, if there are flaws within a given image, humans can
be surprisingly good at spotting them. Yes, training and education can play a
big part in the ability to detect fakes. However, instinct also has its role.
The earlier reference to the "Uncanny Valley" effect is an important one. Nearly
everyone has looked at an image or a video and thought "something about this
seems … off." That gut feeling is typically the brain detecting some discrepancy
between what it is seeing and what it knows a human should look
like. It could be something as simple as a lack of reflection in the subject’s
eye or the strange musculature visible when they move their mouth. In any case,
it is often wise to trust feelings like these. Our brains are better than we
give them credit for at spotting manipulation, even if it is often at a
subconscious level.
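For readers who want to go one step beyond eyeballing an image, the sketch below automates the "look for localized blur" advice above: it scores each patch of an image by the variance of its Laplacian, a common sharpness measure, using OpenCV. Patches that are dramatically less sharp than their neighbors inside an otherwise in-focus face are the kind of anomaly worth a closer look. The patch size and any threshold applied to the scores are illustrative choices, not tuned values.

    import cv2
    import numpy as np

    def sharpness_map(image_path, patch=64):
        # Return a grid of per-patch sharpness scores (variance of the Laplacian).
        gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        rows, cols = gray.shape[0] // patch, gray.shape[1] // patch
        scores = np.zeros((rows, cols))
        for r in range(rows):
            for c in range(cols):
                block = gray[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
                scores[r, c] = cv2.Laplacian(block, cv2.CV_64F).var()
        return scores

    # scores = sharpness_map("suspect_headshot.jpg")
    # print(np.round(scores, 1))   # unusually low cells inside the face merit scrutiny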
Video
Unfortunately it is much more
difficult to detect a well-faked video. Although there are many reasons for
this, the primary one is that they simply tend to move too fast. This is not to
say that the subject of the video is actually moving quickly. Rather, it refers
to the fact that each frame of the video is only visible for a fraction of
a second, giving the viewer almost no time at all to detect alterations that
may have been made to it. Imagine if one of the three example images above were shown for only 1/24th of a second, the duration of a single frame at the 24-frames-per-second rate common to many videos. Would the viewer still be able to detect the same, seemingly
obvious discrepancies under those circumstances? Assuming the
flaws were gone in the next frame or were replaced by a different set of flaws
in each subsequent frame, almost certainly not. The aspect of the human brain
that allows us to see rapidly switched static images as motion rather than as a
slideshow is the same aspect that could facilitate such trickery. The brain
processes a smoothed-over average of the images it perceives, often ignoring jagged changes in position between frames and blending them into much more natural-seeming motion. This same tendency is what could hide the flaws
that might be present in only a small portion of the frames within a deepfake.
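One practical way around this speed problem is to dump a suspect clip to still frames and examine them with the same checks described for images above. The sketch below does this with OpenCV; the file paths and sampling interval are placeholders.

    import cv2
    from pathlib import Path

    def dump_frames(video_path, out_dir, every_n=1):
        # Save every n-th frame of a video as a still image for manual inspection.
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        saved = index = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break                                   # end of video
            if index % every_n == 0:
                cv2.imwrite(f"{out_dir}/frame_{index:06d}.png", frame)
                saved += 1
            index += 1
        cap.release()
        return saved

    # dump_frames("suspect_clip.mp4", "frames/", every_n=5)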
With all of this said, there are still a few signs that can often give away a faked video.
Odd mannerisms – This is a
variation on the research of Professor Farid mentioned above. Human beings tend
to have a unique set of mannerisms. Certain facial expressions, talking with one's hands, emotional tells, and more can all identify a person even if
their appearance is somehow obscured. Actors study these physical traits for
years in an attempt to hone their craft, often still failing to hide their own
tendencies within the artificial mannerisms of a given role. This method of
detection would of course work best when the viewer is already familiar with the
mannerisms of the subject of a potentially fake video. This could be because the
subject is a friend or family member, or simply because the subject is a famous
celebrity or politician. This is the rare case when being well-known to the
public consciousness may provide a protection against fakery.
Skin tone discrepancies – A
person's skin tone is a highly distinctive characteristic of their appearance. Some AIs are capable of altering the tone and color of their source imagery to create
a convincing series of frames within a video, even if those frames were
drawn from source images with subjects of varying skin tones. However, an
adequate selection of source imagery is not always available. This can lead to
a subject’s face, typically drawn from the actual target of the
fake, not matching the skin tone of the body, which is often drawn
from third-party sources that may not involve the target of the fake at all. In
a similar vein, skin tones change based on the available lighting. Even if the
AI is able to correctly match factors like facial and body skin tones,
it might not also match those same tones to the apparent light source in a video.
Imagine, for example, a video in which the subject’s left hand appears lighter
than their right, even though the light source is situated closer to the
subject's right side. This odd bit of shadow could be a dead giveaway.
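A crude way to quantify the skin tone comparison described above is to average the color of a face region and a hand region and measure how far apart they are. In the sketch below, the region coordinates and the mismatch threshold are hypothetical; a real pipeline would locate the face and hands with a detector and account for lighting before comparing.

    import cv2
    import numpy as np

    def mean_tone(image, region):
        # Average color, in LAB space, of a rectangular region given as (x, y, w, h).
        x, y, w, h = region
        lab = cv2.cvtColor(image[y:y + h, x:x + w], cv2.COLOR_BGR2LAB)
        return lab.reshape(-1, 3).mean(axis=0)

    # Hypothetical regions; a real pipeline would find the face and hands automatically.
    # frame = cv2.imread("suspect_frame.png")
    # face_tone = mean_tone(frame, (420, 80, 120, 150))
    # hand_tone = mean_tone(frame, (210, 400, 90, 90))
    # if np.linalg.norm(face_tone - hand_tone) > 25:   # threshold is illustrative
    #     print("Face and hand tones differ more than lighting alone would explain.")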
Muscle movement – This aspect
of faked videos may be the easiest to spot. Realistic muscle movement in computer-generated characters has been something of a holy grail in Hollywood for decades. Unfortunately for makers of sci-fi and action films, it has rarely
been achieved. While some AIs are already better at it than some big-budget
filmmakers, it is still very, very difficult to precisely match the minuscule
muscle movements of the human face when manipulating a video. This is because of
the numerous micro-expressions any person will tend to go through even when
having a relatively emotionless exchange. Minor tics in the subject’s brows,
widening of the eyes, tiny smirks, nostril flares, a tightening of the
jaw … these are all things that need to be synced to the words a subject is
supposed to be speaking. While researchers like Professor Farid are designing AI
programs capable of spotting such oddities, humans can, with varying degrees of
success, look for them as well.
Summary
Ultimately, we are only at the very beginning of an era in which AI-generated
imagery and deepfake-like video manipulation begin suffusing the world’s media
production. It may be that, in just a few years, we look back at this time and
laugh at how naive we were about the possibilities, or wish for the days when
AI-based manipulation was as crude as it is now. Some people might look to
lawmakers and regulators to ask, "Why aren’t you doing something to stop this?"
But how could they? Should they outlaw special effects companies that are too
good at their jobs? Stifle and curtail tools to make fantasies come to life on
the big screen just so some foreign propagandist would be unable to replicate a public
figure’s image to fulfill their nefarious goals? Even if they choose to attempt
this, some of the most impressive advancements in fakery mentioned in this
report are accomplished by private individuals and complete unknowns
tinkering with increasingly sophisticated AI frameworks available to nearly
everyone. Once again, the genie is already out of its bottle, and there is no stuffing it back in.
How, then, can we protect ourselves from a nightmare
scenario in which no form of media can ever be trusted again? Of course, prevention and detection will play a major role in any counteroffensive against deepfake and its ilk.
Researchers and security experts are only now getting their feet wet in this
arena and will have to become veteran soldiers in an ongoing war against AIs that
aim to deceive, honing their tools and techniques while malicious parties
continue to refine their own art.
However, none of this will matter if we allow
ourselves to stop paying attention. Most attempts at media fraud,
AI-based or otherwise, succeed because the victim was simply not paying
close enough attention to whatever was duping them. A perfect example would be the proliferation of fake news spread by social media users who like to "just read the headlines." Nearly everyone has
said something along the lines of "I read on the Internet that…" based
solely on having seen a headline without ever looking into the article
it topped. Although this is typically a benign form of laziness, it can
have dire consequences when the content of that article would have
revealed the headline as complete hogwash. It is in this way that lies are spread: by people who choose not to take the extra second to think about what their eyes are seeing, to ask themselves whether it really makes sense, and to weigh each new "fact" they learn against the facts they already know.
Imagine if the oft-referenced video of Barack Obama did not include the reveal of Jordan Peele as its producer. Some would be enraged at the insult Obama issued to the sitting president. But, in the entirety of his public life, did Barack Obama ever use such language? Would it make any sense for a former President who has been extremely private since leaving office to suddenly make a video in which he made such an inflammatory statement? Maybe the viewer would think,
"It might be a good idea to examine this video a
bit more closely." Maybe that examination would lead them to say, "Hey, come to think
of it … his voice sounds kind of strange…." The point of this extended "what
if" scenario is to illustrate the importance of always remaining vigilant. We
are entering a period in human history where an ever-growing number of
malefactors stand to benefit from lying to us, duping us, and making us
complicit in their efforts to manipulate and deceive. It is, therefore, up to us
to do everything we can to guard against being unwilling tools in the hands of
these shadowy forces by always keeping a keen eye and a sharp mind bent towards
looking for the truth in everything we see, and never just taking anything for
granted.
References
1 Bonifacic, I. "The New Anthony Bourdain Documentary 'Roadrunner' Leans Partly on Deepfaked Audio." Engadget. July 2021.
2 O'Sullivan, Donie. "DNC Warns 2020 Campaigns Not to Use FaceApp 'Developed by Russians.'" CNN. July 2019.
3 McGoogan, Cara. "FaceApp Deletes New Black, White, and Asian Filters after Racism Storm." The Telegraph. August 2017.
4 O'Sullivan, Donie. "Schumer Calls for Feds to Investigate FaceApp." CNN. July 2019.
5 Ibid.
6 "This Person Does Not Exist." ThisPersonDoesNotExist.com.
Retrieved July 2019.
7 Paez, Danny. "This Person Does Not Exist Is the Best One-Off Website
of 2019." Inverse. February 2019.
8 "This Person Does Not Exist." BoingBoing. February 2019.
9 Koerber, Brian. "Henry Cavill’s Mustache Was
Digitally Removed in ‘Justice League’ and It’s Laughably Bad." Mashable.
November 2017.
10 Schwartz, Oscar. "You Thought Fake News Was Bad? Deep Fakes Are Where Truth Goes to Die." The Guardian. November 2018.
11 Romano, Aja. "Jordan Peele's Simulated Obama PSA Is a Double-Edged Warning Against Fake News." Vox. April 2018.
12 Harwell, Drew. "Faked Pelosi Videos, Slowed to Make Her Appear Drunk, Spread Across Social Media." The Washington Post. May 2019.
13 Novak, Matt. "Bull**** Viral Videos of Nancy Pelosi Show Fake Content
Doesn’t Have to Be a Deepfake." Gizmodo.
May 2019.
14 Ibid.
15 "WaveNet: A Generative Model for Raw Audio."
Deepmind.com.
Retrieved July 2019
16 "Adobe Voco ‘Photoshop-for-Voice’ Causes Concern." BBC News.
November 2016.
17 Li, Yuezun, et al. "Hiding Faces in Plain Sight: Disrupting AI Face Synthesis with Adversarial Perturbations." arXiv.org. June 2019.
18 Metz, Rachel. "The Fight to Stay Ahead of Deepfake Videos Before the 2020 US Election." CNN Business. June 2019.
19 Ibid.
20 Vincent, James. "Adobe’s Prototype AI Tool Automatically Spots Photoshopped
Faces." The Verge. June 2019.
About the Author
Michael Gariffo is an editor for Faulkner Information Services. He
tracks and writes about enterprise software, the Web, and the IT services
sector, as well as telecommunications and data networking.