One look at the Nicolas Cage deepfakes or Jordan Peele’s deepfake PSA makes it clear that we’re dealing with strange new technology. These examples, while relatively harmless, raise questions about the future. Can we trust video and audio? Can we hold people accountable for their onscreen actions? Are we ready for deepfakes?
Deepfakes Are New, Easy to Make, and Growing Fast
Deepfake technology is only a few years old, but it’s already exploded into something that’s both captivating and unsettling. The term “deepfake,” which was coined on a Reddit thread in 2017, is used to describe the recreation of a human’s appearance or voice through artificial intelligence. Surprisingly, just about anyone can create a deepfake with a crappy PC, some software, and a few hours of work.
As with any new technology, there’s some confusion surrounding deepfakes. The “drunk Pelosi” video is an excellent example of this confusion. Deepfakes are constructed by AI, and they’re made to impersonate people. The “drunk Pelosi” video, which has been referred to as a deepfake, is actually just a video of Nancy Pelosi that’s been slowed down and pitch-corrected to add a slurred-speech effect.
This is also what makes deepfakery different from, say, the CGI Carrie Fisher in Rogue One: A Star Wars Story. While Disney spent oodles of money studying Carrie Fisher’s face and recreating it by hand, a nerd with some deepfake software can do the same job for free in a single day. AI makes the job incredibly simple, cheap, and convincing.
How to Make a Deepfake
Like a student in a classroom, AI has to “learn” how to perform its intended task. It does this through a process of brute-force trial and error, usually referred to as machine learning or deep learning. An AI that’s designed to complete the first level of Super Mario Bros., for example, will play the game over and over again until it figures out the best way to win. The person designing the AI needs to provide some data to get things started, along with a few “rules” for when things go wrong along the way. Aside from that, the AI does all of the work.
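To make that trial-and-error loop concrete, here’s a toy sketch in Python. No real deepfake tool learns numbers like this (it’s purely our own illustration), but the keep-whatever-works loop is the same basic idea:

```python
import random

# Toy illustration of trial-and-error learning: the "AI" guesses a number,
# scores its guess, and keeps any tweak that lowers the error. Real deepfake
# training is a far fancier version of this same keep-what-works loop.
TARGET = 42.0   # the "right answer" the learner has to discover
guess = 0.0     # the starting data we provide

for attempt in range(1, 10_001):
    error = abs(TARGET - guess)
    if error < 0.01:
        print(f"Converged on {guess:.2f} after {attempt} attempts")
        break
    # Experiment: nudge the current guess by a random amount.
    tweak = guess + random.uniform(-error, error)
    # The "rule" we supply: lower error is better, so keep improvements.
    if abs(TARGET - tweak) < error:
        guess = tweak
```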
The same goes for deepfake facial recreation. But, of course, recreating faces isn’t the same as beating a video game. If we were to create a deepfake of Nicolas Cage hosting The Wendy Williams Show, here’s what we would need:
- A Destination Video: As of right now, deepfakes work best with clear, clean destination videos. That’s why some of the most convincing deepfakes are of politicians; they tend to stand still at a podium under consistent lighting. So, we just need a video of Wendy sitting still and talking.
- Two Datasets: For mouth and head movements to look accurate, we need a dataset of Wendy Williams’ face and a dataset of Nicolas Cage’s face. If Wendy looks to the right, we need a photo of Nicolas Cage looking to the right. If Wendy opens her mouth, we need a picture of Cage opening his mouth.
After that, we let the AI do its job. It tries to create the deepfake over and over again, learning from its mistakes along the way. Simple, right?
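Under the hood, the open-source tools that popularized deepfakes use an autoencoder design: one shared encoder learns a compact representation of a face (expression, pose, lighting), and a separate decoder per person learns to paint that person’s face back on. Here’s a heavily simplified PyTorch sketch; the layer sizes and training details are our own illustrative choices, not any real tool’s values:

```python
import torch
import torch.nn as nn

# One shared encoder, one decoder per person: the classic face-swap design.
# Sizes here are illustrative; real tools use much deeper networks.

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 256),   # a compact, shared "face code"
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(256, 64 * 16 * 16)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, code):
        return self.net(self.fc(code).view(-1, 64, 16, 16))

encoder = Encoder()
decoder_wendy, decoder_cage = Decoder(), Decoder()

def train_step(faces, decoder, optimizer):
    """Reconstruct a batch of 64x64 face crops and learn from the error."""
    optimizer.zero_grad()
    loss = nn.functional.l1_loss(decoder(encoder(faces)), faces)
    loss.backward()   # the "learning from its mistakes" step
    optimizer.step()
    return loss.item()

# Train decoder_wendy on Wendy crops and decoder_cage on Cage crops; the
# swap itself is then a single line: decoder_cage(encoder(wendy_frame)).
```

Well, a video of Cage’s face on Wendy Williams’ body isn’t going to fool anybody, so how can we go a bit further?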
The most convincing (and potentially harmful) deepfakes are all-out impersonations. The popular Obama deepfake by Jordan Peele is a good example. So let’s do one of these impersonations. Let’s create a deepfake of Mark Zuckerberg declaring his hatred of ants—that sounds convincing, right? Here’s what we’ll need:
- A Destination Video: This could be a video of Zuckerberg himself or an actor who looks similar to Zuckerberg. If our destination video is of an actor, we’ll simply paste Zuckerberg’s face on the actor.
- Photo Data: We need photos of Zuckerberg talking, blinking, and moving his head around. If we’re superimposing his face on an actor, we’ll also need a dataset of the actor’s facial movements.
- The Zuck’s Voice: Our deepfake needs to sound like The Zuck. We can do this by recording an impersonator, or by recreating Zuckerberg’s voice with AI. To recreate his voice, we simply run audio samples of Zuckerberg through an AI like Lyrebird, and then type out what we want him to say.
- A Lip-Sync AI: Since we’re adding the voice of fake Zuckerberg to our video, a lip-sync AI needs to make sure that the deepfake facial movements match what’s being said.
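Strung together, the recipe looks something like the sketch below. Every function here is a stub we invented for illustration; in practice, each step is its own model or service (a face-swap network, a Lyrebird-style voice cloner, a lip-sync model), and none of these names refer to real APIs:

```python
# An illustrative pipeline for the impersonation recipe above. Every function
# body is a placeholder; none of these names refer to real APIs or tools.

def swap_face(video_path: str, face_photos: list) -> str:
    """Step 1: superimpose the target's face onto the destination video."""
    return video_path  # stub: a trained face-swap model would run here

def clone_voice(voice_samples: list, script: str) -> str:
    """Step 2: synthesize the script in the target's cloned voice."""
    return "fake_audio.wav"  # stub: a voice-cloning AI would run here

def sync_lips(video_path: str, audio_path: str) -> str:
    """Step 3: re-animate the mouth so it matches the synthesized audio."""
    return video_path  # stub: a lip-sync model would run here

fake_video = sync_lips(
    swap_face("lookalike_actor.mp4", ["zuck_photos/"]),
    clone_voice(["zuck_interviews/"], "I really, truly hate ants."),
)
```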
We’re not trying to downplay the work and expertise that goes into deepfakery. But when compared to the million-dollar CGI job that brought Audrey Hepburn back from the dead, deepfakes are a walk in the park. And while we haven’t fallen for a political or celebrity deepfake just yet, even the crappiest, most obvious deepfakes have caused real harm.
Deepfakes Have Already Caused Real-World Harm
As of right now, the majority of deepfakes are just Nicolas Cage memes, public service announcements, and creepy celebrity porn. These uses are relatively harmless and easy to identify, but in some cases, deepfakes have been successfully used to spread misinformation and hurt the lives of others.
In India, deepfakes have been employed by Hindu nationalists to discredit female journalists and incite violence against them. In 2018, a journalist named Rana Ayyub fell victim to such a misinformation campaign, which included a deepfake video of her face superimposed on a pornographic video. This led to other forms of online harassment and the threat of physical violence.
Stateside, deepfake technology is often used to create nonconsensual revenge porn. As reported by Vice, many users on the now-banned deepfakes Reddit forum asked how to create deepfakes of ex-girlfriends, crushes, friends, and classmates (yes, child porn). The problem is so widespread that Virginia now outlaws all forms of nonconsensual pornography, including deepfakes.
As deepfakes become more and more convincing, the technology will undoubtedly be used for more dubious purposes. But there’s a chance that we’re overreacting, right? Isn’t this the most natural step after Photoshop?
Deepfakes Are a Natural Extension of Doctored Images
Even at their most basic level, deepfakes are unsettling. We trust video and audio recordings to capture people’s words and actions without any bias or misinformation. But in a way, the threat of deepfakes isn’t new at all. It’s existed since we first started using photography.
Take, for instance, the few photographs that exist of Abraham Lincoln. The majority of these photographs (including the portraits on the penny and the five-dollar bill) were doctored by a photographer named Mathew Brady to improve Lincoln’s spindly appearance (specifically his thin neck). Some of these portraits were edited in a manner that’s reminiscent of deepfakes, with Lincoln’s head superimposed on the bodies of “strong” men like John C. Calhoun (in that case, an etching rather than a photograph).
This sounds like a bizarre bit of publicity, but during the 1860s, photography carried a certain amount of “truth” that we now reserve for video and audio recordings. It was considered to be the polar opposite of art—a science. These photos were doctored to intentionally discredit the newspapers that criticized Lincoln for his weak body. In the end, it worked. Americans were impressed by Lincoln’s figure, and Lincoln himself claimed that Brady’s photos “made me president.”
The connection between deepfakes and 19th-century photo editing is oddly comforting. It offers us the narrative that, while this technology has serious consequences, it isn’t something that’s entirely out of our control. But, sadly, that narrative may not hold for very long.
We Won’t Be Able to Spot Deepfakes Forever
We’re used to spotting fake images and videos with our eyes. It’s easy to look at a Joseph Goebbels family portrait and say, “there’s something strange about that guy in the back.” A glance at North Korean propaganda photos makes it evident that, without YouTube tutorials, people suck at Photoshop. And as impressive as deepfakes are, it’s still possible to spot one on sight alone.
But we won’t be able to spot deepfakes for much longer. Every year, deepfakes become more convincing and even easier to create. You can make a deepfake with a single photo, and you can use AI like Lyrebird to clone voices in under a minute. High-tech deepfakes that merge fake video and audio are incredibly convincing, even when they’re made to imitate recognizable figures like Mark Zuckerberg.
In the future, we may use AI, algorithms, and blockchain technology to fight against deepfakes. Theoretically, AI could scan videos to look for deepfake “fingerprints,” and blockchain tech installed across operating systems could flag users or files that have touched deepfake software.
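As a sketch of what the detection half might look like: sample frames from a suspect video and score each one with a trained artifact classifier. The classifier below is just a stub we invented; real detectors are trained on large sets of known deepfakes to spot blending seams, unnatural blinking, and other “fingerprints”:

```python
import cv2  # OpenCV, used here only to pull frames out of a video file

def artifact_score(frame) -> float:
    """Stub detector: return a fakeness score between 0 and 1.
    A real system would run the frame through a trained model."""
    return 0.0  # placeholder

def scan_video(path: str, sample_every: int = 30) -> float:
    """Average the artifact score over roughly one frame per second."""
    capture = cv2.VideoCapture(path)
    scores, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % sample_every == 0:
            scores.append(artifact_score(frame))
        index += 1
    capture.release()
    return sum(scores) / len(scores) if scores else 0.0

# A score near 1.0 across many sampled frames would flag likely tampering.
```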
If these anti-deepfake methods sound stupid to you, then join the club. Even AI researchers are doubtful that there’s a true solution to deepfakes. As detection software gets better, so will deepfakes. Eventually, we’ll reach a point where deepfakes will be impossible to detect, and we’ll have a lot more to worry about than fake celebrity porn and Nicolas Cage videos.