Deepfakes began to garner attention in 2017, and a viral public service announcement in which Jordan Peele supplied the voice for a fabricated Barack Obama showed just how convincing the technique could be. In the years since, the disturbing dangers posed by deepfakes have come to light, affecting people around the globe.
What Are Deepfakes and Voice Cloning?
A deepfake is a piece of synthetic media, produced with artificial intelligence, in which a person's face in an existing image or video is replaced with someone else's likeness. Deepfakes can be as harmless as someone humorously impersonating a friend, but recent trends show them being used in political spheres, where they can spread misinformation.
Voice cloning is a subset of deepfake technology that focuses on replicating someone’s voice. It has been used more and more often, and various companies have introduced it as a fun novelty tool. Adobe’s VoCo prototype is at the forefront: after listening to about twenty minutes of someone’s speech, the technology can replicate that person’s voice. Adobe is also researching a watermark detection system to prevent forgery.
What are the Potential Consequences?
Though we have not yet seen deepfakes used to spread misinformation on a global scale, they are already being used against ordinary people, and voice cloning is the most common form. Companies such as Adobe and Respeecher have been developing beta technology that can replicate voices. In practice, this technology could be used to alter one’s voice in order to impersonate public figures.
More recently, voice cloning was used in a documentary about the late Anthony Bourdain, where it supplied narration in his voice. His consent was never clearly established, and the project began only after he had passed. Many people were quick to point out that he never actually said the lines and that “it was ghoulish.” This raised a new question: is it ethical to clone the voice of someone who has died or who cannot give consent? In Bourdain’s case, many on social media decided it was unethical, while many close to Bourdain raised no complaints about the use of his voice.
Deepfakes have also been used in other unethical ways. One example is the case of Noelle Martin, an Australian activist whose face was deepfaked into an adult film when she was 17. Her social media accounts were used as source material to digitally steal her face and insert it into these adult photos and videos. She contacted various government agencies and the companies themselves, but nothing worked. The person responsible was anonymous, which made tracking them down virtually impossible, and so nothing came of it.
What is being done?
Researchers at various institutions are using different methods to identify deepfakes. At the University at Albany, Professor Siwei Lyu has worked on detecting deepfakes by looking for “resolution inconsistencies,” which occur when a swapped face does not match the resolution of its surroundings in a photo or video. These inconsistencies give researchers a signal on which to build detection methods.
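To make the idea concrete, here is a minimal sketch of a resolution-inconsistency check, not Lyu’s actual method: it assumes OpenCV, a hypothetical input file suspect_frame.jpg, and uses the variance of the Laplacian as a rough proxy for sharpness, comparing the detected face region against the rest of the frame.

```python
# Illustrative sketch only: flag frames where the face region's sharpness
# differs markedly from the full frame, a hint that a lower-resolution face
# may have been pasted in.
import cv2

def sharpness(gray_region):
    # Variance of the Laplacian is a common rough proxy for local sharpness.
    return cv2.Laplacian(gray_region, cv2.CV_64F).var()

def resolution_mismatch_score(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # no face found, nothing to compare
    x, y, w, h = faces[0]
    face = gray[y:y + h, x:x + w]
    # A large gap between face sharpness and frame sharpness suggests a swap.
    return abs(sharpness(face) - sharpness(gray)) / (sharpness(gray) + 1e-8)

score = resolution_mismatch_score(cv2.imread("suspect_frame.jpg"))
print("mismatch score:", score)  # higher values hint at a possible face swap
```

Real detectors are far more sophisticated, but the underlying intuition is the same: the pasted face and its surroundings were not captured by the same camera, so their statistics disagree.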
At Purdue University, Professor Edward Delp and former research assistant David Güera are using convolutional neural networks to detect deepfakes in videos. Their network looks for “frame-level inconsistencies,” the artifacts introduced when deepfake technology pastes one person’s face onto another, frame by frame. They train the neural network on sets of deepfaked videos so that it learns to recognize these artifacts, with the goal of identifying deepfaked videos created by popular tools.
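The sketch below illustrates the general idea of frame-level detection with a CNN; it is an assumed toy architecture written in PyTorch, not the Purdue model. A small convolutional network scores each frame as real or fake, and the per-frame scores are averaged into a video-level score.

```python
# Toy frame-level deepfake scorer: a CNN produces one logit per frame, and the
# average sigmoid over frames gives a video-level "fake" probability.
import torch
import torch.nn as nn

class FrameCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, 1)  # one logit per frame: fake vs. real

    def forward(self, frames):              # frames: (num_frames, 3, H, W)
        feats = self.features(frames).flatten(1)
        return self.classifier(feats)        # per-frame logits

model = FrameCNN()
video = torch.randn(8, 3, 224, 224)          # 8 dummy frames standing in for a clip
frame_logits = model(video)
video_score = torch.sigmoid(frame_logits).mean()  # averaged per-frame probability
print(float(video_score))
```

In practice such a network would be trained on labeled real and deepfaked videos so that the per-frame artifacts become the features it learns to pick up.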
Researchers at UC Riverside and Santa Barbara use two different methods, CNNs and LSTM-based networks, to see how well each identifies deepfaked media. A convolutional neural network (CNN) is, at its most basic, a deep learning algorithm trained to distinguish images from one another using specific visual features; for deepfake identification, it can be used to find the inconsistencies mentioned above. LSTM-based networks are part of the process because, according to the UC Riverside and Santa Barbara research, they can help with classification and localization of the processed media, which helps organize a large database of media and surface results more easily.
They tested the methods on how well each identifies the inconsistencies present in these videos and concluded that both the convolutional neural network (CNN) and the LSTM-based network are effective at identifying deepfakes. Looking to the future, they would like to see the two methods combined.
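A minimal sketch of what such a combination could look like is shown below: an assumed architecture (not the UC Riverside / Santa Barbara model) in which a CNN embeds each frame, an LSTM reads the sequence of embeddings, and the final hidden state is classified as real or fake.

```python
# Illustrative CNN + LSTM detector: the CNN handles per-frame appearance, the
# LSTM handles how those frames evolve over time, and a linear head decides.
import torch
import torch.nn as nn

class CnnLstmDetector(nn.Module):
    def __init__(self, feat_dim=64, hidden_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # one feat_dim vector per frame
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)          # one logit: fake vs. real

    def forward(self, clips):                         # clips: (batch, time, 3, H, W)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (hidden, _) = self.lstm(feats)             # temporal modeling across frames
        return self.head(hidden[-1])                  # video-level logit

detector = CnnLstmDetector()
clip = torch.randn(2, 16, 3, 112, 112)                # 2 dummy clips of 16 frames each
print(torch.sigmoid(detector(clip)))                  # per-clip fake probability
```

The appeal of the combination is that spatial artifacts (what the CNN sees in a single frame) and temporal artifacts (how a face flickers or drifts between frames) are complementary signals.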
Beyond research, public advocacy and legislation are another way to help stop deepfakes. The Malicious Deep Fake Prohibition Act of 2018 would have established a new criminal offense for the distribution of fake online media that appears realistic. Though it was not passed, it would have been a step in the right direction and would have helped many who have been wrongfully affected by these technologies.