New Research Provides Solution to Combat Deepfake Audio


As deepfakes and manipulated audio clips become a growing threat online, a team of researchers has developed methods to determine the authenticity of audio recordings. The team, consisting of Romit Barua, Gautham Koorma, and Sarah Barrington, presented their research on voice cloning as their final project for the Master of Information Management and Systems degree program at the UC Berkeley School of Information. Under the guidance of Professor Hany Farid, the team explored several techniques for differentiating real voices from cloned voices used to impersonate individuals.

Initially, the team analyzed audio samples of real and fake voices using perceptual features, that is, patterns that can be identified visually. By examining the audio waveforms, they observed that real human voices tend to contain more pauses and more variation in volume throughout a clip, because speakers use filler words and may move away from the microphone while recording. This analysis led the team to identify pauses and variations in amplitude as key indicators of whether a voice is authentic. However, this method proved to be less accurate than desired.
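As a rough illustration of what pause and amplitude features could look like in code, the following is a minimal sketch, not the team's implementation. The function names, the 20 ms frame length, and the silence threshold of 0.02 are all assumptions chosen for the example, and the input is assumed to be a list of samples scaled to the range -1.0 to 1.0.

```python
import math
import statistics


def frame_rms(samples, frame_len):
    """Root-mean-square amplitude of each non-overlapping frame."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [math.sqrt(sum(x * x for x in frame) / frame_len) for frame in frames]


def perceptual_features(samples, sample_rate, frame_ms=20, silence_threshold=0.02):
    """Summarize pauses and loudness variation in a clip.

    frame_ms and silence_threshold are illustrative defaults, not values
    taken from the research described in the article.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    rms = frame_rms(samples, frame_len)
    if not rms:
        raise ValueError("clip is shorter than one frame")

    # A frame counts as silent when its RMS falls below the threshold.
    silent = [value < silence_threshold for value in rms]

    # A pause is a run of consecutive silent frames; count where runs start.
    pause_count = sum(
        1 for i, is_silent in enumerate(silent)
        if is_silent and (i == 0 or not silent[i - 1])
    )

    return {
        "pause_ratio": sum(silent) / len(rms),      # share of silent frames
        "pause_count": pause_count,                 # number of distinct pauses
        "amplitude_std": statistics.pstdev(rms),    # variation in loudness
    }
```

Under the observation reported above, a real recording would be expected to show a higher pause ratio and a larger amplitude spread than a cloned one, although a fixed threshold like this is sensitive to background noise and recording level.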

Taking a more detailed approach, the team used an off-the-shelf audio wave analysis package to examine general spectral features. The program extracted over 6,000 features, including summary statistics and regression coefficients, which the team then narrowed down to the 20 most significant ones. By comparing these extracted features across other audio clips, the team refined their method to increase accuracy.
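The article does not say how the 20 most significant features were chosen. One simple possibility, shown here purely as an assumption, is to score each feature by how far apart its average value is for real and fake clips relative to its spread, and keep the highest-scoring ones. The function names and the scoring rule below are hypothetical.

```python
import math
import statistics


def separation_score(real_values, fake_values):
    """Gap between class means, scaled by the pooled standard deviation."""
    mean_gap = abs(statistics.fmean(real_values) - statistics.fmean(fake_values))
    pooled_std = math.sqrt(
        (statistics.pvariance(real_values) + statistics.pvariance(fake_values)) / 2
    )
    if pooled_std == 0:
        # No spread at all: the feature separates perfectly or not at all.
        return float("inf") if mean_gap > 0 else 0.0
    return mean_gap / pooled_std


def top_k_features(real_rows, fake_rows, k=20):
    """Return the column indices of the k best-separating features.

    real_rows and fake_rows are lists of equal-length feature vectors,
    one vector per audio clip.
    """
    n_features = len(real_rows[0])
    scored = []
    for j in range(n_features):
        score = separation_score(
            [row[j] for row in real_rows],
            [row[j] for row in fake_rows],
        )
        scored.append((score, j))

    # Highest score first; ties broken by the lower column index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [j for _, j in scored[:k]]
```

Ranking features one at a time like this ignores interactions between features, so it is best read as a first-pass filter rather than a complete selection procedure.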

The team achieved their most accurate results by training a deep-learning model on learned features. The raw audio was fed into the model, which processed it and extracted multi-dimensional representations called embeddings. The model then used these embeddings to differentiate between real and synthetic audio. This method consistently outperformed the previous techniques, with error rates as low as 0% in lab settings. However, the researchers noted that this method can be difficult to interpret without proper context.
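To make the idea of classifying from embeddings more concrete, here is a small sketch of the final decision step only. It assumes the embeddings have already been produced by some model and are available as plain lists of numbers. The nearest-centroid rule and the class name CentroidDetector are stand-ins invented for this example; the article does not describe the architecture or classifier the team actually used.

```python
import math


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CentroidDetector:
    """Hypothetical detector: label a clip by its closest class centroid."""

    def __init__(self, real_embeddings, fake_embeddings):
        self.real_centroid = centroid(real_embeddings)
        self.fake_centroid = centroid(fake_embeddings)

    def predict(self, embedding):
        real_sim = cosine_similarity(embedding, self.real_centroid)
        fake_sim = cosine_similarity(embedding, self.fake_centroid)
        return "real" if real_sim >= fake_sim else "synthetic"
```

A trained deep-learning model would learn both the embedding and the decision boundary from data, so this sketch only conveys the general shape of the last step: clips whose embeddings sit closer to known real examples are labeled real, and the rest are labeled synthetic.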

The research team believes that their findings can help address the growing concerns surrounding voice cloning and deepfake audio, which are increasingly used for nefarious purposes. Voice cloning, in particular, has already been used in practice to bypass biometric verification and to solicit money from unsuspecting family members. By providing a reliable method to detect deepfake audio, the team hopes to mitigate the potential harm caused by this technology.

As the threat of manipulated audio continues to evolve rapidly, these researchers have made significant progress in combating deepfake audio. With their findings, individuals and organizations can better verify the authenticity of audio recordings, fostering trust and reducing the potential for misinformation.
