The Race to Hide Your Voice


Your voice reveals more about you than you realize. To the human ear, your voice can instantly give away your mood, for example—it’s easy to tell if you’re excited or upset. But machines can learn a lot more: inferring your age, gender, ethnicity, socio-economic status, health conditions, and beyond. Researchers have even been able to generate images of faces based on the information contained in individuals’ voice data.

As machines become better at understanding you through your voice, companies are cashing in. Voice recognition systems, from Siri and Alexa to those that use your voice as your password, have proliferated in recent years as artificial intelligence and machine learning have unlocked the ability to understand not just what you are saying but who you are. Big Voice may be a $20 billion industry within a few years. And as the market grows, privacy-focused researchers are increasingly searching for ways to protect people from having their voice data used against them.

Vocal Threats

Both the words you say and how you say them can be used to identify you, says Emmanuel Vincent, a senior research scientist specializing in voice technologies at France’s National Institute for Research in Digital Science and Technology (Inria), but this is only the beginning. “You will also find other pieces of information about your emotions or your medical condition,” Vincent says.

“These additional pieces of information help build a more complete profile—then this would be used for all sorts of targeted advertisements,” Vincent says. As well as your voice data potentially feeding into the vast pool of data used to target you with online ads, there’s also the risk that hackers could break into wherever your voice data is stored and use it to impersonate you. A small number of such voice-cloning incidents have already happened, showing how valuable your voice can be. Simpler robocall scams have also recorded people saying “yes” and then used the recording as a fake confirmation in payment scams.

Last year, TikTok changed its privacy policies and started collecting the voiceprints—a loose term for the data your voice contains—of people in the US alongside other biometric data, such as your faceprint. More broadly, call centers are using AI to analyze people’s “behavior and emotion” during phone calls and evaluate the “tone, pace, and pitch of every single word” to develop profiles of people and increase sales. “We’re almost in a situation where the systems to recognize who you are and link everything together exist, but the protection is not there—and it’s still quite far away from being readily usable,” says Henry Turner, who researched the security of voice systems at the University of Oxford.


Hidden Meaning

Your voice is produced through a complex process involving your lungs, voice box, throat, nose, mouth, and sinuses. More than a hundred muscles are activated when you speak, says Rébecca Kleinberger, a voice researcher at the MIT Media Lab. “It’s also very much the brain,” Kleinberger says.

Researchers are experimenting with four ways to enhance privacy for your voice, says Natalia Tomashenko, a researcher at Avignon University, France, who has been studying voice and is the first author of a research paper on the results of a voice privacy engineering challenge. None of the methods are perfect, but they are being explored as possible ways to boost privacy in the infrastructure processing your voice data.

First is obfuscation, which tries to completely hide who the speaker is. Think of a Hollywood hacker heavily distorting their voice over a phone call as they lay out a devilish plot or ransom demand (or the promotional videos of hacktivist collective Anonymous). Simple voice-changing hardware allows anyone to quickly change the sound of their voice. More advanced speech-to-text-to-speech systems transcribe what you’re saying and then reverse the process, saying it in a new voice.
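The crudest kind of voice changing works like playing a tape faster or slower. The sketch below assumes audio is a plain list of float samples and resamples it with linear interpolation, so raising the pitch also shortens the clip. The function names are hypothetical, and this is not how any particular product works; real voice changers typically use methods such as PSOLA or phase vocoders that shift pitch without changing duration.

```python
import math
from typing import List

def naive_pitch_shift(samples: List[float], factor: float) -> List[float]:
    """Resample with linear interpolation, like playing a tape faster or slower.

    factor > 1 raises the pitch and shortens the clip; factor < 1 lowers
    the pitch and lengthens it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor           # fractional read position in the input
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1.0 - frac) + nxt * frac)
    return out

def estimate_freq(samples: List[float], sample_rate: int) -> float:
    """Rough pitch estimate: count upward zero crossings per second."""
    ups = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return ups * sample_rate / len(samples)

if __name__ == "__main__":
    rate = 8000
    # One second of a 200 Hz tone standing in for a voice recording.
    tone = [math.sin(2 * math.pi * 200 * n / rate + 0.3) for n in range(rate)]
    shifted = naive_pitch_shift(tone, 1.5)
    print(round(estimate_freq(tone, rate)), round(estimate_freq(shifted, rate)))
```

With a factor of 1.5 the 200 Hz tone comes out at roughly 300 Hz, but the clip is also a third shorter, which is why this trick sounds like a chipmunk rather than a different person.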

Second, Tomashenko says, researchers are looking at distributed and federated learning, where your data doesn’t leave your device but machine learning models still learn to recognize speech by sending what they have learned, rather than your recordings, to a bigger system. A third approach involves building encrypted infrastructure to protect people’s voices from snooping. However, most efforts are focused on the fourth: voice anonymization.
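To make the federated idea concrete, here is a toy version of federated averaging (FedAvg), a widely used federated learning scheme, with a one-parameter linear model standing in for a speech recognizer. The device datasets, learning rate, and function names are hypothetical; the point is only that each device trains on its own data and sends back a model weight and a sample count, never the data itself.

```python
from typing import List, Tuple

# Each device's private (feature, label) pairs; these never leave the device.
Dataset = List[Tuple[float, float]]

def local_update(w: float, data: Dataset, lr: float = 0.1, epochs: int = 5) -> float:
    """Train on one device: a few gradient-descent steps on squared error."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global: float, devices: List[Dataset]) -> float:
    """One FedAvg-style round: average local weights, weighted by data size."""
    results = [(local_update(w_global, data), len(data)) for data in devices]
    total = sum(n for _, n in results)
    # The server only ever sees each device's weight and sample count.
    return sum(w * n for w, n in results) / total

if __name__ == "__main__":
    # Two hypothetical phones whose private data both follow y = 3x.
    phone_a = [(0.5, 1.5), (1.0, 3.0)]
    phone_b = [(0.8, 2.4), (0.6, 1.8), (0.9, 2.7)]
    w = 0.0
    for _ in range(20):
        w = federated_round(w, [phone_a, phone_b])
    print(round(w, 3))  # → 3.0
```

The shared model converges to the pattern both phones hold, even though the server never sees a single training example. Real systems exchange millions of parameters and often add protections such as secure aggregation, because model updates can themselves leak information.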

Anonymization attempts to keep your voice sounding human while stripping out as much of the information that could be used to identify you as possible. Speech anonymization efforts currently follow two separate strands: anonymizing the content of what someone is saying, by deleting or replacing any sensitive words in files before they are saved, and anonymizing the voice itself. Most voice anonymization efforts at the moment pass someone’s voice through experimental software that changes some of the parameters in the voice signal to make it sound different. This can involve altering the pitch, replacing segments of speech with information from other voices, and synthesizing the final output.
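In research pipelines of this kind, a recording is typically split into features such as a pitch contour and a "speaker embedding" (a vector summarizing whose voice it is), the identity-bearing features are modified, and a neural model synthesizes new audio. The sketch below covers only the modification step, under loudly stated assumptions: feature extraction and synthesis need trained models and are omitted, the 2-D vectors are toy values rather than real embeddings, and the function names are hypothetical. The "average the k most dissimilar voices from a pool" idea is loosely modeled on baseline systems used in voice-privacy challenges, not a definitive recipe.

```python
import math
from typing import List, Sequence

Vector = List[float]

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for identical directions, 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def pseudo_speaker(source: Vector, pool: List[Vector], k: int) -> Vector:
    """Blend the k pool voices least similar to the source into a new identity."""
    farthest = sorted(pool, key=lambda v: cosine_distance(source, v), reverse=True)[:k]
    return [sum(v[i] for v in farthest) / len(farthest) for i in range(len(source))]

def shift_pitch_contour(f0_hz: List[float], factor: float) -> List[float]:
    """Scale the pitch of voiced frames; 0.0 marks unvoiced frames and is kept."""
    return [f * factor if f > 0 else 0.0 for f in f0_hz]

if __name__ == "__main__":
    # Toy 2-D "speaker embeddings"; real ones have hundreds of dimensions.
    source = [1.0, 0.0]
    pool = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0], [0.1, 0.9]]
    print(pseudo_speaker(source, pool, k=2))               # → [-0.5, 0.5]
    print(shift_pitch_contour([120.0, 0.0, 130.0], 1.25))  # → [150.0, 0.0, 162.5]
```

Averaging several other voices, rather than copying one, produces an identity that belongs to no real person; the modified features would then be handed to the synthesizer to produce the final output.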