Machine learning system translates brain signals directly into speech

Neuroengineers from Columbia University have created a system able to translate thought into recognisable, intelligible speech.

By monitoring a person’s brain activity, the technology can rebuild the words a person hears with unmatched clarity.

This new technology is powered by a mix of speech synthesizers and artificial intelligence and could lead to new ways for computers to communicate directly with the brain.

It could also drastically improve the lives of people who cannot speak, enabling them to communicate with others through speech again.

“Our voices help connect us to our friends, family and the world around us, which is why losing the power of one’s voice due to injury or disease is so devastating,” said Nima Mesgarani, PhD, the paper’s senior author and a principal investigator at Columbia University’s Mortimer B. Zuckerman Mind Brain Behavior Institute.

“With today’s study, we have a potential way to restore that power. We’ve shown that, with the right technology, these people’s thoughts could be decoded and understood by any listener.”

In the past few decades, research has consistently shown that when people speak, or even imagine speaking, tell-tale patterns of activity appear in their brain. Distinct (but recognizable) patterns of signals also emerge when we listen to someone speak, or imagine listening. Experts have therefore long tried to translate those patterns into verbal speech.

Earlier models focused on decoding spectrograms, which are visual representations of sound frequencies. But despite consistent attempts, this approach has ultimately failed to produce anything similar to intelligible speech.
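For readers unfamiliar with the term, the short Python sketch below shows one common way a spectrogram can be computed from a sound signal: the signal is cut into short overlapping frames, and the strength of each frequency is measured in each frame. It is an illustration only, not the researchers' code. The function name `spectrogram`, the frame length, the hop size, the 8 kHz sample rate and the 1 kHz test tone are all assumptions chosen for the example.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames; each frame lists the magnitude per frequency bin.

    Illustrative sketch only: uses a naive discrete Fourier transform,
    which is slow but keeps the example dependency-free.
    """
    # Hann window to soften the edges of every frame
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Keep only the non-negative frequency bins (0 .. frame_len / 2)
        mags = [
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(frame)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(mags)
    return frames


# Hypothetical test signal: a pure 1 kHz tone sampled at 8 kHz
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(1024)]

spec = spectrogram(tone)

# Average over time and find the strongest frequency bin
avg = [sum(frame[k] for frame in spec) / len(spec) for k in range(len(spec[0]))]
peak_bin = max(range(len(avg)), key=avg.__getitem__)
peak_hz = peak_bin * sample_rate / 64  # 64 is the default frame_len
print(f"Strongest frequency: {peak_hz} Hz")
```

Each row of `spec` is one moment in time and each column one frequency band, which is why a spectrogram is usually drawn as an image. Decoding such an image from brain activity is the approach the earlier models took.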

Dr Mesgarani’s team tried a different approach. They used a vocoder, a computer algorithm that can synthesize speech after being trained on recordings of people talking.

“This is the same technology used by Amazon Echo and Apple Siri to give verbal responses to our questions,” said Dr Mesgarani, who is also an associate professor of electrical engineering at Columbia’s Fu Foundation School of Engineering and Applied Science.

In order to teach the vocoder to interpret brain activity, Dr Mesgarani worked with Ashesh Dinesh Mehta, MD, PhD, a neurosurgeon at Northwell Health Physician Partners Neuroscience Institute and co-author of today’s paper. Dr Mehta treats epilepsy patients, some of whom must undergo regular surgeries.

“Working with Dr Mehta, we asked epilepsy patients already undergoing brain surgery to listen to sentences spoken by different people, while we measured patterns of brain activity,” said Dr Mesgarani. “These neural patterns trained the vocoder.”

The researchers then asked those same patients to listen to speakers reciting the digits 0 to 9 while recording brain signals that could then be run through the vocoder. The sound produced by the vocoder in response to those signals was finally analysed and cleaned up by neural networks, a type of artificial intelligence built to emulate the structure of neurons in the biological brain.

“We found that people could understand and repeat the sounds about 75% of the time, which is well above and beyond any previous attempts,” said Dr Mesgarani. The improvement in intelligibility was particularly clear when comparing the new recordings to the earlier, spectrogram-based attempts. “The sensitive vocoder and powerful neural networks represented the sounds the patients had originally listened to with surprising accuracy.”

Dr Mesgarani added that his team plans to test more complicated words and sentences next, and that they want to run the same tests on brain signals emitted when a person speaks or imagines speaking. Ultimately, he said, they hope their system could be part of an implant, similar to those worn by some epilepsy patients, that translates the wearer’s thoughts directly into words.

“In this scenario, if the wearer thinks ‘I need a glass of water,’ our system could take the brain signals generated by that thought, and turn them into synthesized, verbal speech,” said Dr Mesgarani. “This would be a game changer. It would give anyone who has lost their ability to speak, whether through injury or disease, the renewed chance to connect to the world around them.”

