Microsoft’s text-to-speech AI can imitate anyone


Microsoft researchers have announced VALL-E, a text-to-speech AI model that can simulate a real person's voice from just a three-second audio sample. Once it has that sample, it can read any text aloud as if that person were speaking, while preserving the speaker's characteristic intonation. Its creators envision it being used for advanced text-to-speech and speech-editing applications, and in combination with other generative AI models such as GPT-3, which would supply the text.

Microsoft describes VALL-E as a neural codec language model. It builds on EnCodec, a neural audio compression network that Meta announced last year. Unlike other text-to-speech methods, which typically synthesize speech by manipulating waveforms, VALL-E generates discrete audio codec codes from the input text and the audio sample.

VALL-E essentially analyzes how a given person's speech sounds and uses EnCodec to break that information into discrete components, or "tokens," from which the final waveform is created. In addition to imitating the speaker's tone, it can also imitate the "acoustic environment" of the audio sample. For example, if the sample is cut from a phone call, the output reproduces the acoustics and frequency characteristics of a phone call.
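To make the idea of turning audio into discrete tokens more concrete, here is a minimal toy sketch of residual vector quantization, the general family of techniques that neural codecs such as EnCodec build on. It is not EnCodec or VALL-E code: a real codec first passes the audio through a trained neural encoder and uses learned codebooks, whereas this sketch quantizes raw sample frames against random codebooks. All function names, the sample rate, the frame size, and the codebook sizes are assumptions chosen for illustration.

```python
import math
import random


def make_codebook(size, dim, rng, scale):
    """Build a toy codebook of random vectors (a real codec learns these).

    Entry 0 is the zero vector, so adding a stage can never make the
    reconstruction worse: the stage can always "choose nothing".
    """
    book = [[0.0] * dim]
    for _ in range(size - 1):
        book.append([rng.uniform(-scale, scale) for _ in range(dim)])
    return book


def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec."""
    best_index, best_dist = 0, float("inf")
    for i, entry in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(entry, vec))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def encode_frame(codebooks, frame):
    """Quantize one frame: each stage encodes what the previous stages missed."""
    residual = list(frame)
    codes = []
    for book in codebooks:
        idx = nearest(book, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]
    return codes


def decode_frame(codebooks, codes):
    """Rebuild an approximate frame by summing the chosen entries."""
    out = [0.0] * len(codebooks[0][0])
    for book, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, book[idx])]
    return out


def tokenize(samples, frame_size, codebooks):
    """Split a waveform into fixed-size frames and encode each one."""
    last_start = len(samples) - frame_size
    return [
        encode_frame(codebooks, samples[i:i + frame_size])
        for i in range(0, last_start + 1, frame_size)
    ]


# Hypothetical setup: a three-second 220 Hz tone at an assumed 8 kHz sample rate.
SAMPLE_RATE = 8000
FRAME_SIZE = 80
rng = random.Random(0)
codebooks = [make_codebook(16, FRAME_SIZE, rng, 1.0 / 2 ** s) for s in range(4)]
tone = [math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(3 * SAMPLE_RATE)]
tokens = tokenize(tone, FRAME_SIZE, codebooks)
print(len(tokens), "frames,", len(tokens[0]), "codes per frame")
```

In this toy version the three-second clip becomes a grid of small integers, one row per frame and one column per codebook. A language model in the style the article describes would then work on such integer sequences rather than on the raw waveform.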

The Microsoft researchers trained the model on an audio library provided by Meta, which contains more than 60,000 hours of English speech from more than 7,000 speakers. For VALL-E to generate high-quality, realistic output, the voice in the audio sample must closely match one of the voices in the training data, so the team plans to expand the dataset with additional recordings in the future.


Because of the potential for abuse, Microsoft is not making VALL-E or its code available for others to test at this time. According to its announcement, the company will follow its own guidelines for AI-related development going forward, and a separate model is being prepared to determine whether an audio segment was generated with VALL-E. On the project's GitHub page you can listen to how the algorithm sounds: it is not perfect yet, and some samples sound machine-generated, but some of the results are unnervingly realistic.

Copyright © 2024 Campus Lately.