C-3PO remains the ultimate on-screen language translator. Appropriately, leading-edge language translation tech is starting to sound more like science fiction these days. Hopefully, the phrase “We’re doomed” is not the first thing that pops into your head when you ponder this fact. To get you feeling better about AI, we examine something very cool: Google’s Translatotron, a language translator that not only lets you speak in a different language but also uses your own voice to do it. Read on to see how we are going beyond Google translation as you know it.
Google is known for many things. It provides the world with a variety of products and services. A popular one is Google Translate, which translates text from one language to another. Its accuracy and speed have made it one of the most widely used translation tools. Now Google wants to convert your live speech into another language while keeping your own voice.
Speech-to-speech translation systems have been developed over the past several decades to help people who speak different languages communicate with each other. Such systems have usually been broken into three separate components: automatic speech recognition to transcribe the source speech as text, machine translation to translate the transcribed text into the target language, and text-to-speech synthesis (TTS) to generate speech in the target language from the translated text. Dividing the task into such a cascade of systems has been very successful. This cascaded approach powers many commercial speech-to-speech translation products, including Google Translate.
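To make the three-stage cascade concrete, here is a minimal Python sketch. The function names (`recognize_speech`, `translate_text`, `synthesize_speech`, `cascade_translate`) and the toy lookup tables are hypothetical stand-ins for illustration, not Google's API; a real system would use a trained model at each stage.

```python
# Hypothetical toy data standing in for trained models at each stage.
TOY_TRANSCRIPTS = {b"audio:hola": "hola"}
TOY_DICTIONARY = {"hola": "hello"}


def recognize_speech(audio: bytes) -> str:
    """Stage 1 (speech recognition): transcribe source audio as text."""
    return TOY_TRANSCRIPTS[audio]


def translate_text(text: str) -> str:
    """Stage 2 (machine translation): translate text into the target language."""
    return TOY_DICTIONARY[text]


def synthesize_speech(text: str) -> bytes:
    """Stage 3 (text-to-speech): generate target-language audio from text."""
    return b"audio:" + text.encode("utf-8")


def cascade_translate(audio: bytes) -> bytes:
    """Run the cascade: each stage consumes the previous stage's output."""
    source_text = recognize_speech(audio)
    target_text = translate_text(source_text)
    return synthesize_speech(target_text)
```

In this sketch the translation stage only ever sees text, so anything the transcript does not capture, such as the sound of the speaker's voice, is gone before the final stage runs.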
In “Direct speech-to-speech translation with a sequence-to-sequence model,” Google researchers proposed an experimental new system. This is the first “twist” to grasp when considering the leaps Google is making in language translation: its AI group has laid the foundation for going directly from speech audio in one language to speech audio in another. It is a breakthrough translation model that works without relying on an intermediate text representation.
Translatotron, as the system is called, avoids dividing the task into separate stages, which gives it a few advantages over cascaded systems. These include faster inference speed, naturally avoiding compounding errors between recognition and translation, making it straightforward to retain the voice of the original speaker after translation, and better handling of words that do not need translating, such as names and proper nouns.
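The compounding-error point can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that each stage of a cascade succeeds independently with some probability; the function name `cascade_success_rate` is hypothetical and the rates are invented, not measured results from any system.

```python
from math import prod


def cascade_success_rate(stage_success_rates: list[float]) -> float:
    """Probability that every stage succeeds, assuming independent stages."""
    return prod(stage_success_rates)


# Invented illustrative rates for recognition, translation, and synthesis.
rates = [0.9, 0.9, 0.9]
overall = cascade_success_rate(rates)
print(f"{overall:.3f}")  # prints 0.729
```

Under these assumptions, three stages that are each right 90% of the time are all right together only about 73% of the time. A single end-to-end model has no handoff between stages where errors can accumulate this way, though that alone does not guarantee better overall quality.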
IMITATING YOUR VOICE
By incorporating a speaker encoder network, Translatotron is also able to retain the original speaker’s vocal characteristics in the translated speech. This makes the translated speech sound more natural and less jarring. The feature builds on previous Google research on speaker verification. Using neural networks, the system delivers translated speech in the target language in the same voice as the original recording. In other words, the user simply speaks, and the system translates their speech into the desired language while maintaining intonation, timbre, and pitch.
THE WOW EFFECT
Google is going beyond language translation as you know it. It has devised a mechanism that performs voice-to-voice translation automatically, with promising early results. During training, the model uses a multitask objective, predicting transcripts of the source and target speech while generating spectrograms of the translated speech. The system still needs polishing, but it aims to reproduce a person’s unique expression. The wow factor is already there, and it will only grow once Google can consistently preserve the original voice across a variety of commercial channels.