Text-to-speech tech isn’t exactly new, but for the most part, the current iteration of the technology isn’t very realistic. This means that if you’re hoping for text-to-speech to simulate actual human conversation, that’s still a pretty tall order.
However, quite a lot of effort has gone into that front, and more recently Microsoft has tried its hand at it, with the main difference being that Microsoft’s model might actually require less training data. The AI was developed together with Chinese researchers, who were able to create realistic-sounding speech based on just 200 voice samples.
This was accomplished by relying on Transformers, which for those unfamiliar are a type of deep neural network, loosely inspired by the neurons in our brain, that uses an attention mechanism to work out which parts of its input matter most. Using Transformers helps the model process information more efficiently. Based on the results so far, the new model has scored 99.84% in terms of word intelligibility, although it has been reported that it still sounds a tad robotic, which you can hear for yourself via the samples posted on GitHub.
That robotic edge aside, we have to admit that the samples sound pretty damn realistic, much like how Google’s Duplex AI technology is just as convincing. While such tech advancements are welcome, we also have to worry about AI sounding (and looking) too realistic, given how it might be abused to spread misinformation.