Apple has been working to make Siri sound less like a robot and more like a human. As it gears up to launch iOS 11 to hundreds of millions of users across the globe, the company has released a white paper detailing how it used deep learning to make the assistant’s voice more natural. The paper even includes voice samples that let you hear the difference.
Several hours of high-quality audio were captured so that it could be sliced up to create voice responses. Developers also had to work hard to get the prosody right, that is, the pattern of stress and intonation in spoken language.
The challenge isn’t just working this out. It’s also making it run on a mobile device, because this level of processing can put a lot of stress on a device’s processor and hamper its performance.
Apple relied on machine learning to get around this. With enough training data, the system learned how to select segments of audio that pair well with one another, producing responses that sound more natural.
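To give a rough sense of what selecting segments that pair well can mean, here is a minimal sketch of a generic unit selection search. It is not taken from Apple’s paper and is not Apple’s implementation. The function name `select_units`, the `target_costs` table and the `join_cost` callback are all hypothetical, and the costs are supplied by hand, whereas in a learned system they would come from a trained model.

```python
from typing import Callable, List, Sequence


def select_units(
    target_costs: Sequence[Sequence[float]],
    join_cost: Callable[[int, int, int], float],
) -> List[int]:
    """Pick one candidate segment per position with the lowest total cost.

    target_costs[t][j]: how poorly candidate j fits position t (hypothetical).
    join_cost(t, i, j): how poorly candidate i at position t - 1 joins
    candidate j at position t (hypothetical).
    Returns the chosen candidate index for each position.
    """
    if not target_costs:
        return []

    # best[j] = cheapest total cost of any path ending in candidate j
    best = list(target_costs[0])
    back = []  # back[t - 1][j] = best predecessor of candidate j at position t

    for t in range(1, len(target_costs)):
        new_best, pointers = [], []
        for j, fit in enumerate(target_costs[t]):
            # Find the predecessor that is cheapest once the join is included.
            i_min = min(range(len(best)),
                        key=lambda i: best[i] + join_cost(t, i, j))
            new_best.append(best[i_min] + join_cost(t, i_min, j) + fit)
            pointers.append(i_min)
        back.append(pointers)
        best = new_best

    # Trace the cheapest path backwards from the last position.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path


if __name__ == "__main__":
    # Toy data: two candidates per position; switching candidates costs 1.5.
    costs = [[1.0, 2.0], [2.0, 0.5], [1.0, 1.0]]
    print(select_units(costs, lambda t, i, j: 0.0 if i == j else 1.5))
```

In the toy data the first candidate looks best at the first position, but the search still prefers the second candidate throughout, because avoiding a rough join is worth more than the small gain in fit. That trade-off between fitting each position and joining smoothly is the point of the sketch.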
Siri will sound much better with iOS 11. Apple worked with a new female voice actor to record more than 20 hours of speech in US English. The recordings yielded between 1 and 2 million audio segments that were used to train the deep learning system. Apple mentions in the paper that test subjects preferred the new Siri voice to the previous one.
Check out Apple’s white paper on the matter to hear just how different Siri sounds on iOS 11.