Amazon has announced that its Alexa digital assistant can now imitate anyone’s voice from a few short clips totaling about one minute of recording.
If it works as advertised, it would be a technical milestone: computer-generated voices rarely fool human listeners and remain a poor choice for tasks such as video voice-overs. You can watch the live demo in the video below (timecode 1:02:38):
In practice, I found that it shifted the voice’s timbre and tone away from the typical machine-generated assistant and toward the target speaker’s voiceprint. It is hard to judge how well the demonstration worked without knowing the original voice, but the result seemed reasonably convincing, if still a bit robotic.
The sentence was undoubtedly chosen with care for the demo, as it lends itself to a slow, almost robotic reading. The technology resembles the AI tools that turn your photos into Picasso-style paintings, except that it is applied to an audio stream.
It might sound fun to have Alexa speak with the voice of your favorite celebrity, friend, or family member. However, the Internet quickly turned its attention to using voice clips of deceased family members, which is the use case the Amazon executive put forward in the video above.
On the one hand, hearing the voice of someone close who is no longer with us may be a healing experience. On the other hand, it is a potentially slippery slope with unintended consequences. Many people have begun to question whether the technology could be misused to impersonate living people, and whether we have the right to use someone’s voice without their consent.
The answer is probably “it depends” on the situation and the users. One thing is certain, however: these technologies exist and will keep getting better. It is only a matter of time before synthesized voices are indistinguishable from human ones.