I can’t believe I missed this incredible voice synthesizer Eleven Labs announced back in October. Their new text-to-speech AI model conveys emotion in a way that I haven’t heard any other robotic voice come close to achieving.

This is definitely the kind of thing you need to experience for yourself.

First, let’s set the stage. This sample was synthesized by Microsoft Azure Speech Studio, one of the leading text-to-speech synthesizers you can already use:

[Audio: J.K. Rowling, Harry Potter and the Philosopher's Stone, Fragment 1 (Microsoft Azure Speech Studio, 0:40)]

Microsoft’s offering is alright, but it’s not nearly good enough to narrate a book. I got annoyed just listening to this clip.

Now listen to Eleven Labs’ voice synthesizer read the same passage:

[Audio: J.K. Rowling, Harry Potter and the Philosopher's Stone, Fragment 1 (Eleven Labs, 0:36)]

Incredible! Blows Microsoft’s offering out of the water.

This is the most convincing AI speech synthesis I’ve heard since Google’s Duplex demo at I/O 2018, where the assistant called a hair salon to book an appointment. Eleven Labs has more samples of their synthesizer reading Harry Potter on their website.

A month after the first release, Eleven Labs returned with another demo, this time showing off their model generating laughter.

This is where it starts to slip into the uncanny valley, if I’m being honest. It reminds me of watching NPCs interact in a video game: blocky, uncoordinated, and... robotic.

[Audio: Amused (0:03)]

Pause for a second and think about it. An AI synthesized these sounds from nothing but text. Amazing.

I do have to ask, though... why did they use this medieval-warrior-sounding guy for the Happy version?

[Audio: Happy (0:08)]

I can think of plenty of alternatives to the script used here. Why are we so happy about defeating enemies? There are countless other ways to express happiness.

This is a strange choice from Eleven Labs. That minor critique aside, I love what Eleven Labs is doing and can’t wait to experiment with some narration features here on Gold’s Guide.

How long until Eleven Labs is able to generate this kind of speech in an Austrian accent?