The Era of Neural Network Speech Synthesis
By Mickel · January 26, 2022
The video of Steve Jobs introducing the Macintosh in 1984 never fails to put a smile on my face. It was a groundbreaking feat of engineering in many ways.
What impressed me most was how the computer seemed to have a personality of its own. It spoke.
Computers, software, and humans have come a long way since 1984.
Today's artificial voices are often indistinguishable from a human's. In some experiments, listeners even mistake computer voices for human ones, and vice versa.
We can thank computer science and something called artificial neural networks for that. Thanks, science!
But what exactly are artificial neural networks, you might wonder? I'm glad you asked!
Artificial Neural Networks
Perhaps you have heard about neurons before? They are often mentioned when talking about the human brain.
A neuron, or nerve cell, is pretty useless on its own, to be honest. It's when you get a large enough set of connected neurons that the magic happens.
Networks of neurons in your brain are what make you tick. In computer lingo, it's how your brain processes things.
Now, artificial neural networks are an attempt to model this biological behavior with the help of mathematics.
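For the curious, here is a minimal sketch of what "with the help of mathematics" can look like, assuming the simplest textbook model: an artificial neuron multiplies each input by a weight, adds everything up along with a bias, and squashes the result through an activation function (a sigmoid here). Wire a few of these together and you have a tiny network. The weights and inputs below are hypothetical, hand-picked numbers for illustration only; this is not how Podopi's speech models are built.

```python
import math


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus a bias, passed through a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid keeps the output between 0 and 1


def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical numbers chosen by hand, purely for illustration.
x = [0.5, -1.0, 2.0]
hidden = layer(x, [[0.2, 0.4, -0.1], [-0.3, 0.1, 0.5]], [0.0, 0.1])  # two neurons
output = neuron(hidden, [1.0, -1.0], 0.0)  # one neuron reading the hidden layer
print(round(output, 3))
```

Real networks typically stack many such layers with millions of weights, and those weights are learned from data rather than picked by hand.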
The applications for neural network models are many:
- When you speak to your phone's virtual assistant, such as Siri: neural model.
- When you aim your camera at someone with a funny filter applied: neural model.
- When you translate a foreign language using Google Translate: neural model.
Neural networks now exist in everything from your smart fridge to cars, satellites, drones and rockets.
This thought is equally exciting and terrifying, wouldn't you agree?
Generate Speech from Text
Here at Podopi, we use neural models to generate speech from your written website content. We can repurpose your content and launch a podcast in minutes.
Work that would have taken a human many days to complete.
Along the way, we go to great lengths to keep the audio quality as high as possible, so the voices sound as human as they can.
Just imagine setting up recording equipment, doing multiple takes to get everything right, editing the audio, exporting to the correct formats, uploading the audio files to the Internet, and distributing the episodes to all the podcasting platforms for maximum reach.
We do all that for you. In minutes, not weeks or days.
Last week we added 354 voices to our library of A.I. voices. The update brought support for 43 new languages.
In total, we now give you the option to publish your content as audio with one of our 606 voices in 79 languages.
Hear Me Out, Human.
I know what you're thinking: it's still a computer, so it must sound bad, right?
No! Neural networks, remember? Let's hear a couple of examples.
All but one of the audio clips on this page are computer generated.
Are you able to identify the human recording?
A female American English voice.
A male American English voice.
This voice is modeled after a female child.
A female British English voice.
Deep, smart, with a touch of whiskey.
A female Australian English voice.
This guy is a fast and excited speaker.
Did I mention that we speak 80-ish languages and 134 dialects?
Here is an example of one of our female German voices.
Given that using neural network models to generate speech is fairly new, I think we can expect great advancements in the near future.
It's quite exciting to compare the Steve Jobs video from 1984 with the voices we can provide today. It really speaks (pun intended) to the immense progress computer-generated speech has made over the past few decades.
Oh! I almost forgot to mention: none of the voices in this post were recorded by a human. They're all computer-generated.
Now imagine another 38 years of progress.