Siraj Raval generates his own voice with AI using some cutting edge techniques.

This is a relatively new technology, and people have started generating not just celebrity voices, but entire musical pieces as well. The technology to generate sounds, both voices and music, has been improving rapidly over the past few years thanks to deep learning. In this episode, I'll first demo some AI-generated music. Then, I'll explain the physics of a waveform and how DeepMind used waveform-based data to generate some pretty realistic sounds in 2016. At the end, I'll describe the cutting edge of generative sound modeling, a paper released just 2 months ago called "MelNet". Enjoy!

Generative models are a family of AI architectures whose aim is to create data samples from scratch; GANs (generative adversarial networks) are the best-known example, but they are just one member of the family. These models work by learning the data distribution of the kind of things we want to generate, and then sampling new examples from that learned distribution.
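To make the "capture the distribution, then sample from it" idea concrete, here is a toy sketch of my own (not any specific paper's method) where the "model" is just a 1-D Gaussian fit to some data:

```python
import random
import statistics

random.seed(0)

# Pretend these are real data points whose distribution we want to learn.
real_data = [random.gauss(5.0, 2.0) for _ in range(10_000)]

# "Training": estimate the distribution's parameters from the data.
mu = statistics.mean(real_data)
sigma = statistics.stdev(real_data)

# "Generation": draw brand-new samples from the learned distribution.
# None of these points appeared in the training data, yet they look
# statistically just like it.
fake_data = [random.gauss(mu, sigma) for _ in range(10_000)]

print(round(statistics.mean(fake_data), 1))
print(round(statistics.stdev(fake_data), 1))
```

Real generative models for audio replace this two-parameter Gaussian with a deep neural network over an enormously high-dimensional space (every sample of a waveform), but the principle of learning a distribution and sampling from it is the same.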

Here’s an interesting read on the topic.

These kinds of models are being heavily researched, and there is a huge amount of hype around them. Just look at the chart showing the number of papers published in the field over the past few years: