Text-to-speech engines are usually multi-stage pipelines that transform the signal into many intermediate representations and require supervision at each step.

When trying to train TTS end-to-end, the alignment problem arises: Which text corresponds to which piece of sound?

This paper uses an alignment module to tackle this problem and produces astonishingly good sound.

Paper: https://arxiv.org/abs/2006.03575
Website: https://deepmind.com/research/publications/End-to-End-Adversarial-Text-to-Speech

Content index:

  • 0:00 – Intro & Overview
  • 1:55 – Problems with Text-to-Speech
  • 3:55 – Adversarial Training
  • 5:20 – End-to-End Training
  • 7:20 – Discriminator Architecture
  • 10:40 – Generator Architecture
  • 12:20 – The Alignment Problem
  • 14:40 – Aligner Architecture
  • 24:00 – Spectrogram Prediction Loss
  • 32:30 – Dynamic Time Warping
  • 38:30 – Conclusion

Lex Fridman interviews David Silver for the Artificial Intelligence podcast.

David Silver leads the reinforcement learning research group at DeepMind. He was lead researcher on AlphaGo and AlphaZero, co-lead on AlphaStar, and a contributor to MuZero and a lot of other important work in reinforcement learning.

Time Index:

  • 0:00 – Introduction
  • 4:09 – First program
  • 11:11 – AlphaGo
  • 21:42 – Rule of the game of Go
  • 25:37 – Reinforcement learning: personal journey
  • 30:15 – What is reinforcement learning?
  • 43:51 – AlphaGo (continued)
  • 53:40 – Supervised learning and self play in AlphaGo
  • 1:06:12 – Lee Sedol retirement from Go play
  • 1:08:57 – Garry Kasparov
  • 1:14:10 – Alpha Zero and self play
  • 1:31:29 – Creativity in AlphaZero
  • 1:35:21 – AlphaZero applications
  • 1:37:59 – Reward functions
  • 1:40:51 – Meaning of life

Lex Fridman interviews Marcus Hutter, a senior research scientist at DeepMind and professor at Australian National University.

Throughout his research career, including work with Jürgen Schmidhuber and Shane Legg, he has proposed a lot of interesting ideas in and around the field of artificial general intelligence. Among them is the AIXI model, a mathematical approach to AGI that incorporates ideas of Kolmogorov complexity, Solomonoff induction, and reinforcement learning. This conversation is part of the Artificial Intelligence podcast.

  • 0:00 – Introduction
  • 3:32 – Universe as a computer
  • 5:48 – Occam’s razor
  • 9:26 – Solomonoff induction
  • 15:05 – Kolmogorov complexity
  • 20:06 – Cellular automata
  • 26:03 – What is intelligence?
  • 35:26 – AIXI – Universal Artificial Intelligence
  • 1:05:24 – Where do rewards come from?
  • 1:12:14 – Reward function for human existence
  • 1:13:32 – Bounded rationality
  • 1:16:07 – Approximation in AIXI
  • 1:18:01 – Gödel machines
  • 1:21:51 – Consciousness
  • 1:27:15 – AGI community
  • 1:32:36 – Book recommendations
  • 1:36:07 – Two moments to relive (past and future)

I always knew that reinforcement learning would teach us more about ourselves than any other kind of AI approach. This feeling was backed up in a paper published recently in Nature.

DeepMind, Alphabet’s AI subsidiary, has once again used lessons from reinforcement learning to propose a new theory about the reward mechanisms within our brains.

The hypothesis, supported by initial experimental findings, could not only improve our understanding of mental health and motivation but also validate the current direction of AI research toward building more human-like general intelligence.

It turns out the brain’s reward system works in much the same way as reinforcement-learning algorithms, a discovery made in the 1990s that was itself inspired by those algorithms. When a human or animal is about to perform an action, its dopamine neurons make a prediction about the expected reward.
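As a rough illustration of what "making a prediction about the expected reward" can mean in reinforcement-learning terms, here is a minimal sketch of a simple delta rule, a one-step simplification of temporal-difference learning. The excerpt does not name a specific algorithm, so the choice of rule, the function names, and the learning rate are all assumptions made for illustration.

```python
def update_prediction(predicted, received, learning_rate=0.1):
    """Nudge a reward prediction toward the reward actually received.

    Hypothetical helper for illustration; not taken from the article.
    """
    # Prediction error: positive when the outcome beats the expectation,
    # negative when it falls short.
    error = received - predicted
    return predicted + learning_rate * error, error


def simulate(rewards, learning_rate=0.1):
    """Run the update over a sequence of rewards, starting from no expectation."""
    predicted = 0.0
    errors = []
    for received in rewards:
        predicted, error = update_prediction(predicted, received, learning_rate)
        errors.append(error)
    return predicted, errors


if __name__ == "__main__":
    final_prediction, errors = simulate([1.0] * 50)
    print(f"final prediction: {final_prediction:.3f}")
    print(f"first error: {errors[0]:.3f}, last error: {errors[-1]:.3f}")
```

With a constant reward, the prediction error shrinks over repeated trials as the prediction catches up with the outcome, which is the qualitative behavior the analogy relies on.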

Siraj Raval generates his own voice with AI using some cutting-edge techniques.

This is a relatively new technology, and people have started generating not just celebrity voices but entire musical pieces as well. The technology to generate sounds, both voices and music, has been rapidly improving over the past few years thanks to deep learning. In this episode, I’ll first demo some AI-generated music. Then, I’ll explain the physics of a waveform and how DeepMind used waveform-based data to generate some pretty realistic sounds in 2016. At the end, I’ll describe the cutting edge of generative sound modeling, a paper released just 2 months ago called “MelNet”. Enjoy!

In this video, Lex Fridman interviews Oriol Vinyals, a senior research scientist at Google DeepMind.

From the video description:

Before that he was at Google Brain and Berkeley. His research has been cited over 39,000 times. He is one of the most brilliant and impactful minds in the field of deep learning. He is behind some of the biggest papers and ideas in AI, including sequence to sequence learning, audio generation, image captioning, neural machine translation, and reinforcement learning. He is a co-lead (with David Silver) of the AlphaStar project, creating an agent that defeated a top professional at the game of StarCraft.