Machine Learning Street Talk: Tim Scarfe, Yannic Kilcher and Connor Shorten discuss their takeaways from OpenAI’s GPT-3 language model.
OpenAI trained a 175 BILLION parameter autoregressive language model. The paper demonstrates how self-supervised language modelling at this scale can perform many downstream tasks without fine-tuning, conditioning only on a task description and a few examples placed in the prompt.
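To make the few-shot idea concrete, here is a minimal sketch (not from the episode or the paper) of how a task description and in-context examples are packed into a single prompt. The `build_few_shot_prompt` helper and the translation examples are illustrative assumptions; any autoregressive language model would simply continue the text from the final "A:".

```python
# Minimal sketch of few-shot prompting: the task "knowledge" comes entirely
# from the prompt text, with no gradient updates or fine-tuning of the model.

def build_few_shot_prompt(task_description, examples, query):
    """Concatenate a task description, worked examples, and the new query."""
    lines = [task_description, ""]
    for source, target in examples:
        lines.append(f"Q: {source}")
        lines.append(f"A: {target}")
        lines.append("")
    lines.append(f"Q: {query}")
    lines.append("A:")  # the model is expected to continue from here
    return "\n".join(lines)

if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        "Translate English to French.",
        [("sea otter", "loutre de mer"), ("cheese", "fromage")],
        "peppermint",
    )
    print(prompt)
    # A call like model.generate(prompt) would go here; `model` stands in for
    # a hypothetical autoregressive LM, not a specific API.
```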
Paper Links:
Language Models are Few-Shot Learners (GPT-3): https://arxiv.org/abs/2005.14165
Content index:
- 00:00:00 Intro
- 00:00:54 ZeRO 1+2 (model + data parallelism) [GPT-3 DOES *NOT* USE THIS] (Connor)
- 00:03:17 Recent history of NLP (Tim)
- 00:06:04 Yannic “Light-speed” Kilcher’s brief overview of GPT-3
- 00:14:25 Reviewing Yannic’s YT comments on his GPT-3 video (Tim)
- 00:20:26 Main show intro
- 00:23:03 Is GPT-3 reasoning?
- 00:28:15 Architecture discussion and autoregressive (GPT*) vs denoising autoencoder (BERT)
- 00:36:18 Utility of GPT-3 in industry
- 00:43:03 Can GPT-3 do math? (reasoning / System 1 / System 2)
- 00:51:03 Generalisation
- 00:56:48 Esoterics of language models
- 00:58:46 Architectural trade-offs
- 01:07:37 Memorization machines and interpretability
- 01:17:16 Nearest neighbour probes / watermarks
- 01:20:03 YouTube comments on GPT-3 video
- 01:21:50 GPT-3 news article generation issue
- 01:27:36 Sampling data for language models / bias / fairness / politics
- 01:51:12 Outro