How far can you go with ONLY language modeling?
Can a large enough language model perform NLP tasks out of the box?
OpenAI takes on these and other questions by training a transformer that is an order of magnitude larger than anything built before, and the results are astounding.
Yannic Kilcher explores.
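To give a taste of what "out of the box" means here, below is a minimal sketch of the few-shot, in-context prompting format discussed in the video: the "training" examples live entirely inside the prompt, and the model is simply asked to continue the text (the translation pairs are from the paper's own examples; the snippet just builds and prints the prompt, since no gradient updates or fine-tuning are involved):

# Few-shot in-context learning: task examples are given as plain text
# in the prompt, and the model completes the pattern.
few_shot_prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivree\n"
    "cheese => "
)
# A GPT-3-style language model would be asked to continue this string;
# a good completion is "fromage".
print(few_shot_prompt)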
Paper: https://arxiv.org/abs/2005.14165
Time index:
- 0:00 – Intro & Overview
- 1:20 – Language Models
- 2:45 – Language Modeling Datasets
- 3:20 – Model Size
- 5:35 – Transformer Models
- 7:25 – Fine Tuning
- 10:15 – In-Context Learning
- 17:15 – Start of Experimental Results
- 19:10 – Question Answering
- 23:10 – What I think is happening
- 28:50 – Translation
- 31:30 – Winograd Schemas
- 33:00 – Commonsense Reasoning
- 37:00 – Reading Comprehension
- 37:30 – SuperGLUE
- 40:40 – NLI
- 41:40 – Arithmetic Expressions
- 48:30 – Word Unscrambling
- 50:30 – SAT Analogies
- 52:10 – News Article Generation
- 58:10 – Made-up Words
- 1:01:10 – Training Set Contamination
- 1:03:10 – Task Examples
https://github.com/openai/gpt-3