How far can you go with ONLY language modeling?

Can a large enough language model perform NLP tasks out of the box?

OpenAI takes on these and other questions by training a transformer an order of magnitude larger than anything built before, and the results are astounding.

Yannic Kilcher explores.

Paper

Time index:

  • 0:00 – Intro & Overview
  • 1:20 – Language Models
  • 2:45 – Language Modeling Datasets
  • 3:20 – Model Size
  • 5:35 – Transformer Models
  • 7:25 – Fine Tuning
  • 10:15 – In-Context Learning
  • 17:15 – Start of Experimental Results
  • 19:10 – Question Answering
  • 23:10 – What I think is happening
  • 28:50 – Translation
  • 31:30 – Winograd Schemas
  • 33:00 – Commonsense Reasoning
  • 37:00 – Reading Comprehension
  • 37:30 – SuperGLUE
  • 40:40 – NLI
  • 41:40 – Arithmetic Expressions
  • 48:30 – Word Unscrambling
  • 50:30 – SAT Analogies
  • 52:10 – News Article Generation
  • 58:10 – Made-up Words
  • 1:01:10 – Training Set Contamination
  • 1:03:10 – Task Examples

https://arxiv.org/abs/2005.14165
https://github.com/openai/gpt-3
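The in-context learning setup covered in the video can be sketched in a few lines: instead of fine-tuning, the task is specified entirely inside the prompt as a handful of demonstration pairs, and the model is asked to complete the final line. This is an illustrative sketch of the prompt format only (the helper name and `=>` separator are my own choices); no model is actually called.

```python
def build_few_shot_prompt(task_description, examples, query):
    """Assemble a GPT-3-style few-shot prompt: a task description,
    a few demonstration pairs, and an unfinished final line for the
    model to complete."""
    lines = [task_description]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    # The query line is left incomplete; the language model's
    # continuation of this line is the "answer".
    lines.append(f"{query} =>")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "plush giraffe",
)
print(prompt)
```

The point the video makes is that no gradient update happens here: the "learning" is entirely in how the model conditions on the demonstrations in its context window.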

Why is it that we can see these multiple histories play out on the quantum scale, and why do we lose sight of them on our macroscopic scale?

Many physicists believe that the answer lies in a process known as quantum decoherence.

Does conscious observation of a quantum system cause the wavefunction to collapse? The upshot is that more and more physicists think that consciousness – and even measurement – doesn’t directly cause wavefunction collapse.

In fact, there probably is no clear Heisenberg cut. The collapse itself may be an illusion, and the alternate histories that the wavefunction represents may continue forever.