Machine Learning Street Talk: Tim Scarfe, Yannic Kilcher, and Connor Shorten discuss their takeaways from OpenAI’s GPT-3 language model.

OpenAI trained a 175 BILLION parameter autoregressive language model. The paper demonstrates how self-supervised language modelling at this scale can perform many downstream tasks without fine-tuning. 
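The "without fine-tuning" part means the task is specified entirely in the prompt text and the model simply continues it. A minimal sketch of this few-shot prompting idea (the example pairs and formatting here are illustrative, not OpenAI's exact prompt format):

```python
def build_few_shot_prompt(examples, query):
    """Format (input, answer) demonstration pairs plus a new query as one
    text prompt. The model is conditioned on this text and continues it;
    no gradient updates are made -- the 'learning' lives in the context."""
    lines = [f"English: {en}\nFrench: {fr}" for en, fr in examples]
    lines.append(f"English: {query}\nFrench:")
    return "\n\n".join(lines)

demos = [("cheese", "fromage"), ("house", "maison")]
prompt = build_few_shot_prompt(demos, "dog")
print(prompt)
```

The same model, with no weight changes, can be pointed at translation, Q&A, or arithmetic just by swapping the demonstrations in the prompt.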

Paper Links:

Content index:

  • 00:00:00 Intro
  • 00:00:54 ZeRO stages 1+2 (model + data parallelism) [GPT-3 DOES *NOT* USE THIS] (Connor)
  • 00:03:17 Recent history of NLP (Tim)
  • 00:06:04 Yannic “Light-speed” Kilcher’s brief overview of GPT-3
  • 00:14:25 Reviewing Yannic’s YT comments on his GPT-3 video (Tim)
  • 00:20:26 Main show intro
  • 00:23:03 Is GPT-3 reasoning?
  • 00:28:15 Architecture discussion and autoregressive (GPT*) vs denoising autoencoder (BERT)
  • 00:36:18 Utility of GPT-3 in industry
  • 00:43:03 Can GPT-3 do math? (reasoning/system 1/system 2)
  • 00:51:03 Generalisation
  • 00:56:48 Esoterics of language models
  • 00:58:46 Architectural trade-offs
  • 01:07:37 Memorization machines and interpretability
  • 01:17:16 Nearest neighbour probes / watermarks
  • 01:20:03 YouTube comments on GPT-3 video
  • 01:21:50 GPT-3 news article generation issue
  • 01:27:36 Sampling data for language models / bias / fairness / politics
  • 01:51:12 Outro
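The autoregressive-vs-denoising distinction discussed at 00:28:15 largely comes down to the attention mask: GPT-style models only let a token see its past, while BERT-style models let every token see the whole sequence and instead train by reconstructing masked-out tokens. A toy sketch of the two masks (pure illustration, not either model's implementation):

```python
def causal_mask(n):
    """GPT-style (autoregressive): token i may attend only to
    positions j <= i, so the model predicts strictly left-to-right."""
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    """BERT-style (denoising autoencoder): every token attends to every
    position; the training signal instead comes from recovering tokens
    that were masked out of the input."""
    return [[True] * n for _ in range(n)]

for row in causal_mask(4):
    print(["x" if v else "." for v in row])
```

The causal mask is what makes GPT-3 a natural text generator, while the bidirectional mask gives BERT richer representations for classification-style tasks but no straightforward way to generate.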

How far can you go with ONLY language modeling?

Can a large enough language model perform NLP tasks out of the box?

OpenAI take on these and other questions by training a transformer an order of magnitude larger than anything built before, and the results are astounding.

Yannic Kilcher explores.


Time index:

  • 0:00 – Intro & Overview
  • 1:20 – Language Models
  • 2:45 – Language Modeling Datasets
  • 3:20 – Model Size
  • 5:35 – Transformer Models
  • 7:25 – Fine Tuning
  • 10:15 – In-Context Learning
  • 17:15 – Start of Experimental Results
  • 19:10 – Question Answering
  • 23:10 – What I think is happening
  • 28:50 – Translation
  • 31:30 – Winograd Schemas
  • 33:00 – Commonsense Reasoning
  • 37:00 – Reading Comprehension
  • 37:30 – SuperGLUE
  • 40:40 – NLI
  • 41:40 – Arithmetic Expressions
  • 48:30 – Word Unscrambling
  • 50:30 – SAT Analogies
  • 52:10 – News Article Generation
  • 58:10 – Made-up Words
  • 1:01:10 – Training Set Contamination
  • 1:03:10 – Task Examples

Build 2020 starts today and here’s a special Build edition of the AI Show that covers the Bot Framework Composer.

Bot Framework Composer is an open-source, integrated bot development environment, available as a cross-platform application on GitHub.

Bot Framework Composer provides a one-stop-shop environment that seamlessly integrates several key aspects of building a conversational application, including language understanding, dialog modeling, language generation, memory management, and integration with external resources.

In this session, you will learn about the advanced language understanding and language generation capabilities offered by Bot Framework Composer. The following language understanding topics are covered: flexible slot filling, interruption handling, handling local intents, and the confirmation and correction experience for language understanding.

You will also learn about building bots with advanced language generation capabilities, including conditional response generation and media/card generation with data binding. The “Introducing Bot Framework Composer” session is a recommended prerequisite for this one.

Learn More:

LUIS enables developers to quickly create enterprise-ready, conversational applications that communicate with a user in natural language.

In this video, learn about the improvements in the LUIS portal and the new features that enable you to build more sophisticated models than ever before.

Learn More:

Chatbots are the opportunity of the decade to revolutionize customer interaction.

They leverage the power of natural language processing to create a (hopefully) seamless experience over back-end systems and processes, surfacing a 24/7 interface to any enterprise.

Conversational AI serves as a bridge between machine and human interaction. Demand for the technology has been rising steadily as organizations around the world embrace it. According to one market report, the global conversational AI market will grow to $15.7 billion by 2024, a compound annual growth rate of 30.2% over the forecast period.

PowerPoint Designer uses machine learning to provide users with redesigned slides that maximize engagement and visual appeal.

Up to 4.1 million Designer slides are created daily, and the Designer team is continuously adding new types of content.

Time Index:

  • [02:39] Demo – PowerPoint suggests design ideas to help users build memorable slides effortlessly
  • [03:28] A behind-the-scenes look at how PowerPoint was built to make intelligent design recommendations
  • [04:47] AI focused on intelligently cropping images in photos and centering the objects, positioning the images, and even using multi-label classifiers to determine the best treatment.
  • [06:00] How PowerPoint handles Natural Language Processing (NLP).
  • [07:32] Providing recommendations when image choices don’t meet the users’ needs.
  • [09:30] How Azure Machine Learning helps the dev team scale and increase throughput for data scientists.
  • [11:10] How distributed GPUs help the team work more quickly and run multiple models at once.

Computers just got a lot better at mimicking human language. Researchers created computer programs that can write long passages of coherent, original text.

Language models like GPT-2, Grover, and CTRL create text passages that seem written by someone fluent in the language, but not in the truth. That AI field, Natural Language Processing (NLP), didn’t exactly set out to create a fake news machine. Rather, it’s the byproduct of a line of research into massive pretrained language models: Machine learning programs that store vast statistical maps of how we use our language. So far, the technology’s creative uses seem to outnumber its malicious ones. But it’s not difficult to imagine how these text-fakes could cause harm, especially as these models become widely shared and deployable by anyone with basic know-how.
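The "vast statistical maps of how we use our language" framing can be made concrete with a toy autoregressive model. A bigram counter is the simplest possible such map; GPT-2-class models replace the count table with a neural network, but the generate-one-token-at-a-time loop is the same idea (this is an illustrative sketch, not how any of these systems is implemented):

```python
import random
from collections import defaultdict, Counter

def train_bigram(tokens):
    """Count next-token statistics: a tiny 'statistical map' of the text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, seed=0):
    """Sample tokens one at a time, each conditioned on the previous one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        nxt_counts = counts.get(out[-1])
        if not nxt_counts:
            break
        words, weights = zip(*nxt_counts.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran".split()
model = train_bigram(corpus)
print(generate(model, "the", 5))
```

Scaling this idea from bigram counts to billions of learned parameters over web-scale corpora is what makes the generated text fluent enough to pass for human writing.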

Read more here: 

By optimizing BERT for CPU, Microsoft has made inferencing affordable and cost-effective.

According to the published benchmark, BERT inferencing on an Azure Standard F16s_v2 CPU takes only 9 ms, which translates to a 17x increase in speed.

Microsoft partnered with NVIDIA to optimize BERT for the GPUs powering the Azure NV6 Virtual Machines. The optimization included rewriting and implementing the neural network in TensorRT C++ APIs based on the CUDA and cuBLAS libraries. The NV6 family of Azure VMs is powered by NVIDIA Tesla M60 GPUs. Microsoft claims that the improved Bing search platform, running the optimized model on NVIDIA GPUs, serves more than one million BERT inferences per second within Bing’s latency limits.

In my Data Point earlier today, I mentioned how Google open sourced ALBERT yesterday.

ALBERT is an NLP model based on BERT, the influential model Google released last year.

ALBERT has been released as an open-source implementation on top of TensorFlow. It reduces model size in two ways: by sharing parameters across the hidden layers of the network, and by factorising the embedding layer. According to a report by i-programmer, Google has made ALBERT (A Lite BERT) […]
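The embedding factorisation is simple arithmetic: instead of a vocab-by-hidden embedding table, ALBERT maps tokens into a small dimension E and projects up to the hidden size H. A sketch of the parameter counts, using illustrative BERT-base-like numbers (V=30000, H=768, E=128, roughly the setup described in the ALBERT paper):

```python
def embedding_params(vocab, hidden, factor_dim=None):
    """Parameter count of the token-embedding block.
    BERT-style: one vocab x hidden table (vocab * hidden).
    ALBERT-style: factorised through a small dimension E,
    giving vocab * E + E * hidden parameters."""
    if factor_dim is None:
        return vocab * hidden
    return vocab * factor_dim + factor_dim * hidden

bert_style = embedding_params(30000, 768)         # 23,040,000 parameters
albert_style = embedding_params(30000, 768, 128)  # 3,938,304 parameters
print(bert_style, albert_style)
```

With these numbers the embedding block shrinks by roughly a factor of six; cross-layer parameter sharing then cuts the transformer-body parameters further, since all hidden layers reuse one set of weights.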