Yannic Kilcher explains why transformers are ruining convolutions.

This paper, under review at ICLR, shows that given enough data, a standard Transformer can outperform Convolutional Neural Networks in image recognition, a task where CNNs have classically excelled. In this video, I explain the architecture of the Vision Transformer (ViT) and the reason why it works better, and I rant about why double-blind peer review is broken.


  • 0:00 – Introduction
  • 0:30 – Double-Blind Review is Broken
  • 5:20 – Overview
  • 6:55 – Transformers for Images
  • 10:40 – Vision Transformer Architecture
  • 16:30 – Experimental Results
  • 18:45 – What does the Model Learn?
  • 21:00 – Why Transformers are Ruining Everything
  • 27:45 – Inductive Biases in Transformers
  • 29:05 – Conclusion & Comments

Related resources:

  • Paper (Under Review): https://openreview.net/forum?id=YicbFdNTTy
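
To make the "Transformers for Images" idea more concrete: the ViT paper feeds the Transformer a sequence of fixed-size image patches rather than individual pixels. The snippet below is only a minimal sketch of that patch-splitting step, using plain Python lists; the function name `image_to_patches` and the toy 4x4 image are assumptions for illustration, and the real model also applies a learned linear projection and position embeddings that are not shown here.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image (nested lists) into flattened, non-overlapping patches.

    Hypothetical helper for illustration; not code from the paper.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    # Walk over the image in row-major order, one patch at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4x4 single-channel "image" whose pixel values are 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = image_to_patches(image, patch_size=2)

for index, patch in enumerate(patches):
    print(index, patch)
```

Each flattened patch plays the role that a word token plays in a text Transformer, so a 4x4 image with 2x2 patches becomes a sequence of four tokens.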

Visual scenes are often composed of sets of independent objects. Yet current vision models make no assumptions about the nature of the pictures they look at.

Yannic Kilcher explores a paper on object-centric learning.

By imposing an objectness prior, this paper presents a module that is able to recognize permutation-invariant sets of objects from pixels in both supervised and unsupervised settings. It does so by introducing a slot attention module that combines an attention mechanism with dynamic routing.

Content index:

  • 0:00 – Intro & Overview
  • 1:40 – Problem Formulation
  • 4:30 – Slot Attention Architecture
  • 13:30 – Slot Attention Algorithm
  • 21:30 – Iterative Routing Visualization
  • 29:15 – Experiments
  • 36:20 – Inference Time Flexibility
  • 38:35 – Broader Impact Statement
  • 42:05 – Conclusion & Comments

Google recently released the Quantization Aware Training (QAT) API, which enables developers to train and deploy models with the performance benefits of quantization (the process of mapping input values from a large set to output values in a smaller set) while retaining close to their original accuracy.

The goal is to support the development of smaller, faster, and more efficient machine learning models well-suited to run on off-the-shelf machines, such as those in medium- and small-business environments where computation resources are at a premium.

Here’s why that’s important:

The process of going from higher to lower precision tends to be noisy. That’s because quantization squeezes a small range of floating-point values into a fixed number of information buckets, leading to information loss similar to the rounding errors that occur when fractional values are represented as integers. (For example, all values in the range [2.0, 2.3] might be represented in a single bucket.) Problematically, when the lossy numbers are used in several computations, the losses accumulate and need to be rescaled for the next computation.
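
The bucket effect described above can be sketched in a few lines of plain Python. This is only an illustrative uniform quantizer, not the QAT API itself; the function names `quantize` and `dequantize`, the 2-bit width, and the sample values are all assumptions chosen so that 2.0, 2.1, and 2.3 land in the same bucket.

```python
def quantize(values, num_bits):
    """Map floats to integer bucket codes with a uniform scheme.

    Hypothetical helper for illustration; not part of any library.
    """
    low, high = min(values), max(values)
    levels = 2 ** num_bits - 1          # highest bucket code
    scale = (high - low) / levels if high > low else 1.0
    codes = [round((v - low) / scale) for v in values]
    return codes, scale, low


def dequantize(codes, scale, low):
    """Recover approximate floats from bucket codes."""
    return [low + code * scale for code in codes]


values = [0.0, 2.0, 2.1, 2.3, 10.0]
codes, scale, low = quantize(values, num_bits=2)
restored = dequantize(codes, scale, low)
errors = [abs(v - r) for v, r in zip(values, restored)]

print("codes:   ", codes)
print("restored:", [round(r, 3) for r in restored])
print("errors:  ", [round(e, 3) for e in errors])
```

With only four buckets, the three values near 2 collapse to the same code and come back as the same number, so the differences between them are lost. Each individual error stays within half a bucket width, but feeding such values through many operations is where the accumulation mentioned above comes from.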

Lex Fridman interviews David Silver for the Artificial Intelligence podcast.

David Silver leads the reinforcement learning research group at DeepMind. He was lead researcher on AlphaGo and AlphaZero and co-lead on AlphaStar and MuZero, and has done a lot of other important work in reinforcement learning.

Time Index:

  • 0:00 – Introduction
  • 4:09 – First program
  • 11:11 – AlphaGo
  • 21:42 – Rules of the game of Go
  • 25:37 – Reinforcement learning: personal journey
  • 30:15 – What is reinforcement learning?
  • 43:51 – AlphaGo (continued)
  • 53:40 – Supervised learning and self play in AlphaGo
  • 1:06:12 – Lee Sedol retirement from Go play
  • 1:08:57 – Garry Kasparov
  • 1:14:10 – Alpha Zero and self play
  • 1:31:29 – Creativity in AlphaZero
  • 1:35:21 – AlphaZero applications
  • 1:37:59 – Reward functions
  • 1:40:51 – Meaning of life

One of the promising frontiers of research right now in chip design is using machine learning techniques to actually help with some of the tasks in the design process.

Here’s an interesting look at what Google is doing in this space.

We will be discussing this at our upcoming The Next AI Platform event in San Jose on March 10 with Elias Fallon, engineering director at Cadence Design Systems. (You can see the full agenda and register to attend at this link; we hope to see you there.) The use of machine learning in chip design was also one of the topics that Jeff Dean, a senior fellow in the Research Group at Google who has helped invent many of the hyperscaler’s key technologies, talked about in his keynote address at this week’s 2020 International Solid State Circuits Conference in San Francisco.