Yannic Kilcher explains why transformers are ruining convolutions.

This paper, under review at ICLR, shows that given enough data, a standard Transformer can outperform Convolutional Neural Networks on image recognition, a task where CNNs have classically excelled. In this video, I explain the architecture of the Vision Transformer (ViT) and the reason why it works better, and I rant about why double-blind peer review is broken.


  • 0:00 – Introduction
  • 0:30 – Double-Blind Review is Broken
  • 5:20 – Overview
  • 6:55 – Transformers for Images
  • 10:40 – Vision Transformer Architecture
  • 16:30 – Experimental Results
  • 18:45 – What does the Model Learn?
  • 21:00 – Why Transformers are Ruining Everything
  • 27:45 – Inductive Biases in Transformers
  • 29:05 – Conclusion & Comments

Related resources:

  • Paper (Under Review): https://openreview.net/forum?id=YicbFdNTTy
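The paper's central move is to treat an image as a sequence. The image is split into fixed-size patches (for example 16×16 pixels), each patch is flattened and linearly projected to the model width, a learnable [class] token is prepended, and position embeddings are added before the sequence enters a standard Transformer encoder. Below is a minimal pure-Python sketch of that input stage only, under stated assumptions: the function names are hypothetical, the weights are random placeholders rather than learned parameters, and the encoder and classification head are omitted.

```python
import random
from typing import List

Matrix = List[List[float]]
Image = List[List[List[float]]]  # H x W x C as nested lists


def patchify(image: Image, patch_size: int) -> Matrix:
    """Split an H x W x C image into flattened, non-overlapping patches.

    Patches are read left-to-right, top-to-bottom; each one becomes a vector of
    length patch_size * patch_size * C (one "word" of the input sequence).
    """
    h, w = len(image), len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            flat: List[float] = []
            for r in range(top, top + patch_size):
                for c in range(left, left + patch_size):
                    flat.extend(image[r][c])  # append every channel of this pixel
            patches.append(flat)
    return patches


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def embed_image(image: Image, patch_size: int, d_model: int, seed: int = 0) -> Matrix:
    """Hypothetical ViT-style input: [class] token + projected patches, plus position embeddings.

    All weights are random placeholders; in a real model they are learned.
    """
    rng = random.Random(seed)

    def init(rows: int, cols: int) -> Matrix:
        return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

    patches = patchify(image, patch_size)
    w_proj = init(len(patches[0]), d_model)    # one linear projection shared by all patches
    cls_token = init(1, d_model)               # prepended token; its output feeds the classifier
    pos_emb = init(len(patches) + 1, d_model)  # one position embedding per sequence slot
    tokens = cls_token + matmul(patches, w_proj)  # list concatenation: class token first
    return [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(tokens, pos_emb)]
```

For a 224×224 RGB image with 16×16 patches, this yields 196 patch tokens (each flattened to 16·16·3 = 768 values before projection) plus the class token, i.e. a sequence of 197 vectors. Plain lists keep the sketch dependency-free; a practical implementation would use a tensor library.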

Siraj Raval has a video exploring a paper on genomics and on building reliable machine learning systems.

Deep learning classifiers make the ladies (and gentlemen) swoon, but they often misclassify novel data that isn’t in the training set, and they do so with high confidence. This has serious real-world consequences! In medicine, this could mean misdiagnosing a patient. In autonomous vehicles, this could mean ignoring a stop sign. Machines are increasingly tasked with making life-or-death decisions like these, so it’s important that we figure out how to correct this problem! I found a new, relatively obscure yet fascinating paper out of Google Research that tackles this problem head-on. In this episode, I’ll explain the work of these researchers, we’ll write some code, do some math, make some visualizations, and by the end I’ll freestyle rap about AI and genomics. I had a lot of fun making this, so I hope you enjoy it!

Likelihood Ratios for Out-of-Distribution Detection paper: https://arxiv.org/pdf/1906.02845.pdf 

The researchers’ code: https://github.com/google-research/google-research/tree/master/genomics_ood
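The paper's core score is a likelihood ratio, LLR(x) = log p_θ(x) − log p_θ₀(x), where p_θ is a generative model trained on in-distribution data and p_θ₀ is a "background" model trained on randomly mutated copies of the same data, so it mainly captures population-level background statistics that confound the raw likelihood. Lower LLR values suggest an input is out-of-distribution. The sketch below is a minimal illustration under stated assumptions: it swaps the paper's deep autoregressive models for a simple order-k Markov chain with add-one smoothing, uses a simplified mutation scheme, and its class and function names, mutation rate, and toy sequences are hypothetical. The linked repository holds the researchers' actual implementation.

```python
import math
import random
from collections import Counter
from typing import List

ALPHABET = "ACGT"


class MarkovModel:
    """Order-k Markov chain over DNA bases with add-one (Laplace) smoothing.

    Hypothetical stand-in for the paper's deep generative models.
    """

    def __init__(self, k: int = 2):
        self.k = k
        self.context_counts: Counter = Counter()
        self.transition_counts: Counter = Counter()

    def fit(self, sequences: List[str]) -> "MarkovModel":
        for seq in sequences:
            for i in range(self.k, len(seq)):
                context = seq[i - self.k:i]
                self.context_counts[context] += 1
                self.transition_counts[(context, seq[i])] += 1
        return self

    def log_likelihood(self, seq: str) -> float:
        """Sum of log p(base | previous k bases), ignoring the first k bases."""
        total = 0.0
        for i in range(self.k, len(seq)):
            context = seq[i - self.k:i]
            num = self.transition_counts[(context, seq[i])] + 1
            den = self.context_counts[context] + len(ALPHABET)
            total += math.log(num / den)
        return total


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Simplified perturbation: with probability `rate`, replace a base by a uniform draw from ALPHABET."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else base for base in seq)


def llr_score(seq: str, foreground: MarkovModel, background: MarkovModel) -> float:
    """LLR(x) = log p_theta(x) - log p_theta0(x); lower values suggest out-of-distribution."""
    return foreground.log_likelihood(seq) - background.log_likelihood(seq)
```

In use, one fits `MarkovModel().fit(train)` as the foreground model and `MarkovModel().fit([mutate(s, 0.2, rng) for s in train])` as the background model, then ranks test sequences by `llr_score`. Subtracting the background log-likelihood cancels much of what both models share (such as base composition), leaving the part of the likelihood driven by in-distribution structure.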