Yannic Kilcher explains why transformers are ruining convolutions.

This paper, under review at ICLR, shows that given enough data, a standard Transformer can outperform convolutional neural networks (CNNs) on image recognition tasks, which are classically tasks where CNNs excel. In this video, I explain the architecture of the Vision Transformer (ViT), the reason it works better, and rant about why double-blind peer review is broken.
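As a taste of the core idea the video covers, here is a minimal sketch of ViT's first step: cutting an image into non-overlapping patches and flattening each patch into a token the Transformer can consume. This is an illustrative plain-Python sketch, not the paper's implementation; the function name and the toy 4×4 single-channel "image" are made up for the example.

```python
def image_to_patches(img, patch):
    """Split an H x W image (list of rows) into flattened,
    non-overlapping patch x patch tokens, in row-major order.
    In ViT, each such token is then linearly projected and fed
    to a standard Transformer encoder."""
    h, w = len(img), len(img[0])
    tokens = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            tokens.append([img[i + di][j + dj]
                           for di in range(patch)
                           for dj in range(patch)])
    return tokens

# Toy 4x4 grayscale "image" with pixel values 0..15.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = image_to_patches(img, 2)  # 4 tokens, each of length 4
```

With 2×2 patches, a 4×4 image becomes a sequence of four 4-dimensional tokens; the real ViT does the same with 16×16 patches on full-size images.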

OUTLINE:

  • 0:00 – Introduction
  • 0:30 – Double-Blind Review is Broken
  • 5:20 – Overview
  • 6:55 – Transformers for Images
  • 10:40 – Vision Transformer Architecture
  • 16:30 – Experimental Results
  • 18:45 – What does the Model Learn?
  • 21:00 – Why Transformers are Ruining Everything
  • 27:45 – Inductive Biases in Transformers
  • 29:05 – Conclusion & Comments

Related resources:

  • Paper (Under Review): https://openreview.net/forum?id=YicbFdNTTy

Google introduced Tensor Processing Units, or TPUs, four years ago.

TPUs, unlike GPUs, were custom-designed to handle operations such as the matrix multiplications at the heart of neural network training.
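To make concrete what kind of operation a TPU accelerates, here is the matrix multiplication behind a toy dense-layer forward pass, written in plain Python for illustration (a TPU performs the same computation in hardware, over much larger matrices; the function and values here are invented for the example).

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix,
    both given as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

# Toy dense layer: 2 input samples with 3 features each,
# mapped to 2 output units by a 3x2 weight matrix.
x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
w = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]
y = matmul(x, w)  # 2x2 output activations
```

Every layer of a neural network repeats this pattern at scale, which is why dedicating silicon to it pays off.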

Here’s a great beginner’s guide to the technology.

Google TPUs come in two forms: Cloud TPU and Edge TPU. Cloud TPUs can be accessed from a Google Colab notebook, which gives users access to TPU pods sitting in Google's data centres, while Edge TPU is a custom-built development kit for building specific applications. In the next section, we will see how TPUs work and look at their key components.