Yannic Kilcher explains why Transformers are taking over from convolutions in image recognition.

This paper, under review at ICLR, shows that, given enough data, a standard Transformer can outperform convolutional neural networks (CNNs) on image recognition tasks, a domain where CNNs have classically excelled. In this video, I explain the architecture of the Vision Transformer (ViT), the reason why it works better, and rant about why double-blind peer review is broken.
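To make the patch-token idea concrete, here is a minimal PyTorch sketch of the ViT forward pass (my own illustration, not the authors' code): the image is cut into 16x16 patches, each patch is flattened and linearly embedded, a learnable [CLS] token and position embeddings are added, and the sequence goes through a standard Transformer encoder. Dimensions follow a ViT-Base-like configuration; PyTorch's stock encoder stands in for the paper's exact (pre-norm) encoder.

```python
import torch
import torch.nn as nn

class ViTSketch(nn.Module):
    """Sketch of the Vision Transformer idea: split the image into
    patches, embed each patch linearly, prepend a [CLS] token, add
    position embeddings, and run a standard Transformer encoder."""
    def __init__(self, image_size=224, patch_size=16, dim=768,
                 depth=12, heads=12, num_classes=1000):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2   # 14*14 = 196
        patch_dim = 3 * patch_size * patch_size         # 768 for 16x16 RGB
        self.patch_size = patch_size
        self.to_embedding = nn.Linear(patch_dim, dim)   # linear patch projection
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embedding = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)         # classify from [CLS]

    def forward(self, images):                          # (B, 3, H, W)
        p = self.patch_size
        b, c, h, w = images.shape
        # (B, 3, H, W) -> (B, num_patches, patch_dim): cut into p x p patches
        patches = images.unfold(2, p, p).unfold(3, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
        tokens = self.to_embedding(patches)
        cls = self.cls_token.expand(b, -1, -1)
        x = torch.cat([cls, tokens], dim=1) + self.pos_embedding
        x = self.encoder(x)
        return self.head(x[:, 0])                       # predict from [CLS] token
```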

OUTLINE:

  • 0:00 – Introduction
  • 0:30 – Double-Blind Review is Broken
  • 5:20 – Overview
  • 6:55 – Transformers for Images
  • 10:40 – Vision Transformer Architecture
  • 16:30 – Experimental Results
  • 18:45 – What does the Model Learn?
  • 21:00 – Why Transformers are Ruining Everything
  • 27:45 – Inductive Biases in Transformers
  • 29:05 – Conclusion & Comments

Related resources:

  • Paper (Under Review): https://openreview.net/forum?id=YicbFdNTTy

In this deeplizard episode, learn how to prepare and process a custom dataset of sign language digits, which will be used to train our fine-tuned MobileNet model in a future episode.
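As a rough preview of what the episode builds, here is a minimal Keras sketch, assuming the images have already been sorted into train/valid/test directories with one subfolder per class (the paths below are placeholders):

```python
# Directory-based data pipeline with MobileNet-style preprocessing.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.mobilenet import preprocess_input

train_batches = ImageDataGenerator(preprocessing_function=preprocess_input) \
    .flow_from_directory('data/Sign-Language-Digits/train',
                         target_size=(224, 224), batch_size=10)
valid_batches = ImageDataGenerator(preprocessing_function=preprocess_input) \
    .flow_from_directory('data/Sign-Language-Digits/valid',
                         target_size=(224, 224), batch_size=10)
# Keep the test set unshuffled so predictions line up with true labels.
test_batches = ImageDataGenerator(preprocessing_function=preprocess_input) \
    .flow_from_directory('data/Sign-Language-Digits/test',
                         target_size=(224, 224), batch_size=10, shuffle=False)
```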

VIDEO SECTIONS

  • 00:00 Welcome to DEEPLIZARD – Go to deeplizard.com for learning resources
  • 00:40 Obtain the Data
  • 01:30 Organize the Data
  • 09:42 Process the Data
  • 13:11 Collective Intelligence and the DEEPLIZARD HIVEMIND

deeplizard introduces MobileNets, a class of lightweight deep convolutional neural networks that are much smaller and faster than many other popular models.
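For reference, loading MobileNet and getting ImageNet predictions in Keras looks roughly like this (a minimal sketch; 'some_image.jpg' is a placeholder path):

```python
import numpy as np
from tensorflow.keras.applications import mobilenet
from tensorflow.keras.preprocessing import image

model = mobilenet.MobileNet()                      # ImageNet-pretrained weights

img = image.load_img('some_image.jpg', target_size=(224, 224))
x = image.img_to_array(img)                        # (224, 224, 3)
x = np.expand_dims(x, axis=0)                      # add batch dimension
x = mobilenet.preprocess_input(x)                  # scale pixels to [-1, 1]

preds = model.predict(x)
print(mobilenet.decode_predictions(preds, top=5))  # top-5 ImageNet labels
```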

VIDEO SECTIONS

  • 00:00 Welcome to DEEPLIZARD – Go to deeplizard.com for learning resources
  • 00:17 Intro to MobileNets
  • 02:56 Accessing MobileNet with Keras
  • 07:25 Getting Predictions from MobileNet
  • 13:32 Collective Intelligence and the DEEPLIZARD HIVEMIND

vcubingx provides a visual introduction to the structure of an artificial neural network.

The Neural Network, A Visual Introduction | Visualizing Deep Learning, Chapter 1

  • 0:00 Intro
  • 1:55 One input Perceptron
  • 3:30 Two input Perceptron
  • 4:40 Three input Perceptron
  • 5:17 Activation Functions
  • 6:58 Neural Network
  • 9:45 Visualizing 2-2-2 Network
  • 10:59 Visualizing 2-3-2 Network
  • 12:33 Classification
  • 13:05 Outro
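To complement the visual walkthrough, here is a minimal NumPy sketch of the same building blocks: a perceptron as a weighted sum plus bias passed through an activation function, and a small 2-2-2 network as two such layers composed (the weights here are arbitrary illustrations):

```python
import numpy as np

def sigmoid(z):
    # One common choice of activation function (5:17 chapter).
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(inputs, weights, bias):
    # Weighted sum of inputs plus bias, passed through the activation.
    return sigmoid(np.dot(weights, inputs) + bias)

# Two-input perceptron, as in the 3:30 chapter.
x = np.array([0.5, -1.0])
w = np.array([0.8, 0.2])
print(perceptron(x, w, bias=0.1))

# A 2-2-2 network (9:45 chapter) is just two such layers composed.
W1, b1 = np.full((2, 2), 0.5), np.zeros(2)   # first layer:  2 -> 2
W2, b2 = np.full((2, 2), 0.5), np.zeros(2)   # second layer: 2 -> 2
hidden = sigmoid(W1 @ x + b1)
output = sigmoid(W2 @ hidden + b2)
print(output)
```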

In this video, Mandy from deeplizard demonstrates how to use the fine-tuned VGG16 Keras model that we trained in the last episode to predict on the images of cats and dogs in our test set.
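In code, the prediction step looks roughly like this (a minimal sketch; it assumes the fine-tuned `model` and a non-shuffled `test_batches` generator from the earlier episodes, both placeholder names here):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Predict on the whole test set and take the most likely class per image.
predictions = model.predict(test_batches, verbose=0)
predicted_labels = np.argmax(predictions, axis=-1)

# test_batches.classes holds the true labels in generator order,
# which is why the test generator must not be shuffled.
cm = confusion_matrix(y_true=test_batches.classes, y_pred=predicted_labels)
print(cm)  # rows: true cat/dog, columns: predicted cat/dog
```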

Index:

  • 00:00 Welcome to DEEPLIZARD – Go to deeplizard.com for learning resources
  • 00:17 Predict with a Fine-tuned Model
  • 05:40 Plot Predictions With A Confusion Matrix
  • 05:16 Collective Intelligence and the DEEPLIZARD HIVEMIND