Yannic Kilcher explains why transformers are ruining convolutions.
This paper, under review at ICLR, shows that, given enough data, a standard Transformer can outperform convolutional neural networks on image recognition tasks, a class of tasks where CNNs classically excel. In this video, I explain the architecture of the Vision Transformer (ViT), the reason why it works better, and rant about why double-blind peer review is broken.
OUTLINE:
0:00 – Introduction
0:30 – Double-Blind Review is Broken
5:20 – Overview
6:55 – Transformers for Images
10:40 – Vision Transformer Architecture
16:30 – Experimental Results
18:45 – What does the Model Learn?
21:00 – Why Transformers are Ruining Everything
27:45 – Inductive Biases in Transformers
29:05 – Conclusion & Comments
Related resources:
Paper (Under Review): https://openreview.net/forum?id=YicbFdNTTy
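To make the idea concrete, here is a minimal Keras sketch of the Vision Transformer described above: the image is cut into 16x16 patches, each patch is linearly embedded, learned position embeddings are added, and the sequence runs through a standard Transformer encoder. The layer sizes, the single encoder block, and the average-pooling head (in place of the paper's [CLS] token) are simplifying assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the Vision Transformer idea (not the paper's exact model).
import tensorflow as tf

image_size, patch_size, d_model, num_classes = 224, 16, 768, 1000
num_patches = (image_size // patch_size) ** 2  # 14 * 14 = 196


class AddPositionEmbedding(tf.keras.layers.Layer):
    """Adds a learned positional embedding to every patch embedding."""
    def build(self, input_shape):
        self.pos_embed = self.add_weight(
            name="pos_embed", shape=(1, input_shape[1], input_shape[2]))

    def call(self, x):
        return x + self.pos_embed


inputs = tf.keras.Input(shape=(image_size, image_size, 3))

# Cut the image into non-overlapping 16x16 patches and linearly project each one
# to a d_model-dimensional embedding (a strided Conv2D does both steps at once).
x = tf.keras.layers.Conv2D(d_model, patch_size, strides=patch_size)(inputs)
x = tf.keras.layers.Reshape((num_patches, d_model))(x)
x = AddPositionEmbedding()(x)

# One standard pre-norm Transformer encoder block (the paper stacks many of these).
h = tf.keras.layers.LayerNormalization()(x)
x = x + tf.keras.layers.MultiHeadAttention(num_heads=12, key_dim=d_model // 12)(h, h)
h = tf.keras.layers.LayerNormalization()(x)
h = tf.keras.layers.Dense(4 * d_model, activation="gelu")(h)
x = x + tf.keras.layers.Dense(d_model)(h)

# Pool the patch embeddings and classify (the paper uses an extra [CLS] token instead).
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
```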
In this episode, Mandy from deeplizard builds on what we’ve learned about MobileNet, combined with the fine-tuning techniques from earlier episodes, to fine-tune MobileNet for a custom image data set using TensorFlow’s Keras API.
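The following is a minimal sketch of that fine-tuning workflow, not the exact notebook from the episode: it assumes a pre-trained MobileNet base, a frozen convolutional stack, and a new 10-class softmax head (the class count is an assumption based on the sign language digits data set used in this series).

```python
# Rough sketch of fine-tuning MobileNet on a custom data set with the Keras API.
import tensorflow as tf

# Load MobileNet pre-trained on ImageNet, without its original 1000-class head.
base = tf.keras.applications.MobileNet(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False  # freeze the convolutional base; only the new head trains

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(10, activation="softmax"),  # assumed: 10 sign-language digit classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_batches, validation_data=valid_batches, epochs=10)
```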
In this deeplizard episode, learn how to prepare and process our own custom data set of sign language digits, which will be used to train our fine-tuned MobileNet model in a future episode.
VIDEO SECTIONS
00:00 Welcome to DEEPLIZARD – Go to deeplizard.com for learning resources
00:40 Obtain the Data
01:30 Organize the Data
09:42 Process the Data
13:11 Collective Intelligence and the DEEPLIZARD HIVEMIND
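Here is a rough sketch of the data-processing step described in the episode above, assuming the sign language digit images have already been organized into train/valid/test directories with one sub-folder per class; the directory paths and batch size are placeholder assumptions, not the episode's exact values.

```python
# Build batched, MobileNet-preprocessed image iterators from directories on disk.
import tensorflow as tf

preprocess = tf.keras.applications.mobilenet.preprocess_input

train_batches = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=preprocess).flow_from_directory(
    directory="Sign-Language-Digits-Dataset/train",  # assumed path
    target_size=(224, 224), batch_size=10)
valid_batches = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=preprocess).flow_from_directory(
    directory="Sign-Language-Digits-Dataset/valid",  # assumed path
    target_size=(224, 224), batch_size=10)
test_batches = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=preprocess).flow_from_directory(
    directory="Sign-Language-Digits-Dataset/test",   # assumed path
    target_size=(224, 224), batch_size=10, shuffle=False)  # keep order for evaluation
```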
deeplizard introduces MobileNets, a class of lightweight deep convolutional neural networks that are vastly smaller and faster than many other popular models.
VIDEO SECTIONS
00:00 Welcome to DEEPLIZARD – Go to deeplizard.com for learning resources
00:17 Intro to MobileNets
02:56 Accessing MobileNet with Keras
07:25 Getting Predictions from MobileNet
13:32 Collective Intelligence and the DEEPLIZARD HIVEMIND
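As an illustration of accessing MobileNet with Keras and getting predictions from it, here is a short sketch; the sample image path is a placeholder assumption.

```python
# Load a pre-trained MobileNet and get its top-5 ImageNet predictions for one image.
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNet(weights="imagenet")  # full model with ImageNet head

# Load an image, resize it to MobileNet's expected 224x224 input, and preprocess it.
img = tf.keras.preprocessing.image.load_img("samples/lizard.jpg", target_size=(224, 224))
x = tf.keras.preprocessing.image.img_to_array(img)
x = tf.keras.applications.mobilenet.preprocess_input(np.expand_dims(x, axis=0))

preds = model.predict(x)
print(tf.keras.applications.imagenet_utils.decode_predictions(preds, top=5))
```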
In this video, Mandy from deeplizard demonstrates how to use the fine-tuned VGG16 Keras model that we trained in the last episode to predict on images of cats and dogs in our test set.
VIDEO SECTIONS
00:00 Welcome to DEEPLIZARD – Go to deeplizard.com for learning resources
00:17 Predict with a Fine-tuned Model
05:40 Plot Predictions with a Confusion Matrix
05:16 Collective Intelligence and the DEEPLIZARD HIVEMIND
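A minimal sketch of the prediction step described above, assuming `model` is the fine-tuned VGG16 network and `test_batches` is a Keras DirectoryIterator built with shuffle=False so the predictions line up with the true labels; both names are assumptions carried over from this workflow, not the episode's exact code.

```python
# Predict on the test batches and recover class indices for cats and dogs.
import numpy as np

predictions = model.predict(test_batches, verbose=0)
predicted_classes = np.argmax(predictions, axis=-1)  # e.g. 0 = cat, 1 = dog
true_classes = test_batches.classes                  # ground-truth labels, in directory order
```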
deeplizard demonstrates how to create a confusion matrix, which helps us visually assess how well a neural network is predicting during inference.
VIDEO SECTIONS
00:00 Welcome to DEEPLIZARD – Go to deeplizard.com for learning resources
00:34 Plotting a Confusion Matrix
02:48 Reading a Confusion Matrix
04:56 Collective Intelligence and the DEEPLIZARD HIVEMIND
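Here is an illustrative sketch of creating and plotting a confusion matrix with scikit-learn and matplotlib (not necessarily the episode's exact code); it assumes `true_classes` and `predicted_classes` from the prediction sketch above, and the cat/dog labels from that test set.

```python
# Build a confusion matrix from true vs. predicted classes and plot it.
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

cm = confusion_matrix(y_true=true_classes, y_pred=predicted_classes)

# Rows are the true labels, columns the predicted labels; the diagonal counts
# correct predictions, everything off the diagonal is a misclassification.
ConfusionMatrixDisplay(cm, display_labels=["cat", "dog"]).plot(cmap=plt.cm.Blues)
plt.show()
```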