Visual scenes are often composed of sets of independent objects. Yet, current vision models make no assumptions about the nature of the pictures they look at.

Yannic Kilcher explores a paper on object-centric learning.

By imposing an objectness prior, this paper proposes a module that is able to recognize permutation-invariant sets of objects from pixels in both supervised and unsupervised settings. It does so by introducing a slot attention module that combines an attention mechanism with dynamic routing.
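To make the idea of attention combined with iterative routing more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the function name `slot_attention_sketch`, the random projection matrices, and the plain overwrite of the slots at each step are all hypothetical simplifications, and the published module has learned components that are omitted here. The sketch only shows the core loop: slots compete for each input feature through a softmax taken over the slot axis, and each slot is then replaced by a weighted mean of the inputs it won.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention_sketch(inputs, num_slots=4, num_iters=3, seed=0, eps=1e-8):
    """Route N input feature vectors into num_slots slots (simplified sketch)."""
    rng = np.random.default_rng(seed)
    n, d = inputs.shape

    # Assumption: random projections stand in for learned query/key/value maps.
    w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))

    # Slots start from random noise; they carry no fixed order or identity.
    slots = rng.normal(size=(num_slots, d))

    k = inputs @ w_k  # (n, d)
    v = inputs @ w_v  # (n, d)

    for _ in range(num_iters):
        q = slots @ w_q                     # (num_slots, d)
        logits = k @ q.T / np.sqrt(d)       # (n, num_slots)

        # Softmax over the slot axis: slots compete for each input.
        attn = softmax(logits, axis=1)

        # Normalize per slot so each update is a weighted mean of the inputs.
        weights = attn / (attn.sum(axis=0, keepdims=True) + eps)

        # Assumption: overwrite the slots directly instead of a learned update.
        slots = weights.T @ v               # (num_slots, d)

    return slots, attn


if __name__ == "__main__":
    features = np.random.default_rng(1).normal(size=(16, 8))
    slots, attn = slot_attention_sketch(features, num_slots=3)
    print(slots.shape, attn.shape)
```

Because every slot update is a sum over the inputs, shuffling the order of the input features leaves the resulting slots unchanged for a fixed slot initialization, which is the sense in which the inputs are treated as a set.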

Content index:

  • 0:00 – Intro & Overview
  • 1:40 – Problem Formulation
  • 4:30 – Slot Attention Architecture
  • 13:30 – Slot Attention Algorithm
  • 21:30 – Iterative Routing Visualization
  • 29:15 – Experiments
  • 36:20 – Inference Time Flexibility
  • 38:35 – Broader Impact Statement
  • 42:05 – Conclusion & Comments

Yannic Kilcher retraces his first reading of Facebook AI’s DETR paper and explains his process of understanding it.

OUTLINE:

  • 0:00 – Introduction
  • 1:25 – Title
  • 4:10 – Authors
  • 5:55 – Affiliation
  • 7:40 – Abstract
  • 13:50 – Pictures
  • 20:30 – Introduction
  • 22:00 – Related Work
  • 24:00 – Model
  • 30:00 – Experiments
  • 41:50 – Conclusions & Abstract
  • 42:40 – Final Remarks

Original Video about DETR: https://youtu.be/T35ba_VXkMY