Yannic Kilcher explores a recent innovation at Facebook

Code migration between languages is an expensive and laborious task. To translate from one language to another, one needs to be an expert in both. Current automatic tools often produce illegible and complicated code. This paper applies unsupervised neural machine translation to Python, C++, and Java source code and is able to translate between them without ever being trained in a supervised fashion.

Paper: https://arxiv.org/abs/2006.03511

Content index:

  • 0:00 – Intro & Overview
  • 1:15 – The Transcompiling Problem
  • 5:55 – Neural Machine Translation
  • 8:45 – Unsupervised NMT
  • 12:55 – Shared Embeddings via Token Overlap
  • 20:45 – MLM Objective
  • 25:30 – Denoising Objective
  • 30:10 – Back-Translation Objective
  • 33:00 – Evaluation Dataset
  • 37:25 – Results
  • 41:45 – Tokenization
  • 42:40 – Shared Embeddings
  • 43:30 – Human-Aware Translation
  • 47:25 – Failure Cases
  • 48:05 – Conclusion