Machine Learning with Phil show you how to do sentiment analysis with TensorFlow 2 in this natural language processing (NLP) tutorial.

This natural language processing model is relatively straight forward, as it’s just an encoder coupled to some bidirectional layers and a couple dense layers to handle the classification. We’ll compare two different models, one with a single LSTM layer and the other with two LSTM layers and some dropout.

Here’s a talk by Danny Luo Pre-training of Deep Bidirectional Transformers for Language Understanding

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%.Toronto Deep Learning Series, 6 November 2018

Paper: https://arxiv.org/abs/1810.04805

Natural language processing (NLP) powered by deep learning is about to change the game for many organizations interested in AI, thanks in particular to BERT (Bidirectional Encoder Representations from Transformers).

Watch this webinar if you want to learn how BERT will power a new wave of language-based applications, from sentiment analysis to automatic text summarization to similarity assessment and more.

Microsoft Research features a talk by Wei Wen on Efficient and Scalable Deep Learning (slides)

In deep learning, researchers keep gaining higher performance by using larger models. However, there are two obstacles blocking the community to build larger models: (1) training larger models is more time-consuming, which slows down model design exploration, and (2) inference of larger models is also slow, which disables their deployment to computation constrained applications. In this talk, I will introduce some of our efforts to remove those obstacles. On the training side, we propose TernGrad to reduce communication bottleneck to scale up distributed deep learning; on the inference side, we propose structurally sparse neural networks to remove redundant neural components for faster inference. At the end, I will very briefly introduce (1) my recent efforts to accelerate AutoML, and (2) future work to utilize my research to overcome scaling issues in Natural Language Processing.

See more on this talk at Microsoft Research:
https://www.microsoft.com/en-us/research/video/efficient-and-scalable-deep-learning/

Yannic Kilcher investigates BERT and the white paper associated with it https://arxiv.org/abs/1810.04805

Abstract:We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%.