In my Data Point earlier today, I mentioned how Google open sourced ALBERT yesterday.

ALBERT is an NLP model based on the revolutionary BERT model the company released last year.

ALBERT has been released as an open source implementation on top of TensorFlow. It reduces model sizes in two ways: by sharing parameters across the hidden layers of the network and by factorising the embedding layer. According to a report by i-programmer, Google has made ALBERT (A Lite BERT) […]
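The two size reductions mentioned above can be illustrated with some rough parameter arithmetic. This is only a sketch: the function names, the vocabulary, hidden and embedding sizes, and the per-layer parameter count are illustrative assumptions, not figures from the post or from the ALBERT release.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Count embedding parameters.

    Without factorisation the table is vocab_size x hidden_size.
    With factorisation it becomes vocab_size x embedding_size plus a
    projection of embedding_size x hidden_size.
    """
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(per_layer_params, num_layers, shared):
    """Count encoder parameters; sharing stores one layer's weights once."""
    return per_layer_params if shared else per_layer_params * num_layers


# Illustrative sizes (assumptions, chosen only to show the effect).
V, H, E = 30_000, 768, 128
unfactorised = embedding_params(V, H)
factorised = embedding_params(V, H, E)

# Hypothetical per-layer count, used only to show the effect of sharing.
PER_LAYER, LAYERS = 7_000_000, 12
unshared = encoder_params(PER_LAYER, LAYERS, shared=False)
shared = encoder_params(PER_LAYER, LAYERS, shared=True)

print(f"embedding: {unfactorised:,} -> {factorised:,}")
print(f"encoder:   {unshared:,} -> {shared:,}")
```

With these assumed sizes the embedding table shrinks from about 23 million to about 3.9 million parameters, and sharing one set of layer weights across 12 layers divides the encoder count by 12.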

Machine Learning with Phil shows you how to do sentiment analysis with TensorFlow 2 in this natural language processing (NLP) tutorial.

This natural language processing model is relatively straightforward, as it’s just an encoder coupled to some bidirectional layers and a couple of dense layers to handle the classification. We’ll compare two different models, one with a single LSTM layer and the other with two LSTM layers and some dropout.

Here’s a talk by Danny Luo: Pre-training of Deep Bidirectional Transformers for Language Understanding

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%.

Toronto Deep Learning Series, 6 November 2018

Paper: https://arxiv.org/abs/1810.04805

BERT is one of the most popular algorithms in the NLP spectrum, known for producing state-of-the-art results in a variety of language modeling tasks.

Built on top of transformers and sequence-to-sequence models, the Bidirectional Encoder Representations from Transformers is a powerful NLP modeling technique that sits at the cutting edge.

Here’s a great write-up on how to build a BERT classifier model in TF 2.0.

The success of BERT has not only made it the power behind the top search engine known to mankind but has also inspired and paved the way for many new and better models. Given below are some of the popular NLP models and algorithms which were inspired by BERT:

Natural language processing (NLP) powered by deep learning is about to change the game for many organizations interested in AI, thanks in particular to BERT (Bidirectional Encoder Representations from Transformers).

Watch this webinar if you want to learn how BERT will power a new wave of language-based applications, from sentiment analysis to automatic text summarization to similarity assessment and more.

Microsoft Research features a talk by Wei Wen on Efficient and Scalable Deep Learning (slides)

In deep learning, researchers keep gaining higher performance by using larger models. However, there are two obstacles blocking the community from building larger models: (1) training larger models is more time-consuming, which slows down model design exploration, and (2) inference of larger models is also slow, which prevents their deployment to computation-constrained applications. In this talk, I will introduce some of our efforts to remove those obstacles. On the training side, we propose TernGrad to reduce the communication bottleneck and scale up distributed deep learning; on the inference side, we propose structurally sparse neural networks to remove redundant neural components for faster inference. At the end, I will very briefly introduce (1) my recent efforts to accelerate AutoML, and (2) future work to utilize my research to overcome scaling issues in Natural Language Processing.

See more on this talk at Microsoft Research:
https://www.microsoft.com/en-us/research/video/efficient-and-scalable-deep-learning/

Here’s a great article on three techniques for pre-processing raw text input for use in text classification/natural language processing applications.

Modern neural networks cannot interpret labeled text as described above, so data must be pre-processed before it can be given to a network for training. One straightforward way to do this is with a bag of words. A bag of words is created by scanning through every element in a data set and creating a dictionary entry for each unique word seen, so that the dictionary can act as an index.
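The bag-of-words idea described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the article's own code: the function names, the lowercase-and-split-on-whitespace tokenisation, and the two sample sentences are all assumptions made for the example.

```python
from collections import Counter


def build_vocabulary(documents):
    """Scan every document and give each unique word an integer index."""
    vocab = {}
    for doc in documents:
        for word in doc.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def bag_of_words(document, vocab):
    """Turn one document into a vector of word counts, ordered by index."""
    vector = [0] * len(vocab)
    for word, count in Counter(document.lower().split()).items():
        if word in vocab:  # words outside the vocabulary are ignored
            vector[vocab[word]] = count
    return vector


# Hypothetical sample data, used only for illustration.
documents = ["the movie was great", "the movie was bad bad"]
vocab = build_vocabulary(documents)
vectors = [bag_of_words(doc, vocab) for doc in documents]

print(vocab)
print(vectors)
```

Each document becomes a fixed-length vector of counts that a network can consume. Word order is discarded, which is the main trade-off of this representation.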