Here’s a talk by Danny Luo on “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”.

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%.

Toronto Deep Learning Series, 6 November 2018


Malte Pietsch delivers this keynote on “Transfer Learning – Entering a new era in NLP” at PyData Warsaw 2019.

Transfer learning has been changing the NLP landscape tremendously since the release of BERT one year ago. Transformers of all kinds have emerged, now dominate most research leaderboards and have made their way into industrial applications. In this talk we will dissect the paradigm of transfer learning and its effects on pipelines, modelling and the engineer’s mindset.

There are 250 billion microcontrollers in the world today. 28.1 billion units were sold in 2018 alone, and IC Insights forecasts annual shipment volume to grow to 38.2 billion by 2023.

What if they all became smart? How would that change our world?


TinyML broadly encapsulates the field of machine learning technologies capable of performing on-device analytics of sensor data at extremely low power. Between hardware advancements and the TinyML community’s recent innovations in machine learning, it is now possible to run increasingly complex deep learning models (the foundation of most modern artificial intelligence applications) directly on microcontrollers. A quick glance under the hood shows this is fundamentally possible because deep learning models are compute-bound, meaning their efficiency is limited by the time it takes to complete a large number of arithmetic operations. Advancements in TinyML have made it possible to run these models on existing microcontroller hardware.
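
To make the compute-bound point concrete, here is a rough Python sketch that counts the multiply-accumulate (MAC) operations in a small fully connected network and turns that count into a back-of-the-envelope latency estimate. The layer sizes, the 64 MHz clock and the one-MAC-per-cycle rate are illustrative assumptions, not figures from the article or from any particular chip.

```python
def dense_macs(layer_sizes):
    """Count multiply-accumulate operations for one forward pass.

    A dense layer with n_in inputs and n_out outputs needs n_in * n_out MACs.
    Bias additions and activation functions are ignored in this rough count.
    """
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def estimated_latency_ms(macs, clock_hz, macs_per_cycle=1.0):
    """Latency estimate that assumes arithmetic is the only bottleneck."""
    cycles = macs / macs_per_cycle
    return 1000.0 * cycles / clock_hz


# Hypothetical small model: 490 input features, two hidden layers, 4 output classes.
LAYERS = [490, 128, 64, 4]
# Hypothetical microcontroller clock speed; real parts vary widely.
CLOCK_HZ = 64_000_000

macs = dense_macs(LAYERS)
latency = estimated_latency_ms(macs, CLOCK_HZ)
print(f"{macs} MACs, roughly {latency:.2f} ms per inference")
```

Under these assumptions the estimate scales linearly with the operation count, which is why shrinking a model (fewer or narrower layers) is one direct way to make it fit a microcontroller's budget. Real hardware also spends time on memory access, so treat the number as a lower bound at best.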

Since it’s still January, we can still make predictions for the year.

2020 will see further democratization of machine learning tools and a lower point of entry for their usage.

This will make data science/AI even more commonplace not only among top tech companies, but also small and medium-sized businesses across various verticals.

However, one aspect that is potentially underrated when looking at the big trends in the future of data science is the language frameworks that make everyday data science tasks possible. Today, there are two major frameworks, R and Python (or in more pragmatic data science circles, both!). One is praised for having the most beautifully designed data wrangling syntax and plotting libraries, the other for its expressiveness and for having the best deep learning libraries available today.

Will “Network Execubots” decide what films and TV shows get made?

The Hollywood Reporter has just reported that Warner Bros. has signed a deal with a tech company to implement an “AI-driven film management” system.

The system, which may sound more like an administrative tool than an industry game changer, will help the major studio decide which projects receive the proverbial green light: a task that’s daunting for humans, but a potential walk in the park for computer algorithms.

According to THR, the system, created by the Los Angeles-based company Cinelytic, uses “comprehensive data and predictive analytics” to help “guide decision-making at the greenlight stage.” THR also says that Cinelytic’s tech can “assess the value of a star in any territory,” and even predict how well a film will perform in theaters and secondary markets.

This full 6+ hour course provides a complete introduction to Graph Theory algorithms in computer science.


Course created by William Fiset. Check out his YouTube channel:

⭐️ Course Contents ⭐️
⌨️ (0:00:00) Graph Theory Introduction
⌨️ (0:13:53) Problems in Graph Theory
⌨️ (0:23:15) Depth First Search Algorithm
⌨️ (0:33:18) Breadth First Search Algorithm
⌨️ (0:40:27) Breadth First Search grid shortest path
⌨️ (0:56:23) Topological Sort Algorithm
⌨️ (1:09:52) Shortest/Longest path on a Directed Acyclic Graph (DAG)
⌨️ (1:19:34) Dijkstra’s Shortest Path Algorithm
⌨️ (1:43:17) Dijkstra’s Shortest Path Algorithm | Source Code
⌨️ (1:50:47) Bellman Ford Algorithm
⌨️ (2:05:34) Floyd Warshall All Pairs Shortest Path Algorithm
⌨️ (2:20:54) Floyd Warshall All Pairs Shortest Path Algorithm | Source Code
⌨️ (2:29:19) Bridges and Articulation points Algorithm
⌨️ (2:49:01) Bridges and Articulation points source code
⌨️ (2:57:32) Tarjan’s Strongly Connected Components algorithm
⌨️ (3:13:56) Tarjan’s Strongly Connected Components algorithm source code
⌨️ (3:20:12) Travelling Salesman Problem | Dynamic Programming
⌨️ (3:39:59) Travelling Salesman Problem source code | Dynamic Programming
⌨️ (3:52:27) Existence of Eulerian Paths and Circuits
⌨️ (4:01:19) Eulerian Path Algorithm
⌨️ (4:15:47) Eulerian Path Algorithm | Source Code
⌨️ (4:23:00) Prim’s Minimum Spanning Tree Algorithm
⌨️ (4:37:05) Eager Prim’s Minimum Spanning Tree Algorithm
⌨️ (4:50:38) Eager Prim’s Minimum Spanning Tree Algorithm | Source Code
⌨️ (4:58:30) Max Flow Ford Fulkerson | Network Flow
⌨️ (5:11:01) Max Flow Ford Fulkerson | Source Code
⌨️ (5:27:25) Unweighted Bipartite Matching | Network Flow
⌨️ (5:38:11) Mice and Owls problem | Network Flow
⌨️ (5:46:11) Elementary Math problem | Network Flow
⌨️ (5:56:19) Edmonds Karp Algorithm | Network Flow
⌨️ (6:05:18) Edmonds Karp Algorithm | Source Code
⌨️ (6:10:08) Capacity Scaling | Network Flow
⌨️ (6:19:34) Capacity Scaling | Network Flow | Source Code
⌨️ (6:25:04) Dinic’s Algorithm | Network Flow
⌨️ (6:36:09) Dinic’s Algorithm | Network Flow | Source Code

Tim Corey explores Entity Framework, an amazing set of tooling around data access.

With EFCore, that tooling becomes even more powerful. So why is it that he still doesn’t recommend that people use EFCore?

In this video, he walks you through the best practices of Entity Framework and EFCore and points out the pitfalls to avoid. He discusses where the problems are and what to do to resolve them.

In this episode, Serkant Karaca and Shubha Vijayasarathy from the Azure Event Hubs team talk about how and when to use Azure Event Hubs as the messaging component in .NET applications. They discuss use cases, cover topics like partitioning, and also show how to use the .NET SDK for Event Hubs.


Siraj Raval has a video exploring a paper about genomics and creating reliable machine learning systems.

Deep learning classifiers make the ladies (and gentlemen) swoon, but they often misclassify, with high confidence, novel data that’s not in the training set. This has serious real-world consequences! In medicine, this could mean misdiagnosing a patient. In autonomous vehicles, this could mean ignoring a stop sign. Machines are increasingly tasked with making life-or-death decisions like that, so it’s important that we figure out how to correct this problem! I found a new, relatively obscure yet extremely fascinating paper out of Google Research that tackles this problem head-on. In this episode, I’ll explain the work of these researchers, we’ll write some code, do some math, do some visualizations, and by the end I’ll freestyle rap about AI and genomics. I had a lot of fun making this, so I hope you enjoy it!
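
The paper named below, “Likelihood Ratios for Out-of-Distribution Detection”, scores an input by the difference between its log-likelihood under a model trained on in-distribution data and its log-likelihood under a “background” model trained on perturbed inputs. The Python sketch below illustrates only that scoring idea. The per-symbol frequency models, the toy DNA strings and every function name are hypothetical stand-ins, far simpler than the generative models in the paper, and the background data is hand-written here rather than produced by perturbation.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def fit_symbol_model(sequences, smoothing=1.0):
    """Fit a toy model: independent per-symbol probabilities with additive smoothing."""
    counts = Counter(ch for seq in sequences for ch in seq)
    total = sum(counts[ch] for ch in ALPHABET) + smoothing * len(ALPHABET)
    return {ch: (counts[ch] + smoothing) / total for ch in ALPHABET}


def log_likelihood(model, sequence):
    """Sum of per-symbol log-probabilities under the toy model."""
    return sum(math.log(model[ch]) for ch in sequence)


def likelihood_ratio_score(full_model, background_model, sequence):
    """Log-likelihood ratio: higher suggests in-distribution, lower suggests OOD."""
    return log_likelihood(full_model, sequence) - log_likelihood(background_model, sequence)


# Hypothetical in-distribution training data: sequences rich in G and C.
in_distribution = ["GGCCGCGG", "CCGGGCCG", "GCGCGGCC"]
# Hand-written stand-in for the background model's training set (assumption:
# the paper derives this set by perturbing in-distribution inputs instead).
background_data = ["ACGTACGT", "TGCATGCA", "GATCCTAG"]

full = fit_symbol_model(in_distribution)
background = fit_symbol_model(background_data)

for candidate in ["GCGGCCGC", "ATATTAAT"]:
    score = likelihood_ratio_score(full, background, candidate)
    print(candidate, round(score, 3))
```

In this toy setup the G/C-rich candidate gets a positive score and the A/T-rich one a negative score. Subtracting the background term is meant to cancel out generic statistics that both models share, so the score reflects what is specific to the in-distribution data.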

Likelihood Ratios for Out-of-Distribution Detection paper: 

The researchers’ code: