Here’s an interesting data visualization of neural networks that have been trained on the MNIST data set. All set to catchy music.
Here’s an interesting look at weight agnostic neural networks and what problems they solve. Very interesting read.
Not all neural network architectures are created equal; some perform much better than others for certain tasks. But how important are the weight parameters of a neural network compared to its architecture? In this work, we question to what extent neural network architectures alone, without learning any weight parameters, can encode solutions for a given task.
Here’s an interesting tutorial on prepping image data for CNNs.
It is challenging to know how best to prepare image data when training a convolutional neural network. This involves both scaling the pixel values and using augmentation techniques during both the training and evaluation of the model. Instead of testing a wide range of options, a useful shortcut […]
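As a minimal sketch of the scaling step (the batch shape and data here are invented for illustration, using NumPy in place of a real image loader), the two most common options look like this:

```python
import numpy as np

# Fake 8-bit image batch: (batch, height, width, channels), values in 0..255.
images = np.random.default_rng(0).integers(0, 256, size=(4, 32, 32, 3)).astype(np.float32)

# Option 1: rescale pixel values from [0, 255] to [0, 1].
scaled = images / 255.0

# Option 2: standardize per channel, using statistics that in practice
# would come from the training set only.
mean = scaled.mean(axis=(0, 1, 2))  # one mean per channel
std = scaled.std(axis=(0, 1, 2))    # one std per channel
standardized = (scaled - mean) / std
```

The same statistics computed on the training set would then be reused, unchanged, when evaluating the model.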
Here’s a great video on how to implement the forward method for a convolutional neural network (CNN) in PyTorch.
Here’s a great article on avoiding overfitting in deep neural networks.
No one is too legit to overfit.
Training a deep neural network that can generalize well to new data is a challenging problem. A model with too little capacity cannot learn the problem, whereas a model with too much capacity can learn it too well and overfit the training dataset. Both cases result in a model […]
Of all the machine learning algorithms, the most fascinating are neural networks. They don’t require statistical hypotheses or rigorous data preparation beyond normalization.
The power of a neural network lies in its architecture, its activation functions, its regularization, etc.
Here’s an interesting article exploring a particular kind of neural network: the autoencoder.
Fraud detection, a common use of AI, belongs to a more general class of problems — anomaly detection.
An anomaly is a generic, not domain-specific, concept. It refers to any exceptional or unexpected event in the data: a failing mechanical part, an arrhythmic heartbeat, or a fraudulent transaction.
Basically, identifying fraud means identifying an anomaly against a backdrop of legitimate transactions. As with any anomaly, you can never be truly sure what form a fraudulent transaction will take, so you need to take all possible “unknown” forms into account.
Here’s an interesting article on doing anomaly/fraud detection with a neural autoencoder.
Using a training set of just legitimate transactions, we teach a machine learning algorithm to reproduce the feature vector of each transaction, then perform a reality check on that reproduction. If the distance between the original transaction and the reproduced transaction is below a given threshold, the transaction is considered legitimate; otherwise it is flagged as a fraud candidate (the generative approach). In this case, we need only a training set of “normal” transactions, and we suspect an anomaly based on the distance value.
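A minimal sketch of this reconstruct-and-compare idea, using a linear PCA-style projection as a stand-in for the neural autoencoder (a real autoencoder plays the same role, nonlinearly) and invented toy data in place of real transactions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "legitimate transactions": points near a 2-D subspace of a 4-D feature space.
legit = rng.normal(0, 1, size=(500, 2)) @ rng.normal(0, 1, size=(2, 4))
legit += rng.normal(0, 0.05, size=legit.shape)  # small measurement noise

# Linear stand-in for the autoencoder: project onto the top-2 principal
# components and back. The bottleneck forces a lossy reconstruction.
mean = legit.mean(axis=0)
_, _, vt = np.linalg.svd(legit - mean, full_matrices=False)
components = vt[:2]

def reconstruct(x):
    return (x - mean) @ components.T @ components + mean

# Reconstruction distances on the all-legitimate training set fix the threshold.
train_err = np.linalg.norm(legit - reconstruct(legit), axis=1)
threshold = np.percentile(train_err, 99)

def is_fraud_candidate(x):
    # Flag any transaction whose reconstruction distance exceeds the threshold.
    err = np.linalg.norm(x - reconstruct(x), axis=1)
    return err > threshold

normal = legit[:5]                              # lie near the learned subspace
anomalies = rng.normal(0, 1, size=(5, 4)) * 3   # lie far off it
```

Here `is_fraud_candidate(anomalies)` should flag essentially all of the off-manifold points, while the legitimate transactions reconstruct almost perfectly and pass the check.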
Based on the histograms or on the box plots of the input features, a threshold can be identified. All transactions with input features beyond that threshold will be declared fraud candidates (discriminative approach). Usually, for this approach, a number of fraud and legitimate transaction examples are necessary to build the histograms or the box plots.
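The box-plot version of this threshold is just Tukey's fences. A minimal sketch on a single hypothetical input feature (the amounts below are invented for illustration):

```python
import numpy as np

# Hypothetical values of one input feature, e.g. transaction amount.
amounts = np.array([12.0, 7.5, 30.0, 22.0, 18.0, 9.9, 25.0, 14.0, 16.5, 480.0])

# Box-plot (Tukey) fence: values beyond Q3 + 1.5*IQR fall outside the whiskers.
q1, q3 = np.percentile(amounts, [25, 75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

# Transactions beyond the fence are declared fraud candidates.
fraud_candidates = amounts[amounts > upper_fence]
```

On this toy data only the 480.0 transaction clears the fence; in practice one fence (or histogram-based threshold) would be derived per input feature, from examples of both fraud and legitimate transactions.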
Here’s an interesting look at the cutting edge technologies just on the horizon of AI research and what problems they can potentially solve that current techniques can’t.
Artificial intelligence (AI) is dominated by pattern recognition techniques. Recently, major advances have been made in the fields of image recognition, machine translation, audio processing and several others thanks to the development and refinement of deep learning. But deep learning is not the cure for every problem. In fact, […]
Here’s an interesting article on “oscillatory neural networks” and how physicists trained one to perform image recognition.
An oscillatory neural network is a complex interlacing of interacting elements (oscillators) that are able to receive and transmit oscillations of a certain frequency. Receiving signals of various frequencies from preceding elements, the artificial neuron oscillator can synchronize its rhythm with these fluctuations. As a result, […]
Most of GPipe’s performance gains come from better memory allocation for AI models. On second-generation Google Cloud tensor processing units (TPUs), each of which contains eight processor cores and 64 GB of memory (8 GB per core), GPipe reduced intermediate memory usage from 6.26 GB to 3.46 GB, enabling a single accelerator core to hold 318 million parameters. Without GPipe, Huang says, a single core can only train up to 82 million model parameters.