In this tutorial, see how you can train a Convolutional Neural Network in PyTorch and convert it into an ONNX model.

Once the model is in ONNX format, you can import it into other frameworks such as TensorFlow, either for inference or for reuse through transfer learning.

This post is the third in a series of introductory tutorials on the Open Neural Network Exchange (ONNX), an initiative from AWS, Microsoft, and Facebook to define a standard for interoperability across machine learning platforms. See: Part 1, Part 2. In this tutorial, we will train a […]
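For reference, the export step itself is short. Below is a minimal sketch, assuming a small, already-trained CNN; the network, tensor shapes, and file name are illustrative, not from the tutorial itself:

```python
import torch
import torch.nn as nn

# A small illustrative CNN standing in for whatever model you trained.
class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 14 * 14, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = SmallCNN().eval()  # load trained weights here in practice
dummy_input = torch.randn(1, 1, 28, 28)  # one MNIST-sized sample

# Tracing the model with a dummy input produces the ONNX graph.
torch.onnx.export(
    model,
    dummy_input,
    "small_cnn.onnx",
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
)
```

The exported `small_cnn.onnx` file can then be loaded by any ONNX-compatible framework or runtime.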

The ONNX Runtime inference engine can execute ML models across different hardware environments, taking advantage of each platform's neural network acceleration capabilities.

Microsoft and Xilinx worked together to integrate ONNX Runtime with the Vitis AI software libraries to execute ONNX models on Xilinx U250 FPGAs. We are happy to introduce the preview release of this capability today.
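From the application's point of view, targeting the FPGA is just a matter of choosing an execution provider. A minimal sketch, assuming an ONNX Runtime build that includes the Vitis AI execution provider (the model path and input shape below are placeholders):

```python
import numpy as np
import onnxruntime as ort

# Request the Vitis AI execution provider, falling back to CPU
# when the FPGA stack is not available on this machine.
session = ort.InferenceSession(
    "model.onnx",
    providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print(session.get_providers())  # shows which providers were actually loaded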

Video index:

[06:15] Demo by PeakSpeed of satellite imagery orthorectification


By optimizing BERT for CPU, Microsoft has made inferencing affordable and cost-effective.

According to the published benchmark, BERT inferencing on an Azure Standard F16s_v2 CPU takes only 9 ms, which translates to a 17x increase in speed.
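For context, a hedged sketch of what CPU inferencing for a BERT-style ONNX model looks like with ONNX Runtime; the model path, input names, and shapes below are assumptions, not the benchmark's actual configuration:

```python
import numpy as np
import onnxruntime as ort

# Enable ONNX Runtime's full set of graph optimizations and size the
# thread pool for the machine (an F16s_v2 has 16 vCPUs).
opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
opts.intra_op_num_threads = 16

session = ort.InferenceSession(
    "bert.onnx", opts, providers=["CPUExecutionProvider"]
)

# Typical BERT inputs: token ids, attention mask, and segment ids.
batch, seq_len = 1, 128
inputs = {
    "input_ids": np.ones((batch, seq_len), dtype=np.int64),
    "attention_mask": np.ones((batch, seq_len), dtype=np.int64),
    "token_type_ids": np.zeros((batch, seq_len), dtype=np.int64),
}
outputs = session.run(None, inputs)
```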

Microsoft partnered with NVIDIA to optimize BERT for the GPUs powering the Azure NV6 Virtual Machines. The optimization included rewriting the neural network and implementing it with the TensorRT C++ APIs on top of the CUDA and cuBLAS libraries. The NV6 family of Azure VMs is powered by NVIDIA Tesla M60 GPUs. Microsoft claims that the improved Bing search platform, running the optimized model on NVIDIA GPUs, serves more than one million BERT inferences per second within Bing's latency limits.

What is the universal inference engine for neural networks?

Microsoft Research just posted this video exploring ONNX.

TensorFlow? PyTorch? Keras? There are many popular frameworks for working with deep learning and ML models, each with its own pros and cons for practical usability in product development and research. Once you choose a framework and train a model, you need to figure out how to deploy it onto your platform and architecture of choice. Cloud? Windows? Linux? IoT? Performance sensitive? How about GPU acceleration? With a landscape of 1,000,001 different combinations for deploying a trained model from some chosen framework into a performant production environment for prediction, we can all benefit from some standardization.

Accelerate and optimize machine learning models regardless of training framework using ONNX and ONNX Runtime. This episode introduces both ONNX and ONNX Runtime and provides an example of ONNX Runtime accelerating Bing Semantic Precise Image Search.


Did you know that you can now train machine learning models with Azure ML once and deploy them in the cloud (AKS/ACI) and on the edge (Azure IoT Edge) seamlessly, thanks to the ONNX Runtime inference engine?

In this new episode of the IoT Show, learn about ONNX Runtime, the Microsoft-built inference engine for ONNX models: it is cross-platform, works across training frameworks, and delivers on-par or better performance than existing inference engines.
From the description:
We will show how to train and containerize a machine learning model using Azure Machine Learning, then deploy the trained model to a container service in the cloud and to an Azure IoT Edge device across different hardware platforms: Intel, NVIDIA, and Qualcomm. A sketch of that deployment flow follows.
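Here is a rough sketch of the cloud half of that flow using the classic Azure ML Python SDK (v1); the model path, entry script, and service names below are placeholders, not the episode's actual code:

```python
from azureml.core import Environment, Workspace
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

# Connect to the workspace described by a local config.json.
ws = Workspace.from_config()

# Register the exported ONNX model ("model.onnx" is a placeholder path).
model = Model.register(ws, model_path="model.onnx", model_name="demo-onnx-model")

# Container environment with ONNX Runtime installed; "score.py" is an
# assumed entry script that loads the model and answers scoring requests.
env = Environment(name="onnx-env")
env.python.conda_dependencies = CondaDependencies.create(
    pip_packages=["onnxruntime", "numpy", "azureml-defaults"]
)
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Deploy to Azure Container Instances; an AKS or IoT Edge deployment
# swaps in a different deployment target but reuses the same model image.
deploy_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
service = Model.deploy(
    ws, "onnx-demo-service", [model], inference_config, deploy_config
)
service.wait_for_deployment(show_output=True)
```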

In this article from VentureBeat, read about Scott Guthrie’s excitement about ONNX.

“Even today with the ONNX workloads for AI, the compelling part is you can now build custom models or use our models, again using TensorFlow, PyTorch, Keras, whatever framework you want, and then know that you can hardware-accelerate it whether it’s on the latest Nvidia GPU, whether it’s on the new AMD GPUs, whether it’s on Intel FPGA, whether it’s on someone else’s FPGA or new silicon we might release in the future. That to me is more compelling than ‘do we have a better instruction set at the hardware level’ and generally what I find resonates best with customers.”

ONNX is an open format for representing deep learning models that is supported by various frameworks and tools. This format makes it easier to interoperate between frameworks and to maximize the reach of your hardware optimization investments.
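As a small illustration of that openness, any ONNX-aware tool can load the same model file, validate it against the spec, and inspect its graph ("model.onnx" below is a placeholder path):

```python
import onnx

# Load the serialized model, check it conforms to the ONNX spec,
# and print a human-readable view of its computation graph.
model = onnx.load("model.onnx")
onnx.checker.check_model(model)
print(onnx.helper.printable_graph(model.graph))
```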

In this episode, Seth Juarez (@sethjuarez) sits down with Rich to show us how we can use ONNX Runtime inside our .NET applications. He gives us a quick introduction to training a model with PyTorch and also explains some foundational concepts around prediction accuracy.
