In deep learning, researchers keep gaining higher performance by using larger models. However, there are two obstacles blocking the community to build larger models: (1) training larger models is more time-consuming, which slows down model design exploration, and (2) inference of larger models is also slow, which disables their deployment to computation constrained applications. In this talk, I will introduce some of our efforts to remove those obstacles. On the training side, we propose TernGrad to reduce communication bottleneck to scale up distributed deep learning; on the inference side, we propose structurally sparse neural networks to remove redundant neural components for faster inference. At the end, I will very briefly introduce (1) my recent efforts to accelerate AutoML, and (2) future work to utilize my research to overcome scaling issues in Natural Language Processing.
See more on this talk at Microsoft Research: