Last week, Google’s AI research division open-sourced GPipe, a library for “efficiently” training deep neural networks under Lingvo, a TensorFlow framework for sequence modeling.

Most of GPipe’s performance gains come from better memory allocation for AI models. On second-generation Google Cloud tensor processing units (TPUs), each of which contains eight processor cores and 64 GB of memory (8 GB per core), GPipe reduced intermediate memory usage from 6.26 GB to 3.46 GB, enabling a model with up to 318 million parameters to be trained on a single accelerator core. Without GPipe, Huang says, a single core can train only up to 82 million model parameters.
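A rough back-of-envelope calculation shows how those figures relate. This sketch only uses the numbers reported above; the implied per-parameter memory footprint is an illustration of the arithmetic, not GPipe’s actual memory accounting, and the assumption that the remaining per-core memory is what limits parameter count is ours.

```python
GB = 1024 ** 3

total_per_core_gb = 8.0   # HBM per core on a second-gen Cloud TPU (reported)
interm_without_gb = 6.26  # intermediate memory without GPipe (reported)
interm_with_gb = 3.46     # intermediate memory with GPipe (reported)

params_without = 82e6     # trainable parameters per core without GPipe
params_with = 318e6       # trainable parameters per core with GPipe

# Memory GPipe frees up for model weights and optimizer state
freed_gb = interm_without_gb - interm_with_gb

# Implied bytes available per parameter in each regime (illustrative only)
bpp_without = (total_per_core_gb - interm_without_gb) * GB / params_without
bpp_with = (total_per_core_gb - interm_with_gb) * GB / params_with

print(f"memory freed by GPipe: {freed_gb:.2f} GB")
print(f"parameter capacity gain: {params_with / params_without:.1f}x")
print(f"~{bpp_without:.0f} B/param without GPipe, ~{bpp_with:.0f} B/param with")
```

The ~2.8 GB of intermediate memory freed translates into nearly a 4x jump in how many parameters fit on one core, which is the headline gain the reported numbers describe.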
