Distributed training for neural machine translation
Kenneth Heafield - University of Edinburgh
Friday, October 20, 2017, 11:00 am-12:00 pm
Abstract

Training a machine translation system takes a week or two of
GPU time.  Bored waiting for systems to train, we thought it would be
fun to use multiple GPUs. But exchanging gradients across GPUs consumes
a lot of bandwidth, so optimizing communication is critical when scaling
to multiple nodes. Most prior work experimented with vision and speech
using convolutional neural networks, in which every parameter
participates in every data point. In contrast, machine translation's
recurrent networks are dominated by word embeddings, and only a few
distinct words appear in a batch of ~90 sentences, which makes the
gradients very skewed. We find these skewed gradients can be exploited
to compress communication, and we discuss strategies for further
optimization, such as tuning your momentum.
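
To illustrate how skewed gradients enable compression, here is a
minimal sketch of one such scheme, top-k gradient sparsification with
local error accumulation: each worker sends only the largest-magnitude
gradient entries and keeps the dropped remainder locally so no update
is lost. The function name, the NumPy formulation, and the 99% drop
ratio are illustrative assumptions, not details taken from the talk.

    import numpy as np

    def sparsify_gradient(grad, residual, drop_ratio=0.99):
        """Keep only the largest-magnitude gradient entries; accumulate
        the dropped remainder locally (error feedback) so it is added
        back on the next step. drop_ratio=0.99 (send ~1% of values)
        is an illustrative setting, not the talk's reported number.
        """
        g = grad + residual                     # re-add previously dropped values
        k = max(1, int(g.size * (1.0 - drop_ratio)))
        flat = np.abs(g).ravel()
        threshold = np.partition(flat, -k)[-k]  # k-th largest magnitude
        mask = np.abs(g) >= threshold
        new_residual = np.where(mask, 0.0, g)   # dropped values carry over
        indices = np.flatnonzero(mask)
        values = g.ravel()[indices]
        return indices, values, new_residual    # ~1% of the dense bandwidth

Each worker would then transmit the (indices, values) pairs instead of
the dense gradient, and the receiver scatters them into a zero buffer
before applying the update.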

Bio

Kenneth Heafield is a Lecturer (that's en-UK for Assistant
Professor) in computer science at the University of Edinburgh. Motivated
by machine translation problems, he takes a systems-heavy approach to
improving the quality and speed of neural systems. He is the creator of the
KenLM library for efficient language modeling.  The New York Times
reviewed his t-shirts as "threadbare."

This talk is organized by Marine Carpuat.