
Mask-Predict: Parallel Decoding of Conditional Masked Language Models

Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019 – 444 citations

Tags: Compositional Generalization · EMNLP · Interdisciplinary Approaches · Model Architecture · Multimodal · Semantic Representation · Neural Machine Translation

Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. This approach allows for efficient iterative decoding, where we first predict all of the target words non-autoregressively, and then repeatedly mask out and regenerate the subset of words that the model is least confident about. By applying this strategy for a constant number of iterations, our model improves state-of-the-art performance levels for non-autoregressive and parallel decoding translation models by over 4 BLEU on average. It is also able to reach within about 1 BLEU point of a typical left-to-right transformer model, while decoding significantly faster.
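The abstract describes the decoding loop concretely enough to illustrate in code. Below is a minimal, illustrative Python sketch of that loop, not the authors' released implementation: the `dummy_model` stand-in, the `MASK` token id, and the fixed target length are assumptions made for the example, while the linearly decaying number of re-masked tokens mirrors the schedule used in the paper.

```python
import numpy as np

MASK = -1  # hypothetical mask token id (assumption for this sketch)
RNG = np.random.default_rng(0)


def dummy_model(source_tokens, target_tokens, vocab_size=32):
    """Stand-in for the conditional masked language model: returns a
    (target_len, vocab_size) matrix of per-position token probabilities.
    A real CMLM would condition on the source and the unmasked target."""
    logits = RNG.normal(size=(len(target_tokens), vocab_size))
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    return probs / probs.sum(axis=1, keepdims=True)


def mask_predict(source_tokens, target_len, iterations=10, model=dummy_model):
    """Iterative parallel decoding: start from a fully masked target, then
    repeatedly re-mask and re-predict the least confident tokens. The number
    of re-masked tokens decays linearly over a constant number of iterations."""
    tokens = np.full(target_len, MASK, dtype=int)
    confidences = np.zeros(target_len)

    for t in range(iterations):
        if t == 0:
            masked = np.arange(target_len)                # first pass: everything is masked
        else:
            n = int(target_len * (iterations - t) / iterations)
            if n == 0:
                break
            masked = np.argsort(confidences)[:n]          # least confident positions
        tokens[masked] = MASK

        probs = model(source_tokens, tokens)              # one parallel forward pass
        tokens[masked] = probs.argmax(axis=1)[masked]     # regenerate only masked positions
        confidences[masked] = probs.max(axis=1)[masked]

    return tokens


print(mask_predict(source_tokens=[5, 9, 2], target_len=6, iterations=4))
```

In the paper the target length is itself predicted from the source before decoding begins; the sketch takes it as a given parameter to keep the example short.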

Similar Work