POINTER: Constrained Progressive Text Generation Via Insertion-based Generative Pre-training

Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, Bill Dolan. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020 – 67 citations

EMNLP Has Code Model Architecture Training Techniques

Large-scale pre-trained language models, such as BERT and GPT-2, have achieved excellent performance in language representation learning and free-form text generation. However, these models cannot be directly employed to generate text under specified lexical constraints. To address this challenge, we present POINTER (PrOgressive INsertion-based TransformER), a simple yet novel insertion-based approach for hard-constrained text generation. The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner. This procedure is recursively applied until a sequence is completed. The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable. We pre-train our model with the proposed progressive insertion-based objective on a 12GB Wikipedia dataset, and fine-tune it on downstream hard-constrained generation tasks. Non-autoregressive decoding yields an empirically logarithmic time complexity during inference. Experimental results on both News and Yelp datasets demonstrate that POINTER achieves state-of-the-art performance on constrained text generation. We release the pre-trained models and the source code to facilitate future research (https://github.com/dreasysnail/POINTER).
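
To make the progressive insertion procedure concrete, here is a minimal Python sketch of stage-wise decoding. The `predict_insertions` callable and the `toy_predictor` below are hypothetical placeholders for the pre-trained insertion Transformer (the actual model is in the linked repository); only the overall loop and the [NOI] no-insertion token follow the paper's description.

```python
NOI = "[NOI]"  # special "no insertion" marker: this slot is finished

def progressive_generate(constraints, predict_insertions, max_stages=10):
    """Grow a sentence around hard lexical constraints.

    Each stage asks the model for one prediction per insertion slot
    (len(tokens) + 1 slots: before, between, and after the current
    tokens) and applies all insertions in parallel. Decoding stops
    when every slot predicts NOI or the stage budget runs out.
    """
    tokens = list(constraints)
    for _ in range(max_stages):
        slot_preds = predict_insertions(tokens)  # one token per slot
        if all(p == NOI for p in slot_preds):
            break  # every slot is finished; the sequence is complete
        merged = []
        for pred, tok in zip(slot_preds, tokens + [None]):
            if pred != NOI:
                merged.append(pred)   # newly inserted token
            if tok is not None:
                merged.append(tok)    # existing token kept in place
        tokens = merged
    return tokens

# Toy predictor for illustration only: completes an example sentence
# in one stage, then reports NOI everywhere. A real predictor would
# run the fine-tuned insertion Transformer once per stage.
def toy_predictor(tokens):
    if tokens == ["sushi", "fantastic"]:
        return ["the", "was", "."]  # slots: before / between / after
    return [NOI] * (len(tokens) + 1)

print(progressive_generate(["sushi", "fantastic"], toy_predictor))
# -> ['the', 'sushi', 'was', 'fantastic', '.']
```

Because every slot can receive a token in the same stage, the sequence length can roughly double per stage, which is where the empirically logarithmic number of decoding stages comes from.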
