
Generating Training Data With Language Models: Towards Zero-shot Language Understanding

Yu Meng, Jiaxin Huang, Yu Zhang, Jiawei Han. arXiv 2022 – 74 citations

Tags: Compositional Generalization, Content Enrichment, Evaluation, Few Shot, Fine Tuning, Image Text Integration, Interactive Environments, Interdisciplinary Approaches, Model Architecture, Multimodal Semantic Representation, Neural Machine Translation, Productivity Enhancement, Prompting, Question Answering, RAG, Training Techniques, Variational Autoencoders

Pretrained language models (PLMs) have demonstrated remarkable performance in various natural language processing tasks: Unidirectional PLMs (e.g., GPT) are well known for their superior text generation capabilities; bidirectional PLMs (e.g., BERT) have been the prominent choice for natural language understanding (NLU) tasks. While both types of models have achieved promising few-shot learning performance, their potential for zero-shot learning has been underexplored. In this paper, we present a simple approach that uses both types of PLMs for fully zero-shot learning of NLU tasks without requiring any task-specific data: A unidirectional PLM generates class-conditioned texts guided by prompts, which are used as the training data for fine-tuning a bidirectional PLM. With quality training data selected based on the generation probability and regularization techniques (label smoothing and temporal ensembling) applied to the fine-tuning stage for better generalization and stability, our approach demonstrates strong performance across seven classification tasks of the GLUE benchmark (e.g., 72.3/73.8 on MNLI-m/mm and 92.8 on SST-2), significantly outperforming zero-shot prompting methods and even achieving results comparable to strong few-shot approaches that use 32 training samples per class.
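
The abstract describes a two-stage recipe: generate class-conditioned texts with a unidirectional PLM, keep the most probable generations as pseudo-labeled training data, then fine-tune a bidirectional PLM on them with regularization. The sketch below illustrates that pipeline under several assumptions: a binary sentiment task, Hugging Face `transformers` with `gpt2` as the generator and `bert-base-uncased` as the classifier, and hypothetical prompt wordings and filtering thresholds that are not the authors' exact settings. Temporal ensembling is omitted; only label smoothing is shown.

```python
# Minimal sketch of the two-stage approach described above (illustrative, not
# the authors' exact configuration). Assumes: torch and transformers installed.
import torch
from transformers import (
    GPT2LMHeadModel, GPT2TokenizerFast,
    BertForSequenceClassification, BertTokenizerFast,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# --- Stage 1: a unidirectional PLM generates class-conditioned texts ---
gen_tok = GPT2TokenizerFast.from_pretrained("gpt2")
gen_lm = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()

# Hypothetical label-descriptive prompts, one per class (not the paper's wording).
prompts = {
    0: "Write a negative movie review:",
    1: "Write a positive movie review:",
}

def generate_for_class(prompt, n=8, max_new_tokens=40):
    ids = gen_tok(prompt, return_tensors="pt").input_ids.to(device)
    out = gen_lm.generate(
        ids, do_sample=True, top_p=0.9, max_new_tokens=max_new_tokens,
        num_return_sequences=n, pad_token_id=gen_tok.eos_token_id,
    )
    # Strip the prompt tokens, keep only the generated continuation.
    texts = [gen_tok.decode(seq[ids.shape[1]:], skip_special_tokens=True) for seq in out]
    return [t.strip() for t in texts if t.strip()]

@torch.no_grad()
def avg_log_prob(text):
    """Quality score: mean per-token log-likelihood under the generator."""
    ids = gen_tok(text, return_tensors="pt").input_ids.to(device)
    loss = gen_lm(ids, labels=ids).loss  # mean negative log-likelihood
    return -loss.item()

# Keep only the most probable generations per class (quality filtering).
train_texts, train_labels = [], []
for label, prompt in prompts.items():
    cands = sorted(generate_for_class(prompt), key=avg_log_prob, reverse=True)
    keep = cands[: max(1, len(cands) // 2)]  # top half by generation probability
    train_texts += keep
    train_labels += [label] * len(keep)

# --- Stage 2: fine-tune a bidirectional PLM on the synthetic data ---
clf_tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
clf = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
).to(device)

batch = clf_tok(train_texts, padding=True, truncation=True, return_tensors="pt").to(device)
labels = torch.tensor(train_labels, device=device)

# Label smoothing is one of the paper's regularizers; temporal ensembling
# (filtering with an ensemble of past predictions) is not shown here.
loss_fn = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
optim = torch.optim.AdamW(clf.parameters(), lr=2e-5)

clf.train()
for _ in range(3):  # a few toy steps on a single batch
    logits = clf(**batch).logits
    loss = loss_fn(logits, labels)
    optim.zero_grad()
    loss.backward()
    optim.step()
```

In practice the paper generates far more candidates per class than this toy example and trains on the filtered set with a standard fine-tuning loop; the sketch only shows how generation probability can serve as the selection signal and where label smoothing enters the loss.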

Similar Work