
Language Self-Play For Data-Free Training

Jakub Grudzien Kuba, Mengting Gu, Qi Ma, Yuandong Tian, Vijai Mohan. No Venue, 2025

Tags: Compositional Generalization · Interdisciplinary Approaches · Multimodal · Semantic Representation · Reinforcement Learning · Tools · Training Techniques

Large language models (LLMs) have advanced rapidly in recent years, driven by scale, abundant high-quality training data, and reinforcement learning. Yet this progress faces a fundamental bottleneck: the need for ever more data from which models can continue to learn. In this work, we propose a reinforcement learning approach that removes this dependency by enabling models to improve without additional data. Our method leverages a game-theoretic framework of self-play, where a model’s capabilities are cast as performance in a competitive game and stronger policies emerge by having the model play against itself, a process we call Language Self-Play (LSP). Experiments with Llama-3.2-3B-Instruct on instruction-following benchmarks show that pretrained models can not only enhance their performance on challenging tasks through self-play alone, but can also do so more effectively than data-driven baselines.
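The abstract does not spell out the game's exact structure, so the loop below is only a minimal sketch of how such a data-free self-play setup could look: a single policy alternates between a Challenger role that proposes queries and a Solver role that answers them, with a scalar reward driving a reinforcement-learning update. Every name here (`generate`, `reward`, `policy_gradient_update`) is an illustrative placeholder, not the paper's actual algorithm or API.

```python
import random

# Sketch of a self-play training loop in the spirit of Language Self-Play (LSP).
# The abstract only says the model plays a competitive game against itself and is
# trained with RL; the role split, reward, and update rule below are assumptions.

def generate(policy, prompt):
    """Hypothetical stand-in for sampling text from an LLM policy."""
    return f"<sample from {policy['name']} given: {prompt!r}>"

def reward(query, answer):
    """Hypothetical scalar reward, e.g. from a reward model or verifier."""
    return random.random()

def policy_gradient_update(policy, episodes):
    """Placeholder for an RL update (e.g. a policy-gradient step) on episodes."""
    policy["steps"] += len(episodes)

policy = {"name": "Llama-3.2-3B-Instruct", "steps": 0}  # one shared model

for iteration in range(3):
    episodes = []
    for _ in range(4):
        # Challenger role: the model proposes a query meant to be difficult.
        query = generate(policy, "Pose a challenging instruction.")
        # Solver role: the same model attempts to answer its own query.
        answer = generate(policy, query)
        # Competitive objective: the Solver is rewarded for answer quality; in a
        # zero-sum reading, the Challenger would be credited with the negation.
        episodes.append((query, answer, reward(query, answer)))
    policy_gradient_update(policy, episodes)
```

Because both roles share one set of weights, no external dataset enters the loop; under this reading, the queries themselves are generated on-policy, which is what makes the training data-free.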

Similar Work