RL + Transformer = A General-purpose Problem Solver | Awesome LLM Papers Add your paper to Awesome LLM Papers

RL + Transformer = A General-purpose Problem Solver

Micah Rentschler, Jesse Roberts . No Venue 2025

[Paper]   Search on Google Scholar   Search on Semantic Scholar
Compositional Generalization Efficiency Emergent Abilities Interdisciplinary Approaches Model Architecture Productivity Enhancement Reinforcement Learning Security Training Techniques

What if artificial intelligence could not only solve problems for which it was trained but also learn to teach itself to solve new problems (i.e., meta-learn)? In this study, we demonstrate that a pre-trained transformer fine-tuned with reinforcement learning over multiple episodes develops the ability to solve problems that it has never encountered before - an emergent ability called In-Context Reinforcement Learning (ICRL). This powerful meta-learner not only excels in solving unseen in-distribution environments with remarkable sample efficiency, but also shows strong performance in out-of-distribution environments. In addition, we show that it exhibits robustness to the quality of its training data, seamlessly stitches together behaviors from its context, and adapts to non-stationary environments. These behaviors demonstrate that an RL-trained transformer can iteratively improve upon its own solutions, making it an excellent general-purpose problem solver.

Similar Work