RL + Transformer = A General-purpose Problem Solver

Micah Rentschler, Jesse Roberts. No Venue 2025

Efficiency, Emergent Abilities, Reinforcement Learning

What if artificial intelligence could not only solve the problems it was trained on but also teach itself to solve new ones (i.e., meta-learn)? In this study, we demonstrate that a pre-trained transformer fine-tuned with reinforcement learning over multiple episodes develops the ability to solve problems it has never encountered before, an emergent ability called In-Context Reinforcement Learning (ICRL). This meta-learner solves unseen in-distribution environments with high sample efficiency and also performs strongly in out-of-distribution environments. In addition, we show that it is robust to the quality of its training data, stitches together behaviors from its context, and adapts to non-stationary environments. These behaviors demonstrate that an RL-trained transformer can iteratively improve upon its own solutions, making it a capable general-purpose problem solver.
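To make the idea of improving from context more concrete, the following is a minimal sketch of the interaction loop only, not the authors' method. It assumes a toy four-action environment in which each episode is a single step, and it swaps the fine-tuned transformer for a hypothetical hand-written `stand_in_policy` that has no trainable state. All names here (`stand_in_policy`, `run_in_context`, `toy_env`) are illustrative assumptions. The point it shows is that the policy's only input is the growing history of past episodes, so any improvement must come from the context rather than from weight updates.

```python
from typing import Callable, List, Tuple

# One context entry: (episode index, action taken, reward received).
Transition = Tuple[int, int, float]


def stand_in_policy(context: List[Transition], n_actions: int) -> int:
    """Hypothetical stand-in for the frozen, RL-fine-tuned transformer.

    Tries each untried action once, then repeats the action with the best
    mean reward seen in the context. It keeps no state of its own.
    """
    totals = [0.0] * n_actions
    counts = [0] * n_actions
    for _, action, reward in context:
        totals[action] += reward
        counts[action] += 1
    # Explore: pick the first action that never appears in the context.
    for action in range(n_actions):
        if counts[action] == 0:
            return action
    # Exploit: pick the action with the highest mean reward so far.
    means = [totals[a] / counts[a] for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: means[a])


def run_in_context(
    env_reward: Callable[[int], float], n_actions: int, n_episodes: int
) -> List[float]:
    """Run several one-step episodes, appending each one to the context."""
    context: List[Transition] = []
    returns: List[float] = []
    for episode in range(n_episodes):
        action = stand_in_policy(context, n_actions)
        reward = env_reward(action)
        context.append((episode, action, reward))  # the only thing that changes
        returns.append(reward)
    return returns


def toy_env(action: int) -> float:
    """Assumed toy environment: only action 2 is rewarded."""
    return 1.0 if action == 2 else 0.0


returns = run_in_context(toy_env, n_actions=4, n_episodes=8)
print(returns)  # [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0]
```

In this sketch the per-episode return rises once the rewarded action has appeared in the context. In the paper's setting, a transformer would take the place of the hand-written rule.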
