
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs

Chris Yuhao Liu, Liang Zeng, Jiacai Liu, Rui Yan, Jujie He, Chaojie Wang, Shuicheng Yan, Yang Liu, Yahui Zhou. No Venue 2024

[Paper]   Search on Google Scholar   Search on Semantic Scholar
Applications Compositional Generalization Datasets Evaluation Reinforcement Learning

In this report, we introduce a collection of methods to enhance reward modeling for LLMs, focusing specifically on data-centric techniques. We propose effective data selection and filtering strategies for curating high-quality open-source preference datasets, culminating in the Skywork-Reward data collection, which contains only 80K preference pairs – significantly smaller than existing datasets. Using this curated dataset, we developed the Skywork-Reward model series – Skywork-Reward-Gemma-27B and Skywork-Reward-Llama-3.1-8B – with the former currently holding the top position on the RewardBench leaderboard. Notably, our techniques and datasets have directly enhanced the performance of many top-ranked models on RewardBench, highlighting the practical impact of our contributions in real-world preference learning applications.
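For readers unfamiliar with how preference pairs like those in the Skywork-Reward collection are typically consumed, the sketch below shows the standard pairwise Bradley-Terry objective commonly used to train reward models on chosen/rejected response pairs. This is a generic illustration, not the paper's specific training recipe; the function name and example tensors are hypothetical.

```python
# Minimal sketch of a pairwise (Bradley-Terry) reward-model loss on preference pairs.
# Illustrative only; the Skywork-Reward training setup may differ in its details.
import torch
import torch.nn.functional as F

def bradley_terry_loss(chosen_rewards: torch.Tensor,
                       rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood that the chosen response outscores the rejected one."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Example: scalar rewards the model assigns to a batch of three preference pairs.
chosen = torch.tensor([1.2, 0.7, 2.1])
rejected = torch.tensor([0.3, 0.9, 1.5])
print(bradley_terry_loss(chosen, rejected).item())
```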

Similar Work