Related papers: Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs


URL: http://arxiv.org/abs/2410.18451v1
Date: Thu, 24 Oct 2024 06:06:26 GMT
Title: Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
Authors: Chris Yuhao Liu, Liang Zeng, Jiacai Liu, Rui Yan, Jujie He, Chaojie Wang, Shuicheng Yan, Yang Liu, Yahui Zhou
Abstract summary: We propose effective data selection and filtering strategies for curating high-quality open-source preference datasets. We curated the Skywork-Reward data collection, which contains only 80K preference pairs. We developed the Skywork-Reward model series -- Skywork-Reward-Gemma-27B and Skywork-Reward-Llama-3.1-8B -- with the former currently holding the top position on the RewardBench leaderboard.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this report, we introduce a collection of methods to enhance reward modeling for LLMs, focusing specifically on data-centric techniques. We propose effective data selection and filtering strategies for curating high-quality open-source preference datasets, culminating in the Skywork-Reward data collection, which contains only 80K preference pairs -- significantly smaller than existing datasets. Using this curated dataset, we developed the Skywork-Reward model series -- Skywork-Reward-Gemma-27B and Skywork-Reward-Llama-3.1-8B -- with the former currently holding the top position on the RewardBench leaderboard. Notably, our techniques and datasets have directly enhanced the performance of many top-ranked models on RewardBench, highlighting the practical impact of our contributions in real-world preference learning applications.
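The abstract does not state the training objective, but reward models fit on chosen/rejected pairs like these are commonly trained with a Bradley-Terry pairwise loss. The sketch below assumes that objective and assumes scalar rewards have already been produced by some model; it is illustrative only, not the Skywork-Reward recipe, and the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def bt_pair_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood that the chosen response wins.

    Equals -log(sigmoid(r_chosen - r_rejected)), i.e. softplus(-margin),
    computed in a form that does not overflow for large |margin|.
    """
    margin = r_chosen - r_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_bt_loss(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average pairwise loss over (reward_chosen, reward_rejected) tuples."""
    losses = [bt_pair_loss(c, r) for c, r in pairs]
    if not losses:
        raise ValueError("need at least one preference pair")
    return sum(losses) / len(losses)


# Assumed toy rewards for three preference pairs.
print(mean_bt_loss([(2.0, -1.0), (0.5, 0.5), (-0.3, 1.2)]))
```

A zero reward margin gives a loss of log 2; the loss shrinks as the chosen response's reward pulls ahead of the rejected one.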

Related papers

KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model
We propose KaLM-Embedding-V2, a series of versatile and compact embedding models. For model architecture, we implement the models at a compact 0.5B size with simple mean-pooling to produce fixed-length embeddings. For training data, we curate over 20 categories for pre-training and 100 categories for fine-tuning and contrastive distillation.
arXiv (2025-06-26T01:09:44Z)
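Mean pooling, as mentioned for KaLM-Embedding-V2, averages the token vectors at non-padding positions into one fixed-length vector. A minimal pure-Python sketch, assuming per-token embeddings and a 0/1 attention mask are already available; it is not the authors' code.

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the embeddings of tokens whose mask value is 1 (padding is skipped)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is needed per token")
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                summed[i] += value
    if count == 0:
        raise ValueError("attention mask excludes every token")
    return [s / count for s in summed]


# Two real tokens and one padding token; the padding vector is ignored.
print(mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0]))  # [2.0, 3.0]
```

Because the output length equals the embedding dimension regardless of sequence length, inputs of any length map to vectors that can be compared directly.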
DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation
Large language models (LLMs) have been introduced into recommender systems (RSs). We propose DeepRec, a novel LLM-based RS that enables autonomous multi-turn interactions between LLMs and traditional recommendation models (TRMs) for deep exploration of the item space. Experiments on public datasets demonstrate that DeepRec significantly outperforms both traditional and LLM-based baselines.
arXiv (2025-05-22T15:49:38Z)
Augmented Relevance Datasets with Fine-Tuned Small LLMs
This paper explores the use of small, fine-tuned large language models (LLMs) to automate relevance assessment. We fine-tuned small LLMs to enhance relevance assessments, thereby improving dataset creation quality for downstream ranking model training.
arXiv (2025-04-14T02:35:00Z)
Building a Family of Data Augmentation Models for Low-cost LLM Fine-tuning on the Cloud
We present a family of data augmentation models designed to significantly improve the efficiency of model fine-tuning. These models, trained on top of sufficiently small LLMs, support key functionalities at low inference cost. Experiments and an application study demonstrate the effectiveness of our approach.
arXiv (2024-12-06T09:04:12Z)
Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets. The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method. The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv (2024-11-21T02:30:53Z)
STAR: A Simple Training-free Approach for Recommendations using Large Language Models
Recent progress in large language models (LLMs) offers promising new approaches for recommendation system (RecSys) tasks. We propose a framework that uses LLMs and can be applied to various recommendation tasks without the need for fine-tuning. Our method changes Hits@10 by +23.8% on Beauty, +37.5% on Toys and Games, and -1.8% on Sports and Outdoors.
arXiv (2024-10-21T19:34:40Z)
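Hits@10, the metric quoted for STAR, is usually defined as the fraction of test cases whose held-out target item appears among the top 10 recommendations. The sketch below implements that standard definition on assumed inputs (ranked lists and targets); the abstract does not say which baseline the percentages above are measured against, so no comparison is modeled.

```python
from typing import Hashable, Sequence


def hits_at_k(ranked_lists: Sequence[Sequence[Hashable]],
              targets: Sequence[Hashable],
              k: int = 10) -> float:
    """Fraction of cases whose target item appears in the first k ranked items."""
    if len(ranked_lists) != len(targets):
        raise ValueError("need exactly one target per ranked list")
    if not targets:
        raise ValueError("no test cases given")
    hits = sum(1 for ranked, target in zip(ranked_lists, targets)
               if target in ranked[:k])
    return hits / len(targets)


# User 1's target is ranked 3rd; user 2's target is not recommended at all.
rankings = [["a", "b", "c"], ["d", "e", "f"]]
print(hits_at_k(rankings, ["c", "z"], k=3))  # 0.5
print(hits_at_k(rankings, ["c", "z"], k=2))  # 0.0
```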
POINTS: Improving Your Vision-language Model with Affordable Strategies
We train a robust baseline model using the latest advancements in vision-language models. We filter pre-training data using perplexity, selecting the lowest-perplexity data for training. During visual instruction tuning, we used model soup on different datasets when adding more datasets yielded only marginal improvements.
arXiv (2024-09-07T13:41:37Z)
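Both POINTS techniques reduce to simple arithmetic. The sketch below assumes per-token log-probabilities from some scoring model for the perplexity filter, and treats "model soup" as a uniform element-wise average of parameters from models fine-tuned on different data. The keep fraction, the dict-of-lists weight format, and the function names are illustrative assumptions, not details from the paper.

```python
import math
from typing import Dict, List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def keep_lowest_perplexity(samples: Sequence[str],
                           logprobs: Sequence[Sequence[float]],
                           keep_fraction: float) -> List[str]:
    """Return the keep_fraction of samples with the lowest perplexity, lowest first."""
    scored = sorted(zip(samples, logprobs), key=lambda pair: perplexity(pair[1]))
    n_keep = max(1, int(len(scored) * keep_fraction))
    return [sample for sample, _ in scored[:n_keep]]


def uniform_soup(state_dicts: Sequence[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Element-wise average of parameters that share the same names and shapes."""
    if not state_dicts:
        raise ValueError("need at least one model")
    names = state_dicts[0].keys()
    if any(sd.keys() != names for sd in state_dicts):
        raise ValueError("all models must have the same parameter names")
    return {
        name: [sum(values) / len(state_dicts)
               for values in zip(*(sd[name] for sd in state_dicts))]
        for name in names
    }


# Perplexities: a ~1.11, b = 10, c = 2, d = 5, so the two lowest are a and c.
lp = [[math.log(0.9)], [math.log(0.1)], [math.log(0.5)], [math.log(0.2)]]
print(keep_lowest_perplexity(["a", "b", "c", "d"], lp, 0.5))  # ['a', 'c']
print(uniform_soup([{"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}]))   # {'w': [2.0, 4.0]}
```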
Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation
We focus on the field of large language models (LLMs) for recommendation. We propose RecLoRA, which incorporates a Personalized LoRA module that maintains independent LoRAs for different users. We also design a Few2Many Learning Strategy, using a conventional recommendation model as a lens to magnify small training spaces to full spaces.
arXiv (2024-08-07T04:20:28Z)
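Any LoRA module combines a frozen weight W with a low-rank update scaled by alpha/r, giving W x + (alpha/r) B A x. The sketch below shows that standard forward pass with one independent (A, B) pair per user, chosen by a plain dictionary lookup. This per-user lookup is a simplifying assumption; it does not reproduce RecLoRA's actual module or its Few2Many strategy, and all names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Sequence[float],
                 alpha: float, rank: int) -> List[float]:
    """Frozen weight plus low-rank update: W x + (alpha / rank) * B (A x).

    Shapes: W is d_out x d_in, A is rank x d_in, B is d_out x rank.
    """
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [h + scale * d for h, d in zip(base, delta)]


def personalized_forward(w: Matrix, adapters: Dict[str, Tuple[Matrix, Matrix]],
                         user_id: str, x: Sequence[float],
                         alpha: float, rank: int) -> List[float]:
    """Apply the (A, B) pair stored for this user (hypothetical per-user lookup)."""
    a, b = adapters[user_id]
    return lora_forward(w, a, b, x, alpha, rank)


W = [[1.0, 0.0], [0.0, 1.0]]  # shared, frozen base weight
adapters = {
    "user_1": ([[1.0, 1.0]], [[1.0], [0.0]]),  # rank-1 adapter
    "user_2": ([[1.0, 1.0]], [[0.0], [0.0]]),  # B = 0 (LoRA's usual init): no change
}
print(personalized_forward(W, adapters, "user_1", [1.0, 2.0], alpha=1.0, rank=1))  # [4.0, 2.0]
print(personalized_forward(W, adapters, "user_2", [1.0, 2.0], alpha=1.0, rank=1))  # [1.0, 2.0]
```

Only the small A and B matrices differ per user, so storing many users' adapters costs far less than storing many copies of W.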
HelpSteer2: Open-source dataset for training top-performing reward models
We develop HelpSteer2, a permissively licensed preference dataset. HelpSteer2 consists of only ten thousand response pairs, an order of magnitude fewer than existing preference datasets. We propose SteerLM 2.0, a model alignment approach that can effectively make use of the rich multi-attribute scores predicted by our reward models.
arXiv (2024-06-12T22:28:08Z)
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process. We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals. The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv (2024-05-01T11:10:24Z)
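For each preferred/dispreferred pair, the standard DPO loss is -log sigmoid(beta * [(log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x))]). A minimal sketch of that loss for a single pair, assuming the summed log-probabilities are already computed; the step-level pair construction via MCTS is not modeled, and beta = 0.1 is only an illustrative default.

```python
import math


def softplus(z: float) -> float:
    """log(1 + exp(z)) without overflow for large |z|."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    return softplus(-logits)  # -log(sigmoid(t)) == softplus(-t)


# Policy identical to the reference: loss is log 2.
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Raising the chosen response's likelihood relative to the reference lowers the loss.
print(dpo_loss(-10.0, -15.0, -12.0, -15.0))
```

Because the loss depends only on log-ratios against a frozen reference, no separate reward model is needed at update time.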
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
We propose Zooter, a reward-guided routing method distilling rewards on training queries to train a routing function. We evaluate Zooter on a comprehensive benchmark collection with 26 subsets on different domains and tasks.
arXiv (2023-11-15T04:40:43Z)
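One plausible reading of "distilling rewards on training queries" is to turn each query's per-expert reward scores into a soft target distribution, train the router to match it, and then send each new query to the expert with the highest router score. The sketch below shows only target construction, a KL-divergence loss, and argmax routing; the temperature, the use of KL, the expert names, and the function names are assumptions for illustration rather than Zooter's exact formulation.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of scores / temperature."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(router_logits: Sequence[float],
                      expert_rewards: Sequence[float],
                      temperature: float = 1.0) -> float:
    """KL(target || router), where target = softmax(rewards / temperature)."""
    target = softmax(expert_rewards, temperature)
    predicted = softmax(router_logits)
    return sum(t * (math.log(t) - math.log(p))
               for t, p in zip(target, predicted) if t > 0)


def route(router_logits: Sequence[float], experts: Sequence[str]) -> str:
    """Send the query to the single expert with the highest router score."""
    best = max(range(len(experts)), key=lambda i: router_logits[i])
    return experts[best]


experts = ["model_a", "model_b", "model_c"]  # hypothetical expert pool
print(route([0.1, 2.0, -1.0], experts))      # model_b
print(distillation_loss([0.0, 3.0, 1.0], [0.2, 0.9, 0.4]))
```

At inference only the router runs before a single expert is called, which is what makes this cheaper than querying every model and reranking.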
Retrieval-Enhanced Contrastive Vision-Text Models
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time. Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP. Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
arXiv (2023-06-12T15:52:02Z)
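The retrieval step in RECO can be pictured as a nearest-neighbour lookup: embed the query, score every memory entry by cosine similarity, and pass the top-k items on to the fusion module. The sketch below covers only that generic k-NN step, as a brute-force scan over a tiny in-memory list; the memory layout, k, and captions are assumptions, and neither the fusion transformer nor the frozen CLIP encoders are modeled.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_top_k(query: Sequence[float],
                   memory: Sequence[Tuple[str, Sequence[float]]],
                   k: int) -> List[str]:
    """Labels of the k memory entries most similar to the query, best first."""
    ranked = sorted(memory, key=lambda item: cosine(query, item[1]), reverse=True)
    return [label for label, _ in ranked[:k]]


# Hypothetical memory of (caption, embedding) pairs.
memory = [("a cat", [1.0, 0.0]), ("a dog", [0.0, 1.0]), ("a kitten", [0.9, 0.1])]
print(retrieve_top_k([1.0, 0.0], memory, k=2))  # ['a cat', 'a kitten']
```

A real memory of millions of entries would use an approximate nearest-neighbour index instead of sorting every entry per query.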
Curriculum Learning for Dense Retrieval Distillation
We propose a generic curriculum learning based optimization framework called CL-DRD. CL-DRD controls the difficulty level of training data produced by the re-ranking (teacher) model. Experiments on three public passage retrieval datasets demonstrate the effectiveness of our proposed framework.
arXiv (2022-04-28T17:42:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.