Improve LLM-as-a-Judge Ability as a General Ability
- URL: http://arxiv.org/abs/2502.11689v1
- Date: Mon, 17 Feb 2025 11:28:43 GMT
- Title: Improve LLM-as-a-Judge Ability as a General Ability
- Authors: Jiachen Yu, Shaoning Sun, Xiaohui Hu, Jiaxu Yan, Kaidong Yu, Xuelong Li,
- Abstract summary: Large language models (LLMs) can evaluate responses across diverse scenarios, providing accurate preference signals.
Recent studies have raised many methods to train LLM as generative judges, but most of them are data consuming or lack accuracy.
In this work, we implement a two-stage training approach, comprising supervised fine-tuning (SFT) warm-up and direct preference optimization (DPO) enhancement.
- Score: 40.2210529561692
- License:
- Abstract: LLM-as-a-Judge leverages the generative and reasoning capabilities of large language models (LLMs) to evaluate LLM responses across diverse scenarios, providing accurate preference signals. This approach plays a vital role in aligning LLMs with human values, ensuring ethical and reliable AI outputs that align with societal norms. Recent studies have raised many methods to train LLM as generative judges, but most of them are data consuming or lack accuracy, and only focus on LLM's judge ability. In this work, we regard judge ability as a general ability of LLM and implement a two-stage training approach, comprising supervised fine-tuning (SFT) warm-up and direct preference optimization (DPO) enhancement, to achieve judge style adaptation and improve judgment accuracy. Additionally, we introduce an efficient data synthesis method to generate judgmental content. Experimental results demonstrate that our approach, utilizing only about 2% to 40% of the data required by other methods, achieves SOTA performance on RewardBench. Furthermore, our training method enhances the general capabilities of the model by constructing complicated judge task, and the judge signals provided by our model have significantly enhanced the downstream DPO training performance of our internal models in our test to optimize policy model with Judge Model. We also open-source our model weights and training data to facilitate further research.
Related papers
- LLM-Powered Preference Elicitation in Combinatorial Assignment [17.367432304040662]
We study the potential of large language models (LLMs) as proxies for humans to simplify preference elicitation (PE) in assignment.
We propose a framework for LLM proxies that can work in tandem with SOTA ML-powered preference elicitation schemes.
We experimentally evaluate the efficiency of LLM proxies against human queries in the well-studied course allocation domain.
arXiv Detail & Related papers (2025-02-14T17:12:20Z) - Training an LLM-as-a-Judge Model: Pipeline, Insights, and Practical Lessons [9.954960702259918]
This paper introduces Themis, a fine-tuned large language model (LLMs) judge that delivers context-aware evaluations.
We provide a comprehensive overview of the development pipeline for Themis, highlighting its scenario-dependent evaluation prompts.
We introduce two human-labeled benchmarks for meta-evaluation, demonstrating that Themis can achieve high alignment with human preferences in an economical manner.
arXiv Detail & Related papers (2025-02-05T08:35:55Z) - A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs [74.35290684163718]
A primary challenge in large language model (LLM) development is their onerous pre-training cost.
This paper explores a promising paradigm to improve LLM pre-training efficiency and quality by leveraging a small language model (SLM)
arXiv Detail & Related papers (2024-10-24T14:31:52Z) - Weak-to-Strong Extrapolation Expedites Alignment [135.12769233630362]
We propose a method called ExPO to boost models' alignment with human preference.
We demonstrate that ExPO consistently improves off-the-shelf DPO/RLHF models.
We shed light on the essence of ExPO amplifying the reward signal learned during alignment training.
arXiv Detail & Related papers (2024-04-25T17:39:50Z) - Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN)
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z) - SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z) - On Learning to Summarize with Large Language Models as References [101.79795027550959]
Large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets.
We study an LLM-as-reference learning setting for smaller text summarization models to investigate whether their performance can be substantially improved.
arXiv Detail & Related papers (2023-05-23T16:56:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.