Related papers: SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters

URL: http://arxiv.org/abs/2502.00883v4
Date: Thu, 20 Feb 2025 15:26:44 GMT
Title: SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Authors: Teng Xiao, Yige Yuan, Zhengyu Chen, Mingxiao Li, Shangsong Liang, Zhaochun Ren, Vasant G Honavar
Abstract summary: SimPER is an effective preference optimization algorithm for language model alignment. SimPER is easy to implement and eliminates the need for expensive hyperparameter tuning and a reference model. SimPER consistently and significantly outperforms existing approaches.
Score: 40.64474084442168
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing preference optimization objectives for language model alignment require additional hyperparameters that must be extensively tuned to achieve optimal performance, increasing both the complexity and time required for fine-tuning large language models. In this paper, we propose a simple yet effective hyperparameter-free preference optimization algorithm for alignment. We observe that promising performance can be achieved simply by optimizing inverse perplexity, which is calculated as the inverse of the exponentiated average log-likelihood of the chosen and rejected responses in the preference dataset. The resulting simple learning objective, SimPER, is easy to implement and eliminates the need for expensive hyperparameter tuning and a reference model, making it both computationally and memory efficient. Extensive experiments on widely used real-world benchmarks, including MT-Bench, AlpacaEval 2, and 10 key benchmarks of the Open LLM Leaderboard with 5 base models, demonstrate that SimPER consistently and significantly outperforms existing approaches, even without any hyperparameters or a reference model. For example, despite its simplicity, SimPER outperforms state-of-the-art methods by up to 5.7 points on AlpacaEval 2 and achieves the highest average ranking across 10 benchmarks on the Open LLM Leaderboard. The source code for SimPER is publicly available at: https://github.com/tengxiao1/SimPER.
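The inverse-perplexity quantity mentioned in the abstract can be illustrated with a minimal sketch. It assumes that inverse perplexity is the exponential of the mean per-token log-likelihood, and that the objective raises it for the chosen response and lowers it for the rejected one. The function names and the exact sign convention are illustrative assumptions, not taken from the authors' code; the linked repository is the authoritative implementation.

```python
import math


def inverse_perplexity(token_logprobs):
    """Return exp(mean token log-likelihood), i.e. 1 / perplexity.

    `token_logprobs` is a list of per-token log-probabilities that a
    model assigns to one response (hypothetical input format).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def simper_style_loss(chosen_logprobs, rejected_logprobs):
    """Assumed form of the objective for a single preference pair.

    Minimizing this value pushes the inverse perplexity of the chosen
    response up and that of the rejected response down. No temperature,
    margin, or reference-model term appears in it.
    """
    return inverse_perplexity(rejected_logprobs) - inverse_perplexity(chosen_logprobs)
```

Because each term is an inverse perplexity, it lies in (0, 1], so the per-pair value in this sketch is bounded between -1 and 1. In practice the log-probabilities would come from the policy model's forward pass over each response, averaged over a batch of pairs.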

Related papers

ESSA: Evolutionary Strategies for Scalable Alignment [2.589791058467358]
This paper introduces ESSA, a new framework that uses Evolutionary Strategies (ES) to efficiently align Large Language Models (LLMs). ES is well-suited for LLM alignment due to its favorable properties, such as high parallelizability, memory efficiency, robustness to sparse rewards, and fewer data samples required for convergence. Our findings establish ES as a promising and scalable alternative to gradient-based alignment, paving the way for efficient post-training of large language models.
arXiv Detail & Related papers (2025-07-06T16:23:07Z)
Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning? [42.362388367152256]
Large language models (LLMs) are used to fine-tune a parameter-efficient version of Code Llama using LoRA. Our method achieves competitive or superior results in terms of Root Mean Square Error (RMSE) while significantly reducing computational overhead.
arXiv Detail & Related papers (2025-04-08T13:15:47Z)
Predictable Scale: Part I, Step Law -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining [59.369484219304866]
In this study, we conduct an unprecedented empirical investigation, training over 3,700 Large Language Models (LLMs) from scratch across 100 trillion tokens. We empirically observe that, under fixed model size ($N$) and dataset size ($D$), the hyperparameter landscape exhibits convexity with a broad optimum. Building on this insight, we formally define and empirically validate the Step Law: The optimal learning rate follows a power-law relationship with $N$ and $D$, while the optimal batch size is primarily influenced by $D$ and remains largely invariant to $N$.
arXiv Detail & Related papers (2025-03-06T18:58:29Z)
Align-Pro: A Principled Approach to Prompt Optimization for LLM Alignment [40.71270945505082]
Large language models (LLMs) are increasingly integrated into various societal and decision-making processes. Traditional methods, such as reinforcement learning from human feedback (RLHF), achieve alignment by fine-tuning model parameters. In contrast, prompt optimization is a viable alternative to RLHF for LLM alignment.
arXiv Detail & Related papers (2025-01-07T03:14:39Z)
Step-level Value Preference Optimization for Mathematical Reasoning [6.318873143509028]
We introduce a novel algorithm called Step-level Value Preference Optimization (SVPO). Our method achieves state-of-the-art performance on both in-domain and out-of-domain mathematical reasoning benchmarks.
arXiv Detail & Related papers (2024-06-16T09:06:17Z)
Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data. Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z)
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [88.56809269990625]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions. Our experimental results demonstrate that when fine-tuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, Self-Exploring Language Models (SELM) significantly boosts the performance on instruction-following benchmarks.
arXiv Detail & Related papers (2024-05-29T17:59:07Z)
Using Large Language Models for Hyperparameter Optimization [29.395931874196805]
This paper explores the use of foundational large language models (LLMs) in hyperparameter optimization (HPO). Our empirical evaluations on standard benchmarks reveal that within constrained search budgets, LLMs can match or outperform traditional HPO methods.
arXiv Detail & Related papers (2023-12-07T18:46:50Z)
AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models. AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z)
Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation [84.0621253654014]
We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest. We show theoretically that BALLET can efficiently shrink the search space, and can exhibit a tighter regret bound than standard BO.
arXiv Detail & Related papers (2023-07-25T09:45:47Z)
Parameter-efficient Tuning of Large-scale Multimodal Foundation Model [68.24510810095802]
We propose A graceful prompt framework for cross-modal transfer (Aurora) to overcome these challenges. Considering the redundancy in existing architectures, we first utilize the mode approximation to generate 0.1M trainable parameters to implement the multimodal prompt tuning. A thorough evaluation on six cross-modal benchmarks shows that it not only outperforms the state-of-the-art but even outperforms the full fine-tuning approach.
arXiv Detail & Related papers (2023-05-15T06:40:56Z)
A Comparative study of Hyper-Parameter Optimization Tools [2.6097538974670935]
We compare the performance of four Python libraries, namely Optuna, Hyperopt, Optunity, and sequential model algorithm configuration (SMAC). We found that Optuna has better performance for the CASH problem and the NeurIPS black-box optimization challenge.
arXiv Detail & Related papers (2022-01-17T14:49:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.