Related papers: Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment

URL: http://arxiv.org/abs/2410.20727v1
Date: Mon, 28 Oct 2024 04:47:39 GMT
Title: Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment
Authors: Tong Yang, Jincheng Mei, Hanjun Dai, Zixin Wen, Shicong Cen, Dale Schuurmans, Yuejie Chi, Bo Dai
Abstract summary: This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment. We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
Score: 81.84950252537618
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in aligning large language models with human preferences have corroborated the growing importance of best-of-N distillation (BOND). However, the iterative BOND algorithm is prohibitively expensive in practice due to its sample and computation inefficiency. This paper addresses the problem by revealing a unified game-theoretic connection between iterative BOND and self-play alignment, which unifies seemingly disparate algorithmic paradigms. Based on the connection, we establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization that approximates iterative BOND in the parameter space. We provide a provable sample efficiency guarantee for one of the WIND variants with the square loss objective. The experimental results confirm that our algorithm not only accelerates the computation, but also achieves superior sample efficiency compared to existing methods.

Related papers

Direct Preference Optimization with Rating Information: Practical Algorithms and Provable Gains [67.71020482405343]
We study how to design algorithms that can leverage additional information in the form of rating gap. We present new algorithms that can achieve faster statistical rates than DPO in presence of accurate rating gap information.
arXiv Detail & Related papers (2026-01-31T08:38:21Z)
Hard Thresholding Pursuit Algorithms for Least Absolute Deviations Problem [14.123089301194623]
Least absolute deviations (LAD) is a statistical optimality criterion widely utilized in scenarios where a minority of measurements are contaminated by outliers of arbitrary magnitudes. In this paper, we delve into the robustness of the variant of adaptive iterative hard thresholding to outliers, known as graded fast hard thresholding pursuit (GFHTP$_$) algorithm. Numerical experiments reveal that the GFHTP$_$ algorithm consistently outperforms competing algorithms in terms of both robustness and computational efficiency.
arXiv Detail & Related papers (2026-01-10T12:55:59Z)
Towards minimax optimal algorithms for Active Simple Hypothesis Testing [0.0]
We study the Active Simple Hypothesis Testing (ASHT) problem, a simpler variant of the Fixed Budget Best Arm Identification problem. We provide a novel game-theoretic formulation of the upper bounds of the ASHT problem. We propose an approximately optimal algorithm that is computationally tractable compared to prior work.
arXiv Detail & Related papers (2025-04-26T20:03:53Z)
Enhanced Derivative-Free Optimization Using Adaptive Correlation-Induced Finite Difference Estimators [6.054123928890574]
We develop an algorithm designed to enhance DFO in terms of both gradient estimation efficiency and sample efficiency. We establish the consistency of our proposed algorithm and demonstrate that, despite using a batch of samples per iteration, it achieves the same convergence rate as the KW and SPSA methods.
arXiv Detail & Related papers (2025-02-28T08:05:54Z)
Towards Optimal Multi-draft Speculative Decoding [102.67837141152232]
Multi-Draft Speculative Decoding (MDSD) is a recent approach where, when generating each token, a small draft model generates multiple drafts. This paper discusses the dual of the optimal transport problem, providing a way to efficiently compute the optimal acceptance rate.
arXiv Detail & Related papers (2025-02-26T03:22:44Z)
Fast sparse optimization via adaptive shrinkage [0.6226609932118122]
We develop a proximal method, based on logarithmic regularization, which turns out to be an iterative shrinkage-thresholding algorithm. This adaptivity substantially enhances the trajectory of the algorithm, in a way that yields faster convergence. We validate its fast convergence via numerical experiments and we discuss the performance with respect to state-of-the-art algorithms.
arXiv Detail & Related papers (2025-01-21T15:58:21Z)
A novel algorithm for optimizing bundle adjustment in image sequence alignment [6.322876598831792]
This paper introduces a novel algorithm for optimizing the Bundle Adjustment (BA) model in the context of image sequence alignment for cryo-electron tomography. Extensive experiments on both synthetic and real-world datasets were conducted to evaluate the algorithm's performance.
arXiv Detail & Related papers (2024-11-10T03:19:33Z)
Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling [96.47086913559289]
Gradient-based algorithms are widely used in bilevel optimization. We introduce a without-replacement sampling based algorithm which achieves a faster convergence rate. We validate our algorithms over both synthetic and real-world applications.
arXiv Detail & Related papers (2024-11-07T17:05:31Z)
Sample-efficient Bayesian Optimisation Using Known Invariances [56.34916328814857]
We show that vanilla and constrained BO algorithms are inefficient when optimising invariant objectives. We derive a bound on the maximum information gain of these invariant kernels. We use our method to design a current drive system for a nuclear fusion reactor, finding a high-performance solution.
arXiv Detail & Related papers (2024-10-22T12:51:46Z)
BOND: Aligning LLMs with Best-of-N Distillation [63.254031574394965]
We propose Best-of-N Distillation (BOND), a novel RLHF algorithm that seeks to emulate Best-of-N but without its significant computational overhead at inference time. Specifically, BOND is a distribution matching algorithm that forces the distribution of generations from the policy to get closer to the Best-of-N distribution. We demonstrate the effectiveness of our approach and several design choices through experiments on abstractive summarization and Gemma models.
arXiv Detail & Related papers (2024-07-19T18:38:25Z)
An Algebraically Converging Stochastic Gradient Descent Algorithm for Global Optimization [14.336473214524663]
A key component in the algorithm is the randomness based on the value of the objective function. We prove the convergence of the algorithm with an algebra and tuning in the parameter space. We present several numerical examples to demonstrate the efficiency and robustness of the algorithm.
arXiv Detail & Related papers (2022-04-12T16:27:49Z)
Batch Sequential Adaptive Designs for Global Optimization [5.825138898746968]
Efficient global optimization (EGO) is one of the most popular SAD methods for expensive black-box optimization problems. For multi-point EGO methods, heavy computation and point clustering are the obstacles. In this work, a novel batch SAD method, named "accelerated EGO", is put forward by using a refined sampling/importance resampling (SIR) method. The efficiency of the proposed SAD is validated on nine classic test functions with dimensions from 2 to 12.
arXiv Detail & Related papers (2020-10-21T01:11:35Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step. Our results are expressed in the form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
A High-Performance Object Proposals based on Horizontal High Frequency Signal [0.0]
We propose a class-independent object proposal algorithm BIHL. It combines the advantages of window scoring and superpixel merging, which not only improves the localization quality but also improves computational efficiency. Our method has the highest average repeatability among the methods that achieve good repeatability under various disturbances.
arXiv Detail & Related papers (2020-03-13T05:41:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.