Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment
- URL: http://arxiv.org/abs/2410.20727v1
- Date: Mon, 28 Oct 2024 04:47:39 GMT
- Title: Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment
- Authors: Tong Yang, Jincheng Mei, Hanjun Dai, Zixin Wen, Shicong Cen, Dale Schuurmans, Yuejie Chi, Bo Dai,
- Abstract summary: This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
- Score: 81.84950252537618
- License:
- Abstract: Recent advances in aligning large language models with human preferences have corroborated the growing importance of best-of-N distillation (BOND). However, the iterative BOND algorithm is prohibitively expensive in practice due to the sample and computation inefficiency. This paper addresses the problem by revealing a unified game-theoretic connection between iterative BOND and self-play alignment, which unifies seemingly disparate algorithmic paradigms. Based on the connection, we establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization that approximates iterative BOND in the parameter space. We provides provable sample efficiency guarantee for one of the WIND variant with the square loss objective. The experimental results confirm that our algorithm not only accelerates the computation, but also achieves superior sample efficiency compared to existing methods.
Related papers
- Sample-efficient Bayesian Optimisation Using Known Invariances [56.34916328814857]
We show that vanilla and constrained BO algorithms are inefficient when optimising invariant objectives.
We derive a bound on the maximum information gain of these invariant kernels.
We use our method to design a current drive system for a nuclear fusion reactor, finding a high-performance solution.
arXiv Detail & Related papers (2024-10-22T12:51:46Z) - BOND: Aligning LLMs with Best-of-N Distillation [63.254031574394965]
We propose Best-of-N Distillation (BOND), a novel RLHF algorithm that seeks to emulate Best-of-N but without its significant computational overhead at inference time.
Specifically, BOND is a distribution matching algorithm that forces the distribution of generations from the policy to get closer to the Best-of-N distribution.
We demonstrate the effectiveness of our approach and several design choices through experiments on abstractive summarization and Gemma models.
arXiv Detail & Related papers (2024-07-19T18:38:25Z) - An Algebraically Converging Stochastic Gradient Descent Algorithm for
Global Optimization [14.336473214524663]
A key component in the algorithm is the randomness based on the value of the objective function.
We prove the convergence of the algorithm with an algebra and tuning in the parameter space.
We present several numerical examples to demonstrate the efficiency and robustness of the algorithm.
arXiv Detail & Related papers (2022-04-12T16:27:49Z) - Efficient Optimal Transport Algorithm by Accelerated Gradient descent [20.614477547939845]
We propose a novel algorithm to further improve the efficiency and accuracy based on Nesterov's smoothing technique.
The proposed method achieves faster convergence and better accuracy with the same parameter.
arXiv Detail & Related papers (2021-04-12T20:23:29Z) - Batch Sequential Adaptive Designs for Global Optimization [5.825138898746968]
Efficient global optimization (EGO) is one of the most popular SAD methods for expensive black-box optimization problems.
For those multiple points EGO methods, the heavy computation and points clustering are the obstacles.
In this work, a novel batch SAD method, named "accelerated EGO", is forwarded by using a refined sampling/importance resampling (SIR) method.
The efficiency of the proposed SAD is validated by nine classic test functions with dimension from 2 to 12.
arXiv Detail & Related papers (2020-10-21T01:11:35Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel stoc-efficientgradient estimator named stoc-BiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - A High-Performance Object Proposals based on Horizontal High Frequency
Signal [0.0]
We propose a class-independent object proposal algorithm BIHL.
It combines the advantages of window scoring and superpixel merging, which not only improves the localization quality but also speeds up the computational efficiency.
Our method is the method with the highest average repeatability among the methods that achieve good repeatability to various disturbances.
arXiv Detail & Related papers (2020-03-13T05:41:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.