Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment
- URL: http://arxiv.org/abs/2410.20727v1
- Date: Mon, 28 Oct 2024 04:47:39 GMT
- Title: Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment
- Authors: Tong Yang, Jincheng Mei, Hanjun Dai, Zixin Wen, Shicong Cen, Dale Schuurmans, Yuejie Chi, Bo Dai,
- Abstract summary: This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
- Score: 81.84950252537618
- License:
- Abstract: Recent advances in aligning large language models with human preferences have corroborated the growing importance of best-of-N distillation (BOND). However, the iterative BOND algorithm is prohibitively expensive in practice due to the sample and computation inefficiency. This paper addresses the problem by revealing a unified game-theoretic connection between iterative BOND and self-play alignment, which unifies seemingly disparate algorithmic paradigms. Based on the connection, we establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization that approximates iterative BOND in the parameter space. We provides provable sample efficiency guarantee for one of the WIND variant with the square loss objective. The experimental results confirm that our algorithm not only accelerates the computation, but also achieves superior sample efficiency compared to existing methods.
Related papers
- A novel algorithm for optimizing bundle adjustment in image sequence alignment [6.322876598831792]
This paper introduces a novel algorithm for optimizing the Bundle Adjustment (BA) model in the context of image sequence alignment for cryo-electron tomography.
Extensive experiments on both synthetic and real-world datasets were conducted to evaluate the algorithm's performance.
arXiv Detail & Related papers (2024-11-10T03:19:33Z) - Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling [96.47086913559289]
gradient-based algorithms are widely used in bilevel optimization.
We introduce a without-replacement sampling based algorithm which achieves a faster convergence rate.
We validate our algorithms over both synthetic and real-world applications.
arXiv Detail & Related papers (2024-11-07T17:05:31Z) - Sample-efficient Bayesian Optimisation Using Known Invariances [56.34916328814857]
We show that vanilla and constrained BO algorithms are inefficient when optimising invariant objectives.
We derive a bound on the maximum information gain of these invariant kernels.
We use our method to design a current drive system for a nuclear fusion reactor, finding a high-performance solution.
arXiv Detail & Related papers (2024-10-22T12:51:46Z) - BOND: Aligning LLMs with Best-of-N Distillation [63.254031574394965]
We propose Best-of-N Distillation (BOND), a novel RLHF algorithm that seeks to emulate Best-of-N but without its significant computational overhead at inference time.
Specifically, BOND is a distribution matching algorithm that forces the distribution of generations from the policy to get closer to the Best-of-N distribution.
We demonstrate the effectiveness of our approach and several design choices through experiments on abstractive summarization and Gemma models.
arXiv Detail & Related papers (2024-07-19T18:38:25Z) - An Algebraically Converging Stochastic Gradient Descent Algorithm for
Global Optimization [14.336473214524663]
A key component in the algorithm is the randomness based on the value of the objective function.
We prove the convergence of the algorithm with an algebra and tuning in the parameter space.
We present several numerical examples to demonstrate the efficiency and robustness of the algorithm.
arXiv Detail & Related papers (2022-04-12T16:27:49Z) - Batch Sequential Adaptive Designs for Global Optimization [5.825138898746968]
Efficient global optimization (EGO) is one of the most popular SAD methods for expensive black-box optimization problems.
For those multiple points EGO methods, the heavy computation and points clustering are the obstacles.
In this work, a novel batch SAD method, named "accelerated EGO", is forwarded by using a refined sampling/importance resampling (SIR) method.
The efficiency of the proposed SAD is validated by nine classic test functions with dimension from 2 to 12.
arXiv Detail & Related papers (2020-10-21T01:11:35Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - A High-Performance Object Proposals based on Horizontal High Frequency
Signal [0.0]
We propose a class-independent object proposal algorithm BIHL.
It combines the advantages of window scoring and superpixel merging, which not only improves the localization quality but also speeds up the computational efficiency.
Our method is the method with the highest average repeatability among the methods that achieve good repeatability to various disturbances.
arXiv Detail & Related papers (2020-03-13T05:41:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.