Related papers: TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling

TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling

URL: http://arxiv.org/abs/2410.16033v3
Date: Wed, 30 Oct 2024 00:02:08 GMT
Title: TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Authors: Jiahao Qiu, Yifu Lu, Yifan Zeng, Jiacheng Guo, Jiayi Geng, Huazheng Wang, Kaixuan Huang, Yue Wu, Mengdi Wang,
Abstract summary: Inference-time alignment enhances the performance of large language models without requiring additional training or fine-tuning. Best-of-N (BoN) sampling, as a simple yet powerful approach, generates multiple responses and selects the best one. We propose TreeBoN, a novel framework that integrates a speculative tree-search strategy into Best-of-N (BoN) Sampling.
Score: 39.019269570224004
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Inference-time alignment enhances the performance of large language models without requiring additional training or fine-tuning but presents challenges due to balancing computational efficiency with high-quality output. Best-of-N (BoN) sampling, as a simple yet powerful approach, generates multiple responses and selects the best one, achieving improved performance but with a high computational cost. We propose TreeBoN, a novel framework that integrates a speculative tree-search strategy into Best-of-N (BoN) Sampling. TreeBoN maintains a set of parent nodes, iteratively branching and pruning low-quality responses, thereby reducing computational overhead while maintaining high output quality. Our approach also leverages token-level rewards from Direct Preference Optimization (DPO) to guide tree expansion and prune low-quality paths. We evaluate TreeBoN using AlpacaFarm, HH-RLHF, UltraFeedback, GSM8K, and TutorEval datasets, demonstrating consistent improvements. Specifically, TreeBoN achieves the highest win rate of 65% on TutorEval and around 60% win rates across other different datasets, outperforming standard BoN with the same computational cost and showcasing its scalability and alignment efficacy.

Related papers

Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding [64.2888389315149]
Test-time scaling improves large language model performance by adding extra compute during decoding. Best-of-N sampling serves as a common scaling technique, broadening the search space for finding better solutions. We propose Self-Truncation Best-of-N (ST-BoN), a novel decoding method that avoids fully generating all samplings.
arXiv Detail & Related papers (2025-03-03T11:21:01Z)
Can a Single Tree Outperform an Entire Forest? [5.448070998907116]
The prevailing mindset is that a single decision tree underperforms classic random forests in testing accuracy. This study challenges such a mindset by significantly improving the testing accuracy of an oblique regression tree. Our approach reformulates tree training as a differentiable unconstrained optimization task.
arXiv Detail & Related papers (2024-11-26T00:18:18Z)
Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method [76.31185707649227]
We propose a Deep Tree-based Retriever (DTR) for efficient recommendation. DTR frames the training task as a softmax-based multi-class classification over tree nodes at the same level. To mitigate the suboptimality induced by the labeling of non-leaf nodes, we propose a rectification method for the loss function.
arXiv Detail & Related papers (2024-08-21T05:09:53Z)
BOND: Aligning LLMs with Best-of-N Distillation [63.254031574394965]
We propose Best-of-N Distillation (BOND), a novel RLHF algorithm that seeks to emulate Best-of-N but without its significant computational overhead at inference time. Specifically, BOND is a distribution matching algorithm that forces the distribution of generations from the policy to get closer to the Best-of-N distribution. We demonstrate the effectiveness of our approach and several design choices through experiments on abstractive summarization and Gemma models.
arXiv Detail & Related papers (2024-07-19T18:38:25Z)
Variational Best-of-N Alignment [57.617866305771756]
Best-of-N (BoN) is a popular and effective algorithm for aligning language models to human preferences. We propose to fine-tune the language model to mimic what BoN does during inference. Our approach is analogous to mean-field variational inference and, thus, we term it variational BoN (vBoN)
arXiv Detail & Related papers (2024-07-08T15:59:44Z)
Adaptive Split Balancing for Optimal Random Forest [8.916614661563893]
We propose a new random forest algorithm that constructs the trees using a novel adaptive split-balancing method. Our method achieves optimality in simple, smooth scenarios while adaptively learning the tree structure from the data.
arXiv Detail & Related papers (2024-02-17T09:10:40Z)
Reinforcement Learning for Node Selection in Branch-and-Bound [52.2648997215667]
Current state-of-the-art selectors utilize either hand-crafted ensembles that automatically switch between naive sub-node selectors, or learned node selectors that rely on individual node data. We propose a novel simulation technique that uses reinforcement learning (RL) while considering the entire tree state, rather than just isolated nodes.
arXiv Detail & Related papers (2023-09-29T19:55:56Z)
ForestPrune: Compact Depth-Controlled Tree Ensembles [7.538482310185135]
We present ForestPrune, a novel framework to post-process tree ensembles by pruning depth layers from individual trees. We develop a specialized optimization algorithm to efficiently obtain high-quality solutions to problems under ForestPrune. Our experiments demonstrate that ForestPrune produces parsimonious models that outperform models extracted by existing post-processing algorithms.
arXiv Detail & Related papers (2022-05-31T22:04:18Z)
bsnsing: A decision tree induction method based on recursive optimal boolean rule composition [2.28438857884398]
This paper proposes a new mixed-integer programming (MIP) formulation to optimize split rule selection in the decision tree induction process. It develops an efficient search solver that is able to solve practical instances faster than commercial solvers.
arXiv Detail & Related papers (2022-05-30T17:13:57Z)
Improved Branch and Bound for Neural Network Verification via Lagrangian Decomposition [161.09660864941603]
We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks. We present a novel activation-based branching strategy and a BaB framework, named Branch and Dual Network Bound (BaDNB) BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors up to 50 on adversarial properties.
arXiv Detail & Related papers (2021-04-14T09:22:42Z)
MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search. Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.