Related papers: Uncertainty-Guided Optimization on Large Language Model Search Trees

Uncertainty-Guided Optimization on Large Language Model Search Trees

URL: http://arxiv.org/abs/2407.03951v2
Date: Wed, 9 Oct 2024 08:16:18 GMT
Title: Uncertainty-Guided Optimization on Large Language Model Search Trees
Authors: Julia Grosse, Ruotian Wu, Ahmad Rashid, Philipp Hennig, Pascal Poupart, Agustinus Kristiadi,
Abstract summary: Tree search algorithms such as greedy and beam search are the standard when it comes to finding sequences of maximum likelihood in the decoding processes of large language models (LLMs) We define prior beliefs over LLMs' transition probabilities and obtain posterior beliefs over the most promising paths in each iteration. Unlike expensive simulation-based non-myopic methods like the Monte Carlo tree search, our method only requires samples from the beliefs.
Score: 42.71167208999792
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Tree search algorithms such as greedy and beam search are the standard when it comes to finding sequences of maximum likelihood in the decoding processes of large language models (LLMs). However, they are myopic since they do not take the complete root-to-leaf path into account. Moreover, they are agnostic to prior knowledge available about the process: For example, it does not consider that the objective being maximized is a probability and thereby has specific properties like being bound in the unit interval. Taking a probabilistic approach, we define prior beliefs over LLMs' transition probabilities and obtain posterior beliefs over the most promising paths in each iteration. These beliefs are useful for defining a sample-based, non-myopic acquisition function that allows for a more data-efficient exploration scheme than standard search algorithms on LLMs. Crucially, unlike expensive simulation-based non-myopic methods like the Monte Carlo tree search, our method only requires samples from the beliefs. Our formulation thus views LLM decoding as Bayesian optimization on trees. We discuss how to select the prior and the acquisition function, and demonstrate in experiments with various LLMs that our method achieves higher efficiency than recent baselines: Our method achieves the same or a higher likelihood while expanding fewer nodes.

Related papers

Visualising Policy-Reward Interplay to Inform Zeroth-Order Preference Optimisation of Large Language Models [0.36326779753373206]
Zeroth-Order (ZO) optimisation, using function evaluations instead of gradients, reduces memory usage but suffers from slow convergence in high-dimensional models. We introduce ZOPrO, a novel ZO algorithm designed for Preference optimisation in LLMs. We demonstrate that our method consistently enhances reward signals while achieving convergence times comparable to first-order methods.
arXiv Detail & Related papers (2025-03-05T12:49:48Z)
Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs. We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention. Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
Latent Logic Tree Extraction for Event Sequence Explanation from LLMs [19.90330712436838]
Modern high-stakes systems, such as healthcare or robotics, often generate vast streaming event sequences. Our goal is to design an efficient, plug-and-play tool to elicit logic tree-based explanations from Large Language Models (LLMs) to provide customized insights into each observed event sequence. In the online setting, our locally built, lightweight model will iteratively extract the most relevant rules from LLMs for each sequence using only a few iterations.
arXiv Detail & Related papers (2024-06-03T09:10:42Z)
Autonomous Tree-search Ability of Large Language Models [58.68735916408101]
Large Language Models have excelled in remarkable reasoning capabilities with advanced prompting techniques. Recent works propose to utilize external programs to define search logic, such that LLMs can perform passive tree search to solve more challenging reasoning tasks. We propose a new concept called autonomous tree-search ability of LLM, which can automatically generate a response containing search trajectories for the correct answer.
arXiv Detail & Related papers (2023-10-14T14:14:38Z)
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training [37.79247073276239]
Recent works like Tree-of-Thought (ToT) and Reasoning via Planning (RAP) aim to augment the reasoning capabilities of LLMs. We present an AlphaZero-like tree-search learning framework for LLMs (termed TS-LLM) We show how tree-search with a learned value function can guide LLM decoding.
arXiv Detail & Related papers (2023-09-29T12:20:19Z)
Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models [17.059322033670124]
We propose a novel strategy that propels Large Language Models through algorithmic reasoning pathways. Our results suggest that instructing an LLM using an algorithm can lead to performance surpassing that of the algorithm itself.
arXiv Detail & Related papers (2023-08-20T22:36:23Z)
Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient scalable and general framework that can directly search on the tasks of interest. Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree. We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent- form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment. Policy gradients for local search are often obtained from random perturbations. We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.