Cascade Reward Sampling for Efficient Decoding-Time Alignment
- URL: http://arxiv.org/abs/2406.16306v1
- Date: Mon, 24 Jun 2024 04:08:35 GMT
- Title: Cascade Reward Sampling for Efficient Decoding-Time Alignment
- Authors: Bolian Li, Yifan Wang, Ananth Grama, Ruqi Zhang,
- Abstract summary: We propose CARDS to generate high-reward and high-likelihood text.
CARDS guarantees the generation of high-reward and high-likelihood text at significantly lower cost.
Experiments demonstrate substantial gains in both generation efficiency and alignment ratings.
- Score: 18.537156067913283
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Aligning large language models (LLMs) with human preferences is critical for their deployment. Recently, decoding-time alignment has emerged as an effective plug-and-play technique that requires no fine-tuning of model parameters. However, generating text that achieves both high reward and high likelihood remains a significant challenge. Existing methods often fail to generate high-reward text or incur substantial computational costs. In this paper, we propose Cascade Reward Sampling (CARDS) to address both issues, guaranteeing the generation of high-reward and high-likelihood text at significantly lower cost. Based on our analysis of reward models (RMs) on incomplete text and our observation that high-reward prefixes induce high-reward complete text, we use rejection sampling to iteratively generate small semantic segments that form such prefixes. The segment length is determined dynamically by the predictive uncertainty of the LLM. This strategy guarantees desirable prefixes for subsequent generation and significantly reduces wasteful token re-generation and the number of reward-model scoring calls. Our experiments demonstrate substantial gains in both generation efficiency and alignment ratings over the baselines, achieving five-times-faster text generation and a 99% win-or-tie rate in GPT-4/Claude-3 helpfulness evaluations.
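The decoding loop described in the abstract can be sketched as segment-level rejection sampling. This is a minimal illustration, not the authors' implementation: `sample_segment` (an LLM sampler assumed to end a segment once token-level predictive uncertainty, e.g. entropy, exceeds a cutoff) and `reward_model` (a reward model assumed to score partial text) are hypothetical helpers, and the fixed reward threshold stands in for whatever threshold schedule the paper actually uses.

```python
def cards_style_decode(prompt, sample_segment, reward_model, reward_threshold,
                       max_new_tokens=256, max_rejections=20):
    """Sketch of cascade reward sampling: grow the response one small
    semantic segment at a time, accepting a segment only when the extended
    prefix scores above a reward threshold (rejection sampling per segment)."""
    text, n_generated = prompt, 0
    while n_generated < max_new_tokens:
        for _ in range(max_rejections):
            # sample_segment is assumed to stop at a high-uncertainty token,
            # which the abstract treats as a semantic-segment boundary
            segment, n_tokens, finished = sample_segment(text)
            if reward_model(text + segment) >= reward_threshold:
                break  # high-reward prefix found; keep this segment
            # otherwise reject and re-sample only this segment,
            # rather than regenerating the whole response
        text += segment          # after max_rejections, keep the last draft
        n_generated += n_tokens
        if finished:             # sampler emitted an end-of-sequence marker
            break
    return text
```

Rejected work here is limited to a single short segment rather than a full response, which is where the abstract's claimed savings in token re-generation and reward-model calls come from.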
Related papers
- Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection [71.92083784393418]
Inference-time methods such as Best-of-N (BON) sampling offer a simple yet effective alternative to improve performance (a minimal Best-of-N sketch appears after this list).
We propose Iterative Agent Decoding (IAD), which combines iterative refinement with dynamic candidate evaluation and selection guided by a verifier.
arXiv Detail & Related papers (2025-04-02T17:40:47Z) - Reward-Guided Speculative Decoding for Efficient LLM Reasoning [80.55186052123196]
We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs).
RSD incorporates a controlled bias to prioritize high-reward outputs, in contrast to existing speculative decoding methods that enforce strict unbiasedness.
RSD delivers significant efficiency gains over decoding with the target model alone, while achieving significantly better accuracy than parallel decoding methods on average.
arXiv Detail & Related papers (2025-01-31T17:19:57Z) - Threshold Selection for Iterative Decoding of $(v,w)$-regular Binary Codes [84.0257274213152]
Iterative bit flipping decoders are an efficient choice for sparse $(v,w)$-regular codes.
We propose concrete criteria for threshold determination, backed by a closed-form model.
arXiv Detail & Related papers (2025-01-23T17:38:22Z) - Constrained Decoding with Speculative Lookaheads [13.085794785286305]
We propose constrained decoding with speculative lookaheads (CDSL).
CDSL is motivated by the recently proposed idea of speculative decoding, which uses a much smaller draft LLM for generation and a larger target LLM for verification.
We evaluate CDSL on two constrained decoding tasks with three LLM families and achieve a 2.2x to 12.15x speedup over CDLH without significant performance reduction.
arXiv Detail & Related papers (2024-12-09T22:29:57Z) - A Theoretical Perspective for Speculative Decoding Algorithm [60.79447486066416]
One effective way to accelerate inference is Speculative Decoding, which employs a small model to sample a sequence of draft tokens and a large model to validate them (a toy draft-and-verify sketch appears after this list).
This paper tackles this gap by conceptualizing the decoding problem via a Markov chain abstraction and studying its key properties, output quality and inference acceleration, from a theoretical perspective.
arXiv Detail & Related papers (2024-10-30T01:53:04Z) - Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z) - WARP-LCA: Efficient Convolutional Sparse Coding with Locally Competitive Algorithm [1.4186974630564675]
We show that WARP-LCA converges faster by orders of magnitude and reaches better minima compared to conventional LCA.
We demonstrate that WARP-LCA exhibits superior properties in terms of reconstruction and denoising quality as well as robustness when applied in deep recognition pipelines.
arXiv Detail & Related papers (2024-10-24T14:47:36Z) - Path-Consistency: Prefix Enhancement for Efficient Inference in LLM [3.309813585671485]
Path-consistency mitigates both the errors and redundancies from random or less useful sampling in self-consistency.
Path-consistency achieves significant acceleration in inference latency, ranging from 7.8% to 40.5%.
arXiv Detail & Related papers (2024-08-25T01:45:53Z) - Hierarchical Skip Decoding for Efficient Autoregressive Text Generation [9.16858904192541]
We propose a novel decoding strategy named Hierarchical Skip Decoding (HSD) for efficient autoregressive text generation.
With almost half of the layers skipped, HSD can sustain 90% of the text quality compared to vanilla autoregressive decoding.
arXiv Detail & Related papers (2024-03-22T02:44:05Z) - Quantum Algorithm Exploration using Application-Oriented Performance Benchmarks [0.0]
The QED-C suite of Application-Oriented Benchmarks provides the ability to gauge performance characteristics of quantum computers.
We investigate challenges in broadening the relevance of this benchmarking methodology to applications of greater complexity.
arXiv Detail & Related papers (2024-02-14T06:55:50Z) - Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model [47.722856876213946]
Reward-Augmented Decoding (RAD) is a text generation procedure that uses a small unidirectional reward model to encourage a language model to generate text that has certain properties.
By using a unidirectional reward model, RAD can cache activations from prior generation steps to decrease computational overhead.
arXiv Detail & Related papers (2023-10-14T07:19:47Z) - Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model [77.19693792957614]
We propose to make neural machine translation (NMT) models quality-aware by training them to estimate the quality of their own output.
We obtain quality gains similar to or even better than those of quality-reranking approaches, but with the efficiency of single-pass decoding.
arXiv Detail & Related papers (2023-10-10T15:33:51Z) - Contrastive Decoding Improves Reasoning in Large Language Models [55.16503283583076]
We show that Contrastive Decoding achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks.
We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark.
arXiv Detail & Related papers (2023-09-17T00:29:32Z) - Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation [92.42032403795879]
We show that pretrained language models (LMs) such as GPT2 still tend to generate repetitive texts.
We attribute their overestimation of token-level repetition probabilities to the learning bias.
We find that LMs use longer-range dependencies to predict repetitive tokens than non-repetitive ones, which may be the cause of sentence-level repetition loops.
arXiv Detail & Related papers (2023-07-04T07:53:55Z) - KEST: Kernel Distance Based Efficient Self-Training for Improving Controllable Text Generation [24.47531522553703]
We propose KEST, a novel and efficient self-training framework to handle these problems.
KEST utilizes a kernel-based loss, rather than standard cross entropy, to learn from the soft pseudo text produced by a shared non-autoregressive generator.
Experiments on three controllable generation tasks demonstrate that KEST significantly improves control accuracy while maintaining comparable text fluency and generation diversity against several strong baselines.
arXiv Detail & Related papers (2023-06-17T19:40:57Z) - Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study [25.58608455210458]
Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings.
This article explores different approaches that may be deployed during the fine-tuning to reduce the computations needed in the SSL encoder.
arXiv Detail & Related papers (2023-03-12T19:52:34Z) - Progressive Generation of Long Text with Pretrained Language Models [83.62523163717448]
Large-scale language models (LMs) pretrained on massive corpora of text, such as GPT-2, are powerful open-domain text generators.
It is still challenging for such models to generate coherent long passages of text, especially when the models are fine-tuned to the target domain on a small corpus.
We propose a simple but effective method of generating text in a progressive manner, inspired by generating images from low to high resolution.
arXiv Detail & Related papers (2020-06-28T21:23:05Z) - POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z) - BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than attacks on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z) - Self-Adversarial Learning with Comparative Discrimination for Text Generation [111.18614166615968]
We propose a novel self-adversarial learning (SAL) paradigm for improving GANs' performance in text generation.
During training, SAL rewards the generator when its currently generated sentence is found to be better than its previously generated samples.
Experiments on text generation benchmark datasets show that our proposed approach substantially improves both the quality and diversity of the generated text.
arXiv Detail & Related papers (2020-01-31T07:50:25Z)
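The Best-of-N sampling mentioned in the Iterative Agent Decoding entry above is simple enough to show directly. This is a generic sketch, not code from that paper: `generate` and `reward_model` are hypothetical callables standing in for an LLM sampler and a verifier/reward model.

```python
def best_of_n(prompt, generate, reward_model, n=8):
    """Best-of-N sampling: draw n complete responses and keep the one the
    reward model scores highest; every rejected candidate still costs a
    full generation pass, unlike segment-level approaches such as CARDS."""
    candidates = [generate(prompt) for _ in range(n)]
    scores = [reward_model(prompt + c) for c in candidates]
    return candidates[scores.index(max(scores))]
```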
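The draft-and-verify loop behind the speculative-decoding entries above can be illustrated over toy next-token distributions. This sketch uses the standard accept/resample rule from the speculative-decoding literature rather than code from any listed paper; `p_draft` and `p_target` are hypothetical callables that return a probability vector over the vocabulary for a given token prefix, and the bonus token drawn when every draft is accepted is omitted for brevity.

```python
import numpy as np

def speculative_round(prefix, p_draft, p_target, k, rng):
    """One draft-and-verify round: the small model proposes k tokens, the
    large model accepts each with probability min(1, p_target/p_draft) and
    resamples from the residual distribution at the first rejection."""
    drafts, ctx = [], list(prefix)
    for _ in range(k):
        q = p_draft(ctx + drafts)
        drafts.append(int(rng.choice(len(q), p=q)))
    accepted = []
    for t in drafts:
        q = p_draft(ctx + accepted)
        p = p_target(ctx + accepted)
        if rng.random() < min(1.0, p[t] / q[t]):
            accepted.append(t)               # draft token kept
        else:
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()       # renormalize max(0, p - q)
            accepted.append(int(rng.choice(len(residual), p=residual)))
            break                            # stop at the first rejection
    return accepted
```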