Plan2Align: Predictive Planning Based Test-Time Preference Alignment for Large Language Models
- URL: http://arxiv.org/abs/2502.20795v2
- Date: Tue, 27 May 2025 11:26:15 GMT
- Title: Plan2Align: Predictive Planning Based Test-Time Preference Alignment for Large Language Models
- Authors: Kuang-Da Wang, Teng-Ruei Chen, Yu Heng Hung, Guo-Xun Ko, Shuoyang Ding, Yueh-Hua Wu, Yu-Chiang Frank Wang, Chao-Han Huck Yang, Wen-Chih Peng, Ping-Chun Hsieh
- Abstract summary: We propose Plan2Align, a test-time alignment framework that formulates text generation as a predictive planning problem. Plan2Align adapts Model Predictive Control (MPC) to iteratively refine output by rolling out multiple complete responses and optimizing each segment. Compared to existing training-time and test-time alignment methods on LLaMA-3.1 8B, Plan2Align achieves comparable or superior results.
- Score: 41.51666974003913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aligning Large Language Models with Preference Fine-Tuning is often resource-intensive. Test-time alignment techniques that do not modify the underlying models, such as prompting and guided decodings, offer a lightweight alternative. However, existing test-time alignment methods primarily improve short responses and fail to ensure coherence over extended contexts due to the myopic nature of token-level alignment. Moreover, these methods often incur a slowdown during inference. To address these challenges, we propose Plan2Align, a test-time alignment framework that formulates text generation as a predictive planning problem. Plan2Align adapts Model Predictive Control (MPC) to iteratively refine output by rolling out multiple complete responses and optimizing each segment. To more rigorously evaluate the effectiveness and efficiency, we focus on the more challenging task of long-text generation. Experiments on the long-form response subset of the HH-RLHF dataset and the WMT'24 Discourse-Level Literary Translation demonstrate that Plan2Align significantly enhances the performance of base LLMs. Compared to existing training-time and test-time alignment methods on LLaMA-3.1 8B, Plan2Align achieves comparable or superior results, while also delivering improved inference efficiency relative to prior test-time alignment approaches.
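The abstract describes refining output by rolling out multiple complete responses and optimizing each segment. The sketch below is only a toy illustration of that general loop, not the authors' implementation: the names `mpc_refine`, `toy_rollout`, `toy_score`, and `VOCAB` are hypothetical, the "model" is a random word sampler, and the per-segment reward is a lookup table standing in for a reward model. How Plan2Align actually segments text, scores segments, or feeds the plan back into generation is not specified here and is assumed.

```python
import random
from typing import Callable, List

Segment = str
Response = List[Segment]

# Hypothetical per-segment rewards standing in for a reward model.
VOCAB = {"bad": 0.0, "ok": 0.5, "good": 1.0}


def toy_score(segment: Segment) -> float:
    """Toy reward: look up the segment's score in VOCAB."""
    return VOCAB[segment]


def toy_rollout(plan: Response, rng: random.Random) -> Response:
    """Toy generator: produce a complete 3-segment response.

    Each position either reuses the current plan's segment (with
    probability 0.5, when the plan has one) or samples a new word.
    """
    words = sorted(VOCAB)
    out: Response = []
    for i in range(3):
        if i < len(plan) and rng.random() < 0.5:
            out.append(plan[i])
        else:
            out.append(rng.choice(words))
    return out


def mpc_refine(
    rollout: Callable[[Response, random.Random], Response],
    score: Callable[[Segment], float],
    num_rollouts: int = 4,
    num_iters: int = 3,
    seed: int = 0,
) -> Response:
    """Iteratively roll out full responses and keep the best segment per position."""
    rng = random.Random(seed)
    plan: Response = []
    for _ in range(num_iters):
        # Roll out several complete responses conditioned on the current plan.
        candidates = [rollout(plan, rng) for _ in range(num_rollouts)]
        if plan:
            # Keep the current plan as a candidate so the result never gets worse.
            candidates.append(plan)
        length = max(len(c) for c in candidates)
        new_plan: Response = []
        for i in range(length):
            # Optimize each segment position independently.
            options = [c[i] for c in candidates if i < len(c)]
            new_plan.append(max(options, key=score))
        plan = new_plan
    return plan


if __name__ == "__main__":
    print(mpc_refine(toy_rollout, toy_score))
```

Because the current plan is always kept among the candidates, the summed toy reward cannot decrease from one iteration to the next; that property belongs to this sketch and is not a claim about the paper's method.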
Related papers
- Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? [62.579951798437115]
This work investigates iterative approximate evaluation for arbitrary prompts. It introduces Model Predictive Prompt Selection (MoPPS), a Bayesian risk-predictive framework. MoPPS reliably predicts prompt difficulty and accelerates training with significantly reduced rollouts.
arXiv Detail & Related papers (2025-07-07T03:20:52Z) - Multilingual Contextualization of Large Language Models for Document-Level Machine Translation [30.005159724115824]
Large language models (LLMs) have demonstrated strong performance in sentence-level machine translation.
We propose a method to improve LLM-based long-document translation through targeted fine-tuning on high-quality document-level data.
Our approach supports multiple translation paradigms, including direct document-to-document and chunk-level translation.
arXiv Detail & Related papers (2025-04-16T14:52:22Z) - DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models [50.32663816994459]
Diffusion-styled Preference Optimization (DiffPO) provides an efficient and policy-agnostic solution for aligning LLMs with humans. DiffPO avoids the time latency associated with token-level generation. Experiments on AlpacaEval 2, MT-bench, and HH-RLHF demonstrate that DiffPO achieves superior alignment performance across various settings.
arXiv Detail & Related papers (2025-03-06T09:21:54Z) - Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation [21.20806568508201]
We show how to leverage class text information to mitigate distribution drifts encountered by vision-language models (VLMs) during test-time inference. We propose to generate pseudo-labels for the test-time samples by exploiting generic class text embeddings as fixed centroids of a label assignment problem. Experiments on multiple popular test-time adaptation benchmarks presenting diverse complexity empirically show the superiority of CLIP-OT.
arXiv Detail & Related papers (2024-11-26T00:15:37Z) - In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality. To handle these challenges, a direct solution is to generate "high-confidence" data from unsupervised downstream tasks. We propose a novel approach, pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z) - TS-HTFA: Advancing Time Series Forecasting via Hierarchical Text-Free Alignment with Large Language Models [14.411646409316624]
We introduce Hierarchical Text-Free Alignment (TS-HTFA), a novel method for time-series forecasting. We replace paired text data with adaptive virtual text based on QR decomposition word embeddings and learnable prompt. Experiments on multiple time-series benchmarks demonstrate that HTFA achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-09-23T12:57:24Z) - Towards Chapter-to-Chapter Context-Aware Literary Translation via Large Language Models [16.96647110733261]
Discourse phenomena in existing document-level translation datasets are sparse.
Most existing document-level corpora and context-aware machine translation methods rely on an unrealistic assumption on sentence-level alignments.
We propose a more pragmatic and challenging setting for context-aware translation, termed chapter-to-chapter (Ch2Ch) translation.
arXiv Detail & Related papers (2024-07-12T04:18:22Z) - Don't Forget Your Reward Values: Language Model Alignment via Value-based Calibration [26.467379188463028]
We propose a novel Value-based CaliBration (VCB) method to better align Large Language Models with human preferences.
Experimental results demonstrate that VCB surpasses existing alignment methods on AI assistant and summarization datasets.
arXiv Detail & Related papers (2024-02-25T08:45:10Z) - Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing [12.843274390224853]
Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks.
We show that they have yet to attain state-of-the-art performance in Neural Machine Translation.
We propose adapting LLMs as Automatic Post-Editors (APE) rather than direct translators.
arXiv Detail & Related papers (2023-10-23T12:22:15Z) - On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z) - Dual-Alignment Pre-training for Cross-lingual Sentence Embedding [79.98111074307657]
We propose a dual-alignment pre-training (DAP) framework for cross-lingual sentence embedding.
We introduce a novel representation translation learning (RTL) task, where the model learns to use one-side contextualized token representation to reconstruct its translation counterpart.
Our approach can significantly improve sentence embedding.
arXiv Detail & Related papers (2023-05-16T03:53:30Z) - Modeling Context With Linear Attention for Scalable Document-Level Translation [72.41955536834702]
We investigate the efficacy of a recent linear attention model on document translation and augment it with a sentential gate to promote a recency inductive bias.
We show that sentential gating further improves translation quality on IWSLT.
arXiv Detail & Related papers (2022-10-16T03:41:50Z) - End-to-End Training for Back-Translation with Categorical Reparameterization Trick [0.0]
Back-translation is an effective semi-supervised learning framework in neural machine translation (NMT).
A pre-trained NMT model translates monolingual sentences and makes synthetic bilingual sentence pairs for the training of the other NMT model.
The discrete property of translated sentences prevents the information gradient from flowing between the two NMT models.
arXiv Detail & Related papers (2022-02-17T06:31:03Z) - Meta Back-translation [111.87397401837286]
We propose a novel method to generate pseudo-parallel data from a pre-trained back-translation model.
Our method is a meta-learning algorithm which adapts a pre-trained back-translation model so that the pseudo-parallel data it generates would train a forward-translation model to do well on a validation set.
arXiv Detail & Related papers (2021-02-15T20:58:32Z) - Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond [58.80417796087894]
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly consists of two lines of work: the zero-shot approach and the translation-based approach.
We propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance.
arXiv Detail & Related papers (2020-10-23T13:47:01Z) - Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z) - Learning Contextualized Sentence Representations for Document-Level Neural Machine Translation [59.191079800436114]
Document-level machine translation incorporates inter-sentential dependencies into the translation of a source sentence.
We propose a new framework to model cross-sentence dependencies by training neural machine translation (NMT) to predict both the target translation and surrounding sentences of a source sentence.
arXiv Detail & Related papers (2020-03-30T03:38:01Z)