Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
- URL: http://arxiv.org/abs/2505.19700v1
- Date: Mon, 26 May 2025 08:53:02 GMT
- Title: Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
- Authors: Yi Liu, Dianqing Liu, Mingye Zhu, Junbo Guo, Yongdong Zhang, Zhendong Mao
- Abstract summary: Traditional alignment methods often require retraining large pretrained models. We propose a novel Residual Alignment Model (RAM) that formalizes the alignment process as a type of importance sampling. We develop a resampling algorithm with iterative token-level decoding to address the common first-token latency issue in comparable methods.
- Score: 50.19188692497892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The widespread adoption of large language models (LLMs) across industries has increased the demand for high-quality and customizable outputs. However, traditional alignment methods often require retraining large pretrained models, making it difficult to quickly adapt and optimize LLMs for diverse applications. To address this limitation, we propose a novel \textit{Residual Alignment Model} (\textit{RAM}) that formalizes the alignment process as a type of importance sampling. In this framework, the unaligned upstream model serves as the proposal distribution, while the alignment process is framed as secondary sampling based on an autoregressive alignment module that acts as an estimator of the importance weights. This design enables a natural detachment of the alignment module from the target aligned model, improving flexibility and scalability. Based on this model, we derive an efficient sequence-level training strategy for the alignment module, which operates independently of the proposal module. Additionally, we develop a resampling algorithm with iterative token-level decoding to address the common first-token latency issue in comparable methods. Experimental evaluations on two leading open-source LLMs across diverse tasks, including instruction following, domain adaptation, and preference optimization, demonstrate that our approach consistently outperforms baseline models.
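As a rough aid to reading the abstract, the importance-sampling view it describes can be sketched as follows; the notation (pi_base, r_theta, K) is assumed here, and the paper's exact parameterization may differ.

```latex
% Proposal: unaligned upstream model \pi_{\mathrm{base}};
% the residual alignment module r_\theta estimates the importance weight.
\[
  \pi_{\mathrm{aligned}}(y \mid x) \;\propto\; \pi_{\mathrm{base}}(y \mid x)\, r_\theta(y \mid x),
  \qquad
  r_\theta(y \mid x) \approx \frac{\pi_{\mathrm{aligned}}(y \mid x)}{\pi_{\mathrm{base}}(y \mid x)}.
\]
% Resampling over K candidates y^{(1)}, \dots, y^{(K)} \sim \pi_{\mathrm{base}}(\cdot \mid x)
% with self-normalized weights:
\[
  w_k = \frac{r_\theta\bigl(y^{(k)} \mid x\bigr)}{\sum_{j=1}^{K} r_\theta\bigl(y^{(j)} \mid x\bigr)},
  \qquad
  \hat{y} \sim \mathrm{Categorical}(w_1, \dots, w_K).
\]
```

Resampling only after whole candidate sequences have been drawn from the proposal would incur the first-token latency the abstract mentions, which is presumably why the paper pairs this formulation with iterative token-level decoding.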
Related papers
- A Novel Self-Evolution Framework for Large Language Models [18.62332474172811]
We propose a novel Dual-Phase Self-Evolution framework to jointly optimize user preference adaptation and domain-specific competence. Experiments across general NLP benchmarks and long-term dialogue tasks demonstrate that DPSE consistently outperforms Supervised Fine-Tuning, Preference Optimization, and Memory-Augmented baselines.
arXiv Detail & Related papers (2025-07-21T06:30:39Z) - Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding [26.416630784362525]
Large Language Models (LLMs) require alignment with human preferences to avoid generating offensive, false, or meaningless content. We propose a novel framework, Weak-to-Strong Decoding (WSD), to enhance the alignment ability of base models. We also collect a new dataset, GenAligner, to fine-tune a small-sized Pilot-3B as the draft model.
arXiv Detail & Related papers (2025-06-09T05:21:22Z) - Relative Overfitting and Accept-Reject Framework [5.465098504510676]
We propose an ensemble framework that governs how models are segmented to ensure performance improvement. We detail the patterns of this framework within the domain of NLP and briefly describe its extension to other fields, such as computer vision (CV) and AI for science.
arXiv Detail & Related papers (2025-05-12T17:36:14Z) - DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models [50.32663816994459]
Diffusion-styled Preference Optimization (DiffPO) provides an efficient and policy-agnostic solution for aligning LLMs with humans. DiffPO avoids the time latency associated with token-level generation. Experiments on AlpacaEval 2, MT-bench, and HH-RLHF demonstrate that DiffPO achieves superior alignment performance across various settings.
arXiv Detail & Related papers (2025-03-06T09:21:54Z) - Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z) - Inference time LLM alignment in single and multidomain preference spectrum [16.849200702288307]
We introduce an inference-time model alignment method that learns encoded representations of preference dimensions.
These representations are computed by subtraction of the base model from the aligned model as in model editing.
Even though the preference dimensions can span various levels, here we focus on three gradual response levels across three specialized domains.
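A minimal sketch of the weight-subtraction idea described in this entry, in the style of task-vector model editing; the function names, dictionary-of-tensors interface, and the strength parameter are illustrative assumptions, not the paper's API.

```python
# Hypothetical illustration of encoding a preference dimension as a weight
# difference and re-applying it at inference time; not the paper's code.
from typing import Dict
import torch

def preference_direction(aligned: Dict[str, torch.Tensor],
                         base: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    """Encode the preference as the parameter-wise difference (aligned - base)."""
    return {name: aligned[name] - base[name] for name in base}

def steer(base: Dict[str, torch.Tensor],
          direction: Dict[str, torch.Tensor],
          strength: float = 1.0) -> Dict[str, torch.Tensor]:
    """Add the scaled direction back onto the base weights before generation."""
    return {name: base[name] + strength * direction[name] for name in base}
```

Varying `strength` between 0 and 1 would plausibly correspond to the gradual response levels mentioned in the summary.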
arXiv Detail & Related papers (2024-10-24T23:31:39Z) - Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models [16.16372459671255]
Large Language Models (LLMs) typically generate outputs token by token using a fixed compute budget.
We propose a novel framework that integrates smaller auxiliary modules within each Feed-Forward Network layer of the LLM.
We show that trained routers operate differently from oracles and often yield suboptimal solutions.
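A hypothetical sketch of an auxiliary module routed inside a Feed-Forward Network layer, in the spirit of the framework summarized above; the module sizes and soft routing rule are assumptions for illustration, not the paper's design.

```python
# Hypothetical routed FFN layer with a smaller auxiliary path.
import torch
import torch.nn as nn

class RoutedFFN(nn.Module):
    def __init__(self, d_model: int, d_ff_large: int, d_ff_small: int):
        super().__init__()
        self.large = nn.Sequential(nn.Linear(d_model, d_ff_large), nn.GELU(),
                                   nn.Linear(d_ff_large, d_model))
        self.small = nn.Sequential(nn.Linear(d_model, d_ff_small), nn.GELU(),
                                   nn.Linear(d_ff_small, d_model))
        self.router = nn.Linear(d_model, 2)  # per-token preference: small vs. large path

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.softmax(self.router(x), dim=-1)              # (batch, seq, 2)
        mixed = gate[..., :1] * self.small(x) + gate[..., 1:] * self.large(x)
        return x + mixed                                          # residual connection
```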
arXiv Detail & Related papers (2024-10-01T16:10:21Z) - Adaptive Draft-Verification for Efficient Large Language Model Decoding [24.347886232342862]
Large language model (LLM) decoding involves generating a sequence of tokens based on a given context.
The typical autoregressive decoding method requires a separate forward pass through the model for each token generated.
We introduce ADED, which accelerates LLM decoding without requiring fine-tuning.
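For context on the cost that ADED targets, here is a minimal sketch of the standard greedy autoregressive decoding loop; `model` is a placeholder for a causal LM returning an object with a `.logits` tensor of shape (batch, seq, vocab), not ADED's interface.

```python
# Minimal sketch of standard greedy autoregressive decoding: every new token
# costs one full forward pass through the model.
import torch

@torch.no_grad()
def greedy_decode(model, input_ids: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
    for _ in range(max_new_tokens):
        logits = model(input_ids).logits                    # one forward pass per step
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
    return input_ids
```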
arXiv Detail & Related papers (2024-06-27T22:20:39Z) - Aligning Large Language Models via Fine-grained Supervision [20.35000061196631]
Pre-trained large-scale language models (LLMs) excel at producing coherent articles, yet their outputs may be untruthful, toxic, or fail to align with user expectations.
Current approaches focus on using reinforcement learning with human feedback to improve model alignment.
We propose a method to enhance LLM alignment through fine-grained token-level supervision.
arXiv Detail & Related papers (2024-06-04T20:21:45Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
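The summary suggests the standard entropy-regularized RL objective applied at the token level; a generic form is sketched below, which is not necessarily ETPO's exact derivation.

```latex
% Generic entropy-regularized, token-level objective: \pi_\theta is the LLM policy,
% a_t the token emitted at step t, s_t the context so far, r the task reward,
% and \beta the entropy coefficient.
\[
  J(\theta) \;=\; \mathbb{E}_{\pi_\theta}\!\left[
    \sum_{t} r(s_t, a_t) \;+\; \beta \sum_{t} \mathcal{H}\bigl(\pi_\theta(\cdot \mid s_t)\bigr)
  \right].
\]
```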
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
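The latent-variable reading of chain-of-thought mentioned above can be written as follows; the notation is assumed here, with the reasoning chain z latent between question x and answer y.

```latex
% Question x, latent reasoning chain z, answer y; both factors come from the same LM.
\[
  p(y \mid x) \;=\; \sum_{z} p_{\mathrm{LM}}(z \mid x)\, p_{\mathrm{LM}}(y \mid x, z),
  \qquad
  p(z \mid x, y) \;=\; \frac{p_{\mathrm{LM}}(z \mid x)\, p_{\mathrm{LM}}(y \mid x, z)}{p(y \mid x)}.
\]
% The posterior p(z \mid x, y) is the intractable distribution that amortized
% inference learns to sample from.
```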
arXiv Detail & Related papers (2023-10-06T16:36:08Z) - Can SAM Boost Video Super-Resolution? [78.29033914169025]
We propose a simple yet effective module, the SAM-guidEd refinEment Module (SEEM).
This light-weight plug-in module is specifically designed to leverage the attention mechanism for the generation of semantic-aware features.
We apply our SEEM to two representative methods, EDVR and BasicVSR, resulting in consistently improved performance with minimal implementation effort.
arXiv Detail & Related papers (2023-05-11T02:02:53Z) - When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.