Advantage-Guided Distillation for Preference Alignment in Small Language Models
- URL: http://arxiv.org/abs/2502.17927v2
- Date: Wed, 05 Mar 2025 05:46:28 GMT
- Title: Advantage-Guided Distillation for Preference Alignment in Small Language Models
- Authors: Shiping Gao, Fanqi Wan, Jiajian Guo, Xiaojun Quan, Qifan Wang
- Abstract summary: We propose to utilize a well-aligned teacher LLM to guide the alignment process for Small Language Models. Our experimental results show that these two approaches appreciably improve the alignment of SLMs and narrow the performance gap with larger counterparts.
- Score: 37.1672515839325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Alignment techniques enable Large Language Models (LLMs) to generate outputs that align with human preferences and play a crucial role in their effectiveness. However, their impact often diminishes when applied to Small Language Models (SLMs), likely due to the limited capacity of these models. Instead of directly applying existing alignment techniques to SLMs, we propose to utilize a well-aligned teacher LLM to guide the alignment process for these models, thereby facilitating the transfer of the teacher's knowledge of human preferences to the student model. To achieve this, we first explore a straightforward approach, Dual-Constrained Knowledge Distillation (DCKD), that employs knowledge distillation with two KL-divergence constraints from the aligned teacher to the unaligned student. To further enhance the student's ability to distinguish between preferred and dispreferred responses, we then propose Advantage-Guided Distillation for Preference Alignment (ADPA), which leverages an advantage function from the aligned teacher to deliver more nuanced, distribution-level reward signals for the student's alignment. Our experimental results show that these two approaches appreciably improve the alignment of SLMs and narrow the performance gap with larger counterparts. Among them, ADPA demonstrates superior performance and achieves even greater effectiveness when integrated with DCKD. Our code is available at https://github.com/SLIT-AI/ADPA.
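To make the abstract's two ideas concrete, below is a minimal, illustrative sketch (in PyTorch) of what a dual-constrained distillation loss and an advantage-guided training signal might look like. It is not the authors' implementation (see the linked repository for that); the tensor shapes, the use of a separate reference teacher, and the way the advantage weights the student's distribution are assumptions made purely for illustration.

```python
# Hedged sketch of DCKD-style and ADPA-style objectives as described in the
# abstract. NOT the authors' code; shapes and the advantage construction are
# illustrative assumptions.
import torch
import torch.nn.functional as F


def dckd_style_loss(student_chosen_logits, student_rejected_logits,
                    teacher_chosen_logits, teacher_rejected_logits):
    """Dual-constrained distillation: two KL terms pulling the student's
    next-token distributions toward the aligned teacher's, one on the
    preferred (chosen) response and one on the dispreferred (rejected) one."""
    kl_chosen = F.kl_div(F.log_softmax(student_chosen_logits, dim=-1),
                         F.softmax(teacher_chosen_logits, dim=-1),
                         reduction="batchmean")
    kl_rejected = F.kl_div(F.log_softmax(student_rejected_logits, dim=-1),
                           F.softmax(teacher_rejected_logits, dim=-1),
                           reduction="batchmean")
    return kl_chosen + kl_rejected


def adpa_style_signal(student_logits, aligned_teacher_logits,
                      reference_teacher_logits):
    """Advantage-guided signal (one plausible reading of the abstract): treat
    the gap between an aligned teacher and a reference teacher as a per-token
    advantage over the vocabulary, and reward the student for placing
    probability mass on high-advantage tokens."""
    advantage = (F.log_softmax(aligned_teacher_logits, dim=-1)
                 - F.log_softmax(reference_teacher_logits, dim=-1))
    student_probs = F.softmax(student_logits, dim=-1)
    # Expected advantage under the student's distribution; negate for a loss.
    return -(student_probs * advantage).sum(dim=-1).mean()


if __name__ == "__main__":
    # Toy usage with random logits of shape (batch, seq_len, vocab_size).
    b, t, v = 2, 8, 32
    logits = lambda: torch.randn(b, t, v)
    print(dckd_style_loss(logits(), logits(), logits(), logits()))
    print(adpa_style_signal(logits(), logits(), logits()))
```

The key contrast the sketch tries to surface: the DCKD-style term only matches distributions, while the ADPA-style term uses the difference between an aligned and a reference teacher as a distribution-level reward, which is what lets the student learn to separate preferred from dispreferred behavior.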
Related papers
- DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs [58.4911494598431]
DistiLLM-2 is a contrastive approach that simultaneously increases the likelihood of teacher responses and decreases that of student responses.
Our experiments show that DistiLLM-2 not only builds high-performing student models across a wide range of tasks, but also supports diverse applications.
arXiv Detail & Related papers (2025-03-10T08:51:32Z) - Capturing Nuanced Preferences: Preference-Aligned Distillation for Small Language Models [22.613040767122225]
We propose a Preference-Aligned Distillation framework, which models the teacher's preference knowledge as a probability distribution over all potential preferences. Experiments on four mainstream alignment benchmarks demonstrate that PAD consistently and significantly outperforms existing approaches.
arXiv Detail & Related papers (2025-02-20T05:18:23Z) - TAS: Distilling Arbitrary Teacher and Student via a Hybrid Assistant [52.0297393822012]
We introduce an assistant model as a bridge to facilitate smooth feature knowledge transfer between heterogeneous teachers and students.
Within our proposed design principle, the assistant model combines the advantages of cross-architecture inductive biases and module functions.
Our proposed method is evaluated across several homogeneous model pairs and arbitrary heterogeneous combinations of CNNs, ViTs, and spatial KDs.
arXiv Detail & Related papers (2024-10-16T08:02:49Z) - Direct Preference Knowledge Distillation for Large Language Models [73.50849692633953]
We propose Direct Preference Knowledge Distillation (DPKD) for large language models (LLMs).
We re-formulate KD of LLMs into two stages, the first of which optimizes an objective consisting of an implicit reward and reverse KL divergence.
We prove the value and effectiveness of the introduced implicit reward and output preference in KD through experiments and theoretical analysis.
arXiv Detail & Related papers (2024-06-28T09:23:40Z) - Adversarial Moment-Matching Distillation of Large Language Models [3.9160947065896803]
Knowledge distillation (KD) has been shown to be highly effective in guiding a student model with a larger teacher model.
We propose an adversarial training algorithm to jointly estimate the moment-matching distance and optimize the student policy to minimize it.
Results from both task-agnostic instruction-following experiments and task-specific experiments demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-06-05T05:27:29Z) - PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs [47.35598271306371]
Large Language Models (LLMs) have exhibited impressive capabilities in various tasks, yet their vast parameter sizes restrict their applicability in resource-constrained settings.
Knowledge distillation (KD) offers a viable solution by transferring expertise from large teacher models to compact student models.
We present PLaD, a novel preference-based LLM distillation framework.
arXiv Detail & Related papers (2024-06-05T03:08:25Z) - Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models [0.8133739801185272]
The alignment of reasoning abilities between smaller and larger Language Models is largely conducted via Supervised Fine-Tuning (SFT).
We propose the Self-refine Instruction-tuning method that elicits Smaller Language Models to self-refine their abilities.
Results obtained on commonsense and math reasoning tasks show that this approach significantly outperforms Instruction-tuning in both in-domain and out-domain scenarios.
arXiv Detail & Related papers (2024-05-01T09:10:27Z) - Contrastive Distillation on Intermediate Representations for Language Model Compression [89.31786191358802]
We propose Contrastive Distillation on Intermediate Representations (CoDIR) as a principled knowledge distillation framework.
By learning to distinguish a positive sample from a large set of negative samples, CoDIR facilitates the student's exploitation of rich information in the teacher's hidden layers.
CoDIR can be readily applied to compress large-scale language models in both pre-training and finetuning stages, and achieves superb performance on the GLUE benchmark.
arXiv Detail & Related papers (2020-09-29T17:31:43Z)