Anchored Alignment for Self-Explanations Enhancement
- URL: http://arxiv.org/abs/2410.13216v1
- Date: Thu, 17 Oct 2024 04:42:48 GMT
- Title: Anchored Alignment for Self-Explanations Enhancement
- Authors: Luis Felipe Villa-Arenas, Ata Nizamoglu, Qianli Wang, Sebastian Möller, Vera Schmitt
- Abstract summary: We introduce a methodology for alignment designed to enhance the ability of large language models to articulate their reasoning.
Our methodology comprises three key components: explanation quality assessment, self-instruction dataset generation, and model alignment.
- Score: 10.322090458234735
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we introduce a methodology for alignment designed to enhance the ability of large language models (LLMs) to articulate their reasoning (self-explanation) even in the absence of annotated rationale explanations. Our alignment methodology comprises three key components: explanation quality assessment, self-instruction dataset generation, and model alignment. Additionally, we present a novel technique called Alignment with Anchor Preference Pairs, which improves the selection of preference pairs by categorizing model outputs into three groups: consistently correct, consistently incorrect, and variable. By applying tailored strategies to each category, we enhance the effectiveness of Direct Preference Optimization (DPO). Our experimental results demonstrate that this approach significantly improves explanation quality while maintaining accuracy compared to other fine-tuning strategies.
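The anchor-based pairing can be sketched in a few lines. The following Python sketch illustrates the categorization and pairing logic described in the abstract; the sampling count, the `sample_fn`/`grade_fn`/`reference_fn` helpers, and the per-category strategies are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch: categorize sampled outputs into the three anchor
# groups, then build DPO preference pairs per group. sample_fn, grade_fn,
# and reference_fn are hypothetical helpers; n_samples is an assumption.
from collections import defaultdict

def categorize(prompts, sample_fn, grade_fn, n_samples=8):
    """Group prompts by how consistently the model answers them."""
    groups = defaultdict(list)
    for prompt in prompts:
        samples = [sample_fn(prompt) for _ in range(n_samples)]
        correct = [s for s in samples if grade_fn(prompt, s)]
        if len(correct) == len(samples):
            groups["consistently_correct"].append((prompt, samples))
        elif not correct:
            groups["consistently_incorrect"].append((prompt, samples))
        else:
            wrong = [s for s in samples if s not in correct]
            groups["variable"].append((prompt, correct, wrong))
    return groups

def build_preference_pairs(groups, reference_fn):
    """Apply a (hypothetical) tailored pairing strategy per group."""
    pairs = []  # (prompt, chosen, rejected) triples for DPO
    for prompt, correct, wrong in groups["variable"]:
        # Variable prompts already contain a natural internal contrast.
        pairs.append((prompt, correct[0], wrong[0]))
    for prompt, samples in groups["consistently_incorrect"]:
        # Anchor the chosen side to an external reference answer (assumption);
        # a tailored strategy for the consistently correct group is omitted.
        pairs.append((prompt, reference_fn(prompt), samples[0]))
    return pairs
```

The resulting (prompt, chosen, rejected) triples can then be fed to any standard DPO trainer.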
Related papers
- AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset [95.45316956434608]
Preference learning is critical for aligning large language models with human values.
Our work shifts preference dataset design from ad hoc scaling to component-aware optimization.
arXiv Detail & Related papers (2025-04-04T17:33:07Z)
- UMB@PerAnsSumm 2025: Enhancing Perspective-Aware Summarization with Prompt Optimization and Supervised Fine-Tuning [8.095763327154335]
We present our approach to the PerAnsSumm Shared Task, which involves perspective span identification and perspective-aware summarization.
For span identification, we adopt ensemble learning that integrates three transformer models through averaging to exploit individual model strengths.
For summarization, we design a suite of Chain-of-Thought (CoT) prompting strategies that incorporate keyphrases and guide information to structure summary generation into manageable steps.
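As a rough illustration of the logit-averaging ensemble described above (assuming all three checkpoints share a tokenizer and label scheme; the model names are placeholders):

```python
# Hypothetical checkpoints; all three must share a tokenizer and label scheme.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_names = ["model-a", "model-b", "model-c"]
tokenizer = AutoTokenizer.from_pretrained(model_names[0])
models = [AutoModelForTokenClassification.from_pretrained(n).eval()
          for n in model_names]

def ensemble_spans(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Average the per-token logits of the individual models.
        logits = torch.stack([m(**inputs).logits for m in models]).mean(dim=0)
    return logits.argmax(dim=-1)  # one label id per token (e.g., BIO tags)
```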
arXiv Detail & Related papers (2025-03-14T06:29:51Z)
- Like Father, Like Son: Kinship-Aware Preference Mapping (KARMA) for Automatic Alignment in Large Language Models [2.970904425631548]
Kinship-Aware pReference MApping (KARMA) is a novel framework that pairs responses from models with comparable competencies.
By constraining preference comparisons to outputs of similar complexity and quality, KARMA enhances the informativeness of preference data.
Empirical evaluations demonstrate that our kinship-aware approach leads to more consistent and interpretable alignment outcomes.
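A minimal sketch of the kinship constraint, assuming a hypothetical per-model competency score table and pairwise judge:

```python
# Sketch: only pair responses whose source models score within max_gap of
# each other on some competency measure. Threshold and judge are assumptions.
def kinship_pairs(responses, scores, judge, max_gap=0.05):
    """responses: {model: text}; scores: {model: competency in [0, 1]}."""
    pairs = []
    names = list(responses)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(scores[a] - scores[b]) > max_gap:
                continue  # skip pairs across very different competency levels
            if judge(responses[a], responses[b]):  # True if a is preferred
                pairs.append((responses[a], responses[b]))
            else:
                pairs.append((responses[b], responses[a]))
    return pairs
```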
arXiv Detail & Related papers (2025-02-26T01:36:40Z)
- PIPA: Preference Alignment as Prior-Informed Statistical Estimation [57.24096291517857]
We introduce Prior-Informed Preference Alignment (PIPA), a unified, RL-free probabilistic framework.
PIPA accommodates both paired and unpaired data, as well as answer and step-level annotations.
By integrating different types of prior information, we developed two variations of PIPA: PIPA-M and PIPA-N.
arXiv Detail & Related papers (2025-02-09T04:31:30Z)
- Re-evaluating Group Robustness via Adaptive Class-Specific Scaling [47.41034887474166]
Group distributionally robust optimization is a prominent algorithm used to mitigate spurious correlations and address dataset bias.
Existing approaches have reported improvements in robust accuracies but come at the cost of average accuracy due to inherent trade-offs.
We propose a class-specific scaling strategy, directly applicable to existing debiasing algorithms with no additional training.
We develop an instance-wise adaptive scaling technique to alleviate this trade-off, even leading to improvements in both robust and average accuracies.
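A minimal sketch of such post-hoc class-specific scaling for a binary task; the search grid and worst-group selection criterion are illustrative assumptions:

```python
# Sketch: scale one class's probability by a scalar chosen on validation data
# to maximize worst-group accuracy; no retraining is needed.
import numpy as np

def scaled_predictions(probs, scale):
    """probs: (N, 2) softmax outputs; rescale class 1 before the argmax."""
    return (probs * np.array([1.0, scale])).argmax(axis=1)

def select_scale(probs, labels, group_ids, grid=np.logspace(-1, 1, 41)):
    """Pick the scale with the best worst-group validation accuracy."""
    best_scale, best_worst = 1.0, -1.0
    for s in grid:
        preds = scaled_predictions(probs, s)
        worst = min((preds[group_ids == g] == labels[group_ids == g]).mean()
                    for g in np.unique(group_ids))
        if worst > best_worst:
            best_scale, best_worst = s, worst
    return best_scale
```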
arXiv Detail & Related papers (2024-12-19T16:01:51Z)
- Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models [54.381650481255235]
We introduce Dynamic Rewarding with Prompt Optimization (DRPO), a new tuning-free approach for self-alignment.
Our approach leverages a search-based optimization framework that allows LLMs to iteratively self-improve and craft the optimal alignment instructions.
Empirical evaluations on eight recent LLMs, both open and closed-sourced, demonstrate that DRPO significantly enhances alignment performance.
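A toy sketch of the underlying search loop; the mutation operator and reward function are hypothetical stand-ins, not DRPO's actual components:

```python
# Toy sketch: hill-climb over alignment instructions, keeping the variant a
# reward function prefers. mutate and reward are hypothetical callables.
def optimize_prompt(seed_prompt, mutate, reward, n_iters=50):
    best, best_score = seed_prompt, reward(seed_prompt)
    for _ in range(n_iters):
        candidate = mutate(best)   # e.g., ask an LLM to rewrite the prompt
        score = reward(candidate)  # dynamic, task-dependent reward
        if score > best_score:
            best, best_score = candidate, score
    return best
```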
arXiv Detail & Related papers (2024-11-13T16:15:38Z)
- Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness [27.43137305486112]
We propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss.
The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods to achieve state-of-the-art performance.
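One plausible reading, sketched below, combines a DPO-style alignment term with an auxiliary preference-degree classification loss; the degree labels and the weighting are assumptions:

```python
# Sketch: add a self-supervised preference-degree term to a DPO-style loss.
# The degree labels and the weighting alpha are illustrative assumptions.
import torch.nn.functional as F

def spo_loss(policy_margin, degree_logits, degree_labels, alpha=0.1):
    """policy_margin: chosen-minus-rejected log-prob margins, shape (B,).
    degree_logits/labels: predicted vs. bucketed preference-degree classes."""
    align_loss = -F.logsigmoid(policy_margin).mean()  # standard DPO term
    degree_loss = F.cross_entropy(degree_logits, degree_labels)
    return align_loss + alpha * degree_loss
```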
arXiv Detail & Related papers (2024-09-26T12:37:26Z)
- Preference Alignment Improves Language Model-Based TTS [76.70693823683091]
Preference alignment algorithms adjust LMs to align with the preferences of reward models, enhancing the desirability of the generated content.
With a 1.15B parameter LM-based TTS model, we demonstrate that preference alignment consistently improves intelligibility, speaker similarity, and proxy subjective evaluation scores.
arXiv Detail & Related papers (2024-09-19T01:58:19Z)
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment [57.03947082589616]
Large Language Models (LLMs) are often aligned using contrastive alignment objectives and preference pair datasets.
We study this and find that preference data gives a better learning signal when the underlying responses are contrastive.
We introduce Contrastive Learning from AI Revisions (CLAIR), a data-creation method which leads to more contrastive preference pairs.
Our best model, trained on 32K CLAIR preferences with APO, improves Llama-3-8B-Instruct by 7.65%, closing the gap with GPT4-turbo by 45%.
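A minimal sketch of revision-based pair creation in the spirit of CLAIR; the prompt wording and reviser interface are assumptions:

```python
# Sketch: a stronger model minimally revises a response, and the
# (revision, original) pair becomes (chosen, rejected). Prompt wording and
# the reviser interface are assumptions.
REVISE_PROMPT = (
    "Minimally revise the response so it better answers the question, "
    "changing as little as possible.\n\nQuestion: {q}\n\nResponse: {r}"
)

def clair_pair(question, response, reviser):
    revision = reviser(REVISE_PROMPT.format(q=question, r=response))
    # Because the revision differs only where it matters, the pair is
    # contrastive by construction.
    return {"prompt": question, "chosen": revision, "rejected": response}
```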
arXiv Detail & Related papers (2024-08-12T16:24:51Z)
- Learn Your Reference Model for Real Good Alignment [3.091688550418396]
Offline alignment methods for large language models (LLMs) are susceptible to overoptimization.
We propose a new paradigm of offline alignment methods, called Trust Region, which dynamically updates the reference policy throughout the training process.
Our results show that TR alignment methods effectively mitigate overoptimization, enabling models to maintain strong performance even when substantially deviating from the initial reference policy.
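A minimal sketch of the dynamic reference update, assuming hard or soft refreshes at a fixed interval (both hyperparameters are illustrative):

```python
# Sketch: periodically refresh the frozen reference policy from the current
# policy during offline alignment. update_every and tau are assumptions.
def tr_align(policy, ref, batches, dpo_step, update_every=100, tau=1.0):
    for step, batch in enumerate(batches):
        dpo_step(policy, ref, batch)  # any DPO-style offline update
        if (step + 1) % update_every == 0:
            if tau >= 1.0:  # hard update: ref <- policy
                ref.load_state_dict(policy.state_dict())
            else:           # soft update: exponential mixing toward policy
                for p_ref, p in zip(ref.parameters(), policy.parameters()):
                    p_ref.data.mul_(1 - tau).add_(tau * p.data)
    return policy
```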
arXiv Detail & Related papers (2024-04-15T10:44:31Z)
- Preference optimization of protein language models as a multi-objective binder design paradigm [0.0]
We present a multi-objective binder design paradigm based on instruction fine-tuning and direct preference optimization.
We show the proposed alignment strategy enables ProtGPT2 to effectively design binders conditioned on specified receptors and a drug developability criterion.
arXiv Detail & Related papers (2024-03-07T03:36:03Z)
- Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment [103.12563033438715]
Alignment in artificial intelligence pursues consistency between model responses and human preferences as well as values.
Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives.
We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
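A minimal sketch of conditioning generation on explicit preference scores; the control-tag format is an illustrative assumption:

```python
# Sketch: prepend explicit per-objective preference scores as control tags,
# making the trade-off steerable at inference. The tag format is an assumption.
def cpo_prompt(user_query, scores):
    """scores: e.g. {'helpfulness': 5, 'safety': 3, 'verbosity': 1}."""
    control = " ".join(f"<{k}:{v}>" for k, v in sorted(scores.items()))
    return f"{control}\n{user_query}"

# The same query can be steered toward different objective mixes:
print(cpo_prompt("Explain CRISPR.", {"helpfulness": 5, "verbosity": 1}))
```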
arXiv Detail & Related papers (2024-02-29T12:12:30Z)
- Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-in-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
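A minimal sketch of the triplet-to-passage step; the prompt wording and the generator interface are assumptions:

```python
# Sketch: instruct an LLM to expand a compact triplet into a context-rich
# passage for KGC training. Prompt wording and generate() are assumptions.
TRIPLET_PROMPT = (
    "Write a short, factual paragraph that naturally expresses the triplet "
    "({head}, {relation}, {tail}) and its surrounding context."
)

def contextualize(triplet, generate):
    head, relation, tail = triplet
    return generate(TRIPLET_PROMPT.format(head=head,
                                          relation=relation,
                                          tail=tail))

# e.g. contextualize(("Marie Curie", "award", "Nobel Prize in Physics"), llm)
```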
arXiv Detail & Related papers (2024-01-28T08:56:49Z)
- Heterogeneous Calibration: A post-hoc model-agnostic framework for improved generalization [8.815439276597818]
We introduce the notion of heterogeneous calibration that applies a post-hoc model-agnostic transformation to model outputs for improving AUC performance on binary classification tasks.
We refer to these simple patterns as heterogeneous partitions of the feature space and show theoretically that perfectly calibrating each partition separately optimizes AUC.
While the theoretical optimality of this framework holds for any model, we focus on deep neural networks (DNNs) and test the simplest instantiation of this paradigm on a variety of open-source datasets.
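A minimal sketch of per-partition calibration, using Platt scaling as a stand-in calibrator and assuming precomputed partition assignments:

```python
# Sketch: fit one calibrator per feature-space partition, then apply the
# matching calibrator at inference. Platt scaling is a stand-in; each
# partition is assumed to contain both classes.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_partition_calibrators(scores, labels, partition_ids):
    calibrators = {}
    for pid in np.unique(partition_ids):
        mask = partition_ids == pid
        calibrators[pid] = LogisticRegression().fit(
            scores[mask].reshape(-1, 1), labels[mask])
    return calibrators

def calibrated_scores(scores, partition_ids, calibrators):
    out = np.empty(len(scores))
    for pid, cal in calibrators.items():
        mask = partition_ids == pid
        out[mask] = cal.predict_proba(scores[mask].reshape(-1, 1))[:, 1]
    return out
```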
arXiv Detail & Related papers (2022-02-10T05:08:50Z)
- Local and Global Context-Based Pairwise Models for Sentence Ordering [0.0]
In this paper, we put forward a set of robust local and global context-based pairwise ordering strategies.
Our proposed encoding method utilizes the paragraph's rich global contextual information to predict the pairwise order.
Analysis of the two proposed decoding strategies helps better explain error propagation in pairwise models.
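A minimal sketch of pairwise decoding by aggregate precedence score; the pairwise scorer is a stand-in for the paper's context-based models:

```python
# Sketch: score every ordered pair, then decode a full order by sorting
# sentences on their total precedence score. pair_prob is a stand-in scorer.
def order_sentences(sentences, pair_prob):
    """pair_prob(a, b) ~ P(a precedes b); higher total score goes first."""
    def precedence(s):
        return sum(pair_prob(s, t) for t in sentences if t is not s)
    return sorted(sentences, key=precedence, reverse=True)
```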
arXiv Detail & Related papers (2021-10-08T17:57:59Z)
- Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision [74.9260745577362]
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles.
We construct three propagative modules to effectively solve the optimization models with flexible combinations.
Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.