Related papers: Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models

URL: http://arxiv.org/abs/2406.12416v2
Date: Thu, 27 Jun 2024 12:07:55 GMT
Title: Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models
Authors: Hongbang Yuan, Yubo Chen, Pengfei Cao, Zhuoran Jin, Kang Liu, Jun Zhao
Abstract summary: We evaluate the factuality of different models tuned by various preference learning algorithms. We propose APEFT (Atomic Preference Enhanced Factuality Tuning) to enhance the model's awareness of factuality.
Score: 19.015202590038996
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have achieved remarkable success but still tend to generate factually erroneous responses, a phenomenon known as hallucination. A recent trend is to use preference learning to fine-tune models to align with factuality. However, existing work primarily evaluates fine-tuned models on in-domain (ID) datasets and the factuality on out-of-domain (OOD) datasets remains underexplored. In this paper, we conduct a comprehensive evaluation of the factuality of different models tuned by various preference learning algorithms and demonstrate that their performance on OOD datasets either increases minimally or decreases. Subsequently, we reveal that the main cause of model's failure to uphold factuality under a distribution shift is under-alignment, rather than over-alignment, by analyzing the token distribution shift of the models before and after tuning. Finally, we propose APEFT (Atomic Preference Enhanced Factuality Tuning), a framework that enhances model's awareness of factuality at the granularity of individual facts. Extensive experiments demonstrate that APEFT improves model performance by an average of 3.45% on both ID and OOD datasets, which is highly effective.

Related papers

Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
arXiv Detail & Related papers (2025-10-16T03:37:16Z)
Active Learning of Model Discrepancy with Bayesian Experimental Design [0.0]
We propose an efficient approach to learn the model discrepancy based on the data from a sequential experimental design (BED). We show that the proposed method is efficient and robust to the active learning of high-dimensional model discrepancy, using data suggested by the sequential BED. We also demonstrate that the proposed method is compatible with both classical numerical solvers and modern auto-differentiable solvers.
arXiv Detail & Related papers (2025-02-07T22:54:20Z)
Rethinking Relation Extraction: Beyond Shortcuts to Generalization with a Debiased Benchmark [53.876493664396506]
Benchmarks are crucial for evaluating machine learning algorithm performance, facilitating comparison and identifying superior solutions. This paper addresses the issue of entity bias in relation extraction tasks, where models tend to rely on entity mentions rather than context. We propose a debiased relation extraction benchmark DREB that breaks the pseudo-correlation between entity mentions and relation types through entity replacement. To establish a new baseline on DREB, we introduce MixDebias, a debiasing method combining data-level and model training-level techniques.
arXiv Detail & Related papers (2025-01-02T17:01:06Z)
A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models. Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning. We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z)
MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models. We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z)
A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA). CEFA consists of a feature alignment module and a context enhancement module. Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z)
Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models. This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution. We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
Feature Protection For Out-of-distribution Generalization [24.072876186625855]
We show that protecting pre-trained features leads to a fine-tuned model more robust to OOD generalization.
arXiv Detail & Related papers (2024-05-25T03:00:06Z)
Entity-level Factual Adaptiveness of Fine-tuning based Abstractive Summarization Models [31.84120883461332]
We analyze the robustness of fine-tuning based summarization models to the knowledge conflict. We introduce a controllable counterfactual data augmentation method.
arXiv Detail & Related papers (2024-02-23T07:53:39Z)
A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime. We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution [67.9215891673174]
We propose score entropy as a novel loss that naturally extends score matching to discrete spaces. We test our Score Entropy Discrete Diffusion models on standard language modeling tasks.
arXiv Detail & Related papers (2023-10-25T17:59:12Z)
FD-Align: Feature Discrimination Alignment for Fine-tuning Pre-Trained Models in Few-Shot Learning [21.693779973263172]
In this paper, we introduce a fine-tuning approach termed Feature Discrimination Alignment (FD-Align). Our method aims to bolster the model's generalizability by preserving the consistency of spurious features. Once fine-tuned, the model can seamlessly integrate with existing methods, leading to performance improvements.
arXiv Detail & Related papers (2023-10-23T17:12:01Z)
The Bayesian Context Trees State Space Model for time series modelling and forecasting [8.37609145576126]
A hierarchical Bayesian framework is introduced for developing rich mixture models for real-valued time series. At the top level, meaningful discrete states are identified as appropriately quantised values of some of the most recent samples. At the bottom level, a different, arbitrary model for real-valued time series - a base model - is associated with each state.
arXiv Detail & Related papers (2023-08-02T02:40:42Z)
Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world. We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique. By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
Relating Regularization and Generalization through the Intrinsic Dimension of Activations [11.00580615194563]
We show that common regularization techniques uniformly decrease the last-layer ID (LLID) of validation set activations for image classification models. We also examine the LLID over the course of training of models that exhibit grokking.
arXiv Detail & Related papers (2022-11-23T19:00:00Z)
Disfluency Detection with Unlabeled Data and Small BERT Models [3.04133054437883]
We address the disfluency detection task, focusing on small, fast, on-device models based on the BERT architecture. We demonstrate it is possible to train disfluency detection models as small as 1.3 MiB, while retaining high performance.
arXiv Detail & Related papers (2021-04-21T21:24:32Z)
Elastic weight consolidation for better bias inoculation [24.12790037712358]
Elastic weight consolidation (EWC) allows fine-tuning of models to mitigate biases. EWC dominates standard fine-tuning, yielding models with lower levels of forgetting on the original (biased) dataset.
arXiv Detail & Related papers (2020-04-29T17:45:12Z)
Decomposed Adversarial Learned Inference [118.27187231452852]
We propose a novel approach, Decomposed Adversarial Learned Inference (DALI). DALI explicitly matches prior and conditional distributions in both data and code spaces. We validate the effectiveness of DALI on the MNIST, CIFAR-10, and CelebA datasets.
arXiv Detail & Related papers (2020-04-21T20:00:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.