An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models
- URL: http://arxiv.org/abs/2507.17477v1
- Date: Wed, 23 Jul 2025 13:00:00 GMT
- Title: An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models
- Authors: Haoran Sun, Zekun Zhang, Shaoning Zeng
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable progress in instruction following and general-purpose reasoning. High-quality alignment with human intent and safety norms without human annotations remains a fundamental challenge. We propose an Uncertainty-Driven Adaptive Self-Alignment framework designed to improve LLM alignment in a fully automated manner.
- Score: 18.62332474172811
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable progress in instruction following and general-purpose reasoning. However, achieving high-quality alignment with human intent and safety norms without human annotations remains a fundamental challenge. In this work, we propose an Uncertainty-Driven Adaptive Self-Alignment (UDASA) framework designed to improve LLM alignment in a fully automated manner. UDASA first generates multiple responses for each input and quantifies output uncertainty across three dimensions: semantics, factuality, and value alignment. Based on these uncertainty scores, the framework constructs preference pairs and categorizes training samples into three stages, conservative, moderate, and exploratory, according to their uncertainty difference. The model is then optimized progressively across these stages. In addition, we conduct a series of preliminary studies to validate the core design assumptions and provide strong empirical motivation for the proposed framework. Experimental results show that UDASA outperforms existing alignment methods across multiple tasks, including harmlessness, helpfulness, truthfulness, and controlled sentiment generation, significantly improving model performance.
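The pair-construction and staging steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the equal-weight average of the three uncertainty dimensions, the rule that the lower-uncertainty response is the preferred one, the mapping of large gaps to the conservative stage, and the thresholds `high` and `low` are all assumptions made for the example, and the names `ScoredResponse`, `build_pairs`, `assign_stage`, and `stage_pairs` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from statistics import mean


@dataclass(frozen=True)
class ScoredResponse:
    """One sampled response with per-dimension uncertainty in [0, 1]."""
    text: str
    semantic: float
    factual: float
    value: float

    @property
    def uncertainty(self) -> float:
        # Assumption: equal-weight mean of the three dimensions.
        return mean((self.semantic, self.factual, self.value))


def build_pairs(responses):
    """Return (chosen, rejected, gap) tuples for every pair of responses.

    Assumption: the response with lower uncertainty is treated as preferred.
    Pairs with identical uncertainty carry no preference and are skipped.
    """
    pairs = []
    for a, b in combinations(responses, 2):
        chosen, rejected = sorted((a, b), key=lambda r: r.uncertainty)
        gap = rejected.uncertainty - chosen.uncertainty
        if gap > 0:
            pairs.append((chosen, rejected, gap))
    return pairs


def assign_stage(gap, high=0.4, low=0.15):
    """Map an uncertainty gap to a training stage (hypothetical thresholds)."""
    if gap >= high:
        return "conservative"
    if gap >= low:
        return "moderate"
    return "exploratory"


def stage_pairs(responses, high=0.4, low=0.15):
    """Group preference pairs by stage, in the order they would be trained."""
    stages = {"conservative": [], "moderate": [], "exploratory": []}
    for chosen, rejected, gap in build_pairs(responses):
        stages[assign_stage(gap, high, low)].append((chosen.text, rejected.text))
    return stages


if __name__ == "__main__":
    samples = [
        ScoredResponse("answer A", 0.1, 0.1, 0.1),
        ScoredResponse("answer B", 0.3, 0.3, 0.3),
        ScoredResponse("answer C", 0.9, 0.9, 0.9),
    ]
    for stage, pairs in stage_pairs(samples).items():
        print(stage, pairs)
```

Under these assumptions, a preference-optimization step would consume the conservative bucket first, then the moderate one, then the exploratory one; the abstract does not specify the thresholds or the aggregation rule, so those would have to come from the paper itself.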
Related papers
- Test-Time Consistency in Vision Language Models [26.475993408532304]
Vision-Language Models (VLMs) have achieved impressive performance across a wide range of multimodal tasks. Recent benchmarks, such as MM-R3, highlight that even state-of-the-art VLMs can produce divergent predictions across semantically equivalent inputs. We propose a simple and effective test-time consistency framework that enhances semantic consistency without supervised re-training.
arXiv Detail & Related papers (2025-06-27T17:09:44Z) - QA-LIGN: Aligning LLMs through Constitutionally Decomposed QA [49.9801383018588]
We introduce QA-LIGN, an automatic symbolic reward decomposition approach. Instead of training a black-box reward model that outputs a monolithic score, QA-LIGN formulates principle-specific evaluation questions. Experiments aligning an uncensored large language model with a set of constitutional principles demonstrate that QA-LIGN offers greater transparency and adaptability.
arXiv Detail & Related papers (2025-06-09T18:24:57Z) - Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time [52.230936493691985]
We propose SITAlign, an inference framework that addresses the multifaceted nature of alignment by maximizing a primary objective while satisfying threshold-based constraints on secondary criteria. We provide theoretical insights by deriving sub-optimality bounds for our satisficing-based inference alignment approach.
arXiv Detail & Related papers (2025-05-29T17:56:05Z) - Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models [15.158475816860427]
Uncertainty is essential for assessing the reliability and trustworthiness of modern AI systems. Verbalized uncertainty, where models express their confidence through natural language, has emerged as a lightweight and interpretable solution. However, its effectiveness in vision-language models (VLMs) remains insufficiently studied.
arXiv Detail & Related papers (2025-05-26T17:16:36Z) - Data-Driven Calibration of Prediction Sets in Large Vision-Language Models Based on Inductive Conformal Prediction [0.0]
We propose a model-agnostic uncertainty quantification method that integrates dynamic threshold calibration and cross-modal consistency verification. We show that the framework achieves stable performance across varying calibration-to-test split ratios, underscoring its robustness for real-world deployment in healthcare, autonomous systems, and other safety-sensitive domains. This work bridges the gap between theoretical reliability and practical applicability in multi-modal AI systems, offering a scalable solution for hallucination detection and uncertainty-aware decision-making.
arXiv Detail & Related papers (2025-04-24T15:39:46Z) - Evaluation of Seismic Artificial Intelligence with Uncertainty [0.0]
We develop a framework for evaluating and comparing deep learning models (DLMs). Our framework helps practitioners choose the best model for their problem and set performance expectations.
arXiv Detail & Related papers (2025-01-15T16:45:51Z) - SMLE: Safe Machine Learning via Embedded Overapproximation [4.129133569151574]
We consider the task of training differentiable ML models guaranteed to satisfy designer-chosen properties.
This is very challenging, due to the computational complexity of rigorously verifying and enforcing compliance in modern neural models.
We provide an innovative approach based on three components: 1) a general, simple architecture enabling efficient verification with a conservative semantic.
We evaluate our approach on properties defined by linear inequalities in regression, and on mutually exclusive classes in multilabel classification.
arXiv Detail & Related papers (2024-09-30T17:19:57Z) - Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment [103.12563033438715]
Alignment in artificial intelligence pursues consistency between model responses and human preferences as well as values.
Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives.
We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
arXiv Detail & Related papers (2024-02-29T12:12:30Z) - Towards Calibrated Robust Fine-Tuning of Vision-Language Models [97.19901765814431]
This work proposes a robust fine-tuning method that improves both OOD accuracy and confidence calibration simultaneously in vision language models.
We show that both OOD classification and OOD calibration errors have a shared upper bound consisting of two terms of ID data.
Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value.
arXiv Detail & Related papers (2023-11-03T05:41:25Z) - Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning [53.68371566336254]
We argue that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment.
Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization.
arXiv Detail & Related papers (2023-03-10T14:38:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.