From Attribution to Action: Jointly ALIGNing Predictions and Explanations
- URL: http://arxiv.org/abs/2511.06944v1
- Date: Mon, 10 Nov 2025 10:52:17 GMT
- Title: From Attribution to Action: Jointly ALIGNing Predictions and Explanations
- Authors: Dongsheng Hong, Chao Chen, Yanhui Chen, Shanshan Lin, Zhihao Chen, Xiangwen Liao
- Abstract summary: We propose ALIGN, a novel framework that jointly trains a classifier and a masker in an iterative manner. By leveraging high-quality masks as guidance, ALIGN improves both interpretability and generalizability, showing its superiority across various settings.
- Score: 7.1383591932321115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explanation-guided learning (EGL) has shown promise in aligning model predictions with interpretable reasoning, particularly in computer vision tasks. However, most approaches rely on external annotations or heuristic-based segmentation to supervise model explanations, which can be noisy, imprecise, and difficult to scale. In this work, we provide both empirical and theoretical evidence that low-quality supervision signals can degrade model performance rather than improve it. In response, we propose ALIGN, a novel framework that jointly trains a classifier and a masker in an iterative manner. The masker learns to produce soft, task-relevant masks that highlight informative regions, while the classifier is optimized for both prediction accuracy and alignment between its saliency maps and the learned masks. By leveraging high-quality masks as guidance, ALIGN improves both interpretability and generalizability, showing its superiority across various settings. Experiments on two domain generalization benchmarks, VLCS and Terra Incognita, show that ALIGN consistently outperforms six strong baselines in both in-distribution and out-of-distribution settings. Moreover, ALIGN yields superior explanation quality in terms of sufficiency and comprehensiveness, highlighting its effectiveness in producing accurate and interpretable models.
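The abstract's iterative classifier-masker loop can be sketched in miniature. The sketch below is a deliberately crude toy, not the paper's method: a linear classifier stands in for the network, normalized |w| stands in for a saliency map, and an exponential moving average stands in for the learned masker; all names and update rules are illustrative assumptions.

```python
import numpy as np

# Toy sketch of ALIGN-style joint training (hypothetical simplification):
# a linear "classifier" and a soft "masker" are updated alternately.
# Feature 0 carries the label signal; features 1-2 are noise.
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
y = (X[:, 0] > 0).astype(float)

w = np.zeros(d)                 # classifier weights
mask = np.full(d, 1.0 / d)      # soft mask over features (sums to 1)
lr, lam = 0.5, 0.1              # learning rate, alignment strength

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(300):
    # --- classifier step: cross-entropy plus an alignment penalty ---
    p = sigmoid(X @ w)
    grad_ce = X.T @ (p - y) / n
    # saliency proxy: normalized |w|; pull it toward the current mask
    sal = np.abs(w) / (np.abs(w).sum() + 1e-8)
    grad_align = lam * np.sign(w) * (sal - mask)   # crude subgradient
    w -= lr * (grad_ce + grad_align)
    # --- masker step: move the mask toward the classifier's saliency ---
    mask = 0.9 * mask + 0.1 * (np.abs(w) / (np.abs(w).sum() + 1e-8))

acc = ((sigmoid(X @ w) > 0.5) == y).mean()
```

On this toy data the mask concentrates on the informative feature while the classifier stays accurate, which is the qualitative behavior the abstract describes.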
Related papers
- Specificity-aware reinforcement learning for fine-grained open-world classification [54.85385270439992]
Classifying fine-grained visual concepts under open-world settings demands that models be both accurate and specific. We propose a novel specificity-aware reinforcement learning framework, SpeciaRL, to fine-tune reasoning LMMs on fine-grained image classification.
arXiv Detail & Related papers (2026-03-03T17:52:39Z) - Explanation-Guided Adversarial Training for Robust and Interpretable Models [23.590037545621755]
We propose Explanation-Guided Adversarial Training (EGAT) to improve prediction performance, robustness, and explanation quality. EGAT generates adversarial examples on the fly while imposing explanation-based constraints on the model. We show that EGAT consistently outperforms competitive baselines in both clean and adversarial accuracy (+37%).
arXiv Detail & Related papers (2026-03-02T14:52:52Z) - GLOW: Graph-Language Co-Reasoning for Agentic Workflow Performance Prediction [51.83437071408662]
We propose GLOW, a unified framework for AW performance prediction. GLOW combines the graph-structure modeling capabilities of GNNs with the reasoning power of LLMs. Experiments on FLORA-Bench show that GLOW outperforms state-of-the-art baselines in prediction accuracy and ranking utility.
arXiv Detail & Related papers (2025-12-11T13:30:46Z) - Self-Supervised Learning on Molecular Graphs: A Systematic Investigation of Masking Design [11.43518417965958]
Self-supervised learning plays a central role in molecular representation learning. Recent innovations in masking-based pretraining are often introduced in an ad hoc manner and lack principled evaluation. This work casts the entire pretrain-finetune workflow into a unified probabilistic framework.
arXiv Detail & Related papers (2025-12-08T00:52:46Z) - Explaining Fine Tuned LLMs via Counterfactuals A Knowledge Graph Driven Framework [13.533352973355013]
Low-Rank Adaptation (LoRA) has enabled large language models to acquire domain-specific knowledge with remarkable efficiency. This work introduces a novel framework that explains fine-tuned LLMs via counterfactuals grounded in knowledge graphs.
arXiv Detail & Related papers (2025-09-25T14:37:40Z) - AIM: Amending Inherent Interpretability via Self-Supervised Masking [57.17600766859953]
We propose "Amending Inherent Interpretability via Self-Supervised Masking" (AIM). AIM promotes the network's utilization of genuine features over spurious alternatives without requiring additional annotations. We validate AIM across a range of challenging datasets that test both out-of-distribution generalization and fine-grained visual understanding.
arXiv Detail & Related papers (2025-08-15T14:29:59Z) - GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs [56.93583799109029]
GrAInS is an inference-time steering approach that operates across both language-only and vision-language models and tasks. During inference, GrAInS adjusts hidden activations at transformer layers guided by token-level attribution signals, and normalizes activations to preserve representational scale. It consistently outperforms both fine-tuning and existing steering baselines.
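The normalization idea in the GrAInS summary (steer along an attribution-derived direction while keeping the representational scale) can be shown with a tiny numeric sketch; the function name, `alpha`, and the toy vectors below are hypothetical, not from the paper.

```python
import numpy as np

# Hypothetical sketch of attribution-guided steering: shift a hidden
# activation along a normalized attribution direction, then rescale so
# its norm (the "representational scale") is unchanged.
def steer(h, attribution, alpha=0.5):
    direction = attribution / (np.linalg.norm(attribution) + 1e-8)
    h_new = h + alpha * direction
    return h_new * (np.linalg.norm(h) / (np.linalg.norm(h_new) + 1e-8))

h = np.array([1.0, 2.0, 2.0])       # toy hidden state, norm 3
attr = np.array([0.0, 0.0, 1.0])    # toy token-level attribution signal
h_steered = steer(h, attr)
```

The steered vector keeps the original norm while redistributing mass toward the attributed component.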
arXiv Detail & Related papers (2025-07-24T02:34:13Z) - Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization [47.38380084735716]
Vision-language models (VLMs), pre-trained on massive image-text pairs, have demonstrated remarkable zero-/few-shot performance. Knowledge distillation (KD) offers a natural framework for transferring VLM capabilities, but it suffers from gradient conflicts between supervised and distillation losses. We propose Dual-Head Optimization (DHO), which introduces dual prediction heads, one for each distinct signal.
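A minimal numeric sketch of the dual-head idea, under strong assumptions (a fixed toy feature vector, linear heads, hand-picked teacher probabilities; none of these come from the paper): each head receives only its own gradient, so the supervised and distillation signals never collide on shared head weights.

```python
import numpy as np

# Hypothetical dual-head sketch: a shared feature vector feeds two
# separate linear heads, one trained with cross-entropy on hard labels,
# one trained to match a teacher's soft probabilities.
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

feat = np.array([1.0, -0.5])            # shared backbone features (toy)
W_sup = np.zeros((3, 2))                # head 1: supervised labels
W_kd = np.zeros((3, 2))                 # head 2: distills teacher probs
teacher = np.array([0.7, 0.2, 0.1])     # toy soft targets from a teacher
label = 0
lr = 0.5

for _ in range(500):
    p_sup = softmax(W_sup @ feat)
    p_kd = softmax(W_kd @ feat)
    onehot = np.eye(3)[label]
    W_sup -= lr * np.outer(p_sup - onehot, feat)   # CE gradient only
    W_kd -= lr * np.outer(p_kd - teacher, feat)    # KL gradient only
```

Because the two losses update disjoint parameters, neither objective's gradient can cancel the other's, which is the conflict the summary says DHO is designed to avoid.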
arXiv Detail & Related papers (2025-05-12T15:39:51Z) - Harmonizing Visual Representations for Unified Multimodal Understanding and Generation [53.01486796503091]
We present Harmon, a unified autoregressive framework that harmonizes understanding and generation tasks with a shared MAR encoder. Harmon achieves state-of-the-art image generation results on the GenEval, MJHQ30K and WISE benchmarks.
arXiv Detail & Related papers (2025-03-27T20:50:38Z) - Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then promotes the model learning by paying closer attention to those training samples with a high difference in explanations.
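The reweighting step described above can be illustrated with a toy snippet; the specific rule (L1 difference between per-sample explanations at two checkpoints, normalized into weights) is a hypothetical simplification, not the paper's exact consistency metric.

```python
import numpy as np

# Toy sketch: samples whose explanations change the most between two
# model checkpoints receive higher training weight.
expl_old = np.array([[0.9, 0.1],   # per-sample saliency, checkpoint t
                     [0.5, 0.5],
                     [0.2, 0.8]])
expl_new = np.array([[0.85, 0.15], # per-sample saliency, checkpoint t+1
                     [0.1, 0.9],
                     [0.25, 0.75]])

diff = np.abs(expl_new - expl_old).sum(axis=1)   # explanation inconsistency
weights = diff / diff.sum()                      # normalized sample weights
```

Sample 1, whose explanation flipped the most, dominates the resulting weights, so training would pay closest attention to it.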
arXiv Detail & Related papers (2024-08-08T17:20:08Z) - CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection [42.33618249731874]
We show that minimizing the magnitude of energy scores on training data leads to domain-consistent Hessians of classification loss.
We have developed a unified fine-tuning framework that allows for concurrent optimization of both tasks.
arXiv Detail & Related papers (2024-05-26T03:28:59Z) - Understanding Masked Autoencoders via Hierarchical Latent Variable Models [109.35382136147349]
Masked autoencoder (MAE) has recently achieved prominent success in a variety of vision tasks.
Despite the emergence of intriguing empirical observations on MAE, a theoretically principled understanding is still lacking.
arXiv Detail & Related papers (2023-06-08T03:00:10Z) - Be Your Own Neighborhood: Detecting Adversarial Example by the Neighborhood Relations Built on Self-Supervised Learning [64.78972193105443]
This paper presents a novel adversarial example (AE) detection framework that aims to deliver trustworthy predictions. It performs detection by distinguishing the AE's abnormal relation with its augmented versions. An off-the-shelf Self-Supervised Learning (SSL) model is used to extract the representation and predict the label.
arXiv Detail & Related papers (2022-08-31T08:18:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.