Toward Faithfulness-guided Ensemble Interpretation of Neural Network
- URL: http://arxiv.org/abs/2509.04588v1
- Date: Thu, 04 Sep 2025 18:09:17 GMT
- Title: Toward Faithfulness-guided Ensemble Interpretation of Neural Network
- Authors: Siyu Zhang, Kenneth McMillan
- Abstract summary: Interpretable and faithful explanations for specific neural inferences are crucial for understanding and evaluating model behavior. Our work introduces Faithfulness-guided Ensemble Interpretation (FEI), an innovative framework that enhances the breadth and effectiveness of faithfulness.
- Score: 4.811608153296386
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interpretable and faithful explanations for specific neural inferences are crucial for understanding and evaluating model behavior. Our work introduces Faithfulness-guided Ensemble Interpretation (FEI), an innovative framework that enhances the breadth and effectiveness of faithfulness, advancing interpretability by providing superior visualization. Through an analysis of existing evaluation benchmarks, FEI employs a smooth approximation to elevate quantitative faithfulness scores. Diverse variations of FEI target enhanced faithfulness in hidden layer encodings, expanding interpretability. Additionally, we propose a novel qualitative metric that assesses hidden layer faithfulness. In extensive experiments, FEI surpasses existing methods, demonstrating substantial advances in qualitative visualization and quantitative faithfulness scores. Our research establishes a comprehensive framework for elevating faithfulness in neural network explanations, emphasizing both breadth and precision.
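The abstract gives no implementation details, but the core idea of optimizing a smooth (differentiable) surrogate of a faithfulness score can be illustrated with a generic soft-mask sketch. This is a minimal illustration under stated assumptions, not the authors' FEI code: the classifier `model`, the mask resolution, the sparsity weight `lam`, and the helper name `soft_mask_explanation` are all hypothetical placeholders.

```python
# Hedged sketch: learn a soft saliency mask by directly optimizing a
# preservation-style faithfulness objective. Assumes a PyTorch image
# classifier `model`, an input `x` of shape (1, C, H, W), and a target class index.
import torch
import torch.nn.functional as F

def soft_mask_explanation(model, x, target, steps=200, lam=0.05, lr=0.1):
    """Return a [0, 1] mask whose kept evidence preserves the target prediction.

    The sigmoid over `mask_logits` is the smooth surrogate for a hard keep/drop
    decision, which is what makes the faithfulness objective differentiable.
    """
    model.eval()
    mask_logits = torch.zeros(1, 1, x.shape[2], x.shape[3], requires_grad=True)
    baseline = torch.zeros_like(x)                      # "deleted" reference input
    opt = torch.optim.Adam([mask_logits], lr=lr)
    for _ in range(steps):
        m = torch.sigmoid(mask_logits)                  # soft mask in [0, 1]
        x_keep = m * x + (1 - m) * baseline             # evidence kept by the mask
        p_keep = F.softmax(model(x_keep), dim=1)[0, target]
        # Preservation term: kept evidence should retain the prediction;
        # sparsity term: the mask should highlight as little of the input as possible.
        loss = -torch.log(p_keep + 1e-8) + lam * m.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask_logits).detach()
```

A mask learned this way can then be scored with standard deletion/insertion curves, which are one common way quantitative faithfulness scores of this kind are computed.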
Related papers
- Analyzing Latent Concepts in Code Language Models [10.214183897113118]
We propose Code Concept Analysis (CoCoA): a global post-hoc interpretability framework. CoCoA uncovers emergent lexical, syntactic, and semantic structures in a code language model's representation space. We propose a hybrid annotation pipeline that combines static analysis tool-based syntactic alignment with prompt-engineered large language models.
arXiv Detail & Related papers (2025-10-01T03:53:21Z)
- DEFNet: Multitasks-based Deep Evidential Fusion Network for Blind Image Quality Assessment [5.517243185525322]
Blind image quality assessment (BIQA) methods often incorporate auxiliary tasks to improve performance. We propose a multitasks-based Deep Evidential Fusion Network (DEFNet) for BIQA, which performs multitask optimization with the assistance of scene and distortion type classification tasks.
arXiv Detail & Related papers (2025-07-25T16:36:45Z)
- ParamMute: Suppressing Knowledge-Critical FFNs for Faithful Retrieval-Augmented Generation [91.20492150248106]
We investigate the internal mechanisms behind unfaithful generation and identify a subset of mid-to-deep feed-forward networks (FFNs) that are disproportionately activated in such cases. We propose Parametric Knowledge Muting through FFN Suppression (ParamMute), a framework that improves contextual faithfulness by suppressing the activation of unfaithfulness-associated FFNs. Experimental results show that ParamMute significantly enhances faithfulness across both CoFaithfulQA and the established ConFiQA benchmark, achieving substantial reductions in reliance on parametric memory.
arXiv Detail & Related papers (2025-02-21T15:50:41Z)
- Dynamic Cross-Modal Alignment for Robust Semantic Location Prediction [0.0]
This paper introduces Contextualized Vision-Language Alignment (CoVLA), a discriminative framework designed to address the challenges of contextual ambiguity and modality discrepancy inherent in this task. Experiments on a benchmark dataset demonstrate that CoVLA significantly outperforms state-of-the-art methods, achieving improvements of 2.3% in accuracy and 2.5% in F1-score.
arXiv Detail & Related papers (2024-12-13T05:29:37Z)
- On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [68.62012304574012]
Multimodal generative models have sparked critical discussions on their reliability, fairness and potential for misuse. We propose an evaluation framework to assess model reliability by analyzing responses to global and local perturbations in the embedding space. Our method lays the groundwork for detecting unreliable, bias-injected models and tracing the provenance of embedded biases.
arXiv Detail & Related papers (2024-11-21T09:46:55Z)
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, explanation consistency, to adaptively reweight the training samples during model learning.
The framework then promotes learning by paying closer attention to those training samples whose explanations differ the most.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Understanding Before Recommendation: Semantic Aspect-Aware Review Exploitation via Large Language Models [53.337728969143086]
Recommendation systems harness user-item interactions like clicks and reviews to learn their representations.
Previous studies improve recommendation accuracy and interpretability by modeling user preferences across various aspects and intents.
We introduce a chain-based prompting approach to uncover semantic aspect-aware interactions.
arXiv Detail & Related papers (2023-12-26T15:44:09Z)
- Towards Better Visualizing the Decision Basis of Networks via Unfold and Conquer Attribution Guidance [29.016425469068587]
We propose a novel framework, Unfold and Conquer Guidance (UCAG), which enhances the explainability of the network decision. UCAG sequentially complies with the confidence of slices of the image, yielding an abundant and clear interpretation. We conduct numerous evaluations to validate the performance across several metrics.
arXiv Detail & Related papers (2023-12-21T03:43:19Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Explaining Language Models' Predictions with High-Impact Concepts [11.47612457613113]
We propose a complete framework for extending concept-based interpretability methods to NLP.
We optimize for features whose existence causes the output predictions to change substantially.
Our method achieves superior results on predictive impact, usability, and faithfulness compared to the baselines.
arXiv Detail & Related papers (2023-05-03T14:48:27Z)
- ARBEx: Attentive Feature Extraction with Reliability Balancing for Robust Facial Expression Learning [5.648318448953635]
ARBEx is a novel attentive feature extraction framework driven by Vision Transformer.
We employ learnable anchor points in the embedding space with label distributions and a multi-head self-attention mechanism to optimize performance against weak predictions.
Our strategy outperforms current state-of-the-art methodologies, according to extensive experiments conducted in a variety of contexts.
arXiv Detail & Related papers (2023-05-02T15:10:01Z)
- A Fine-grained Interpretability Evaluation Benchmark for Neural NLP [44.08113828762984]
This benchmark covers three representative NLP tasks: sentiment analysis, textual similarity and reading comprehension.
We provide token-level rationales that are carefully annotated to be sufficient, compact and comprehensive.
We conduct experiments on three typical models with three saliency methods, and unveil their strengths and weaknesses in terms of interpretability.
arXiv Detail & Related papers (2022-05-23T07:37:04Z)
- Trusted Multi-View Classification with Dynamic Evidential Fusion [73.35990456162745]
We propose a novel multi-view classification algorithm, termed trusted multi-view classification (TMC).
TMC provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level.
Both theoretical and experimental results validate the effectiveness of the proposed model in accuracy, robustness and trustworthiness.
arXiv Detail & Related papers (2022-04-25T03:48:49Z)
- Trusted Multi-View Classification [76.73585034192894]
We propose a novel multi-view classification method, termed trusted multi-view classification.
It provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level.
The proposed algorithm jointly utilizes multiple views to promote both classification reliability and robustness.
arXiv Detail & Related papers (2021-02-03T13:30:26Z)
- Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness? [58.13152510843004]
With the growing popularity of deep-learning-based NLP models comes a need for interpretable systems.
What is interpretability, and what constitutes a high-quality interpretation?
We call for more clearly differentiating between different desired criteria an interpretation should satisfy, and focus on the faithfulness criteria.
arXiv Detail & Related papers (2020-04-07T20:15:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.