Faithfulness Measurable Masked Language Models
- URL: http://arxiv.org/abs/2310.07819v3
- Date: Tue, 27 Aug 2024 21:37:57 GMT
- Title: Faithfulness Measurable Masked Language Models
- Authors: Andreas Madsen, Siva Reddy, Sarath Chandar
- Abstract summary: A common approach to explaining NLP models is to use importance measures that express which tokens are important for a prediction.
One such metric holds that if tokens are truly important, then masking them should result in worse model performance.
This work proposes an inherently faithfulness measurable model that addresses these challenges.
- Score: 35.40666730867487
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: A common approach to explaining NLP models is to use importance measures that express which tokens are important for a prediction. Unfortunately, such explanations are often wrong despite being persuasive. Therefore, it is essential to measure their faithfulness. One such metric holds that if tokens are truly important, then masking them should result in worse model performance. However, token masking introduces out-of-distribution issues, and existing solutions that address this are computationally expensive and employ proxy models. Furthermore, other metrics are very limited in scope. This work proposes an inherently faithfulness measurable model that addresses these challenges. This is achieved using a novel fine-tuning method that incorporates masking, such that masking tokens become in-distribution by design. This differs from existing approaches, which are completely model-agnostic but are inapplicable in practice. We demonstrate the generality of our approach by applying it to 16 different datasets and validate it using statistical in-distribution tests. The faithfulness is then measured with 9 different importance measures. Because masking is in-distribution, importance measures that themselves use masking become consistently more faithful. Additionally, because the model makes faithfulness cheap to measure, we can optimize explanations towards maximal faithfulness; thus, our model becomes indirectly inherently explainable.
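To make the masking-based metric concrete, the sketch below masks the highest-importance tokens and measures how much the prediction degrades. This is a minimal, illustrative PyTorch/HuggingFace-style sketch, not the paper's exact protocol; the function name, the 20% masking budget, and the probability-drop scoring are assumptions.

```python
import torch

def faithfulness_drop(model, input_ids, attention_mask, importance,
                      mask_token_id, budget=0.2):
    """Mask the top `budget` fraction of tokens ranked by `importance`
    and return the drop in the originally predicted class probability.
    A larger drop suggests a more faithful importance measure."""
    with torch.no_grad():
        base = model(input_ids=input_ids,
                     attention_mask=attention_mask).logits.softmax(-1)
        k = max(1, int(budget * input_ids.size(1)))
        top = importance.topk(k, dim=1).indices        # most important positions
        masked = input_ids.clone()
        masked.scatter_(1, top, mask_token_id)         # replace them with [MASK]
        out = model(input_ids=masked,
                    attention_mask=attention_mask).logits.softmax(-1)
    label = base.argmax(-1, keepdim=True)
    return (base.gather(1, label) - out.gather(1, label)).squeeze(1)
```

Because the proposed model is fine-tuned with masking, such masked inputs stay in-distribution by design, so the measured drop reflects token importance rather than distribution shift.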
Related papers
- Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models [29.67884478799914]
Large Language Models (LLMs) are capable of generating persuasive Natural Language Explanations (NLEs) to justify their answers.
Recent studies have proposed various methods to measure the faithfulness of NLEs, typically by inserting perturbations at the explanation or feature level.
We argue that these approaches are neither comprehensive nor correctly designed according to the established definition of faithfulness.
arXiv Detail & Related papers (2024-10-18T03:45:42Z)
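As a rough illustration of activation patching, the generic PyTorch sketch below splices one layer's clean-run activations into a corrupted-run forward pass; it is not the paper's experimental setup and assumes the chosen layer returns a single tensor and that both inputs share a shape.

```python
import torch

def run_with_patch(model, layer, clean_inputs, corrupt_inputs):
    """Run the model on corrupted inputs while splicing in `layer`'s
    activations from the clean run; comparing the two outputs shows how
    much that layer's activations drive the prediction."""
    cache = {}

    def save(module, inputs, output):
        cache["clean"] = output            # record the clean activation

    def patch(module, inputs, output):
        return cache["clean"]              # overwrite with the clean activation

    handle = layer.register_forward_hook(save)
    with torch.no_grad():
        clean_out = model(**clean_inputs)
    handle.remove()

    handle = layer.register_forward_hook(patch)
    with torch.no_grad():
        patched_out = model(**corrupt_inputs)
    handle.remove()
    return clean_out, patched_out
```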
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
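A loose sketch of the idea follows, using a hypothetical `generate` callable that returns a final answer string; the paper's actual framework evaluates stability over explanations semantically, not by exact string match.

```python
import math
from collections import Counter

def stability_confidence(generate, question, n_samples=10):
    """Sample several explanation+answer generations and score confidence
    by how often the final answer recurs; answer entropy is a companion
    uncertainty signal."""
    answers = [generate(question) for _ in range(n_samples)]
    counts = Counter(answers)
    probs = [c / n_samples for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    confidence = counts.most_common(1)[0][1] / n_samples
    return confidence, entropy
```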
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - Uncertainty in Language Models: Assessment through Rank-Calibration [65.10149293133846]
Language Models (LMs) have shown promising performance in natural language generation.
It is crucial to correctly quantify their uncertainty in responding to given inputs.
We develop a novel and practical framework, termed Rank-Calibration, to assess uncertainty and confidence measures for LMs.
arXiv Detail & Related papers (2024-04-04T02:31:05Z)
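One hedged reading of rank-calibration in code: lower uncertainty ranks should correspond to higher quality ranks on average. The numpy sketch below is illustrative only; the binning scheme and the target relationship are assumptions, not the paper's formal definition.

```python
import numpy as np

def rank_calibration_error(uncertainty, quality, n_bins=10):
    """Accumulate the per-bin deviation from the ideal relationship that
    inputs with lower uncertainty ranks have higher quality ranks."""
    n = len(uncertainty)
    u_rank = np.argsort(np.argsort(uncertainty)) / n   # 0 = least uncertain
    q_rank = np.argsort(np.argsort(quality)) / n       # 0 = lowest quality
    bins = np.minimum((u_rank * n_bins).astype(int), n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        sel = bins == b
        if sel.any():
            # Ideal: mean quality rank mirrors the bin's uncertainty rank.
            err += sel.mean() * abs(q_rank[sel].mean() - (1 - u_rank[sel].mean()))
    return err
```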
- Fairness Without Harm: An Influence-Guided Active Sampling Approach [32.173195437797766]
We aim to train models that mitigate group fairness disparity without causing harm to model accuracy.
The current data acquisition methods, such as fair active learning approaches, typically require annotating sensitive attributes.
We propose a tractable active data sampling algorithm that does not rely on training group annotations.
arXiv Detail & Related papers (2024-02-20T07:57:38Z)
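A first-order influence sketch of such acquisition is shown below; the per-example gradient loop, the pseudo-label stand-in, and the dot-product score are assumptions rather than the paper's algorithm.

```python
import torch

def acquisition_scores(model, loss_fn, candidates, val_x, val_y):
    """Score candidate samples by how well their gradient aligns with the
    gradient of a held-out validation loss; higher scores mark samples
    whose acquisition should reduce that loss."""
    params = [p for p in model.parameters() if p.requires_grad]
    val_grad = torch.autograd.grad(loss_fn(model(val_x), val_y), params)
    val_grad = torch.cat([g.flatten() for g in val_grad])

    scores = []
    for x, y in candidates:                       # y may be a pseudo-label
        g = torch.autograd.grad(
            loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)), params)
        g = torch.cat([gi.flatten() for gi in g])
        scores.append(torch.dot(g, val_grad).item())
    return scores
```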
- Incorporating Attribution Importance for Improving Faithfulness Metrics [36.02988430743367]
Feature attribution methods (FAs) are popular approaches for providing insights into the model reasoning process of making predictions.
We propose a simple yet effective soft erasure criterion.
Our experiments show that our soft-sufficiency and soft-comprehensiveness metrics consistently prefer more faithful explanations.
arXiv Detail & Related papers (2023-05-17T18:05:49Z)
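A minimal sketch of soft erasure, assuming a HuggingFace-style model that accepts `inputs_embeds`; the linear `(1 - a)` down-scaling is one simple instantiation of the soft criterion, not necessarily the paper's exact formulation (which masks stochastically by attribution).

```python
import torch

def soft_comprehensiveness(model, embeddings, attention_mask, attribution):
    """Scale each token embedding down in proportion to its attribution
    instead of hard-deleting the top-k tokens, then measure the drop in
    the predicted-class probability."""
    with torch.no_grad():
        base = model(inputs_embeds=embeddings,
                     attention_mask=attention_mask).logits.softmax(-1)
        a = attribution.clamp(min=0)
        a = a / (a.max(dim=1, keepdim=True).values + 1e-8)  # scores in [0, 1]
        softened = embeddings * (1 - a).unsqueeze(-1)       # "soft" erasure
        out = model(inputs_embeds=softened,
                    attention_mask=attention_mask).logits.softmax(-1)
    label = base.argmax(-1, keepdim=True)
    return (base.gather(1, label) - out.gather(1, label)).squeeze(1)
```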
- Improving Identity-Robustness for Face Models [9.721206532236515]
We explore using face-recognition embedding vectors, as proxies for identities, to enforce such robustness.
We do so by weighting samples according to their conditional inverse density (CID) in the proxy embedding space.
Our experiments suggest that such a simple sample-weighting scheme not only improves training robustness, it often improves overall performance.
arXiv Detail & Related papers (2023-04-07T20:41:10Z)
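A sketch of CID-style weighting using an illustrative scikit-learn kernel density estimate; the bandwidth and the choice of density estimator are assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def cid_weights(embeddings, labels, bandwidth=0.5):
    """Weight each sample by its conditional inverse density: estimate the
    density of face-recognition embeddings within each class and upweight
    samples that are rare for their class."""
    weights = np.zeros(len(embeddings))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        kde = KernelDensity(bandwidth=bandwidth).fit(embeddings[idx])
        density = np.exp(kde.score_samples(embeddings[idx]))
        weights[idx] = 1.0 / (density + 1e-12)
    return weights / weights.mean()   # keep the average weight at 1
```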
- VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives [84.48039784446166]
We show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason metrics.
Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets.
Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful.
arXiv Detail & Related papers (2022-06-22T17:02:01Z)
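A generic right-for-the-right-reason objective in this spirit is sketched below; VisFIS combines several supervision terms, and the cosine-alignment term here is just one plausible instance.

```python
import torch.nn.functional as F

def fi_supervised_loss(logits, targets, model_saliency, human_saliency,
                       alpha=1.0):
    """Task loss plus a term that pulls the model's feature-importance map
    toward human annotations (cosine alignment of flattened maps)."""
    task = F.cross_entropy(logits, targets)
    m = F.normalize(model_saliency.flatten(1), dim=1)
    h = F.normalize(human_saliency.flatten(1), dim=1)
    align = 1 - (m * h).sum(dim=1).mean()
    return task + alpha * align
```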
- Extreme Masking for Learning Instance and Distributed Visual Representations [50.152264456036114]
The paper presents a scalable approach for learning distributed representations over individual tokens and a holistic instance representation simultaneously.
We use self-attention blocks to represent distributed tokens, followed by cross-attention blocks to aggregate the holistic instance.
Our model, named ExtreMA, follows the plain BYOL approach where the instance representation from the unmasked subset is trained to predict that from the intact input.
arXiv Detail & Related papers (2022-06-09T17:59:43Z)
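A sketch of that training step with hypothetical names; it assumes patch embeddings of shape (batch, n_patches, dim), encoders that return one instance vector per input, and an 80% mask ratio chosen for illustration.

```python
import torch
import torch.nn.functional as F

def extrema_loss(encoder, momentum_encoder, predictor, patches,
                 mask_ratio=0.8):
    """Encode a small visible subset of patches and train its instance
    representation to predict the momentum encoder's representation of
    the intact input (BYOL-style regression)."""
    n = patches.size(1)
    keep = torch.randperm(n, device=patches.device)[: int(n * (1 - mask_ratio))]
    online = predictor(encoder(patches[:, keep]))      # from the visible subset
    with torch.no_grad():
        target = momentum_encoder(patches)             # from the intact input
    online, target = F.normalize(online, dim=-1), F.normalize(target, dim=-1)
    return (2 - 2 * (online * target).sum(dim=-1)).mean()
```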
- Masksembles for Uncertainty Estimation [60.400102501013784]
Deep neural networks have amply demonstrated their prowess but estimating the reliability of their predictions remains challenging.
Deep Ensembles are widely considered as being one of the best methods for generating uncertainty estimates but are very expensive to train and evaluate.
MC-Dropout is another popular alternative, which is less expensive, but also less reliable.
arXiv Detail & Related papers (2020-12-15T14:39:57Z)
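A sketch of the Masksembles mechanism as an illustrative PyTorch module; the paper constructs its fixed masks with controlled overlap, whereas the Bernoulli sampling here is a simplification.

```python
import torch
import torch.nn as nn

class Masksembles(nn.Module):
    """A small fixed set of binary masks reused across training and
    inference, interpolating between MC-Dropout (fresh random masks) and
    Deep Ensembles (fully disjoint subnetworks)."""

    def __init__(self, features, n_masks=4, keep_prob=0.8):
        super().__init__()
        masks = (torch.rand(n_masks, features) < keep_prob).float()
        self.register_buffer("masks", masks / keep_prob)  # dropout-style rescale
        self.n_masks = n_masks

    def forward(self, x):
        # Assign samples to the fixed masks round-robin over the batch.
        idx = torch.arange(x.size(0), device=x.device) % self.n_masks
        return x * self.masks[idx]
```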
This list is automatically generated from the titles and abstracts of the papers on this site.