Dynamic Top-k Estimation Consolidates Disagreement between Feature
Attribution Methods
- URL: http://arxiv.org/abs/2310.05619v2
- Date: Fri, 3 Nov 2023 12:11:17 GMT
- Title: Dynamic Top-k Estimation Consolidates Disagreement between Feature
Attribution Methods
- Authors: Jonathan Kamp, Lisa Beinborn, Antske Fokkens
- Abstract summary: We find that perturbation-based methods and Vanilla Gradient exhibit the highest agreement on most method--method and method--human agreement metrics with a static k.
This is the first evidence that sequential properties of attribution scores are informative for consolidating attribution signals for human interpretation.
- Score: 5.202524136984542
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature attribution scores are used for explaining the prediction of a text
classifier to users by highlighting the top k tokens. In this work, we
propose a way to determine the optimal number k of tokens that should be
displayed, based on sequential properties of the attribution scores. Our approach is
dynamic across sentences, method-agnostic, and deals with sentence length bias.
We compare agreement between multiple methods and humans on an NLI task, using
fixed k and dynamic k. We find that perturbation-based methods and Vanilla
Gradient exhibit the highest agreement on most method--method and method--human
agreement metrics with a static k. Their advantage over other methods
disappears with dynamic k, which mainly improves Integrated Gradients and
GradientXInput. To our knowledge, this is the first evidence that sequential
properties of attribution scores are informative for consolidating attribution
signals for human interpretation.
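The abstract does not spell out how k is estimated from the sequence of attribution scores, so the following is only a minimal sketch of one plausible heuristic: choosing k at the largest drop between consecutive sorted scores (an "elbow" rule). The function name and the specific criterion are illustrative assumptions, not the paper's actual estimator.

```python
import numpy as np

def dynamic_top_k(scores):
    """Pick k at the largest drop between consecutive sorted scores.

    This elbow-style heuristic only illustrates the idea of deriving k
    from sequential properties of attribution scores; the paper's actual
    estimator may differ.
    """
    s = np.sort(np.asarray(scores, dtype=float))[::-1]  # descending order
    if s.size < 2:
        return int(s.size)
    gaps = s[:-1] - s[1:]             # drop after each rank
    return int(np.argmax(gaps)) + 1   # k = rank just before the largest drop

# Two clearly dominant tokens, then a sharp drop: the heuristic selects k = 2.
k = dynamic_top_k([0.9, 0.85, 0.2, 0.15, 0.1])
```

Because k adapts to the shape of each sentence's score distribution, it naturally varies across sentences, matching the "dynamic across sentences" property described above.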
Related papers
- Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs).
Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy.
At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z)
- DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning [59.4644086610381]
We propose a novel denoising objective that takes a different perspective, namely the intra-sentence perspective.
By introducing both discrete and continuous noise, we generate noisy sentences and then train our model to restore them to their original form.
Our empirical evaluations demonstrate that this approach delivers competitive results on both semantic textual similarity (STS) and a wide range of transfer tasks.
arXiv Detail & Related papers (2024-01-24T17:48:45Z)
- Enhancing Coherence of Extractive Summarization with Multitask Learning [40.349019691412465]
This study proposes a multitask learning architecture for extractive summarization with coherence boosting.
The architecture contains an extractive summarizer and coherent discriminator module.
Experiments show that our proposed method significantly improves the proportion of consecutive sentences in the extracted summaries.
arXiv Detail & Related papers (2023-05-22T09:20:58Z)
- Retrieval-Augmented Classification with Decoupled Representation [31.662843145399044]
We propose a $k$-nearest-neighbor (KNN)-based method for retrieval-augmented classification.
We find that shared representation for classification and retrieval hurts performance and leads to training instability.
We evaluate our method on a wide range of classification datasets.
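The summary above describes decoupling the classification and retrieval representations. As a minimal sketch under stated assumptions: a retrieval-head vector queries a datastore of keys, and a soft kNN vote is interpolated with the classifier's probabilities. All names, the exponential weighting, and the mixing weight `lam` are illustrative, not the paper's actual API.

```python
import numpy as np

def knn_augmented_probs(query_ret, bank_keys, bank_labels,
                        clf_probs, n_classes, k=2, lam=0.5):
    """Blend classifier probabilities with a soft kNN vote over a datastore.

    query_ret comes from a retrieval head kept separate from the
    classification head, mirroring the decoupling the summary describes.
    """
    d = np.linalg.norm(bank_keys - query_ret, axis=1)   # distances to stored keys
    nn = np.argsort(d)[:k]                              # indices of k nearest keys
    w = np.exp(-d[nn])                                  # closer neighbours vote harder
    knn_probs = np.zeros(n_classes)
    for wi, lbl in zip(w, bank_labels[nn]):
        knn_probs[lbl] += wi
    knn_probs /= knn_probs.sum()
    return lam * clf_probs + (1 - lam) * knn_probs

# Toy datastore with two stored examples per class.
keys = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
labels = np.array([0, 0, 1, 1])
mixed = knn_augmented_probs(np.array([0.0, 0.0]), keys, labels,
                            clf_probs=np.array([0.5, 0.5]), n_classes=2)
```

With the query sitting on top of the class-0 examples, the kNN vote pulls the uncertain classifier distribution toward class 0.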
arXiv Detail & Related papers (2023-03-23T06:33:06Z)
- Easy to Decide, Hard to Agree: Reducing Disagreements Between Saliency Methods [0.15039745292757667]
We show that saliency methods exhibit weak rank correlations even when applied to the same model instance.
Regularization techniques that increase faithfulness of attention explanations also increase agreement between saliency methods.
arXiv Detail & Related papers (2022-11-15T18:18:34Z)
- Concrete Score Matching: Generalized Score Matching for Discrete Data [109.12439278055213]
"Concrete score" is a generalization of the (Stein) score for discrete settings.
"Concrete Score Matching" is a framework to learn such scores from samples.
arXiv Detail & Related papers (2022-11-02T00:41:37Z)
- Pruned Graph Neural Network for Short Story Ordering [0.7087237546722617]
Organizing sentences into an order that maximizes coherence is known as sentence ordering.
We propose a new method for constructing sentence-entity graphs of short stories to create the edges between sentences.
We also observe that replacing pronouns with their referring entities effectively encodes sentences in sentence-entity graphs.
arXiv Detail & Related papers (2022-03-13T22:25:17Z)
- Sequential Recommendation via Stochastic Self-Attention [68.52192964559829]
Transformer-based approaches embed items as vectors and use dot-product self-attention to measure the relationship between items.
We propose a novel STOchastic Self-Attention (STOSA) mechanism to overcome these issues.
We devise a novel Wasserstein Self-Attention module to characterize item-item position-wise relationships in sequences.
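Since STOSA embeds each item as a Gaussian rather than a point vector, item pairs can be scored with a Wasserstein distance instead of a dot product. The sketch below shows only the core quantity, the squared 2-Wasserstein distance between diagonal Gaussians; the paper's full attention module adds learned projections and scaling that are omitted here.

```python
import numpy as np

def w2_squared(mu1, sig1, mu2, sig2):
    """Squared 2-Wasserstein distance between diagonal Gaussians.

    For N(mu1, diag(sig1**2)) and N(mu2, diag(sig2**2)) this reduces to
    ||mu1 - mu2||^2 + ||sig1 - sig2||^2.
    """
    mu1, sig1, mu2, sig2 = map(np.asarray, (mu1, sig1, mu2, sig2))
    return float(np.sum((mu1 - mu2) ** 2) + np.sum((sig1 - sig2) ** 2))

# Identical distributions are at distance zero; shifting a mean by 3 in one
# dimension contributes 9 to the squared distance.
same = w2_squared([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
shifted = w2_squared([0.0], [1.0], [3.0], [1.0])
```

Unlike a dot product, this score also reacts to the variance terms, which is what lets the model express uncertainty about an item's position in the sequence.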
arXiv Detail & Related papers (2022-01-16T12:38:45Z)
- Variable Instance-Level Explainability for Text Classification [9.147707153504117]
We propose a method for extracting variable-length explanations using a set of different feature scoring methods at instance-level.
Our method consistently provides more faithful explanations compared to previous fixed-length and fixed-feature scoring methods for rationale extraction.
arXiv Detail & Related papers (2021-04-16T16:53:48Z)
- Explaining and Improving Model Behavior with k Nearest Neighbor Representations [107.24850861390196]
We propose using k nearest neighbor representations to identify training examples responsible for a model's predictions.
We show that kNN representations are effective at uncovering learned spurious associations.
Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs.
arXiv Detail & Related papers (2020-10-18T16:55:25Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.