Uniform Discretized Integrated Gradients: An effective attribution based method for explaining large language models
- URL: http://arxiv.org/abs/2412.03886v1
- Date: Thu, 05 Dec 2024 05:39:03 GMT
- Title: Uniform Discretized Integrated Gradients: An effective attribution based method for explaining large language models
- Authors: Swarnava Sinha Roy, Ayan Kundu
- Abstract summary: Integrated Gradients is a well-known technique for explaining deep learning models.
In this paper, we propose a method called Uniform Discretized Integrated Gradients (UDIG).
We evaluate our method on two types of NLP tasks, Sentiment Classification and Question Answering, against three metrics: Log-odds, Comprehensiveness and Sufficiency.
- Abstract: Integrated Gradients is a well-known technique for explaining deep learning models. It calculates feature importance scores with a gradient-based approach, computing gradients of the model output with respect to the input features and accumulating them along a linear path. While this works well for continuous feature spaces, it may not be optimal for discrete spaces such as word embeddings. For interpreting LLMs (Large Language Models), there is a need for a non-linear path whose intermediate points, at which gradients are computed, lie close to actual words in the embedding space. In this paper, we propose a method called Uniform Discretized Integrated Gradients (UDIG), based on a new interpolation strategy that chooses a favorable non-linear path for computing attribution scores suitable for predictive language models. We evaluate our method on two types of NLP tasks, Sentiment Classification and Question Answering, against three metrics: Log-odds, Comprehensiveness and Sufficiency. For sentiment classification, we use the SST2, IMDb and Rotten Tomatoes datasets for benchmarking; for Question Answering, we use a BERT model fine-tuned on the SQuAD dataset. Our approach outperforms existing methods on almost all metrics.
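To make the baseline concrete, here is a minimal sketch of the standard Integrated Gradients computation described in the abstract, assuming a PyTorch classifier whose forward pass accepts an embedded input of shape (seq, dim) batched to (1, seq, dim) and returns class logits; the function and parameter names are illustrative, not taken from the paper.

```python
import torch

def integrated_gradients(model, x, baseline, target, steps=50):
    """Standard IG: average gradients of the target logit along the
    straight line from `baseline` to the input embeddings `x`."""
    alphas = torch.linspace(1.0 / steps, 1.0, steps)  # right-endpoint Riemann sum
    total_grad = torch.zeros_like(x)
    for alpha in alphas:
        point = (baseline + alpha * (x - baseline)).detach()  # point on the linear path
        point.requires_grad_(True)
        score = model(point.unsqueeze(0))[0, target]  # logit of the class being explained
        grad, = torch.autograd.grad(score, point)
        total_grad += grad
    # Attribution = (input - baseline) * average gradient along the path.
    return (x - baseline) * total_grad / steps
```

The paper's objection is to the `baseline + alpha * (x - baseline)` line: the straight-line path passes through regions of embedding space that correspond to no actual word, so gradients are taken at points the language model never sees in practice. UDIG instead chooses a non-linear path whose intermediate points lie close to real word embeddings (a sketch of that discretized-path idea follows the related-papers list below).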
Related papers
- Class Gradient Projection For Continual Learning [99.105266615448]
Catastrophic forgetting is one of the most critical challenges in Continual Learning (CL).
We propose Class Gradient Projection (CGP), which calculates the gradient subspace from individual classes rather than tasks.
arXiv Detail & Related papers (2023-11-25T02:45:56Z) - GRANDE: Gradient-Based Decision Tree Ensembles for Tabular Data [9.107782510356989]
We propose a novel approach for learning hard, axis-aligned decision tree ensembles using end-to-end gradient descent.
GRANDE is based on a dense representation of tree ensembles, which allows the use of backpropagation with a straight-through operator.
We demonstrate that our method outperforms existing gradient-boosting and deep learning frameworks on most datasets.
arXiv Detail & Related papers (2023-09-29T10:49:14Z) - The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z) - Neural Gradient Learning and Optimization for Oriented Point Normal Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently performs global gradient approximation while achieving better accuracy and generalization ability in local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z) - Geometrically Guided Integrated Gradients [0.3867363075280543]
We introduce an interpretability method called "geometrically-guided integrated gradients".
Our method explores the model's dynamic behavior from multiple scaled versions of the input and captures the best possible attribution for each input.
We also propose a "model perturbation" sanity check to complement the traditionally used "model randomization" test.
arXiv Detail & Related papers (2022-06-13T05:05:43Z) - Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation.
We develop a new adversarial-learning-based method that is simple and efficient to apply.
We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z) - Locally Aggregated Feature Attribution on Natural Language Model Understanding [12.233103741197334]
Locally Aggregated Feature Attribution (LAFA) is a novel gradient-based feature attribution method for NLP models.
Instead of relying on obscure reference tokens, it smooths gradients by aggregating similar reference texts derived from language model embeddings.
For evaluation purposes, we also design experiments on different NLP tasks, including Entity Recognition and Sentiment Analysis, on public datasets.
arXiv Detail & Related papers (2022-04-22T18:59:27Z) - Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text using an external datastore.
We show how to achieve up to a 6x inference speed-up while retaining comparable performance (a sketch of the underlying kNN-LM interpolation follows this list).
arXiv Detail & Related papers (2021-09-09T12:32:28Z) - Discretized Integrated Gradients for Explaining Language Models [43.2877233809206]
Integrated Gradients (IG) is a prominent attribution-based explanation algorithm.
We propose Discretized Integrated Gradients (DIG), which enables effective attribution along non-linear paths (see the discretized-path sketch after this list).
arXiv Detail & Related papers (2021-08-31T07:36:34Z) - Differentiable Segmentation of Sequences [2.1485350418225244]
We build on advances in learning continuous warping functions and propose a novel family of warping functions based on the two-sided power (TSP) distribution.
Our formulation includes the important class of segmented generalized linear models as a special case.
We use our approach to model the spread of COVID-19 with Poisson regression, apply it on a change point detection task, and learn classification models with concept drift.
arXiv Detail & Related papers (2020-06-23T15:51:48Z) - Spatial Pyramid Based Graph Reasoning for Semantic Segmentation [67.47159595239798]
We apply graph convolution to the semantic segmentation task and propose an improved Laplacian.
The graph reasoning is directly performed in the original feature space organized as a spatial pyramid.
We achieve comparable performance with lower computational and memory overhead.
arXiv Detail & Related papers (2020-03-23T12:28:07Z)
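As referenced in the Discretized Integrated Gradients entry above, here is a minimal sketch of the discretized-path idea shared by DIG and UDIG: intermediate interpolation points are anchored to real word embeddings rather than left on the straight line. The nearest-neighbour snapping below is an illustrative heuristic, not the exact anchoring strategy of either paper, and it operates on a single token embedding for simplicity.

```python
import torch

def discretized_path(x, baseline, vocab_emb, steps=30):
    """Build a non-linear path from `baseline` to `x` (one token embedding,
    shape (d,)) whose intermediate points are snapped to real words.
    `vocab_emb`: (V, d) matrix of all vocabulary embeddings."""
    path = [baseline]
    for k in range(1, steps):
        proposal = baseline + (k / steps) * (x - baseline)  # linear proposal
        # Snap the proposal to its nearest real word embedding; DIG/UDIG use
        # more careful selection rules to keep the path well-behaved.
        dists = torch.cdist(proposal.unsqueeze(0), vocab_emb).squeeze(0)
        path.append(vocab_emb[dists.argmin()])
    path.append(x)
    return torch.stack(path)  # (steps + 1, d) points at which to take gradients
```

Attribution then accumulates gradients over consecutive points of this piecewise-linear path (a Riemann sum, as in the IG sketch above), so every gradient is evaluated near an actual word in the embedding space.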
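The Efficient Nearest Neighbor Language Models entry builds on the kNN-LM formulation, in which the parametric model's next-token distribution is interpolated with a distribution induced by nearest-neighbour lookups in an external datastore. Here is a minimal sketch of that interpolation, assuming precomputed datastore keys and values; all names and the default hyperparameters are illustrative.

```python
import torch

def knn_lm_probs(lm_logits, query, keys, values, vocab_size,
                 k=8, lam=0.25, temp=1.0):
    """Interpolate a parametric LM with a kNN distribution over a datastore.
    `keys`: (N, d) stored context vectors; `values`: (N,) next-token ids (long).
    `lm_logits`: (vocab_size,) logits for the current step; `query`: (d,)."""
    # kNN distribution: softmax over negative distances to the k nearest keys.
    dists = torch.cdist(query.unsqueeze(0), keys).squeeze(0)  # (N,)
    knn_dists, knn_idx = dists.topk(k, largest=False)
    weights = torch.softmax(-knn_dists / temp, dim=0)         # (k,)
    p_knn = torch.zeros(vocab_size)
    p_knn.index_add_(0, values[knn_idx], weights)  # mass on retrieved tokens
    # Final distribution: lam * p_kNN + (1 - lam) * p_LM.
    return lam * p_knn + (1 - lam) * torch.softmax(lm_logits, dim=0)
```

The listed paper's contribution, as summarized above, is making the retrieval step cheap enough for practical inference while keeping this interpolation intact.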