An Attribution Method for Siamese Encoders
- URL: http://arxiv.org/abs/2310.05703v3
- Date: Wed, 29 Nov 2023 15:12:00 GMT
- Title: An Attribution Method for Siamese Encoders
- Authors: Lucas Möller, Dmitry Nikolaev, Sebastian Padó
- Abstract summary: This paper derives a local attribution method for Siamese encoders by generalizing the principle of integrated gradients to models with multiple inputs.
A pilot study shows that in an ST, a few token pairs can often explain large fractions of predictions, and that the model focuses on nouns and verbs.
- Score: 2.1163800956183776
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the success of Siamese encoder models such as sentence transformers
(ST), little is known about the aspects of inputs they pay attention to. A
barrier is that their predictions cannot be attributed to individual features,
as they compare two inputs rather than processing a single one. This paper
derives a local attribution method for Siamese encoders by generalizing the
principle of integrated gradients to models with multiple inputs. The solution
takes the form of feature-pair attributions, and can be reduced to a
token-token matrix for STs. Our method involves the introduction of integrated
Jacobians and inherits the advantageous formal properties of integrated
gradients: it accounts for the model's full computation graph and is guaranteed
to converge to the actual prediction. A pilot study shows that in an ST, a
few token pairs can often explain large fractions of predictions, and that the
model focuses on nouns and verbs. For accurate predictions, however, it needs
to attend to the majority of tokens and parts of speech.
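The construction described above can be sketched numerically. The following is a minimal, hypothetical illustration, not the authors' implementation: a toy encoder g(x) = tanh(Wx) stands in for a sentence transformer, integrated Jacobians are approximated by a Riemann sum along straight-line paths from an all-zeros baseline (chosen so that g(baseline) = 0), and the resulting feature-pair attribution matrix sums to the dot-product similarity score, matching the completeness guarantee the abstract mentions.

```python
import numpy as np

def encoder(x, W):
    # Toy stand-in for a Siamese encoder branch: g(x) = tanh(Wx)
    return np.tanh(W @ x)

def jacobian(x, W):
    # Analytic Jacobian of g at x: dg_d/dx_i = (1 - tanh(Wx)_d^2) * W[d, i]
    h = np.tanh(W @ x)
    return (1 - h ** 2)[:, None] * W  # shape (d_out, d_in)

def integrated_jacobian(x, x0, W, steps=200):
    # Midpoint Riemann sum approximating the integral of the Jacobian
    # along the straight path x0 + t * (x - x0), t in [0, 1]
    ts = (np.arange(steps) + 0.5) / steps
    J = np.zeros_like(W)
    for t in ts:
        J += jacobian(x0 + t * (x - x0), W)
    return J / steps

def pair_attributions(a, b, W, a0=None, b0=None):
    # Feature-pair attribution matrix for the similarity f(a, b) = g(a) . g(b):
    # A[i, j] = (a - a0)_i * [Ja^T Jb]_ij * (b - b0)_j
    a0 = np.zeros_like(a) if a0 is None else a0
    b0 = np.zeros_like(b) if b0 is None else b0
    Ja = integrated_jacobian(a, a0, W)
    Jb = integrated_jacobian(b, b0, W)
    return (a - a0)[:, None] * (Ja.T @ Jb) * (b - b0)[None, :]

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6)) / 2
a, b = rng.normal(size=6), rng.normal(size=6)

score = encoder(a, W) @ encoder(b, W)
A = pair_attributions(a, b, W)
# Completeness: with g(0) = 0 as baseline, the attributions sum to the score
print(abs(A.sum() - score))  # should be near zero
```

In an actual ST, the two inputs are token sequences and A reduces to the token-token matrix the abstract describes; here each axis is simply one input feature vector, and the integration over the computation path is what distinguishes this from a plain gradient-times-input heuristic.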
Related papers
- The Foundations of Tokenization: Statistical and Computational Concerns [51.370165245628975]
Tokenization is the practice of converting strings of characters over an alphabet into sequences of tokens over a vocabulary.
This paper lays the foundations of tokenization from a formal perspective.
arXiv Detail & Related papers (2024-07-16T11:12:28Z)
- Understanding and Mitigating Tokenization Bias in Language Models [6.418593476658017]
State-of-the-art language models are autoregressive and operate on subword units known as tokens.
We show that popular encoding schemes induce a sampling bias that cannot be mitigated with more training or data.
We propose a novel algorithm to obtain unbiased estimates from any language model trained on tokenized data.
arXiv Detail & Related papers (2024-06-24T17:38:02Z)
- TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction [61.295716741720284]
TokenUnify is a novel pretraining method that integrates random token prediction, next-token prediction, and next-all token prediction.
Cooperated with TokenUnify, we have assembled a large-scale electron microscopy (EM) image dataset with ultra-high resolution.
This dataset includes over 120 million annotated voxels, making it the largest neuron segmentation dataset to date.
arXiv Detail & Related papers (2024-05-27T05:45:51Z)
- Approximate Attributions for Off-the-Shelf Siamese Transformers [2.1163800956183776]
Siamese encoders such as sentence transformers are among the least understood deep models.
We propose a model with exact attribution ability that retains the original model's predictive performance.
We also propose a way to compute approximate attributions for off-the-shelf models.
arXiv Detail & Related papers (2024-02-05T10:49:05Z)
- Object Recognition as Next Token Prediction [99.40793702627396]
We present an approach to pose object recognition as next token prediction.
The idea is to apply a language decoder that auto-regressively predicts the text tokens from image embeddings to form labels.
arXiv Detail & Related papers (2023-12-04T18:58:40Z)
- Token Fusion: Bridging the Gap between Token Pruning and Token Merging [71.84591084401458]
Vision Transformers (ViTs) have emerged as powerful backbones in computer vision, outperforming many traditional CNNs.
However, their computational overhead, largely attributed to the self-attention mechanism, makes deployment on resource-constrained edge devices challenging.
We introduce "Token Fusion" (ToFu), a method that amalgamates the benefits of both token pruning and token merging.
arXiv Detail & Related papers (2023-12-02T04:29:19Z)
- On the Robustness of Text Vectorizers [9.904746542801838]
In natural language processing, models typically contain a first embedding layer, transforming a sequence of tokens into vector representations.
While the robustness with respect to changes of continuous inputs is well-understood, the situation is less clear when considering discrete changes.
Our work formally proves that popular embedding schemes, such as concatenation, TF-IDF, and Paragraph Vector (a.k.a. doc2vec), exhibit robustness in the Hölder or Lipschitz sense with respect to the Hamming distance.
arXiv Detail & Related papers (2023-03-09T16:37:37Z)
- Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT [72.93855288283059]
We propose a non-autoregressive speech recognition model called LASO (Listen Attentively, and Spell Once).
The model consists of an encoder, a decoder, and a position-dependent summarizer (PDS).
arXiv Detail & Related papers (2021-02-15T15:18:59Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
- Learning to Faithfully Rationalize by Construction [36.572594249534866]
In many settings it is important to be able to understand why a model made a particular prediction.
We propose a simpler variant of this approach that provides faithful explanations by construction.
In both automatic and manual evaluations we find that variants of this simple framework are superior to 'end-to-end' approaches.
arXiv Detail & Related papers (2020-04-30T21:45:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.