Approximate Attributions for Off-the-Shelf Siamese Transformers
- URL: http://arxiv.org/abs/2402.02883v1
- Date: Mon, 5 Feb 2024 10:49:05 GMT
- Title: Approximate Attributions for Off-the-Shelf Siamese Transformers
- Authors: Lucas Möller, Dmitry Nikolaev, and Sebastian Padó
- Abstract summary: Siamese encoders such as sentence transformers are among the least understood deep models.
We propose a model with exact attribution ability that retains the original model's predictive performance.
We also propose a way to compute approximate attributions for off-the-shelf models.
- Score: 2.1163800956183776
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Siamese encoders such as sentence transformers are among the least understood
deep models. Established attribution methods cannot tackle this model class
since it compares two inputs rather than processing a single one. To address
this gap, we have recently proposed an attribution method specifically for
Siamese encoders (Möller et al., 2023). However, it requires models to be
adjusted and fine-tuned and therefore cannot be directly applied to
off-the-shelf models. In this work, we reassess these restrictions and propose
(i) a model with exact attribution ability that retains the original model's
predictive performance and (ii) a way to compute approximate attributions for
off-the-shelf models. We extensively compare approximate and exact attributions
and use them to analyze the models' attention to different linguistic aspects.
We gain insights into which syntactic roles Siamese transformers attend to,
confirm that they mostly ignore negation, explore how they judge semantically
opposite adjectives, and find that they exhibit lexical bias.
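The exact method of Möller et al. (2023) integrates Jacobians through an adjusted pair encoder; purely as a minimal sketch of what an attribution for an off-the-shelf sentence transformer can look like, the snippet below applies standard integrated gradients to a dot-product score over mean-pooled embeddings. The checkpoint, the zero baseline, and the pooling are illustrative assumptions, not the paper's procedure.

```python
# Minimal sketch, NOT the authors' exact method: token-level attributions
# for a Siamese similarity score via plain integrated gradients.
# Checkpoint, zero baseline, and mean pooling are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

name = "sentence-transformers/all-MiniLM-L6-v2"  # assumed off-the-shelf model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

def embed(inputs_embeds, attention_mask):
    """Mean-pooled sentence embedding computed from token embeddings."""
    hidden = model(inputs_embeds=inputs_embeds,
                   attention_mask=attention_mask).last_hidden_state
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1)

def attribute(sent_a, sent_b, steps=50):
    enc_a, enc_b = tok(sent_a, return_tensors="pt"), tok(sent_b, return_tensors="pt")
    emb = model.get_input_embeddings()
    with torch.no_grad():
        x = emb(enc_a["input_ids"])                     # (1, T, d) token embeddings of A
        e_b = embed(emb(enc_b["input_ids"]), enc_b["attention_mask"])
    baseline = torch.zeros_like(x)                      # zero baseline (an assumption)
    grads = torch.zeros_like(x)
    for alpha in torch.linspace(0, 1, steps):           # Riemann sum over the path
        xi = (baseline + alpha * (x - baseline)).requires_grad_(True)
        score = (embed(xi, enc_a["attention_mask"]) * e_b).sum()  # dot-product similarity
        grads += torch.autograd.grad(score, xi)[0]
    ig = (x - baseline) * grads / steps                 # integrated gradients
    return ig.sum(-1).squeeze(0)                        # one score per token of A

scores = attribute("A man is playing guitar.", "Someone plays an instrument.")
print(scores)  # includes [CLS]/[SEP] positions; inner entries align with tokens
```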
Related papers
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving a 2-3x speedup in machine translation with minimal sacrifice in quality.
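As a hedged toy version of the sampling scheme described above (mask positions, resample them from the model's conditionals, repeat), here is a Gibbs-style Markov chain over a generic masked LM; the checkpoint and the single-position updates are assumptions:

```python
# Gibbs-style chain: repeatedly mask one position and resample it from the
# masked LM's conditional distribution. Stand-in model, not the paper's GMLM.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed stand-in
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def gibbs_step(ids):
    pos = torch.randint(1, ids.shape[1] - 1, (1,)).item()  # skip [CLS]/[SEP]
    masked = ids.clone()
    masked[0, pos] = tok.mask_token_id
    probs = mlm(masked).logits[0, pos].softmax(-1)          # conditional at pos
    ids[0, pos] = torch.multinomial(probs, 1).item()        # resample that token
    return ids

ids = tok("the cat sat on the mat", return_tensors="pt")["input_ids"]
for _ in range(100):  # run the chain; stationary samples follow the model
    ids = gibbs_step(ids)
print(tok.decode(ids[0], skip_special_tokens=True))
```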
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- Meaning Representations from Trajectories in Autoregressive Models [106.63181745054571]
We propose to extract meaning representations from autoregressive language models by considering the distribution of all possible trajectories extending an input text.
This strategy is prompt-free, does not require fine-tuning, and is applicable to any pre-trained autoregressive model.
We empirically show that the representations obtained from large models align well with human annotations, outperform other zero-shot and prompt-free methods on semantic similarity tasks, and can be used to solve more complex entailment and containment tasks that standard embeddings cannot handle.
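The summary suggests comparing texts through the continuations they induce. The sketch below, under assumed choices (GPT-2, a handful of sampled trajectories, mean log-likelihood scoring), shows one way such a prompt-free comparison could be set up; it is not the paper's exact estimator:

```python
# Rough trajectory sketch: represent a text by sampled continuations and
# compare two texts by how one scores the other's trajectories.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def continuation_logprob(prompt, cont):
    """Sum of log p(cont | prompt) under the LM."""
    p = tok(prompt, return_tensors="pt")["input_ids"]
    ids = torch.cat([p, tok(cont, return_tensors="pt")["input_ids"]], dim=1)
    logp = lm(ids).logits[:, :-1].log_softmax(-1)
    token_lp = logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp[:, p.shape[1] - 1:].sum().item()  # continuation tokens only

@torch.no_grad()
def sample_trajectories(prompt, n=5, max_new=15):
    ids = tok(prompt, return_tensors="pt")["input_ids"]
    outs = lm.generate(ids, do_sample=True, max_new_tokens=max_new,
                       num_return_sequences=n, pad_token_id=tok.eos_token_id)
    return [tok.decode(o[ids.shape[1]:]) for o in outs]

a, b = "A man is playing a guitar", "Someone plays an instrument"
score = sum(continuation_logprob(b, t) for t in sample_trajectories(a)) / 5
print(score)  # less negative when b "accepts" trajectories sampled from a
```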
arXiv Detail & Related papers (2023-10-23T04:35:58Z)
- CLIMAX: An exploration of Classifier-Based Contrastive Explanations [5.381004207943597]
We propose a novel post-hoc, model-agnostic XAI technique that provides contrastive explanations justifying the classifications made by a black box.
Our method, which we refer to as CLIMAX, is based on local classifiers.
We show that we achieve better consistency as compared to baselines such as LIME, BayLIME, and SLIME.
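CLIMAX's specifics are not given in the summary; the snippet below only illustrates the general local-classifier recipe this family (LIME and its variants) builds on: perturb an input, label the neighborhood with the black box, and read the explanation off an interpretable surrogate. All names and choices here are illustrative:

```python
# Generic local-classifier explanation (LIME-style), not CLIMAX itself.
import numpy as np
from sklearn.linear_model import LogisticRegression

def local_explanation(black_box, x, n_samples=500, scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    X = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))  # neighborhood of x
    y = black_box(X)                        # black-box labels for the perturbations
    surrogate = LogisticRegression().fit(X, y)
    return surrogate.coef_[0]               # per-feature local importances

# toy black box: predicts class 1 iff feature 0 + feature 1 > 1
bb = lambda X: (X[:, 0] + X[:, 1] > 1).astype(int)
print(local_explanation(bb, np.array([0.6, 0.6, 0.0])))  # features 0,1 dominate
```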
arXiv Detail & Related papers (2023-07-02T22:52:58Z)
- Analyzing Transformers in Embedding Space [59.434807802802105]
We present a theoretical analysis where all parameters of a trained Transformer are interpreted by projecting them into the embedding space.
We show that parameters of both pretrained and fine-tuned models can be interpreted in embedding space.
Our findings open the door to interpretation methods that, at least in part, abstract away from model specifics and operate in the embedding space only.
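The core operation the summary describes fits in a few lines: project a parameter vector through the model's embedding matrix and inspect the nearest vocabulary items. The choice of GPT-2 and of one particular feed-forward value vector is an assumption for illustration:

```python
# Interpret a parameter vector by projecting it onto the vocabulary.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

with torch.no_grad():
    E = model.get_input_embeddings().weight            # (vocab, d) embedding matrix
    w = model.transformer.h[8].mlp.c_proj.weight[42]   # one FF value vector, (d,)
    top = torch.topk(E @ w, 10).indices                # most-promoted tokens
print([tok.decode([int(i)]) for i in top])
```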
arXiv Detail & Related papers (2022-09-06T14:36:57Z)
- Bayesian Neural Network Inference via Implicit Models and the Posterior Predictive Distribution [0.8122270502556371]
We propose a novel approach to perform approximate Bayesian inference in complex models such as Bayesian neural networks.
The approach is more scalable to large data than Markov Chain Monte Carlo.
We see this being useful in applications such as surrogate and physics-based models.
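The summary gives no algorithmic detail, so the following is only a toy reminder of the target quantity: a posterior predictive, estimated by averaging predictions over weight samples drawn from an assumed approximate posterior q(w):

```python
# E_q[f(x, w)]: average predictions over sampled weights (toy illustration).
import torch

f = lambda x, w: torch.tanh(w * x)            # stand-in one-weight "network"
w_samples = 1.0 + 0.1 * torch.randn(1000)     # samples from an assumed q(w)
x = torch.tensor(2.0)
pred = torch.stack([f(x, w) for w in w_samples]).mean()  # posterior predictive mean
print(pred)
```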
arXiv Detail & Related papers (2022-09-06T02:43:19Z)
- What do Toothbrushes do in the Kitchen? How Transformers Think our World is Structured [137.83584233680116]
We investigate to what extent transformer-based language models allow for extracting knowledge about object relations.
We show that the models, combined with different similarity measures, differ greatly in the amount of knowledge they allow to be extracted.
Surprisingly, static models perform almost as well as contextualized models -- in some cases even better.
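A minimal probe in the spirit of the title's question, assuming static GloVe vectors (via gensim) as the static model and cosine similarity as one of the similarity measures:

```python
# Rank candidate locations for an object by static-embedding similarity.
import gensim.downloader as api

vecs = api.load("glove-wiki-gigaword-100")  # static word embeddings (assumed choice)
for place in ["bathroom", "kitchen", "garage"]:
    print(place, vecs.similarity("toothbrush", place))  # where do toothbrushes go?
```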
arXiv Detail & Related papers (2022-04-12T10:00:20Z)
- xFAIR: Better Fairness via Model-based Rebalancing of Protected Attributes [15.525314212209564]
Machine learning software can generate models that inappropriately discriminate against specific protected social groups.
We propose xFAIR, a model-based extrapolation method that is capable of both mitigating bias and explaining the cause.
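The summary does not spell out the mechanism; the hedged sketch below only shows why a model-based view can both expose and explain bias: if an extrapolation model recovers the protected attribute from the remaining features, those features leak it. Data and model choices are assumptions, not the actual xFAIR procedure:

```python
# If the other features predict the protected attribute, they leak it.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
protected = rng.integers(0, 2, 1000)                       # sensitive group flag
X = rng.normal(size=(1000, 3)) + 0.8 * protected[:, None]  # features leak the flag

extrapolator = DecisionTreeClassifier(max_depth=3).fit(X, protected)
print(extrapolator.score(X, protected))  # high accuracy exposes the leakage
```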
arXiv Detail & Related papers (2021-10-03T22:10:14Z)
- Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept [56.46135010588918]
We prove that the widely used class of RNN-Transducer models and segmental models (direct HMM) are equivalent.
It is shown that blank probabilities translate into segment length probabilities and vice versa.
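The stated correspondence can be made concrete with a toy calculation: k blanks followed by a label emission means a segment of length k+1, so per-frame blank probabilities induce a segment-length distribution. The numbers here are made up:

```python
# Blank probabilities -> segment length probabilities (toy numbers).
import numpy as np

p_blank = np.array([0.9, 0.8, 0.7, 0.6, 0.5])  # assumed per-frame blank probs
p_len = [np.prod(p_blank[:k]) * (1 - p_blank[k]) for k in range(len(p_blank))]
print(p_len)  # P(segment length = 1..5); leftover mass = longer segments
```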
arXiv Detail & Related papers (2021-04-13T11:20:48Z)
- BODAME: Bilevel Optimization for Defense Against Model Extraction [10.877450596327407]
We consider an adversarial setting to prevent model extraction, under the assumption that the attacker will make a best guess at the service provider's model from its predictions.
We formulate a surrogate model using the predictions of the true model.
We give a tractable transformation and an algorithm for more complicated models that are learned by using gradient descent-based algorithms.
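The assumed attacker is easy to sketch: fit a surrogate to the service's predictions on a query budget. The bilevel defense itself is not reproduced here; the service function and learner below are illustrative:

```python
# Model extraction by the assumed attacker: learn from queried predictions.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
service = lambda X: np.sin(3 * X[:, 0]) + X[:, 1]  # stand-in for the true model
queries = rng.uniform(-1, 1, size=(500, 2))        # attacker's query budget
surrogate = Ridge().fit(queries, service(queries)) # best guess from predictions

test = rng.uniform(-1, 1, size=(200, 2))
print(surrogate.score(test, service(test)))        # extraction fidelity (R^2)
```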
arXiv Detail & Related papers (2021-03-11T17:08:31Z)
- To what extent do human explanations of model behavior align with actual model behavior? [91.67905128825402]
We investigated the extent to which human-generated explanations of models' inference decisions align with how models actually make these decisions.
We defined two alignment metrics that quantify how well natural language human explanations align with model sensitivity to input words.
We find that a model's alignment with human explanations is not predicted by the model's accuracy on NLI.
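A toy version of such an alignment metric, with made-up per-word sensitivities (e.g., from word occlusion) and an assumed definition (share of sensitivity mass on words the human explanation mentions); the paper's exact metrics may differ:

```python
# Toy alignment score between human-cited words and model sensitivity.
import numpy as np

words = ["the", "movie", "was", "not", "good"]
sensitivity = np.array([0.01, 0.20, 0.02, 0.45, 0.30])  # model's per-word effect
mentioned = {"not", "good"}                             # words the human cites
mask = np.array([w in mentioned for w in words])
print(sensitivity[mask].sum() / sensitivity.sum())      # alignment in [0, 1]
```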
arXiv Detail & Related papers (2020-12-24T17:40:06Z)
- Learning Invariances for Interpretability using Supervised VAE [0.0]
We learn model invariances as a means of interpreting a model.
We propose a supervised form of variational auto-encoders (VAEs).
We show how, by combining our model with feature attribution methods, it is possible to reach a more fine-grained understanding of the model's decision process.
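As a hedged sketch of the kind of supervised VAE described, the model below splits the latent code so that only part of it is supervised by the label, leaving the remainder to absorb label-invariant variation; sizes, architecture, and equal loss weighting are assumptions:

```python
# Supervised VAE sketch: a label head on part of the latent code.
import torch
import torch.nn as nn

class SupervisedVAE(nn.Module):
    def __init__(self, x_dim=20, z_task=2, z_inv=8, n_classes=3):
        super().__init__()
        self.z_task = z_task
        self.enc = nn.Linear(x_dim, 2 * (z_task + z_inv))  # mu and logvar
        self.dec = nn.Linear(z_task + z_inv, x_dim)
        self.clf = nn.Linear(z_task, n_classes)            # supervision head

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), self.clf(z[:, :self.z_task]), mu, logvar

def loss(model, x, y):
    x_hat, logits, mu, logvar = model(x)
    rec = ((x - x_hat) ** 2).sum(-1).mean()                       # reconstruction
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(-1).mean()
    return rec + kl + nn.functional.cross_entropy(logits, y)     # + supervision

x, y = torch.randn(16, 20), torch.randint(0, 3, (16,))
print(loss(SupervisedVAE(), x, y))
```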
arXiv Detail & Related papers (2020-07-15T10:14:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.