Attention-likelihood relationship in transformers
- URL: http://arxiv.org/abs/2303.08288v1
- Date: Wed, 15 Mar 2023 00:23:49 GMT
- Title: Attention-likelihood relationship in transformers
- Authors: Valeria Ruscio, Valentino Maiorca, Fabrizio Silvestri
- Abstract summary: We analyze how large language models (LLMs) represent out-of-context words, investigating their reliance on the given context to capture their semantics.
Our likelihood-guided text perturbations reveal a correlation between token likelihood and attention values in transformer-based language models.
- Score: 2.8304391396200064
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We analyze how large language models (LLMs) represent out-of-context words,
investigating their reliance on the given context to capture their semantics.
Our likelihood-guided text perturbations reveal a correlation between token
likelihood and attention values in transformer-based language models. Extensive
experiments reveal that unexpected tokens cause the model to attend less to the
information coming from themselves to compute their representations,
particularly at higher layers. These findings have valuable implications for
assessing the robustness of LLMs in real-world scenarios. Fully reproducible
codebase at https://github.com/Flegyas/AttentionLikelihood.
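The measurement the abstract describes can be pictured with a short, self-contained sketch using Hugging Face transformers and SciPy: for every token in a sentence, extract the likelihood the model assigns to it given its prefix and the attention the token pays to itself at the last layer, then correlate the two. This is not the authors' codebase (linked above); GPT-2, the last-layer head average, and the Spearman statistic are illustrative assumptions.

```python
# Hedged sketch: correlate per-token likelihood with last-layer self-attention.
# Not the paper's implementation; GPT-2 and the layer/head choices are assumptions.
import torch
from scipy.stats import spearmanr
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM exposing attention weights would do
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The cat sat on the refrigerator and purred quietly."
enc = tok(text, return_tensors="pt")

with torch.no_grad():
    out = model(**enc, output_attentions=True)

# Likelihood of each token given its prefix (positions 1..T-1).
logits = out.logits[0, :-1]                       # next-token predictions
targets = enc["input_ids"][0, 1:]
log_probs = torch.log_softmax(logits, dim=-1)
token_ll = log_probs[torch.arange(targets.numel()), targets]

# Attention each token pays to itself at the last layer, averaged over heads,
# shifted so it lines up with token_ll.
last_attn = out.attentions[-1][0].mean(dim=0)     # (T, T)
self_attn = torch.diagonal(last_attn)[1:]

rho, p = spearmanr(token_ll.numpy(), self_attn.numpy())
print(f"Spearman rho (token likelihood vs. self-attention): {rho:.3f} (p={p:.3f})")
```

A single sentence gives only a noisy estimate; the effect the abstract reports rests on likelihood-guided perturbations aggregated over many inputs.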
Related papers
- Am I Blue or Is My Hobby Counting Teardrops? Expression Leakage in Large Language Models as a Symptom of Irrelevancy Disruption [32.655632394093345]
We introduce expression leakage, a novel phenomenon in which large language models generate sentimentally charged expressions that are semantically unrelated to the input context. Our experiments show that, as the model scales in parameter count, expression leakage decreases within the same LLM family. In addition, our experiments indicate that injecting negative sentiment into the prompt disrupts the generation process more than positive sentiment, causing a higher expression leakage rate.
arXiv Detail & Related papers (2025-08-03T10:29:19Z)
- Token Activation Map to Visually Explain Multimodal LLMs [23.774995444587667]
We propose an estimated causal inference method to mitigate the interference of context and achieve high-quality MLLM explanations. We term this method Token Activation Map (TAM) to highlight that it accounts for interactions between tokens. Our TAM method significantly outperforms existing SoTA methods, showcasing high-quality visualization results.
arXiv Detail & Related papers (2025-06-29T14:50:45Z)
- Counterfactual reasoning: an analysis of in-context emergence [49.58529868457226]
Large-scale neural language models (LMs) exhibit remarkable performance in in-context learning. This work studies in-context counterfactual reasoning in language models, that is, predicting the consequences of changes under hypothetical scenarios.
arXiv Detail & Related papers (2025-06-05T16:02:07Z)
- Factual Self-Awareness in Language Models: Representation, Robustness, and Scaling [56.26834106704781]
Factual incorrectness in generated content is one of the primary concerns in the ubiquitous deployment of large language models (LLMs). We provide evidence for an internal compass in LLMs that dictates the correctness of factual recall at the time of generation. Scaling experiments across model sizes and training dynamics highlight that self-awareness emerges rapidly during training and peaks in intermediate layers.
arXiv Detail & Related papers (2025-05-27T16:24:02Z)
- ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models [75.05436691700572]
We introduce ExpliCa, a new dataset for evaluating Large Language Models (LLMs) in explicit causal reasoning.
We tested seven commercial and open-source LLMs on ExpliCa through prompting and perplexity-based metrics.
Surprisingly, models tend to confound temporal relations with causal ones, and their performance is also strongly influenced by the linguistic order of the events.
arXiv Detail & Related papers (2025-02-21T14:23:14Z)
- Can adversarial attacks by large language models be attributed? [1.3812010983144802]
Attributing outputs from Large Language Models in adversarial settings presents significant challenges that are likely to grow in importance.
We investigate this attribution problem using formal language theory, specifically language identification in the limit as introduced by Gold and extended by Angluin.
Our results show that, due to the non-identifiability of certain language classes, it is theoretically impossible to attribute outputs to specific LLMs with certainty.
arXiv Detail & Related papers (2024-11-12T18:28:57Z)
- Rolling the DICE on Idiomaticity: How LLMs Fail to Grasp Context [12.781022584125925]
We construct a novel, controlled contrastive dataset designed to test whether LLMs can effectively use context to disambiguate idiomatic meaning.
Our findings reveal that LLMs often fail to resolve idiomaticity when it is required to attend to the surrounding context.
We make our code and dataset publicly available.
arXiv Detail & Related papers (2024-10-21T14:47:37Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages [6.227678387562755]
Recent studies suggest that self-attention is theoretically limited in learning even some regular and context-free languages.
We test the Transformer's ability to learn mildly context-sensitive languages of varying complexities.
Our analyses show that the learned self-attention patterns and representations modeled dependency relations and demonstrated counting behavior.
arXiv Detail & Related papers (2023-09-02T08:17:29Z)
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs).
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z)
- Understanding and Mitigating Spurious Correlations in Text Classification with Neighborhood Analysis [69.07674653828565]
Machine learning models have a tendency to leverage spurious correlations that exist in the training set but may not hold true in general circumstances.
In this paper, we examine the implications of spurious correlations through a novel perspective called neighborhood analysis.
We propose a family of regularization methods, NFL (doN't Forget your Language), to mitigate spurious correlations in text classification.
arXiv Detail & Related papers (2023-05-23T03:55:50Z)
- Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that, during ICL, the attention and hidden features in LLMs match the behavior of kernel regression.
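That correspondence can be made concrete with a toy example: a single softmax-attention head that averages the labels of in-context examples is exactly a Nadaraya-Watson kernel regressor with an exponential dot-product kernel. The sketch below only illustrates this identity; the synthetic linear task is an assumption and is not the paper's experimental setup.

```python
# Toy illustration: one softmax-attention read-out over in-context (x_i, y_i)
# pairs equals Nadaraya-Watson kernel regression with an exponential kernel.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
n, d = 32, 8
X = rng.normal(size=(n, d))          # keys: embedded in-context inputs
y = X @ rng.normal(size=d)           # values: their labels (a linear toy task)
x_query = rng.normal(size=d)         # query token

# Attention view: weights = softmax(q . k_i / sqrt(d)), prediction = weighted sum.
attn = softmax(X @ x_query / np.sqrt(d))
y_hat_attention = attn @ y

# Kernel-regression view of the same computation.
kernel = np.exp(X @ x_query / np.sqrt(d))
y_hat_kernel = (kernel @ y) / kernel.sum()

assert np.allclose(y_hat_attention, y_hat_kernel)
print(y_hat_attention, y_hat_kernel)
```

The attention weights play the role of the kernel, which is the sense in which in-context predictions can behave like kernel regression over the prompt.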
arXiv Detail & Related papers (2023-05-22T06:45:02Z)
- Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models [6.425088990363101]
We examine the relationship between fluency and attribution in Large Language Models prompted with retrieved evidence.
We show that larger models tend to do much better in both fluency and attribution.
We propose a recipe that could allow smaller models to both close the gap with larger models and preserve the benefits of top-k retrieval.
arXiv Detail & Related papers (2023-02-11T02:43:34Z)
- What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary [68.77983831618685]
We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space.
We show that the resulting projections contain rich semantic information, and draw a connection between them and sparse retrieval.
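The projection can be sketched in a few lines: encode a query, take a pooled vector, and push it through the model's own MLM head so the dense representation reads as a distribution over the vocabulary. The paper analyzes trained dual encoders; plain bert-base-uncased and the [CLS] vector are stand-ins here, chosen only to keep the example self-contained.

```python
# Hedged sketch: read a dense representation as a distribution over the vocabulary
# by projecting it through the encoder's own MLM head. bert-base-uncased is a
# stand-in for a trained dual encoder.
import torch
from transformers import AutoTokenizer, BertForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

enc = tok("what is the boiling point of water", return_tensors="pt")
with torch.no_grad():
    hidden = model.bert(**enc).last_hidden_state   # (1, T, H) contextual vectors
    cls_vec = hidden[:, :1, :]                     # pooled "query" vector ([CLS])
    vocab_logits = model.cls(cls_vec)[0, 0]        # project into vocabulary space

top_ids = torch.topk(vocab_logits, k=10).indices.tolist()
print(tok.convert_ids_to_tokens(top_ids))          # tokens the vector "talks about"
```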
arXiv Detail & Related papers (2022-12-20T16:03:25Z)
- Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performances that are comparable to those achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z)