Am I Blue or Is My Hobby Counting Teardrops? Expression Leakage in Large Language Models as a Symptom of Irrelevancy Disruption
- URL: http://arxiv.org/abs/2508.01708v1
- Date: Sun, 03 Aug 2025 10:29:19 GMT
- Title: Am I Blue or Is My Hobby Counting Teardrops? Expression Leakage in Large Language Models as a Symptom of Irrelevancy Disruption
- Authors: Berkay Köprü, Mehrzad Mashal, Yigit Gurses, Akos Kadar, Maximilian Schmitt, Ditty Mathew, Felix Burkhardt, Florian Eyben, Björn W. Schuller
- Abstract summary: We introduce expression leakage, a novel phenomenon where large language models generate sentimentally charged expressions that are semantically unrelated to the input context. Our experiments show that, as models scale up in parameter count, expression leakage decreases within the same LLM family. In addition, our experiments indicate that injecting negative sentiment into the prompt disrupts the generation process more than positive sentiment, causing a higher expression leakage rate.
- Score: 32.655632394093345
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have advanced natural language processing (NLP) through mechanisms such as next-token prediction and self-attention, but their ability to integrate broad context also makes them prone to incorporating irrelevant information. Prior work has focused on semantic leakage, a bias introduced by semantically irrelevant context. In this paper, we introduce expression leakage, a novel phenomenon in which LLMs systematically generate sentimentally charged expressions that are semantically unrelated to the input context. To analyse expression leakage, we collect a benchmark dataset along with a scheme for automatically generating such a dataset from free-form Common Crawl text. In addition, we propose an automatic evaluation pipeline that correlates well with human judgment, which accelerates benchmarking by removing the need to annotate each analysed model. Our experiments show that expression leakage decreases as models scale up in parameter count within the same LLM family. On the other hand, we demonstrate that mitigating expression leakage requires specific care during model building and cannot be achieved by prompting alone. Finally, our experiments indicate that injecting negative sentiment into the prompt disrupts the generation process more than positive sentiment does, causing a higher expression leakage rate.
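To make the setup concrete, here is a minimal sketch of the kind of sentiment-injection probe the abstract describes. It is not the paper's benchmark or evaluation pipeline: the prompt template, the injected sentences, the GPT-2 generator, and the use of an off-the-shelf sentiment classifier as a crude leakage proxy are all assumptions made for illustration (the paper instead relies on an automatic pipeline validated against human judgment).

```python
# Illustrative sketch only -- NOT the paper's dataset or evaluation pipeline.
# A sentimentally charged but topically unrelated sentence is injected in front of
# a neutral, factual context; the continuation is then screened with an
# off-the-shelf sentiment classifier as a crude proxy for expression leakage.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # assumed model choice
sentiment = pipeline("sentiment-analysis")             # default SST-2 classifier

neutral_context = "The report lists the opening hours of the municipal library."
injections = {
    "positive": "My heart is singing with pure joy today.",
    "negative": "Everything feels hopeless and my tears will not stop.",
}

def continuation(prompt: str) -> str:
    """Greedy continuation of the prompt, with the prompt itself stripped off."""
    full = generator(prompt, max_new_tokens=40, do_sample=False)[0]["generated_text"]
    return full[len(prompt):]

def leaks(text: str, threshold: float = 0.95) -> bool:
    # Crude proxy criterion: the continuation of a dry, factual context should not
    # be classified as strongly sentiment-charged. The threshold is arbitrary.
    result = sentiment(text[:512])[0]
    return result["score"] > threshold

leakage_flags = {}
for polarity, injection in injections.items():
    prompt = f"{injection} {neutral_context} Continue the report:"
    leakage_flags[polarity] = leaks(continuation(prompt))

print(leakage_flags)  # leakage flag per injected sentiment polarity
```

A real benchmark would aggregate such flags over many Common Crawl contexts into a leakage rate per model and per sentiment polarity, which is the quantity the abstract's scaling and sentiment-asymmetry claims refer to.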
Related papers
- Knowing When to Stop: Dynamic Context Cutoff for Large Language Models [5.800837821046764]
Large language models (LLMs) process entire input contexts indiscriminately, which is inefficient in cases where the information required to answer a query is localized within the context. We present dynamic context cutoff, a human-inspired method enabling LLMs to self-terminate processing upon acquiring sufficient task-relevant information.
arXiv Detail & Related papers (2025-02-03T03:38:29Z)
- Scaling Down Semantic Leakage: Investigating Associative Bias in Smaller Language Models [0.4944564023471818]
I use the Qwen2.5 model family to explore whether smaller models, ranging from 500M to 7B parameters, demonstrate less semantic leakage. I introduce a new dataset of color-focused prompts, categorized into specific types of semantic associations, to systematically evaluate the models' performance. Results indicate that smaller models exhibit less semantic leakage overall, although this trend is not strictly linear.
arXiv Detail & Related papers (2025-01-11T21:03:22Z)
- AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation [57.8363998797433]
We propose AMRFact, a framework that generates perturbed summaries using Abstract Meaning Representations (AMRs).
Our approach parses factually consistent summaries into AMR graphs and injects controlled factual inconsistencies to create negative examples, allowing for coherent factually inconsistent summaries to be generated with high error-type coverage.
arXiv Detail & Related papers (2023-11-16T02:56:29Z)
- An Analysis and Mitigation of the Reversal Curse [70.13419502543915]
Recent research has observed a noteworthy phenomenon in large language models (LLMs): the reversal curse.
When dealing with two entities $a$ and $b$ linked by a relation $R$, LLMs excel at handling sequences of the form "$aRb$" but struggle with the inverse form "$bR^{-1}a$" (for example, a model that reliably completes "<person>'s mother is <parent>" may still fail to answer whose child <parent> is).
arXiv Detail & Related papers (2023-11-13T17:01:12Z)
- RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE).
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z)
- Making Retrieval-Augmented Language Models Robust to Irrelevant Context [55.564789967211844]
An important desideratum of RALMs is that retrieved information helps model performance when it is relevant.
Recent work has shown that retrieval augmentation can sometimes have a negative effect on performance.
arXiv Detail & Related papers (2023-10-02T18:52:35Z)
- Understanding and Mitigating Spurious Correlations in Text Classification with Neighborhood Analysis [69.07674653828565]
Machine learning models have a tendency to leverage spurious correlations that exist in the training set but may not hold true in general circumstances.
In this paper, we examine the implications of spurious correlations through a novel perspective called neighborhood analysis.
We propose a family of regularization methods, NFL (doN't Forget your Language), to mitigate spurious correlations in text classification.
arXiv Detail & Related papers (2023-05-23T03:55:50Z)
- Attention-likelihood relationship in transformers [2.8304391396200064]
We analyze how large language models (LLMs) represent out-of-context words, investigating their reliance on the given context to capture their semantics.
Our likelihood-guided text perturbations reveal a correlation between token likelihood and attention values in transformer-based language models (a minimal sketch of this kind of probe appears after this list).
arXiv Detail & Related papers (2023-03-15T00:23:49Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean-data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also enhances robustness, that is, it produces high-quality results in a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
- A Controllable Model of Grounded Response Generation [122.7121624884747]
Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process.
We propose a framework that we call controllable grounded response generation (CGRG).
We show that, using this framework, a transformer-based model with a novel inductive attention mechanism, trained on a conversation-like Reddit dataset, outperforms strong generation baselines.
arXiv Detail & Related papers (2020-05-01T21:22:08Z)
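Relating to the attention-likelihood entry above, the sketch below shows one way such a correlation could be probed. The GPT-2 model, the averaging over layers, heads, and query positions, and the Spearman rank correlation are illustrative assumptions, not details taken from the cited paper.

```python
# Minimal sketch: correlate per-token likelihood with the attention each token
# receives in a causal transformer LM. Assumptions: GPT-2, averaging attention
# over layers/heads/query positions, Spearman rank correlation.
import torch
from scipy.stats import spearmanr
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The weather in Paris was unexpectedly warm for November."
inputs = tok(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# Token log-likelihoods: log p(token_t | tokens_<t), defined for positions 1..T-1.
logits = out.logits[0, :-1]                      # next-token predictions
targets = inputs["input_ids"][0, 1:]
log_probs = torch.log_softmax(logits, dim=-1)
token_ll = log_probs[torch.arange(targets.size(0)), targets]

# Attention each token receives, averaged over layers, heads and query positions.
attn = torch.stack(out.attentions).mean(dim=(0, 2))[0]  # (seq_len, seq_len)
attn_received = attn.mean(dim=0)[1:]                     # align with positions 1..T-1

rho, p_value = spearmanr(token_ll.numpy(), attn_received.numpy())
print(f"Spearman rho between token likelihood and attention received: {rho:.3f} (p={p_value:.3f})")
```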