Sensitivity of Small Language Models to Fine-tuning Data Contamination
- URL: http://arxiv.org/abs/2511.06763v1
- Date: Mon, 10 Nov 2025 06:44:29 GMT
- Title: Sensitivity of Small Language Models to Fine-tuning Data Contamination
- Authors: Nicy Scaria, Silvester John Joseph Kennedy, Deepak Subramani
- Abstract summary: Small Language Models (SLMs) are increasingly being deployed in resource-constrained environments. We measure susceptibility to syntactic and semantic transformation types during instruction tuning. Character reversal produces near-complete failure across all models regardless of size or family. Semantic transformations demonstrate distinct threshold behaviors and greater resilience in core linguistic capabilities.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Small Language Models (SLMs) are increasingly being deployed in resource-constrained environments, yet their behavioral robustness to data contamination during instruction tuning remains poorly understood. We systematically investigate the contamination sensitivity of 23 SLMs (270M to 4B parameters) across multiple model families by measuring susceptibility to two transformation types during instruction tuning: syntactic transformations (character and word reversal) and semantic transformations (irrelevant and counterfactual responses), each applied at contamination levels of 25%, 50%, 75%, and 100%. Our results reveal fundamental asymmetries in vulnerability patterns: syntactic transformations cause catastrophic performance degradation, with character reversal producing near-complete failure across all models regardless of size or family, while semantic transformations demonstrate distinct threshold behaviors and greater resilience in core linguistic capabilities. Critically, we discover a "capability curse" where larger, more capable models become more susceptible to learning semantic corruptions, effectively following harmful instructions more readily, while our analysis of base versus instruction-tuned variants reveals that alignment provides inconsistent robustness benefits, sometimes even reducing resilience. Our work establishes three core contributions: (1) empirical evidence of SLMs' disproportionate vulnerability to syntactic pattern contamination, (2) identification of asymmetric sensitivity patterns between syntactic and semantic transformations, and (3) systematic evaluation protocols for contamination robustness assessment. These findings have immediate deployment implications, suggesting that current robustness assumptions may not hold for smaller models and highlighting the need for contamination-aware training protocols.
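The contamination setup described in the abstract (a syntactic transformation such as character or word reversal applied to the responses of a fixed fraction of instruction-tuning examples) can be sketched as follows. This is a minimal illustration, not the authors' code: the dataset schema and helper names are assumptions, and only the transformation types and contamination levels come from the abstract.

```python
import random

def char_reverse(text: str) -> str:
    # Syntactic transformation: reverse all characters of the response.
    return text[::-1]

def word_reverse(text: str) -> str:
    # Syntactic transformation: reverse the order of words in the response.
    return " ".join(text.split()[::-1])

def contaminate(dataset, transform, level, seed=0):
    """Corrupt the responses of a `level` fraction of examples.

    `dataset` is a list of {"instruction": ..., "response": ...} dicts,
    as in a typical instruction-tuning corpus; `level` corresponds to the
    paper's 0.25 / 0.50 / 0.75 / 1.00 contamination settings.
    """
    rng = random.Random(seed)
    corrupted = set(rng.sample(range(len(dataset)), int(len(dataset) * level)))
    return [
        {**ex, "response": transform(ex["response"])} if i in corrupted else ex
        for i, ex in enumerate(dataset)
    ]

clean = [{"instruction": f"Q{i}", "response": f"Answer number {i}."} for i in range(8)]
poisoned = contaminate(clean, char_reverse, level=0.5)
```

A semantic contamination (irrelevant or counterfactual responses) would plug into the same `contaminate` helper by swapping in a transform that replaces the response rather than reordering it.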
Related papers
- Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation [40.210132040677]
This paper examines how controlled, truth-conditionally equivalent lexical and syntactic perturbations affect the absolute performance and relative ranking of 23 contemporary Large Language Models (LLMs). Results show that lexical perturbations consistently induce substantial, statistically significant performance degradation across nearly all models and tasks, while syntactic perturbations have more heterogeneous effects, occasionally improving results.
arXiv Detail & Related papers (2026-02-19T12:24:42Z) - DVD: A Robust Method for Detecting Variant Contamination in Large Language Model Evaluation [24.086354908256293]
DVD is a single-sample detector that models the local output distribution induced by temperature sampling. We construct the first benchmark for variant contamination across two domains, Omni-MATH and SuperGPQA. DVD consistently outperforms perplexity-based, Min-K%++, edit-distance (CDD), and embedding-similarity baselines.
arXiv Detail & Related papers (2026-01-08T12:48:40Z) - Benchmarking Corruption Robustness of LVLMs: A Discriminative Benchmark and Robustness Alignment Metric [49.393713730706445]
We introduce Bench-C, a benchmark emphasizing discriminative samples for assessing corruption robustness. We propose the Robustness Alignment Score (RAS), a unified metric that measures degradation in logit-level prediction structure.
arXiv Detail & Related papers (2025-11-24T12:07:56Z) - Exploring Semantic-constrained Adversarial Example with Instruction Uncertainty Reduction [51.50282796099369]
This paper develops a multi-dimensional instruction uncertainty reduction framework to generate semantically constrained adversarial examples. During the language-guided sampling process, optimization is stabilized by the designed ResAdv-DDIM sampler. We realize the reference-free generation of semantically constrained 3D adversarial examples for the first time.
arXiv Detail & Related papers (2025-10-27T04:02:52Z) - DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models [55.30555646945055]
Text-to-Image (T2I) models are vulnerable to semantic leakage. We introduce DeLeaker, a lightweight approach that mitigates leakage by directly intervening on the model's attention maps. SLIM is the first dataset dedicated to semantic leakage.
arXiv Detail & Related papers (2025-10-16T17:39:21Z) - Knowledge Collapse in LLMs: When Fluency Survives but Facts Fail under Recursive Synthetic Training [2.094557609248011]
Large language models increasingly rely on synthetic data due to the scarcity of human-written content. Recursive training on model-generated outputs leads to model collapse, a degenerative process threatening factual reliability.
arXiv Detail & Related papers (2025-09-05T04:29:15Z) - Model-Agnostic Sentiment Distribution Stability Analysis for Robust LLM-Generated Texts Detection [23.153044933861988]
We propose SentiDetect, a model-agnostic framework for detecting text generated by large language models (LLMs). Our method is motivated by the empirical observation that LLM outputs tend to exhibit emotionally consistent patterns, whereas human-written texts display greater emotional variability. We evaluate SentiDetect on five diverse datasets and a range of advanced LLMs, including Gemini-1.5-Pro, Claude-3, GPT-4-0613, and LLaMa-3.3.
arXiv Detail & Related papers (2025-08-09T09:55:47Z) - Assessing Representation Stability for Transformer Models [2.41710192205034]
Adversarial text attacks remain a persistent threat to transformer models. We introduce Representation Stability (RS), a detection framework that measures how embedding representations change when important words are masked.
arXiv Detail & Related papers (2025-08-06T21:07:49Z) - Model Hemorrhage and the Robustness Limits of Large Language Models [119.46442117681147]
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment. We define this phenomenon as model hemorrhage: performance decline caused by parameter alterations and architectural changes.
arXiv Detail & Related papers (2025-03-31T10:16:03Z) - Estimating Commonsense Plausibility through Semantic Shifts [66.06254418551737]
We propose ComPaSS, a novel discriminative framework that quantifies commonsense plausibility by measuring semantic shifts. Evaluations on two types of fine-grained commonsense plausibility estimation tasks show that ComPaSS consistently outperforms baselines.
arXiv Detail & Related papers (2025-02-19T06:31:06Z) - Semantic Sensitivities and Inconsistent Predictions: Measuring the
Fragility of NLI Models [44.56781176879151]
State-of-the-art Natural Language Inference (NLI) models are sensitive to minor semantics-preserving surface-form variations.
We show that semantic sensitivity causes average performance degradations of 12.92% and 23.71% over in-domain and out-of-domain settings, respectively.
arXiv Detail & Related papers (2024-01-25T14:47:05Z) - Improving Adversarial Robustness to Sensitivity and Invariance Attacks
with Deep Metric Learning [80.21709045433096]
A standard method in adversarial robustness assumes a framework that defends against samples crafted by minimally perturbing a clean input.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z) - CLINE: Contrastive Learning with Semantic Negative Examples for Natural
Language Understanding [35.003401250150034]
We propose Contrastive Learning with semantIc Negative Examples (CLINE) to improve the robustness of pre-trained language models.
CLINE constructs semantic negative examples in an unsupervised manner to improve robustness under semantically adversarial attacks.
Empirical results show that our approach yields substantial improvements on a range of sentiment analysis, reasoning, and reading comprehension tasks.
arXiv Detail & Related papers (2021-07-01T13:34:12Z) - Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial
Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.