GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models
- URL: http://arxiv.org/abs/2408.04905v2
- Date: Mon, 23 Sep 2024 03:46:55 GMT
- Title: GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models
- Authors: Zhibo Zhang, Wuxia Bai, Yuxi Li, Mark Huasong Meng, Kailong Wang, Ling Shi, Li Li, Jun Wang, Haoyu Wang
- Abstract summary: Large language models (LLMs) have achieved unprecedented success in the field of natural language processing.
Recent research has discovered a class of abnormal tokens in the model's vocabulary space and named them "glitch tokens".
In this work, we aim to enhance the understanding of glitch tokens and propose techniques for their detection and mitigation.
- Score: 17.633722815221983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have achieved unprecedented success in the field of natural language processing. However, the black-box nature of their internal mechanisms has brought many concerns about their trustworthiness and interpretability. Recent research has discovered a class of abnormal tokens in the model's vocabulary space and named them "glitch tokens". Those tokens, once included in the input, may induce the model to produce incorrect, irrelevant, or even harmful results, drastically undermining the reliability and practicality of LLMs. In this work, we aim to enhance the understanding of glitch tokens and propose techniques for their detection and mitigation. We first reveal the characteristic features induced by glitch tokens on LLMs, which are evidenced by significant deviations in the distributions of attention patterns and dynamic information from intermediate model layers. Based on the insights, we develop GlitchProber, a tool for efficient glitch token detection and mitigation. GlitchProber utilizes small-scale sampling, principal component analysis for accelerated feature extraction, and a simple classifier for efficient vocabulary screening. Taking one step further, GlitchProber rectifies abnormal model intermediate layer values to mitigate the destructive effects of glitch tokens. Evaluated on five mainstream open-source LLMs, GlitchProber demonstrates higher efficiency, precision, and recall compared to existing approaches, with an average F1 score of 0.86 and an average repair rate of 50.06%. GlitchProber unveils a novel path to address the challenges posed by glitch tokens and inspires future research toward more robust and interpretable LLMs.
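A minimal sketch of the screening stage the abstract describes (small-scale sampling, PCA-compressed intermediate-layer features, a simple classifier over the vocabulary). The feature extractor, feature dimensionality, and SVM choice below are illustrative assumptions, not the paper's exact implementation, and the intermediate-layer rectification used for mitigation is not shown.

```python
# Hypothetical GlitchProber-style vocabulary screening: sample a small share of
# tokens, compress their intermediate-layer features with PCA, fit a lightweight
# classifier, then screen the rest of the vocabulary cheaply.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def extract_features(token_ids):
    """Placeholder: return one feature vector per token, e.g. flattened
    attention-head statistics and MLP activations from intermediate layers."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(token_ids), 4096))  # stand-in features

def screen_vocabulary(vocab_size, labeled_ids, labels, n_components=128):
    # 1) Extract features only for a small labeled sample of tokens;
    #    sampling keeps the expensive forward passes to a fraction of the vocabulary.
    sample_feats = extract_features(labeled_ids)

    # 2) PCA compresses the high-dimensional activations for speed.
    pca = PCA(n_components=min(n_components, len(labeled_ids))).fit(sample_feats)

    # 3) A simple classifier separates glitch (label 1) from normal (label 0) tokens.
    clf = SVC(kernel="rbf").fit(pca.transform(sample_feats), labels)

    # 4) Screen the remaining vocabulary with the cheap classifier.
    rest = [t for t in range(vocab_size) if t not in set(labeled_ids)]
    preds = clf.predict(pca.transform(extract_features(rest)))
    return [t for t, p in zip(rest, preds) if p == 1]
```

The design point is that only the small labeled sample requires full forward passes; every other token is judged from reduced features by the classifier.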
Related papers
- Demystifying Singular Defects in Large Language Models [61.98878352956125]
In large language models (LLMs), the underlying causes of high-norm tokens remain largely unexplored.
We provide both theoretical insights and empirical validation across a range of recent models.
We showcase two practical applications of these findings: the improvement of quantization schemes and the design of LLM signatures.
arXiv Detail & Related papers (2025-02-10T20:09:16Z)
- GaussMark: A Practical Approach for Structural Watermarking of Language Models [61.84270985214254]
GaussMark is a simple, efficient, and relatively robust scheme for watermarking large language models.
We show that GaussMark is reliable, efficient, and relatively robust to corruptions such as insertions, deletions, substitutions, and roundtrip translations.
arXiv Detail & Related papers (2025-01-17T22:30:08Z)
- GlitchMiner: Mining Glitch Tokens in Large Language Models via Gradient-based Discrete Optimization [5.962706501263955]
Glitch tokens in Large Language Models (LLMs) can trigger unpredictable behaviors, threatening model reliability and safety.
We propose GlitchMiner, a gradient-based discrete optimization framework that efficiently identifies glitch tokens.
Experiments across multiple LLM architectures demonstrate that GlitchMiner outperforms existing methods in detection accuracy and adaptability.
arXiv Detail & Related papers (2024-10-19T09:49:12Z)
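A hedged sketch of what a gradient-guided discrete search over the vocabulary can look like. The repetition-style prompt, the entropy objective, and the first-order (HotFlip/GCG-style) candidate ranking below are assumptions for illustration and do not reproduce GlitchMiner's exact objective or schedule; `model` is assumed to be a Hugging Face causal LM that accepts `inputs_embeds`.

```python
# Hypothetical proposal step of a gradient-based discrete search for glitch tokens.
import torch

def propose_candidates(model, tokenizer, token_id, top_k=32):
    """Rank alternative tokens by a linear estimate of how much swapping them in
    would raise the entropy of the model's next-token distribution."""
    emb_table = model.get_input_embeddings().weight                  # (vocab, dim)
    prompt_ids = tokenizer("Repeat this token: ", return_tensors="pt").input_ids[0]
    prompt_emb = emb_table[prompt_ids].detach()                       # (T, dim)
    cand_emb = emb_table[token_id].detach().clone().requires_grad_(True)
    inputs = torch.cat([prompt_emb, cand_emb[None, :]], dim=0)[None]  # (1, T+1, dim)

    # Objective: entropy of the prediction right after the candidate token.
    logits = model(inputs_embeds=inputs).logits[0, -1]
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * (probs + 1e-12).log()).sum()
    entropy.backward()

    # First-order estimate of the entropy change for each possible replacement.
    scores = (emb_table.detach() - cand_emb.detach()) @ cand_emb.grad
    return torch.topk(scores, top_k).indices.tolist()
```

An outer loop would evaluate the proposed candidates exactly, keep those with the highest entropy (or that fail a repetition check), and iterate from the best one.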
- Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
- AnomaLLMy -- Detecting anomalous tokens in black-box LLMs through low-confidence single-token predictions [0.0]
AnomaLLMy is a novel technique for the automatic detection of anomalous tokens in black-box Large Language Models.
AnomaLLMy identifies irregularities in model behavior, addressing the issue of anomalous tokens degrading the quality and reliability of models.
arXiv Detail & Related papers (2024-06-28T11:28:44Z)
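A rough sketch of how low-confidence single-token predictions could flag anomalous tokens through a black-box API. The prompt wording, the 0.5 probability threshold, and the `top_logprobs_for` wrapper are illustrative assumptions; the wrapper stands in for whatever log-probability endpoint the deployment exposes.

```python
# Hypothetical black-box probe: ask the model to repeat each candidate token and
# flag tokens whose top next-token probability is unusually low on this trivial task.
import math

def top_logprobs_for(prompt: str) -> dict[str, float]:
    raise NotImplementedError("wire this to your provider's logprobs endpoint")

def find_anomalous_tokens(candidate_tokens, threshold=0.5):
    flagged = []
    for tok in candidate_tokens:
        prompt = f"Please repeat the string '{tok}' exactly once:"
        logprobs = top_logprobs_for(prompt)
        top_prob = math.exp(max(logprobs.values()))
        if top_prob < threshold:          # low confidence suggests an anomalous token
            flagged.append((tok, top_prob))
    return sorted(flagged, key=lambda x: x[1])
```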
- Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling [3.873482175367558]
In this paper, we treat the Generation of each token by a Large Language Model (LLM) as a Classification (GaC) for ensembling.
In experiments, we ensemble state-of-the-art LLMs on several benchmarks, including exams, mathematics and reasoning, and observe that our method breaks the existing community performance ceiling.
arXiv Detail & Related papers (2024-06-18T13:17:26Z)
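A simplified sketch of token-level ensembling: at each decoding step, every model produces a "classification" over the vocabulary and the ensemble averages the class probabilities. It assumes the ensembled models share one tokenizer, which sidesteps the vocabulary-alignment step a real GaC-style ensemble would need, and uses greedy decoding for brevity.

```python
# Hypothetical token-level ensemble of Hugging Face causal LMs with a shared tokenizer.
import torch

@torch.no_grad()
def ensemble_generate(models, tokenizer, prompt, max_new_tokens=64):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        # Average the next-token probability distributions of all models.
        step_probs = [torch.softmax(m(ids).logits[0, -1], dim=-1) for m in models]
        next_id = torch.stack(step_probs).mean(dim=0).argmax()
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```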
- Advancing the Robustness of Large Language Models through Self-Denoised Smoothing [50.54276872204319]
Large language models (LLMs) have achieved significant success, but their vulnerability to adversarial perturbations has raised considerable concerns.
We propose to leverage the multitasking nature of LLMs to first denoise the noisy inputs and then to make predictions based on these denoised versions.
Unlike previous denoised smoothing techniques in computer vision, which require training a separate denoising model, our method offers significantly better efficiency and flexibility.
arXiv Detail & Related papers (2024-04-18T15:47:00Z)
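A loose sketch of the self-denoising idea: perturb the input, ask the same model to reconstruct it, and aggregate predictions over the denoised copies. The masking rate, the prompts, and the majority vote are illustrative assumptions, and `ask_llm` is a hypothetical single-call helper for the model being defended.

```python
# Hypothetical self-denoised smoothing loop: the same LLM first restores randomly
# masked copies of the input, then answers on each restored copy; final answer by vote.
import random
from collections import Counter

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("call the model under test here")

def smoothed_answer(question: str, n_copies=5, mask_rate=0.2, seed=0):
    rng = random.Random(seed)
    votes = []
    for _ in range(n_copies):
        words = question.split()
        noisy = " ".join("_" if rng.random() < mask_rate else w for w in words)
        restored = ask_llm(f"Fill in the masked words and return the full sentence:\n{noisy}")
        votes.append(ask_llm(f"Answer the question concisely:\n{restored}"))
    return Counter(votes).most_common(1)[0][0]
```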
- Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection [23.382844182972807]
We introduce and explore the phenomenon of "glitch tokens", which are anomalous tokens produced by established tokenizers.
We present categorizations of the identified glitch tokens and symptoms exhibited by LLMs when interacting with glitch tokens.
We propose GlitchHunter, a novel iterative clustering-based technique, for efficient glitch token detection.
arXiv Detail & Related papers (2024-04-15T16:06:36Z)
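A coarse sketch of one way iterative, cluster-guided screening can work: cluster token embeddings, spot-check a few representatives per cluster with a repetition test, and exhaustively test only clusters that contain failures. The k-means setup, the cluster count, and the `fails_repetition_test` helper are assumptions for illustration, not GlitchHunter's actual algorithm details.

```python
# Hypothetical cluster-guided glitch-token screening. Intuition: glitch tokens tend
# to sit close together in embedding space, so probing a few tokens per cluster
# prunes most of the vocabulary cheaply.
import numpy as np
from sklearn.cluster import KMeans

def fails_repetition_test(token_id: int) -> bool:
    raise NotImplementedError("ask the model to repeat the token and check the output")

def cluster_screen(embeddings, n_clusters=256, probes_per_cluster=8, seed=0):
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(embeddings)
    suspects = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        probes = rng.choice(members, size=min(probes_per_cluster, len(members)), replace=False)
        # Only clusters with at least one failing probe are examined exhaustively.
        if any(fails_repetition_test(int(t)) for t in probes):
            suspects.extend(int(t) for t in members if fails_repetition_test(int(t)))
    return suspects
```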
- FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping [49.66872823080736]
Autoregressive Large Language Models (e.g., LLaMa, GPTs) are omnipresent, achieving remarkable success in language understanding and generation.
To mitigate the overhead incurred during generation, several early-exit and layer-dropping strategies have been proposed.
We propose FFN-SkipLLM, which is an input-adaptive feed-forward skipping strategy.
arXiv Detail & Related papers (2024-04-05T02:35:43Z)
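An illustrative sketch of input-adaptive feed-forward skipping in a decoder block: when an early "observation" block's FFN barely rotates a token's hidden state, later FFN sub-blocks are skipped for that token. The cosine-similarity criterion, the 0.99 threshold, and the block layout are assumptions, not the paper's exact policy.

```python
# Hypothetical decoder block whose feed-forward sub-block can be skipped per token.
# Causal attention masking is omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFFNBlock(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x, skip_mask=None):              # skip_mask: (batch, seq) bool
        q = self.norm1(x)
        h = x + self.attn(q, q, q, need_weights=False)[0]
        if skip_mask is not None and bool(skip_mask.all()):
            return h                                    # every token skips: no FFN compute
        out = self.ffn(self.norm2(h))
        if skip_mask is not None:
            out = out * (~skip_mask).unsqueeze(-1)      # zero the FFN update for skipped tokens
        return h + out

def ffn_skip_mask(ffn_input, ffn_output, threshold=0.99):
    # Observation step on an early block: tokens whose hidden state is barely changed
    # by that block's FFN are marked to skip FFNs in the remaining blocks.
    return F.cosine_similarity(ffn_input, ffn_input + ffn_output, dim=-1) > threshold
```

In a full decoder stack, one early block would run normally, `ffn_skip_mask` would compare that block's FFN input and output, and the resulting per-token mask would be passed to the later blocks.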
- Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
arXiv Detail & Related papers (2023-11-20T03:17:21Z)
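A small sketch of the perplexity side of such a detector: score each token of a prompt by its negative log-likelihood under a reference language model and flag unusually surprising tokens. The GPT-2 reference model and the fixed threshold are illustrative choices; the paper's contextual-information component is not shown.

```python
# Hypothetical token-level surprisal detector using a small reference LM.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def flag_suspicious_tokens(prompt: str, nll_threshold: float = 8.0):
    ids = tok(prompt, return_tensors="pt").input_ids
    logits = lm(ids).logits
    # Negative log-likelihood of each token given its prefix (first token skipped).
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    nll = -logprobs.gather(1, ids[0, 1:, None]).squeeze(1)
    return [(tok.decode([int(ids[0, i + 1])]), float(nll[i]))
            for i in range(len(nll)) if nll[i] > nll_threshold]
```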
- A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks.
We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion.
We empirically evaluate our method and existing zero-resource detection methods on two datasets.
arXiv Detail & Related papers (2023-10-10T10:14:59Z)
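A rough sketch of a zero-resource reverse-validation check: turn each generated claim into a question, ask the same model to answer it without seeing the original passage, and compare the regenerated answer with the claim. The two prompts and the `ask_llm` / `same_fact` helpers are illustrative stand-ins, not the paper's exact procedure.

```python
# Hypothetical reverse-validation loop for passage-level hallucination checks.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("call the generating model here")

def same_fact(claim: str, answer: str) -> bool:
    raise NotImplementedError("compare the two statements, e.g. with an NLI model")

def reverse_validate(passage: str) -> list[str]:
    suspect_claims = []
    claims = [s.strip() for s in passage.split(".") if s.strip()]
    for claim in claims:
        # 1) Ask the model to write a question whose answer should be this claim.
        question = ask_llm(f'Write a question that is answered by: "{claim}"')
        # 2) Answer the question without showing the original passage.
        answer = ask_llm(question)
        # 3) Disagreement between claim and regenerated answer flags a likely hallucination.
        if not same_fact(claim, answer):
            suspect_claims.append(claim)
    return suspect_claims
```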
- A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation [76.34411067299331]
Large language models often tend to 'hallucinate', which critically hampers their reliability.
We propose an approach that actively detects and mitigates hallucinations during the generation process.
We show that the proposed active detection and mitigation approach successfully reduces the hallucinations of the GPT-3.5 model from 47.5% to 14.5% on average.
arXiv Detail & Related papers (2023-07-08T14:25:57Z)
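A simplified sketch of the active-detection idea: generate a short continuation at a time, flag continuations that contain low-probability tokens, and repair only the flagged ones before continuing. The probability threshold, the chunked generation, and the `validate_and_fix` helper are illustrative assumptions; the code assumes a Hugging Face causal LM so per-token scores are available during generation.

```python
# Hypothetical active detection-and-mitigation loop based on low-confidence tokens.
import torch

def validate_and_fix(sentence: str) -> str:
    raise NotImplementedError("verify the sentence (e.g. via retrieval) and rewrite it")

@torch.no_grad()
def generate_with_checks(model, tokenizer, prompt, n_chunks=5, prob_threshold=0.1):
    text = prompt
    for _ in range(n_chunks):
        ids = tokenizer(text, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=40, do_sample=False,
                             output_scores=True, return_dict_in_generate=True)
        new_ids = out.sequences[0, ids.shape[1]:]
        probs = [torch.softmax(s[0], dim=-1)[t].item()
                 for s, t in zip(out.scores, new_ids)]
        chunk = tokenizer.decode(new_ids, skip_special_tokens=True)
        # Low-confidence tokens are treated as a signal of possible hallucination.
        if probs and min(probs) < prob_threshold:
            chunk = validate_and_fix(chunk)
        text += " " + chunk
    return text
```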
This list is automatically generated from the titles and abstracts of the papers on this site.