Navigating the Shortcut Maze: A Comprehensive Analysis of Shortcut
Learning in Text Classification by Language Models
- URL: http://arxiv.org/abs/2409.17455v1
- Date: Thu, 26 Sep 2024 01:17:42 GMT
- Title: Navigating the Shortcut Maze: A Comprehensive Analysis of Shortcut
Learning in Text Classification by Language Models
- Authors: Yuqing Zhou, Ruixiang Tang, Ziyu Yao, Ziwei Zhu
- Abstract summary: This study addresses the overlooked impact of subtler, more complex shortcuts that compromise model reliability beyond oversimplified shortcuts.
We introduce a comprehensive benchmark that categorizes shortcuts into occurrence, style, and concept.
Our research systematically investigates models' resilience and susceptibilities to sophisticated shortcuts.
- Score: 20.70050968223901
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models (LMs), despite their advances, often depend on spurious
correlations, undermining their accuracy and generalizability. This study
addresses the overlooked impact of subtler, more complex shortcuts that
compromise model reliability beyond oversimplified shortcuts. We introduce a
comprehensive benchmark that categorizes shortcuts into occurrence, style, and
concept, aiming to explore the nuanced ways in which these shortcuts influence
the performance of LMs. Through extensive experiments across traditional LMs,
large language models, and state-of-the-art robust models, our research
systematically investigates models' resilience and susceptibilities to
sophisticated shortcuts. Our benchmark and code can be found at:
https://github.com/yuqing-zhou/shortcut-learning-in-text-classification.
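To make the notion of an occurrence shortcut concrete, here is a minimal, hypothetical Python sketch (not the authors' released benchmark code) that injects a trigger token spuriously correlated with one label into a toy text-classification set; the trigger word, injection rate p, and target label are illustrative assumptions.
```python
import random

def inject_occurrence_shortcut(texts, labels, trigger="actually",
                               target_label=1, p=0.9, seed=0):
    """Prepend `trigger` to a fraction p of examples carrying `target_label`,
    creating a spurious token-label correlation (an occurrence shortcut)."""
    rng = random.Random(seed)
    corrupted = []
    for text, label in zip(texts, labels):
        if label == target_label and rng.random() < p:
            text = f"{trigger} {text}"
        corrupted.append((text, label))
    return corrupted

# Reliance test: attach the trigger to the *other* class at evaluation
# time; a shortcut-dependent classifier's accuracy should drop sharply.
train = inject_occurrence_shortcut(["great movie", "dull plot"], [1, 0])
print(train)  # e.g. [('actually great movie', 1), ('dull plot', 0)]
```
Style and concept shortcuts would be simulated analogously, correlating an attribute of the whole text (e.g., register or topic) with a label rather than a single token.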
Related papers
- Shortcut Learning in In-Context Learning: A Survey [17.19214732926589]
Shortcut learning refers to the phenomenon where models employ simple, non-robust decision rules in practical tasks.
This paper reviews relevant research on shortcut learning in In-Context Learning (ICL) from a novel perspective.
arXiv Detail & Related papers (2024-11-04T12:13:04Z)
- LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments [70.91258869156353]
We introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds.
Compared with previous LLM-based testbeds, LangSuitE offers adaptability to diverse environments without multiple simulation engines.
We devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states with respect to historical information.
arXiv Detail & Related papers (2024-06-24T03:36:29Z)
- RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z)
- Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning [28.162661418161466]
Large language models (LLMs) have recently shown great potential for in-context learning.
This paper investigates the reliance of LLMs on shortcuts or spurious correlations within prompts.
We uncover a surprising finding that larger models are more likely to utilize shortcuts in prompts during inference.
arXiv Detail & Related papers (2023-05-26T20:56:30Z)
- Shortcut Detection with Variational Autoencoders [1.3174512123890016]
We present a novel approach to detecting shortcuts in image and audio datasets by leveraging variational autoencoders (VAEs).
The disentanglement of features in the latent space of VAEs allows us to discover feature-target correlations in datasets and semi-automatically evaluate them for ML shortcuts.
We demonstrate the applicability of our method on several real-world datasets and identify shortcuts that have not been discovered before.
arXiv Detail & Related papers (2023-02-08T18:26:10Z)
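A rough sketch of the correlation scan this abstract describes, assuming latent codes from an already-trained VAE are available as a NumPy array; the function name and threshold are hypothetical, and the semi-automatic evaluation step is left to manual inspection.
```python
import numpy as np

def scan_latents_for_shortcuts(latents, labels, threshold=0.8):
    """Flag VAE latent dimensions whose values correlate strongly with
    the target labels; such dimensions are shortcut candidates.

    latents: (n_samples, n_dims) array of VAE latent means for the dataset
    labels:  (n_samples,) array of binary class labels
    """
    flagged = []
    for d in range(latents.shape[1]):
        r = np.corrcoef(latents[:, d], labels)[0, 1]
        if abs(r) >= threshold:
            flagged.append((d, float(r)))
    return flagged  # inspect these dimensions manually for ML shortcuts

# Toy usage: dimension 0 encodes the label almost perfectly (a shortcut),
# dimension 1 is pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
z = np.stack([y + 0.05 * rng.normal(size=200), rng.normal(size=200)], axis=1)
print(scan_latents_for_shortcuts(z, y))
```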
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Shortcut Learning of Large Language Models in Natural Language Understanding [119.45683008451698]
Large language models (LLMs) have achieved state-of-the-art performance on a series of natural language understanding tasks.
They might rely on dataset bias and artifacts as shortcuts for prediction.
This has significantly affected their generalizability and adversarial robustness.
arXiv Detail & Related papers (2022-08-25T03:51:39Z)
- Few-shot Prompting Towards Controllable Response Generation [49.479958672988566]
We first explore the combination of prompting and reinforcement learning (RL) to steer a model's generation without accessing any of its parameters.
We apply multi-task learning to make the model learn to generalize to new tasks better.
Experiment results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters.
arXiv Detail & Related papers (2022-06-08T14:48:06Z)
- Why Machine Reading Comprehension Models Learn Shortcuts? [56.629192589376046]
We argue that a larger proportion of shortcut questions in the training data makes models rely excessively on shortcut tricks.
A thorough empirical analysis shows that MRC models tend to learn shortcut questions earlier than challenging questions.
arXiv Detail & Related papers (2021-06-02T08:43:12Z)
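The "learned earlier" observation suggests a simple diagnostic one could implement (a sketch under assumed training-loop hooks, not the paper's code): record the first epoch at which each training question is answered correctly, then compare the shortcut and challenging subsets.
```python
def first_learned_epoch(model, train_set, epochs, train_one_epoch, predict):
    """Track the first epoch at which each example is predicted correctly.
    `train_one_epoch` and `predict` are assumed callables supplied by the
    user's training loop; examples never learned are reported as None."""
    learned = {i: None for i in range(len(train_set))}
    for epoch in range(epochs):
        train_one_epoch(model, train_set)
        for i, (x, y) in enumerate(train_set):
            if learned[i] is None and predict(model, x) == y:
                learned[i] = epoch
    return learned

# Comparing the mean first-learned epoch of shortcut vs. challenging
# questions would reproduce the paper's qualitative finding.
```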