GLS-CSC: A Simple but Effective Strategy to Mitigate Chinese STM Models'
Over-Reliance on Superficial Clue
- URL: http://arxiv.org/abs/2309.04162v1
- Date: Fri, 8 Sep 2023 07:10:57 GMT
- Title: GLS-CSC: A Simple but Effective Strategy to Mitigate Chinese STM Models'
Over-Reliance on Superficial Clue
- Authors: Yanrui Du, Sendong Zhao, Yuhan Chen, Rai Bai, Jing Liu, Hua Wu,
Haifeng Wang, Bing Qin
- Abstract summary: We analyze and mitigate the influence of superficial clues on STM models.
We propose Gradually Learn Samples Containing Superficial Clue (GLS-CSC) as a training strategy.
We demonstrate that GLS-CSC outperforms existing methods in terms of enhancing the robustness and generalization of Chinese STM models.
- Score: 51.713301130055065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained models have achieved success in Chinese Short Text Matching (STM)
tasks, but they often rely on superficial clues, leading to a lack of robust
predictions. To address this issue, it is crucial to analyze and mitigate the
influence of superficial clues on STM models. Our study aims to investigate
their over-reliance on the edit distance feature, commonly used to measure the
semantic similarity of Chinese text pairs, which can be considered a
superficial clue. To mitigate STM models' over-reliance on superficial clues,
we propose a novel resampling training strategy called Gradually Learn Samples
Containing Superficial Clue (GLS-CSC). Through comprehensive evaluations of
In-Domain (I.D.), Robustness (Rob.), and Out-Of-Domain (O.O.D.) test sets, we
demonstrate that GLS-CSC outperforms existing methods in terms of enhancing the
robustness and generalization of Chinese STM models. Moreover, we conduct a
detailed analysis of existing methods and reveal their commonality.
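As a rough illustration of the strategy described above, the sketch below treats a small normalized edit distance between the two texts of a pair as the superficial clue and mixes clue-containing samples into training with a gradually increasing ratio. The threshold, the linear schedule, and the function names are assumptions made for illustration, not the authors' implementation.
```python
import random

def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (works directly on Chinese strings)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def has_superficial_clue(s1: str, s2: str, threshold: float = 0.3) -> bool:
    """Treat a small normalized edit distance as the presence of the superficial clue."""
    return edit_distance(s1, s2) / max(len(s1), len(s2), 1) < threshold

def gls_csc_epoch_pool(clue_samples, other_samples, epoch, total_epochs):
    """Gradually learn clue-containing samples: none in the first epoch,
    all of them by the last epoch (a linear ramp, for illustration)."""
    ratio = epoch / max(total_epochs - 1, 1)
    k = int(ratio * len(clue_samples))
    pool = list(other_samples) + random.sample(list(clue_samples), k)
    random.shuffle(pool)
    return pool
```
Any monotone schedule (step-wise, exponential) would fit the "gradually learn" idea; the linear ramp is only the simplest to state.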
Related papers
- S-LoRA: Scalable Low-Rank Adaptation for Class Incremental Learning [73.93639228235622]
Continual Learning with foundation models has emerged as a promising approach to harnessing the power of pre-trained models for sequential tasks.
We propose a Scalable Low-Rank Adaptation (S-LoRA) method for CL (in particular class incremental learning), which incrementally decouples the learning of the direction and magnitude of LoRA parameters.
Our theoretical and empirical analysis demonstrates that S-LoRA tends to follow a low-loss trajectory that converges to an overlapped low-loss region, resulting in an excellent stability-plasticity trade-off in CL.
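A minimal sketch of the direction/magnitude decoupling mentioned above, assuming a PyTorch-style LoRA layer; the class name, initialization, and normalization choice are illustrative assumptions, not S-LoRA's actual implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledLoRALinear(nn.Module):
    """Low-rank update factored into a unit-norm direction (normalized A @ B)
    and a scalar magnitude, so the two parts can be trained or frozen
    independently across incremental tasks."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)        # frozen pre-trained weight
        self.A = nn.Parameter(torch.randn(out_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, in_features))
        self.magnitude = nn.Parameter(torch.ones(1))  # learned separately from A, B

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.A @ self.B                        # (out, in) low-rank update
        direction = F.normalize(delta.flatten(), dim=0).view_as(delta)
        return self.base(x) + x @ (self.magnitude * direction).t()
```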
arXiv Detail & Related papers (2025-01-22T20:00:41Z)
- Navigating the Shortcut Maze: A Comprehensive Analysis of Shortcut Learning in Text Classification by Language Models [20.70050968223901]
This study addresses the overlooked impact of subtler, more complex shortcuts that compromise model reliability beyond oversimplified shortcuts.
We introduce a comprehensive benchmark that categorizes shortcuts into occurrence, style, and concept.
Our research systematically investigates models' resilience and susceptibilities to sophisticated shortcuts.
arXiv Detail & Related papers (2024-09-26T01:17:42Z)
- Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation [16.350747493026432]
The Chain-of-Thought (CoT) paradigm has emerged as a critical approach for enhancing the reasoning capabilities of large language models (LLMs).
We propose Strategic Chain-of-Thought (SCoT) to refine LLM performance by integrating strategic knowledge prior to generating intermediate reasoning steps.
SCoT employs a two-stage approach within a single prompt: first eliciting an effective problem-solving strategy, which is then used to guide the generation of high-quality CoT paths and final answers.
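The two-stage, single-prompt pattern can be sketched as a plain template; the wording below and the `ask_llm` callable are assumptions for illustration, not the paper's exact prompt.
```python
SCOT_TEMPLATE = """Solve the problem in two stages within one response.

Stage 1 - Strategy: briefly state the most effective general method for
solving this type of problem (no calculations yet).

Stage 2 - Solution: apply the strategy from Stage 1 step by step, then give
the final answer on a new line starting with "Answer:".

Problem: {problem}
"""

def strategic_cot(problem: str, ask_llm) -> str:
    """`ask_llm` is any callable mapping a prompt string to a model completion."""
    return ask_llm(SCOT_TEMPLATE.format(problem=problem))
```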
arXiv Detail & Related papers (2024-09-05T06:28:05Z)
- Improving Representation Learning for Histopathologic Images with Cluster Constraints [31.426157660880673]
Self-supervised learning (SSL) pretraining strategies are emerging as a viable alternative.
We introduce an SSL framework for transferable representation learning and semantically meaningful clustering.
Our approach outperforms common SSL methods in downstream classification and clustering tasks.
arXiv Detail & Related papers (2023-10-18T21:20:44Z)
- Adversarial Capsule Networks for Romanian Satire Detection and Sentiment Analysis [0.13048920509133807]
Satire detection and sentiment analysis are intensively explored natural language processing tasks.
In languages with fewer research resources, an alternative is to produce artificial examples based on character-level adversarial processes.
In this work, we improve well-known NLP models with adversarial training and capsule networks.
The proposed framework outperforms the existing methods for the two tasks, achieving up to 99.08% accuracy.
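A hedged sketch of what character-level artificial example generation can look like for a lower-resource language; the perturbation operations and the diacritic map are illustrative choices, not the paper's exact adversarial process.
```python
import random

DIACRITIC_MAP = {"ă": "a", "â": "a", "î": "i", "ș": "s", "ț": "t"}  # illustrative

def perturb(text: str, p: float = 0.05) -> str:
    """Randomly drop a character, swap neighbours, or strip a Romanian
    diacritic to create an artificial training example."""
    chars, out, i = list(text), [], 0
    while i < len(chars):
        c = chars[i]
        if random.random() < p:
            op = random.choice(["drop", "swap", "diacritic"])
            if op == "drop":
                i += 1
                continue
            if op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], c])
                i += 2
                continue
            if op == "diacritic":
                out.append(DIACRITIC_MAP.get(c, c))
                i += 1
                continue
        out.append(c)
        i += 1
    return "".join(out)
```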
arXiv Detail & Related papers (2023-06-13T15:23:44Z)
- Exploration and Exploitation: Two Ways to Improve Chinese Spelling Correction Models [51.744357472072416]
We propose a method that continually identifies the weak spots of a model to generate more valuable training instances.
Experimental results show that such an adversarial training method combined with the pretraining strategy can improve both the generalization and robustness of multiple CSC models.
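The loop described above, sketched with placeholder callables; `train`, `evaluate_losses`, and `mutate` are user-supplied stand-ins, not APIs from the paper.
```python
def adversarial_rounds(model, data, train, evaluate_losses, mutate,
                       rounds=3, top_k=100):
    """Repeatedly train, locate the highest-loss ("weak spot") samples, and
    extend the training set with mutated variants of them."""
    for _ in range(rounds):
        train(model, data)                               # ordinary training pass
        losses = evaluate_losses(model, data)            # one loss per sample
        worst = sorted(range(len(data)), key=lambda i: losses[i],
                       reverse=True)[:top_k]             # indices of weak spots
        data = data + [mutate(data[i]) for i in worst]   # targeted new instances
    return model
```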
arXiv Detail & Related papers (2021-05-31T09:17:33Z)
- On Data-Augmentation and Consistency-Based Semi-Supervised Learning [77.57285768500225]
Recently proposed consistency-based Semi-Supervised Learning (SSL) methods have advanced the state of the art in several SSL tasks.
Despite these advances, the understanding of these methods is still relatively limited.
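The core mechanism behind consistency-based SSL, as a brief sketch; the MSE form and the stop-gradient "teacher" view are common choices rather than the specific variants analyzed in the paper.
```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x_unlabeled, augment):
    """Predictions on an unlabeled batch and on a stochastically augmented
    view of it should agree; `augment` is any augmentation callable."""
    with torch.no_grad():
        target = F.softmax(model(x_unlabeled), dim=-1)    # treated as the target
    pred = F.softmax(model(augment(x_unlabeled)), dim=-1)
    return F.mse_loss(pred, target)
```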
arXiv Detail & Related papers (2021-01-18T10:12:31Z)
- Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising f-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)