Anomaly Detection in Human Language via Meta-Learning: A Few-Shot Approach
- URL: http://arxiv.org/abs/2507.20019v1
- Date: Sat, 26 Jul 2025 17:23:03 GMT
- Title: Anomaly Detection in Human Language via Meta-Learning: A Few-Shot Approach
- Authors: Saurav Singla, Aarav Singla, Advik Gupta, Parnika Gupta
- Abstract summary: We propose a framework for detecting anomalies in human language across diverse domains with limited labeled data. We treat anomaly detection as a few-shot binary classification problem and leverage meta-learning to train models that generalize across tasks. Our method combines episodic training with prototypical networks and domain resampling to adapt quickly to new anomaly detection tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a meta-learning framework for detecting anomalies in human language across diverse domains with limited labeled data. Anomalies in language, ranging from spam and fake news to hate speech, pose a major challenge due to their sparsity and variability. We treat anomaly detection as a few-shot binary classification problem and leverage meta-learning to train models that generalize across tasks. Using datasets from domains such as SMS spam, COVID-19 fake news, and hate speech, we evaluate model generalization on unseen tasks with minimal labeled anomalies. Our method combines episodic training with prototypical networks and domain resampling to adapt quickly to new anomaly detection tasks. Empirical results show that our method outperforms strong baselines in F1 and AUC scores. We also release the code and benchmarks to facilitate further research in few-shot text anomaly detection.
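The released code is not reproduced here, but a minimal sketch of the episodic prototypical-network training the abstract describes might look as follows (PyTorch; the `encoder` callable and the per-domain episode sampling are illustrative assumptions, not the authors' implementation):

```python
# Minimal sketch of episodic prototypical-network training for few-shot
# binary text anomaly detection (0 = normal, 1 = anomalous). Assumes an
# `encoder` callable that maps a list of texts to a (n, d) embedding tensor.
import torch
import torch.nn.functional as F

def prototypes(support_emb: torch.Tensor, support_labels: torch.Tensor) -> torch.Tensor:
    """Class prototypes = mean support embedding per class (both classes assumed present)."""
    return torch.stack([support_emb[support_labels == c].mean(dim=0) for c in (0, 1)])

def proto_loss(query_emb: torch.Tensor, query_labels: torch.Tensor,
               protos: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over negative squared Euclidean distances to the prototypes."""
    dists = torch.cdist(query_emb, protos) ** 2          # (n_query, 2)
    return F.cross_entropy(-dists, query_labels)

def train_episode(encoder, optimizer, support_texts, support_labels, query_texts, query_labels):
    """One episode: compute prototypes on the support set, back-propagate the query loss."""
    optimizer.zero_grad()
    s_emb = encoder(support_texts)                        # (n_support, d)
    q_emb = encoder(query_texts)                          # (n_query, d)
    loss = proto_loss(q_emb, torch.as_tensor(query_labels),
                      prototypes(s_emb, torch.as_tensor(support_labels)))
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, domain resampling amounts to drawing each episode's support/query split from a randomly chosen source domain (e.g. SMS spam, COVID-19 fake news, or hate speech), so the encoder learns an embedding space in which anomalies separate from the normal-class prototype even in unseen domains.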
Related papers
- Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding [27.02879006439693]
This work performs a comprehensive empirical study and introduces a benchmark for text anomaly detection. Our work systematically evaluates the effectiveness of embedding-based text anomaly detection. By open-sourcing our benchmark toolkit, this work provides a foundation for future research in robust and scalable text anomaly detection systems.
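As an illustration of the embedding-based detectors such a benchmark evaluates, a simple distance-to-centroid scorer over sentence embeddings might look like this (numpy; the `embed` function is a placeholder for any sentence-embedding model and is not part of Text-ADBench):

```python
# Minimal sketch of an embedding-based text anomaly detector: fit a centroid
# on embeddings of normal texts and score new texts by their distance to it.
import numpy as np

def fit_centroid(normal_embeddings: np.ndarray) -> np.ndarray:
    """Centroid of normal-text embeddings, shape (d,)."""
    return normal_embeddings.mean(axis=0)

def anomaly_scores(embeddings: np.ndarray, centroid: np.ndarray) -> np.ndarray:
    """Larger distance from the normal centroid = more anomalous."""
    return np.linalg.norm(embeddings - centroid, axis=1)

# Hypothetical usage: centroid = fit_centroid(embed(normal_texts))
#                     scores   = anomaly_scores(embed(test_texts), centroid)
```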
arXiv Detail & Related papers (2025-07-16T14:47:41Z)
- Exploring Gradient-Guided Masked Language Model to Detect Textual Adversarial Attacks [50.53590930588431]
Adversarial examples pose serious threats to natural language processing systems. Recent studies suggest that adversarial texts deviate from the underlying manifold of normal texts, whereas masked language models can approximate the manifold of normal data. We first introduce Masked Language Model-based Detection (MLMD), leveraging the mask and unmask operations of the masked language modeling (MLM) objective.
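A rough sketch of this mask-and-reconstruct idea, assuming a HuggingFace masked language model, is shown below; it scores a text by how often the MLM recovers the original token at each masked position, which is only an approximation of MLMD's actual detection procedure:

```python
# Mask each token in turn, let a masked language model predict it, and treat a
# low recovery rate as a sign the text lies off the normal-text manifold.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def recovery_rate(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    hits, total = 0, 0
    for i in range(1, len(ids) - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        logits = model(masked.unsqueeze(0)).logits[0, i]
        hits += int(logits.argmax().item() == ids[i].item())
        total += 1
    return hits / max(total, 1)               # lower value => more suspicious
```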
arXiv Detail & Related papers (2025-04-08T14:10:57Z)
- Vicinal Risk Minimization for Few-Shot Cross-lingual Transfer in Abusive Language Detection [19.399281609371258]
Cross-lingual transfer learning from high-resource to medium- and low-resource languages has shown encouraging results.
We resort to data augmentation and continual pre-training for domain adaptation to improve cross-lingual abusive language detection.
arXiv Detail & Related papers (2023-11-03T16:51:07Z)
- Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranked 1st place.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating its feasibility in real application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- ProtAugment: Unsupervised diverse short-texts paraphrasing for intent detection meta-learning [4.689945062721168]
We propose ProtAugment, a meta-learning algorithm for intent detection.
ProtAugment is a novel extension of Prototypical Networks.
arXiv Detail & Related papers (2021-05-27T08:31:27Z)
- Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection [55.888835686183995]
We propose a neural network-based meta-learning method for supervised anomaly detection.
We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods.
arXiv Detail & Related papers (2021-03-01T01:43:04Z)
- Transformer-based Language Model Fine-tuning Methods for COVID-19 Fake News Detection [7.29381091750894]
We propose a novel transformer-based language model fine-tuning approach for fake news detection.
First, the token vocabulary of each individual model is expanded to capture the actual semantics of professional phrases.
Last, the predicted features extracted by the universal language model RoBERTa and the domain-specific model CT-BERT are fused by a multi-layer perceptron to integrate fine-grained and high-level specific representations.
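A schematic of the described fusion, assuming [CLS]-token pooling and illustrative checkpoint names, could be written as follows (PyTorch/transformers); this is a sketch of the general recipe, not the paper's released model:

```python
# Fuse a general encoder (RoBERTa) with a domain-specific one (e.g. CT-BERT)
# by concatenating their pooled representations and classifying with an MLP.
import torch
import torch.nn as nn
from transformers import AutoModel

class FusionClassifier(nn.Module):
    def __init__(self, general="roberta-base",
                 domain="digitalepidemiologylab/covid-twitter-bert-v2"):
        super().__init__()
        self.enc_a = AutoModel.from_pretrained(general)
        self.enc_b = AutoModel.from_pretrained(domain)
        dim = self.enc_a.config.hidden_size + self.enc_b.config.hidden_size
        self.mlp = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 2))

    def forward(self, batch_a, batch_b):
        # [CLS]-token representations from each encoder, tokenized with each
        # model's own tokenizer, concatenated and fused into real/fake logits.
        h_a = self.enc_a(**batch_a).last_hidden_state[:, 0]
        h_b = self.enc_b(**batch_b).last_hidden_state[:, 0]
        return self.mlp(torch.cat([h_a, h_b], dim=-1))
```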
arXiv Detail & Related papers (2021-01-14T09:05:42Z)
- Aggressive Language Detection with Joint Text Normalization via Adversarial Multi-task Learning [31.02484600391725]
Aggressive language detection (ALD) is one of the crucial applications in the NLP community.
In this work, we aim to improve ALD by jointly performing text normalization (TN) via an adversarial multi-task learning framework.
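As a loose illustration of what an adversarial multi-task setup of this kind can look like, the sketch below shares an encoder between an ALD classification head and a per-token TN tagging head, and keeps shared features task-invariant via a gradient-reversal discriminator; the architecture and dimensions are assumptions, not the cited paper's model:

```python
# Generic adversarial multi-task sketch: shared BiGRU encoder, an ALD head,
# a per-token text-normalization tagging head, and a task discriminator
# trained through gradient reversal so shared features stay task-invariant.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class SharedMultiTask(nn.Module):
    def __init__(self, vocab_size=30000, dim=256, norm_tags=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.ald_head = nn.Linear(2 * dim, 2)         # aggressive vs. not
        self.tn_head = nn.Linear(2 * dim, norm_tags)  # per-token normalization tag
        self.task_disc = nn.Linear(2 * dim, 2)        # adversarial task classifier

    def forward(self, token_ids, lambd=1.0):
        h, _ = self.encoder(self.embed(token_ids))    # (batch, seq, 2*dim)
        pooled = h.mean(dim=1)
        ald_logits = self.ald_head(pooled)
        tn_logits = self.tn_head(h)
        disc_logits = self.task_disc(GradReverse.apply(pooled, lambd))
        return ald_logits, tn_logits, disc_logits
```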
arXiv Detail & Related papers (2020-09-19T06:26:07Z)