Multi-Level Knowledge Distillation for Out-of-Distribution Detection in Text
- URL: http://arxiv.org/abs/2211.11300v3
- Date: Fri, 2 Jun 2023 06:48:09 GMT
- Title: Multi-Level Knowledge Distillation for Out-of-Distribution Detection in Text
- Authors: Qianhui Wu, Huiqiang Jiang, Haonan Yin, Börje F. Karlsson, Chin-Yew Lin
- Abstract summary: Self-supervised representation learning has proved to be a valuable component for out-of-distribution (OoD) detection.
In this paper, we analyze the complementary characteristics of both OoD detection methods.
We propose a multi-level knowledge distillation approach that integrates their strengths while mitigating their limitations.
- Score: 12.428289757859433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised representation learning has proved to be a valuable component
for out-of-distribution (OoD) detection with only the texts of in-distribution
(ID) examples. These approaches either train a language model from scratch or
fine-tune a pre-trained language model using ID examples, and then take the
perplexity output by the language model as the OoD score. In this paper, we
analyze the complementary characteristics of both OoD detection methods and
propose a multi-level knowledge distillation approach that integrates their
strengths while mitigating their limitations. Specifically, we use a fine-tuned
model as the teacher to teach a randomly initialized student model on the ID
examples. In addition to prediction-layer distillation, we present a
similarity-based intermediate-layer distillation method to thoroughly explore
the representation space of the teacher model. In this way, the learned student
can better represent the ID data manifold while gaining a stronger ability to
map OoD examples outside the ID data manifold, thanks to the regularization
inherited from pre-training. Moreover, the student model sees only ID examples
during training, which further promotes more distinguishable features for OoD
detection. We conduct extensive experiments over multiple benchmark datasets,
namely CLINC150, SST, ROSTD, 20 NewsGroups, and AG News, and show that the
proposed method yields new state-of-the-art performance. We also explore its
application as an AIGC detector to distinguish between answers generated by
ChatGPT and human experts, observing that our model exceeds human evaluators
on the pair-expert task of the Human ChatGPT Comparison Corpus.
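To make the mechanism above concrete, here is a minimal PyTorch sketch of the two distillation levels and the perplexity-based OoD score. It is not the authors' released code: the temperature, the cosine-similarity form of the intermediate-layer loss, the layer selection, and all function names are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's released code):
# prediction-layer distillation, similarity-based intermediate-layer
# distillation, and perplexity as the OoD score.
import torch
import torch.nn.functional as F


def prediction_layer_kd(student_logits, teacher_logits, temperature=2.0):
    """Soft cross-entropy between teacher and student next-token distributions."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean() * (t * t)


def similarity_based_layer_kd(student_hidden, teacher_hidden):
    """Match token-to-token cosine-similarity matrices of an intermediate layer,
    so the student mimics the geometry of the teacher's representation space."""
    def sim_matrix(h):                                # h: (batch, seq_len, dim)
        h = F.normalize(h, dim=-1)
        return torch.bmm(h, h.transpose(1, 2))        # (batch, seq_len, seq_len)
    return F.mse_loss(sim_matrix(student_hidden), sim_matrix(teacher_hidden))


def ood_score_from_perplexity(student_logits, input_ids, pad_id):
    """Per-sequence perplexity under the student LM, used as the OoD score:
    higher perplexity suggests the input lies outside the ID data manifold."""
    shift_logits = student_logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    nll = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=pad_id,
        reduction="none",
    ).view(shift_labels.size())
    mask = (shift_labels != pad_id).float()
    per_seq_nll = (nll * mask).sum(dim=-1) / mask.sum(dim=-1).clamp(min=1.0)
    return per_seq_nll.exp()
```

Under these assumptions, training on ID text only would minimize a weighted sum such as prediction_layer_kd(...) + lambda * similarity_based_layer_kd(...), and at test time the OoD score would be computed with the student model alone.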
Related papers
- Out-of-Distribution Detection Using Peer-Class Generated by Large Language Model [0.0]
Out-of-distribution (OOD) detection is a critical task to ensure the reliability and security of machine learning models.
In this paper, a novel method called ODPC is proposed, in which a large language model is prompted with specifically designed prompts to generate OOD peer classes of the ID semantics.
Experiments on five benchmark datasets show that the method we propose can yield state-of-the-art results.
arXiv Detail & Related papers (2024-03-20T06:04:05Z)
- Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability [70.72426887518517]
Out-of-distribution (OOD) detection is an indispensable aspect of secure AI when deploying machine learning models in real-world applications.
We propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data.
Our method utilizes a mask to identify the memorized atypical samples, and then fine-tunes the model or prunes it with the introduced mask to forget them.
arXiv Detail & Related papers (2023-06-06T14:23:34Z)
- Ensemble knowledge distillation of self-supervised speech models [84.69577440755457]
Distilled self-supervised models have shown competitive performance and efficiency in recent years.
We performed Ensemble Knowledge Distillation (EKD) on various self-supervised speech models such as HuBERT, RobustHuBERT, and WavLM.
Our method improves the performance of the distilled models on four downstream speech processing tasks.
arXiv Detail & Related papers (2023-02-24T17:15:39Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margin in both few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
- No Shifted Augmentations (NSA): compact distributions for robust self-supervised Anomaly Detection [4.243926243206826]
Unsupervised Anomaly detection (AD) requires building a notion of normalcy, distinguishing in-distribution (ID) and out-of-distribution (OOD) data.
We investigate how the geometrical compactness of the ID feature distribution makes isolating and detecting outliers easier.
We propose novel architectural modifications to the self-supervised feature learning step, that enable such compact distributions for ID data to be learned.
arXiv Detail & Related papers (2022-03-19T15:55:32Z)
- Anomaly Detection via Reverse Distillation from One-Class Embedding [2.715884199292287]
We propose a novel T-S model consisting of a teacher encoder and a student decoder.
Instead of receiving raw images directly, the student network takes the teacher model's one-class embedding as input.
In addition, we introduce a trainable one-class bottleneck embedding module in our T-S model.
arXiv Detail & Related papers (2022-01-26T01:48:37Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.