GOLD: Improving Out-of-Scope Detection in Dialogues using Data
Augmentation
- URL: http://arxiv.org/abs/2109.03079v1
- Date: Tue, 7 Sep 2021 13:35:03 GMT
- Title: GOLD: Improving Out-of-Scope Detection in Dialogues using Data
Augmentation
- Authors: Derek Chen, Zhou Yu
- Abstract summary: The GOLD technique augments existing data to train better OOS detectors operating in low-data regimes.
In experiments across three target benchmarks, the top GOLD model outperforms all existing methods on all key metrics.
- Score: 41.04593978694591
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Practical dialogue systems require robust methods of detecting out-of-scope
(OOS) utterances to avoid conversational breakdowns and related failure modes.
Directly training a model with labeled OOS examples yields reasonable
performance, but obtaining such data is a resource-intensive process. To tackle
this limited-data problem, previous methods focus on better modeling the
distribution of in-scope (INS) examples. We introduce GOLD as an orthogonal
technique that augments existing data to train better OOS detectors operating
in low-data regimes. GOLD generates pseudo-labeled candidates using samples
from an auxiliary dataset and keeps only the most beneficial candidates for
training through a novel filtering mechanism. In experiments across three
target benchmarks, the top GOLD model outperforms all existing methods on all
key metrics, achieving relative gains of 52.4%, 48.9% and 50.3% against median
baseline performance. We also analyze the unique properties of OOS data to
identify key factors for optimally applying our proposed method.
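The generate-then-filter idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not describe how candidates are matched or filtered, so the token-overlap similarity, the function names (`jaccard`, `generate_candidates`, `filter_candidates`), the threshold, and the example utterances below are all assumptions chosen only to make the pattern concrete.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity (a toy stand-in for a learned encoder)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def generate_candidates(seed_oos: List[str], auxiliary: List[str],
                        per_seed: int = 2) -> List[str]:
    """Pseudo-label as OOS the auxiliary utterances closest to each seed OOS example."""
    candidates: List[str] = []
    for seed in seed_oos:
        ranked = sorted(auxiliary, key=lambda u: jaccard(seed, u), reverse=True)
        for utt in ranked[:per_seed]:
            if utt not in candidates:
                candidates.append(utt)
    return candidates


def filter_candidates(candidates: List[str], in_scope: List[str],
                      max_ins_similarity: float = 0.5) -> List[str]:
    """Hypothetical filter: drop candidates that resemble an in-scope utterance."""
    return [c for c in candidates
            if max(jaccard(c, x) for x in in_scope) < max_ins_similarity]


# Made-up data: one labeled OOS seed, a small auxiliary pool, two INS utterances.
seed_oos = ["what is the weather today"]
auxiliary = [
    "what is the weather in paris",
    "book a flight to paris",
    "tell me a joke",
    "what is my account balance",
]
in_scope = ["check my account balance", "transfer money to savings"]

candidates = generate_candidates(seed_oos, auxiliary)
kept = filter_candidates(candidates, in_scope)
print(kept)
```

In this toy run the candidate about the account balance is discarded because it overlaps too heavily with an in-scope utterance, while the weather question survives as a pseudo-labeled OOS training example.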
Related papers
- SUDS: A Strategy for Unsupervised Drift Sampling [0.5437605013181142]
Supervised machine learning encounters concept drift, where the data distribution changes over time, degrading performance.
We present the Strategy for Drift Sampling (SUDS), a novel method that selects homogeneous samples for retraining using existing drift detection algorithms.
Our results demonstrate the efficacy of SUDS in optimizing labeled data use in dynamic environments.
arXiv Detail & Related papers (2024-11-05T10:55:29Z)
- OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion [88.59397418187226]
We propose a novel unified open-vocabulary detection method called OV-DINO.
It is pre-trained on diverse large-scale datasets with language-aware selective fusion in a unified framework.
We evaluate the performance of the proposed OV-DINO on popular open-vocabulary detection benchmarks.
arXiv Detail & Related papers (2024-07-10T17:05:49Z)
- Generating Hard-Negative Out-of-Scope Data with ChatGPT for Intent Classification [8.013995844494456]
We present an automated technique to generate hard-negative OOS data using ChatGPT.
We show that classifiers struggle more to correctly identify hard-negative OOS utterances than general OOS utterances.
Finally, we show that incorporating hard-negative OOS data for training improves model robustness when detecting hard-negative OOS data and general OOS data.
arXiv Detail & Related papers (2024-03-08T19:25:00Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on.
We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z)
- Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization [28.22184410167622]
We present a data filtering method for open-domain dialogues.
We score training samples with a quality measure, sort them in descending order, and filter out those at the bottom.
Experimental results on two datasets show that our method can effectively identify untrustworthy samples.
arXiv Detail & Related papers (2021-09-14T06:42:54Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
- Automating Outlier Detection via Meta-Learning [37.736124230543865]
We develop the first principled data-driven approach to model selection for outlier detection, called MetaOD, based on meta-learning.
We show the effectiveness of MetaOD in selecting a detection model that significantly outperforms the most popular outlier detectors.
To foster and further research on this new problem, we open-source our entire meta-learning system, benchmark environment, and testbed datasets.
arXiv Detail & Related papers (2020-09-22T15:14:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.