HINT3: Raising the bar for Intent Detection in the Wild
- URL: http://arxiv.org/abs/2009.13833v2
- Date: Sat, 10 Oct 2020 07:52:18 GMT
- Title: HINT3: Raising the bar for Intent Detection in the Wild
- Authors: Gaurav Arora, Chirag Jain, Manas Chaturvedi, Krupal Modi
- Abstract summary: We introduce 3 new datasets created from live chatbots in diverse domains.
Unlike most existing datasets that are crowdsourced, our datasets contain real user queries received by the chatbots.
We find that classifier performance saturates at inadequate levels on test sets because all systems latch on to unintended patterns in training data.
- Score: 0.9634859579172252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intent Detection systems in the real world are exposed to complexities of
imbalanced datasets containing varying perception of intent, unintended
correlations and domain-specific aberrations. To facilitate benchmarking which
can reflect near real-world scenarios, we introduce 3 new datasets created from
live chatbots in diverse domains. Unlike most existing datasets that are
crowdsourced, our datasets contain real user queries received by the chatbots
and facilitate penalising unwanted correlations grasped during the training
process. We evaluate 4 NLU platforms and a BERT-based classifier and find that
performance saturates at inadequate levels on test sets because all systems
latch on to unintended patterns in training data.
Related papers
- How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics [49.9329723199239]
We propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples.
We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics.
When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset.
arXiv Detail & Related papers (2024-10-04T13:39:21Z)
- Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection [64.08296187555095]
Uni$^2$Det is a framework for unified and universal multi-dataset training on 3D detection.
We introduce multi-stage prompting modules for multi-dataset 3D detection.
Results on zero-shot cross-dataset transfer validate the generalization capability of our proposed method.
arXiv Detail & Related papers (2024-09-30T17:57:50Z)
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- LORD: Leveraging Open-Set Recognition with Unknown Data [10.200937444995944]
LORD is a framework to Leverage Open-set Recognition by exploiting unknown data.
We identify three model-agnostic training strategies that exploit background data and apply them to well-established classifiers.
arXiv Detail & Related papers (2023-08-24T06:12:41Z)
- Perception Datasets for Anomaly Detection in Autonomous Driving: A Survey [4.731404257629232]
Multiple perception datasets have been created for the evaluation of anomaly detection methods.
This survey provides a structured and, to the best of our knowledge, complete overview and comparison of perception datasets for anomaly detection in autonomous driving.
arXiv Detail & Related papers (2023-02-06T14:07:13Z)
- Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding [137.3719377780593]
A new design (named Detection Hub) is dataset-aware and category-aligned.
It mitigates the dataset inconsistency and provides coherent guidance for the detector to learn across multiple datasets.
The categories across datasets are semantically aligned into a unified space by replacing one-hot category representations with word embeddings.
arXiv Detail & Related papers (2022-06-07T17:59:44Z)
- Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on.
We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z)
- Free Lunch for Co-Saliency Detection: Context Adjustment [14.688461235328306]
We propose a "cost-free" group-cut-paste (GCP) procedure to leverage images from off-the-shelf saliency detection datasets and synthesize new samples.
We collect a novel dataset called Context Adjustment Training. The two variants of our dataset, i.e., CAT and CAT+, consist of 16,750 and 33,500 images, respectively.
arXiv Detail & Related papers (2021-08-04T14:51:37Z)
- Exploiting Shared Representations for Personalized Federated Learning [54.65133770989836]
We propose a novel federated learning framework and algorithm for learning a shared data representation across clients and unique local heads for each client.
Our algorithm harnesses the distributed computational power across clients to perform many local updates with respect to the low-dimensional local parameters for every update of the representation.
This result is of interest beyond federated learning to a broad class of problems in which we aim to learn a shared low-dimensional representation among data distributions.
arXiv Detail & Related papers (2021-02-14T05:36:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.