Related papers: Redwood: Using Collision Detection to Grow a Large-Scale Intent Classification Dataset

URL: http://arxiv.org/abs/2204.05483v1
Date: Tue, 12 Apr 2022 02:28:23 GMT
Title: Redwood: Using Collision Detection to Grow a Large-Scale Intent Classification Dataset
Authors: Stefan Larson, Kevin Leach
Abstract summary: In intent classification systems, problems can arise if training data for a new skill's intent overlaps semantically with an already-existing intent. This paper introduces the task of intent collision detection between multiple datasets for the purposes of growing a system's skillset. To highlight the need for intent collision detection, we show that model performance suffers if new data is added in such a way that does not arbitrate colliding intents.
Score: 4.224157527132053
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Dialog systems must be capable of incorporating new skills via updates over time in order to reflect new use cases or deployment scenarios. Similarly, developers of such ML-driven systems need to be able to add new training data to an already-existing dataset to support these new skills. In intent classification systems, problems can arise if training data for a new skill's intent overlaps semantically with an already-existing intent. We call such cases collisions. This paper introduces the task of intent collision detection between multiple datasets for the purposes of growing a system's skillset. We introduce several methods for detecting collisions, and evaluate our methods on real datasets that exhibit collisions. To highlight the need for intent collision detection, we show that model performance suffers if new data is added in such a way that does not arbitrate colliding intents. Finally, we use collision detection to construct and benchmark a new dataset, Redwood, which is composed of 451 intent categories from 13 original intent classification datasets, making it the largest publicly available intent classification benchmark.

Related papers

DialogVCS: Robust Natural Language Understanding in Dialogue System Upgrade [36.433020605744986]
In constant updates of product dialogue systems, new data from real users is merged into the existing data. New intents emerge and might have semantic entanglement with the existing intents. We set up a new benchmark consisting of 4 Dialogue Version Control dataSets (VCS). We formulate the intent detection with imperfect data in the system update as a multi-label classification task with positive but unlabeled intents.
arXiv Detail & Related papers (2023-05-24T05:53:38Z)
Going beyond research datasets: Novel intent discovery in the industry setting [60.90117614762879]
This paper proposes methods to improve the intent discovery pipeline deployed in a large e-commerce platform. We show the benefit of pre-training language models on in-domain data: both self-supervised and with weak supervision. We also devise the best method to utilize the conversational structure (i.e., question and answer) of real-life datasets during fine-tuning for clustering tasks, which we call Conv.
arXiv Detail & Related papers (2023-05-09T14:21:29Z)
Novel Class Discovery without Forgetting [72.52222295216062]
We identify and formulate a new, pragmatic problem setting of NCDwF: Novel Class Discovery without Forgetting. We propose a machine learning model to incrementally discover novel categories of instances from unlabeled data. We introduce experimental protocols based on CIFAR-10, CIFAR-100 and ImageNet-1000 to measure the trade-off between knowledge retention and novel class discovery.
arXiv Detail & Related papers (2022-07-21T17:54:36Z)
Bridging Non Co-occurrence with Unlabeled In-the-wild Data for Incremental Object Detection [56.22467011292147]
Several incremental learning methods have been proposed to mitigate catastrophic forgetting for object detection. Despite their effectiveness, these methods require co-occurrence of the unlabeled base classes in the training data of the novel classes. We propose the use of unlabeled in-the-wild data to bridge the non co-occurrence caused by the missing base classes during the training of additional novel classes.
arXiv Detail & Related papers (2021-10-28T10:57:25Z)
The Pursuit of Knowledge: Discovering and Localizing Novel Categories using Dual Memory [85.01439251151203]
We tackle object category discovery, which is the problem of discovering and localizing novel objects in a large unlabeled dataset. We propose a method to use prior knowledge about certain object categories to discover new categories by leveraging two memory modules. We show the performance of our detector on the COCO minival dataset to demonstrate its in-the-wild capabilities.
arXiv Detail & Related papers (2021-05-04T17:55:59Z)
Generalized Zero-shot Intent Detection via Commonsense Knowledge [5.398580049917152]
We propose RIDE: an intent detection model that leverages commonsense knowledge in an unsupervised fashion to overcome the issue of training data scarcity. RIDE computes robust and generalizable relationship meta-features that capture deep semantic relationships between utterances and intent labels. Our extensive experimental analysis on three widely-used intent detection benchmarks shows that relationship meta-features significantly increase the accuracy of detecting both seen and unseen intents.
arXiv Detail & Related papers (2021-02-04T23:36:41Z)
Benchmarking Intent Detection for Task-Oriented Dialog Systems [6.54201796167054]
Intent detection is a key component of modern goal-oriented dialog systems that accomplish a user task by predicting the intent of users' text input. There are three primary challenges in designing robust and accurate intent detection models. Our results show that Watson Assistant's intent detection model outperforms other commercial solutions.
arXiv Detail & Related papers (2020-12-07T18:58:57Z)
Learning Adaptive Embedding Considering Incremental Class [55.21855842960139]
Class-Incremental Learning (CIL) aims to train a reliable model with the streaming data, which emerges unknown classes sequentially. Different from traditional closed set learning, CIL has two main challenges: 1) Novel class detection. After the novel classes are detected, the model needs to be updated without re-training using entire previous data.
arXiv Detail & Related papers (2020-08-31T04:11:24Z)
Two-Level Residual Distillation based Triple Network for Incremental Object Detection [21.725878050355824]
We propose a novel incremental object detector based on Faster R-CNN to continuously learn from new object classes without using old data. It is a triple network in which an old model and a residual model act as assistants, helping the incremental model learn new classes without forgetting previously learned knowledge.
arXiv Detail & Related papers (2020-07-27T11:04:57Z)
Semi-Supervised Class Discovery [7.123519086758813]
We introduce the Dataset Reconstruction Accuracy, a new and important measure of the effectiveness of a model's ability to create labels. We apply a new heuristic, class learnability, for deciding whether a class is worthy of addition to the training dataset. We show that our class discovery system can be successfully applied to vision and language.
arXiv Detail & Related papers (2020-02-10T00:29:44Z)
Cross-dataset Training for Class Increasing Object Detection [52.34737978720484]
We present a conceptually simple, flexible and general framework for cross-dataset training in object detection. By cross-dataset training, existing datasets can be utilized to detect the merged object classes with a single model. While using cross-dataset training, we only need to label the new classes on the new dataset.
arXiv Detail & Related papers (2020-01-14T04:40:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.