Data Augmentation for Copy-Mechanism in Dialogue State Tracking
- URL: http://arxiv.org/abs/2002.09634v1
- Date: Sat, 22 Feb 2020 05:40:32 GMT
- Title: Data Augmentation for Copy-Mechanism in Dialogue State Tracking
- Authors: Xiaohui Song, Liangjun Zang, Yipeng Su, Xing Wu, Jizhong Han and
Songlin Hu
- Abstract summary: We identify the factors that influence the generalization ability of a common copy-mechanism model for dialogue state tracking (DST).
We propose a simple but effective algorithm of data augmentation to train copy-mechanism models, which augments the input dataset by copying user utterances and replacing the real slot values with randomly generated strings.
- Score: 30.768655511224527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While several state-of-the-art approaches to dialogue state tracking (DST)
have shown promising performance on several benchmarks, there is still a
significant performance gap between seen slot values (i.e., values that occur
in both the training set and the test set) and unseen ones (values that occur in
the test set but not in the training set). Recently, the copy-mechanism has been widely
used in DST models to handle unseen slot values by copying them directly from
the user utterance. In this paper, we aim to find out the factors that
influence the generalization ability of a common copy-mechanism model for DST.
Our key observations include: 1) the copy-mechanism tends to memorize values
rather than infer them from contexts, which is the primary reason for
unsatisfactory generalization performance; 2) greater diversity of slot values
in the training set increases the performance on unseen values but slightly
decreases the performance on seen values. Moreover, we propose a simple but
effective algorithm of data augmentation to train copy-mechanism models, which
augments the input dataset by copying user utterances and replacing the real
slot values with randomly generated strings. Users can tune two
hyper-parameters to trade off performance on seen values against performance
on unseen ones, and overall performance against
computational cost. Experimental results on three widely used datasets (WoZ
2.0, DSTC2, and Multi-WoZ 2.0) show the effectiveness of our approach.
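
The augmentation described in the abstract (copy each user utterance and replace its real slot values with randomly generated strings) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the example format, the helper `random_value`, and the two knobs `copies` and `ratio` are all hypothetical, since the abstract does not name the two hyper-parameters or specify how the random strings are generated.

```python
import random
import string


def random_value(rng, n_tokens):
    """Return n_tokens random lowercase pseudo-words joined by spaces."""
    return " ".join(
        "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(3, 8)))
        for _ in range(n_tokens)
    )


def augment(examples, copies=2, ratio=1.0, seed=0):
    """Return the original examples plus copies with slot values replaced.

    Each example is a dict: {"utterance": str, "slots": {slot_name: value}}.
    `copies` (augmented copies per utterance) and `ratio` (fraction of
    utterances that get augmented) are hypothetical knobs, assumed here
    only for illustration.
    """
    rng = random.Random(seed)
    out = list(examples)  # keep every original example
    for ex in examples:
        if rng.random() >= ratio:
            continue  # this utterance is not selected for augmentation
        for _ in range(copies):
            utt = ex["utterance"]
            new_slots = {}
            for slot, value in ex["slots"].items():
                if value not in utt:
                    # The value cannot be copied from the utterance; keep it.
                    new_slots[slot] = value
                    continue
                # Same number of tokens as the real value, random content.
                fake = random_value(rng, len(value.split()))
                # Plain substring replacement; a real system would likely
                # replace at token boundaries instead.
                utt = utt.replace(value, fake)
                new_slots[slot] = fake
            out.append({"utterance": utt, "slots": new_slots})
    return out


data = [
    {
        "utterance": "i want cheap italian food",
        "slots": {"price": "cheap", "food": "italian"},
    }
]
augmented = augment(data, copies=3, ratio=1.0, seed=1)
```

Because the replaced values are random strings, a model trained on the augmented copies cannot label them by memorizing values and has to rely on the surrounding context instead. In this sketch, raising `copies` or `ratio` enlarges the training set, which is one plausible way a cost versus performance trade-off could arise.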
Related papers
- JPAVE: A Generation and Classification-based Model for Joint Product
Attribute Prediction and Value Extraction [59.94977231327573]
We propose a multi-task learning model with value generation/classification and attribute prediction called JPAVE.
Two variants of our model are designed for open-world and closed-world scenarios.
Experimental results on a public dataset demonstrate the superiority of our model compared with strong baselines.
arXiv Detail & Related papers (2023-11-07T18:36:16Z)
- Free-text Keystroke Authentication using Transformers: A Comparative
Study of Architectures and Loss Functions [1.0152838128195467]
Keystroke biometrics is a promising approach for user identification and verification, leveraging the unique patterns in individuals' typing behavior.
We propose a Transformer-based network that employs self-attention to extract informative features from keystroke sequences.
Our model surpasses the previous state-of-the-art in free-text keystroke authentication.
arXiv Detail & Related papers (2023-10-18T00:34:26Z)
- Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time.
Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP.
Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
arXiv Detail & Related papers (2023-06-12T15:52:02Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF)
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- An Efficiency Study for SPLADE Models [5.725475501578801]
In this paper, we focus on improving the efficiency of the SPLADE model.
We propose several techniques, including L1 regularization for queries, a separation of document/query encoders, a FLOPS-regularized middle-training, and the use of faster query encoders.
arXiv Detail & Related papers (2022-07-08T11:42:05Z)
- On the Eigenvalues of Global Covariance Pooling for Fine-grained Visual
Recognition [65.67315418971688]
We show that truncating small eigenvalues of the Global Covariance Pooling (GCP) can attain smoother gradient.
On fine-grained datasets, truncating the small eigenvalues would make the model fail to converge.
Inspired by this observation, we propose a network branch dedicated to magnifying the importance of small eigenvalues.
arXiv Detail & Related papers (2022-05-26T11:41:36Z)
- Efficient Two-Stage Detection of Human-Object Interactions with a Novel
Unary-Pairwise Transformer [41.44769642537572]
Unary-Pairwise Transformer is a two-stage detector that exploits unary and pairwise representations for HOIs.
We evaluate our method on the HICO-DET and V-COCO datasets, and significantly outperform state-of-the-art approaches.
arXiv Detail & Related papers (2021-12-03T10:52:06Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- A Fast and Robust BERT-based Dialogue State Tracker for Schema-Guided
Dialogue Dataset [8.990035371365408]
We introduce FastSGT, a fast and robust BERT-based model for state tracking in goal-oriented dialogue systems.
The proposed model is designed for the Schema-Guided Dialogue dataset, which contains natural language descriptions.
Our model keeps the efficiency in terms of computational and memory consumption while improving the accuracy significantly.
arXiv Detail & Related papers (2020-08-27T18:51:18Z)