Modeling Token-level Uncertainty to Learn Unknown Concepts in SLU via
Calibrated Dirichlet Prior RNN
- URL: http://arxiv.org/abs/2010.08101v1
- Date: Fri, 16 Oct 2020 02:12:30 GMT
- Title: Modeling Token-level Uncertainty to Learn Unknown Concepts in SLU via
Calibrated Dirichlet Prior RNN
- Authors: Yilin Shen, Wenhu Chen, Hongxia Jin
- Abstract summary: One major task of spoken language understanding (SLU) in modern personal assistants is to extract semantic concepts from an utterance.
Recent research collected question and answer annotated data to learn what is unknown and should be asked.
We incorporate softmax-based slot filling neural architectures to model the sequence uncertainty without question supervision.
- Score: 98.4713940310056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One major task of spoken language understanding (SLU) in modern personal
assistants is to extract semantic concepts from an utterance, called slot
filling. Although existing slot filling models have attempted to improve the
extraction of new concepts that are not seen in training data, their performance
in practice is still unsatisfactory. Recent research collected question and
answer annotated data to learn what is unknown and should be asked, yet this
approach is not practically scalable due to the heavy data collection effort. In this paper, we incorporate
softmax-based slot filling neural architectures to model the sequence
uncertainty without question supervision. We design a Dirichlet Prior RNN that
models high-order uncertainty and degenerates to a softmax layer for RNN model
training. To further enhance the uncertainty modeling robustness, we propose a
novel multi-task training to calibrate the Dirichlet concentration parameters.
We collect unseen concepts to create two test datasets from SLU benchmark
datasets Snips and ATIS. On these two datasets and an existing Concept Learning
benchmark dataset, we show that our approach significantly outperforms
state-of-the-art approaches by up to 8.18%. Our method is generic and can be
applied to any RNN- or Transformer-based slot filling model with a softmax
layer.
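The idea of a Dirichlet output that degenerates to a softmax can be illustrated with a minimal sketch. It assumes the concentration parameters are the exponentiated logits, alpha_k = exp(z_k), as is common in prior-network style models; this parameterization, the function names, and the example logits are assumptions made for illustration and are not taken from the paper. Under that assumption the mean of the Dirichlet equals the softmax output, while the total concentration alpha_0 carries additional information about confidence.

```python
import math

def concentrations(logits):
    # Assumed parameterization (not confirmed by the paper): alpha_k = exp(z_k).
    return [math.exp(z) for z in logits]

def dirichlet_mean(alpha):
    # Mean of Dirichlet(alpha) is alpha_k / alpha_0, with alpha_0 = sum(alpha).
    alpha_0 = sum(alpha)
    return [a / alpha_0 for a in alpha]

def softmax(logits):
    # Numerically stable softmax, used here only for comparison.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Two hypothetical tokens whose slot logits differ only by a constant shift.
confident_logits = [2.0, 1.0, 0.0]
uncertain_logits = [z - 3.0 for z in confident_logits]

alpha_confident = concentrations(confident_logits)
alpha_uncertain = concentrations(uncertain_logits)

# Both tokens yield the same expected label distribution (the softmax) ...
mean_confident = dirichlet_mean(alpha_confident)
mean_uncertain = dirichlet_mean(alpha_uncertain)

# ... but a different total concentration alpha_0.
precision_confident = sum(alpha_confident)
precision_uncertain = sum(alpha_uncertain)

print("softmax:", [round(p, 4) for p in softmax(confident_logits)])
print("alpha_0 (confident token):", round(precision_confident, 4))
print("alpha_0 (uncertain token):", round(precision_uncertain, 4))
```

A softmax alone cannot separate these two tokens. A lower alpha_0 gives a more dispersed Dirichlet around the same mean, which one could treat as a signal that the token may belong to an unseen concept. The multi-task calibration of the concentration parameters described in the abstract is not shown here.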
Related papers
- Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z)
- Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning [86.15009879251386]
We propose a novel architecture and method of explainable classification with Concept Bottleneck Models (CBM).
CBMs require an additional set of concepts to leverage.
We show a significant increase in accuracy using sparse hidden layers in CLIP-based bottleneck models.
arXiv Detail & Related papers (2024-04-04T09:43:43Z)
- Complementary Learning Subnetworks for Parameter-Efficient
Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time.
Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP.
Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
arXiv Detail & Related papers (2023-06-12T15:52:02Z)
- Boundary Unlearning [5.132489421775161]
We propose Boundary Unlearning, a rapid yet effective way to unlearn an entire class from a trained machine learning model.
We extensively evaluate Boundary Unlearning on image classification and face recognition tasks, with an expected speed-up of $17\times$ and $19\times$, respectively.
arXiv Detail & Related papers (2023-03-21T03:33:18Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Boosting Low-Data Instance Segmentation by Unsupervised Pre-training
with Saliency Prompt [103.58323875748427]
This work offers a novel unsupervised pre-training solution for low-data regimes.
Inspired by the recent success of the Prompting technique, we introduce a new pre-training method that boosts QEIS models.
Experimental results show that our method significantly boosts several QEIS models on three datasets.
arXiv Detail & Related papers (2023-02-02T15:49:03Z)
- Adversarial Learning Networks: Source-free Unsupervised Domain
Incremental Learning [0.0]
In a non-stationary environment, updating a DNN model requires parameter re-training or model fine-tuning.
We propose an unsupervised source-free method to update DNN classification models.
Unlike existing methods, our approach can update a DNN model incrementally for non-stationary source and target tasks without storing past training data.
arXiv Detail & Related papers (2023-01-28T02:16:13Z)
- Assemble Foundation Models for Automatic Code Summarization [9.53949558569201]
We propose a flexible and robust approach for automatic code summarization based on neural networks.
We assemble available foundation models, such as CodeBERT and GPT-2, into a single model named AdaMo.
We introduce two adaptive schemes from the perspective of knowledge transfer, namely continuous pretraining and intermediate finetuning.
arXiv Detail & Related papers (2022-01-13T21:38:33Z)
- Causality-aware counterfactual confounding adjustment for feature
representations learned by deep models [14.554818659491644]
Causal modeling has been recognized as a potential solution to many challenging problems in machine learning (ML).
We describe how a recently proposed counterfactual approach can still be used to deconfound the feature representations learned by deep neural network (DNN) models.
arXiv Detail & Related papers (2020-04-20T17:37:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.