Pre-training for low resource speech-to-intent applications
- URL: http://arxiv.org/abs/2103.16674v1
- Date: Tue, 30 Mar 2021 20:44:29 GMT
- Title: Pre-training for low resource speech-to-intent applications
- Authors: Pu Wang, Hugo Van hamme
- Abstract summary: We discuss a user-taught speech-to-intent (S2I) system in this paper.
The user-taught system learns from scratch from the users' spoken input paired with action demonstrations.
In this paper we combine the encoder of an end-to-end ASR system with the prior NMF/capsule network-based user-taught decoder.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing a speech-to-intent (S2I) agent which maps the users' spoken
commands to the agents' desired task actions can be challenging due to the
diverse grammatical and lexical preferences of different users. As a remedy, we
discuss a user-taught S2I system in this paper. The user-taught system learns
from scratch from the users' spoken input paired with action demonstrations,
which ensures it is fully matched to the users' way of formulating intents and their
articulation habits. The main issue is the scarce training data due to the user
effort involved. Existing state-of-the-art approaches in this setting are based on
non-negative matrix factorization (NMF) and capsule networks. In this paper we
combine the encoder of an end-to-end ASR system with the prior NMF/capsule
network-based user-taught decoder, and investigate whether pre-training
methodology can reduce training data requirements for the NMF and capsule
network. Experimental results show that the pre-trained ASR-NMF framework
significantly outperforms the other models. We also discuss the limitations of
pre-training for different types of command-and-control (C&C) applications.
Related papers
- Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z)
- On the Soft-Subnetwork for Few-shot Class Incremental Learning [67.0373924836107]
We propose a few-shot class incremental learning (FSCIL) method referred to as Soft-SubNetworks (SoftNet).
Our objective is to learn a sequence of sessions incrementally, where each session only includes a few training instances per class while preserving the knowledge of the previously learned ones.
We provide comprehensive empirical validations demonstrating that our SoftNet effectively tackles the few-shot incremental learning problem by surpassing the performance of state-of-the-art baselines over benchmark datasets.
arXiv Detail & Related papers (2022-09-15T04:54:02Z)
- SimCURL: Simple Contrastive User Representation Learning from Command Sequences [22.92215383896495]
We propose SimCURL, a contrastive self-supervised deep learning framework that learns user representation from unlabeled command sequences.
We train and evaluate our method on a real-world command sequence dataset of more than half a billion commands.
arXiv Detail & Related papers (2022-07-29T16:06:03Z)
- Contextual Squeeze-and-Excitation for Efficient Few-Shot Image Classification [57.36281142038042]
We present a new adaptive block called Contextual Squeeze-and-Excitation (CaSE) that adjusts a pretrained neural network on a new task to significantly improve performance.
We also present a new training protocol based on Coordinate-Descent called UpperCaSE that exploits meta-trained CaSE blocks and fine-tuning routines for efficient adaptation.
arXiv Detail & Related papers (2022-06-20T15:25:08Z)
- Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings [56.93025161787725]
Federated learning (FL) is a distributed machine learning paradigm that coordinates clients to train a model collaboratively without sharing local data.
We propose an attribute inference attack framework that infers sensitive attribute information of the clients from shared gradients or model parameters.
We show that the attribute inference attack is achievable for SER systems trained using FL.
arXiv Detail & Related papers (2021-12-26T16:50:42Z)
- Routing with Self-Attention for Multimodal Capsule Networks [108.85007719132618]
We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework.
To adapt the capsules to large-scale input data, we propose a novel routing by self-attention mechanism that selects relevant capsules.
This allows not only robust training with noisy video data but also scaling up the size of the capsule network compared to traditional routing methods.
arXiv Detail & Related papers (2021-12-01T19:01:26Z)
- ARTA: Collection and Classification of Ambiguous Requests and Thoughtful Actions [35.557857101679296]
Human-assisting systems must take thoughtful, appropriate actions for ambiguous user requests.
We develop a model that classifies ambiguous user requests into corresponding system actions.
Experiments show that the PU learning method achieved better performance than the general positive/negative learning method.
arXiv Detail & Related papers (2021-06-15T09:28:39Z)
- BCFNet: A Balanced Collaborative Filtering Network with Attention Mechanism [106.43103176833371]
Collaborative Filtering (CF) based recommendation methods have been widely studied.
We propose a novel recommendation model named Balanced Collaborative Filtering Network (BCFNet).
In addition, an attention mechanism is designed to better capture the hidden information within implicit feedback and strengthen the learning ability of the neural network.
arXiv Detail & Related papers (2021-03-10T14:59:23Z)
- Reinforced Imitative Graph Representation Learning for Mobile User Profiling: An Adversarial Training Perspective [21.829562421373712]
We study the problem of mobile user profiling, which is a critical component for quantifying users' characteristics in the human mobility modeling pipeline.
We propose an imitation-based mobile user profiling framework by exploiting reinforcement learning.
arXiv Detail & Related papers (2021-01-07T17:10:00Z)
- Pre-Training for Query Rewriting in A Spoken Language Understanding System [14.902583546933563]
We first propose a neural-retrieval based approach for query rewriting.
Then, inspired by the wide success of pre-trained contextual language embeddings, we propose a language-modeling (LM) based approach.
arXiv Detail & Related papers (2020-02-13T16:31:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.