Large-scale Transfer Learning for Low-resource Spoken Language
Understanding
- URL: http://arxiv.org/abs/2008.05671v1
- Date: Thu, 13 Aug 2020 03:43:05 GMT
- Title: Large-scale Transfer Learning for Low-resource Spoken Language
Understanding
- Authors: Xueli Jia, Jianzong Wang, Zhiyong Zhang, Ning Cheng, Jing Xiao
- Abstract summary: We propose an attention-based Spoken Language Understanding model together with three encoder enhancement strategies to overcome data sparsity challenge.
Cross-language transfer learning and multi-task strategies improve over the baseline by up to 4.52% and 3.89% respectively.
- Score: 31.013231069185387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end Spoken Language Understanding (SLU) models are made increasingly
large and complex to achieve state-of-the-art accuracy. However, the
increased complexity of a model also introduces a high risk of over-fitting,
which is a major challenge in SLU tasks due to the limited amount of available
data. In this paper, we propose an attention-based SLU model together with
three encoder enhancement strategies to overcome the data-sparsity challenge. The
first strategy takes a transfer-learning approach to improve the feature
extraction capability of the encoder. It is implemented by pre-training the
encoder component on a quantity of Automatic Speech Recognition (ASR) annotated
data, using the standard Transformer architecture, and then fine-tuning the
SLU model with a small amount of target labelled data. The second strategy
adopts multi-task learning: the SLU model integrates the speech
recognition model by sharing the same underlying encoder, thereby improving
robustness and generalization ability. The third strategy, drawing on the
Component Fusion (CF) idea, involves a Bidirectional Encoder Representations
from Transformers (BERT) model and aims to boost the capability of the decoder
with an auxiliary network. It hence reduces the risk of over-fitting and
indirectly augments the ability of the underlying encoder. Experiments on the
FluentAI dataset show that the cross-language transfer learning and multi-task
strategies improve over the baseline by up to 4.52% and 3.89% respectively,
compared to the baseline.
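The second strategy above can be illustrated with a minimal, framework-free sketch: an SLU head and an auxiliary ASR head share one underlying encoder, and training optimizes a weighted sum of the two task losses. All class and function names, the toy encoding, and the interpolation weight are illustrative assumptions, not the paper's actual implementation.

```python
class SharedEncoder:
    """Stand-in for the Transformer encoder shared by both tasks."""

    def encode(self, acoustic_features):
        # Toy "encoding": cumulative sums of the input frames, so both
        # heads consume the same hidden representation.
        hidden, total = [], 0.0
        for frame in acoustic_features:
            total += frame
            hidden.append(total)
        return hidden


class SLUHead:
    """Decoder/classifier for the SLU target (toy: mean hidden state)."""

    def loss(self, hidden, target):
        prediction = sum(hidden) / len(hidden)
        return (prediction - target) ** 2


class ASRHead:
    """Auxiliary speech-recognition head (toy: last hidden state)."""

    def loss(self, hidden, target):
        return (hidden[-1] - target) ** 2


def multitask_loss(features, slu_target, asr_target, weight=0.5):
    """Weighted sum of the two task losses over one shared encoding.

    `weight` is an assumed interpolation coefficient in [0, 1]; the
    abstract does not state the value the authors actually used.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    encoder, slu_head, asr_head = SharedEncoder(), SLUHead(), ASRHead()
    hidden = encoder.encode(features)
    return (weight * slu_head.loss(hidden, slu_target)
            + (1.0 - weight) * asr_head.loss(hidden, asr_target))
```

Because both heads back-propagate through the same `SharedEncoder`, the ASR task acts as a regularizer on the encoder, which is the mechanism the abstract credits for the improved robustness and generalization.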
Related papers
- A Single Transformer for Scalable Vision-Language Modeling [74.05173379908703]
We present SOLO, a single transformer for visiOn-Language mOdeling.
A unified single Transformer architecture, like SOLO, effectively addresses these scalability concerns in LVLMs.
In this paper, we introduce the first open-source training recipe for developing SOLO, an open-source 7B LVLM.
arXiv Detail & Related papers (2024-07-08T22:40:15Z) - Efficient Transformer Encoders for Mask2Former-style models [57.54752243522298]
ECO-M2F is a strategy to self-select the number of hidden layers in the encoder conditioned on the input image.
The proposed approach reduces expected encoder computational cost while maintaining performance.
It is flexible in architecture configurations, and can be extended beyond the segmentation task to object detection.
arXiv Detail & Related papers (2024-04-23T17:26:34Z) - Agent-driven Generative Semantic Communication with Cross-Modality and Prediction [57.335922373309074]
We propose a novel agent-driven generative semantic communication framework based on reinforcement learning.
In this work, we develop an agent-assisted semantic encoder with cross-modality capability, which can track semantic changes and channel conditions to perform adaptive semantic extraction and sampling.
The effectiveness of the designed models has been verified using the UA-DETRAC dataset, demonstrating the performance gains of the overall A-GSC framework.
arXiv Detail & Related papers (2024-04-10T13:24:27Z) - Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z) - End-to-end spoken language understanding using joint CTC loss and
self-supervised, pretrained acoustic encoders [13.722028186368737]
We leverage self-supervised acoustic encoders fine-tuned with Connectionist Temporal Classification to extract textual embeddings.
Our model achieves a 4% absolute improvement over the state-of-the-art (SOTA) dialogue act classification model on the DSTC2 dataset.
arXiv Detail & Related papers (2023-05-04T15:36:37Z) - Deliberation Model for On-Device Spoken Language Understanding [69.5587671262691]
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU)
We show that our approach can significantly reduce the degradation when moving from natural speech to synthetic speech training.
arXiv Detail & Related papers (2022-04-04T23:48:01Z) - BLINC: Lightweight Bimodal Learning for Low-Complexity VVC Intra Coding [5.629161809575015]
Versatile Video Coding (VVC) achieves almost twice coding efficiency compared to its predecessor, the High Efficiency Video Coding (HEVC)
This paper proposes a novel machine learning approach that jointly and separately employs two modalities of features, to simplify the intra coding decision.
arXiv Detail & Related papers (2022-01-19T19:12:41Z) - Online Deep Learning based on Auto-Encoder [4.128388784932455]
We propose a two-phase Online Deep Learning based on Auto-Encoder (ODLAE)
Based on auto-encoder, considering reconstruction loss, we extract abstract hierarchical latent representations of instances.
We devise two fusion strategies: an output-level fusion strategy, obtained by fusing the classification results of each hidden layer; and a feature-level fusion strategy, which leverages a self-attention mechanism to fuse the output of every hidden layer.
arXiv Detail & Related papers (2022-01-19T02:14:57Z) - Optimization-driven Machine Learning for Intelligent Reflecting Surfaces
Assisted Wireless Networks [82.33619654835348]
Intelligent reflecting surface (IRS) has been employed to reshape wireless channels by controlling individual scattering elements' phase shifts.
Due to the large number of scattering elements, the passive beamforming is typically challenged by high computational complexity.
In this article, we focus on machine learning (ML) approaches for improving performance in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.