I$^2$KD-SLU: An Intra-Inter Knowledge Distillation Framework for
Zero-Shot Cross-Lingual Spoken Language Understanding
- URL: http://arxiv.org/abs/2310.02594v1
- Date: Wed, 4 Oct 2023 05:45:23 GMT
- Title: I$^2$KD-SLU: An Intra-Inter Knowledge Distillation Framework for
Zero-Shot Cross-Lingual Spoken Language Understanding
- Authors: Tianjun Mao and Chenghong Zhang
- Abstract summary: We propose an Intra-Inter Knowledge Distillation framework for zero-shot cross-lingual Spoken Language Understanding (I$^2$KD-SLU).
Specifically, we not only apply intra-knowledge distillation between intent predictions or slot predictions of the same utterance in different languages, but also apply inter-knowledge distillation between intent predictions and slot predictions of the same utterance.
Our experimental results demonstrate that our proposed framework significantly improves the performance compared with the strong baselines.
- Score: 1.223779595809275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken language understanding (SLU) typically includes two subtasks: intent
detection and slot filling. Currently, it has achieved great success in
high-resource languages, but it remains challenging in low-resource
languages due to the scarcity of labeled training data. Hence, there is a
growing interest in zero-shot cross-lingual SLU. Despite the success of
existing zero-shot cross-lingual SLU models, most of them neglect the mutual
guidance between intent and slots. To address this issue, we propose
an Intra-Inter Knowledge Distillation framework for zero-shot cross-lingual
Spoken Language Understanding (I$^2$KD-SLU) to model the mutual guidance.
Specifically, we not only apply intra-knowledge distillation between intent
predictions or slot predictions of the same utterance in different languages,
but also apply inter-knowledge distillation between intent predictions and slot
predictions of the same utterance. Our experimental results demonstrate that
our proposed framework significantly improves the performance compared with the
strong baselines and achieves the new state-of-the-art performance on the
MultiATIS++ dataset, obtaining a significant improvement over the previous best
model in overall accuracy.
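The abstract does not spell out the distillation objectives, so below is a minimal PyTorch sketch of how the intra- and inter-knowledge distillation terms could look. The symmetric KL divergence, the mean-pooling of slot logits, and the projection heads (`proj_intent`, `proj_slot`) are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the authors' released code) of intra- and inter-knowledge
# distillation terms, assuming soft predictions are aligned with a symmetric KL
# divergence; the paper's exact objective may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


def symmetric_kl(p_logits, q_logits):
    """Symmetric KL divergence between two categorical distributions (as logits)."""
    log_p = F.log_softmax(p_logits, dim=-1)
    log_q = F.log_softmax(q_logits, dim=-1)
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)


def intra_kd_loss(intent_src, intent_tgt, slot_src, slot_tgt):
    """Intra-KD: same task, same utterance, different languages
    (e.g. the original input vs. a translated or code-switched version)."""
    intent_term = symmetric_kl(intent_src, intent_tgt)
    slot_term = symmetric_kl(slot_src.flatten(0, 1), slot_tgt.flatten(0, 1))
    return intent_term + slot_term


def inter_kd_loss(intent_logits, slot_logits, proj_intent, proj_slot):
    """Inter-KD: intent vs. slot predictions of the same utterance.
    Hypothetical projection heads map both tasks into a shared space,
    since their label sets differ."""
    pooled_slots = slot_logits.mean(dim=1)   # pool token-level slot logits
    z_intent = proj_intent(intent_logits)    # (batch, shared_dim)
    z_slot = proj_slot(pooled_slots)         # (batch, shared_dim)
    return symmetric_kl(z_intent, z_slot)


# Example shapes: 8 utterances, 20 tokens, 18 intents, 72 slot labels.
intent_src, intent_tgt = torch.randn(8, 18), torch.randn(8, 18)
slot_src, slot_tgt = torch.randn(8, 20, 72), torch.randn(8, 20, 72)
proj_intent, proj_slot = nn.Linear(18, 32), nn.Linear(72, 32)
loss = intra_kd_loss(intent_src, intent_tgt, slot_src, slot_tgt) \
       + inter_kd_loss(intent_src, slot_src, proj_intent, proj_slot)
```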
Related papers
- Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning [50.1035273069458]
Spoken language understanding (SLU) is a core task in task-oriented dialogue systems.
We propose MMCL, a multi-level multi-grained contrastive learning framework that applies contrastive learning at the utterance, slot, and word levels.
Our framework achieves new state-of-the-art results on two public multi-intent SLU datasets.
arXiv Detail & Related papers (2024-05-31T14:34:23Z)
- Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages [70.25418443146435]
We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data.
We employ two independent datasets of SMSs in English and Arabic, each carrying semi-structured banking transaction information.
With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic.
arXiv Detail & Related papers (2023-07-16T00:45:42Z)
- The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation [13.352795145385645]
Speech translation (ST) is a good means of pretraining speech models for end-to-end spoken language understanding.
We show that our models reach higher performance over baselines on monolingual and multilingual intent classification.
We also create new benchmark datasets for speech summarization and low-resource/zero-shot transfer from English to French or Spanish.
arXiv Detail & Related papers (2023-05-16T17:53:03Z)
- Multi-level Contrastive Learning for Cross-lingual Spoken Language Understanding [90.87454350016121]
We develop novel code-switching schemes to generate hard negative examples for contrastive learning at all levels.
We develop a label-aware joint model to leverage label semantics for cross-lingual knowledge transfer.
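Several of these entries build multilingual views or hard negatives by code-switching an utterance with a bilingual dictionary. As a rough illustration only, the toy dictionary and replacement ratio in the following Python sketch are invented and are not taken from any of the papers.

```python
# Illustrative dictionary-based code-switching; the toy English->German
# dictionary and the replacement ratio are assumptions, not from the papers.
import random

TOY_BILINGUAL_DICT = {
    "flights": "flüge",
    "morning": "morgen",
    "from": "von",
    "to": "nach",
}


def code_switch(utterance, ratio=0.5, seed=None):
    """Return a code-switched view of `utterance`, replacing a random
    subset of dictionary words with their translations."""
    rng = random.Random(seed)
    tokens = []
    for tok in utterance.split():
        translation = TOY_BILINGUAL_DICT.get(tok.lower())
        if translation is not None and rng.random() < ratio:
            tokens.append(translation)
        else:
            tokens.append(tok)
    return " ".join(tokens)


print(code_switch("show me morning flights from boston to denver", seed=0))
# e.g. "show me morning flights von boston nach denver"
```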
arXiv Detail & Related papers (2022-05-07T13:44:28Z)
- GL-CLeF: A Global-Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding [74.39024160277809]
We present the Global-Local Contrastive Learning Framework (GL-CLeF) to address this shortcoming.
Specifically, we employ contrastive learning, leveraging bilingual dictionaries to construct multilingual views of the same utterance.
GL-CLeF achieves the best performance and successfully pulls representations of similar sentences across languages closer.
arXiv Detail & Related papers (2022-04-18T13:56:58Z)
- From Good to Best: Two-Stage Training for Cross-lingual Machine Reading Comprehension [51.953428342923885]
We develop a two-stage approach to enhance the model performance.
The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer.
The second stage focuses on precision: an answer-aware contrastive learning mechanism is developed to learn the fine difference between the accurate answer and other candidates.
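For comparison only, a generic top-k margin loss is one standard surrogate for "the gold answer should land in the top-k"; it is not the HL algorithm described in this paper, and the tensor shapes below are assumed for illustration.

```python
# Generic top-k margin loss: penalize the model only when the gold candidate
# does not beat the k-th best competitor by `margin`. A standard surrogate,
# not the paper's HL algorithm.
import torch


def topk_recall_loss(scores, gold, k=5, margin=1.0):
    """scores: (batch, num_candidates) answer-candidate scores.
    gold:   (batch,) index of the correct candidate.
    Requires k < num_candidates."""
    gold_score = scores.gather(1, gold.unsqueeze(1)).squeeze(1)    # (batch,)
    masked = scores.scatter(1, gold.unsqueeze(1), float("-inf"))   # hide gold
    kth_competitor = masked.topk(k, dim=1).values[:, -1]           # k-th best rival
    return torch.clamp(margin + kth_competitor - gold_score, min=0).mean()


scores = torch.randn(4, 10)             # 4 questions, 10 answer candidates
gold = torch.tensor([0, 3, 7, 2])
print(topk_recall_loss(scores, gold, k=5))
```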
arXiv Detail & Related papers (2021-12-09T07:31:15Z)
- Towards Multi-Sense Cross-Lingual Alignment of Contextual Embeddings [41.148892848434585]
We propose a novel framework to align contextual embeddings at the sense level by leveraging cross-lingual signal from bilingual dictionaries only.
We operationalize our framework by first proposing a novel sense-aware cross entropy loss to model word senses explicitly.
We then propose a sense alignment objective on top of the sense-aware cross entropy loss for cross-lingual model pretraining, and pretrain cross-lingual models for several language pairs.
arXiv Detail & Related papers (2021-03-11T04:55:35Z)
- Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)