AMTSS: An Adaptive Multi-Teacher Single-Student Knowledge Distillation
Framework For Multilingual Language Inference
- URL: http://arxiv.org/abs/2305.07928v1
- Date: Sat, 13 May 2023 14:42:30 GMT
- Title: AMTSS: An Adaptive Multi-Teacher Single-Student Knowledge Distillation
Framework For Multilingual Language Inference
- Authors: Qianglong Chen, Feng Ji, Feng-Lin Li, Guohai Xu, Ming Yan, Ji Zhang
and Yin Zhang
- Abstract summary: AMTSS is an adaptive multi-teacher single-student distillation framework.
We first introduce an adaptive learning strategy and a teacher importance weight, which enable a student to learn effectively from max-margin teachers.
We present a shared student encoder with different projection layers to support multiple languages, which substantially reduces development and machine cost.
- Score: 27.333905128454546
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Knowledge distillation is of key importance to launching multilingual
pre-trained language models for real applications. To support cost-effective
language inference in multilingual settings, we propose AMTSS, an adaptive
multi-teacher single-student distillation framework that distills knowledge
from multiple teachers into a single student. We first introduce an adaptive
learning strategy and a teacher importance weight, which enable the student to
learn effectively from max-margin teachers and to adapt easily to new
languages. Moreover, we present a shared student encoder with different
projection layers to support multiple languages, which substantially reduces
development and machine cost. Experimental results show that AMTSS achieves
competitive results on the public XNLI dataset and on AliExpress (AE), a
realistic industrial dataset from the e-commerce scenario.
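To make the two main ideas concrete, the following is a minimal, hypothetical sketch of a multi-teacher single-student distillation step with per-teacher importance weights and a shared student encoder with per-language projection heads. It is not the authors' released code: the temperature, the loss mixing factor, and the way the per-teacher weights are produced (the paper derives them from its adaptive, max-margin-based strategy) are all illustrative assumptions.

```python
# Minimal sketch of adaptive multi-teacher single-student distillation.
# Hyperparameters (temperature, alpha, weighting) are illustrative assumptions,
# not values taken from the AMTSS paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedStudent(nn.Module):
    """One shared encoder with a separate projection (classification) head per language."""

    def __init__(self, encoder: nn.Module, hidden_size: int, num_labels: int, languages):
        super().__init__()
        self.encoder = encoder  # assumed to return a pooled [batch, hidden_size] vector
        self.heads = nn.ModuleDict(
            {lang: nn.Linear(hidden_size, num_labels) for lang in languages}
        )

    def forward(self, inputs, lang: str):
        pooled = self.encoder(inputs)       # [batch, hidden_size]
        return self.heads[lang](pooled)     # [batch, num_labels]


def distillation_loss(student, teachers, teacher_weights, inputs, labels, lang,
                      temperature=2.0, alpha=0.5):
    """Cross-entropy on gold labels plus a weighted KL term per teacher.

    `teachers[lang]` is a list of teacher models for this language and
    `teacher_weights[lang]` the matching importance weights (assumed to sum to 1);
    how those weights are computed is where the paper's adaptive strategy lives.
    """
    student_logits = student(inputs, lang)
    hard_loss = F.cross_entropy(student_logits, labels)

    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = student_logits.new_zeros(())
    for teacher, w in zip(teachers[lang], teacher_weights[lang]):
        with torch.no_grad():
            p_teacher = F.softmax(teacher(inputs) / temperature, dim=-1)
        soft_loss = soft_loss + w * F.kl_div(
            log_p_student, p_teacher, reduction="batchmean"
        ) * temperature ** 2

    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```

Under this layout, supporting a new language only adds a projection head and its teacher(s) on top of the shared encoder, which is where the reduction in development and machine cost comes from.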
Related papers
- Efficient Multilingual ASR Finetuning via LoRA Language Experts [59.27778147311189]
This paper proposes an efficient finetuning framework for customized multilingual ASR via prepared LoRA language experts based on Whisper. Through LoRA expert fusion or knowledge distillation, our approach achieves better recognition performance on target languages than standard fine-tuning methods. Experimental results demonstrate that the proposed models yield approximately 10% and 15% relative performance gains in language-aware and language-agnostic scenarios, respectively.
arXiv Detail & Related papers (2025-06-11T07:06:27Z)
- Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback [11.889826908536941]
We present the first large-scale simulation of multilingual tutor-student interactions using large language models (LLMs). A stronger model plays the role of the tutor, generating feedback in the form of hints, while a weaker model simulates the student. Our study examines how student input language, teacher feedback language, model choice, and language resource level jointly influence performance.
arXiv Detail & Related papers (2025-06-05T11:53:04Z) - LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models [89.13128402847943]
We present LUSIFER, a novel zero-shot approach that adapts LLM-based embedding models for multilingual tasks without requiring multilingual supervision.
LUSIFER's architecture combines a multilingual encoder, serving as a language-universal learner, with an LLM-based embedding model optimized for embedding-specific tasks.
We introduce a new benchmark encompassing 5 primary embedding tasks, 123 diverse datasets, and coverage across 14 languages.
arXiv Detail & Related papers (2025-01-01T15:43:07Z)
- Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models [104.96990850774566]
We propose a Multi-lingual Ability Extraction and Transfer approach, named MAET.
Our key idea is to decompose and extract language-agnostic ability-related weights from large language models.
Experimental results show that MAET can effectively and efficiently extract and transfer advanced abilities, outperforming training-based baseline methods.
arXiv Detail & Related papers (2024-10-10T11:23:18Z)
- Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions? [42.37657013017192]
We show that instruction-tuning on parallel instead of monolingual corpora improves cross-lingual instruction-following capabilities by up to 9.9%.
We also conduct a human annotation study to understand the alignment between human-based and GPT-4-based evaluation within multilingual chat scenarios.
arXiv Detail & Related papers (2024-02-21T11:07:07Z)
- UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset [69.33424532827608]
Open-source large language models (LLMs) have gained significant strength across diverse fields.
In this work, we construct an open-source multilingual supervised fine-tuning dataset.
The resulting UltraLink dataset comprises approximately 1 million samples across five languages.
arXiv Detail & Related papers (2024-02-07T05:05:53Z)
- UM4: Unified Multilingual Multiple Teacher-Student Model for Zero-Resource Neural Machine Translation [102.04003089261761]
Multilingual neural machine translation (MNMT) enables one-pass translation using shared semantic space for all languages.
We propose a novel method named the Unified Multilingual Multiple Teacher-Student Model for NMT (UM4).
Our method unifies source-teacher, target-teacher, and pivot-teacher models to guide the student model for the zero-resource translation.
arXiv Detail & Related papers (2022-07-11T14:22:59Z)
- Large-scale Bilingual Language-Image Contrastive Learning [17.19890778916312]
We collect 1.1 billion image-text pairs (708 million Korean and 476 million English) and train a bilingual multimodal model named KELIP.
We introduce simple yet effective training schemes, including MAE pre-training and multi-crop augmentation.
Experiments demonstrate that a model trained with such training schemes shows competitive performance in both languages.
arXiv Detail & Related papers (2022-03-28T03:02:03Z)
- Breaking Down Multilingual Machine Translation [74.24795388967907]
We show that multilingual training is beneficial to encoders in general, while it only benefits decoders for low-resource languages (LRLs).
Our many-to-one models for high-resource languages and one-to-many models for LRLs outperform the best results reported by Aharoni et al.
arXiv Detail & Related papers (2021-10-15T14:57:12Z)
- Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
arXiv Detail & Related papers (2021-09-28T04:43:11Z)
- Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation [20.33235443471006]
We propose a knowledge distillation approach to extend an English language-vision model (teacher) into an equally effective multilingual and code-mixed model (student).
We also create the large-scale multilingual and code-mixed VQA dataset in eleven different language setups.
Experimental results and in-depth analysis show the effectiveness of the proposed VQA model over the pre-trained language-vision models on eleven diverse language setups.
arXiv Detail & Related papers (2021-09-10T03:47:29Z)
- MergeDistill: Merging Pre-trained Language Models using Distillation [5.396915402673246]
We propose MergeDistill, a framework to merge pre-trained LMs in a way that can best leverage their assets with minimal dependencies.
We demonstrate the applicability of our framework in a practical setting by leveraging pre-existing teacher LMs and training student LMs that perform competitively with, or even outperform, teacher LMs trained on several orders of magnitude more data with a fixed model capacity.
arXiv Detail & Related papers (2021-06-05T08:22:05Z)
- LightMBERT: A Simple Yet Effective Method for Multilingual BERT Distillation [45.65004479806485]
Multilingual pre-trained language models have shown impressive performance on cross-lingual natural language understanding tasks.
These models are computationally intensive and difficult to deploy on resource-restricted devices.
We propose a simple yet effective distillation method (LightMBERT) for transferring the cross-lingual generalization ability of the multilingual BERT to a small student model.
arXiv Detail & Related papers (2021-03-11T02:24:41Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)