Massively Multilingual Shallow Fusion with Large Language Models
- URL: http://arxiv.org/abs/2302.08917v1
- Date: Fri, 17 Feb 2023 14:46:38 GMT
- Title: Massively Multilingual Shallow Fusion with Large Language Models
- Authors: Ke Hu, Tara N. Sainath, Bo Li, Nan Du, Yanping Huang, Andrew M. Dai,
Yu Zhang, Rodrigo Cabrera, Zhifeng Chen, Trevor Strohman
- Abstract summary: We train a single multilingual language model (LM) for shallow fusion in multiple languages.
Compared to a dense LM of similar computation during inference, GLaM reduces the WER of an English long-tail test set by 4.4% relative.
In a multilingual shallow fusion task, GLaM improves 41 out of 50 languages with an average relative WER reduction of 3.85%, and a maximum reduction of 10%.
- Score: 62.76735265311028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large language models (LLM) have made impressive progress in natural
language processing, it remains unclear how to utilize them in improving
automatic speech recognition (ASR). In this work, we propose to train a single
multilingual language model (LM) for shallow fusion in multiple languages. We
push the limits of the multilingual LM to cover up to 84 languages by scaling
up using a mixture-of-experts LLM, i.e., generalist language model (GLaM). When
the number of experts increases, GLaM dynamically selects only two at each
decoding step to keep the inference computation roughly constant. We then apply
GLaM to a multilingual shallow fusion task based on a state-of-the-art
end-to-end model. Compared to a dense LM of similar computation during
inference, GLaM reduces the WER of an English long-tail test set by 4.4%
relative. In a multilingual shallow fusion task, GLaM improves 41 out of 50
languages with an average relative WER reduction of 3.85%, and a maximum
reduction of 10%. Compared to the baseline model, GLaM achieves an average WER
reduction of 5.53% over 43 languages.
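Shallow fusion interpolates the end-to-end ASR model's token scores with an external LM score at every decoding step; GLaM's expert routing happens inside the LM, so the fusion itself looks the same as with a dense LM. Below is a minimal sketch of the per-step score interpolation; the function name, the lm_weight value, and the toy probabilities are illustrative assumptions, not the paper's settings.

```python
import math

def shallow_fusion_step(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """Fuse per-token scores: score(y) = log p_asr(y) + lm_weight * log p_lm(y).
    Returns the highest-scoring token for one greedy decoding step."""
    fused = {
        token: asr_lp + lm_weight * lm_log_probs.get(token, math.log(1e-9))
        for token, asr_lp in asr_log_probs.items()
    }
    return max(fused, key=fused.get)

# Toy candidates at one step: the LM corrects a long-tail spelling that the
# acoustic model slightly prefers.
asr = {"cat": math.log(0.45), "kat": math.log(0.55)}
lm = {"cat": math.log(0.90), "kat": math.log(0.01)}
print(shallow_fusion_step(asr, lm))  # -> "cat"
```

In beam-search decoding the same fused score would be added to each hypothesis' running total rather than used greedily.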
Related papers
- Language Imbalance Driven Rewarding for Multilingual Self-improving [35.1576728251478]
Large Language Models (LLMs) have achieved state-of-the-art performance across numerous tasks.
However, these gains are concentrated in a handful of dominant languages; this imbalance, while limiting broader applications, generates a natural preference ranking between languages.
We propose Language Imbalance Driven Rewarding, where the inherent imbalance between dominant and non-dominant languages is leveraged as a reward signal.
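As a rough illustration of how such a ranking could be turned into preference data for reward-style training, the sketch below pairs a dominant-language answer with a non-dominant-language answer to the same prompt; the pairing rule and field names are assumptions for illustration, not the paper's exact recipe.

```python
def build_preference_pairs(answers_by_prompt, dominant="en"):
    """For each prompt, treat the dominant-language answer as 'chosen' and each
    non-dominant-language answer as 'rejected' (assumed pairing logic)."""
    pairs = []
    for prompt, answers in answers_by_prompt.items():
        chosen = answers[dominant]
        for lang, rejected in answers.items():
            if lang != dominant:
                pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs

data = {"What is 2+2?": {"en": "4, since 2+2=4.", "sw": "Nne."}}
print(build_preference_pairs(data))
```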
arXiv Detail & Related papers (2024-10-11T16:32:05Z) - The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics [74.99898531299148]
This research examines vocabulary trimming (VT), which restricts embedding entries to the language of interest to improve time and memory efficiency.
We apply two heuristics for trimming the full vocabulary, Unicode-based script filtering and corpus-based selection, to different language families and model sizes.
It is found that VT reduces the memory usage of small models by nearly 50% and improves generation speed by up to 25%.
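A minimal sketch of the Unicode-based script filtering variant is given below; the function name, the LATIN-only default, and the name-prefix approximation of Unicode scripts are assumptions, not the paper's exact rule.

```python
import unicodedata

def trim_vocab_by_script(vocab, allowed_prefixes=("LATIN",)):
    """Keep tokens whose alphabetic characters all have Unicode character
    names starting with an allowed script prefix (rough script filter)."""
    def keep(token):
        return all(
            any(unicodedata.name(ch, "").startswith(p) for p in allowed_prefixes)
            for ch in token
            if ch.isalpha()
        )
    return {tok: idx for tok, idx in vocab.items() if keep(tok)}

vocab = {"hello": 0, "мир": 1, "##ing": 2, "日本": 3}
print(trim_vocab_by_script(vocab))  # keeps only the Latin-script entries
```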
arXiv Detail & Related papers (2023-11-16T09:35:50Z) - Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations [59.056367787688146]
This paper pioneers the exploration and training of powerful Multilingual Math Reasoning (xMR) LLMs.
By utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages.
arXiv Detail & Related papers (2023-10-31T08:09:20Z) - PolyLM: An Open Source Polyglot Large Language Model [57.64420154135178]
We present PolyLM, a multilingual large language model (LLM) trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B.
To enhance its multilingual capabilities, we 1) integrate bilingual data into training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training.
Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning.
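As a sketch of how the non-English share could be scheduled during pre-training, the snippet below ramps the proportion from 30% to 60%; the linear ramp, function name, and step counts are illustrative assumptions (the paper describes a staged curriculum).

```python
def non_english_fraction(step, total_steps, start=0.30, end=0.60):
    """Share of non-English tokens to sample for the batch at `step`,
    ramped linearly from `start` to `end` over pre-training."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return start + t * (end - start)

for step in (0, 250_000, 500_000):
    print(step, round(non_english_fraction(step, 500_000), 2))
# 0 -> 0.3, 250000 -> 0.45, 500000 -> 0.6
```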
arXiv Detail & Related papers (2023-07-12T09:00:37Z) - Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing a few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
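A minimal sketch of assembling such a linguistically diverse few-shot prompt follows; the template, field names, and exemplars are illustrative assumptions rather than the paper's prompt format.

```python
def build_translation_prompt(exemplars, source_text):
    """Concatenate exemplars drawn from several high-resource languages,
    then append the query sentence to be translated into English."""
    blocks = [f"{lang}: {src}\nEnglish: {tgt}" for lang, src, tgt in exemplars]
    blocks.append(f"Source: {source_text}\nEnglish:")
    return "\n\n".join(blocks)

exemplars = [
    ("French", "Bonjour le monde", "Hello world"),
    ("Spanish", "¿Cómo estás?", "How are you?"),
    ("German", "Guten Morgen", "Good morning"),
]
# Query in a low-resource language not covered by the exemplars.
print(build_translation_prompt(exemplars, "Habari ya asubuhi"))
```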
arXiv Detail & Related papers (2023-06-20T08:27:47Z) - Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in a single language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
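The amalgamation step can be pictured as averaging the language-branch teachers' softened output distributions and training the single student against that average; the temperature, uniform averaging, and cross-entropy form below are schematic assumptions, not the paper's exact objective.

```python
import math

def softmax(logits, temperature=2.0):
    exps = [math.exp(l / temperature) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(student_logits, teacher_logits_per_branch, temperature=2.0):
    """Cross-entropy of the student's softened distribution against the
    average of the language-branch teachers' softened distributions."""
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_per_branch]
    avg_teacher = [sum(p) / len(p) for p in zip(*teacher_probs)]
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(avg_teacher, student_probs))

teachers = [[2.0, 0.5, 0.1], [1.8, 0.7, 0.2]]  # two language-branch teachers
print(round(distillation_loss([1.5, 0.6, 0.3], teachers), 3))
```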
arXiv Detail & Related papers (2020-10-27T13:12:17Z)