Related papers: The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics

The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics

URL: http://arxiv.org/abs/2311.09709v2
Date: Sun, 28 Apr 2024 23:43:53 GMT
Title: The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics
Authors: Nikolay Bogoychev, Pinzhen Chen, Barry Haddow, Alexandra Birch,
Abstract summary: This research examines vocabulary trimming (VT) inspired by restricting embedding entries to the language of interest to bolster time and memory efficiency. We apply two languages to trim the full vocabulary - Unicode-based script filtering and corpus-based selection - to different language families and sizes. It is found that VT reduces the memory usage of small models by nearly 50% and has an upper bound of 25% improvement in generation speed.
Score: 74.99898531299148
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deploying large language models (LLMs) encounters challenges due to intensive computational and memory requirements. Our research examines vocabulary trimming (VT) inspired by restricting embedding entries to the language of interest to bolster time and memory efficiency. While such modifications have been proven effective in tasks like machine translation, tailoring them to LLMs demands specific modifications given the diverse nature of LLM applications. We apply two language heuristics to trim the full vocabulary - Unicode-based script filtering and corpus-based selection - to different LLM families and sizes. The methods are straightforward, interpretable, and easy to implement. It is found that VT reduces the memory usage of small models by nearly 50% and has an upper bound of 25% improvement in generation speed. Yet, we reveal the limitations of these methods in that they do not perform consistently well for each language with diminishing returns in larger models.

Related papers

Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation [39.60572668223083]
State-of-the-art Large Language Models (LLMs) can handle other languages, due to language contamination or some degree of multilingual pretraining data, but not optimized for non-English languages. In this work, we thoroughly compare a variety of vocabulary adaptation techniques for optimizing English LLMs for the Italian language. We adapt two LLMs: Mistral-7b-v0.1, reducing token fertility by 25%, and Llama-3.1-8B, optimizing the vocabulary and reducing the number of parameters by 1 billion.
arXiv Detail & Related papers (2025-04-23T18:12:27Z)
Franken-Adapter: Cross-Lingual Adaptation of LLMs by Embedding Surgery [31.516243610548635]
We present $textitFranken-Adapter$, a modular language adaptation approach for decoder-only Large Language Models. Our method begins by creating customized vocabularies for target languages and performing language adaptation through embedding tuning on multilingual data. Experiments on $ttGemma2$ models with up to 27B parameters demonstrate improvements of up to 20% across 96 languages, spanning both discriminative and generative tasks.
arXiv Detail & Related papers (2025-02-12T00:38:11Z)
Enhancing Code Generation for Low-Resource Languages: No Silver Bullet [55.39571645315926]
Large Language Models (LLMs) rely on large and diverse datasets to learn syntax, semantics, and usage patterns of programming languages. For low-resource languages, the limited availability of such data hampers the models' ability to generalize effectively. We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages.
arXiv Detail & Related papers (2025-01-31T12:23:28Z)
Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts. We find that Llama Instruct and Mistral models exhibit high degrees of language confusion. We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, effectively being crosslingual? This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
Exploring Design Choices for Building Language-Specific LLMs [36.32622880071991]
We study building language-specific language models by adapting monolingual and multilingual models. We find that the initial performance of LLM does not always correlate with the final performance after the adaptation.
arXiv Detail & Related papers (2024-06-20T18:47:43Z)
How Can We Effectively Expand the Vocabulary of LLMs with 0.01GB of Target Language Text? [38.1823640848362]
Large language models (LLMs) have shown remarkable capabilities in many languages beyond English. LLMs require more inference steps when generating non-English text due to their reliance on English-centric tokenizers and vocabulary. Vocabulary expansion with target language tokens is a widely used cross-lingual vocabulary adaptation approach to remedy this issue.
arXiv Detail & Related papers (2024-06-17T12:42:34Z)
Accelerating Multilingual Language Model for Excessively Tokenized Languages [3.5570874721859016]
tokenizers in large language models (LLMs) often fragment a text into character or Unicode-level tokens in non-Roman alphabetic languages. We introduce a simple yet effective framework to accelerate text generation in such languages.
arXiv Detail & Related papers (2024-01-19T12:26:57Z)
Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally. Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
Chain-of-Dictionary Prompting Elicits Translation in Large Language Models [100.47154959254937]
Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT) We present a novel method, CoD, which augments LLMs with prior knowledge with the chains of multilingual dictionaries for a subset of input words to elicit translation abilities.
arXiv Detail & Related papers (2023-05-11T05:19:47Z)
UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks. Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages. We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.