AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation
- URL: http://arxiv.org/abs/2503.19693v2
- Date: Fri, 01 Aug 2025 10:53:31 GMT
- Title: AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation
- Authors: Itay Nakash, Nitay Calderon, Eyal Ben David, Elad Hoffer, Roi Reichart,
- Abstract summary: AdaptiVocab is an end-to-end approach for vocabulary adaptation.<n>It modifies the vocabulary by replacing tokens with domain-specific n-gram-based tokens.<n>Our results show that AdaptiVocab reduces token usage by over 25% without compromising performance.
- Score: 21.71269358186483
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have shown impressive versatility as general purpose models. However, their broad applicability comes at a high-cost computational overhead, particularly in auto-regressive decoding where each step requires a forward pass. In domain-specific settings, general-purpose capabilities are unnecessary and can be exchanged for efficiency. In this work, we take a novel perspective on domain adaptation, reducing latency and computational costs by adapting the vocabulary to focused domains of interest. We introduce AdaptiVocab, an end-to-end approach for vocabulary adaptation, designed to enhance LLM efficiency in low-resource domains. AdaptiVocab can be applied to any tokenizer and architecture, modifying the vocabulary by replacing tokens with domain-specific n-gram-based tokens, thereby reducing the number of tokens required for both input processing and output generation. AdaptiVocab initializes new n-token embeddings using an exponentially weighted combination of existing embeddings and employs a lightweight fine-tuning phase that can be efficiently performed on a single GPU. We evaluate two 7B LLMs across three niche domains, assessing efficiency, generation quality, and end-task performance. Our results show that AdaptiVocab reduces token usage by over 25% without compromising performance
Related papers
- AdaptBPE: From General Purpose to Specialized Tokenizers [18.70903226766322]
We propose a post-training adaptation strategy that selectively replaces low-utility tokens with more relevant ones based on their frequency in an adaptation corpus.<n>Our algorithm identifies the token inventory that most effectively encodes the adaptation corpus for a given target vocabulary size.<n>This method serves as a lightweight adaptation mechanism, akin to a vocabulary fine-tuning process, enabling optimized tokenization for specific domains or tasks.
arXiv Detail & Related papers (2026-01-29T12:59:40Z) - Vocabulary Customization for Efficient Domain-Specific LLM Deployment [7.455386813746299]
General-domain tokenizer fails to capture frequent domain-specific terms, leading to higher token fertility and thus a decrease in processing speed.<n>We address this limitation by augmenting the pretrained vocabulary with a set of domain-specific tokens.<n> Evaluated on real-world e-Commerce use-cases, the augmented tokenizer significantly shortens input sequences by up to 20%.
arXiv Detail & Related papers (2025-09-30T11:43:12Z) - Can Smaller LLMs do better? Unlocking Cross-Domain Potential through Parameter-Efficient Fine-Tuning for Text Summarization [15.402666674186937]
We leverage parameter-efficient fine-tuning techniques (PEFTs) on high-resource datasets to improve performance on unseen low-resource domains.<n>We benchmark six PEFTs with textttLlama-3-8B-Instruct on 14 training datasets from the Scientific, Medical, Legal, and News domains.<n>Experiments show that for low-resource domains, inference using Within-Domain Adapters can achieve better performance than Few-Shot.
arXiv Detail & Related papers (2025-09-01T09:58:52Z) - Efficient Multi-modal Long Context Learning for Training-free Adaptation [96.21248144937627]
This paper introduces Efficient Multi-Modal Long Context Learning (EMLoC)<n>It embeds demonstration examples directly into the model input.<n>It condenses long-context multimodal inputs into compact, task-specific memory representations.
arXiv Detail & Related papers (2025-05-26T10:49:44Z) - HYPEROFA: Expanding LLM Vocabulary to New Languages via Hypernetwork-Based Embedding Initialization [50.27950279695363]
Many pre-trained language models (PLMs) exhibit suboptimal performance on mid- and low-resource languages.<n>A common strategy to address this is to introduce new tokens specific to the target languages, initialize their embeddings, and apply continual pre-training on target-language data.<n>We propose HYPEROFA, a hypernetwork-based approach for more adaptive token embedding.
arXiv Detail & Related papers (2025-04-21T19:40:32Z) - Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation [72.28364940168092]
Open-vocabulary semantic segmentation models associate vision and text to label pixels from an undefined set of classes using textual queries.
We introduce Semantic Library Adaptation (SemLA), a novel framework for training-free, test-time domain adaptation.
arXiv Detail & Related papers (2025-03-27T17:59:58Z) - InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression [1.8893427856534721]
We propose InternVL-X, which outperforms the InternVL model in both performance and efficiency.
By utilizing 20% or fewer visual tokens, InternVL-X achieves state-of-the-art performance on 7 public MLLM benchmarks, and improves the average metric by 2.34% across 12 tasks.
arXiv Detail & Related papers (2025-03-27T09:31:35Z) - Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages [10.418542753869433]
Low-resource languages (LRLs) face significant challenges in natural language processing (NLP) due to limited data.<n>Current state-of-the-art large language models (LLMs) still struggle with LRLs.<n>Small multilingual models (mLMs) such as mBERT and XLM-R offer greater promise due to a better fit of their capacity to low training data sizes.
arXiv Detail & Related papers (2025-02-14T13:10:39Z) - Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling [10.985444895887207]
We introduce Over-Tokenized Transformers, a framework that decouples input and output vocabularies to improve language modeling performance.
We uncover a log-linear relationship between input vocabulary size and training loss, demonstrating that larger input vocabularies consistently enhance model performance.
Our findings highlight the importance of tokenization in scaling laws and provide practical insight for tokenizer design.
arXiv Detail & Related papers (2025-01-28T14:15:42Z) - Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction [62.8375542401319]
Multimodal Large Language Models (MLLMs) encode the input image(s) as vision tokens and feed them into the language backbone.<n>The number of vision tokens increases quadratically as the image resolutions, leading to huge computational costs.<n>We propose a greedy search algorithm (G-Search) to find the least number of vision tokens to keep at each layer from the shallow to the deep.
arXiv Detail & Related papers (2024-11-30T18:54:32Z) - ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models [32.6661928486072]
ATP-LLaVA is a novel approach that adaptively determines instance-specific token pruning ratios for each Large Language Model layer.<n>Our approach reduces the average token count by 75% while maintaining performance, with only a minimal 1.9% degradation across seven widely used benchmarks.
arXiv Detail & Related papers (2024-11-30T11:42:35Z) - Retrofitting Large Language Models with Dynamic Tokenization [3.608780819053423]
We propose retrofitting current language models with dynamic tokenization.<n>We merge frequent subword sequences in a batch, then apply a pre-trained embedding-prediction hypernetwork to compute the token embeddings on-the-fly.<n>We find that dynamic tokenization can mitigate the limitations of static tokenization by substantially improving inference speed and promoting fairness across languages.
arXiv Detail & Related papers (2024-11-27T17:51:58Z) - How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario [72.02391485962127]
Speech Self-Supervised Learning (SSL) models achieve impressive performance on Automatic Speech Recognition (ASR)<n>In low-resource language ASR, they encounter the domain mismatch problem between pre-trained and low-resource languages.<n>We extend a conventional efficient fine-tuning scheme based on the adapter to handle these issues.
arXiv Detail & Related papers (2024-11-27T10:51:00Z) - How to Make LLMs Strong Node Classifiers? [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, such as Graph Neural Networks (GNNs) and Graph Transformers (GTs)<n>We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art (SOTA) GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z) - Adaptive Draft-Verification for Efficient Large Language Model Decoding [24.347886232342862]
Large language model (LLM) decoding involves generating a sequence of tokens based on a given context.
The typical autoregressive decoding method requires a separate forward pass through the model for each token generated.
We introduce ADED, which accelerates LLM decoding without requiring fine-tuning.
arXiv Detail & Related papers (2024-06-27T22:20:39Z) - OpenDAS: Open-Vocabulary Domain Adaptation for 2D and 3D Segmentation [54.98688607911399]
We propose the task of open-vocabulary domain adaptation to infuse domain-specific knowledge into Vision-Language Models (VLMs)
Existing VLM adaptation methods improve performance on base (training) queries, but fail to preserve the open-set capabilities of VLMs on novel queries.
Our approach is the only parameter-efficient method that consistently surpasses the original VLM on novel classes.
arXiv Detail & Related papers (2024-05-30T15:16:06Z) - APoLLo: Unified Adapter and Prompt Learning for Vision Language Models [58.9772868980283]
We present APoLLo, a unified multi-modal approach that combines Adapter and Prompt learning for Vision-Language models.
APoLLo achieves a relative gain up to 6.03% over MaPLe (SOTA) on novel classes for 10 diverse image recognition datasets.
arXiv Detail & Related papers (2023-12-04T01:42:09Z) - Non-Parametric Unsupervised Domain Adaptation for Neural Machine
Translation [61.27321597981737]
$k$NN-MT has shown the promising capability of directly incorporating the pre-trained neural machine translation (NMT) model with domain-specific token-level $k$-nearest-neighbor retrieval.
We propose a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for $k$-nearest-neighbor retrieval.
arXiv Detail & Related papers (2021-09-14T11:50:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.