Vocabulary Customization for Efficient Domain-Specific LLM Deployment
- URL: http://arxiv.org/abs/2509.26124v1
- Date: Tue, 30 Sep 2025 11:43:12 GMT
- Title: Vocabulary Customization for Efficient Domain-Specific LLM Deployment
- Authors: Christian Herold, Michael Kozielski, Nicholas Santavas, Yannick Versley, Shahram Khadivi,
- Abstract summary: General-domain tokenizer fails to capture frequent domain-specific terms, leading to higher token fertility and thus a decrease in processing speed.<n>We address this limitation by augmenting the pretrained vocabulary with a set of domain-specific tokens.<n> Evaluated on real-world e-Commerce use-cases, the augmented tokenizer significantly shortens input sequences by up to 20%.
- Score: 7.455386813746299
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When using an LLM to process text outside the training domain(s), an often overlooked factor is vocabulary mismatch, where the general-domain tokenizer fails to capture frequent domain-specific terms, leading to higher token fertility and thus a decrease in processing speed due to suboptimal sub-word splits. We address this limitation by augmenting the pretrained vocabulary with a set of domain-specific tokens. To this end, we design an algorithm that extends an existing tokenizer while guaranteeing it never decreases tokenization efficiency: every input sequence is segmented into at most the same number of tokens as before. Evaluated on real-world e-Commerce use-cases, the augmented tokenizer significantly shortens input sequences by up to 20% and reduces inference latency on downstream tasks while preserving predictive quality. We further analyze secondary effects, such as the impact on forward pass speed and the rate at which the model adopts the newly introduced tokens, to illustrate the broader benefits of vocabulary adaptation.
Related papers
- LiteToken: Removing Intermediate Merge Residues From BPE Tokenizers [76.59130257385826]
Intermediate merge residues in BPE vocabularies are frequent during merge learning so that retained in the final vocabulary, but are mostly further merged and rarely emitted when tokenizing the corpus during tokenizer usage.<n>We present a systematic empirical characterization of this phenomenon across commonly used tokenizers and introduce LiteToken, a simple method for removing residue tokens.<n>Experiments show that LiteToken reduces token fragmentation, reduces parameters, and improves robustness to noisy or misspelled inputs, while preserving overall performance.
arXiv Detail & Related papers (2026-02-04T16:19:05Z) - Accelerate Speculative Decoding with Sparse Computation in Verification [49.74839681322316]
Speculative decoding accelerates autoregressive language model inference by verifying multiple draft tokens in parallel.<n>Existing sparsification methods are designed primarily for standard token-by-token autoregressive decoding.<n>We propose a sparse verification framework that jointly sparsifies attention, FFN, and MoE components during the verification stage to reduce the dominant computation cost.
arXiv Detail & Related papers (2025-12-26T07:53:41Z) - Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning [0.0]
Pretrained language models (LLMs) are often constrained by their fixed tokenization schemes.<n> Tokenadapt is a model-agnostic tokenizer transplantation method.<n>Our framework introduces two innovations: first, Tokenadapt, a model-agnostic tokenizer transplantation method, and second, novel pre-tokenization for multi-word Supertokens.
arXiv Detail & Related papers (2025-05-14T19:00:27Z) - HYPEROFA: Expanding LLM Vocabulary to New Languages via Hypernetwork-Based Embedding Initialization [50.27950279695363]
Many pre-trained language models (PLMs) exhibit suboptimal performance on mid- and low-resource languages.<n>A common strategy to address this is to introduce new tokens specific to the target languages, initialize their embeddings, and apply continual pre-training on target-language data.<n>We propose HYPEROFA, a hypernetwork-based approach for more adaptive token embedding.
arXiv Detail & Related papers (2025-04-21T19:40:32Z) - Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling [90.86991492288487]
evaluating constraint on every token can be prohibitively expensive.<n> LCD can distort the global distribution over strings, sampling tokens based only on local information.<n>We show that our approach is superior to state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-07T18:30:18Z) - Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier [4.300681074103876]
Pre-tokenization causes the distribution of tokens in a corpus to skew towards common, full-length words.<n>We propose BoundlessB, a modified BPE algorithm that relaxes the pretoken boundary constraint.<n>Our approach selectively merges two complete pretokens into a larger unit we term a superword.
arXiv Detail & Related papers (2025-03-31T19:36:29Z) - AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation [21.71269358186483]
AdaptiVocab is an end-to-end approach for vocabulary adaptation.<n>It modifies the vocabulary by replacing tokens with domain-specific n-gram-based tokens.<n>Our results show that AdaptiVocab reduces token usage by over 25% without compromising performance.
arXiv Detail & Related papers (2025-03-25T14:18:21Z) - SEP: Self-Enhanced Prompt Tuning for Visual-Language Model [93.94454894142413]
We introduce a novel approach named Self-Enhanced Prompt Tuning (SEP)
SEP explicitly incorporates discriminative prior knowledge to enhance both textual-level and visual-level embeddings.
Comprehensive evaluations across various benchmarks and tasks confirm SEP's efficacy in prompt tuning.
arXiv Detail & Related papers (2024-05-24T13:35:56Z) - Improving Tokenisation by Alternative Treatment of Spaces [7.596737214110957]
We experiment with an alternative tokenisation approach where spaces are always treated as individual tokens.
We find that our modified algorithms lead to improved performance on downstream NLP tasks.
arXiv Detail & Related papers (2022-04-08T13:22:30Z) - Domain Adversarial Fine-Tuning as an Effective Regularizer [80.14528207465412]
In Natural Language Processing (NLP), pretrained language models (LMs) that are transferred to downstream tasks have been recently shown to achieve state-of-the-art results.
Standard fine-tuning can degrade the general-domain representations captured during pretraining.
We introduce a new regularization technique, AFTER; domain Adversarial Fine-Tuning as an Effective Regularizer.
arXiv Detail & Related papers (2020-09-28T14:35:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.