Related papers: Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe

Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe

URL: http://arxiv.org/abs/2406.04165v2
Date: Thu, 21 Nov 2024 09:30:51 GMT
Title: Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe
Authors: Alicja Ziarko, Albert Q. Jiang, Bartosz Piotrowski, Wenda Li, Mateja Jamnik, Piotr Miłoś,
Abstract summary: In this paper, we study how to contrastively train text embedding models in a compute-optimal fashion. Our innovation is an algorithm that produces optimal configurations of model sizes, data quantities, and fine-tuning methods for text-embedding models at different computational budget levels.
Score: 10.34105218186634
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text embeddings are essential for many tasks, such as document retrieval, clustering, and semantic similarity assessment. In this paper, we study how to contrastively train text embedding models in a compute-optimal fashion, given a suite of pre-trained decoder-only language models. Our innovation is an algorithm that produces optimal configurations of model sizes, data quantities, and fine-tuning methods for text-embedding models at different computational budget levels. The resulting recipe, which we obtain through extensive experiments, can be used by practitioners to make informed design choices for their embedding models. Specifically, our findings suggest that full fine-tuning and low-rank adaptation fine-tuning produce optimal models at lower and higher computational budgets respectively.

Related papers

Empirical Comparison of Encoder-Based Language Models and Feature-Based Supervised Machine Learning Approaches to Automated Scoring of Long Essays [8.899249868081956]
Long context may impose challenges for encoder-only language models in text processing.<n>This study trained several commonly used encoder-based language models for automated scoring of long essays.
arXiv Detail & Related papers (2026-01-06T02:17:45Z)
Generalized Top-k Mallows Model for Ranked Choices [7.389630498367403]
We present a novel sampling scheme tailored to top-k Mallows models and an efficient algorithm for computing choice probabilities.<n>We also present an active learning algorithm for estimating the model parameters from observed choice data.<n>These contributions provide new tools for analysis and prediction in critical decision-making scenarios.
arXiv Detail & Related papers (2025-10-24T21:49:21Z)
Variational Prefix Tuning for Diverse and Accurate Code Summarization Using Pre-trained Language Models [3.06414751922655]
Variational Prefix Tuning (VPT) is a novel approach that enhances pre-trained models' ability to generate diverse yet accurate sets of summaries.<n>Our method integrates a Conditional Variational Autoencoder (CVAE) framework as a modular component into pre-trained models.
arXiv Detail & Related papers (2025-05-14T01:46:56Z)
Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions [65.89403417819764]
We quantify the impact of design choices on language model capabilities. By incorporating features besides model size and number of training tokens, we can achieve a relative 3-28% increase in ability to predict downstream performance.
arXiv Detail & Related papers (2025-03-05T19:46:04Z)
A Post-Training Enhanced Optimization Approach for Small Language Models [0.0]
This paper proposes a continuous post-training alignment data construction method for small language models. The core of this method is based on the data guidance of large models, optimizing the diversity and accuracy of alignment data.
arXiv Detail & Related papers (2024-11-05T09:32:26Z)
Ensembling Finetuned Language Models for Text Classification [55.15643209328513]
Finetuning is a common practice across different communities to adapt pretrained models to particular tasks. ensembles of neural networks are typically used to boost performance and provide reliable uncertainty estimates. We present a metadataset with predictions from five large finetuned models on six datasets and report results of different ensembling strategies.
arXiv Detail & Related papers (2024-10-25T09:15:54Z)
An Online Learning Approach to Prompt-based Selection of Generative Models [23.91197677628145]
An online identification of the best generation model for various input prompts can reduce the costs associated with querying sub-optimal models. We propose an online learning framework to predict the best data generation model for a given input prompt. Our experiments on real and simulated text-to-image and image-to-text generative models show RFF-UCB performs successfully in identifying the best generation model.
arXiv Detail & Related papers (2024-10-17T07:33:35Z)
Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling. Our research explores task-specific model pruning to inform decisions about designing SMoE architectures. We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
Navigating Scaling Laws: Compute Optimality in Adaptive Model Training [39.96209967632896]
In recent years, the state-of-the-art in deep learning has been dominated by very large models that have been pre-trained on vast amounts of data. We extend the concept of optimality by allowing for an adaptive' model, i.e. a model that can change its shape during training.
arXiv Detail & Related papers (2023-11-06T16:20:28Z)
Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting Pre-trained Language Models [22.977852629450346]
We propose a method that combines two popular research areas by injecting linguistic structures into pre-trained language models. In our approach, parallel adapter modules encoding different linguistic structures are combined using a novel Mixture-of-Linguistic-Experts architecture. Our experiment results show that our approach can outperform state-of-the-art PEFT methods with a comparable number of parameters.
arXiv Detail & Related papers (2023-10-24T23:29:06Z)
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours. We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length. This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
Evaluating Representations with Readout Model Switching [19.907607374144167]
In this paper, we propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric. We design a hybrid discrete and continuous-valued model space for the readout models and employ a switching strategy to combine their predictions. The proposed metric can be efficiently computed with an online method and we present results for pre-trained vision encoders of various architectures.
arXiv Detail & Related papers (2023-02-19T14:08:01Z)
Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes [47.880781811936345]
We propose a novel framework for fine-tuning pretrained language models (LM) Our prototypical fine-tuning approach can automatically adjust the model capacity according to the number of data points and the model's inherent attributes.
arXiv Detail & Related papers (2022-11-24T14:38:08Z)
A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings. We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z)
Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability. We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code. We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
arXiv Detail & Related papers (2021-01-21T01:46:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.