jina-embeddings-v5-text: Task-Targeted Embedding Distillation
- URL: http://arxiv.org/abs/2602.15547v1
- Date: Tue, 17 Feb 2026 12:50:50 GMT
- Title: jina-embeddings-v5-text: Task-Targeted Embedding Distillation
- Authors: Mohammad Kalim Akram, Saba Sturua, Nastia Havriushenko, Quentin Herreros, Michael Günther, Maximilian Werk, Han Xiao
- Abstract summary: General-purpose models are typically trained with single- or multi-stage processes using contrastive loss functions. We introduce a novel training regimen that combines model distillation techniques with task-specific contrastive loss to produce compact embedding models. Benchmark scores for the resulting models exceed or match the state-of-the-art for models of similar size.
- Score: 4.215793601372204
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Text embedding models are widely used for semantic similarity tasks, including information retrieval, clustering, and classification. General-purpose models are typically trained with single- or multi-stage processes using contrastive loss functions. We introduce a novel training regimen that combines model distillation techniques with task-specific contrastive loss to produce compact, high-performance embedding models. Our findings suggest that this approach is more effective for training small models than purely contrastive or distillation-based training paradigms alone. Benchmark scores for the resulting models, jina-embeddings-v5-text-small and jina-embeddings-v5-text-nano, exceed or match the state-of-the-art for models of similar size. jina-embeddings-v5-text models additionally support long texts (up to 32k tokens) in many languages, and generate embeddings that remain robust under truncation and binary quantization. Model weights are publicly available, hopefully inspiring further advances in embedding model development.
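The abstract does not spell out the training objective, so the following is only a minimal sketch of how embedding distillation could be combined with a task-specific contrastive (InfoNCE) loss; the function names, the cosine-distance distillation term, and the mixing coefficient `alpha` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_emb, teacher_emb):
    # Align student embeddings with frozen teacher embeddings via cosine distance.
    return (1.0 - F.cosine_similarity(student_emb, teacher_emb, dim=-1)).mean()

def info_nce_loss(query_emb, doc_emb, temperature=0.05):
    # Task-specific contrastive loss with in-batch negatives:
    # the i-th document is the positive for the i-th query.
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

def combined_loss(student_q, student_d, teacher_q, teacher_d, alpha=0.5):
    # Weighted sum of the distillation and contrastive terms; alpha is a
    # hypothetical mixing coefficient, not a value reported in the paper.
    kd = distillation_loss(student_q, teacher_q) + distillation_loss(student_d, teacher_d)
    nce = info_nce_loss(student_q, student_d)
    return alpha * kd + (1.0 - alpha) * nce
```

The claimed robustness under truncation and binary quantization can be read (again, as an assumption about the intended usage rather than a description of the released models) as supporting prefix truncation of the embedding vector and sign-based binarization:

```python
# Hypothetical illustration: prefix truncation to 256 dims and sign thresholding.
emb = torch.randn(1024)
truncated = F.normalize(emb[:256], dim=-1)  # truncated embedding, re-normalized
binary = (emb > 0).to(torch.uint8)          # binary-quantized embedding
```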
Related papers
- Bagging-Based Model Merging for Robust General Text Embeddings [73.51674133699196]
General-purpose text embedding models underpin a wide range of NLP and information retrieval applications. We present a systematic study of multi-task training for text embeddings from two perspectives: data scheduling and model merging. We propose Bagging-based rObust mOdel Merging (BOOM), which trains multiple embedding models on sampled subsets and merges them into a single model (a minimal parameter-averaging sketch appears after this list).
arXiv Detail & Related papers (2026-02-05T15:45:08Z) - Ensembling Finetuned Language Models for Text Classification [55.15643209328513]
Finetuning is a common practice across different communities to adapt pretrained models to particular tasks.
Ensembles of neural networks are typically used to boost performance and provide reliable uncertainty estimates.
We present a metadataset with predictions from five large finetuned models on six datasets and report results of different ensembling strategies.
arXiv Detail & Related papers (2024-10-25T09:15:54Z) - VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks [60.5257456681402]
We study the potential for building universal embeddings capable of handling a wide range of downstream tasks. We build a series of VLM2Vec models on SoTA VLMs such as Phi-3.5-V and LLaVA-1.6 and evaluate them on MMEB's evaluation split. Our results show that VLM2Vec achieves an absolute average improvement of 10% to 20% over existing multimodal embedding models.
arXiv Detail & Related papers (2024-10-07T16:14:05Z) - NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models [38.41524186248607]
We introduce NV-Embed, incorporating architectural designs, training procedures, and curated datasets. For the model architecture, we propose a latent attention layer to obtain pooled embeddings. For the training algorithm, we introduce a two-stage contrastive instruction-tuning method.
arXiv Detail & Related papers (2024-05-27T17:59:45Z) - Multilingual E5 Text Embeddings: A Technical Report [63.503320030117145]
Three embedding models of different sizes are provided, offering a balance between the inference efficiency and embedding quality.
We introduce a new instruction-tuned embedding model, whose performance is on par with state-of-the-art, English-only models of similar sizes.
arXiv Detail & Related papers (2024-02-08T13:47:50Z) - Towards General Text Embeddings with Multi-stage Contrastive Learning [20.803769345818456]
GTE is a general-purpose text embedding model trained with multi-stage contrastive learning.
We train a unified text embedding model by employing contrastive learning over a diverse mixture of datasets from multiple sources.
arXiv Detail & Related papers (2023-08-07T03:52:59Z) - Compressing Sentence Representation with Maximum Coding Rate Reduction [0.0]
In most natural language inference problems, sentence representation is needed for semantic retrieval tasks.
Due to hardware limitations on space and time, there is a need to attain comparable results with a smaller model.
We demonstrate that the new language model with reduced complexity and sentence embedding size can achieve comparable results on semantic retrieval benchmarks.
arXiv Detail & Related papers (2023-04-25T09:23:43Z) - Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z) - On the Compositional Generalization Gap of In-Context Learning [73.09193595292233]
We look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning.
We evaluate four model families, OPT, BLOOM, CodeGen and Codex on three semantic parsing datasets.
arXiv Detail & Related papers (2022-11-15T19:56:37Z) - Part-Based Models Improve Adversarial Robustness [57.699029966800644]
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks.
Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts.
Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations.
arXiv Detail & Related papers (2022-09-15T15:41:47Z) - Introducing various Semantic Models for Amharic: Experimentation and Evaluation with multiple Tasks and Datasets [19.855120632909124]
We introduce different semantic models for Amharic.
Models are built using word2vec embeddings, distributional thesaurus (DT), contextual embeddings, and DT embeddings.
We find that newly trained models perform better than pre-trained multilingual models.
arXiv Detail & Related papers (2020-11-02T17:48:25Z)
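As referenced in the BOOM entry above, the following is a minimal sketch of bagging-style model merging: several checkpoints trained on sampled data subsets are combined into one model. Simple parameter averaging and all names here are assumptions for illustration, not the BOOM authors' actual merging procedure.

```python
# Hypothetical sketch of bagging-style model merging by parameter averaging.
from collections import OrderedDict
import torch

def average_state_dicts(state_dicts, weights=None):
    """Merge several checkpoints (each trained on a sampled data subset) into one."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = OrderedDict()
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Usage (hypothetical checkpoint paths):
# checkpoints = [torch.load(p, map_location="cpu") for p in ["m1.pt", "m2.pt", "m3.pt"]]
# model.load_state_dict(average_state_dicts(checkpoints))
```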
This list is automatically generated from the titles and abstracts of the papers on this site.