Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
- URL: http://arxiv.org/abs/2506.05176v3
- Date: Wed, 11 Jun 2025 02:54:49 GMT
- Title: Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
- Authors: Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, Jingren Zhou
- Abstract summary: We introduce the Qwen3 Embedding series, a significant advancement over its predecessor, the GTE-Qwen series. The Qwen3 Embedding series offers a spectrum of model sizes for both embedding and reranking tasks. The Qwen3 Embedding series achieves state-of-the-art results across diverse benchmarks.
- Score: 90.54780244175511
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we introduce the Qwen3 Embedding series, a significant advancement over its predecessor, the GTE-Qwen series, in text embedding and reranking capabilities, built upon the Qwen3 foundation models. Leveraging the Qwen3 LLMs' robust capabilities in multilingual text understanding and generation, our innovative multi-stage training pipeline combines large-scale unsupervised pre-training with supervised fine-tuning on high-quality datasets. Effective model merging strategies further ensure the robustness and adaptability of the Qwen3 Embedding series. During the training process, the Qwen3 LLMs serve not only as backbone models but also play a crucial role in synthesizing high-quality, rich, and diverse training data across multiple domains and languages, thus enhancing the training pipeline. The Qwen3 Embedding series offers a spectrum of model sizes (0.6B, 4B, 8B) for both embedding and reranking tasks, addressing diverse deployment scenarios where users can optimize for either efficiency or effectiveness. Empirical evaluations demonstrate that the Qwen3 Embedding series achieves state-of-the-art results across diverse benchmarks. Notably, it excels on the multilingual evaluation benchmark MTEB for text embedding, as well as in various retrieval tasks, including code retrieval, cross-lingual retrieval and multilingual retrieval. To facilitate reproducibility and promote community-driven research and development, the Qwen3 Embedding models are publicly available under the Apache 2.0 license.
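Since the models are released under Apache 2.0 in 0.6B/4B/8B sizes, a minimal retrieval sketch follows. The Hub model ID and the sentence-transformers loading path are assumptions to verify against the official release; the rerankers in the series instead score (query, document) pairs directly and are omitted here.

```python
# Minimal retrieval sketch for the released embedding models.
# Assumptions (verify against the official release): checkpoints are on
# the Hugging Face Hub under IDs like "Qwen/Qwen3-Embedding-0.6B", they
# load with sentence-transformers, and they define a "query" prompt for
# instruction-aware query encoding.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

queries = ["How do I merge two dictionaries in Python?"]
documents = [
    "Use the | operator to merge two dicts in Python 3.9+.",
    "The Qwen3 Embedding series spans 0.6B, 4B, and 8B parameters.",
]

query_emb = model.encode(queries, prompt_name="query")  # queries get a task prompt
doc_emb = model.encode(documents)                       # documents are encoded as-is

# Cosine similarity ranks documents per query; higher is more relevant.
print(model.similarity(query_emb, doc_emb))
```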
Related papers
- Qwen3 Technical Report [137.96804244102205]
We present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities.
arXiv Detail & Related papers (2025-05-14T13:41:34Z)
- An Empirical Study of Qwen3 Quantization [30.214896404069677]
Low-bit quantization presents a promising solution, yet its impact on Qwen3's performance remains underexplored. We rigorously assess 5 existing classic post-training quantization techniques applied to Qwen3, spanning bit-widths from 1 to 8 bits. Our findings reveal that while Qwen3 maintains competitive performance at moderate bit-widths, it experiences notable degradation in linguistic tasks under ultra-low precision.
arXiv Detail & Related papers (2025-05-04T18:43:44Z)
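As a concrete reference point for what "classic post-training quantization" involves, here is a generic round-to-nearest (RTN) sketch; it illustrates the family of techniques, not the paper's own code.

```python
import torch

def quantize_rtn(weight: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Per-output-channel round-to-nearest (RTN) weight quantization,
    the simplest of the classic post-training techniques such a study
    evaluates. Returns the dequantized weights so the quantization
    error can be inspected directly."""
    qmax = 2 ** (bits - 1) - 1                        # e.g. 7 for signed 4-bit
    scale = weight.abs().amax(dim=1, keepdim=True) / qmax
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q * scale                                  # dequantized approximation

w = torch.randn(256, 256)
for bits in (8, 4, 2):
    err = (w - quantize_rtn(w, bits)).abs().mean().item()
    print(f"{bits}-bit RTN mean abs error: {err:.4f}")
```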
- InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models [139.19991097260115]
We introduce InternVL3, a significant advancement in the InternVL series featuring a native multimodal pre-training paradigm. In particular, InternVL3-78B achieves a score of 72.2 on the MMMU benchmark, setting a new state-of-the-art among open-source MLLMs. In pursuit of open-science principles, we will publicly release both the training data and model weights to foster further research and development in next-generation MLLMs.
arXiv Detail & Related papers (2025-04-14T17:59:25Z)
- KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model [27.25688303240741]
KaLM-Embedding is a general multilingual embedding model that leverages a large quantity of cleaner, more diverse, and domain-specific training data. Our model has been trained with key techniques proven to enhance performance.
arXiv Detail & Related papers (2025-01-02T03:17:51Z)
- NVLM: Open Frontier-Class Multimodal LLMs [64.00053046838225]
We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks.
We propose a novel architecture that enhances both training efficiency and multimodal reasoning capabilities.
We develop production-grade multimodality for the NVLM-1.0 models, enabling them to excel in vision-language tasks.
arXiv Detail & Related papers (2024-09-17T17:59:06Z)
- BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation [26.65107475147534]
We present a new embedding model, called M3-Embedding, which is distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Multi-Granularity.
It can support more than 100 working languages, leading to new state-of-the-art performances on multi-lingual and cross-lingual retrieval tasks.
M3-Embedding is able to process inputs of different granularities, spanning from short sentences to long documents of up to 8192 tokens.
arXiv Detail & Related papers (2024-02-05T17:26:49Z)
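A short sketch of this multi-functionality, assuming the FlagEmbedding package and the BAAI/bge-m3 checkpoint from the public BGE release:

```python
# Sketch of the "multi-functionality" claim: one model emits dense,
# sparse (lexical), and multi-vector (ColBERT-style) representations.
# Assumes the FlagEmbedding package and the "BAAI/bge-m3" checkpoint
# from the public BGE release; verify the API against the official repo.
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

sentences = ["What is BGE M3?", "BM25 is a bag-of-words retrieval function."]
out = model.encode(
    sentences,
    return_dense=True,         # one vector per text
    return_sparse=True,        # per-token lexical weights
    return_colbert_vecs=True,  # one vector per token (multi-vector)
)
print(out["dense_vecs"].shape)    # expected (2, 1024)
print(out["lexical_weights"][0])  # {token_id: weight, ...}
```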
- Towards General Text Embeddings with Multi-stage Contrastive Learning [20.803769345818456]
GTE is a general-purpose text embedding model trained with multi-stage contrastive learning.
We train a unified text embedding model by employing contrastive learning over a diverse mixture of datasets from multiple sources.
arXiv Detail & Related papers (2023-08-07T03:52:59Z)
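The objective behind such contrastive training is typically the in-batch InfoNCE loss; the sketch below shows that generic form, not GTE's exact recipe.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q: torch.Tensor, d: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    """In-batch contrastive (InfoNCE) loss: the i-th query should score
    its own document higher than every other document in the batch."""
    q = F.normalize(q, dim=-1)
    d = F.normalize(d, dim=-1)
    logits = q @ d.T / tau            # (B, B) cosine similarities / temperature
    labels = torch.arange(q.size(0))  # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Toy batch of 8 query/document embedding pairs of dimension 768.
loss = info_nce_loss(torch.randn(8, 768), torch.randn(8, 768))
print(loss.item())
```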
- Recitation-Augmented Language Models [85.30591349383849]
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z)
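A schematic of the recite-and-answer scheme, with a hypothetical generate() helper standing in for any LLM call:

```python
def generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; replace with a real model."""
    return "<model output for: " + prompt.splitlines()[0] + ">"

def recite_and_answer(question: str) -> str:
    # Step 1: recite a relevant passage from the model's own memory.
    recitation = generate(
        f"Recite a passage that helps answer the question.\nQuestion: {question}\nPassage:"
    )
    # Step 2: answer conditioned on the recited passage.
    return generate(f"Passage: {recitation}\nQuestion: {question}\nAnswer:")

print(recite_and_answer("Who proved Fermat's Last Theorem?"))
```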
- Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation [79.72299298976525]
We propose to augment a vision-language pre-training model with a textual pre-trained language model (PLM) via vision-language knowledge distillation (VLKD).
Experiments show that the resulting model has strong zero-shot performance on multimodal generation tasks, such as open-ended visual question answering and image captioning.
The original textual language understanding and generation ability of the PLM is maintained after VLKD, which makes our model versatile for both multimodal and unimodal tasks.
arXiv Detail & Related papers (2022-03-12T09:33:37Z)