Do Reasoning Models Enhance Embedding Models?
- URL: http://arxiv.org/abs/2601.21192v1
- Date: Thu, 29 Jan 2026 02:48:34 GMT
- Title: Do Reasoning Models Enhance Embedding Models?
- Authors: Wun Yu Chan, Shaojin Chen, Huihao Jing, Kwun Hang Lau, Elton Chun-Chai Li, Zihao Wang, Haoran Li, Yangqiu Song
- Abstract summary: State-of-the-art embedding models are increasingly derived from decoder-only Large Language Model backbones adapted via contrastive learning. We show that embedding models from RLVR-tuned backbones yield no consistent performance advantage over their base counterparts when subjected to identical training recipes.
- Score: 48.43242995118735
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art embedding models are increasingly derived from decoder-only Large Language Model (LLM) backbones adapted via contrastive learning. Given the emergence of reasoning models trained via Reinforcement Learning with Verifiable Rewards (RLVR), a natural question arises: does enhanced reasoning translate to superior semantic representations when these models serve as embedding initializations? Contrary to expectation, our evaluation on MTEB and BRIGHT reveals a **null effect**: embedding models initialized from RLVR-tuned backbones yield no consistent performance advantage over their base counterparts when subjected to identical training recipes. To unpack this paradox, we introduce **H**ierarchical **R**epresentation **S**imilarity **A**nalysis (HRSA), a framework that decomposes similarity across representation, geometry, and function levels. HRSA reveals that while RLVR induces an irreversible reorganization of the latent manifold's local geometry and a reversible drift of the coordinate basis, it preserves the global manifold geometry and the linear readout. Consequently, subsequent contrastive learning drives strong alignment between base- and reasoning-initialized models, a phenomenon we term **Manifold Realignment**. Empirically, our findings suggest that, unlike Supervised Fine-Tuning (SFT), RLVR optimizes trajectories within an existing semantic landscape rather than fundamentally restructuring the landscape itself.
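The abstract's distinction between a reversible coordinate basis drift and a preserved global geometry can be illustrated with a toy example. The sketch below is not the paper's HRSA framework; it is a minimal illustration that assumes synthetic Gaussian "embeddings", uses linear CKA as a stand-in for a geometry-level similarity, and uses an orthogonal Procrustes fit as a stand-in for undoing basis drift. All function and variable names (`linear_cka`, `procrustes_rotation`, `mean_cosine`, `base`, `drifted`) are hypothetical.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two (n_samples, dim) embedding matrices."""
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(x.T @ y, "fro") ** 2
    return cross / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro"))


def procrustes_rotation(x, y):
    """Orthogonal matrix r that minimizes the Frobenius norm of x @ r - y."""
    u, _, vt = np.linalg.svd(x.T @ y)
    return u @ vt


def mean_cosine(x, y):
    """Average cosine similarity between matching rows of x and y."""
    num = np.sum(x * y, axis=1)
    den = np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1)
    return float(np.mean(num / den))


rng = np.random.default_rng(0)
base = rng.normal(size=(200, 16))  # assumed stand-in for base-model embeddings
q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
drifted = base @ q  # same geometry, expressed in a rotated coordinate basis

raw_cos = mean_cosine(base, drifted)  # coordinate-wise comparison
cka = linear_cka(base, drifted)  # rotation-invariant comparison
aligned = drifted @ procrustes_rotation(drifted, base)
aligned_cos = mean_cosine(base, aligned)  # comparison after undoing the rotation

print(f"raw cosine: {raw_cos:.3f}")
print(f"linear CKA: {cka:.3f}")
print(f"cosine after Procrustes alignment: {aligned_cos:.3f}")
```

In this toy setting the coordinate-wise cosine is low even though the two embedding sets differ only by a rotation, while the rotation-invariant measure and the aligned comparison both report a near-perfect match. Real backbones would also differ in local geometry, which a single orthogonal map cannot undo.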
Related papers
- Can Recommender Systems Teach Themselves? A Recursive Self-Improving Framework with Fidelity Control [82.30868101940068]
We propose a paradigm in which a model bootstraps its own performance without reliance on external data or teacher models. Our theoretical analysis shows that RSIR acts as a data-driven implicit regularizer, smoothing the optimization landscape. We show that even smaller models benefit, and weak models can generate effective training curricula for stronger ones.
arXiv Detail & Related papers (2026-02-17T15:31:32Z)
- Understanding Degradation with Vision Language Model [56.09241449206817]
Understanding visual degradations is a critical yet challenging problem in computer vision. We introduce DU-VLM, a multimodal chain-of-thought model trained with supervised fine-tuning and reinforcement learning. We also introduce DU-110k, a large-scale dataset comprising 110,000 clean-degraded pairs with grounded physical annotations.
arXiv Detail & Related papers (2026-02-04T13:51:15Z)
- LLMs as High-Dimensional Nonlinear Autoregressive Models with Attention: Training, Alignment and Inference [15.493230983626281]
Large language models (LLMs) based on transformer architectures are typically described through collections of architectural components and training procedures. We formulate LLMs as high-dimensional nonlinear autoregressive models with attention-based dependencies.
arXiv Detail & Related papers (2026-01-31T00:37:53Z)
- Why Self-Rewarding Works: Theoretical Guarantees for Iterative Alignment of Language Models [50.248686344277246]
Self-Rewarding Language Models (SRLMs) achieve notable success in iteratively improving alignment without external feedback. This paper provides the first rigorous theoretical guarantees for SRLMs.
arXiv Detail & Related papers (2026-01-30T03:45:43Z)
- Round-trip Reinforcement Learning: Self-Consistent Training for Better Chemical LLMs [51.29260537017623]
Large Language Models (LLMs) are emerging as versatile foundation models for computational chemistry. These models often lack round-trip consistency. We introduce Round-Trip Reinforcement Learning (RTRL), a novel framework that trains a model to improve its consistency.
arXiv Detail & Related papers (2025-10-01T23:58:58Z)
- How LLMs Learn to Reason: A Complex Network Perspective [14.638878448692493]
Training large language models with Reinforcement Learning from Verifiable Rewards exhibits a set of puzzling behaviors. We propose that these seemingly disparate phenomena can be explained using a single unifying theory. Our work provides a new physical intuition for engineering the emergent reasoning capabilities of future AI systems.
arXiv Detail & Related papers (2025-09-28T04:10:37Z)
- Recurrent Expansion: A Pathway Toward the Next Generation of Deep Learning [0.26107298043931204]
Recurrent Expansion (RE) is a new learning paradigm that advances beyond conventional Machine Learning (ML) and Deep Learning (DL). RE emphasizes multiple mappings of data through identical deep architectures and analyzes their internal representations (i.e., feature maps) in conjunction with observed performance signals such as loss. A scalable and adaptive variant, Sc-HMVRE, introduces selective mechanisms and scale diversity for real-world deployment.
arXiv Detail & Related papers (2025-07-04T19:26:48Z)
- OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles [91.88062410741833]
We introduce OpenVLThinker, one of the first open-source large vision-language models (LVLMs) to exhibit sophisticated chain-of-thought reasoning. We show that OpenVLThinker-7B consistently advances performance across six benchmarks demanding mathematical and general reasoning.
arXiv Detail & Related papers (2025-03-21T17:52:43Z)
- State-space models can learn in-context by gradient descent [1.3087858009942543]
We show that state-space models can perform gradient-based learning and use it for in-context learning in much the same way as transformers. Specifically, we prove that a single structured state-space model layer, augmented with multiplicative input and output gating, can reproduce the outputs of an implicit linear model. We also provide novel insights into the relationship between state-space models and linear self-attention, and their ability to learn in-context.
arXiv Detail & Related papers (2024-10-15T15:22:38Z)
- Unbiased Learning of Deep Generative Models with Structured Discrete Representations [7.9057320008285945]
We propose novel algorithms for learning structured variational autoencoders (SVAEs).
We are the first to demonstrate the SVAE's ability to handle multimodal uncertainty when data is missing by incorporating discrete latent variables.
Our memory-efficient implicit differentiation scheme makes the SVAE tractable to learn via gradient descent, while demonstrating robustness to incomplete optimization.
arXiv Detail & Related papers (2023-06-14T03:59:21Z)
- Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z)
- Re-parameterizing Your Optimizers rather than Architectures [119.08740698936633]
We propose a novel paradigm of incorporating model-specific prior knowledge into optimizers and using them to train generic (simple) models.
As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyper-parameters.
We focus on a VGG-style plain model and showcase that such a simple model trained with a RepOptimizer, which is referred to as RepOpt-VGG, performs on par with the recent well-designed models.
arXiv Detail & Related papers (2022-05-30T16:55:59Z)
- Self-Reflective Variational Autoencoder [21.054722609128525]
Variational Autoencoder (VAE) is a powerful framework for learning latent variable generative models.
We introduce a solution, which we call self-reflective inference.
We empirically demonstrate the clear advantages of matching the variational posterior to the exact posterior.
arXiv Detail & Related papers (2020-07-10T05:05:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.