LatentEvolve: Self-Evolving Test-Time Scaling in Latent Space
- URL: http://arxiv.org/abs/2509.24771v1
- Date: Mon, 29 Sep 2025 13:37:39 GMT
- Title: LatentEvolve: Self-Evolving Test-Time Scaling in Latent Space
- Authors: Guibin Zhang, Fanci Meng, Guancheng Wan, Zherui Li, Kun Wang, Zhenfei Yin, Lei Bai, Shuicheng Yan
- Abstract summary: Test-time Scaling (TTS) has been demonstrated to significantly enhance the reasoning capabilities of Large Language Models (LLMs) during the inference phase without altering model parameters. We propose LatentEvolve, a self-evolving latent TTS framework inspired by the complementary learning system theory.
- Score: 66.71318175695988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Test-time Scaling (TTS) has been demonstrated to significantly enhance the reasoning capabilities of Large Language Models (LLMs) during the inference phase without altering model parameters. However, existing TTS methods treat each inference episode independently, so LLMs never accumulate experience about how to scale more effectively. With the objective of evolving LLMs to learn "how to scale test-time computation," we propose LatentEvolve, a self-evolving latent TTS framework inspired by the complementary learning system (CLS) theory. Analogous to the human brain's dual system of a fast-recall hippocampus and a slow-consolidating neocortex, LatentEvolve comprises two evolutionary components: daytime scaling, which rapidly retrieves historical latent representations to better guide current LLM reasoning; and nighttime scaling, which integrates past latent optimizations in a manner akin to the human brain's consolidation of experiences during sleep. The alternation of daytime and nighttime processes drives a fast and slow evolution of LLM test-time scaling, mirroring human cognitive dynamics in a fully unsupervised manner. Extensive experiments across eight benchmarks and five model backbones demonstrate that LatentEvolve surpasses state-of-the-art TTS methods such as LatentSeek and TTRL by up to 13.33% and exhibits exceptional cross-domain and cross-backbone generalization.
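A minimal runnable sketch of the daytime/nighttime alternation described above may help: a hippocampus-like store rapidly retrieves past latents to initialize test-time optimization (daytime), and a slow consolidation step folds accumulated latents into persistent weights (nighttime). Every class, helper, and objective here (LatentStore, ToyModel, the quadratic pull toward the query) is a hypothetical illustration of the described dynamic, not the authors' implementation.

```python
"""Toy sketch of a daytime/nighttime latent TTS loop. All names and the
toy objective are illustrative assumptions, not the paper's actual code."""
import numpy as np

rng = np.random.default_rng(0)

class LatentStore:
    """Hippocampus-like fast memory of past optimized latents (hypothetical)."""
    def __init__(self):
        self.keys, self.latents = [], []

    def retrieve(self, query, k=3):
        # Daytime scaling: recall the k historical latents whose query
        # embeddings are most similar to the current one (cosine similarity).
        if not self.keys:
            return None
        sims = [q @ query / (np.linalg.norm(q) * np.linalg.norm(query) + 1e-8)
                for q in self.keys]
        top = np.argsort(sims)[-k:]
        return np.mean([self.latents[i] for i in top], axis=0)

    def add(self, query, latent):
        self.keys.append(query)
        self.latents.append(latent)

class ToyModel:
    """Stand-in for an LLM whose reasoning is steered by a latent vector."""
    def __init__(self, dim=16):
        self.slow_weights = np.zeros(dim)  # neocortex-like consolidated prior

    def optimize_latent(self, query, init=None, steps=10, lr=0.1):
        # Test-time latent optimization: start from retrieved guidance (if any)
        # plus the consolidated prior, then descend a toy objective.
        z = (init if init is not None else np.zeros_like(query)) + self.slow_weights
        for _ in range(steps):
            z -= lr * (z - query)  # toy objective: pull the latent toward the query
        return z

    def consolidate(self, latents, rate=0.05):
        # Nighttime scaling: slowly integrate accumulated latents into the
        # slow weights, akin to memory consolidation during sleep.
        if latents:
            self.slow_weights += rate * (np.mean(latents, axis=0) - self.slow_weights)

store, model = LatentStore(), ToyModel()
for day in range(5):
    for _ in range(8):                       # daytime: answer incoming queries
        q = rng.normal(size=16)
        z = model.optimize_latent(q, init=store.retrieve(q))
        store.add(q, z)
    model.consolidate(store.latents)         # nighttime: consolidate experience
print("slow-weight norm after consolidation:", np.linalg.norm(model.slow_weights))
```

In this toy, consolidation shifts the slow weights toward the mean of stored latents, so later daytime optimizations start closer to useful solutions; the fast store and the slow weights play the hippocampus and neocortex roles from the CLS analogy.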
Related papers
- dMLLM-TTS: Self-Verified and Efficient Test-Time Scaling for Diffusion Multi-Modal Large Language Models [40.03969764207708]
Diffusion Multi-modal Large Language Models (dMLLMs) have recently emerged as a novel architecture unifying image generation and understanding. We propose dMLLM-TTS, a novel framework operating on two complementary scaling axes to unlock their full generative potential. Our framework substantially improves generation quality while achieving up to 6x greater efficiency than linear search.
arXiv Detail & Related papers (2025-12-22T14:31:58Z)
- Step-level Verifier-guided Hybrid Test-Time Scaling for Large Language Models [58.36334504216682]
Test-Time Scaling is a promising approach to progressively elicit the model's intelligence during inference. In this paper, we focus on training-free TTS methods for reasoning. We introduce a novel inference paradigm called Hybrid Test-Time Scaling (a toy sketch of step-level verifier-guided generation appears after this list).
arXiv Detail & Related papers (2025-07-21T11:28:09Z)
- TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence [62.21106561772784]
We introduce Temporal-aware Hierarchical Cognitive Reinforcement Learning (TimeHC-RL) for enhancing Large Language Models' social intelligence. Experimental results reveal the superiority of our proposed TimeHC-RL method compared to the widely adopted System 2 RL method. It substantially boosts the 7B backbone model, enabling it to rival the performance of advanced models like DeepSeek-R1 and OpenAI-O3.
arXiv Detail & Related papers (2025-05-30T12:01:06Z)
- Scaling Image and Video Generation via Test-Time Evolutionary Search [41.715197824076746]
Test-time scaling (TTS) has emerged as a promising direction for improving generative model performance by allocating additional computation at inference time. EvoSearch is a novel, generalist, and efficient TTS method that effectively enhances the scalability of both image and video generation across diffusion and flow models.
arXiv Detail & Related papers (2025-05-23T08:25:46Z)
- Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space [92.6187727249868]
We introduce LatentSeek, a framework that enhances reasoning through Test-Time Instance-level Adaptation (TTIA) within the model's latent space (a toy sketch of this per-instance latent update appears after this list). LatentSeek is evaluated on a range of reasoning benchmarks, including GSM8K, MATH-500, and AIME2024. Results show that LatentSeek consistently outperforms strong baselines.
arXiv Detail & Related papers (2025-05-19T16:26:02Z)
- T1: Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling [52.34735382627312]
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. Existing approaches mainly rely on imitation learning and struggle to achieve effective test-time scaling. We present T1, which scales reinforcement learning by encouraging exploration, and we study its inference-scaling behavior.
arXiv Detail & Related papers (2025-01-20T18:33:33Z)
- A Survey on Self-Evolution of Large Language Models [116.54238664264928]
Large language models (LLMs) have significantly advanced in various fields and intelligent agent applications. However, learning from human or external-model supervision is costly and can hit a performance ceiling as task complexity and diversity grow. To address this issue, self-evolution approaches that enable LLMs to autonomously acquire, refine, and learn from experiences generated by the model itself are rapidly growing.
arXiv Detail & Related papers (2024-04-22T17:43:23Z)
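The Hybrid Test-Time Scaling entry above names step-level verifier-guided generation; here is a minimal runnable sketch of that general pattern. The step generator, the verifier, and all constants are stand-in assumptions for illustration, not the paper's actual components.

```python
"""Toy sketch of step-level verifier-guided generation. The generator and
verifier here are stand-ins, not the paper's implementation."""
import random

random.seed(0)

def propose_steps(state, n=4):
    # Stand-in generator: in practice, sample n candidate reasoning steps
    # from the LLM conditioned on the partial solution `state`.
    return [state + [random.gauss(0, 1)] for _ in range(n)]

def verifier_score(state):
    # Stand-in step-level verifier: in practice, a learned model scoring how
    # promising the partial chain is. Here: prefer chains whose sum is near 3.
    return -abs(sum(state) - 3.0)

def verifier_guided_search(depth=6, n=4):
    state = []
    for _ in range(depth):
        # At each step, keep the candidate the verifier scores highest
        # (greedy step-level selection; beam search is the natural extension).
        state = max(propose_steps(state, n), key=verifier_score)
    return state, verifier_score(state)

chain, score = verifier_guided_search()
print(f"final verifier score: {score:.3f}")  # closer to 0 is better in this toy
```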
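The LatentSeek entry describes a test-time, instance-level policy-gradient update in latent space. The following toy sketch shows that general idea with a REINFORCE-style gradient estimate; the reward function, perturbation scheme, and constants are illustrative assumptions, not the paper's method.

```python
"""Toy sketch of test-time instance-level policy gradient in latent space.
Reward, policy, and update rule are illustrative stand-ins."""
import numpy as np

rng = np.random.default_rng(1)
dim, steps, lr, sigma, n = 8, 60, 0.2, 0.1, 16
target = rng.normal(size=dim)  # stand-in for "latents that decode to a good answer"

def reward(z):
    # Hypothetical self-reward: higher when the latent decodes to a better answer.
    return -np.sum((z - target) ** 2)

z = np.zeros(dim)  # instance-level latent, initialized fresh for each question
for _ in range(steps):
    # REINFORCE-style estimate: perturb the latent and weight each
    # perturbation by its (baseline-subtracted) reward.
    eps = rng.normal(size=(n, dim)) * sigma
    rs = np.array([reward(z + e) for e in eps])
    grad = (rs - rs.mean())[:, None] * eps / (n * sigma ** 2)
    z += lr * grad.sum(axis=0)  # ascend the estimated reward gradient
print(f"final reward: {reward(z):.3f}")  # approaches 0 as z nears the target
```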