ThetaEvolve: Test-time Learning on Open Problems
- URL: http://arxiv.org/abs/2511.23473v1
- Date: Fri, 28 Nov 2025 18:58:14 GMT
- Title: ThetaEvolve: Test-time Learning on Open Problems
- Authors: Yiping Wang, Shao-Rong Su, Zhiyuan Zeng, Eva Xu, Liliang Ren, Xinyu Yang, Zeyi Huang, Xuehai He, Luyao Ma, Baolin Peng, Hao Cheng, Pengcheng He, Weizhu Chen, Shuohang Wang, Simon Shaolei Du, Yelong Shen
- Abstract summary: We introduce ThetaEvolve, an open-source framework that simplifies and extends AlphaEvolve to efficiently scale both in-context learning and Reinforcement Learning (RL) at test time. We find that ThetaEvolve with RL at test time consistently outperforms inference-only baselines.
- Score: 110.5756538358217
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in large language models (LLMs) have enabled breakthroughs in mathematical discovery, exemplified by AlphaEvolve, a closed-source system that evolves programs to improve bounds on open problems. However, AlphaEvolve relies on ensembles of frontier LLMs to achieve new bounds, and it is a pure inference system, so its models cannot internalize the evolving strategies. We introduce ThetaEvolve, an open-source framework that simplifies and extends AlphaEvolve to efficiently scale both in-context learning and Reinforcement Learning (RL) at test time, allowing models to continually learn from their experience in improving bounds on open optimization problems. ThetaEvolve features a single LLM, a large program database for enhanced exploration, batch sampling for higher throughput, lazy penalties to discourage stagnant outputs, and optional reward shaping for stable training signals. ThetaEvolve is the first evolving framework that enables a small open-source model, such as DeepSeek-R1-0528-Qwen3-8B, to achieve new best-known bounds on open problems (circle packing and the first auto-correlation inequality) mentioned in AlphaEvolve. Moreover, across two models and four open tasks, we find that ThetaEvolve with RL at test time consistently outperforms inference-only baselines, and the model indeed learns evolving capabilities: the RL-trained checkpoints demonstrate faster progress and better final performance on both the trained target task and unseen tasks. We release our code publicly: https://github.com/ypwang61/ThetaEvolve
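The abstract's core mechanics (a program database, batch sampling of parents, and a lazy penalty on stagnant outputs) can be illustrated with a minimal sketch. This is an illustrative assumption of how such an evolutionary loop could be wired together, not the paper's actual implementation: `mutate` here is a toy numeric perturbation standing in for an LLM's program edit, and all names and hyperparameters are hypothetical.

```python
import random

def mutate(candidate, rng):
    # Toy mutation on a tuple of numbers; in ThetaEvolve an LLM would
    # propose an edited program instead (this stub is an assumption).
    return tuple(v + rng.gauss(0, 0.1) for v in candidate)

def evolve(objective, seed, iterations=200, batch_size=4,
           db_size=64, lazy_penalty=0.1, rng=None):
    rng = rng or random.Random(0)
    # Program database: list of (score, candidate) pairs.
    database = [(objective(seed), seed)]
    for _ in range(iterations):
        # Batch sampling: draw several parents per step, biased toward
        # high scores via small tournaments.
        parents = [max(rng.sample(database, min(3, len(database))))[1]
                   for _ in range(batch_size)]
        for parent in parents:
            child = mutate(parent, rng)
            score = objective(child)
            if child == parent:
                # Lazy penalty: discourage outputs identical to the parent.
                score -= lazy_penalty
            database.append((score, child))
        # Keep only the best programs to bound database size.
        database = sorted(database, reverse=True)[:db_size]
    return max(database)

if __name__ == "__main__":
    # Toy objective: maximize -sum of squares (optimum at the origin).
    best_score, best = evolve(lambda x: -sum(v * v for v in x),
                              seed=(1.0, -1.0))
    print(best_score, best)
```

In the paper's setting the objective would be the verified bound achieved by a candidate program (e.g., a circle-packing construction), and RL at test time would additionally update the LLM's weights from these scored trajectories, which this inference-only sketch omits.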
Related papers
- SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning [83.98129545309277]
We propose SkillRL, a framework that bridges the gap between raw experience and policy improvement. Our approach introduces an experience-based distillation mechanism to build a hierarchical skill library, SkillBank. Experimental results on ALF, WebShop, and seven search-augmented tasks demonstrate that SkillRL achieves state-of-the-art performance.
arXiv Detail & Related papers (2026-02-09T03:17:17Z) - EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards [52.42920996842378]
We propose a self-evolving framework, named EvoLMM, that instantiates two cooperative agents from a single backbone model. This dynamic feedback encourages both the generation of informative queries and the refinement of structured reasoning. Our code and models are available at https://github.com/mbzuai-oryx/EvoLMM.
arXiv Detail & Related papers (2025-11-20T18:59:54Z) - GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms [7.228213026504935]
GigaEvo is an open-source framework that enables researchers to study and experiment with hybrid LLM-evolution approaches. We provide detailed descriptions of system architecture, implementation decisions, and experimental methodology to support further research.
arXiv Detail & Related papers (2025-11-17T14:44:47Z) - Scientific Algorithm Discovery by Augmenting AlphaEvolve with Deep Research [23.532524147608253]
We present DeepEvolve, an agent that integrates deep research with algorithm evolution. Each iteration not only proposes new hypotheses but also refines, implements, and tests them, avoiding both shallow improvements and unproductive over-refinements. Across nine benchmarks in chemistry, mathematics, biology, materials, and patents, DeepEvolve consistently improves the initial algorithm.
arXiv Detail & Related papers (2025-10-07T15:49:51Z) - MCCE: A Framework for Multi-LLM Collaborative Co-Evolution [17.41200156551317]
Multi-objective discrete optimization problems pose significant challenges due to their vast and unstructured spaces. Large language models (LLMs) offer powerful priors and reasoning ability, making them natural candidates when expert knowledge matters. We introduce Multi-LLM Collaborative Co-evolution, a hybrid framework that unites a frozen closed-source LLM with a lightweight trainable model.
arXiv Detail & Related papers (2025-10-06T10:03:28Z) - Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation [74.75716642635484]
Large language models (LLMs) are increasingly trained with reinforcement learning from verifiable rewards (RLVR). We propose EVOL-RL, a label-free framework that mirrors the evolutionary principle of balancing selection with variation. EVOL-RL consistently outperforms the majority-only baseline.
arXiv Detail & Related papers (2025-09-18T17:50:04Z) - AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning [129.44038804430542]
We introduce AgentGym-RL, a new framework to train LLM agents for multi-turn interactive decision-making through RL. We propose ScalingInter-RL, a training approach designed for exploration-exploitation balance and stable RL optimization. Our agents match or surpass commercial models on 27 tasks across diverse environments.
arXiv Detail & Related papers (2025-09-10T16:46:11Z) - EvoCoT: Overcoming the Exploration Bottleneck in Reinforcement Learning [25.518032764227442]
Reinforcement learning with verifiable reward (RLVR) has become a promising paradigm for post-training large language models (LLMs) to improve their reasoning capability. We propose EvoCoT, a self-evolving curriculum learning framework based on two-stage chain-of-thought (CoT) reasoning optimization. EvoCoT constrains the exploration space by self-generating and verifying CoT trajectories, then gradually shortens CoT steps to expand the space in a controlled way.
arXiv Detail & Related papers (2025-08-11T09:49:01Z) - MM-Eureka: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning [55.82649731348012]
We introduce the MMK12 dataset and MM-EUREKA with 7B and 32B parameters. The former is a high-quality multimodal mathematics reasoning dataset featuring diverse knowledge domains with human-verified answers and solution processes. The latter is a multimodal model employing rule-based reinforcement learning, utilizing online filtering and a two-stage training strategy to enhance training stability.
arXiv Detail & Related papers (2025-03-10T14:23:12Z) - A Survey on Self-Evolution of Large Language Models [116.54238664264928]
Large language models (LLMs) have significantly advanced in various fields and intelligent agent applications.
To address this issue, self-evolution approaches that enable LLMs to autonomously acquire, refine, and learn from experiences generated by the model itself are rapidly growing.
arXiv Detail & Related papers (2024-04-22T17:43:23Z) - DARLEI: Deep Accelerated Reinforcement Learning with Evolutionary Intelligence [77.78795329701367]
We present DARLEI, a framework that combines evolutionary algorithms with parallelized reinforcement learning.
We characterize DARLEI's performance under various conditions, revealing factors impacting diversity of evolved morphologies.
We hope to extend DARLEI in future work to include interactions between diverse morphologies in richer environments.
arXiv Detail & Related papers (2023-12-08T16:51:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.