EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines
- URL: http://arxiv.org/abs/2601.09465v1
- Date: Wed, 14 Jan 2026 13:19:13 GMT
- Title: EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines
- Authors: Shuo Zhang, Chaofa Yuan, Ryan Guo, Xiaomin Yu, Rui Xu, Zhangquan Chen, Zinuo Li, Zhi Yang, Shuhao Guan, Zhenheng Tang, Sen Hu, Liwen Zhang, Ronghao Chen, Huacan Wang,
- Abstract summary: EvoFSM is a structured self-evolving framework that achieves both adaptability and control by evolving an explicit Finite State Machine. EvoFSM refines the FSM through a small set of constrained operations, and further incorporates a self-evolving memory that distills successful trajectories as reusable priors and failure patterns as constraints. In particular, EvoFSM reaches 58.0% accuracy on the DeepSearch benchmark.
- Score: 23.086761228480682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While LLM-based agents have shown promise for deep research, most existing approaches rely on fixed workflows that struggle to adapt to real-world, open-ended queries. Recent work therefore explores self-evolution by allowing agents to rewrite their own code or prompts to improve problem-solving ability, but unconstrained optimization often triggers instability, hallucinations, and instruction drift. We propose EvoFSM, a structured self-evolving framework that achieves both adaptability and control by evolving an explicit Finite State Machine (FSM) instead of relying on free-form rewriting. EvoFSM decouples the optimization space into macroscopic Flow (state-transition logic) and microscopic Skill (state-specific behaviors), enabling targeted improvements under clear behavioral boundaries. Guided by a critic mechanism, EvoFSM refines the FSM through a small set of constrained operations, and further incorporates a self-evolving memory that distills successful trajectories as reusable priors and failure patterns as constraints for future queries. Extensive evaluations on five multi-hop QA benchmarks demonstrate the effectiveness of EvoFSM. In particular, EvoFSM reaches 58.0% accuracy on the DeepSearch benchmark. Additional results on interactive decision-making tasks further validate its generalization.
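The abstract's central idea, decoupling the optimization space into a macroscopic Flow (state-transition logic) and microscopic Skills (state-specific behaviors), can be illustrated with a minimal sketch. All names and transitions below are hypothetical, invented for illustration; the paper does not publish this interface:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the Flow/Skill split described in the abstract.
# "Flow" = macroscopic state-transition logic; "Skill" = per-state behavior.
# State names and methods are illustrative, not EvoFSM's actual implementation.

@dataclass
class ResearchFSM:
    # Flow: which state follows which, keyed by (state, outcome)
    flow: dict = field(default_factory=lambda: {
        ("plan", "ok"): "search",
        ("search", "ok"): "synthesize",
        ("search", "insufficient"): "plan",   # loop back and re-plan
        ("synthesize", "ok"): "done",
    })
    # Skill: the behavior (e.g. a prompt) attached to each state
    skills: dict = field(default_factory=lambda: {
        "plan": "Decompose the query into sub-questions.",
        "search": "Retrieve evidence for each sub-question.",
        "synthesize": "Compose a cited answer from the evidence.",
    })

    # Evolution happens only through a small set of constrained
    # operations, rather than free-form rewriting of code or prompts.
    def add_transition(self, state: str, outcome: str, nxt: str) -> None:
        self.flow[(state, outcome)] = nxt

    def refine_skill(self, state: str, new_prompt: str) -> None:
        if state not in self.skills:
            raise KeyError(f"unknown state: {state}")
        self.skills[state] = new_prompt

    def next_state(self, state: str, outcome: str) -> str:
        return self.flow.get((state, outcome), "done")

fsm = ResearchFSM()
# A critic could trigger a targeted Skill refinement without touching the Flow:
fsm.refine_skill("search", "Retrieve and deduplicate evidence per sub-question.")
print(fsm.next_state("search", "insufficient"))  # → plan
```

The point of the structure is that each constrained operation edits exactly one component (a transition or a single state's skill), so every change stays within clear behavioral boundaries and can be reviewed or rolled back independently.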
Related papers
- EvoX: Meta-Evolution for Automated Discovery [115.89434419482797]
EvoX is an adaptive evolution method that optimizes its own evolution process. It continuously updates how prior solutions are selected and varied based on progress. It outperforms existing AI-driven evolutionary methods including AlphaEvolve, OpenEvolve, GEPA, and ShinkaEvolve on the majority of tasks.
arXiv Detail & Related papers (2026-02-26T18:54:41Z)
- DeltaEvolve: Accelerating Scientific Discovery through Momentum-Driven Evolution [28.737322041874293]
LLM-driven evolutionary systems have shown promise for automated scientific discovery. Existing approaches such as AlphaEvolve rely on full-code histories that are context-inefficient. We propose DeltaEvolve, a momentum-driven evolutionary framework that replaces full-code history with structured semantic deltas.
arXiv Detail & Related papers (2026-02-02T23:47:54Z)
- LoongFlow: Directed Evolutionary Search via a Cognitive Plan-Execute-Summarize Paradigm [8.050281821865978]
LoongFlow is a self-evolving agent framework that achieves state-of-the-art solution quality with significantly reduced computational costs. Unlike "blind" mutation operators, LoongFlow integrates Large Language Models into a cognitive "Plan-Execute-Summarize" (PES) paradigm. To sustain long-term architectural coherence, we incorporate a hybrid evolutionary memory system.
arXiv Detail & Related papers (2025-12-30T08:39:28Z)
- EvoIR: Towards All-in-One Image Restoration via Evolutionary Frequency Modulation [54.37259500020744]
EvoIR is an AiOIR-specific framework that introduces evolutionary frequency modulation for dynamic and adaptive image restoration. Specifically, EvoIR employs the Frequency-Modulated Module (FMM) that decomposes features into high- and low-frequency branches in an explicit manner. Central to EvoIR, an Evolutionary Optimization Strategy (EOS) iteratively adjusts frequency-aware objectives through a population-based evolutionary process.
arXiv Detail & Related papers (2025-12-04T18:59:10Z)
- LLM4EO: Large Language Model for Evolutionary Optimization in Flexible Job Shop Scheduling [4.782301990330074]
This work leverages Large Language Models (LLMs) to perceive evolutionary dynamics and enable operator-level meta-evolution. The proposed framework, LLM4EO, comprises three components: knowledge-transfer-based operator design, evolution perception and analysis, and adaptive operator evolution.
arXiv Detail & Related papers (2025-11-20T15:56:09Z)
- MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization [103.74675519953898]
Long-chain reflective reasoning is a prerequisite for solving complex real-world problems. We build a benchmark consisting of 1,260 samples spanning 42 challenging synthetic tasks. We generate post-training data and explore learning paradigms for exploiting such data.
arXiv Detail & Related papers (2025-10-09T17:53:58Z)
- AutoEvoEval: An Automated Framework for Evolving Close-Ended LLM Evaluation Data [0.6278186810520364]
Large language models (LLMs) have shown remarkable performance on various tasks. Existing evaluation benchmarks are often static and insufficient to fully assess their robustness and generalization. We propose AutoEvoEval, an evolution-based evaluation framework for close-ended tasks such as question answering.
arXiv Detail & Related papers (2025-06-30T11:18:56Z)
- Tournament of Prompts: Evolving LLM Instructions Through Structured Debates and Elo Ratings [0.9437165725355702]
We introduce DEEVO, a novel framework that guides prompt evolution through a debate-driven evaluation with an Elo-based selection. Using Elo ratings as a fitness proxy, DEEVO simultaneously drives improvement and preserves valuable diversity in the prompt population.
arXiv Detail & Related papers (2025-05-30T19:33:41Z)
- A Survey on Post-training of Large Language Models [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. These challenges necessitate advanced post-training language models (PoLMs) to address shortcomings such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance. This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; and Integration and Adaptation.
arXiv Detail & Related papers (2025-03-08T05:41:42Z)
- Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution [61.80716438091887]
GenDiE (Generate, Discriminate, Evolve) is a novel self-evolving framework that enhances context faithfulness through fine-grained sentence-level optimization. By treating each sentence in a response as an independent optimization unit, GenDiE effectively addresses the limitations of previous approaches. Experiments on ASQA (in-domain LFQA) and ConFiQA datasets demonstrate that GenDiE surpasses various baselines in both faithfulness and correctness.
arXiv Detail & Related papers (2025-03-03T16:08:33Z)
- A Survey on Self-Evolution of Large Language Models [116.54238664264928]
Large language models (LLMs) have significantly advanced in various fields and intelligent agent applications.
To address this issue, self-evolution approaches that enable LLMs to autonomously acquire, refine, and learn from experiences generated by the model itself are rapidly growing.
arXiv Detail & Related papers (2024-04-22T17:43:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.