EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines
- URL: http://arxiv.org/abs/2601.09465v1
- Date: Wed, 14 Jan 2026 13:19:13 GMT
- Title: EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines
- Authors: Shuo Zhang, Chaofa Yuan, Ryan Guo, Xiaomin Yu, Rui Xu, Zhangquan Chen, Zinuo Li, Zhi Yang, Shuhao Guan, Zhenheng Tang, Sen Hu, Liwen Zhang, Ronghao Chen, Huacan Wang,
- Abstract summary: EvoFSM is a structured self-evolving framework that achieves both adaptability and control by evolving an explicit Finite State Machine. EvoFSM refines the FSM through a small set of constrained operations, and further incorporates a self-evolving memory that distills successful trajectories as reusable priors and failure patterns as constraints. In particular, EvoFSM reaches 58.0% accuracy on the DeepSearch benchmark.
- Score: 23.086761228480682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While LLM-based agents have shown promise for deep research, most existing approaches rely on fixed workflows that struggle to adapt to real-world, open-ended queries. Recent work therefore explores self-evolution by allowing agents to rewrite their own code or prompts to improve problem-solving ability, but unconstrained optimization often triggers instability, hallucinations, and instruction drift. We propose EvoFSM, a structured self-evolving framework that achieves both adaptability and control by evolving an explicit Finite State Machine (FSM) instead of relying on free-form rewriting. EvoFSM decouples the optimization space into macroscopic Flow (state-transition logic) and microscopic Skill (state-specific behaviors), enabling targeted improvements under clear behavioral boundaries. Guided by a critic mechanism, EvoFSM refines the FSM through a small set of constrained operations, and further incorporates a self-evolving memory that distills successful trajectories as reusable priors and failure patterns as constraints for future queries. Extensive evaluations on five multi-hop QA benchmarks demonstrate the effectiveness of EvoFSM. In particular, EvoFSM reaches 58.0% accuracy on the DeepSearch benchmark. Additional results on interactive decision-making tasks further validate its generalization.
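The abstract's central idea, decoupling the optimization space into a macroscopic Flow (state-transition logic) and microscopic Skills (state-specific behaviors), can be illustrated with a minimal sketch. All names and transitions below are hypothetical, invented for illustration; the paper does not publish this interface:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the Flow/Skill split described in the abstract.
# "Flow" = macroscopic state-transition logic; "Skill" = per-state behavior.
# State names and methods are illustrative, not EvoFSM's actual implementation.

@dataclass
class ResearchFSM:
    # Flow: which state follows which, keyed by (state, outcome)
    flow: dict = field(default_factory=lambda: {
        ("plan", "ok"): "search",
        ("search", "ok"): "synthesize",
        ("search", "insufficient"): "plan",   # loop back and re-plan
        ("synthesize", "ok"): "done",
    })
    # Skill: the behavior (e.g. a prompt) attached to each state
    skills: dict = field(default_factory=lambda: {
        "plan": "Decompose the query into sub-questions.",
        "search": "Retrieve evidence for each sub-question.",
        "synthesize": "Compose a cited answer from the evidence.",
    })

    # Evolution happens only through a small set of constrained
    # operations, rather than free-form rewriting of code or prompts.
    def add_transition(self, state: str, outcome: str, nxt: str) -> None:
        self.flow[(state, outcome)] = nxt

    def refine_skill(self, state: str, new_prompt: str) -> None:
        if state not in self.skills:
            raise KeyError(f"unknown state: {state}")
        self.skills[state] = new_prompt

    def next_state(self, state: str, outcome: str) -> str:
        return self.flow.get((state, outcome), "done")

fsm = ResearchFSM()
# A critic could trigger a targeted Skill refinement without touching the Flow:
fsm.refine_skill("search", "Retrieve and deduplicate evidence per sub-question.")
print(fsm.next_state("search", "insufficient"))  # → plan
```

The point of the structure is that each constrained operation edits exactly one component (a transition or a single state's skill), so every change stays within clear behavioral boundaries and can be reviewed or rolled back independently.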
Related papers
- EvoX: Meta-Evolution for Automated Discovery [115.89434419482797]
EvoX is an adaptive evolution method that optimizes its own evolution process. It continuously updates how prior solutions are selected and varied based on progress. It outperforms existing AI-driven evolutionary methods including AlphaEvolve, OpenEvolve, GEPA, and ShinkaEvolve on the majority of tasks.
arXiv Detail & Related papers (2026-02-26T18:54:41Z)
- DeltaEvolve: Accelerating Scientific Discovery through Momentum-Driven Evolution [28.737322041874293]
LLM-driven evolutionary systems have shown promise for automated scientific discovery. Existing approaches such as AlphaEvolve rely on full-code histories that are context-inefficient. We propose DeltaEvolve, a momentum-driven evolutionary framework that replaces full-code history with structured semantic deltas.
arXiv Detail & Related papers (2026-02-02T23:47:54Z)
- LoongFlow: Directed Evolutionary Search via a Cognitive Plan-Execute-Summarize Paradigm [8.050281821865978]
LoongFlow is a self-evolving agent framework that achieves state-of-the-art solution quality with significantly reduced computational costs. Unlike "blind" mutation operators, LoongFlow integrates Large Language Models into a cognitive "Plan-Execute-Summarize" (PES) paradigm. To sustain long-term architectural coherence, we incorporate a hybrid evolutionary memory system.
arXiv Detail & Related papers (2025-12-30T08:39:28Z)
- EvoIR: Towards All-in-One Image Restoration via Evolutionary Frequency Modulation [54.37259500020744]
EvoIR is an AiOIR-specific framework that introduces evolutionary frequency modulation for dynamic and adaptive image restoration. Specifically, EvoIR employs the Frequency-Modulated Module (FMM) that decomposes features into high- and low-frequency branches in an explicit manner. Central to EvoIR, an Evolutionary Optimization Strategy (EOS) iteratively adjusts frequency-aware objectives through a population-based evolutionary process.
arXiv Detail & Related papers (2025-12-04T18:59:10Z)
- LLM4EO: Large Language Model for Evolutionary Optimization in Flexible Job Shop Scheduling [4.782301990330074]
This work leverages Large Language Models (LLMs) to perceive evolutionary dynamics and enable operator-level meta-evolution. The proposed framework, LLM4EO, comprises three components: knowledge-transfer-based operator design, evolution perception and analysis, and adaptive operator evolution.
arXiv Detail & Related papers (2025-11-20T15:56:09Z)
- MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization [103.74675519953898]
Long-chain reflective reasoning is a prerequisite for solving complex real-world problems. We build a benchmark consisting of 1,260 samples spanning 42 challenging synthetic tasks. We generate post-training data and explore learning paradigms for exploiting such data.
arXiv Detail & Related papers (2025-10-09T17:53:58Z)
- AutoEvoEval: An Automated Framework for Evolving Close-Ended LLM Evaluation Data [0.6278186810520364]
Large language models (LLMs) have shown remarkable performance on various tasks. Existing evaluation benchmarks are often static and insufficient to fully assess their robustness and generalization. We propose AutoEvoEval, an evolution-based evaluation framework for close-ended tasks such as question answering.
arXiv Detail & Related papers (2025-06-30T11:18:56Z)
- Tournament of Prompts: Evolving LLM Instructions Through Structured Debates and Elo Ratings [0.9437165725355702]
We introduce DEEVO, a novel framework that guides prompt evolution through a debate-driven evaluation with an Elo-based selection. Using Elo ratings as a fitness proxy, DEEVO simultaneously drives improvement and preserves valuable diversity in the prompt population.
arXiv Detail & Related papers (2025-05-30T19:33:41Z)
- A Survey on Post-training of Large Language Models [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. These challenges necessitate advanced post-training language models (PoLMs) to address shortcomings such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance. This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; and Integration and Adaptation.
arXiv Detail & Related papers (2025-03-08T05:41:42Z)
- Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution [61.80716438091887]
GenDiE (Generate, Discriminate, Evolve) is a novel self-evolving framework that enhances context faithfulness through fine-grained sentence-level optimization. By treating each sentence in a response as an independent optimization unit, GenDiE effectively addresses the limitations of previous approaches. Experiments on ASQA (in-domain LFQA) and ConFiQA datasets demonstrate that GenDiE surpasses various baselines in both faithfulness and correctness.
arXiv Detail & Related papers (2025-03-03T16:08:33Z)
- A Survey on Self-Evolution of Large Language Models [116.54238664264928]
Large language models (LLMs) have significantly advanced in various fields and intelligent agent applications.
To address this issue, self-evolution approaches that enable LLMs to autonomously acquire, refine, and learn from experiences generated by the model itself are rapidly growing.
arXiv Detail & Related papers (2024-04-22T17:43:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.