LOGIGEN: Logic-Driven Generation of Verifiable Agentic Tasks
- URL: http://arxiv.org/abs/2603.00540v1
- Date: Sat, 28 Feb 2026 08:35:30 GMT
- Title: LOGIGEN: Logic-Driven Generation of Verifiable Agentic Tasks
- Authors: Yucheng Zeng, Weipeng Lu, Linyun Liu, Shupeng Li, Zitian Qu, Chenghao Zhu, Shaofei Li, Zhengdong Tan, Mengyue Liu, Haotian Zhao, Zhe Zhou, Jianmin Wu,
- Abstract summary: We introduce textbfLOGIGEN, a logic-driven framework that synthesizes verifiable training data.<n>On $2$-Bench, LOGIGEN-32B(RL) achieves a textbf79.5% success rate, substantially outperforming the base model.
- Score: 4.6880826836662814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The evolution of Large Language Models (LLMs) from static instruction-followers to autonomous agents necessitates operating within complex, stateful environments to achieve precise state-transition objectives. However, this paradigm is bottlenecked by data scarcity, as existing tool-centric reverse-synthesis pipelines fail to capture the rigorous logic of real-world applications. We introduce \textbf{LOGIGEN}, a logic-driven framework that synthesizes verifiable training data based on three core pillars: \textbf{Hard-Compiled Policy Grounding}, \textbf{Logic-Driven Forward Synthesis}, and \textbf{Deterministic State Verification}. Specifically, a Triple-Agent Orchestration is employed: the \textbf{Architect} compiles natural-language policy into database constraints to enforce hard rules; the \textbf{Set Designer} initializes boundary-adjacent states to trigger critical policy conflicts; and the \textbf{Explorer} searches this environment to discover causal solution paths. This framework yields a dataset of 20,000 complex tasks across 8 domains, where validity is strictly guaranteed by checking exact state equivalence. Furthermore, we propose a verification-based training protocol where Supervised Fine-Tuning (SFT) on verifiable trajectories establishes compliance with hard-compiled policy, while Reinforcement Learning (RL) guided by dense state-rewards refines long-horizon goal achievement. On $τ^2$-Bench, LOGIGEN-32B(RL) achieves a \textbf{79.5\% success rate}, substantially outperforming the base model (40.7\%). These results demonstrate that logic-driven synthesis combined with verification-based training effectively constructs the causally valid trajectories needed for next-generation agents.
Related papers
- Relatron: Automating Relational Machine Learning over Relational Databases [50.94254514286021]
We present a study that unifies RDL and DFS in a shared design space and conducts architecture-centric searches across diverse RDB tasks.<n>Our analysis yields three key findings: (1) RDL does not consistently outperform DFS, with performance being highly task-dependent; (2) no single architecture dominates across tasks, underscoring the need for task-aware model selection; and accuracy is an unreliable guide for choice architecture.
arXiv Detail & Related papers (2026-02-26T02:45:22Z) - NGDB-Zoo: Towards Efficient and Scalable Neural Graph Databases Training [55.35217340229661]
We present NGDB-Zoo, a unified framework that resolves bottlenecks by synergizing operator-level training with semantic augmentation.<n>We demonstrate that NGDB-Zoo maintains high GPU utilization across diverse logical patterns and significantly mitigates friction in hybrid neuro-symbolic reasoning.
arXiv Detail & Related papers (2026-02-25T05:46:42Z) - ReSyn: Autonomously Scaling Synthetic Environments for Reasoning Models [18.359969463106644]
Reinforcement learning with verifiable rewards (RLVR) has emerged as a promising approach for training reasoning language models (RLMs)<n>In this work, we scale RLVR by introducing ReSyn, a pipeline that generates diverse reasoning environments equipped with instance generators and verifiers.
arXiv Detail & Related papers (2026-02-23T18:34:29Z) - AgentSkiller: Scaling Generalist Agent Intelligence through Semantically Integrated Cross-Domain Data Synthesis [30.512393568258105]
Large Language Model agents demonstrate potential in solving real-world problems via tools, yet generalist intelligence is bottlenecked by scarce high-quality, long-horizon data.<n>We propose AgentSkiller, a fully automated framework synthesizing multi-turn interaction data across realistic, semantically linked domains.
arXiv Detail & Related papers (2026-02-10T03:21:42Z) - Guided Verifier: Collaborative Multimodal Reasoning via Dynamic Process Supervision [11.159231524113764]
Reinforcement Learning (RL) has emerged as a pivotal mechanism for enhancing the complex reasoning capabilities of Multimodal Large Language Models (MLLMs)<n>In this paper, we propose the textbfGuided Verifier framework to address these structural limitations.<n>We develop a specialized data synthesis pipeline targeting multimodal hallucinations, constructing textbfCoRe dataset of process-level negatives and textbfCorrect-guide textbfReasoning trajectories to train the guided verifier.
arXiv Detail & Related papers (2026-02-04T07:38:42Z) - OMG-Agent: Toward Robust Missing Modality Generation with Decoupled Coarse-to-Fine Agentic Workflows [9.617220633655716]
We present textbfunderlineOmni-textbfunderlineModality textbfunderlineGeneration Agent (textbfOMG-Agent)
arXiv Detail & Related papers (2026-02-04T02:25:40Z) - ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas [13.919124676472022]
ASTRA is an end-to-end framework for training tool-augmented language model agents.<n>ASTRA integrates scalable data synthesis and verifiable reinforcement learning.<n> Experiments on multiple agentic tool-use benchmarks demonstrate that ASTRA-trained models achieve state-of-the-art performance.
arXiv Detail & Related papers (2026-01-29T11:22:23Z) - Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem [90.17610617854247]
We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimize the production pipeline for agentic model.<n>ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering.<n>We release ROME, an open-source agent grounded by ALE and trained on over one million trajectories.
arXiv Detail & Related papers (2025-12-31T14:03:39Z) - Managing the Stochastic: Foundations of Learning in Neuro-Symbolic Systems for Software Engineering [0.27195102129094995]
Current approaches to AI coding agents blur the lines between the Large Language Model and the agent itself.<n>This paper proposes setting the control boundary such that the LLM is treated as a component of the environment environment.
arXiv Detail & Related papers (2025-12-18T15:28:21Z) - KBQA-R1: Reinforcing Large Language Models for Knowledge Base Question Answering [64.62317305868264]
We present textbfKBQA-R1, a framework that shifts the paradigm from text imitation to interaction optimization via Reinforcement Learning.<n>Treating KBQA as a multi-turn decision process, our model learns to navigate the knowledge base using a list of actions.<n>Experiments on WebQSP, GrailQA, and GraphQuestions demonstrate that KBQA-R1 achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-12-10T17:45:42Z) - EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning [63.03672166010434]
We introduce an evolutionary, task-agnostic, strategy-guided, executably-checkable data synthesis framework.<n>It jointly synthesizes problems, diverse candidate solutions, and verification artifacts.<n>It iteratively discovers strategies via a consistency-based evaluator that enforces agreement between human-annotated and strategy-induced checks.
arXiv Detail & Related papers (2025-10-20T11:56:35Z) - Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models [99.85131798240808]
We introduce a novel generative framework called textitGuided Topology Diffusion (GTD)<n>Inspired by conditional discrete graph diffusion models, GTD formulates topology synthesis as an iterative construction process.<n>At each step, the generation is steered by a lightweight proxy model that predicts multi-objective rewards.<n>Experiments show that GTD can generate highly task-adaptive, sparse, and efficient communication topologies.
arXiv Detail & Related papers (2025-10-09T05:28:28Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.<n>Our framework incorporates two complementary strategies: internal TTC and external TTC.<n>We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.