Related papers: LOGIGEN: Logic-Driven Generation of Verifiable Agentic Tasks

LOGIGEN: Logic-Driven Generation of Verifiable Agentic Tasks

URL: http://arxiv.org/abs/2603.00540v1
Date: Sat, 28 Feb 2026 08:35:30 GMT
Title: LOGIGEN: Logic-Driven Generation of Verifiable Agentic Tasks
Authors: Yucheng Zeng, Weipeng Lu, Linyun Liu, Shupeng Li, Zitian Qu, Chenghao Zhu, Shaofei Li, Zhengdong Tan, Mengyue Liu, Haotian Zhao, Zhe Zhou, Jianmin Wu,
Abstract summary: We introduce textbfLOGIGEN, a logic-driven framework that synthesizes verifiable training data.<n>On $2$-Bench, LOGIGEN-32B(RL) achieves a textbf79.5% success rate, substantially outperforming the base model.
Score: 4.6880826836662814
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The evolution of Large Language Models (LLMs) from static instruction-followers to autonomous agents necessitates operating within complex, stateful environments to achieve precise state-transition objectives. However, this paradigm is bottlenecked by data scarcity, as existing tool-centric reverse-synthesis pipelines fail to capture the rigorous logic of real-world applications. We introduce \textbf{LOGIGEN}, a logic-driven framework that synthesizes verifiable training data based on three core pillars: \textbf{Hard-Compiled Policy Grounding}, \textbf{Logic-Driven Forward Synthesis}, and \textbf{Deterministic State Verification}. Specifically, a Triple-Agent Orchestration is employed: the \textbf{Architect} compiles natural-language policy into database constraints to enforce hard rules; the \textbf{Set Designer} initializes boundary-adjacent states to trigger critical policy conflicts; and the \textbf{Explorer} searches this environment to discover causal solution paths. This framework yields a dataset of 20,000 complex tasks across 8 domains, where validity is strictly guaranteed by checking exact state equivalence. Furthermore, we propose a verification-based training protocol where Supervised Fine-Tuning (SFT) on verifiable trajectories establishes compliance with hard-compiled policy, while Reinforcement Learning (RL) guided by dense state-rewards refines long-horizon goal achievement. On $τ^2$-Bench, LOGIGEN-32B(RL) achieves a \textbf{79.5\% success rate}, substantially outperforming the base model (40.7\%). These results demonstrate that logic-driven synthesis combined with verification-based training effectively constructs the causally valid trajectories needed for next-generation agents.

Related papers

Relatron: Automating Relational Machine Learning over Relational Databases [50.94254514286021]
We present a study that unifies RDL and DFS in a shared design space and conducts architecture-centric searches across diverse RDB tasks.<n>Our analysis yields three key findings: (1) RDL does not consistently outperform DFS, with performance being highly task-dependent; (2) no single architecture dominates across tasks, underscoring the need for task-aware model selection; and accuracy is an unreliable guide for choice architecture.
arXiv Detail & Related papers (2026-02-26T02:45:22Z)
NGDB-Zoo: Towards Efficient and Scalable Neural Graph Databases Training [55.35217340229661]
We present NGDB-Zoo, a unified framework that resolves bottlenecks by synergizing operator-level training with semantic augmentation.<n>We demonstrate that NGDB-Zoo maintains high GPU utilization across diverse logical patterns and significantly mitigates friction in hybrid neuro-symbolic reasoning.
arXiv Detail & Related papers (2026-02-25T05:46:42Z)
ReSyn: Autonomously Scaling Synthetic Environments for Reasoning Models [18.359969463106644]
Reinforcement learning with verifiable rewards (RLVR) has emerged as a promising approach for training reasoning language models (RLMs)<n>In this work, we scale RLVR by introducing ReSyn, a pipeline that generates diverse reasoning environments equipped with instance generators and verifiers.
arXiv Detail & Related papers (2026-02-23T18:34:29Z)
AgentSkiller: Scaling Generalist Agent Intelligence through Semantically Integrated Cross-Domain Data Synthesis [30.512393568258105]
Large Language Model agents demonstrate potential in solving real-world problems via tools, yet generalist intelligence is bottlenecked by scarce high-quality, long-horizon data.<n>We propose AgentSkiller, a fully automated framework synthesizing multi-turn interaction data across realistic, semantically linked domains.
arXiv Detail & Related papers (2026-02-10T03:21:42Z)
Guided Verifier: Collaborative Multimodal Reasoning via Dynamic Process Supervision [11.159231524113764]
Reinforcement Learning (RL) has emerged as a pivotal mechanism for enhancing the complex reasoning capabilities of Multimodal Large Language Models (MLLMs)<n>In this paper, we propose the textbfGuided Verifier framework to address these structural limitations.<n>We develop a specialized data synthesis pipeline targeting multimodal hallucinations, constructing textbfCoRe dataset of process-level negatives and textbfCorrect-guide textbfReasoning trajectories to train the guided verifier.
arXiv Detail & Related papers (2026-02-04T07:38:42Z)
OMG-Agent: Toward Robust Missing Modality Generation with Decoupled Coarse-to-Fine Agentic Workflows [9.617220633655716]
We present textbfunderlineOmni-textbfunderlineModality textbfunderlineGeneration Agent (textbfOMG-Agent)
arXiv Detail & Related papers (2026-02-04T02:25:40Z)
ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas [13.919124676472022]
ASTRA is an end-to-end framework for training tool-augmented language model agents.<n>ASTRA integrates scalable data synthesis and verifiable reinforcement learning.<n> Experiments on multiple agentic tool-use benchmarks demonstrate that ASTRA-trained models achieve state-of-the-art performance.
arXiv Detail & Related papers (2026-01-29T11:22:23Z)
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem [90.17610617854247]
We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimize the production pipeline for agentic model.<n>ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering.<n>We release ROME, an open-source agent grounded by ALE and trained on over one million trajectories.
arXiv Detail & Related papers (2025-12-31T14:03:39Z)
Managing the Stochastic: Foundations of Learning in Neuro-Symbolic Systems for Software Engineering [0.27195102129094995]
Current approaches to AI coding agents blur the lines between the Large Language Model and the agent itself.<n>This paper proposes setting the control boundary such that the LLM is treated as a component of the environment environment.
arXiv Detail & Related papers (2025-12-18T15:28:21Z)
KBQA-R1: Reinforcing Large Language Models for Knowledge Base Question Answering [64.62317305868264]
We present textbfKBQA-R1, a framework that shifts the paradigm from text imitation to interaction optimization via Reinforcement Learning.<n>Treating KBQA as a multi-turn decision process, our model learns to navigate the knowledge base using a list of actions.<n>Experiments on WebQSP, GrailQA, and GraphQuestions demonstrate that KBQA-R1 achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-12-10T17:45:42Z)
EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning [63.03672166010434]
We introduce an evolutionary, task-agnostic, strategy-guided, executably-checkable data synthesis framework.<n>It jointly synthesizes problems, diverse candidate solutions, and verification artifacts.<n>It iteratively discovers strategies via a consistency-based evaluator that enforces agreement between human-annotated and strategy-induced checks.
arXiv Detail & Related papers (2025-10-20T11:56:35Z)
Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models [99.85131798240808]
We introduce a novel generative framework called textitGuided Topology Diffusion (GTD)<n>Inspired by conditional discrete graph diffusion models, GTD formulates topology synthesis as an iterative construction process.<n>At each step, the generation is steered by a lightweight proxy model that predicts multi-objective rewards.<n>Experiments show that GTD can generate highly task-adaptive, sparse, and efficient communication topologies.
arXiv Detail & Related papers (2025-10-09T05:28:28Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.<n>Our framework incorporates two complementary strategies: internal TTC and external TTC.<n>We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.