TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training
- URL: http://arxiv.org/abs/2603.01714v1
- Date: Mon, 02 Mar 2026 10:38:54 GMT
- Title: TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training
- Authors: Jinluan Yang, Yuxin Liu, Zhengyu Chen, Chengcheng Han, Yueqing Sun, Qi Gu, Hui Su, Xunliang Cai, Fei Wu, Kun Kuang
- Abstract summary: Training tool-use agents typically relies on Supervised Fine-Tuning (SFT) on successful trajectories and Reinforcement Learning (RL) on pass-rate-selected tasks. We propose TopoCurate, an interaction-aware framework that projects multi-trial rollouts from the same task into a unified semantic quotient topology. TopoCurate achieves consistent gains of 4.2% (SFT) and 6.9% (RL) over state-of-the-art baselines.
- Score: 53.93696896939915
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training tool-use agents typically relies on outcome-based filtering: Supervised Fine-Tuning (SFT) on successful trajectories and Reinforcement Learning (RL) on pass-rate-selected tasks. However, this paradigm ignores interaction dynamics: successful trajectories may lack error recovery or exhibit redundancy, while pass rates fail to distinguish structurally informative tasks from trivial ones. We propose TopoCurate, an interaction-aware framework that projects multi-trial rollouts from the same task into a unified semantic quotient topology. By merging equivalent action-observation states, this projection transforms scattered linear trajectories into a structured manifold that explicitly captures how tool invocations and environmental responses drive the divergence between effective strategies and failure modes. Leveraging this representation, we introduce a dual-selection mechanism: for SFT, we prioritize trajectories demonstrating reflective recovery, semantic efficiency, and strategic diversity to mitigate covariate shift and mode collapse; for RL, we select tasks with high error branch ratios and strategic heterogeneity, maximizing gradient Signal-to-Noise Ratio to address vanishing signals in sparse-reward settings. Evaluations on BFCLv3 and Tau2 Bench show that TopoCurate achieves consistent gains of 4.2% (SFT) and 6.9% (RL) over state-of-the-art baselines. We will release the code and data soon for further investigations.
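The abstract's core construction, collapsing multiple rollouts of one task into a quotient graph of merged action-observation states and scoring tasks by their error branch ratio, can be illustrated with a minimal sketch. This is not the paper's implementation: the actual semantic equivalence relation and selection criteria are not specified in the abstract, so exact-tuple equality stands in for semantic state merging, and all function names below are hypothetical.

```python
from collections import defaultdict

def build_quotient_graph(rollouts):
    """Collapse multi-trial rollouts of one task into a quotient graph.

    `rollouts` is a list of (steps, succeeded) pairs, where `steps` is a
    sequence of hashable (action, observation) tuples. Steps that compare
    equal are merged into a single node (a stand-in for the paper's
    semantic equivalence). Returns the edge map and, per merged node,
    the set of trajectory outcomes passing through it.
    """
    edges = defaultdict(set)     # node -> set of successor nodes
    outcomes = defaultdict(set)  # node -> {"success", "failure"}
    root = ("<root>", None)      # shared virtual start state
    for steps, succeeded in rollouts:
        label = "success" if succeeded else "failure"
        prev = root
        for step in steps:
            edges[prev].add(step)
            outcomes[step].add(label)
            prev = step
    return edges, outcomes

def error_branch_ratio(outcomes):
    """Share of merged states visited by at least one failed trajectory:
    a rough proxy for how much recoverable error structure a task exposes."""
    if not outcomes:
        return 0.0
    failing = sum(1 for labels in outcomes.values() if "failure" in labels)
    return failing / len(outcomes)
```

On this toy representation, a state like a failed search that appears in both successful and failed rollouts becomes a branch point where strategies diverge, which is exactly the kind of structure a pass rate alone cannot surface.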
Related papers
- GFRRN: Explore the Gaps in Single Image Reflection Removal [23.018215754935753]
We propose a Gap-Free Reflection Removal Network (GFRRN) for single image reflection removal. In this work, we first adopt the parameter-efficient fine-tuning (PEFT) strategy to align the training directions. Then, a label generator is designed to unify the reflection labels for both synthetic and real-world data. Extensive experiments demonstrate the effectiveness of our GFRRN, achieving superior performance against state-of-the-art SIRR methods.
arXiv Detail & Related papers (2026-02-26T07:17:49Z)
- Guided Verifier: Collaborative Multimodal Reasoning via Dynamic Process Supervision [11.159231524113764]
Reinforcement Learning (RL) has emerged as a pivotal mechanism for enhancing the complex reasoning capabilities of Multimodal Large Language Models (MLLMs). In this paper, we propose the Guided Verifier framework to address these structural limitations. We develop a specialized data synthesis pipeline targeting multimodal hallucinations, constructing the CoRe dataset of process-level negatives and Correct-guide Reasoning trajectories to train the guided verifier.
arXiv Detail & Related papers (2026-02-04T07:38:42Z) - ConsistentRFT: Reducing Visual Hallucinations in Flow-based Reinforcement Fine-Tuning [85.20505958752928]
Reinforcement Fine-Tuning (RFT) on flow-based models is crucial for preference alignment.<n>RFT often introduce visual hallucinations like over-optimized details and semantic misalignment.<n>This work preliminarily explores why visual hallucinations arise and how to reduce them.
arXiv Detail & Related papers (2026-02-03T11:49:46Z) - Model Specific Task Similarity for Vision Language Model Selection via Layer Conductance [92.72779885657373]
We propose a framework that grounds model selection in the internal functional dynamics of the visual encoder.<n>Our approach represents each task via layer wise conductance and derives a target-conditioned block importance distribution through entropy regularized alignment.<n>Building on this, we introduce Directional Conductance Divergence (DCD), an asymmetric metric that quantifies how effectively a source task covers the target's salient functional blocks.
arXiv Detail & Related papers (2026-02-01T17:29:43Z) - From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation [59.27094165576015]
We propose a novel learning paradigm (UniMod) that transitions from sparse decision-making to dense reasoning traces.<n>By constructing structured trajectories encompassing evidence grounding, modality assessment, risk mapping, policy decision, and response generation, we reformulate monolithic decision tasks into a multi-dimensional boundary learning process.<n>We introduce specialized optimization strategies to decouple task-specific parameters and rebalance training dynamics, effectively resolving interference between diverse objectives in multi-task learning.
arXiv Detail & Related papers (2026-01-28T09:29:40Z) - MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching [60.886768806064936]
Tool-Integrated Reasoning empowers large language models to tackle complex tasks by interleaving reasoning steps with external tool interactions.<n>Existing reinforcement learning methods rely on outcome- or trajectory-level rewards, assigning uniform advantages to all steps within a trajectory.<n>We propose MatchTIR, a framework that introduces fine-grained supervision via bipartite matching-based turn-level reward assignment and dual-level advantage estimation.
arXiv Detail & Related papers (2026-01-15T18:59:23Z) - Contrast & Compress: Learning Lightweight Embeddings for Short Trajectories [11.6132604160666]
We propose a novel framework for learning fixed-dimensional embeddings for short trajectories by leveraging a Transformer encoder.<n>We analyze the influence of Cosine and FFT-based similarity metrics within the contrastive learning paradigm.<n>Our empirical evaluation on the Argoverse 2 dataset demonstrates that embeddings shaped by Cosine similarity objectives yield superior clustering of trajectories.
arXiv Detail & Related papers (2025-06-03T07:53:04Z) - READ: Improving Relation Extraction from an ADversarial Perspective [33.44949503459933]
We propose an adversarial training method specifically designed for relation extraction (RE)
Our approach introduces both sequence- and token-level perturbations to the sample and uses a separate perturbation vocabulary to improve the search for entity and context perturbations.
arXiv Detail & Related papers (2024-04-02T16:42:44Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- Learning Invariant Representation via Contrastive Feature Alignment for Clutter Robust SAR Target Recognition [10.993101256393679]
This letter proposes a solution called Contrastive Feature Alignment (CFA) to learn invariant representation for robust recognition.
The proposed CFA combines both classification and CWMSE losses to train the model jointly, which allows for the progressive learning of invariant target representation.
arXiv Detail & Related papers (2023-04-04T12:35:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.