PARL-MT: Learning to Call Functions in Multi-Turn Conversation with Progress Awareness
- URL: http://arxiv.org/abs/2509.23206v3
- Date: Thu, 09 Oct 2025 02:23:18 GMT
- Title: PARL-MT: Learning to Call Functions in Multi-Turn Conversation with Progress Awareness
- Authors: Huacan Chai, Zijie Cao, Maolin Ran, Yingxuan Yang, Jianghao Lin, Xin Peng, Hairui Wang, Renjie Ding, Ziyu Wan, Muning Wen, Weiwen Liu, Weinan Zhang, Fei Huang, Ying Wen
- Abstract summary: We introduce PARL-MT, a framework that explicitly incorporates progress awareness into LLM training for multi-turn function calling. We show that PARL-MT significantly outperforms existing methods, highlighting the effectiveness of progress awareness in enabling robust and efficient multi-turn function calling.
- Score: 57.020401590532686
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have achieved impressive success in single-turn function calling, yet real-world applications such as travel planning or multi-stage data analysis typically unfold across multi-turn conversations. In these settings, LLMs must not only issue accurate function calls at each step but also maintain progress awareness, the ability to summarize past interactions and plan future actions to ensure coherent, long-horizon task execution. Existing approaches, however, either reduce multi-turn training to isolated single-turn samples, which neglects task-level planning, or employ end-to-end reinforcement learning (RL) that struggles with redundancy and lacks explicit integration of progress awareness. To overcome these limitations, we introduce PARL-MT, a framework that explicitly incorporates progress awareness into LLM training for multi-turn function calling. PARL-MT combines (i) a Progress Awareness Generation (PAG) pipeline, which automatically constructs datasets coupling conversation summaries with future task planning, and (ii) a Progress Awareness-Guided Reinforcement Learning (PAG-RL) algorithm, which integrates progress awareness into RL training to reduce contextual redundancy and improve alignment between local actions and global task completion. Empirical results on two public benchmarks demonstrate that PARL-MT significantly outperforms existing methods, highlighting the effectiveness of progress awareness in enabling robust and efficient multi-turn function calling.
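The abstract's core idea, conditioning each function call on a compact progress state (summary of past interactions plus a plan of remaining steps) rather than the full raw history, can be sketched as follows. This is a minimal illustrative loop, not the paper's implementation; all names (`ProgressState`, `run_episode`, the `TOOLS` registry) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Toy tool registry standing in for the agent's callable functions.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search_flights": lambda q: f"flights for {q}",
    "book_hotel": lambda q: f"hotel booked in {q}",
}

@dataclass
class ProgressState:
    """Compact progress awareness: a running summary plus a future plan."""
    summary: List[str] = field(default_factory=list)  # what has been done
    plan: List[str] = field(default_factory=list)     # what remains

    def update(self, action: str, result: str) -> None:
        self.summary.append(f"{action} -> {result}")
        if self.plan and self.plan[0] == action:
            self.plan.pop(0)  # mark the planned step as completed

def run_episode(plan: List[str], args: Dict[str, str]) -> ProgressState:
    """Execute a multi-turn task, conditioning each call on the progress
    state rather than the full history (reducing contextual redundancy)."""
    state = ProgressState(plan=list(plan))
    while state.plan:
        action = state.plan[0]                # next planned function call
        result = TOOLS[action](args[action])  # issue the call
        state.update(action, result)
    return state

state = run_episode(
    plan=["search_flights", "book_hotel"],
    args={"search_flights": "Paris", "book_hotel": "Paris"},
)
```

In an actual agent, the plan and summary would be produced by the LLM itself (as in the PAG pipeline) rather than given up front; the point of the sketch is only the control flow of progress-conditioned execution.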
Related papers
- Dynamic Routing Between Experts: A Data-Efficient Approach to Continual Learning in Vision-Language Models [10.431923437214719]
Vision-Language Models (VLMs) suffer from catastrophic forgetting when sequentially fine-tuned on new tasks. We introduce a routing-based approach that enables the integration of new tasks while preserving the foundational knowledge acquired during pretraining.
arXiv Detail & Related papers (2025-11-03T18:39:32Z) - MEJO: MLLM-Engaged Surgical Triplet Recognition via Inter- and Intra-Task Joint Optimization [52.149337961205624]
We propose a framework that empowers both inter- and intra-task optimization for surgical triplet recognition. For inter-task optimization, we introduce the Shared-Specific-Disentangled (S$2$D) learning scheme that decomposes representations into task-shared and task-specific components. For intra-task optimization conflicts, we develop a Coordinated Gradient Learning (CGL) strategy, which dissects and rebalances the positive-negative ambiguities.
arXiv Detail & Related papers (2025-09-16T09:48:52Z) - Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling [66.0871543682453]
We present Omni-Thinker, a unified reinforcement learning framework that scales large language models across diverse tasks. Our scheduler orders tasks according to accuracy backward transfer (BWT), reducing forgetting and improving multi-task performance.
arXiv Detail & Related papers (2025-07-20T01:50:16Z) - Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search [48.348209577994865]
Large Language Models (LLMs) are increasingly capable but often require significant guidance or extensive interaction history to perform effectively in complex, interactive environments. We introduce a novel LLM agent framework that enhances planning capabilities through in-context learning. Our agent learns to extract task-critical "atomic facts" from its interaction trajectories.
arXiv Detail & Related papers (2025-06-10T18:36:31Z) - Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL [62.984693936073974]
Large language models (LLMs) excel in tasks like question answering and dialogue. Complex tasks requiring interaction, such as negotiation and persuasion, require additional long-horizon reasoning and planning. We propose a novel approach that uses goal-conditioned value functions to guide the reasoning of LLM agents.
arXiv Detail & Related papers (2025-05-23T16:51:54Z) - Multi-Modal Self-Supervised Semantic Communication [52.76990720898666]
We propose a multi-modal semantic communication system that leverages multi-modal self-supervised learning to enhance task-agnostic feature extraction. The proposed approach effectively captures both modality-invariant and modality-specific features while minimizing training-related communication overhead. The findings underscore the advantages of multi-modal self-supervised learning in semantic communication, paving the way for more efficient and scalable edge inference systems.
arXiv Detail & Related papers (2025-03-18T06:13:02Z) - Complex LLM Planning via Automated Heuristics Discovery [48.07520536415374]
We consider enhancing large language models (LLMs) for complex planning tasks. We propose automated heuristics discovery (AutoHD), a novel approach that enables LLMs to explicitly generate heuristic functions to guide inference-time search. Our proposed method requires no additional model training or finetuning, and the explicit definition of the heuristic functions generated by the LLMs provides interpretability and insights into the reasoning process.
arXiv Detail & Related papers (2025-02-26T16:52:31Z) - Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue [17.47550065558479]
Reinforcement learning (RL) is a powerful approach to enhance task-oriented dialogue (TOD) systems.
Existing RL methods tend to mainly focus on generation tasks, while neglecting dialogue state tracking (DST) for understanding.
We introduce step-by-step rewards throughout the token generation to extend RL into both understanding and generation tasks.
arXiv Detail & Related papers (2024-06-20T16:15:40Z)
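The step-by-step reward idea in the entry above can be illustrated with a toy policy-gradient setup: instead of one terminal reward, each generated token (covering both state-tracking and response tokens) receives its own reward, and the update weights each token's log-probability by its return-to-go. The matching-based reward shape and all names here are illustrative assumptions, not the paper's exact formulation.

```python
from typing import List

def step_rewards(tokens: List[str], gold: List[str]) -> List[float]:
    """Toy per-token reward: +1 when the generated token matches the
    reference at that position, else 0."""
    return [1.0 if i < len(gold) and t == gold[i] else 0.0
            for i, t in enumerate(tokens)]

def returns_to_go(rewards: List[float], gamma: float = 0.5) -> List[float]:
    """Discounted return at each step: the quantity a policy-gradient
    update would multiply each token's log-probability by."""
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return list(reversed(out))

# Dialogue-state tokens followed by a slot value; the last token is wrong.
gen = ["inform", "hotel", "price=cheap"]
gold = ["inform", "hotel", "price=moderate"]
rewards = step_rewards(gen, gold)
returns = returns_to_go(rewards)
```

Because the reward is assigned per step, the correct state-tracking tokens still earn credit even though the final slot value is wrong, which is the intuition behind extending RL to both understanding and generation.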
This list is automatically generated from the titles and abstracts of the papers in this site.