Improving Generalization in Intent Detection: GRPO with Reward-Based Curriculum Sampling
- URL: http://arxiv.org/abs/2504.13592v2
- Date: Mon, 21 Apr 2025 03:29:14 GMT
- Title: Improving Generalization in Intent Detection: GRPO with Reward-Based Curriculum Sampling
- Authors: Zihao Feng, Xiaoxue Wang, Ziwei Bai, Donghang Su, Bowen Wu, Qun Yu, Baoxun Wang
- Abstract summary: Existing approaches, such as zero-shot reformulations, struggle with performance degradation on unseen intents. We employ Reinforcement Learning (RL) combined with Reward-based Curriculum Sampling (RCS) during Group Relative Policy Optimization (GRPO) training for intent detection tasks.
- Score: 5.321647713109401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Intent detection, a critical component in task-oriented dialogue (TOD) systems, faces significant challenges in adapting to the rapid influx of integrable tools with complex interrelationships. Existing approaches, such as zero-shot reformulations and LLM-based dynamic recognition, struggle with performance degradation when encountering unseen intents, leading to erroneous task routing. To enhance the model's generalization performance on unseen tasks, we employ Reinforcement Learning (RL) combined with Reward-based Curriculum Sampling (RCS) during Group Relative Policy Optimization (GRPO) training on intent detection tasks. Experiments demonstrate that RL-trained models substantially outperform supervised fine-tuning (SFT) baselines in generalization. Moreover, introducing RCS significantly bolsters the effectiveness of RL in intent detection by focusing the model on challenging cases during training. In addition, incorporating Chain-of-Thought (CoT) processes in RL notably improves generalization in complex intent detection tasks, underscoring the importance of reasoning in challenging scenarios. This work advances the generalization of intent detection tasks, offering practical insights for deploying adaptable dialogue systems.
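The combination of GRPO's group-relative advantages with reward-based curriculum sampling can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function names (`group_relative_advantages`, `rcs_select`), the `hard_fraction` parameter, and the reward bookkeeping are hypothetical and do not come from the paper's released code.

```python
# Minimal sketch (not the authors' implementation) of GRPO-style advantages
# plus Reward-based Curriculum Sampling (RCS) for intent detection prompts.
import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Example:
    prompt: str        # dialogue turn to classify
    gold_intent: str   # reference intent label

def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO normalizes each sampled response's reward against its own group:
    advantage_i = (r_i - mean(group)) / std(group)."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    std = std if std > 1e-8 else 1.0   # avoid division by zero when all rewards are equal
    return [(r - mean) / std for r in rewards]

def rcs_select(pool: List[Example],
               mean_reward: Callable[[Example], float],
               batch_size: int,
               hard_fraction: float = 0.7) -> List[Example]:
    """Reward-based curriculum sampling: bias each batch toward prompts on
    which the current policy earns low reward (hard cases), while keeping a
    share of easier prompts for stability. Assumes the pool is larger than
    the batch."""
    ranked = sorted(pool, key=mean_reward)           # lowest reward first
    n_hard = int(batch_size * hard_fraction)
    hard = ranked[:n_hard]
    easy = random.sample(ranked[n_hard:], batch_size - n_hard)
    return hard + easy
```

In this reading, the curriculum is driven by the same reward signal that GRPO already computes: prompts whose sampled responses score poorly are revisited more often, so the policy spends its updates on the intents it still gets wrong.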
Related papers
- ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [54.787341008881036]
We introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Experimental results demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z) - GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training [62.536191233049614]
Reinforcement learning with verifiable outcome rewards (RLVR) has effectively scaled up chain-of-thought (CoT) reasoning in large language models (LLMs).
This work investigates this problem through extensive experiments on complex card games, such as 24 points, and embodied tasks from ALFWorld.
We find that when rewards are based solely on action outcomes, RL fails to incentivize CoT reasoning in VLMs, instead leading to a phenomenon we termed thought collapse.
arXiv Detail & Related papers (2025-03-11T15:17:02Z) - Salience-Invariant Consistent Policy Learning for Generalization in Visual Reinforcement Learning [12.9372563969007]
Generalizing policies to unseen scenarios remains a critical challenge in visual reinforcement learning. In unseen environments, distracting pixels may lead agents to extract representations containing task-irrelevant information. We propose the Salience-Invariant Consistent Policy Learning algorithm, an efficient framework for zero-shot generalization.
arXiv Detail & Related papers (2025-02-12T12:00:16Z) - Toward Task Generalization via Memory Augmentation in Meta-Reinforcement Learning [43.69919534800985]
In reinforcement learning (RL), agents often struggle to perform well on tasks that differ from those encountered during training. This limitation presents a challenge to the broader deployment of RL in diverse and dynamic task settings. We introduce memory augmentation, a memory-based RL approach to improve task generalization.
arXiv Detail & Related papers (2025-02-03T17:00:19Z) - Importance Sampling-Guided Meta-Training for Intelligent Agents in Highly Interactive Environments [43.144056801987595]
This study introduces a novel training framework that integrates guided meta RL with importance sampling (IS) to optimize training distributions.
By estimating a naturalistic distribution from real-world datasets, the framework ensures a balanced focus across common and extreme driving scenarios.
arXiv Detail & Related papers (2024-07-22T17:57:12Z) - Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA).
SAMA prompts pretrained language models with chain-of-thought that can suggest potential goals, provide suitable goal decomposition and subgoal allocation as well as self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z) - Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning [23.45043290237396]
MoSS is a context-based Meta-reinforcement learning algorithm based on Self-Supervised task representation learning.
On MuJoCo and Meta-World benchmarks, MoSS outperforms prior methods in terms of performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization.
arXiv Detail & Related papers (2023-04-29T15:46:19Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resilience to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions [63.773813221460614]
Generalization across different environments with the same tasks is critical for successful applications of visual reinforcement learning.
We propose a novel approach, namely Characteristic Reward Sequence Prediction (CRESP), to extract the task-relevant information.
Experiments demonstrate that CRESP significantly improves the performance of generalization on unseen environments.
arXiv Detail & Related papers (2022-05-20T14:52:03Z) - Contextualize Me -- The Case for Context in Reinforcement Learning [49.794253971446416]
Contextual Reinforcement Learning (cRL) provides a framework to model such changes in a principled manner.
We show how cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks.
arXiv Detail & Related papers (2022-02-09T15:01:59Z) - Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents (a rough sketch of such a regularizer appears after this list).
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
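As a reading aid for the information-bottleneck entry above, the following is a minimal sketch of what an annealed, information-theoretic regularizer added to an RL loss can look like. It is an assumption for illustration only: the function names, the standard-normal prior, and the `beta_max`/`warmup_steps` schedule are hypothetical and are not taken from the cited paper.

```python
# Illustrative sketch: a variational information-bottleneck style penalty
# added to an RL loss, with an annealed coefficient beta.
import torch

def kl_to_standard_normal(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), averaged over the batch."""
    return 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=-1).mean()

def regularized_loss(rl_loss: torch.Tensor,
                     mu: torch.Tensor,
                     logvar: torch.Tensor,
                     step: int,
                     warmup_steps: int = 10_000,
                     beta_max: float = 1e-3) -> torch.Tensor:
    """Anneal beta from 0 to beta_max so the bottleneck tightens gradually,
    discouraging the encoder from keeping task-irrelevant information."""
    beta = beta_max * min(1.0, step / warmup_steps)
    return rl_loss + beta * kl_to_standard_normal(mu, logvar)
```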