Proactive Guidance of Multi-Turn Conversation in Industrial Search
- URL: http://arxiv.org/abs/2505.24251v1
- Date: Fri, 30 May 2025 06:16:30 GMT
- Title: Proactive Guidance of Multi-Turn Conversation in Industrial Search
- Authors: Xiaoyu Li, Xiao Li, Li Gao, Yiding Liu, Xiaoyang Wang, Shuaiqiang Wang, Junfeng Wang, Dawei Yin,
- Abstract summary: We propose a novel two-phase framework to provide proactive guidance. Goal-adaptive Supervised Fine-Tuning (G-SFT) provides goal-relevant contextual information. Click-oriented Reinforcement Learning (C-RL) constructs preference pairs from user click signals and proactively improves click-through rates.
- Score: 38.18559057329515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The evolution of Large Language Models (LLMs) has significantly advanced multi-turn conversation systems, emphasizing the need for proactive guidance to enhance users' interactions. However, these systems face challenges in dynamically adapting to shifts in users' goals and maintaining low latency for real-time interactions. In the Baidu Search AI assistant, an industrial-scale multi-turn search system, we propose a novel two-phase framework to provide proactive guidance. The first phase, Goal-adaptive Supervised Fine-Tuning (G-SFT), employs a goal adaptation agent that dynamically adapts to user goal shifts and provides goal-relevant contextual information. G-SFT also incorporates scalable knowledge transfer to distill insights from LLMs into a lightweight model for real-time interaction. The second phase, Click-oriented Reinforcement Learning (C-RL), adopts a generate-rank paradigm, systematically constructs preference pairs from user click signals, and proactively improves click-through rates through more engaging guidance. This dual-phase architecture achieves complementary objectives: G-SFT ensures accurate goal tracking, while C-RL optimizes interaction quality through click signal-driven reinforcement learning. Extensive experiments demonstrate that our framework achieves 86.10% accuracy in offline evaluation (+23.95% over baseline) and 25.28% CTR in online deployment (149.06% relative improvement), while reducing inference latency by 69.55% through scalable knowledge distillation.
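As a rough illustration of the C-RL phase, the sketch below shows one way preference pairs might be assembled from click logs under a generate-rank paradigm: guidance that was clicked is treated as preferred over guidance shown for the same query that was not. The `GuidanceCandidate` schema, the field names, and the pairing heuristic are assumptions for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List


@dataclass
class GuidanceCandidate:
    """One guidance suggestion shown to a user (hypothetical schema)."""
    query: str       # user query / conversation context
    guidance: str    # proactive guidance text shown in the UI
    clicked: bool    # whether the user clicked this guidance


def build_preference_pairs(log: List[GuidanceCandidate]) -> List[Dict[str, str]]:
    """Pair clicked vs. non-clicked guidance generated for the same query."""
    by_query: Dict[str, List[GuidanceCandidate]] = {}
    for cand in log:
        by_query.setdefault(cand.query, []).append(cand)

    pairs = []
    for query, cands in by_query.items():
        clicked = [c for c in cands if c.clicked]
        skipped = [c for c in cands if not c.clicked]
        # Every clicked candidate is treated as preferred over every non-clicked one.
        for chosen, rejected in product(clicked, skipped):
            pairs.append({"prompt": query,
                          "chosen": chosen.guidance,
                          "rejected": rejected.guidance})
    return pairs


if __name__ == "__main__":
    log = [
        GuidanceCandidate("best camera phone 2025", "Compare flagship camera specs?", True),
        GuidanceCandidate("best camera phone 2025", "See budget phone deals?", False),
    ]
    print(build_preference_pairs(log))
```

Pairs of this form could then feed a standard preference-optimization objective (e.g., DPO) so the generator is nudged toward guidance users actually click.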
Related papers
- Efficient Beam Selection for ISAC in Cell-Free Massive MIMO via Digital Twin-Assisted Deep Reinforcement Learning [37.540612510652174]
We derive the distribution of joint target detection probabilities across multiple receiving APs under false alarm rate constraints. We then formulate the beam selection procedure as a Markov decision process (MDP). To eliminate the high costs and associated risks of real-time agent-environment interactions, we propose a novel digital twin (DT)-assisted offline DRL approach.
arXiv Detail & Related papers (2025-06-23T12:17:57Z) - ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding [71.654781631463]
ReAgent-V is a novel agentic video understanding framework. It integrates efficient frame selection with real-time reward generation during inference. Extensive experiments on 12 datasets demonstrate significant gains in generalization and reasoning.
arXiv Detail & Related papers (2025-06-02T04:23:21Z) - Graph Based Deep Reinforcement Learning Aided by Transformers for Multi-Agent Cooperation [2.8169258551959544]
We propose a novel framework that integrates Graph Neural Networks (GNNs), Deep Reinforcement Learning (DRL), and transformer-based mechanisms for enhanced multi-agent coordination and collective task execution. Our approach leverages GNNs to model agent-agent and agent-goal interactions through adaptive graph construction, enabling efficient information aggregation and decision-making under constrained communication.
arXiv Detail & Related papers (2025-04-11T01:46:18Z) - GPT Meets Graphs and KAN Splines: Testing Novel Frameworks on Multitask Fine-Tuned GPT-2 with LoRA [0.0]
We explore the potential of integrating learnable and interpretable modules--specifically Kolmogorov-Arnold Networks (KAN) and graph-based representations--within a pre-trained GPT-2 model.
arXiv Detail & Related papers (2025-03-25T19:58:25Z) - DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation [78.60543357822957]
Dexterous manipulation with contact-rich interactions is crucial for advanced robotics. We introduce DexHandDiff, an interaction-aware diffusion planning framework for adaptive dexterous manipulation. Our framework achieves an average success rate of 70.7% on goal-adaptive dexterous tasks, highlighting its robustness and flexibility in contact-rich manipulation.
arXiv Detail & Related papers (2024-11-27T18:03:26Z) - CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction [77.8576094863446]
We propose a new deCoupled duAl-interactive lineaR attEntion (CARE) mechanism.
We first propose an asymmetrical feature decoupling strategy that asymmetrically decouples the learning process for local inductive bias and long-range dependencies.
By adopting a decoupled learning approach and fully exploiting the complementarity across features, our method achieves both high efficiency and accuracy.
arXiv Detail & Related papers (2024-11-25T07:56:13Z) - Intent Detection in the Age of LLMs [3.755082744150185]
Intent detection is a critical component of task-oriented dialogue systems (TODS).
Traditional approaches relied on computationally efficient supervised sentence transformer encoder models.
The emergence of generative large language models (LLMs) with intrinsic world knowledge presents new opportunities to address these challenges.
arXiv Detail & Related papers (2024-10-02T15:01:55Z) - Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel generative adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association.
We employ inverse RL (IRL) to automatically learn reward functions without manual tuning.
We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a 14.6% improvement in convergence and reward value.
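For readers unfamiliar with how GAIL-style methods avoid manual reward tuning, the minimal sketch below shows the generic idea of using a discriminator's output as a learned reward signal; the network sizes and the `gail_reward` surrogate are illustrative assumptions, not the paper's MA-AL formulation.

```python
import torch
import torch.nn as nn


class GAILDiscriminator(nn.Module):
    """Classifies (state, action) pairs as expert-like (1) or policy-generated (0)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))  # raw logits


def gail_reward(disc: GAILDiscriminator, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    """Learned reward: large when the discriminator mistakes the policy for the expert."""
    with torch.no_grad():
        d = torch.sigmoid(disc(state, action))
    return -torch.log(1.0 - d + 1e-8)  # common GAIL surrogate reward
```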
arXiv Detail & Related papers (2024-09-27T13:05:02Z) - GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment [74.40196814292426]
We introduce a novel and intuitive Guidance-based Knowledge Transfer (GKT) framework.
GKT uses a larger Large Language Model as a "teacher" to create guidance prompts, paired with a smaller "student" model that finalizes the responses.
It achieves a maximum accuracy improvement of 14.18% along with a 10.72 times speed-up on GSM8K, and an accuracy improvement of 14.00% along with a 7.73 times speed-up on CSQA.
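Based only on this abstract, a minimal reading of the guidance-based split might look like the sketch below: the teacher drafts a short guidance prefix and the student completes the response. The checkpoints (`gpt2-large` as teacher, `distilgpt2` as student) and the guidance length are placeholder assumptions, not the models or settings used in the paper.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoints; the paper's actual teacher/student models may differ.
teacher_tok = AutoTokenizer.from_pretrained("gpt2-large")
teacher = AutoModelForCausalLM.from_pretrained("gpt2-large")
student_tok = AutoTokenizer.from_pretrained("distilgpt2")
student = AutoModelForCausalLM.from_pretrained("distilgpt2")


def guided_generate(question: str, guidance_tokens: int = 16, max_new_tokens: int = 96) -> str:
    # 1) The larger "teacher" drafts a short guidance prefix for the answer.
    t_inputs = teacher_tok(question, return_tensors="pt")
    prefix_ids = teacher.generate(**t_inputs, max_new_tokens=guidance_tokens)
    guidance = teacher_tok.decode(prefix_ids[0], skip_special_tokens=True)

    # 2) The smaller "student" continues from the question plus the teacher's guidance.
    s_inputs = student_tok(guidance, return_tensors="pt")
    out_ids = student.generate(**s_inputs, max_new_tokens=max_new_tokens)
    return student_tok.decode(out_ids[0], skip_special_tokens=True)
```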
arXiv Detail & Related papers (2024-05-30T02:37:35Z) - RLEEGNet: Integrating Brain-Computer Interfaces with Adaptive AI for
Intuitive Responsiveness and High-Accuracy Motor Imagery Classification [0.0]
We introduce a framework that leverages Reinforcement Learning with Deep Q-Networks (DQN) for classification tasks.
We present a preprocessing technique for multiclass motor imagery (MI) classification in a One-Versus-The-Rest (OVR) manner.
The integration of DQN with a 1D-CNN-LSTM architecture optimizes the decision-making process in real time.
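As a loose sketch of how a DQN-style classifier over EEG windows could combine 1D convolutions with an LSTM, the snippet below treats each motor-imagery class as one discrete action; the channel count, layer sizes, and the reward scheme mentioned in the comment are assumptions, not the RLEEGNet architecture.

```python
import torch
import torch.nn as nn


class CNNLSTMQNetwork(nn.Module):
    """Q-network over EEG windows: 1D conv features -> LSTM -> one Q-value per class/action."""

    def __init__(self, n_channels: int = 22, n_classes: int = 4, hidden: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, eeg: torch.Tensor) -> torch.Tensor:  # eeg: (batch, channels, time)
        feats = self.conv(eeg)                # (batch, 32, time)
        feats = feats.transpose(1, 2)         # (batch, time, 32) for the LSTM
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1])             # (batch, n_classes) Q-values


# Greedy "action" = predicted class; a DQN loop could reward +1/-1 for correct/incorrect labels.
q_net = CNNLSTMQNetwork()
dummy = torch.randn(8, 22, 256)               # 8 EEG windows, 22 channels, 256 samples
print(q_net(dummy).argmax(dim=-1))
```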
arXiv Detail & Related papers (2024-02-09T02:03:13Z) - HiFlash: Communication-Efficient Hierarchical Federated Learning with
Adaptive Staleness Control and Heterogeneity-aware Client-Edge Association [38.99309610943313]
Federated learning (FL) is a promising paradigm that enables massive numbers of clients to collaboratively learn a shared model.
For many existing FL systems, clients need to frequently exchange large volumes of model parameters with the remote cloud server directly over wide-area networks (WANs).
We resort to the hierarchical federated learning paradigm of HiFL, which reaps the benefits of mobile edge computing.
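To make the hierarchical idea concrete, the sketch below shows a generic two-tier FedAvg round (clients aggregate at an edge server, edge models aggregate at the cloud); it deliberately omits HiFlash's adaptive staleness control and client-edge association, and the function names are illustrative.

```python
from typing import Dict, List, Tuple

import numpy as np

Weights = Dict[str, np.ndarray]


def fedavg(updates: List[Tuple[Weights, int]]) -> Weights:
    """Sample-count-weighted average of model weights (plain FedAvg)."""
    total = sum(n for _, n in updates)
    keys = updates[0][0].keys()
    return {k: sum(w[k] * (n / total) for w, n in updates) for k in keys}


def hierarchical_round(edge_groups: List[List[Tuple[Weights, int]]]) -> Weights:
    """Clients aggregate at their edge server first; edge models then aggregate at the cloud.

    Only the (far fewer) edge models cross the WAN, which is the communication
    saving that hierarchical FL targets.
    """
    edge_models = []
    for client_updates in edge_groups:
        n_edge = sum(n for _, n in client_updates)
        edge_models.append((fedavg(client_updates), n_edge))
    return fedavg(edge_models)
```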
arXiv Detail & Related papers (2023-01-16T14:39:04Z) - Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation [95.31590177308482]
We propose an automated multi-loss adaptation (named Ada-Segment) to flexibly adjust multiple training losses over the course of training.
With an end-to-end architecture, Ada-Segment generalizes to different datasets without the need to re-tune hyperparameters.
Ada-Segment brings a 2.7% panoptic quality (PQ) improvement over the vanilla baseline on the COCO val split, achieving a state-of-the-art 48.5% PQ on the COCO test-dev split and 32.9% PQ on the ADE20K dataset.
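The abstract does not describe Ada-Segment's adaptation mechanism, so the sketch below substitutes a generic learnable loss-weighting scheme (uncertainty-style weighting) purely to illustrate what "adjusting multiple training losses" can look like in code; it is not the paper's controller.

```python
import torch
import torch.nn as nn


class AdaptiveLossWeighting(nn.Module):
    """Combine several task losses with learnable weights (uncertainty-style weighting)."""

    def __init__(self, n_losses: int):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(n_losses))

    def forward(self, losses: list) -> torch.Tensor:
        losses = torch.stack(losses)
        # exp(-log_var) scales each loss; the +log_var term keeps the learned
        # weights from collapsing to zero.
        return (torch.exp(-self.log_vars) * losses + self.log_vars).sum()


weighting = AdaptiveLossWeighting(n_losses=3)
total = weighting([torch.tensor(1.2), torch.tensor(0.4), torch.tensor(2.0)])
```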
arXiv Detail & Related papers (2020-12-07T11:43:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.