Scalable and Robust Self-Learning for Skill Routing in Large-Scale
Conversational AI Systems
- URL: http://arxiv.org/abs/2204.07135v1
- Date: Thu, 14 Apr 2022 17:46:14 GMT
- Title: Scalable and Robust Self-Learning for Skill Routing in Large-Scale
Conversational AI Systems
- Authors: Mohammad Kachuee, Jinseok Nam, Sarthak Ahuja, Jin-Myung Won, Sungjin
Lee
- Abstract summary: State-of-the-art systems use a model-based approach to enable natural conversations.
We propose a scalable self-learning approach to explore routing alternatives.
- Score: 13.705147776518421
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Skill routing is an important component in large-scale conversational
systems. In contrast to traditional rule-based skill routing, state-of-the-art
systems use a model-based approach to enable natural conversations. To provide
the supervision signal required to train such models, ideas such as human
annotation, replication of a rule-based system, relabeling based on user
paraphrases, and bandit-based learning have been suggested. However, these
approaches (a) do not scale with the number of skills and with skill
on-boarding, (b) require very costly expert annotation or rule design, and (c)
introduce risks to the user experience with each model update. In this paper,
we present a scalable self-learning approach that explores routing alternatives
without causing abrupt policy changes that break the user experience, learns
from user interactions, and incrementally improves the routing via frequent
model refreshes. To enable such robust frequent model updates, we suggest a
simple and effective approach that ensures controlled policy updates for
individual domains, followed by an off-policy evaluation for making deployment
decisions without any need for lengthy A/B experimentation. We conduct various
offline and online A/B experiments on a commercial large-scale conversational
system to demonstrate the effectiveness of the proposed method in real-world
production settings.
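
To make the described mechanism concrete, below is a minimal sketch of the two ingredients the abstract names: a per-domain controlled policy update that bounds how far a refreshed routing policy may move from the production policy, and an off-policy (inverse-propensity-scoring) estimate used to gate deployment. The total-variation cap, the IPS estimator, and all function names, thresholds, and data fields here are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

# Hypothetical illustration: the constraint form, estimator, and thresholds
# below are assumptions, not the paper's actual implementation.

def constrained_update(p_current, p_candidate, epsilon=0.05):
    """Controlled policy update for one domain: limit how far the candidate
    routing policy may move away from the current production policy."""
    # Cap the total-variation distance between deployed and current policies
    # at epsilon by shrinking the step toward the current policy if needed.
    tv = 0.5 * np.abs(p_candidate - p_current).sum()
    if tv <= epsilon:
        return p_candidate
    alpha = epsilon / tv
    mixed = (1 - alpha) * p_current + alpha * p_candidate
    return mixed / mixed.sum()

def ips_value(logged, policy):
    """Off-policy (IPS) estimate of a policy's expected reward from logged
    interactions: each record holds the routed skill, the logging propensity
    of that choice, and the observed reward (e.g., user satisfaction)."""
    terms = [policy[r["skill"]] / r["propensity"] * r["reward"] for r in logged]
    return float(np.mean(terms))

# Toy usage: one domain with three candidate skills.
p_prod = np.array([0.7, 0.2, 0.1])   # current production routing policy
p_new = np.array([0.4, 0.4, 0.2])    # freshly trained candidate policy
p_deploy = constrained_update(p_prod, p_new, epsilon=0.1)

logs = [
    {"skill": 0, "propensity": 0.7, "reward": 1.0},
    {"skill": 1, "propensity": 0.2, "reward": 0.0},
    {"skill": 2, "propensity": 0.1, "reward": 1.0},
]
# Deploy only if the off-policy estimate does not regress versus production,
# avoiding the need for a lengthy A/B experiment for every model refresh.
if ips_value(logs, p_deploy) >= ips_value(logs, p_prod):
    print("deploy", p_deploy)
else:
    print("keep current policy")
```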
Related papers
- Enabling Real-Time Conversations with Minimal Training Costs [61.80370154101649]
This paper presents a new duplex decoding approach that enhances large language models with a duplex capability while requiring minimal training.
Experimental results indicate that our proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.
arXiv Detail & Related papers (2024-09-18T06:27:26Z) - Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning [12.742158403867002]
Reinforcement Learning from Human Feedback is a powerful paradigm for aligning foundation models to human values and preferences.
Current RLHF techniques cannot account for the naturally occurring differences in individual human preferences across a diverse population.
We develop a class of multimodal RLHF methods to address the need for pluralistic alignment.
arXiv Detail & Related papers (2024-08-19T15:18:30Z) - Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations [13.60538902487872]
We present a novel post-hoc concept-based XAI framework that conveys not only instance-wise (local) but also class-wise (global) decision-making strategies via prototypes.
We demonstrate the effectiveness of our approach in identifying out-of-distribution samples, spurious model behavior and data quality issues across three datasets.
arXiv Detail & Related papers (2023-11-28T10:53:26Z) - PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z) - Online Matching: A Real-time Bandit System for Large-scale
Recommendations [23.954049092470548]
Online Matching is a scalable closed-loop bandit system learning from users' direct feedback on items in real time.
Diag-LinUCB is a novel extension of the LinUCB algorithm that enables distributed updates of bandit parameters in a scalable and timely manner.
arXiv Detail & Related papers (2023-07-29T05:46:27Z) - Efficient Human-in-the-loop System for Guiding DNNs Attention [25.501443892795614]
We propose an efficient human-in-the-loop system to interactively direct the attention of classifiers to the regions specified by users.
Previous approaches for attention guidance require the preparation of pixel-level annotations and are not designed as interactive systems.
arXiv Detail & Related papers (2022-06-13T09:04:32Z) - Generative Adversarial Reward Learning for Generalized Behavior Tendency
Inference [71.11416263370823]
We propose a generative inverse reinforcement learning approach for user behavioral preference modelling.
Our model can automatically learn rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z) - Model-Based Machine Learning for Communications [110.47840878388453]
We review existing strategies for combining model-based algorithms and machine learning from a high-level perspective.
We focus on symbol detection, which is one of the fundamental tasks of communication receivers.
arXiv Detail & Related papers (2021-01-12T19:55:34Z) - Interactive Search Based on Deep Reinforcement Learning [4.353144350714567]
The project mainly establishes a virtual user environment for offline training.
At the same time, we improve a reinforcement learning algorithm based on bi-clustering to expand the action space and recommended path space of the recommendation agent.
arXiv Detail & Related papers (2020-12-09T15:23:53Z) - Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on such an approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
arXiv Detail & Related papers (2020-06-10T11:18:57Z) - Guided Dialog Policy Learning without Adversarial Learning in the Loop [103.20723982440788]
A number of adversarial learning methods have been proposed to learn the reward function together with the dialogue policy.
We propose to decompose the adversarial training into two steps.
First, we train the discriminator with an auxiliary dialogue generator and then incorporate a derived reward model into a common RL method to guide the dialogue policy learning.
arXiv Detail & Related papers (2020-04-07T11:03:17Z)