Scalable and Robust Self-Learning for Skill Routing in Large-Scale
Conversational AI Systems
- URL: http://arxiv.org/abs/2204.07135v1
- Date: Thu, 14 Apr 2022 17:46:14 GMT
- Title: Scalable and Robust Self-Learning for Skill Routing in Large-Scale
Conversational AI Systems
- Authors: Mohammad Kachuee, Jinseok Nam, Sarthak Ahuja, Jin-Myung Won, Sungjin
Lee
- Abstract summary: State-of-the-art systems use a model-based approach to enable natural conversations.
We propose a scalable self-learning approach to explore routing alternatives.
- Score: 13.705147776518421
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Skill routing is an important component in large-scale conversational
systems. In contrast to traditional rule-based skill routing, state-of-the-art
systems use a model-based approach to enable natural conversations. To provide
supervision signal required to train such models, ideas such as human
annotation, replication of a rule-based system, relabeling based on user
paraphrases, and bandit-based learning have been suggested. However, these
approaches (a) do not scale with the number of skills and with skill
on-boarding, (b) require very costly expert annotation or rule design, and (c)
introduce risks to the user experience with each model update. In this paper,
we present a scalable self-learning approach to explore routing alternatives
without causing abrupt policy changes that break the user experience, learn
from the user interaction, and incrementally improve the routing via frequent
model refreshes. To enable such robust frequent model updates, we suggest a
simple and effective approach that ensures controlled policy updates for
individual domains, followed by an off-policy evaluation for making deployment
decisions without any need for lengthy A/B experimentation. We conduct various
offline and online A/B experiments on a commercial large-scale conversational
system to demonstrate the effectiveness of the proposed method in real-world
production settings.
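The deployment-gating step described in the abstract can be sketched with inverse propensity scoring (IPS), a standard off-policy estimator. The log format, policy interface, and deployment margin below are illustrative assumptions, not the paper's exact method.

```python
# Sketch: off-policy evaluation gating a routing-model deployment.
# Assumes logged tuples of (context, action, logging propensity, reward);
# the interface is hypothetical, not the paper's API.

def ips_estimate(logs, policy):
    """Estimate the average reward `policy` would earn on logged traffic.

    `policy(context)` returns a dict mapping each action to its probability.
    """
    total = 0.0
    for context, action, propensity, reward in logs:
        # Importance weight: new policy's probability of the logged action
        # over the logging policy's probability of that action.
        weight = policy(context).get(action, 0.0) / propensity
        total += weight * reward
    return total / len(logs)


def should_deploy(logs, candidate_policy, current_value, margin=0.0):
    """Approve deployment only if the IPS estimate beats the live policy."""
    return ips_estimate(logs, candidate_policy) >= current_value + margin
```

In a production system one would typically add weight clipping or a doubly robust estimator to control the variance of the importance weights.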
Related papers
- Local Methods with Adaptivity via Scaling [71.11111992280566]
This paper aims to merge the local training technique with the adaptive approach to develop efficient distributed learning methods.
We consider the classical Local SGD method and enhance it with a scaling feature.
In addition to theoretical analysis, we validate the performance of our methods in practice by training a neural network.
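The Local SGD pattern with a scaling feature can be sketched as follows; the quadratic worker objectives and the RMSProp-style per-coordinate scaling are illustrative assumptions, not the paper's exact rule.

```python
import math

def local_sgd_with_scaling(x0, targets, rounds=100, local_steps=5,
                           local_lr=0.1, server_lr=0.1, beta=0.9, eps=1e-8):
    """Local SGD where each worker minimizes 0.5 * ||w - target||^2 locally,
    and the server applies an adaptively scaled aggregate update."""
    x = list(x0)
    v = [0.0] * len(x)  # running second moment used for per-coordinate scaling
    for _ in range(rounds):
        deltas = []
        for target in targets:
            w = list(x)
            for _ in range(local_steps):
                # Gradient of the local quadratic objective is (w - target).
                w = [wi - local_lr * (wi - ti) for wi, ti in zip(w, target)]
            deltas.append([wi - xi for wi, xi in zip(w, x)])
        # Average the local updates into a pseudo-gradient.
        avg = [sum(d[i] for d in deltas) / len(deltas) for i in range(len(x))]
        # Scale each coordinate by its running magnitude (RMSProp-style).
        v = [beta * vi + (1 - beta) * ai * ai for vi, ai in zip(v, avg)]
        x = [xi + server_lr * ai / (math.sqrt(vi) + eps)
             for xi, ai, vi in zip(x, avg, v)]
    return x
```

With the toy objectives above, the iterate converges to the average of the workers' optima.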
arXiv Detail & Related papers (2024-06-02T19:50:05Z)
- Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations [13.60538902487872]
We present a novel post-hoc concept-based XAI framework that conveys not only instance-wise (local) but also class-wise (global) decision-making strategies via prototypes.
We demonstrate the effectiveness of our approach in identifying out-of-distribution samples, spurious model behavior and data quality issues across three datasets.
arXiv Detail & Related papers (2023-11-28T10:53:26Z)
- PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z)
- Online Matching: A Real-time Bandit System for Large-scale Recommendations [23.954049092470548]
Online Matching is a scalable closed-loop bandit system learning from users' direct feedback on items in real time.
Diag-LinUCB is a novel extension of the LinUCB algorithm to enable distributed updates of bandits parameter in a scalable and timely manner.
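A rough sketch of the diagonal-covariance idea behind Diag-LinUCB: keeping only per-feature statistics turns the usual matrix inverse into elementwise division, so per-arm updates are cheap to shard and merge across servers. The update rule below is a simplification for illustration, not the paper's algorithm.

```python
import math

class DiagLinUCBArm:
    """LinUCB arm with a diagonal design-matrix approximation.

    Storing only the diagonal of A = lambda*I + sum(x x^T) replaces the
    matrix inverse with elementwise division, which keeps updates cheap
    and easy to merge across distributed workers (illustrative sketch).
    """

    def __init__(self, dim, lam=1.0, alpha=1.0):
        self.a_diag = [lam] * dim   # diagonal of the regularized design matrix
        self.b = [0.0] * dim        # reward-weighted feature sums
        self.alpha = alpha

    def ucb(self, x):
        # Mean estimate uses the diagonal approximation theta_i = b_i / A_ii.
        mean = sum(bi / ai * xi for bi, ai, xi in zip(self.b, self.a_diag, x))
        bonus = self.alpha * math.sqrt(sum(xi * xi / ai
                                           for xi, ai in zip(x, self.a_diag)))
        return mean + bonus

    def update(self, x, reward):
        self.a_diag = [ai + xi * xi for ai, xi in zip(self.a_diag, x)]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_arm(arms, x):
    scores = [arm.ucb(x) for arm in arms]
    return scores.index(max(scores))
```

Because each arm's state is two flat vectors, distributed workers can accumulate updates locally and merge them by elementwise addition.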
arXiv Detail & Related papers (2023-07-29T05:46:27Z)
- Concept Learning for Interpretable Multi-Agent Reinforcement Learning [5.179808182296037]
We introduce a method for incorporating interpretable concepts from a domain expert into models trained through multi-agent reinforcement learning.
This allows an expert both to reason about the resulting concept policy models in terms of these high-level concepts at run-time and to intervene and correct mispredictions to improve performance.
We show that this yields improved interpretability and training stability, with benefits to policy performance and sample efficiency in a simulated and real-world cooperative-competitive multi-agent game.
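The intervene-and-correct loop described above can be sketched as a concept-bottleneck policy: the model predicts named concepts, an expert may override any of them at run-time, and the action is chosen from the (possibly corrected) concepts. The concept names and policy rule below are hypothetical.

```python
def act_with_concepts(observation, concept_model, policy_head, overrides=None):
    """Predict high-level concepts, let an expert override any of them at
    run-time, then choose an action from the (possibly corrected) concepts."""
    concepts = concept_model(observation)
    if overrides:
        concepts.update(overrides)  # expert corrects mispredicted concepts
    return policy_head(concepts)


# Hypothetical concept model and policy head for a cooperative game.
def concept_model(observation):
    return {"teammate_open": observation.get("teammate_dist", 99) < 5}

def policy_head(concepts):
    return "pass" if concepts["teammate_open"] else "hold"
```

Because the policy acts only through the concept dictionary, an intervention on a single concept deterministically changes the downstream action.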
arXiv Detail & Related papers (2023-02-23T18:53:09Z)
- Efficient Human-in-the-loop System for Guiding DNNs Attention [25.501443892795614]
We propose an efficient human-in-the-loop system to interactively direct the attention of classifiers to the regions specified by users.
Previous approaches for attention guidance require the preparation of pixel-level annotations and are not designed as interactive systems.
arXiv Detail & Related papers (2022-06-13T09:04:32Z)
- Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference [71.11416263370823]
We propose a generative inverse reinforcement learning approach for user behavioral preference modelling.
Our model automatically learns rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z)
- Model-Based Machine Learning for Communications [110.47840878388453]
We review existing strategies for combining model-based algorithms and machine learning from a high level perspective.
We focus on symbol detection, which is one of the fundamental tasks of communication receivers.
arXiv Detail & Related papers (2021-01-12T19:55:34Z)
- Interactive Search Based on Deep Reinforcement Learning [4.353144350714567]
The project establishes a virtual user environment for offline training.
We also improve a bi-clustering-based reinforcement learning algorithm to expand the action space and the recommended-path space of the recommendation agent.
arXiv Detail & Related papers (2020-12-09T15:23:53Z)
- Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on such an approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
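A minimal sketch of the two-head objective behind SQN: a shared representation feeds a self-supervised next-item head (cross-entropy) and an RL head (one-step Q-learning), and the two losses are combined. The shapes and the combination weight below are illustrative assumptions.

```python
import math

def cross_entropy(logits, target):
    """Self-supervised head: next-item prediction loss."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

def q_learning_loss(q, action, reward, q_next, gamma=0.9):
    """RL head: squared one-step temporal-difference error."""
    td_target = reward + gamma * max(q_next)
    return (q[action] - td_target) ** 2

def sqn_loss(ss_logits, next_item, q, action, reward, q_next, rl_weight=1.0):
    # SQN-style objective: the supervised sequence loss plus the Q-learning
    # loss, both computed on the same shared representation.
    return cross_entropy(ss_logits, next_item) + rl_weight * q_learning_loss(
        q, action, reward, q_next)
```

The RL head lets long-term signals (e.g. purchases vs. clicks) reweight what the supervised head alone would learn from logged sequences.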
arXiv Detail & Related papers (2020-06-10T11:18:57Z)
- Guided Dialog Policy Learning without Adversarial Learning in the Loop [103.20723982440788]
A number of adversarial learning methods have been proposed to learn the reward function together with the dialogue policy.
We propose to decompose the adversarial training into two steps.
First, we train the discriminator with an auxiliary dialogue generator and then incorporate a derived reward model into a common RL method to guide the dialogue policy learning.
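The two-step recipe can be sketched as: (1) fit a discriminator on human vs. generated dialogues, then (2) freeze it and use its score as the reward for ordinary policy-gradient learning. Everything below, including the 1-D dialogue features, the logistic discriminator, and the expected-gradient REINFORCE update, is a toy assumption rather than the paper's implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_discriminator(human_feats, generated_feats, lr=0.5, steps=200):
    """Step 1: logistic discriminator D(f) = P(dialogue is human-like),
    trained against an auxiliary generator's samples."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        for f, label in ([(f, 1.0) for f in human_feats]
                         + [(f, 0.0) for f in generated_feats]):
            err = sigmoid(w * f + b) - label
            w -= lr * err * f
            b -= lr * err
    return lambda f: sigmoid(w * f + b)

def train_policy(disc, action_feats, lr=1.0, steps=100):
    """Step 2: the frozen discriminator provides the reward log D(f) for an
    expected-gradient (REINFORCE-style) update of a softmax policy."""
    theta = [0.0] * len(action_feats)
    rewards = [math.log(disc(f)) for f in action_feats]
    for _ in range(steps):
        m = max(theta)
        exps = [math.exp(t - m) for t in theta]
        probs = [e / sum(exps) for e in exps]
        baseline = sum(p * r for p, r in zip(probs, rewards))
        theta = [t + lr * p * (r - baseline)
                 for t, p, r in zip(theta, probs, rewards)]
    return theta
```

Freezing the discriminator before step 2 is the point of the decomposition: the RL loop sees a stationary reward instead of an adversary that shifts under it.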
arXiv Detail & Related papers (2020-04-07T11:03:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.