Scalable and Robust Self-Learning for Skill Routing in Large-Scale
Conversational AI Systems
- URL: http://arxiv.org/abs/2204.07135v1
- Date: Thu, 14 Apr 2022 17:46:14 GMT
- Title: Scalable and Robust Self-Learning for Skill Routing in Large-Scale
Conversational AI Systems
- Authors: Mohammad Kachuee, Jinseok Nam, Sarthak Ahuja, Jin-Myung Won, Sungjin
Lee
- Abstract summary: State-of-the-art systems use a model-based approach to enable natural conversations.
We propose a scalable self-learning approach to explore routing alternatives.
- Score: 13.705147776518421
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Skill routing is an important component in large-scale conversational
systems. In contrast to traditional rule-based skill routing, state-of-the-art
systems use a model-based approach to enable natural conversations. To provide
the supervision signal required to train such models, ideas such as human
annotation, replication of a rule-based system, relabeling based on user
paraphrases, and bandit-based learning have been suggested. However, these
approaches (a) do not scale with the number of skills and with skill
on-boarding, (b) require very costly expert annotation or rule design, and (c)
introduce risks to the user experience with each model update. In this paper,
we present a scalable self-learning approach that explores routing alternatives
without causing abrupt policy changes that break the user experience, learns
from user interactions, and incrementally improves the routing via frequent
model refreshes. To enable such robust frequent model updates, we suggest a
simple and effective approach that ensures controlled policy updates for
individual domains, followed by an off-policy evaluation for making deployment
decisions without any need for lengthy A/B experimentation. We conduct various
offline and online A/B experiments on a commercial large-scale conversational
system to demonstrate the effectiveness of the proposed method in real-world
production settings.
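
To make the described mechanism concrete, below is a minimal sketch of the two ingredients the abstract names: a per-domain controlled policy update that bounds how far a refreshed routing policy may move from the production policy, and an off-policy (inverse-propensity-scoring) estimate used to gate deployment. The total-variation cap, the IPS estimator, and all function names, thresholds, and data fields here are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

# Hypothetical illustration: the constraint form, estimator, and thresholds
# below are assumptions, not the paper's actual implementation.

def constrained_update(p_current, p_candidate, epsilon=0.05):
    """Controlled policy update for one domain: limit how far the candidate
    routing policy may move away from the current production policy."""
    # Cap the total-variation distance between deployed and current policies
    # at epsilon by shrinking the step toward the current policy if needed.
    tv = 0.5 * np.abs(p_candidate - p_current).sum()
    if tv <= epsilon:
        return p_candidate
    alpha = epsilon / tv
    mixed = (1 - alpha) * p_current + alpha * p_candidate
    return mixed / mixed.sum()

def ips_value(logged, policy):
    """Off-policy (IPS) estimate of a policy's expected reward from logged
    interactions: each record holds the routed skill, the logging propensity
    of that choice, and the observed reward (e.g., user satisfaction)."""
    terms = [policy[r["skill"]] / r["propensity"] * r["reward"] for r in logged]
    return float(np.mean(terms))

# Toy usage: one domain with three candidate skills.
p_prod = np.array([0.7, 0.2, 0.1])   # current production routing policy
p_new = np.array([0.4, 0.4, 0.2])    # freshly trained candidate policy
p_deploy = constrained_update(p_prod, p_new, epsilon=0.1)

logs = [
    {"skill": 0, "propensity": 0.7, "reward": 1.0},
    {"skill": 1, "propensity": 0.2, "reward": 0.0},
    {"skill": 2, "propensity": 0.1, "reward": 1.0},
]
# Deploy only if the off-policy estimate does not regress versus production,
# avoiding the need for a lengthy A/B experiment for every model refresh.
if ips_value(logs, p_deploy) >= ips_value(logs, p_prod):
    print("deploy", p_deploy)
else:
    print("keep current policy")
```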
Related papers
- Enabling Real-Time Conversations with Minimal Training Costs [61.80370154101649]
This paper presents a new duplex decoding approach that enhances large language models with a duplex capability while requiring minimal training.
Experimental results indicate that our proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.
arXiv Detail & Related papers (2024-09-18T06:27:26Z) - Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning [12.742158403867002]
Reinforcement Learning from Human Feedback is a powerful paradigm for aligning foundation models to human values and preferences.
Current RLHF techniques cannot account for the naturally occurring differences in individual human preferences across a diverse population.
We develop a class of multimodal RLHF methods to address the need for pluralistic alignment.
arXiv Detail & Related papers (2024-08-19T15:18:30Z) - Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations [13.60538902487872]
We present a novel post-hoc concept-based XAI framework that conveys not only instance-wise (local) but also class-wise (global) decision-making strategies via prototypes.
We demonstrate the effectiveness of our approach in identifying out-of-distribution samples, spurious model behavior and data quality issues across three datasets.
arXiv Detail & Related papers (2023-11-28T10:53:26Z) - PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z) - Online Matching: A Real-time Bandit System for Large-scale
Recommendations [23.954049092470548]
Online Matching is a scalable closed-loop bandit system learning from users' direct feedback on items in real time.
Diag-LinUCB is a novel extension of the LinUCB algorithm that enables distributed updates of bandit parameters in a scalable and timely manner.
arXiv Detail & Related papers (2023-07-29T05:46:27Z) - Efficient Human-in-the-loop System for Guiding DNNs Attention [25.501443892795614]
We propose an efficient human-in-the-loop system to interactively direct the attention of classifiers to the regions specified by users.
Previous approaches for attention guidance require the preparation of pixel-level annotations and are not designed as interactive systems.
arXiv Detail & Related papers (2022-06-13T09:04:32Z) - Generative Adversarial Reward Learning for Generalized Behavior Tendency
Inference [71.11416263370823]
We propose a generative inverse reinforcement learning approach for user behavioral preference modelling.
Our model can automatically learn rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z) - Model-Based Machine Learning for Communications [110.47840878388453]
We review existing strategies for combining model-based algorithms and machine learning from a high-level perspective.
We focus on symbol detection, which is one of the fundamental tasks of communication receivers.
arXiv Detail & Related papers (2021-01-12T19:55:34Z) - Interactive Search Based on Deep Reinforcement Learning [4.353144350714567]
The project mainly establishes a virtual user environment for offline training.
At the same time, we improve a reinforcement learning algorithm based on bi-clustering to expand the action space and recommended path space of the recommendation agent.
arXiv Detail & Related papers (2020-12-09T15:23:53Z) - Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on such an approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
arXiv Detail & Related papers (2020-06-10T11:18:57Z) - Guided Dialog Policy Learning without Adversarial Learning in the Loop [103.20723982440788]
A number of adversarial learning methods have been proposed to learn the reward function together with the dialogue policy.
We propose to decompose the adversarial training into two steps.
First, we train the discriminator with an auxiliary dialogue generator and then incorporate a derived reward model into a common RL method to guide the dialogue policy learning.
arXiv Detail & Related papers (2020-04-07T11:03:17Z)