Neural model robustness for skill routing in large-scale conversational
AI systems: A design choice exploration
- URL: http://arxiv.org/abs/2103.03373v1
- Date: Thu, 4 Mar 2021 22:54:33 GMT
- Title: Neural model robustness for skill routing in large-scale conversational
AI systems: A design choice exploration
- Authors: Han Li, Sunghyun Park, Aswarth Dara, Jinseok Nam, Sungjin Lee,
Young-Bum Kim, Spyros Matsoukas, Ruhi Sarikaya
- Abstract summary: We show how different modeling design choices impact the model robustness in the context of skill routing on a state-of-the-art commercial conversational AI system.
We show that applying data augmentation can be a very effective and practical way to drastically improve model robustness.
- Score: 34.29393761770914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current state-of-the-art large-scale conversational AI or intelligent digital
assistant systems in industry comprise a set of components such as Automatic
Speech Recognition (ASR) and Natural Language Understanding (NLU). For some of
these systems that leverage a shared NLU ontology (e.g., a centralized
intent/slot schema), there exists a separate skill routing component to
correctly route a request to an appropriate skill, which is either a
first-party or third-party application that actually executes on a user
request. The skill routing component is needed as there are thousands of
skills that can subscribe to the same intent, or subscribe to an intent only
under specific contextual conditions (e.g., the device has a screen). Ensuring
model robustness or resilience in the skill routing component is an important
problem since skills may dynamically change their subscription in the ontology
after the skill routing model has been deployed to production. We show how
different modeling design choices impact the model robustness in the context of
skill routing on a state-of-the-art commercial conversational AI system,
specifically on the choices around data augmentation, model architecture, and
optimization method. We show that applying data augmentation can be a very
effective and practical way to drastically improve model robustness.
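The routing setup described in the abstract can be sketched as a lookup over dynamic intent subscriptions with contextual conditions. The sketch below is purely illustrative: the class, skill names, and context keys are hypothetical, not taken from the paper, and a production router would rank candidates with a learned model rather than return all matches.

```python
# Illustrative sketch of a skill-routing table with contextual
# subscription conditions (all names here are hypothetical).

class SkillRouter:
    def __init__(self):
        # intent -> list of (skill, condition) pairs; condition is a
        # predicate over the request context, or None for "always".
        self.subscriptions = {}

    def subscribe(self, intent, skill, condition=None):
        self.subscriptions.setdefault(intent, []).append((skill, condition))

    def route(self, intent, context):
        # Return every skill whose subscription matches the context;
        # subscriptions can change after the model is deployed, which
        # is exactly the robustness problem the paper studies.
        candidates = []
        for skill, condition in self.subscriptions.get(intent, []):
            if condition is None or condition(context):
                candidates.append(skill)
        return candidates

router = SkillRouter()
router.subscribe("PlayVideoIntent", "VideoSkill",
                 condition=lambda ctx: ctx.get("has_screen", False))
router.subscribe("PlayVideoIntent", "AudioFallbackSkill")

print(router.route("PlayVideoIntent", {"has_screen": True}))
# -> ['VideoSkill', 'AudioFallbackSkill']
```

The paper's data-augmentation finding would correspond, in this picture, to training the ranking model on simulated variations of the subscription table so that it stays robust when subscriptions change in production.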
Related papers
- ELICIT: LLM Augmentation via External In-Context Capability [16.237679215248196]
ELICIT is a framework consisting of two modules designed to effectively store and reuse task vectors.
ELICIT serves as a plug-and-play performance booster to enable adaptive elicitation of model capabilities.
arXiv Detail & Related papers (2024-10-12T03:19:06Z)
- Adaptive Semantic Token Selection for AI-native Goal-oriented Communications [11.92172357956248]
We propose a novel design for AI-native goal-oriented communications.
We exploit transformer neural networks under dynamic inference constraints on bandwidth and computation.
We show that our model improves over state-of-the-art token selection mechanisms.
arXiv Detail & Related papers (2024-04-25T13:49:50Z)
- An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z)
- SAI: Solving AI Tasks with Systematic Artificial Intelligence in Communication Network [4.302209772725456]
Systematic Artificial Intelligence (SAI) is a framework designed to solve AI tasks by leveraging Large Language Models (LLMs) and intent-format-based input.
SAI can complete numerous complex AI tasks in the communication network, achieving impressive results in network optimization, resource allocation, and other challenging tasks.
arXiv Detail & Related papers (2023-10-13T12:14:58Z)
- UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z)
- Skill-Based Few-Shot Selection for In-Context Learning [123.26522773708683]
Skill-KNN is a skill-based few-shot selection method for in-context learning.
It does not require training or fine-tuning of any models, making it suitable for frequently expanding or changing example banks.
Experimental results across five cross-domain semantic parsing datasets and six backbone models show that Skill-KNN significantly outperforms existing methods.
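Nearest-neighbor example selection of this kind can be sketched with plain cosine similarity over example embeddings. The toy vectors and example texts below are invented for illustration and are not the skill-based feature representation that Skill-KNN actually uses; the point is only that selection is a similarity lookup, so the example bank can grow without retraining.

```python
# Minimal nearest-neighbor example selection by cosine similarity.
# The embeddings and example bank are toy values, not Skill-KNN's
# actual skill-based representation.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def select_examples(query_vec, bank, k=2):
    # bank: list of (example_text, embedding) pairs; no training or
    # fine-tuning is involved, so entries can be added at any time.
    ranked = sorted(bank, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

bank = [
    ("list flights from NYC", [0.9, 0.1]),
    ("book a hotel room",     [0.1, 0.9]),
    ("show trains to Boston", [0.8, 0.3]),
]
print(select_examples([1.0, 0.2], bank, k=2))
# -> ['list flights from NYC', 'show trains to Boston']
```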
arXiv Detail & Related papers (2023-05-23T16:28:29Z)
- Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models [58.42146641102329]
We develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC).
KiC empowers a parametric text-to-text language model with a knowledge-rich external memory.
As a knowledge-rich semi-parametric language model, KiC needs only a much smaller parametric part to achieve superior zero-shot performance on unseen tasks.
arXiv Detail & Related papers (2022-10-28T23:18:43Z)
- KGI: An Integrated Framework for Knowledge Intensive Language Tasks [16.511913995069097]
In this paper, we propose a system based on an enhanced version of this approach for other knowledge intensive language tasks.
Our system achieves results comparable to the best models in the KILT leaderboards.
arXiv Detail & Related papers (2022-04-08T10:36:21Z)
- Towards an Interface Description Template for AI-enabled Systems [77.34726150561087]
Reuse is a common system architecture approach that seeks to instantiate a system architecture with existing components.
There is currently no framework that guides the selection of the information needed to assess a component's portability to a system different from the one for which it was originally designed.
We present ongoing work on establishing an interface description template that captures the main information of an AI-enabled component.
arXiv Detail & Related papers (2020-07-13T20:30:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.