Error Detection in Large-Scale Natural Language Understanding Systems
Using Transformer Models
- URL: http://arxiv.org/abs/2109.01754v1
- Date: Sat, 4 Sep 2021 00:10:48 GMT
- Title: Error Detection in Large-Scale Natural Language Understanding Systems
Using Transformer Models
- Authors: Rakesh Chada, Pradeep Natarajan, Darshan Fofadiya, Prathap Ramachandra
- Abstract summary: Large-scale conversational assistants like Alexa, Siri, Cortana and Google Assistant process every utterance using multiple models for domain, intent and named entity recognition.
We address this challenge to detect domain classification errors using offline Transformer models.
We combine utterance encodings from a RoBERTa model with the N-best hypotheses produced by the production system. We then fine-tune end-to-end in a multitask setting using a small dataset of human-annotated utterances with domain classification errors.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale conversational assistants like Alexa, Siri, Cortana and Google
Assistant process every utterance using multiple models for domain, intent and
named entity recognition. Given the decoupled nature of model development and
large traffic volumes, it is extremely difficult to identify utterances
processed erroneously by such systems. We address this challenge to detect
domain classification errors using offline Transformer models. We combine
utterance encodings from a RoBERTa model with the N-best hypotheses produced by
the production system. We then fine-tune end-to-end in a multitask setting
using a small dataset of human-annotated utterances with domain classification
errors. We tested our approach for detecting misclassifications from one domain
that accounts for <0.5% of the traffic in a large-scale conversational AI
system. Our approach achieves an F1 score of 30%, outperforming a bi-LSTM
baseline by 16.9% and a standalone RoBERTa model by 4.8%. We improve this
further by 2.2% to 32.2% by ensembling multiple models.
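The abstract gives the fusion at a high level only, so the following PyTorch sketch is a minimal guess at the setup: the production system's N-best hypothesized domains are embedded, concatenated with their confidence scores and the RoBERTa utterance encoding, and fed to an error-detection head trained jointly with an auxiliary domain head. The fusion scheme, head design, and task weighting are all assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
from transformers import RobertaModel

class DomainErrorDetector(nn.Module):
    def __init__(self, n_domains: int, n_best: int = 5, hyp_dim: int = 32):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-base")
        self.domain_emb = nn.Embedding(n_domains, hyp_dim)
        fused_dim = self.encoder.config.hidden_size + n_best * (hyp_dim + 1)
        self.error_head = nn.Linear(fused_dim, 2)           # misclassified vs. correct
        self.domain_head = nn.Linear(fused_dim, n_domains)  # auxiliary multitask head

    def forward(self, input_ids, attention_mask, nbest_domains, nbest_scores):
        # nbest_domains: (B, n_best) hypothesized domain ids from the production system
        # nbest_scores:  (B, n_best) their confidence scores
        utt = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]
        hyp = torch.cat([self.domain_emb(nbest_domains).flatten(1), nbest_scores], dim=1)
        fused = torch.cat([utt, hyp], dim=1)
        return self.error_head(fused), self.domain_head(fused)

def multitask_loss(err_logits, dom_logits, err_labels, dom_labels, alpha=0.5):
    # Joint objective; the task weight alpha is an assumption, not from the paper.
    ce = nn.functional.cross_entropy
    return ce(err_logits, err_labels) + alpha * ce(dom_logits, dom_labels)
```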
Related papers
- Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning [73.73967342609603]
We introduce a predictor-corrector learning framework to minimize truncation errors.
We also propose an exponential moving average-based coefficient learning method to strengthen our higher-order predictor.
Our model surpasses a robust 3.8B DeepNet by an average of 2.9 SacreBLEU, using only one-third as many parameters.
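The summary does not specify how the coefficients enter the predictor-corrector steps, so the sketch below shows only the generic exponential-moving-average update the method's name refers to, as an illustration:

```python
# Generic EMA smoothing of predictor coefficients; how the paper couples this
# to its higher-order predictor-corrector steps is not recoverable from the
# summary above.
def ema_update(ema_coeffs, new_coeffs, decay=0.99):
    # coeff_ema <- decay * coeff_ema + (1 - decay) * coeff_new
    return [decay * e + (1.0 - decay) * n for e, n in zip(ema_coeffs, new_coeffs)]

ema = [0.60, 0.25, 0.15]   # running averages (hypothetical values)
new = [0.50, 0.30, 0.20]   # freshly learned coefficients
ema = ema_update(ema, new)
```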
arXiv Detail & Related papers (2024-11-05T12:26:25Z)
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling [81.34900892130929]
We explore inference compute as another axis for scaling by increasing the number of generated samples.
In domains like coding and formal proofs, where all answers can be automatically verified, these increases in coverage directly translate into improved performance.
We find that identifying correct samples out of many generations remains an important direction for future research in domains without automatic verifiers.
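A minimal sketch of the sample-then-verify loop this describes, with `generate` and `verify` as hypothetical stand-ins for a model call and an automatic checker such as a unit-test runner or proof assistant:

```python
def solve_by_repeated_sampling(prompt, generate, verify, n_samples=100):
    # Coverage (the chance that at least one sample is correct) grows with
    # n_samples; verification picks the correct sample out automatically.
    for _ in range(n_samples):
        candidate = generate(prompt)      # one stochastic sample from the model
        if verify(prompt, candidate):     # e.g., run unit tests / check the proof
            return candidate
    return None                           # no verified sample found
```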
arXiv Detail & Related papers (2024-07-31T17:57:25Z)
- Quantized Transformer Language Model Implementations on Edge Devices [1.2979415757860164]
Large-scale transformer-based models like Bidirectional Encoder Representations from Transformers (BERT) are widely used for Natural Language Processing (NLP) applications.
These models, with millions of parameters, are first pre-trained on a large corpus and then fine-tuned for a downstream NLP task.
One of the major limitations of these large-scale models is that they cannot be deployed on resource-constrained devices due to their large model size and increased inference latency.
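One standard recipe for this constraint is post-training dynamic quantization, sketched below with PyTorch's built-in API; this is an illustrative example, not necessarily the exact scheme the paper benchmarks:

```python
import torch
from transformers import BertModel

# Quantize the linear layers of a pre-trained BERT to int8 for CPU inference.
model = BertModel.from_pretrained("bert-base-uncased")
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
# int8 weights shrink the quantized layers roughly 4x and typically reduce
# inference latency on resource-constrained devices.
```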
arXiv Detail & Related papers (2023-10-06T01:59:19Z)
- Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models [1.0878040851637998]
With over 200,000 models in the Hugging Face ecosystem, users grapple with selecting and optimizing models to suit multifaceted workflows and data domains.
Here, we propose a context-aware routing system, Tryage, that leverages a language model router for optimal selection of expert models from a model library.
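A minimal sketch of the routing idea, assuming the router scores a fixed library of expert models from a prompt embedding produced by a small language model (the scorer design below is an assumption):

```python
import torch
import torch.nn as nn

class Router(nn.Module):
    """Scores each expert model in a library for a given prompt embedding."""
    def __init__(self, embed_dim: int, n_experts: int):
        super().__init__()
        self.scorer = nn.Linear(embed_dim, n_experts)

    def forward(self, prompt_embedding):      # (batch, embed_dim) from a small LM
        scores = self.scorer(prompt_embedding)
        return torch.argmax(scores, dim=-1)   # index of the chosen expert model
```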
arXiv Detail & Related papers (2023-08-22T17:48:24Z)
- Federated Distillation of Natural Language Understanding with Confident Sinkhorns [12.681983862338619]
We propose an approach to learn a central (global) model from the federation of (local) models trained on user devices.
To learn the global model, the objective is to minimize the optimal transport cost of the global model's predictions from the confident sum of soft-targets assigned by local models.
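A compact sketch of the entropic optimal-transport (Sinkhorn) cost such an objective minimizes, with the paper's confidence weighting of local soft-targets elided; the regularization strength and iteration count are assumptions:

```python
import torch

def sinkhorn_cost(p, q, C, eps=0.1, n_iters=50):
    # p: global model's predicted class distribution, q: aggregated local
    # soft-targets, C: cost matrix over classes (all assumptions here).
    K = torch.exp(-C / eps)                    # Gibbs kernel
    u = torch.ones_like(p)
    for _ in range(n_iters):                   # alternating scaling updates
        v = q / (K.t() @ u)
        u = p / (K @ v)
    T = u.unsqueeze(1) * K * v.unsqueeze(0)    # approximate transport plan
    return (T * C).sum()                       # transport cost used as the loss
```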
arXiv Detail & Related papers (2021-10-06T00:44:00Z)
- Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident.
In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
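The summary does not spell the method out, so the sketch below shows one generic plug-and-play pattern, ensemble disagreement, purely as an illustration; the paper's actual scoring may differ:

```python
import itertools
import torch

def pose_uncertainty(estimators, image):
    # Assumes two or more independently trained pose estimators, each
    # returning a (translation, rotation) pair for the object in the image.
    poses = [est(image) for est in estimators]
    translations = [t for t, _ in poses]
    dists = [torch.norm(a - b) for a, b in itertools.combinations(translations, 2)]
    return torch.stack(dists).mean()   # high disagreement = low confidence
```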
arXiv Detail & Related papers (2020-11-16T06:51:55Z)
- Conformer: Convolution-augmented Transformer for Speech Recognition [60.119604551507805]
Recently, Transformer- and Convolutional Neural Network (CNN)-based models have shown promising results in Automatic Speech Recognition (ASR).
We propose the convolution-augmented transformer for speech recognition, named Conformer.
On the widely used LibriSpeech benchmark, our model achieves a WER of 2.1%/4.3% without a language model and 1.9%/3.9% with an external language model on test-clean/test-other.
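A compact sketch of a Conformer-style block: macaron feed-forward halves wrapped around self-attention and a depthwise-convolution module. Dimensions and kernel size are illustrative, and details such as relative positional encoding are elided:

```python
import torch
import torch.nn as nn

class ConformerBlock(nn.Module):
    def __init__(self, d=256, heads=4, kernel=31):
        super().__init__()
        def ff():  # macaron feed-forward half
            return nn.Sequential(nn.LayerNorm(d), nn.Linear(d, 4 * d),
                                 nn.SiLU(), nn.Linear(4 * d, d))
        self.ff1, self.ff2 = ff(), ff()
        self.norm_attn = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.norm_conv = nn.LayerNorm(d)
        self.pw1 = nn.Conv1d(d, 2 * d, 1)          # pointwise conv feeding a GLU
        self.glu = nn.GLU(dim=1)
        self.dw = nn.Conv1d(d, d, kernel, padding=kernel // 2, groups=d)
        self.bn = nn.BatchNorm1d(d)
        self.act = nn.SiLU()
        self.pw2 = nn.Conv1d(d, d, 1)
        self.norm_out = nn.LayerNorm(d)

    def forward(self, x):                          # x: (batch, time, d)
        x = x + 0.5 * self.ff1(x)
        a = self.norm_attn(x)
        x = x + self.attn(a, a, a)[0]              # multi-head self-attention
        c = self.norm_conv(x).transpose(1, 2)      # (batch, d, time) for Conv1d
        c = self.act(self.bn(self.dw(self.glu(self.pw1(c)))))
        x = x + self.pw2(c).transpose(1, 2)        # convolution module, residual
        x = x + 0.5 * self.ff2(x)
        return self.norm_out(x)
```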
arXiv Detail & Related papers (2020-05-16T20:56:25Z)
- The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources, incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit" from neural network calculations for simple instances.
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
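A minimal sketch of confidence-based early exit at inference time, assuming a lightweight classifier head after each transformer layer and a batch size of one; the threshold is an assumption and would be calibrated per dataset:

```python
import torch

def early_exit_forward(layers, exit_heads, x, threshold=0.9):
    # layers: transformer layers; exit_heads: one classifier per layer.
    # x: (1, seq_len, hidden) activations for a single instance.
    for layer, head in zip(layers, exit_heads):
        x = layer(x)
        probs = torch.softmax(head(x[:, 0]), dim=-1)   # classify from first token
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:                   # simple instance: stop early
            return pred
    return pred                                        # hard instance: full depth
```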
arXiv Detail & Related papers (2020-04-16T04:28:08Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of which utterances or tokens are dull, without any feature engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
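As a rough illustration of scoring diversity directly from output distributions, here is a hedged sketch that penalizes batch-averaged probability mass on a dull-token set; the token set and scoring form are illustrative, not the paper's exact definition:

```python
import torch

def diversity_score(output_probs, dull_token_ids):
    # output_probs: (batch, steps, vocab) softmax outputs from the decoder.
    avg_dist = output_probs.mean(dim=(0, 1))      # batch-averaged distribution
    dull_mass = avg_dist[dull_token_ids].sum()    # mass on known dull tokens
    return 1.0 - dull_mass                        # higher = more diverse batch
```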
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.