Error Detection in Large-Scale Natural Language Understanding Systems
Using Transformer Models
- URL: http://arxiv.org/abs/2109.01754v1
- Date: Sat, 4 Sep 2021 00:10:48 GMT
- Title: Error Detection in Large-Scale Natural Language Understanding Systems
Using Transformer Models
- Authors: Rakesh Chada, Pradeep Natarajan, Darshan Fofadiya, Prathap Ramachandra
- Abstract summary: Large-scale conversational assistants like Alexa, Siri, Cortana and Google Assistant process every utterance using multiple models for domain, intent and named entity recognition.
We address this challenge to detect domain classification errors using offline Transformer models.
We combine utterance encodings from a RoBERTa model with the N-best hypotheses produced by the production system. We then fine-tune end-to-end in a multitask setting using a small dataset of human-annotated utterances with domain classification errors.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale conversational assistants like Alexa, Siri, Cortana and Google
Assistant process every utterance using multiple models for domain, intent and
named entity recognition. Given the decoupled nature of model development and
large traffic volumes, it is extremely difficult to identify utterances
processed erroneously by such systems. We address this challenge to detect
domain classification errors using offline Transformer models. We combine
utterance encodings from a RoBERTa model with the N-best hypotheses produced by
the production system. We then fine-tune end-to-end in a multitask setting
using a small dataset of human-annotated utterances with domain classification
errors. We tested our approach for detecting misclassifications from one domain
that accounts for <0.5% of the traffic in a large-scale conversational AI
system. Our approach achieves an F1 score of 30%, outperforming a bi-LSTM
baseline by 16.9% and a standalone RoBERTa model by 4.8%. We improve this
further by 2.2% to 32.2% by ensembling multiple models.
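The abstract gives the fusion at a high level only, so the following PyTorch sketch is a minimal guess at the setup: the production system's N-best hypothesized domains are embedded, concatenated with their confidence scores and the RoBERTa utterance encoding, and fed to an error-detection head trained jointly with an auxiliary domain head. The fusion scheme, head design, and task weighting are all assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
from transformers import RobertaModel

class DomainErrorDetector(nn.Module):
    def __init__(self, n_domains: int, n_best: int = 5, hyp_dim: int = 32):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-base")
        self.domain_emb = nn.Embedding(n_domains, hyp_dim)
        fused_dim = self.encoder.config.hidden_size + n_best * (hyp_dim + 1)
        self.error_head = nn.Linear(fused_dim, 2)           # misclassified vs. correct
        self.domain_head = nn.Linear(fused_dim, n_domains)  # auxiliary multitask head

    def forward(self, input_ids, attention_mask, nbest_domains, nbest_scores):
        # nbest_domains: (B, n_best) hypothesized domain ids from the production system
        # nbest_scores:  (B, n_best) their confidence scores
        utt = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]
        hyp = torch.cat([self.domain_emb(nbest_domains).flatten(1), nbest_scores], dim=1)
        fused = torch.cat([utt, hyp], dim=1)
        return self.error_head(fused), self.domain_head(fused)

def multitask_loss(err_logits, dom_logits, err_labels, dom_labels, alpha=0.5):
    # Joint objective; the task weight alpha is an assumption, not from the paper.
    ce = nn.functional.cross_entropy
    return ce(err_logits, err_labels) + alpha * ce(dom_logits, dom_labels)
```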
Related papers
- Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning [73.73967342609603]
We introduce a predictor-corrector learning framework to minimize truncation errors.
We also propose an exponential moving average-based coefficient learning method to strengthen our higher-order predictor.
Our model surpasses a robust 3.8B DeepNet by an average of 2.9 SacreBLEU, using only one-third as many parameters.
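The summary does not specify how the coefficients enter the predictor-corrector steps, so the sketch below shows only the generic exponential-moving-average update the method's name refers to, as an illustration:

```python
# Generic EMA smoothing of predictor coefficients; how the paper couples this
# to its higher-order predictor-corrector steps is not recoverable from the
# summary above.
def ema_update(ema_coeffs, new_coeffs, decay=0.99):
    # coeff_ema <- decay * coeff_ema + (1 - decay) * coeff_new
    return [decay * e + (1.0 - decay) * n for e, n in zip(ema_coeffs, new_coeffs)]

ema = [0.60, 0.25, 0.15]   # running averages (hypothetical values)
new = [0.50, 0.30, 0.20]   # freshly learned coefficients
ema = ema_update(ema, new)
```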
arXiv Detail & Related papers (2024-11-05T12:26:25Z)
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling [81.34900892130929]
We explore inference compute as another axis for scaling by increasing the number of generated samples.
In domains like coding and formal proofs, where all answers can be automatically verified, these increases in coverage directly translate into improved performance.
We find that identifying correct samples out of many generations remains an important direction for future research in domains without automatic verifiers.
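A minimal sketch of the sample-then-verify loop this describes, with `generate` and `verify` as hypothetical stand-ins for a model call and an automatic checker such as a unit-test runner or proof assistant:

```python
def solve_by_repeated_sampling(prompt, generate, verify, n_samples=100):
    # Coverage (the chance that at least one sample is correct) grows with
    # n_samples; verification picks the correct sample out automatically.
    for _ in range(n_samples):
        candidate = generate(prompt)      # one stochastic sample from the model
        if verify(prompt, candidate):     # e.g., run unit tests / check the proof
            return candidate
    return None                           # no verified sample found
```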
arXiv Detail & Related papers (2024-07-31T17:57:25Z)
- Quantized Transformer Language Model Implementations on Edge Devices [1.2979415757860164]
Large-scale transformer-based models like Bidirectional Encoder Representations from Transformers (BERT) are widely used for Natural Language Processing (NLP) applications.
These models, with millions of parameters, are first pre-trained on a large corpus and then fine-tuned for a downstream NLP task.
One of the major limitations of these large-scale models is that they cannot be deployed on resource-constrained devices due to their large model size and increased inference latency.
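One standard recipe for this constraint is post-training dynamic quantization, sketched below with PyTorch's built-in API; this is an illustrative example, not necessarily the exact scheme the paper benchmarks:

```python
import torch
from transformers import BertModel

# Quantize the linear layers of a pre-trained BERT to int8 for CPU inference.
model = BertModel.from_pretrained("bert-base-uncased")
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
# int8 weights shrink the quantized layers roughly 4x and typically reduce
# inference latency on resource-constrained devices.
```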
arXiv Detail & Related papers (2023-10-06T01:59:19Z)
- Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models [1.0878040851637998]
With over 200,000 models in the Hugging Face ecosystem, users grapple with selecting and optimizing models to suit multifaceted workflows and data domains.
Here, we propose a context-aware routing system, Tryage, that leverages a language model router for optimal selection of expert models from a model library.
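A minimal sketch of the routing idea, assuming the router scores a fixed library of expert models from a prompt embedding produced by a small language model (the scorer design below is an assumption):

```python
import torch
import torch.nn as nn

class Router(nn.Module):
    """Scores each expert model in a library for a given prompt embedding."""
    def __init__(self, embed_dim: int, n_experts: int):
        super().__init__()
        self.scorer = nn.Linear(embed_dim, n_experts)

    def forward(self, prompt_embedding):      # (batch, embed_dim) from a small LM
        scores = self.scorer(prompt_embedding)
        return torch.argmax(scores, dim=-1)   # index of the chosen expert model
```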
arXiv Detail & Related papers (2023-08-22T17:48:24Z)
- Federated Distillation of Natural Language Understanding with Confident Sinkhorns [12.681983862338619]
We propose an approach to learn a central (global) model from the federation of (local) models trained on user devices.
To learn the global model, the objective is to minimize the optimal transport cost of the global model's predictions from the confident sum of soft-targets assigned by local models.
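A compact sketch of the entropic optimal-transport (Sinkhorn) cost such an objective minimizes, with the paper's confidence weighting of local soft-targets elided; the regularization strength and iteration count are assumptions:

```python
import torch

def sinkhorn_cost(p, q, C, eps=0.1, n_iters=50):
    # p: global model's predicted class distribution, q: aggregated local
    # soft-targets, C: cost matrix over classes (all assumptions here).
    K = torch.exp(-C / eps)                    # Gibbs kernel
    u = torch.ones_like(p)
    for _ in range(n_iters):                   # alternating scaling updates
        v = q / (K.t() @ u)
        u = p / (K @ v)
    T = u.unsqueeze(1) * K * v.unsqueeze(0)    # approximate transport plan
    return (T * C).sum()                       # transport cost used as the loss
```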
arXiv Detail & Related papers (2021-10-06T00:44:00Z)
- Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident.
In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
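The summary does not spell the method out, so the sketch below shows one generic plug-and-play pattern, ensemble disagreement, purely as an illustration; the paper's actual scoring may differ:

```python
import itertools
import torch

def pose_uncertainty(estimators, image):
    # Assumes two or more independently trained pose estimators, each
    # returning a (translation, rotation) pair for the object in the image.
    poses = [est(image) for est in estimators]
    translations = [t for t, _ in poses]
    dists = [torch.norm(a - b) for a, b in itertools.combinations(translations, 2)]
    return torch.stack(dists).mean()   # high disagreement = low confidence
```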
arXiv Detail & Related papers (2020-11-16T06:51:55Z)
- Conformer: Convolution-augmented Transformer for Speech Recognition [60.119604551507805]
Recently, Transformer- and Convolutional Neural Network (CNN)-based models have shown promising results in Automatic Speech Recognition (ASR).
We propose the convolution-augmented transformer for speech recognition, named Conformer.
On the widely used LibriSpeech benchmark, our model achieves a WER of 2.1%/4.3% without a language model and 1.9%/3.9% with an external language model on test-clean/test-other.
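A compact sketch of a Conformer-style block: macaron feed-forward halves wrapped around self-attention and a depthwise-convolution module. Dimensions and kernel size are illustrative, and details such as relative positional encoding are elided:

```python
import torch
import torch.nn as nn

class ConformerBlock(nn.Module):
    def __init__(self, d=256, heads=4, kernel=31):
        super().__init__()
        def ff():  # macaron feed-forward half
            return nn.Sequential(nn.LayerNorm(d), nn.Linear(d, 4 * d),
                                 nn.SiLU(), nn.Linear(4 * d, d))
        self.ff1, self.ff2 = ff(), ff()
        self.norm_attn = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.norm_conv = nn.LayerNorm(d)
        self.pw1 = nn.Conv1d(d, 2 * d, 1)          # pointwise conv feeding a GLU
        self.glu = nn.GLU(dim=1)
        self.dw = nn.Conv1d(d, d, kernel, padding=kernel // 2, groups=d)
        self.bn = nn.BatchNorm1d(d)
        self.act = nn.SiLU()
        self.pw2 = nn.Conv1d(d, d, 1)
        self.norm_out = nn.LayerNorm(d)

    def forward(self, x):                          # x: (batch, time, d)
        x = x + 0.5 * self.ff1(x)
        a = self.norm_attn(x)
        x = x + self.attn(a, a, a)[0]              # multi-head self-attention
        c = self.norm_conv(x).transpose(1, 2)      # (batch, d, time) for Conv1d
        c = self.act(self.bn(self.dw(self.glu(self.pw1(c)))))
        x = x + self.pw2(c).transpose(1, 2)        # convolution module, residual
        x = x + 0.5 * self.ff2(x)
        return self.norm_out(x)
```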
arXiv Detail & Related papers (2020-05-16T20:56:25Z)
- The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources, incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit" from neural network calculations for simple instances.
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
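A minimal sketch of confidence-based early exit at inference time, assuming a lightweight classifier head after each transformer layer and a batch size of one; the threshold is an assumption and would be calibrated per dataset:

```python
import torch

def early_exit_forward(layers, exit_heads, x, threshold=0.9):
    # layers: transformer layers; exit_heads: one classifier per layer.
    # x: (1, seq_len, hidden) activations for a single instance.
    for layer, head in zip(layers, exit_heads):
        x = layer(x)
        probs = torch.softmax(head(x[:, 0]), dim=-1)   # classify from first token
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:                   # simple instance: stop early
            return pred
    return pred                                        # hard instance: full depth
```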
arXiv Detail & Related papers (2020-04-16T04:28:08Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of which utterances or tokens are dull, without any feature engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
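As a rough illustration of scoring diversity directly from output distributions, here is a hedged sketch that penalizes batch-averaged probability mass on a dull-token set; the token set and scoring form are illustrative, not the paper's exact definition:

```python
import torch

def diversity_score(output_probs, dull_token_ids):
    # output_probs: (batch, steps, vocab) softmax outputs from the decoder.
    avg_dist = output_probs.mean(dim=(0, 1))      # batch-averaged distribution
    dull_mass = avg_dist[dull_token_ids].sum()    # mass on known dull tokens
    return 1.0 - dull_mass                        # higher = more diverse batch
```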
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.