Semantic Distance: A New Metric for ASR Performance Analysis Towards
Spoken Language Understanding
- URL: http://arxiv.org/abs/2104.02138v1
- Date: Mon, 5 Apr 2021 20:25:07 GMT
- Title: Semantic Distance: A New Metric for ASR Performance Analysis Towards
Spoken Language Understanding
- Authors: Suyoun Kim, Abhinav Arora, Duc Le, Ching-Feng Yeh, Christian Fuegen,
Ozlem Kalinli, Michael L. Seltzer
- Abstract summary: We propose a novel Semantic Distance (SemDist) measure as an alternative evaluation metric for ASR systems.
We demonstrate the effectiveness of our proposed metric on various downstream tasks, including intent recognition, semantic parsing, and named entity recognition.
- Score: 26.958001571944678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word Error Rate (WER) has been the predominant metric used to evaluate the
performance of automatic speech recognition (ASR) systems. However, WER is
sometimes not a good indicator for downstream Natural Language Understanding
(NLU) tasks, such as intent recognition, slot filling, and semantic parsing in
task-oriented dialog systems. This is because WER takes into consideration only
literal correctness instead of semantic correctness, the latter of which is
typically more important for these downstream tasks. In this study, we propose
a novel Semantic Distance (SemDist) measure as an alternative evaluation metric
for ASR systems to address this issue. We define SemDist as the distance
between a reference and hypothesis pair in a sentence-level embedding space. To
represent the reference and hypothesis as a sentence embedding, we exploit
RoBERTa, a state-of-the-art pre-trained deep contextualized language model
based on the transformer architecture. We demonstrate the effectiveness of our
proposed metric on various downstream tasks, including intent recognition,
semantic parsing, and named entity recognition.
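As a rough illustration of the metric's shape, the sketch below computes SemDist as the cosine distance between sentence embeddings of the reference and the hypothesis. The `embed` function here is a hypothetical bag-of-words stand-in used only to keep the example self-contained; the paper instead embeds each sentence with RoBERTa.

```python
import math
from collections import Counter

def embed(sentence: str) -> Counter:
    # Toy stand-in for a sentence embedding: a bag-of-words count vector.
    # The paper uses a RoBERTa sentence embedding at this step.
    return Counter(sentence.lower().split())

def semdist(reference: str, hypothesis: str) -> float:
    """SemDist sketched as 1 - cosine similarity of the two embeddings."""
    a, b = embed(reference), embed(hypothesis)
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (norm_a * norm_b)
```

With a real contextual embedding, a hypothesis that preserves the intent ("set an alarm for 7 am" vs. "set an alarm for seven am") yields a small SemDist even when its WER is nonzero, which is the behavior the paper argues for.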
Related papers
- Automatic Speech Recognition System-Independent Word Error Rate Estimation [23.25173244408922]
Word error rate (WER) is a metric used to evaluate the quality of transcriptions produced by Automatic Speech Recognition (ASR) systems.
In this paper, a hypothesis generation method for ASR System-Independent WER estimation is proposed.
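For reference, the WER being estimated here is the word-level edit distance between hypothesis and reference, normalized by reference length. A minimal implementation of the standard definition (not the paper's estimation method, which works without the reference):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[-1][-1] / len(ref)
```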
arXiv Detail & Related papers (2024-04-25T16:57:05Z)
- Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring [4.819085609772069]
We propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing.
Our solution combines Hidden Markov Model-Gaussian Mixture Model (HMM-GMM) systems with deep neural network (DNN) models for better accuracy.
We demonstrate the effectiveness of our proposed framework on the LibriSpeech dataset with empirical analyses.
arXiv Detail & Related papers (2023-10-14T23:16:05Z)
- Unsupervised Semantic Variation Prediction using the Distribution of Sibling Embeddings [17.803726860514193]
Detection of semantic variation of words is an important task for various NLP applications.
We argue that mean representations alone cannot accurately capture such semantic variations.
We propose a method that uses the entire cohort of the contextualised embeddings of the target word.
arXiv Detail & Related papers (2023-05-15T13:58:21Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
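The element-wise Manhattan distance feature described above can be sketched in a few lines. This is a minimal sketch of the stated idea, not the authors' implementation: the vector of per-dimension absolute differences (rather than a single scalar sum) is what would feed the entailment classifier.

```python
def manhattan_feature(u: list[float], v: list[float]) -> list[float]:
    """Element-wise |u_i - v_i| between two embedding vectors.

    The full vector is kept as a feature; summing it would collapse
    the comparison to the scalar Manhattan (L1) distance.
    """
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimensionality")
    return [abs(a - b) for a, b in zip(u, v)]
```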
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric [22.884709676587377]
Word Error Rate (WER) has been traditionally used to evaluate ASR system quality.
We propose evaluating ASR output hypotheses quality with SemDist that can measure semantic correctness.
arXiv Detail & Related papers (2021-10-11T16:09:01Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs between paired texts in natural language processing tasks such as text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the token distributions at their positions.
Experiments on Semantic Textual Similarity show the resulting Neighboring Distribution Divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
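A toy sketch of the comparison step behind this mask-and-predict strategy, assuming the masked-LM token distributions at aligned positions are already available; the paper's exact divergence and aggregation may differ, so KL divergence and a simple mean are illustrative choices here.

```python
import math

def kl(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    """KL divergence between two token distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def ndd(dists_a: list[list[float]], dists_b: list[list[float]]) -> float:
    """Average divergence between the MLM distributions predicted at
    aligned (longest-common-sequence) positions of the two texts."""
    return sum(kl(p, q) for p, q in zip(dists_a, dists_b)) / len(dists_a)
```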
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition [62.997667081978825]
We present an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
In this paper we demonstrate that through this mechanism our system is able to recognize more than 85% of newly added words that it previously failed to recognize.
arXiv Detail & Related papers (2021-07-05T21:08:34Z)
- Semantic-WER: A Unified Metric for the Evaluation of ASR Transcript for End Usability [1.599072005190786]
State-of-the-art systems have achieved a word error rate (WER) below 5%.
Semantic-WER (SWER) is a metric for evaluating ASR transcripts for downstream applications in general.
arXiv Detail & Related papers (2021-06-03T17:35:14Z)
- NLP-CIC @ DIACR-Ita: POS and Neighbor Based Distributional Models for Lexical Semantic Change in Diachronic Italian Corpora [62.997667081978825]
We present our systems and findings on unsupervised lexical semantic change for the Italian language.
The task is to determine whether a target word has evolved its meaning with time, only relying on raw-text from two time-specific datasets.
We propose two models representing the target words across the periods to predict the changing words using threshold and voting schemes.
arXiv Detail & Related papers (2020-11-07T11:27:18Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
- Towards Accurate Scene Text Recognition with Semantic Reasoning Networks [52.86058031919856]
We propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition.
GSRM is introduced to capture global semantic context through multi-way parallel transmission.
Results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method.
arXiv Detail & Related papers (2020-03-27T09:19:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.