Related papers: Factual Consistency Oriented Speech Recognition

Factual Consistency Oriented Speech Recognition

URL: http://arxiv.org/abs/2302.12369v1
Date: Fri, 24 Feb 2023 00:01:41 GMT
Title: Factual Consistency Oriented Speech Recognition
Authors: Naoyuki Kanda, Takuya Yoshioka, Yang Liu
Abstract summary: The proposed framework optimize the ASR model to maximize an expected factual consistency score between ASR hypotheses and ground-truth transcriptions. It is shown that training the ASR models with the proposed framework improves the speech summarization quality as measured by the factual consistency of meeting conversation summaries.
Score: 23.754107608608106
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper presents a novel optimization framework for automatic speech recognition (ASR) with the aim of reducing hallucinations produced by an ASR model. The proposed framework optimizes the ASR model to maximize an expected factual consistency score between ASR hypotheses and ground-truth transcriptions, where the factual consistency score is computed by a separately trained estimator. Experimental results using the AMI meeting corpus and the VoxPopuli corpus show that the ASR model trained with the proposed framework generates ASR hypotheses that have significantly higher consistency scores with ground-truth transcriptions while maintaining the word error rates close to those of cross entropy-trained ASR models. Furthermore, it is shown that training the ASR models with the proposed framework improves the speech summarization quality as measured by the factual consistency of meeting conversation summaries generated by a large language model.

Related papers

Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization [34.51491788470738]
We propose reverse inference optimization (RIO) to enhance the robustness of autoregressive-model-based text-to-speech (TTS) systems. RIO uses reverse inference as the standard to select exemplars used in RLHF from the speech samples generated by the TTS system itself. RIO significantly improves the stability of zero-shot TTS performance by reducing the discrepancies between training and inference conditions.
arXiv Detail & Related papers (2024-07-02T13:04:04Z)
Crossmodal ASR Error Correction with Discrete Speech Units [16.58209270191005]
We propose a post-ASR processing approach for ASR Error Correction (AEC) We explore pre-training and fine-tuning strategies and uncover an ASR domain discrepancy phenomenon. We propose the incorporation of discrete speech units to align with and enhance the word embeddings for improving AEC quality.
arXiv Detail & Related papers (2024-05-26T19:58:38Z)
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction. The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses. LLMs with reasonable prompt and its generative capability can even correct those tokens that are missing in N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
How to Estimate Model Transferability of Pre-Trained Speech Models? [84.11085139766108]
"Score-based assessment" framework for estimating transferability of pre-trained speech models. We leverage upon two representation theories, Bayesian likelihood estimation and optimal transport, to generate rank scores for the PSM candidates. Our framework efficiently computes transferability scores without actual fine-tuning of candidate models or layers.
arXiv Detail & Related papers (2023-06-01T04:52:26Z)
Towards Improved Room Impulse Response Estimation for Speech Recognition [53.04440557465013]
We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of far-field automatic speech recognition (ASR) We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators. We then propose a generative adversarial network (GAN) based architecture that encodes RIR features from reverberant speech and constructs an RIR from the encoded features.
arXiv Detail & Related papers (2022-11-08T00:40:27Z)
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings [53.120885867427305]
Three approaches are evaluated for speaker-attributed automatic speech recognition (SA-ASR) in a meeting scenario. The WD-SOT approach achieves 10.7% relative reduction on averaged speaker-dependent character error rate (SD-CER) The TS-ASR approach also outperforms the FD-SOT approach and brings 16.5% relative average SD-CER reduction.
arXiv Detail & Related papers (2022-03-31T06:39:14Z)
Attention-based Multi-hypothesis Fusion for Speech Summarization [83.04957603852571]
Speech summarization can be achieved by combining automatic speech recognition (ASR) and text summarization (TS) ASR errors directly affect the quality of the output summary in the cascade approach. We propose a cascade speech summarization model that is robust to ASR errors and that exploits multiple hypotheses generated by ASR to attenuate the effect of ASR errors on the summary.
arXiv Detail & Related papers (2021-11-16T03:00:29Z)
Cross-sentence Neural Language Models for Conversational Speech Recognition [17.317583079824423]
We propose an effective cross-sentence neural LM approach that reranks the ASR N-best hypotheses of an upcoming sentence. We also explore to extract task-specific global topical information of the cross-sentence history.
arXiv Detail & Related papers (2021-06-13T05:30:16Z)
Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU) We show that the error rates of off the shelf ASR and following LU systems can be reduced significantly by 14% relative with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.