Tree-constrained Pointer Generator with Graph Neural Network Encodings
for Contextual Speech Recognition
- URL: http://arxiv.org/abs/2207.00857v1
- Date: Sat, 2 Jul 2022 15:12:18 GMT
- Title: Tree-constrained Pointer Generator with Graph Neural Network Encodings
for Contextual Speech Recognition
- Authors: Guangzhi Sun, Chao Zhang, Philip C. Woodland
- Abstract summary: This paper proposes the use of graph neural network (GNN) encodings in a tree-constrained pointer generator (TCPGen) component for end-to-end contextual ASR.
TCPGen with GNN encodings achieved about a further 15% relative WER reduction on the biasing words compared to the original TCPGen.
- Score: 19.372248692745167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incorporating biasing words obtained as contextual knowledge is critical for
many automatic speech recognition (ASR) applications. This paper proposes the
use of graph neural network (GNN) encodings in a tree-constrained pointer
generator (TCPGen) component for end-to-end contextual ASR. By encoding the
biasing words in the prefix-tree with a tree-based GNN, lookahead for future
wordpieces in end-to-end ASR decoding is achieved at each tree node by
incorporating information about all wordpieces on the tree branches rooted from
it, which allows a more accurate prediction of the generation probability of
the biasing words. Systems were evaluated on the Librispeech corpus using
simulated biasing tasks, and on the AMI corpus by proposing a novel
visual-grounded contextual ASR pipeline that extracts biasing words from slides
alongside each meeting. Results showed that TCPGen with GNN encodings achieved
about a further 15% relative WER reduction on the biasing words compared to the
original TCPGen, with a negligible increase in the computation cost for
decoding.
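The core mechanism the abstract describes, encoding the biasing words as a prefix tree and summarising at each node the wordpieces on all branches rooted there, can be sketched as follows. All names, the mean-aggregation rule, and the toy embedding are illustrative assumptions rather than the paper's exact architecture.

```python
# Minimal sketch: a prefix tree (trie) of biasing wordpieces, with a
# bottom-up GNN-style pass so each node's encoding summarises every
# wordpiece on the branches rooted at it (the "lookahead" information).
from dataclasses import dataclass, field

@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # wordpiece -> TrieNode
    is_word_end: bool = False
    encoding: list = None  # aggregated vector, filled bottom-up

def build_prefix_tree(biasing_words):
    """Insert each biasing word (given as a list of wordpieces) into a trie."""
    root = TrieNode()
    for pieces in biasing_words:
        node = root
        for wp in pieces:
            node = node.children.setdefault(wp, TrieNode())
        node.is_word_end = True
    return root

def encode_bottom_up(node, embed, dim=4):
    """One GNN-style pass: a node's encoding is the mean over its children
    of (edge wordpiece embedding + recursively encoded subtree)."""
    vectors = []
    for wp, child in node.children.items():
        encode_bottom_up(child, embed, dim)
        vectors.append([e + c for e, c in zip(embed(wp), child.encoding)])
    node.encoding = ([sum(col) / len(vectors) for col in zip(*vectors)]
                     if vectors else [0.0] * dim)

def embed(wp, dim=4):
    """Toy deterministic embedding standing in for learned wordpiece vectors."""
    base = sum(ord(c) for c in wp)
    return [((base * (i + 3)) % 97) / 97.0 for i in range(dim)]

root = build_prefix_tree([["tur", "bo"], ["tur", "ing"], ["graph"]])
encode_bottom_up(root, embed)
# the "tur" node's encoding now summarises both the "bo" and "ing" branches
print(sorted(root.children["tur"].children))  # ['bo', 'ing']
```

In the actual model a learned tree-GNN replaces the fixed mean aggregation, but the flow of information from leaves toward the root is the same.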
Related papers
- Phoneme-aware Encoding for Prefix-tree-based Contextual ASR [45.161909551392085]
Tree-constrained Pointer Generator (TCPGen) has shown promise for this purpose.
We propose extending it with phoneme-aware encoding to better recognize words of unusual pronunciations.
arXiv Detail & Related papers (2023-12-15T07:37:09Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator [9.053645441056256]
This paper proposes an innovative method for achieving end-to-end contextual ASR using graph neural network (GNN) encodings.
GNN encodings facilitate lookahead for future word pieces in the process of ASR decoding at each tree node.
The performance of the systems was evaluated on the Librispeech and AMI corpora, following the visual-grounded contextual ASR pipeline.
arXiv Detail & Related papers (2023-05-30T08:20:58Z) - A Scalable Graph Neural Network Decoder for Short Block Codes [49.25571364253986]
We propose a novel decoding algorithm for short block codes based on an edge-weighted graph neural network (EW-GNN)
The EW-GNN decoder operates on the Tanner graph with an iterative message-passing structure.
We show that the EW-GNN decoder outperforms the BP and deep-learning-based BP methods in terms of the decoding error rate.
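For background on the BP baseline mentioned above, the classic min-sum variant of belief-propagation message passing on a Tanner graph can be sketched in a few lines; the EW-GNN decoder replaces these fixed update rules with learned, edge-weighted ones. The tiny repetition code and all names here are illustrative.

```python
# Min-sum belief propagation on the Tanner graph of a 3-bit repetition
# code: check c0 covers bits {0,1}, check c1 covers bits {1,2}.
import math

CHECKS = [[0, 1], [1, 2]]

def min_sum_decode(llr, checks, iters=5):
    """llr[j] is the channel log-likelihood ratio of bit j (positive -> 0)."""
    n = len(llr)
    # variable-to-check messages, initialised to the channel LLRs
    v2c = {(i, j): llr[j] for i, idx in enumerate(checks) for j in idx}
    for _ in range(iters):
        # check-to-variable: sign product and minimum magnitude of the others
        c2v = {}
        for i, idx in enumerate(checks):
            for j in idx:
                others = [v2c[(i, k)] for k in idx if k != j]
                sign = math.prod(1 if m >= 0 else -1 for m in others)
                c2v[(i, j)] = sign * min(abs(m) for m in others)
        # variable update: channel LLR plus all incoming check messages
        total = [llr[j] + sum(c2v[(i, j)]
                              for i, idx in enumerate(checks) if j in idx)
                 for j in range(n)]
        # extrinsic variable-to-check messages for the next iteration
        v2c = {(i, j): total[j] - c2v[(i, j)]
               for i, idx in enumerate(checks) for j in idx}
    return [0 if t >= 0 else 1 for t in total]  # hard decision per bit

# all-zeros codeword sent; the middle bit arrives flipped
print(min_sum_decode([2.0, -0.5, 1.5], CHECKS))  # -> [0, 0, 0]
```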
arXiv Detail & Related papers (2022-11-13T17:13:12Z)
- Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator [19.372248692745167]
Contextual knowledge is essential for reducing speech recognition errors on high-valued long-tail words.
This paper proposes a novel tree-constrained pointer generator (TCPGen) component that enables end-to-end ASR models to bias towards a list of long-tail words.
arXiv Detail & Related papers (2022-05-18T16:40:50Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
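One common way to recast overlapped diarization as single-label prediction is to treat each subset of simultaneously active speakers as one class. The power-set mapping below is an assumed illustration of that general idea, not SEND's exact formulation.

```python
# Map each possible set of simultaneously active speakers to a single
# class index, so every frame gets exactly one label even under overlap.
from itertools import combinations

def powerset_labels(num_speakers, max_overlap=2):
    """Enumerate speaker subsets up to max_overlap and index them."""
    classes = [frozenset()]  # silence: no active speaker
    for k in range(1, max_overlap + 1):
        classes += [frozenset(c) for c in combinations(range(num_speakers), k)]
    return {c: i for i, c in enumerate(classes)}

label_of = powerset_labels(3)
# a frame where speakers 0 and 2 talk simultaneously maps to one class id
print(label_of[frozenset({0, 2})])  # -> 5
```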
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- Incorporating Constituent Syntax for Coreference Resolution [50.71868417008133]
We propose a graph-based method to incorporate constituent syntactic structures.
We also explore to utilise higher-order neighbourhood information to encode rich structures in constituent trees.
Experiments on the English and Chinese portions of OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T07:40:42Z)
- TENT: Text Classification Based on ENcoding Tree Learning [9.927112304745542]
We propose TENT to obtain better text classification performance and reduce the reliance on computing power.
Specifically, we first establish a dependency analysis graph for each text and then convert each graph into its corresponding encoding tree.
Experimental results show that our method outperforms other baselines on several datasets.
arXiv Detail & Related papers (2021-10-05T13:55:47Z)
- Tree-constrained Pointer Generator for End-to-end Contextual Speech Recognition [16.160767678589895]
TCPGen is proposed, which incorporates contextual knowledge as a list of biasing words into both attention-based encoder-decoder and transducer end-to-end ASR models.
TCPGen structures the biasing words into an efficient prefix tree to serve as its symbolic input and creates a neural shortcut to facilitate recognising biasing words during decoding.
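The prefix-tree constraint during decoding can be illustrated as follows: at each step, the pointer distribution is renormalised over only the wordpieces on branches leaving the current tree node. The function names and the renormalisation details are assumptions for illustration, not the paper's exact formulation.

```python
# Only wordpieces that continue some biasing word from the current trie
# node may receive pointer probability; everything else is left to the
# standard ASR output distribution.
import math

def build_trie(words):
    """Nested-dict trie over biasing words given as wordpiece lists."""
    root = {}
    for pieces in words:
        node = root
        for wp in pieces:
            node = node.setdefault(wp, {})
    return root

def constrained_pointer_probs(scores, node):
    """Softmax over only the wordpieces valid at this trie node.
    `scores` maps every vocabulary wordpiece to a raw logit."""
    valid = [wp for wp in scores if wp in node]
    if not valid:
        return {}
    z = sum(math.exp(scores[wp]) for wp in valid)
    return {wp: math.exp(scores[wp]) / z for wp in valid}

trie = build_trie([["tur", "bo"], ["tur", "ing"]])
logits = {"tur": 2.0, "bo": 1.0, "ing": 0.5, "the": 3.0}
# at the root, only "tur" can start a biasing word
p = constrained_pointer_probs(logits, trie)
print(p)  # -> {'tur': 1.0}
# after consuming "tur", the two continuations "bo" and "ing" compete
p2 = constrained_pointer_probs(logits, trie["tur"])
```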
arXiv Detail & Related papers (2021-09-01T21:41:59Z)
- Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention [75.44523978180317]
We propose SEG-Net, a neural keyphrase generation model that is composed of two major components.
The experimental results on seven keyphrase generation benchmarks from scientific and web documents demonstrate that SEG-Net outperforms the state-of-the-art neural generative methods by a large margin.
arXiv Detail & Related papers (2020-08-04T18:00:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.