Graph Neural Networks for Contextual ASR with the Tree-Constrained
Pointer Generator
- URL: http://arxiv.org/abs/2305.18824v1
- Date: Tue, 30 May 2023 08:20:58 GMT
- Title: Graph Neural Networks for Contextual ASR with the Tree-Constrained
Pointer Generator
- Authors: Guangzhi Sun, Chao Zhang, Phil Woodland
- Abstract summary: This paper proposes an innovative method for achieving end-to-end contextual ASR using graph neural network (GNN) encodings.
GNN encodings facilitate lookahead for future word pieces in the process of ASR decoding at each tree node.
The performance of the systems was evaluated using the Librispeech and AMI corpora, following the visual-grounded contextual ASR pipeline.
- Score: 9.053645441056256
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The incorporation of biasing words obtained through contextual knowledge is
of paramount importance in automatic speech recognition (ASR) applications.
This paper proposes an innovative method for achieving end-to-end contextual
ASR using graph neural network (GNN) encodings based on the tree-constrained
pointer generator method. GNN node encodings facilitate lookahead for future
word pieces in the process of ASR decoding at each tree node by incorporating
information about all word pieces on the tree branches rooted from it. This
results in a more precise prediction of the generation probability of the
biasing words. The study explores three GNN encoding techniques, namely tree
recursive neural networks, graph convolutional network (GCN), and GraphSAGE,
along with different combinations of the complementary GCN and GraphSAGE
structures. The performance of the systems was evaluated using the Librispeech
and AMI corpora, following the visual-grounded contextual ASR pipeline. The
findings indicate that using GNN encodings achieved consistent and significant
reductions in word error rate (WER), particularly for words that are rare or
have not been seen during the training process. Notably, the most effective
combination of GNN encodings obtained more than 60% WER reduction for rare and
unseen words compared to standard end-to-end systems.
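To make the mechanism concrete, here is a minimal sketch (not the authors' code; all names, shapes, and the choice of a mean aggregator are illustrative assumptions) of a word-piece prefix tree over the biasing list, with a single bottom-up GraphSAGE-mean-style pass so that each node's encoding summarizes every word piece on the branches rooted at it:

```python
import numpy as np

class TrieNode:
    def __init__(self, piece, dim):
        self.piece = piece         # word piece labelling this node
        self.children = {}         # next word piece -> TrieNode
        self.h = np.zeros(dim)     # lookahead encoding, filled bottom-up

def build_prefix_tree(biasing_words, dim):
    # Each biasing word is given as a list of word pieces.
    root = TrieNode("<root>", dim)
    for pieces in biasing_words:
        node = root
        for piece in pieces:
            node = node.children.setdefault(piece, TrieNode(piece, dim))
    return root

def encode_lookahead(node, embed, W_self, W_child):
    # One leaves-to-root pass: mean-pool the children's encodings
    # (a GraphSAGE-mean-style aggregation) and combine them with the
    # node's own word-piece embedding, so node.h summarises all word
    # pieces on the branches rooted at this node.
    child_hs = [encode_lookahead(c, embed, W_self, W_child)
                for c in node.children.values()]
    agg = np.mean(child_hs, axis=0) if child_hs else np.zeros(W_child.shape[0])
    node.h = np.tanh(embed(node.piece) @ W_self + agg @ W_child)
    return node.h

# Toy usage with random embeddings and weights.
rng = np.random.default_rng(0)
d = 8
table = {}
def embed(piece):
    return table.setdefault(piece, rng.normal(size=d))

W_self, W_child = rng.normal(size=(d, d)), rng.normal(size=(d, d))
root = build_prefix_tree([["Tur", "ner"], ["Tur", "ing"]], d)
encode_lookahead(root, embed, W_self, W_child)
```

During decoding, the encoding stored at the current tree node can then inform the generation probability of the biasing words, since it already reflects the word pieces that may follow.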
Related papers
- DEGREE: Decomposition Based Explanation For Graph Neural Networks [55.38873296761104]
We propose DEGREE to provide a faithful explanation for GNN predictions.
By decomposing the information generation and aggregation mechanism of GNNs, DEGREE allows tracking the contributions of specific components of the input graph to the final prediction.
We also design a subgraph level interpretation algorithm to reveal complex interactions between graph nodes that are overlooked by previous methods.
arXiv Detail & Related papers (2023-05-22T10:29:52Z)
- Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition [19.372248692745167]
This paper proposes the use of graph neural network (GNN) encodings in a tree-constrained pointer generator (TCPGen) component for end-to-end contextual ASR.
TCPGen with GNN encodings achieved about a further 15% relative WER reduction on the biasing words compared to the original TCPGen.
arXiv Detail & Related papers (2022-07-02T15:12:18Z)
- GNNRank: Learning Global Rankings from Pairwise Comparisons via Directed Graph Neural Networks [68.61934077627085]
We introduce GNNRank, a modeling framework compatible with any GNN capable of learning digraph embeddings.
We show that our methods attain competitive and often superior performance compared with existing approaches.
arXiv Detail & Related papers (2022-02-01T04:19:50Z)
- TextRGNN: Residual Graph Neural Networks for Text Classification [13.912147013558846]
TextRGNN is an improved GNN structure that introduces residual connection to deepen the convolution network depth.
Our structure can obtain a wider node receptive field and effectively suppress the over-smoothing of node features.
It can significantly improve the classification accuracy whether in corpus level or text level, and achieve SOTA performance on a wide range of text classification datasets.
arXiv Detail & Related papers (2021-12-30T13:48:58Z)
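As an illustration of the residual idea this summary describes, here is a minimal numpy sketch (the function names and the symmetric normalisation are assumptions; TextRGNN's full architecture has additional components):

```python
import numpy as np

def normalise(A):
    # Symmetrically normalised adjacency with self-loops:
    # D^{-1/2} (A + I) D^{-1/2}, as in a standard GCN.
    A = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

def gcn_layer(A_hat, H, W):
    # Plain graph convolution: propagate, transform, ReLU.
    return np.maximum(A_hat @ H @ W, 0.0)

def residual_gcn_layer(A_hat, H, W):
    # Residual connection: adding the input back lets the network be
    # deepened while keeping node features from collapsing toward one
    # another (over-smoothing), which also widens the receptive field.
    return H + gcn_layer(A_hat, H, W)
```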
- TENT: Text Classification Based on ENcoding Tree Learning [9.927112304745542]
We propose TENT to obtain better text classification performance and reduce the reliance on computing power.
Specifically, we first establish a dependency analysis graph for each text and then convert each graph into its corresponding encoding tree.
Experimental results show that our method outperforms other baselines on several datasets.
arXiv Detail & Related papers (2021-10-05T13:55:47Z)
- Tree-constrained Pointer Generator for End-to-end Contextual Speech Recognition [16.160767678589895]
TCPGen is proposed that incorporates such knowledge as a list of biasing words into both attention-based encoder-decoder and transducer end-to-end ASR models.
TCPGen structures the biasing words into an efficient prefix tree to serve as its symbolic input and creates a neural shortcut to facilitate recognising biasing words during decoding.
arXiv Detail & Related papers (2021-09-01T21:41:59Z)
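A highly simplified sketch of this shortcut at a single decoding step (the interpolation below is an assumption of this summary, not the paper's exact formulation, which also accounts for out-of-list probability mass):

```python
import numpy as np

def pointer_distribution(valid_pieces, scores, vocab_size):
    # The pointer can only select word pieces that are valid
    # continuations from the current prefix-tree node; everything
    # off the tree gets zero pointer probability.
    p = np.zeros(vocab_size)
    idx = list(valid_pieces)
    logits = scores[idx]
    probs = np.exp(logits - logits.max())
    p[idx] = probs / probs.sum()
    return p

def tcpgen_step(p_model, p_ptr, p_gen):
    # Final word-piece distribution: interpolate the base ASR model's
    # output with the tree-constrained pointer distribution, weighted
    # by a (learned) generation probability p_gen.
    return (1.0 - p_gen) * p_model + p_gen * p_ptr
```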
- Graph Neural Networks for Natural Language Processing: A Survey [64.36633422999905]
We present a comprehensive overview of Graph Neural Networks (GNNs) for Natural Language Processing.
We propose a new taxonomy of GNNs for NLP, which organizes existing research of GNNs for NLP along three axes: graph construction, graph representation learning, and graph-based encoder-decoder models.
arXiv Detail & Related papers (2021-06-10T23:59:26Z)
- Enhance Information Propagation for Graph Neural Network by Heterogeneous Aggregations [7.3136594018091134]
Graph neural networks are emerging as a continuation of deep learning's success on graph data.
We propose HAG-Net, which enhances information propagation among GNN layers by combining heterogeneous aggregations.
We empirically validate the effectiveness of HAG-Net on a number of graph classification benchmarks.
arXiv Detail & Related papers (2021-02-08T08:57:56Z)
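A minimal sketch of the combination idea (the particular mean/max/sum choice and the concatenation are assumptions for illustration; HAG-Net's actual aggregation scheme differs in detail):

```python
import numpy as np

def hetero_aggregate(neigh):
    # neigh: (n_neighbours, d) array of neighbour features.
    # Concatenating several aggregators gives later layers
    # complementary summaries that any single aggregator would lose.
    return np.concatenate([neigh.mean(axis=0),
                           neigh.max(axis=0),
                           neigh.sum(axis=0)])

def hetero_layer(h_self, neigh, W):
    # One layer: transform [self || heterogeneous neighbour summary].
    # W has shape (4 * d, d_out).
    z = np.concatenate([h_self, hetero_aggregate(neigh)])
    return np.tanh(z @ W)
```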
- Binarized Graph Neural Network [65.20589262811677]
We develop a binarized graph neural network to learn the binary representations of the nodes with binary network parameters.
Our proposed method can be seamlessly integrated into the existing GNN-based embedding approaches.
Experiments indicate that the proposed binarized graph neural network, namely BGN, is orders of magnitude more efficient in terms of both time and space.
arXiv Detail & Related papers (2020-04-19T09:43:14Z)
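A sketch of the core binarisation step (illustrative only; BGN also binarises the network parameters and is trained end-to-end):

```python
import numpy as np

def binarize(x):
    # Map real-valued node encodings to {-1, +1} codes. During training
    # one typically passes gradients straight through this step (the
    # straight-through estimator), since sign() has zero gradient
    # almost everywhere.
    return np.where(x >= 0.0, 1.0, -1.0)

def match_score(code_a, code_b):
    # With binary codes, similarity reduces to counting agreeing bits,
    # which is far cheaper in time and space than float dot products.
    return int((code_a == code_b).sum())
```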
- Learning to Hash with Graph Neural Networks for Recommender Systems [103.82479899868191]
Graph representation learning has attracted much attention in supporting high quality candidate search at scale.
Despite its effectiveness in learning embedding vectors for objects in the user-item interaction network, the computational costs to infer users' preferences in continuous embedding space are tremendous.
We propose a simple yet effective discrete representation learning framework to jointly learn continuous and discrete codes.
arXiv Detail & Related papers (2020-03-04T06:59:56Z)
- EdgeNets: Edge Varying Graph Neural Networks [179.99395949679547]
This paper puts forth a general framework that unifies state-of-the-art graph neural networks (GNNs) through the concept of EdgeNet.
An EdgeNet is a GNN architecture that allows different nodes to use different parameters to weigh the information of different neighbors.
This is a general linear and local operation that a node can perform, and it encompasses under one formulation all existing graph convolutional neural networks (GCNNs) as well as graph attention networks (GATs).
arXiv Detail & Related papers (2020-01-21T15:51:17Z)
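To make that formulation concrete, a small sketch of an edge-varying layer (the dictionary-of-matrices parameterisation is an assumption made here for clarity):

```python
import numpy as np

def edgenet_layer(X, edges, Phi):
    # X: (n, d) node features; edges: iterable of (i, j) pairs,
    # including self-loops (i, i); Phi[(i, j)]: a (d, d) weight matrix
    # private to edge (i, j). Because each edge carries its own
    # parameters, a node can weigh each neighbour differently; tying
    # Phi across all edges recovers a shared-weight GCN layer, and
    # computing Phi from attention scores recovers a GAT.
    out = np.zeros(X.shape)
    for (i, j) in edges:
        out[i] += X[j] @ Phi[(i, j)]
    return out
```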
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.