Recurrent Neural Networks with Mixed Hierarchical Structures and EM
Algorithm for Natural Language Processing
- URL: http://arxiv.org/abs/2201.08919v1
- Date: Fri, 21 Jan 2022 23:08:33 GMT
- Title: Recurrent Neural Networks with Mixed Hierarchical Structures and EM
Algorithm for Natural Language Processing
- Authors: Zhaoxin Luo and Michael Zhu
- Abstract summary: We develop an approach called the latent indicator layer to identify and learn implicit hierarchical information.
We also develop an EM algorithm to handle the latent indicator layer in training.
We show that the EM-HRNN model with bootstrap training outperforms other RNN-based models in document classification tasks.
- Score: 9.645196221785694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to obtain hierarchical representations with an increasing level of
abstraction becomes one of the key issues of learning with deep neural
networks. A variety of RNN models have recently been proposed to incorporate
both explicit and implicit hierarchical information in modeling languages in
the literature. In this paper, we propose a novel approach called the latent
indicator layer to identify and learn implicit hierarchical information (e.g.,
phrases), and further develop an EM algorithm to handle the latent indicator
layer in training. The latent indicator layer further simplifies a text's
hierarchical structure, which allows us to seamlessly integrate different
levels of attention mechanisms into the structure. We call the resulting
architecture the EM-HRNN model. Furthermore, we develop two bootstrap
strategies to effectively and efficiently train the EM-HRNN model on long text
documents. Simulation studies and real data applications demonstrate that the
EM-HRNN model with bootstrap training outperforms other RNN-based models in
document classification tasks. The performance of the EM-HRNN model is
comparable to that of a Transformer-based method called Bert-base, even though the
former is a much smaller model and does not require pre-training.
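The abstract describes the architecture only at a high level. The sketch below is a rough, illustrative rendering of that idea rather than the authors' implementation: it assumes a word-level GRU, a two-class latent indicator layer over its hidden states that marks implicit segment boundaries (e.g., phrase ends), a segment-level GRU over the retained states, and a hard-EM training loop as a deliberate simplification of a full EM algorithm. All class names, dimensions, and the hard-EM shortcut are assumptions made for illustration; the paper's attention mechanisms and bootstrap strategies are omitted.

```python
# Minimal PyTorch sketch of a hierarchical RNN with a latent indicator layer,
# trained with hard EM -- an illustrative simplification, NOT the authors' code.
import torch
import torch.nn as nn


class LatentIndicatorHRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, num_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.word_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.indicator = nn.Linear(hidden_dim, 2)      # latent indicator layer
        self.segment_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    @torch.no_grad()
    def boundary_posteriors(self, tokens):
        """E-step helper: posterior probability that each position closes a segment."""
        word_h, _ = self.word_rnn(self.embed(tokens))
        return torch.softmax(self.indicator(word_h), dim=-1)

    def forward(self, tokens, boundaries):
        """Classify a document under a fixed indicator configuration."""
        word_h, _ = self.word_rnn(self.embed(tokens))   # (B, T, H)
        ind_logits = self.indicator(word_h)             # (B, T, 2)
        # Keep hidden states at boundary positions as segment summaries
        # (non-boundary positions are zeroed; a real model would pool segments).
        seg_states = word_h * boundaries.unsqueeze(-1)
        doc_h, _ = self.segment_rnn(seg_states)
        return self.classifier(doc_h[:, -1]), ind_logits


def em_step(model, optimizer, tokens, labels):
    """One hard-EM iteration: infer indicators, then take one gradient step."""
    ce = nn.CrossEntropyLoss()
    # E-step: most probable indicators under the current parameters.
    boundaries = model.boundary_posteriors(tokens).argmax(dim=-1)   # (B, T)
    # M-step: ascend log p(label, indicators | tokens) with indicators held fixed,
    # which splits into a classification term and an indicator term.
    optimizer.zero_grad()
    class_logits, ind_logits = model(tokens, boundaries.float())
    loss = ce(class_logits, labels) + ce(ind_logits.reshape(-1, 2), boundaries.reshape(-1))
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = LatentIndicatorHRNN(vocab_size=1000)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    docs = torch.randint(0, 1000, (8, 50))              # 8 toy documents, 50 tokens
    labels = torch.randint(0, 5, (8,))
    for _ in range(3):
        print(em_step(model, opt, docs, labels))
```

A full EM treatment would average over indicator configurations via their posteriors rather than fixing them at the argmax; the hard-EM shortcut above is only meant to show where the latent indicator layer sits between the word-level and segment-level RNNs.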
Related papers
- Language Models are Graph Learners [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, including Graph Neural Networks (GNNs) and Graph Transformers (GTs).
We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z)
- Informed deep hierarchical classification: a non-standard analysis inspired approach [0.0]
It consists of a multi-output deep neural network equipped with specific projection operators placed before each output layer.
The design of such an architecture, called the lexicographic hybrid deep neural network (LH-DNN), was made possible by combining tools from different and quite distant research fields.
To assess the efficacy of the approach, the resulting network is compared against the B-CNN, a convolutional neural network tailored for hierarchical classification tasks.
arXiv Detail & Related papers (2024-09-25T14:12:50Z)
- Revisiting N-Gram Models: Their Impact in Modern Neural Networks for Handwritten Text Recognition [4.059708117119894]
This study addresses whether explicit language models, specifically n-gram models, still contribute to the performance of state-of-the-art deep learning architectures in the field of handwriting recognition.
We evaluate two prominent neural network architectures, PyLaia and DAN, with and without the integration of explicit n-gram language models.
The results show that incorporating character or subword n-gram models significantly improves the performance of ATR models on all datasets.
arXiv Detail & Related papers (2024-04-30T07:37:48Z)
- Analyzing Populations of Neural Networks via Dynamical Model Embedding [10.455447557943463]
A core challenge in the interpretation of deep neural networks is identifying commonalities between the underlying algorithms implemented by distinct networks trained for the same task.
Motivated by this problem, we introduce DYNAMO, an algorithm that constructs a low-dimensional manifold in which each point corresponds to a neural network model, and two points are nearby if the corresponding neural networks enact similar high-level computational processes.
DYNAMO takes as input a collection of pre-trained neural networks and outputs a meta-model that emulates the dynamics of the hidden states as well as the outputs of any model in the collection.
arXiv Detail & Related papers (2023-02-27T19:00:05Z)
- Learning with Multigraph Convolutional Filters [153.20329791008095]
We introduce multigraph convolutional neural networks (MGNNs) as stacked and layered structures in which information is processed according to a multigraph signal processing (MSP) model.
We also develop a procedure for tractable computation of filter coefficients in the MGNNs and a low cost method to reduce the dimensionality of the information transferred between layers.
arXiv Detail & Related papers (2022-10-28T17:00:50Z)
- Self Semi Supervised Neural Architecture Search for Semantic Segmentation [6.488575826304023]
We propose a Neural Architecture Search strategy based on self supervision and semi-supervised learning for the task of semantic segmentation.
Our approach builds an optimized neural network model for this task.
Experiments on the Cityscapes and PASCAL VOC 2012 datasets demonstrate that the discovered neural network is more efficient than a state-of-the-art hand-crafted NN model.
arXiv Detail & Related papers (2022-01-29T19:49:44Z)
- Multi-Scale Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network (MS-SGN) is proposed for skeleton-based action recognition.
MS-SGN achieves the state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z)
- HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression [53.90578309960526]
Large pre-trained language models (PLMs) have shown overwhelming performance compared with traditional neural network methods.
We propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information.
arXiv Detail & Related papers (2021-10-16T11:23:02Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
arXiv Detail & Related papers (2021-02-09T08:19:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.