Depth-Adaptive Graph Recurrent Network for Text Classification
- URL: http://arxiv.org/abs/2003.00166v1
- Date: Sat, 29 Feb 2020 03:09:55 GMT
- Title: Depth-Adaptive Graph Recurrent Network for Text Classification
- Authors: Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu and Jie Zhou
- Abstract summary: The Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network.
We propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required.
- Score: 71.20237659479703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph
recurrent network, which views words as nodes and performs layer-wise recurrent
steps between them simultaneously. Despite its successes on text
representations, the S-LSTM still suffers from two drawbacks. Firstly, given a
sentence, certain words are usually more ambiguous than others, and thus more
computation steps need to be taken for these difficult words and vice versa.
However, the S-LSTM takes fixed computation steps for all words, irrespective
of their difficulty. The second drawback stems from the lack of sequential
information (e.g., word order) that is inherently important for natural
language. In this paper, we try to address these issues and propose a
depth-adaptive mechanism for the S-LSTM, which allows the model to learn how
many computational steps to conduct for different words as required. In
addition, we integrate an extra RNN layer to inject sequential information,
which also serves as an input feature for the decision of adaptive depths.
Results on the classic text classification task (24 datasets of various sizes
and domains) show that our model brings significant improvements over the
conventional S-LSTM and other high-performance models (e.g., the Transformer),
while achieving a good accuracy-speed trade-off.
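Below is a minimal PyTorch sketch of the idea described in the abstract; it is not the authors' code. The class and parameter names, the GRU stand-ins for the S-LSTM node update, and the ACT-style cumulative halting threshold are all illustrative assumptions. The sketch shows the two pieces the abstract names: an RNN layer that injects word order and feeds the depth decision, and a per-word halting unit that lets ambiguous words take more recurrent steps than easy ones.

```python
# Illustrative sketch only: a depth-adaptive, per-word recurrent update.
# The GRU / GRUCell stand in for the S-LSTM layer; names are assumptions.
import torch
import torch.nn as nn


class DepthAdaptiveNodeUpdater(nn.Module):
    def __init__(self, hidden_size: int, max_steps: int = 6, threshold: float = 0.99):
        super().__init__()
        self.max_steps = max_steps
        self.threshold = threshold
        # Sequential encoder: injects word order and feeds the halting decision.
        self.seq_rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        # One recurrent "graph" step per word node (stand-in for an S-LSTM layer).
        self.node_cell = nn.GRUCell(hidden_size, hidden_size)
        # Halting unit: decides, per word, whether to take another step.
        self.halt_linear = nn.Linear(hidden_size, 1)

    def forward(self, word_embeds: torch.Tensor) -> torch.Tensor:
        # word_embeds: (batch, seq_len, hidden)
        batch, seq_len, hidden = word_embeds.shape
        seq_feats, _ = self.seq_rnn(word_embeds)            # order-aware features
        state = seq_feats.reshape(batch * seq_len, hidden)  # per-word node states
        inputs = word_embeds.reshape(batch * seq_len, hidden)

        halted = torch.zeros(batch * seq_len, dtype=torch.bool, device=state.device)
        cum_halt = torch.zeros(batch * seq_len, device=state.device)
        for _ in range(self.max_steps):
            p_halt = torch.sigmoid(self.halt_linear(state)).squeeze(-1)
            cum_halt = cum_halt + p_halt * (~halted).float()
            new_state = self.node_cell(inputs, state)
            # Words that have halted keep their state; the rest take another step,
            # so "harder" words receive deeper computation.
            state = torch.where(halted.unsqueeze(-1), state, new_state)
            halted = halted | (cum_halt >= self.threshold)
            if bool(halted.all()):
                break
        return state.reshape(batch, seq_len, hidden)


# Usage: updater = DepthAdaptiveNodeUpdater(128); out = updater(torch.randn(2, 10, 128))
```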
Related papers
- Spatial Semantic Recurrent Mining for Referring Image Segmentation [63.34997546393106]
We propose S²RM to achieve high-quality cross-modality fusion.
It follows a three-stage working strategy: distributing language features, spatial semantic recurrent co-parsing, and parsed-semantic balancing.
Our proposed method performs favorably against other state-of-the-art algorithms.
arXiv Detail & Related papers (2024-05-15T00:17:48Z) - MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning [63.80739044622555]
We introduce MuSR, a dataset for evaluating language models on soft reasoning tasks specified in a natural language narrative.
This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm.
Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning.
arXiv Detail & Related papers (2023-10-24T17:59:20Z) - Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z) - SLTUNET: A Simple Unified Model for Sign Language Translation [40.93099095994472]
We propose a simple unified neural model designed to support multiple sign-to-gloss, gloss-to-text and sign-to-text translation tasks.
Jointly modeling different tasks endows SLTUNET with the capability to explore the cross-task relatedness that could help narrow the modality gap.
We show in experiments that SLTUNET achieves competitive and even state-of-the-art performance on PHOENIX-2014T and CSL-Daily.
arXiv Detail & Related papers (2023-05-02T20:41:59Z) - Co-Driven Recognition of Semantic Consistency via the Fusion of Transformer and HowNet Sememes Knowledge [6.184249194474601]
This paper proposes a co-driven semantic consistency recognition method based on the fusion of Transformer and HowNet sememes knowledge.
BiLSTM is exploited to encode the conceptual semantic information and infer the semantic consistency.
arXiv Detail & Related papers (2023-02-21T09:53:19Z) - Sentence-Level Sign Language Recognition Framework [0.0]
Sentence-level SLR requires mapping videos of sign language sentences to sequences of gloss labels.
CTC is used to avoid pre-segmenting the sentences into individual words (a loss-usage sketch follows this list).
We evaluate the performance of the proposed models on RWTH-PHOENIX-Weather.
arXiv Detail & Related papers (2022-11-13T01:45:41Z) - Predictive Representation Learning for Language Modeling [33.08232449211759]
Correlates of secondary information appear in LSTM representations even though they are not part of an explicitly supervised prediction task.
We propose Predictive Representation Learning (PRL), which explicitly constrains LSTMs to encode specific predictions.
arXiv Detail & Related papers (2021-05-29T05:03:47Z) - Bidirectional LSTM-CRF Attention-based Model for Chinese Word Segmentation [2.3991565023534087]
We propose a Bidirectional LSTM-CRF Attention-based Model for Chinese word segmentation.
Our model performs better than baseline methods built on other neural networks.
arXiv Detail & Related papers (2021-05-20T11:46:53Z) - A journey in ESN and LSTM visualisations on a language task [77.34726150561087]
We trained ESNs and LSTMs on a Cross-Situational Learning (CSL) task.
The results are of three kinds: performance comparison, internal dynamics analyses and visualization of latent space.
arXiv Detail & Related papers (2020-12-03T08:32:01Z) - Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
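Regarding the sentence-level sign language recognition entry above, here is a minimal sketch (an assumption, not that paper's code) of how a CTC loss maps frame-level gloss scores to an unsegmented gloss label sequence, which is what lets the model skip pre-segmenting sentences into words. All shapes, sizes, and names are illustrative.

```python
# Illustrative CTC usage: frame-level scores -> shorter gloss sequence, no segmentation.
import torch
import torch.nn as nn

T, N, C = 50, 2, 30          # frames, batch size, gloss vocabulary size (index 0 = blank)
S = 8                         # max gloss labels per sentence

frame_scores = torch.randn(T, N, C, requires_grad=True)  # e.g. output of a video encoder
log_probs = frame_scores.log_softmax(dim=-1)              # CTC expects log-probabilities

targets = torch.randint(1, C, (N, S))                     # gloss label ids (0 reserved for blank)
input_lengths = torch.full((N,), T, dtype=torch.long)     # frames per video
target_lengths = torch.tensor([8, 5])                     # true gloss counts per sentence

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # in a real model, gradients flow back into the video encoder
```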