Neural Latent Dependency Model for Sequence Labeling
- URL: http://arxiv.org/abs/2011.05009v1
- Date: Tue, 10 Nov 2020 10:05:21 GMT
- Title: Neural Latent Dependency Model for Sequence Labeling
- Authors: Yang Zhou, Yong Jiang, Zechuan Hu, Kewei Tu
- Abstract summary: A classic approach to sequence labeling is linear chain conditional random fields (CRFs).
One limitation of linear chain CRFs is their inability to model long-range dependencies between labels.
High-order CRFs extend linear chain CRFs by modeling dependencies no longer than their order, but their computational complexity grows exponentially in the order.
We propose a Neural Latent Dependency Model (NLDM) that models dependencies of arbitrary length between labels with a latent tree structure.
- Score: 47.32215014130811
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence labeling is a fundamental problem in machine learning, natural
language processing and many other fields. A classic approach to sequence
labeling is linear chain conditional random fields (CRFs). When combined with
neural network encoders, they achieve very good performance in many sequence
labeling tasks. One limitation of linear chain CRFs is their inability to model
long-range dependencies between labels. High order CRFs extend linear chain
CRFs by modeling dependencies no longer than their order, but the computational
complexity grows exponentially in the order. In this paper, we propose the
Neural Latent Dependency Model (NLDM) that models dependencies of arbitrary
length between labels with a latent tree structure. We develop an end-to-end
training algorithm and a polynomial-time inference algorithm of our model. We
evaluate our model on both synthetic and real datasets and show that our model
outperforms strong baselines.
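For context on the complexity trade-off discussed in the abstract, the following is a minimal NumPy sketch (not the authors' code; all names and shapes are illustrative) of the forward algorithm for a linear-chain CRF over neural emission scores. Each step is O(|Y|^2) in the number of labels |Y|; a k-th order CRF must track |Y|^k previous-label combinations, which is the exponential blow-up that motivates NLDM's latent tree structure with polynomial-time inference.

```python
import numpy as np

def crf_log_partition(emissions, transitions):
    """Forward algorithm for a linear-chain CRF (illustrative sketch).

    emissions:   (n, Y) per-token label scores, e.g. from a neural encoder
    transitions: (Y, Y) scores for adjacent label pairs

    Each step costs O(|Y|^2); a k-th order CRF would need a state space
    of size |Y|^k, which is the exponential growth mentioned above.
    """
    alpha = emissions[0].copy()                      # log-scores at position 0
    for t in range(1, emissions.shape[0]):
        # log-sum-exp over the previous label for every current label
        scores = alpha[:, None] + transitions + emissions[t][None, :]
        m = scores.max(axis=0)
        alpha = m + np.log(np.exp(scores - m).sum(axis=0))
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

# Toy usage: 5 tokens, 3 labels.
rng = np.random.default_rng(0)
print(crf_log_partition(rng.normal(size=(5, 3)), rng.normal(size=(3, 3))))
```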
Related papers
- Regular-pattern-sensitive CRFs for Distant Label Interactions [10.64258723923874]
Regular-pattern-sensitive CRFs (RPCRFs) are a method of enriching standard linear-chain CRFs with the ability to learn long-distance label interactions.
We show how an RPCRF can be automatically constructed from a set of user-specified patterns, and demonstrate the model's effectiveness on synthetic data.
arXiv Detail & Related papers (2024-11-19T13:08:03Z)
- LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
The self-attention mechanism's computational cost limits its practicality for long sequences.
We propose a new method called LongVQ to compress the global abstraction into a length-fixed codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps address the lack of long-range dependency modeling.
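As a rough illustration of the codebook idea in this summary, the sketch below shows a generic vector-quantization step (hypothetical names; not LongVQ's actual architecture): each token representation is assigned to its nearest entry in a fixed-size codebook, so the compressed summary has length K regardless of the sequence length.

```python
import numpy as np

def vector_quantize(features, codebook):
    """Nearest-neighbour codebook assignment (generic VQ step).

    features: (n, d) token representations from a long sequence
    codebook: (K, d) learned codes; K is fixed, so the quantized summary
              does not grow with the sequence length n.
    """
    # Squared Euclidean distance between every feature and every code.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (n, K)
    idx = d2.argmin(axis=1)            # index of the nearest code per feature
    return codebook[idx], idx          # quantized features and assignments
```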
arXiv Detail & Related papers (2024-04-17T08:26:34Z)
- Non-autoregressive Sequence-to-Sequence Vision-Language Models [63.77614880533488]
We propose a parallel decoding sequence-to-sequence vision-language model that marginalizes over multiple inference paths in the decoder.
The model achieves performance on-par with its state-of-the-art autoregressive counterpart, but is faster at inference time.
arXiv Detail & Related papers (2024-03-04T17:34:59Z)
- Hierarchically Gated Recurrent Neural Network for Sequence Modeling [36.14544998133578]
We propose a gated linear RNN model dubbed Hierarchically Gated Recurrent Neural Network (HGRN).
Experiments on language modeling, image classification, and long-range arena benchmarks showcase the efficiency and effectiveness of our proposed model.
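For readers unfamiliar with gated linear RNNs, the sketch below shows the generic gated linear recurrence such models are built on (hypothetical names; it does not reproduce HGRN's specific hierarchical gating scheme).

```python
import numpy as np

def gated_linear_recurrence(x, gate_logits):
    """Generic gated *linear* recurrence underlying gated linear RNNs.

    x:           (n, d) input features
    gate_logits: (n, d) per-step forget-gate logits

    Update: h_t = g_t * h_{t-1} + (1 - g_t) * x_t. There is no
    nonlinearity inside the recurrence itself, which keeps it cheap.
    """
    x = np.asarray(x, dtype=float)
    g = 1.0 / (1.0 + np.exp(-np.asarray(gate_logits, dtype=float)))  # sigmoid gates
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = g[t] * h + (1.0 - g[t]) * x[t]
        out[t] = h
    return out
```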
arXiv Detail & Related papers (2023-11-08T16:50:05Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- HiPool: Modeling Long Documents Using Graph Neural Networks [24.91040673099863]
Long sequences in Natural Language Processing (NLP) are a challenging problem.
Recent pretrained language models achieve satisfactory performance on many NLP tasks.
We propose a new challenging benchmark, totaling six datasets with up to 53k samples and an average length of 4,034 tokens.
arXiv Detail & Related papers (2023-05-05T06:58:24Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- Granger Causality using Neural Networks [7.62566998854384]
We propose novel classes of models that can handle underlying non-linearity in a computationally efficient manner.
We show one can directly decouple lags and individual time series importance via decoupled penalties.
This is important as we want to select the lag order during the process of GC estimation.
arXiv Detail & Related papers (2022-08-07T12:02:48Z)
- EIGNN: Efficient Infinite-Depth Graph Neural Networks [51.97361378423152]
Graph neural networks (GNNs) are widely used for modelling graph-structured data in numerous applications.
Motivated by the limited ability of finite-depth GNNs to capture long-range dependencies, we propose a GNN model with infinite depth, which we call Efficient Infinite-Depth Graph Neural Networks (EIGNN).
We show that EIGNN has a better ability to capture long-range dependencies than recent baselines, and consistently achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T08:16:58Z)
- Multi-Scale Label Relation Learning for Multi-Label Classification Using 1-Dimensional Convolutional Neural Networks [0.5801044612920815]
We present Multi-Scale Label Dependence Relation Networks (MSDN), a novel approach to multi-label classification (MLC).
MSDN uses 1-dimensional convolution kernels to learn label dependencies at multiple scales.
We demonstrate that our model achieves better accuracy with a much smaller number of model parameters than RNN-based MLC models.
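As a rough sketch of the multi-scale idea (generic 1-D convolutions with placeholder averaging kernels; not MSDN's trained parameters), different kernel widths mix different numbers of neighbouring label features:

```python
import numpy as np

def multi_scale_label_features(label_embeddings, kernel_sizes=(2, 3, 5)):
    """Apply 1-D convolutions of several widths along the label dimension.

    label_embeddings: (L,) one score/embedding value per label (illustrative)
    Each kernel width mixes a different number of neighbouring labels,
    giving dependency features at multiple scales; results are concatenated.
    """
    outs = []
    for k in kernel_sizes:
        w = np.ones(k) / k                       # placeholder (untrained) kernel
        outs.append(np.convolve(label_embeddings, w, mode="same"))
    return np.concatenate(outs)                  # (L * len(kernel_sizes),)
```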
arXiv Detail & Related papers (2021-07-13T09:26:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.