Constrained Decoding for Computationally Efficient Named Entity
Recognition Taggers
- URL: http://arxiv.org/abs/2010.04362v1
- Date: Fri, 9 Oct 2020 04:07:52 GMT
- Title: Constrained Decoding for Computationally Efficient Named Entity
Recognition Taggers
- Authors: Brian Lester, Daniel Pressel, Amy Hemmeter, Sagnik Ray Choudhury,
Srinivas Bangalore
- Abstract summary: Current work eschews prior knowledge of how the span encoding scheme works, instead relying on the conditional random field (CRF) to learn which transitions are illegal and which are not in order to facilitate global coherence.
We find that, by constraining the output to suppress illegal transitions, we can train a tagger with a cross-entropy loss twice as fast as a CRF, with differences in F1 that are statistically insignificant.
- Score: 15.279850826041066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current state-of-the-art models for named entity recognition (NER) are neural
models with a conditional random field (CRF) as the final layer. Entities are
represented as per-token labels with a special structure in order to decode
them into spans. Current work eschews prior knowledge of how the span encoding
scheme works, instead relying on the CRF to learn which transitions are illegal
and which are not in order to facilitate global coherence. We find that, by
constraining the output to suppress illegal transitions, we can train a tagger
with a cross-entropy loss twice as fast as a CRF, with differences in F1 that
are statistically insignificant, effectively eliminating the need for a CRF. We
analyze the dynamics of tag co-occurrence to explain when these constraints are
most effective and provide open source implementations of our tagger in both
PyTorch and TensorFlow.
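To make the idea concrete, here is a minimal sketch of constrained greedy decoding under a BIO scheme, where a mask built from the encoding rules suppresses illegal transitions at each step. The tag set, two-entity-type setup, and mask construction are illustrative assumptions; the paper's released PyTorch and TensorFlow taggers are the authoritative implementations.

```python
import torch

# Illustrative BIO tag set over two entity types; not the paper's exact setup.
TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]

def legal(prev: str, curr: str) -> bool:
    """BIO legality: I-X may only follow B-X or I-X of the same type X."""
    if curr.startswith("I-"):
        ent = curr[2:]
        return prev in (f"B-{ent}", f"I-{ent}")
    return True

# mask[i, j] = 0 if tag i -> tag j is legal, -inf otherwise. Adding the row
# for the previous tag to the logits makes illegal tags unselectable.
mask = torch.zeros(len(TAGS), len(TAGS))
for i, p in enumerate(TAGS):
    for j, c in enumerate(TAGS):
        if not legal(p, c):
            mask[i, j] = float("-inf")

def constrained_greedy_decode(logits: torch.Tensor) -> list:
    """Decode per-token logits [seq_len, n_tags] from a tagger trained with
    plain cross-entropy, suppressing illegal transitions step by step."""
    prev = TAGS.index("O")  # "O" is a safe virtual start tag in BIO
    out = []
    for step_logits in logits:
        prev = int(torch.argmax(step_logits + mask[prev]))
        out.append(TAGS[prev])
    return out
```

For example, `constrained_greedy_decode(torch.randn(4, len(TAGS)))` returns a tag sequence in which an I- tag never follows O or a B-/I- tag of a mismatched type, so the decoded spans are always well-formed without a CRF.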
Related papers
- Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce the energy consumption of arithmetic operations by at least 224x on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z)
- A One-Layer Decoder-Only Transformer is a Two-Layer RNN: With an Application to Certified Robustness [17.0639534812572]
ARC-Tran is a novel approach for verifying the robustness of decoder-only Transformers against arbitrary perturbation spaces.
Our evaluation shows that ARC-Tran trains models more robust to arbitrary perturbation spaces than those produced by existing techniques.
arXiv Detail & Related papers (2024-05-27T17:10:04Z)
- TFE-GNN: A Temporal Fusion Encoder Using Graph Neural Networks for Fine-grained Encrypted Traffic Classification [35.211600580761726]
We propose a byte-level traffic graph construction approach based on point-wise mutual information (PMI) and a model named Temporal Fusion Encoder using Graph Neural Networks (TFE-GNN).
In particular, we design a dual embedding layer, a GNN-based traffic graph encoder, and a cross-gated feature fusion mechanism.
The experimental results on two real datasets demonstrate that TFE-GNN outperforms multiple state-of-the-art methods in fine-grained encrypted traffic classification tasks.
arXiv Detail & Related papers (2023-07-31T14:32:40Z)
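As a rough illustration of the byte-level PMI construction mentioned in the TFE-GNN summary above, the sketch below links byte values whose windowed co-occurrence has positive PMI. The window size, threshold, and estimation details are assumptions rather than the paper's exact recipe.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_byte_graph(payload: bytes, window: int = 5, threshold: float = 0.0):
    """Connect byte values x, y when PMI(x, y) = log(p(x, y) / (p(x) p(y)))
    exceeds the threshold, with probabilities estimated over sliding windows."""
    uni, pair, n = Counter(), Counter(), 0
    for i in range(len(payload) - window + 1):
        w = sorted(set(payload[i:i + window]))
        n += 1
        uni.update(w)
        pair.update(combinations(w, 2))  # unordered byte pairs in the window
    edges = []
    for (x, y), c in pair.items():
        pmi = math.log(c * n / (uni[x] * uni[y]))  # log(p(x,y) / (p(x) p(y)))
        if pmi > threshold:
            edges.append((x, y, pmi))
    return edges
```

Calling `pmi_byte_graph(b"GET / HTTP/1.1")` yields weighted edges over the distinct bytes of the payload, which a GNN encoder can then consume.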
- Sequence Transduction with Graph-based Supervision [96.04967815520193]
We present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels.
We demonstrate that transducer-based ASR with a CTC-like lattice achieves better results than standard RNN-T.
arXiv Detail & Related papers (2021-11-01T21:51:42Z)
- Learning with Noisy Labels via Sparse Regularization [76.31104997491695]
Learning with noisy labels is an important task for training accurate deep neural networks.
Some commonly-used loss functions, such as Cross Entropy (CE), suffer from severe overfitting to noisy labels.
We introduce a sparse regularization strategy to approximate the one-hot constraint.
arXiv Detail & Related papers (2021-07-31T09:40:23Z)
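One common instantiation of such a sparsity penalty, shown below, adds an l_p norm (p < 1) of the predicted distribution to the cross-entropy loss; on the probability simplex this term is minimized exactly at one-hot vectors. The exact regularizer and hyperparameters in the paper may differ, so treat this as a hedged sketch.

```python
import torch
import torch.nn.functional as F

def ce_with_sparse_reg(logits, targets, p: float = 0.7, lam: float = 1.0):
    """Cross-entropy plus a sparsity-promoting penalty on the predictions.
    For p < 1, sum_i q_i^p over a probability vector q is smallest at one-hot
    vectors (1.0) and largest at the uniform distribution, so the penalty
    approximates the one-hot constraint. p and lam here are assumptions."""
    probs = F.softmax(logits, dim=-1)
    ce = F.cross_entropy(logits, targets)
    # ||q||_p^p summed over classes, averaged over the batch
    reg = probs.pow(p).sum(dim=-1).mean()
    return ce + lam * reg
```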
- Constraining Linear-chain CRFs to Regular Languages [10.759863489447204]
A major challenge in structured prediction is to represent the interdependencies within output structures.
We present a generalization of CRFs that can enforce a broad class of constraints, including nonlocal ones.
We prove that constrained training is never worse than constrained decoding, and show empirically that it can be substantially better in practice.
arXiv Detail & Related papers (2021-06-14T11:23:59Z)
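As an illustration of enforcing a regular-language constraint at decode time (the decoding half of the comparison above), the sketch below runs Viterbi over the product of an emission-only tag lattice and a DFA. The DFA encoding and the emission-only setup are assumptions; the paper works with full CRFs and also constrains training.

```python
import torch

NEG_INF = float("-inf")

def dfa_constrained_viterbi(emissions, dfa, start, accepting, n_states):
    """Viterbi over the product of the tag lattice and a DFA, so the returned
    tag sequence is guaranteed to lie in the DFA's regular language.
    emissions: [seq_len, n_tags]; dfa maps (state, tag) -> next state, with
    missing entries meaning the tag is illegal in that state."""
    seq_len, n_tags = emissions.shape
    score = torch.full((n_states,), NEG_INF)
    score[start] = 0.0
    backptr = []
    for i in range(seq_len):
        new = torch.full((n_states,), NEG_INF)
        bp = [None] * n_states
        for s in range(n_states):
            if score[s] == NEG_INF:
                continue
            for t in range(n_tags):
                nxt = dfa.get((s, t))
                if nxt is None:
                    continue  # tag t is illegal in DFA state s
                cand = score[s] + emissions[i, t]
                if cand > new[nxt]:
                    new[nxt] = cand
                    bp[nxt] = (s, t)
        score = new
        backptr.append(bp)
    # Pick the best accepting state, then walk the back-pointers to recover
    # the tags; assumes at least one accepting path exists.
    end = max(accepting, key=lambda s: score[s].item())
    tags, s = [], end
    for bp in reversed(backptr):
        s, t = bp[s]
        tags.append(t)
    return tags[::-1]
```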
- WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition [59.975078145303605]
We propose a novel framework, namely WNARS, using hybrid CTC-attention AED models and weighted finite-state transducers.
On the AISHELL-1 task, our WNARS achieves a character error rate of 5.22% with 640 ms latency, which is, to the best of our knowledge, state-of-the-art performance for online ASR.
arXiv Detail & Related papers (2021-04-08T07:56:03Z)
- On the Equivalence of Decoupled Graph Convolution Network and Label Propagation [60.34028546202372]
Some work shows that the coupled design is inferior to the decoupled one, which better supports deep graph propagation.
Despite effectiveness, the working mechanisms of the decoupled GCN are not well understood.
We propose a new label propagation method named Propagation then Training Adaptively (PTA), which overcomes the flaws of the decoupled GCN.
arXiv Detail & Related papers (2020-10-23T13:57:39Z)
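A minimal sketch of the decoupled design discussed above: parameter-free propagation of node features, followed by a plain classifier. PTA's adaptive pseudo-label weighting is omitted, and the propagation depth and classifier choice are assumptions.

```python
import torch
import torch.nn as nn

def propagate(features, adj_norm, k: int = 10):
    """Decoupled propagation: smooth node features with a symmetrically
    normalized adjacency matrix K times, with no learned parameters."""
    h = features
    for _ in range(k):
        h = adj_norm @ h
    return h

class DecoupledGCN(nn.Module):
    """Prediction reduces to a plain classifier over propagated features,
    in contrast to a coupled GCN that interleaves propagation and weights."""
    def __init__(self, in_dim, n_classes):
        super().__init__()
        self.clf = nn.Linear(in_dim, n_classes)

    def forward(self, features, adj_norm, k: int = 10):
        return self.clf(propagate(features, adj_norm, k))
```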
- Fast and Accurate Neural CRF Constituency Parsing [16.90190521285297]
This work presents a fast and accurate neural CRF constituency parser.
We batchify the inside algorithm for loss computation with large tensor operations on GPU, and avoid the outside algorithm for gradient computation via efficient back-propagation.
Experiments on PTB, CTB5.1, and CTB7 show that our two-stage CRF achieves new state-of-the-art performance both with and without BERT.
arXiv Detail & Related papers (2020-08-09T14:38:48Z)
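The trick of avoiding an explicit outside pass rests on the identity that gradients of the log-partition function are the marginals, so one inside pass plus back-propagation suffices. A linear-chain sketch of that identity follows; the paper batchifies the tree-structured inside algorithm, which is more involved.

```python
import torch

def log_partition(emissions, transitions):
    """Inside (forward) algorithm for a linear-chain CRF in log space.
    emissions: [seq_len, n_tags]; transitions: [n_tags, n_tags]."""
    alpha = emissions[0]
    for i in range(1, emissions.shape[0]):
        # logsumexp over the previous tag, done as one tensor operation
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0) + emissions[i]
    return torch.logsumexp(alpha, dim=0)

emissions = torch.randn(7, 5, requires_grad=True)
transitions = torch.randn(5, 5)
logZ = log_partition(emissions, transitions)
# d logZ / d emissions gives the per-position tag marginals directly,
# with no explicit outside algorithm.
marginals = torch.autograd.grad(logZ, emissions)[0]
```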
- Attentive WaveBlock: Complementarity-enhanced Mutual Networks for Unsupervised Domain Adaptation in Person Re-identification and Beyond [97.25179345878443]
This paper proposes a novel light-weight module, the Attentive WaveBlock (AWB).
AWB can be integrated into the dual networks of mutual learning to enhance their complementarity and further suppress noise in the pseudo-labels.
Experiments demonstrate that the proposed method achieves state-of-the-art performance with significant improvements on multiple UDA person re-identification tasks.
arXiv Detail & Related papers (2020-06-11T15:40:40Z)
- Computationally Efficient NER Taggers with Combined Embeddings and Constrained Decoding [10.643105866460978]
Current state-of-the-art models in Named Entity Recognition (NER) are neural models with a Conditional Random Field (CRF) as the final network layer, and pre-trained "contextual embeddings".
In this work, we explore two simple techniques that substantially improve NER performance over a strong baseline with negligible cost.
While training a tagger on CoNLL 2003, we find a 786% speed-up over a contextual embeddings-based tagger without sacrificing strong performance.
arXiv Detail & Related papers (2020-01-05T04:50:38Z)
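The "combined embeddings" technique above can be as cheap as concatenating several pre-trained, non-contextual embedding tables. A hedged sketch follows, where the specific tables and dimensions are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CombinedEmbeddings(nn.Module):
    """Concatenate two pre-trained, non-contextual embedding tables (e.g.,
    GloVe and word2vec) as a cheap alternative to contextual embeddings.
    Table choices and dimensions here are illustrative assumptions."""
    def __init__(self, vocab_size: int, dim_a: int = 300, dim_b: int = 300):
        super().__init__()
        self.emb_a = nn.Embedding(vocab_size, dim_a)  # e.g., init from GloVe
        self.emb_b = nn.Embedding(vocab_size, dim_b)  # e.g., init from word2vec
        self.output_dim = dim_a + dim_b

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return torch.cat([self.emb_a(token_ids), self.emb_b(token_ids)], dim=-1)
```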
This list is automatically generated from the titles and abstracts of the papers on this site.