Related papers: RNN Transducers for Nested Named Entity Recognition with constraints on alignment for long sequences

RNN Transducers for Nested Named Entity Recognition with constraints on alignment for long sequences

URL: http://arxiv.org/abs/2203.03543v1
Date: Tue, 8 Feb 2022 05:38:20 GMT
Title: RNN Transducers for Nested Named Entity Recognition with constraints on alignment for long sequences
Authors: Hagen Soltau, Izhak Shafran, Mingqiu Wang and Laurent El Shafey
Abstract summary: We introduce a new model for NER tasks -- an transducer (RNN-T) RNN-T models learn the alignment using a loss function that sums over all alignments. In NER tasks, however, the alignment between words and target labels are available from annotations. We demonstrate that our fixed alignment model outperforms the standard RNN-score model.
Score: 4.545971444299925
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Popular solutions to Named Entity Recognition (NER) include conditional random fields, sequence-to-sequence models, or utilizing the question-answering framework. However, they are not suitable for nested and overlapping spans with large ontologies and for predicting the position of the entities. To fill this gap, we introduce a new model for NER task -- an RNN transducer (RNN-T). These models are trained using paired input and output sequences without explicitly specifying the alignment between them, similar to other seq-to-seq models. RNN-T models learn the alignment using a loss function that sums over all alignments. In NER tasks, however, the alignment between words and target labels are available from the human annotations. We propose a fixed alignment RNN-T model that utilizes the given alignment, while preserving the benefits of RNN-Ts such as modeling output dependencies. As a more general case, we also propose a constrained alignment model where users can specify a relaxation of the given input alignment and the model will learn an alignment within the given constraints. In other words, we propose a family of seq-to-seq models which can leverage alignments between input and target sequences when available. Through empirical experiments on a challenging real-world medical NER task with multiple nested ontologies, we demonstrate that our fixed alignment model outperforms the standard RNN-T model, improving F1-score from 0.70 to 0.74.

Related papers

CoRMF: Criticality-Ordered Recurrent Mean Field Ising Solver [4.364088891019632]
We propose an RNN-based efficient Ising model solver, the Criticality-ordered Recurrent Mean Field (CoRMF) By leveraging the approximated tree structure of the underlying Ising graph, the newly-obtained criticality order enables the unification between variational mean-field and RNN. CoRFM solves the Ising problems in a self-train fashion without data/evidence, and the inference tasks can be executed by directly sampling from RNN.
arXiv Detail & Related papers (2024-03-05T16:55:06Z)
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences. We formulate sequence generation as an imitation learning (IL) problem. This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset. Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
Enhancing Few-shot NER with Prompt Ordering based Data Augmentation [59.69108119752584]
We propose a Prompt Ordering based Data Augmentation (PODA) method to improve the training of unified autoregressive generation frameworks. Experimental results on three public NER datasets and further analyses demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-05-19T16:25:43Z)
CGXplain: Rule-Based Deep Neural Network Explanations Using Dual Linear Programs [4.632241550169363]
Rule-based surrogate models are an effective way to approximate a Deep Neural Network's (DNN) decision boundaries. This paper introduces the CGX (Column Generation eXplainer) to address these limitations.
arXiv Detail & Related papers (2023-04-11T13:16:26Z)
Adaptive Discounting of Implicit Language Models in RNN-Transducers [33.63456351411599]
We show how a lightweight adaptive LM discounting technique can be used with any RNN-T architecture. We obtain up to 4% and 14% relative reductions in overall WER and rare word PER, respectively, on a conversational, code-mixed Hindi-English ASR task.
arXiv Detail & Related papers (2022-02-21T08:44:56Z)
Sequence Transduction with Graph-based Supervision [96.04967815520193]
We present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels. We demonstrate that transducer-based ASR with CTC-like lattice achieves better results compared to standard RNN-T.
arXiv Detail & Related papers (2021-11-01T21:51:42Z)
Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations. The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
Explaining and Improving Model Behavior with k Nearest Neighbor Representations [107.24850861390196]
We propose using k nearest neighbor representations to identify training examples responsible for a model's predictions. We show that kNN representations are effective at uncovering learned spurious associations. Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs.
arXiv Detail & Related papers (2020-10-18T16:55:25Z)
Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem. We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
arXiv Detail & Related papers (2020-04-30T09:10:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.