Computationally Efficient NER Taggers with Combined Embeddings and
Constrained Decoding
- URL: http://arxiv.org/abs/2001.01167v2
- Date: Wed, 24 Mar 2021 16:21:07 GMT
- Title: Computationally Efficient NER Taggers with Combined Embeddings and
Constrained Decoding
- Authors: Brian Lester, Daniel Pressel, Amy Hemmeter, and Sagnik Ray Choudhury
- Abstract summary: Current State-of-the-Art models in Named Entity Recognition (NER) are neural models with a Conditional Random Field (CRF) as the final network layer, and pre-trained "contextual embeddings".
In this work, we explore two simple techniques that substantially improve NER performance over a strong baseline with negligible cost.
While training a tagger on CoNLL 2003, we find a 786% speed-up over a contextual embeddings-based tagger without sacrificing strong performance.
- Score: 10.643105866460978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current State-of-the-Art models in Named Entity Recognition (NER) are neural
models with a Conditional Random Field (CRF) as the final network layer, and
pre-trained "contextual embeddings". The CRF layer is used to facilitate global
coherence between labels, and the contextual embeddings provide a better
representation of words in context. However, both of these improvements come at
a high computational cost. In this work, we explore two simple techniques that
substantially improve NER performance over a strong baseline with negligible
cost. First, we use multiple pre-trained embeddings as word representations via
concatenation. Second, we constrain the tagger, trained using a cross-entropy
loss, during decoding to eliminate illegal transitions. While training a tagger
on CoNLL 2003, we find a 786% speed-up over a contextual embeddings-based
tagger without sacrificing strong performance. We also show that the
concatenation technique works across multiple tasks and datasets. We analyze
aspects of similarity and coverage between pre-trained embeddings and the
dynamics of tag co-occurrence to explain why these techniques work. We provide
an open source implementation of our tagger using these techniques in three
popular deep learning frameworks: TensorFlow, PyTorch, and DyNet.
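As a concrete illustration of the first technique, the following is a minimal PyTorch sketch of concatenating several pre-trained embedding tables into one word representation. The vocabulary size, dimensions, and the random tensors standing in for real pre-trained vectors (e.g., GloVe- or word2vec-style tables) are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

class ConcatEmbeddings(nn.Module):
    """Sketch: represent each word by the concatenation of several
    pre-trained embedding tables (assumed aligned to one shared vocab)."""

    def __init__(self, pretrained_tables):
        super().__init__()
        # Each table is a FloatTensor of shape [vocab_size, dim].
        self.tables = nn.ModuleList(
            nn.Embedding.from_pretrained(t, freeze=True) for t in pretrained_tables
        )

    def forward(self, token_ids):
        # token_ids: LongTensor [batch, seq_len]
        # Output: [batch, seq_len, sum of embedding dims]
        return torch.cat([emb(token_ids) for emb in self.tables], dim=-1)

# Toy usage with two random stand-ins for real pre-trained vectors.
glove_like = torch.randn(10_000, 100)
w2v_like = torch.randn(10_000, 300)
embedder = ConcatEmbeddings([glove_like, w2v_like])
ids = torch.randint(0, 10_000, (2, 8))
print(embedder(ids).shape)  # torch.Size([2, 8, 400])
```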
Related papers
- Let Me DeCode You: Decoder Conditioning with Tabular Data [0.15487122608774898]
We introduce a novel approach, DeCode, that utilizes label-derived features for model conditioning to support the decoder in the reconstruction process dynamically.
DeCode focuses on improving 3D segmentation performance by incorporating a conditioning embedding with a learned numerical representation of 3D-label shape features.
Our results show that DeCode significantly outperforms traditional, unconditioned models in terms of generalization to unseen data, achieving higher accuracy at a reduced computational cost.
arXiv Detail & Related papers (2024-07-12T17:14:33Z) - Bit Cipher -- A Simple yet Powerful Word Representation System that
Integrates Efficiently with Language Models [4.807347156077897]
Bit-cipher is a word representation system that eliminates the need for backpropagation and relies on hyper-efficient dimensionality reduction techniques.
We perform probing experiments on part-of-speech (POS) tagging and named entity recognition (NER) to assess bit-cipher's competitiveness with classic embeddings.
By replacing embedding layers with cipher embeddings, our experiments illustrate the notable efficiency of cipher in accelerating the training process and attaining better optima.
arXiv Detail & Related papers (2023-11-18T08:47:35Z) - Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
Our results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z) - Optimizing Bi-Encoder for Named Entity Recognition via Contrastive
Learning [80.36076044023581]
We present an efficient bi-encoder framework for named entity recognition (NER).
We frame NER as a metric learning problem that maximizes the similarity between the vector representations of an entity mention and its type.
A major challenge to this bi-encoder formulation for NER lies in separating non-entity spans from entity mentions.
arXiv Detail & Related papers (2022-08-30T23:19:04Z)
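A minimal sketch of the bi-encoder scoring summarized in the entry above: mention and type vectors are compared by cosine similarity, and a contrastive loss pulls a mention toward its gold type. The random vectors standing in for learned encoders, the temperature, and the gold index are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

dim, num_types = 64, 5
# Stand-ins for a learned mention encoder and type-name encoder.
mention_vec = F.normalize(torch.randn(1, dim), dim=-1)        # encoded span
type_vecs = F.normalize(torch.randn(num_types, dim), dim=-1)  # encoded types

# Cosine similarity between the mention and every candidate type.
sims = mention_vec @ type_vecs.T          # [1, num_types]
predicted_type = sims.argmax(dim=-1)

# Contrastive (InfoNCE-style) loss against a gold type index: pushes the
# mention toward its type and away from the others during training.
gold = torch.tensor([2])
loss = F.cross_entropy(sims / 0.07, gold)  # 0.07: illustrative temperature
print(predicted_type.item(), loss.item())
```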
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach known as adversarial training (AT) has been developed to improve model robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z) - Bandit Sampling for Multiplex Networks [8.771092194928674]
We propose an algorithm for scalable learning on multiplex networks with a large number of layers.
An online learning algorithm learns how to sample relevant neighboring layers so that only the layers with relevant information are aggregated during training.
We present experimental results on both synthetic and real-world scenarios.
arXiv Detail & Related papers (2022-02-08T03:26:34Z) - Defensive Tensorization [113.96183766922393]
We propose defensive tensorization, an adversarial defence technique that leverages a latent high-order factorization of the network.
We empirically demonstrate the effectiveness of our approach on standard image classification benchmarks.
We validate the versatility of our approach across domains and low-precision architectures by considering an audio task and binary networks.
arXiv Detail & Related papers (2021-10-26T17:00:16Z) - ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training [65.68511423300812]
We propose ProgFed, a progressive training framework for efficient and effective federated learning.
ProgFed inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models.
Our results show that ProgFed converges at the same rate as standard training on full models.
arXiv Detail & Related papers (2021-10-11T14:45:00Z) - Constrained Decoding for Computationally Efficient Named Entity
Recognition Taggers [15.279850826041066]
Current work eschews prior knowledge of how the span encoding scheme works and instead relies on the conditional random field (CRF) to learn which transitions are illegal and which are not, in order to facilitate global coherence.
We find that, by constraining the output to suppress illegal transitions, we can train a tagger with a cross-entropy loss twice as fast as a CRF, with differences in F1 that are statistically insignificant.
arXiv Detail & Related papers (2020-10-09T04:07:52Z)
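A minimal sketch of the constrained decoding idea in the entry above, assuming a BIO tag scheme: per-token logits from a cross-entropy-trained tagger are decoded greedily while masking transitions that BIO forbids. The tag inventory and scores are toy values, not the paper's exact setup.

```python
import numpy as np

TAGS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]

def legal(prev_tag, tag):
    # In BIO, "I-X" may only follow "B-X" or "I-X" of the same entity type.
    if tag.startswith("I-"):
        ent = tag[2:]
        return prev_tag in (f"B-{ent}", f"I-{ent}")
    return True

def constrained_greedy_decode(scores):
    # scores: [seq_len, num_tags] array of per-token label logits.
    prev, out = "O", []
    for step in scores:
        # Mask illegal transitions with -inf, then take the argmax.
        masked = [s if legal(prev, t) else -np.inf for s, t in zip(step, TAGS)]
        prev = TAGS[int(np.argmax(masked))]
        out.append(prev)
    return out

# Toy example: without the constraint, position 1 would decode to I-ORG
# even though it follows B-PER; the mask forces a legal tag instead.
logits = np.array([
    [0.1, 2.0, 0.0, 0.0, 0.0],   # -> B-PER
    [0.5, 0.0, 1.0, 0.0, 1.5],   # I-ORG illegal after B-PER -> I-PER
])
print(constrained_greedy_decode(logits))  # ['B-PER', 'I-PER']
```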
- Multiple Word Embeddings for Increased Diversity of Representation [15.279850826041066]
We show a technique that substantially and consistently improves performance over a strong baseline with a negligible increase in run time.
We analyze aspects of pre-trained embedding similarity and vocabulary coverage and find that the representational diversity is the driving force of why this technique works.
arXiv Detail & Related papers (2020-09-30T02:33:09Z) - Text Classification with Few Examples using Controlled Generalization [58.971750512415134]
Current practice relies on pre-trained word embeddings to map words unseen in training to similar seen ones.
Our alternative begins with sparse pre-trained representations derived from unlabeled parsed corpora.
We show that a feed-forward network over these vectors is especially effective in low-data scenarios.
arXiv Detail & Related papers (2020-05-18T06:04:58Z)
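A minimal sketch of the setup in the entry above: a small feed-forward classifier over fixed, pre-computed sparse feature vectors. The feature dimension, architecture, and toy inputs are assumptions for illustration; the paper derives its sparse representations from unlabeled parsed corpora.

```python
import torch
import torch.nn as nn

feat_dim, num_classes = 5_000, 4
model = nn.Sequential(
    nn.Linear(feat_dim, 128),  # projects sparse features to a hidden layer
    nn.ReLU(),
    nn.Linear(128, num_classes),
)

# A toy batch of sparse document vectors (mostly zeros), densified for the MLP.
x = torch.zeros(8, feat_dim)
x[torch.arange(8), torch.randint(0, feat_dim, (8,))] = 1.0
logits = model(x)
print(logits.shape)  # torch.Size([8, 4])
```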
This list is automatically generated from the titles and abstracts of the papers on this site.