ProtoNER: Few shot Incremental Learning for Named Entity Recognition
using Prototypical Networks
- URL: http://arxiv.org/abs/2310.02372v1
- Date: Tue, 3 Oct 2023 18:52:19 GMT
- Title: ProtoNER: Few shot Incremental Learning for Named Entity Recognition
using Prototypical Networks
- Authors: Ritesh Kumar, Saurabh Goyal, Ashish Verma, Vatche Isahagian
- Abstract summary: A Prototypical-Network-based end-to-end KVP extraction model is presented.
No dependency on the dataset used for initial training of the model.
No intermediate synthetic data generation, which tends to add noise and degrade the model's performance.
- Score: 7.317342506617286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Key value pair (KVP) extraction, or Named Entity Recognition (NER), from
visually rich documents has been an active area of research in the document
understanding and data extraction domain. Several transformer-based models, such
as LayoutLMv2, LayoutLMv3, and LiLT, have emerged, achieving state-of-the-art
results. However, adding even a single new class to an existing model
requires (a) re-annotating the entire training dataset to include the new class
and (b) retraining the model. Both of these issues slow down the deployment of
an updated model.
We present ProtoNER: a Prototypical-Network-based end-to-end KVP extraction
model that allows new classes to be added to an existing model while requiring
a minimal number of newly annotated training samples. The key contributions of
our model are: (1) no dependency on the dataset used for the initial training
of the model, which alleviates both the need to retain the original training
dataset for a long duration and the very time-consuming task of data
re-annotation, (2) no intermediate synthetic data generation, which tends to
add noise and degrade the model's performance, and (3) a hybrid loss function
that allows the model to retain knowledge about older classes while learning
the newly added classes.
Experimental results show that ProtoNER fine-tuned with just 30 samples
achieves results for the newly added classes similar to those of a regular
model fine-tuned with 2,600 samples.
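As a rough illustration of the prototypical-network idea underlying ProtoNER (a minimal sketch, not the paper's actual implementation; the embeddings, class names, and function names below are hypothetical), each class is represented by a prototype, the mean of its support embeddings, and a query embedding is assigned to the class whose prototype is nearest. Adding a new class then only requires computing one more prototype from a handful of annotated samples:

```python
import numpy as np

def build_prototypes(embeddings, labels):
    """Compute one prototype per class as the mean of its support embeddings."""
    labels = np.array(labels)
    return {c: embeddings[labels == c].mean(axis=0) for c in set(labels)}

def classify(query, prototypes):
    """Assign the query embedding to the class with the nearest prototype."""
    return min(prototypes, key=lambda c: np.linalg.norm(query - prototypes[c]))

# Toy example: 2-D embeddings for two hypothetical KVP classes.
support = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
labels = ["DATE", "DATE", "AMOUNT", "AMOUNT"]
protos = build_prototypes(support, labels)
print(classify(np.array([0.1, 0.05]), protos))  # prints "DATE"
```

In practice the embeddings would come from a document encoder such as LayoutLMv3, and distances are typically converted to class probabilities with a softmax; the nearest-prototype rule above is the essential mechanism.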
Related papers
- A Two-Phase Recall-and-Select Framework for Fast Model Selection [13.385915962994806]
We propose a two-phase (coarse-recall and fine-selection) model selection framework.
It aims to enhance the efficiency of selecting a robust model by leveraging the models' training performances on benchmark datasets.
It has been demonstrated that the proposed methodology selects a high-performing model about three times faster than conventional baseline methods.
arXiv Detail & Related papers (2024-03-28T14:44:44Z)
- Adapt & Align: Continual Learning with Generative Models Latent Space Alignment [15.729732755625474]
We introduce Adapt & Align, a method for continual learning of neural networks by aligning latent representations in generative models.
Neural networks suffer from an abrupt loss in performance when retrained with additional data.
We propose a new method that mitigates those problems by employing generative models and splitting the process of their update into two parts.
arXiv Detail & Related papers (2023-12-21T10:02:17Z)
- Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
- CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
- Bridging Non Co-occurrence with Unlabeled In-the-wild Data for Incremental Object Detection [56.22467011292147]
Several incremental learning methods are proposed to mitigate catastrophic forgetting for object detection.
Despite their effectiveness, these methods require co-occurrence of the unlabeled base classes in the training data of the novel classes.
We propose the use of unlabeled in-the-wild data to bridge the non-co-occurrence caused by the missing base classes during the training of additional novel classes.
arXiv Detail & Related papers (2021-10-28T10:57:25Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Predictive process mining by network of classifiers and clusterers: the PEDF model [0.0]
The PEDF model learns based on events' sequences, durations, and extra features.
The model requires extracting two sets of data from log files.
arXiv Detail & Related papers (2020-11-22T23:27:19Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
- Two-Level Residual Distillation based Triple Network for Incremental Object Detection [21.725878050355824]
We propose a novel incremental object detector based on Faster R-CNN to continuously learn from new object classes without using old data.
It is a triple network in which an old model and a residual model act as assistants, helping the incremental model learn new classes without forgetting previously learned knowledge.
arXiv Detail & Related papers (2020-07-27T11:04:57Z)
- An Efficient Method of Training Small Models for Regression Problems with Knowledge Distillation [1.433758865948252]
We propose a new formalism of knowledge distillation for regression problems.
First, we propose a new loss function, the teacher outlier rejection loss, which rejects outliers in the training samples using teacher model predictions.
By considering a multi-task network, training of the student models' feature extraction becomes more effective.
arXiv Detail & Related papers (2020-02-28T08:46:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.