Few-Shot Named Entity Recognition: A Comprehensive Study
- URL: http://arxiv.org/abs/2012.14978v1
- Date: Tue, 29 Dec 2020 23:43:16 GMT
- Title: Few-Shot Named Entity Recognition: A Comprehensive Study
- Authors: Jiaxin Huang, Chunyuan Li, Krishan Subudhi, Damien Jose, Shobana
Balakrishnan, Weizhu Chen, Baolin Peng, Jianfeng Gao, Jiawei Han
- Abstract summary: We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
- Score: 92.40991050806544
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a comprehensive study to efficiently build named entity
recognition (NER) systems when a small number of in-domain labeled data is
available. Based upon recent Transformer-based self-supervised pre-trained
language models (PLMs), we investigate three orthogonal schemes to improve the
model generalization ability for few-shot settings: (1) meta-learning to
construct prototypes for different entity types, (2) supervised pre-training on
noisy web data to extract entity-related generic representations and (3)
self-training to leverage unlabeled in-domain data. Different combinations of
these schemes are also considered. We perform extensive empirical comparisons
on 10 public NER datasets with various proportions of labeled data, suggesting
useful insights for future research. Our experiments show that (i) in the
few-shot learning setting, the proposed NER schemes significantly improve or
outperform the commonly used baseline, a PLM-based linear classifier fine-tuned
on domain labels; (ii) We create new state-of-the-art results on both few-shot
and training-free settings compared with existing methods. We will release our
code and pre-trained models for reproducible research.
Related papers
- VANER: Leveraging Large Language Model for Versatile and Adaptive Biomedical Named Entity Recognition [3.4923338594757674]
Large language models (LLMs) can be used to train a model capable of extracting various types of entities.
In this paper, we utilize the open-sourced LLM LLaMA2 as the backbone model, and design specific instructions to distinguish between different types of entities and datasets.
Our model VANER, trained with a small partition of parameters, significantly outperforms previous LLMs-based models and, for the first time, as a model based on LLM, surpasses the majority of conventional state-of-the-art BioNER systems.
arXiv Detail & Related papers (2024-04-27T09:00:39Z) - Hybrid Multi-stage Decoding for Few-shot NER with Entity-aware Contrastive Learning [32.62763647036567]
Few-shot named entity recognition can identify new types of named entities based on a few labeled examples.
We propose the Hybrid Multi-stage Decoding for Few-shot NER with Entity-aware Contrastive Learning (MsFNER)
MsFNER splits the general NER into two stages: entity-span detection and entity classification.
arXiv Detail & Related papers (2024-04-10T12:31:09Z) - Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning [47.02160072880698]
We introduce a self-evolving mechanism that allows the model itself to actively sample subsets that are equally or even more effective.
The key to our data sampling technique lies in the enhancement of diversity in the chosen subsets.
Extensive experiments across three datasets and benchmarks demonstrate the effectiveness of DiverseEvol.
arXiv Detail & Related papers (2023-11-14T14:10:40Z) - Pre-trained Language Models for Keyphrase Generation: A Thorough
Empirical Study [76.52997424694767]
We present an in-depth empirical study of keyphrase extraction and keyphrase generation using pre-trained language models.
We show that PLMs have competitive high-resource performance and state-of-the-art low-resource performance.
Further results show that in-domain BERT-like PLMs can be used to build strong and data-efficient keyphrase generation models.
arXiv Detail & Related papers (2022-12-20T13:20:21Z) - Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D
Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on.
We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z) - Guiding Generative Language Models for Data Augmentation in Few-Shot
Text Classification [59.698811329287174]
We leverage GPT-2 for generating artificial training instances in order to improve classification performance.
Our results show that fine-tuning GPT-2 in a handful of label instances leads to consistent classification improvements.
arXiv Detail & Related papers (2021-11-17T12:10:03Z) - Unsupervised Domain Adaptive Learning via Synthetic Data for Person
Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning
with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated ssian mixture model.
Experimental results demonstrate that CCVR state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.