Coarse-to-Fine Pre-training for Named Entity Recognition
- URL: http://arxiv.org/abs/2010.08210v1
- Date: Fri, 16 Oct 2020 07:39:20 GMT
- Title: Coarse-to-Fine Pre-training for Named Entity Recognition
- Authors: Mengge Xue, Bowen Yu, Zhenyu Zhang, Tingwen Liu, Yue Zhang, Bin Wang
- Abstract summary: We propose a NER-specific pre-training framework to inject coarse-to-fine automatically mined entity knowledge into pre-trained models.
Our framework achieves significant improvements over several pre-trained baselines, establishing new state-of-the-art performance on three benchmarks.
- Score: 26.00489191164784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Named Entity Recognition has achieved great advances aided by
pre-training approaches such as BERT. However, current pre-training techniques
focus on building language modeling objectives to learn a general
representation, ignoring the named entity-related knowledge. To this end, we
propose a NER-specific pre-training framework to inject coarse-to-fine
automatically mined entity knowledge into pre-trained models. Specifically, we
first warm up the model via an entity span identification task by training it
with Wikipedia anchors, which can be regarded as general-typed entities. Then we
leverage a gazetteer-based distant supervision strategy to train the model to
extract coarse-grained typed entities. Finally, we devise a self-supervised
auxiliary task to mine the fine-grained named entity knowledge via clustering.
Empirical studies on three public NER datasets demonstrate that our framework
achieves significant improvements over several pre-trained baselines,
establishing new state-of-the-art performance on three benchmarks. Besides, we
show that our framework gains promising results without using human-labeled
training data, demonstrating its effectiveness in label-few and low-resource
scenarios.
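The abstract outlines three successive pre-training stages. A minimal sketch of their shape is given below, assuming one token-level tagging head per stage on top of a Transformer encoder; the encoder, the head names (span_head, coarse_head, cluster_head), the label sets, and all sizes are illustrative stand-ins, not the authors' released code.

```python
# Minimal sketch of the coarse-to-fine stages (illustrative only; the real framework
# starts from a pre-trained BERT encoder and uses Wikipedia/gazetteer-derived data).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, N_COARSE, N_CLUSTERS = 30522, 256, 5, 30   # assumed sizes

class Encoder(nn.Module):
    """Stand-in for the pre-trained Transformer encoder."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        layer = nn.TransformerEncoderLayer(HIDDEN, nhead=4, batch_first=True)
        self.layers = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, ids):                        # ids: (batch, seq_len)
        return self.layers(self.embed(ids))        # (batch, seq_len, HIDDEN)

encoder = Encoder()
span_head = nn.Linear(HIDDEN, 3)              # stage 1: B/I/O spans from Wikipedia anchors
coarse_head = nn.Linear(HIDDEN, N_COARSE)     # stage 2: coarse types from gazetteer matches
cluster_head = nn.Linear(HIDDEN, N_CLUSTERS)  # stage 3: fine-grained cluster assignments

def token_loss(head, ids, labels):
    """Token-level cross-entropy shared by all three stages."""
    logits = head(encoder(ids))                            # (batch, seq_len, n_labels)
    return F.cross_entropy(logits.flatten(0, 1), labels.flatten())

# The stages run one after another, each on its own automatically mined corpus.
for head, labels in [(span_head, torch.randint(0, 3, (2, 16))),
                     (coarse_head, torch.randint(0, N_COARSE, (2, 16))),
                     (cluster_head, torch.randint(0, N_CLUSTERS, (2, 16)))]:
    token_loss(head, torch.randint(0, VOCAB, (2, 16)), labels).backward()
```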
Related papers
- A Bayesian Unification of Self-Supervised Clustering and Energy-Based
Models [11.007541337967027]
We perform a Bayesian analysis of state-of-the-art self-supervised learning objectives.
We show that our objective function allows to outperform existing self-supervised learning strategies.
We also demonstrate that GEDI can be integrated into a neuro-symbolic framework.
arXiv Detail & Related papers (2023-12-30T04:46:16Z)
- Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time.
Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP.
Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
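A rough sketch of the retrieval-enhanced idea follows, assuming pre-computed frozen CLIP-style embeddings and a simple top-k memory lookup; the dimensions, the retrieval step, and the fusion layer configuration are placeholders, not the RECO implementation.

```python
# Sketch: refine a frozen query embedding with k retrieved cross-modal embeddings
# using a single fusion transformer layer (dimensions and retrieval are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

D, K = 512, 8                                            # embedding dim, retrieved neighbours
fusion = nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)

def refine(query_emb, memory_bank):
    """query_emb: (B, D) frozen image/text embedding; memory_bank: (N, D) opposite modality."""
    q = F.normalize(query_emb, dim=-1)
    m = F.normalize(memory_bank, dim=-1)
    idx = (q @ m.T).topk(K, dim=-1).indices              # (B, K) nearest memory entries
    retrieved = memory_bank[idx]                          # (B, K, D)
    seq = torch.cat([query_emb.unsqueeze(1), retrieved], dim=1)
    return fusion(seq)[:, 0]                              # refined query token, shape (B, D)

refined = refine(torch.randn(4, D), torch.randn(1000, D))
```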
arXiv Detail & Related papers (2023-06-12T15:52:02Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
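One plausible reading of this regularizer is sketched below: a frozen copy of the model taken before the further pre-training stage supplies soft targets that the updated model is kept close to. The loss weight, temperature, and stand-in model are assumptions, not the paper's settings.

```python
# Sketch of self-distillation as a regularizer during further pre-training
# (loss weight and temperature are assumptions, not the paper's settings).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Linear(128, 10)                 # stand-in for the model being further pre-trained
teacher = copy.deepcopy(student).eval()      # frozen snapshot taken before this stage
for p in teacher.parameters():
    p.requires_grad_(False)

def further_pretrain_loss(x, labels, alpha=0.5, tau=2.0):
    s_logits = student(x)
    with torch.no_grad():
        t_logits = teacher(x)
    task = F.cross_entropy(s_logits, labels)              # the usual training objective
    distill = F.kl_div(F.log_softmax(s_logits / tau, dim=-1),
                       F.softmax(t_logits / tau, dim=-1),
                       reduction="batchmean") * tau * tau
    return task + alpha * distill

loss = further_pretrain_loss(torch.randn(8, 128), torch.randint(0, 10, (8,)))
loss.backward()
```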
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- A self-training framework for glaucoma grading in OCT B-scans [6.382852973055393]
We present a self-training-based framework for glaucoma grading using OCT B-scans under the presence of domain shift.
A two-step learning methodology resorts to pseudo-labels generated during the first step to augment the training dataset on the target domain.
We propose a novel glaucoma-specific backbone which introduces residual and attention modules via skip-connections to refine the embedding features of the latent space.
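A bare-bones illustration of the two-step idea, with a hypothetical classifier and confidence threshold rather than the paper's pipeline: a model trained on the source domain labels confident target-domain scans, and those pseudo-labelled examples augment the training set.

```python
# Step 1 (not shown): train on labelled source-domain data.
# Step 2: pseudo-label confident target-domain samples and add them to the training
# pool (the model and the confidence threshold below are placeholders).
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(64, 3)                       # stand-in for the grading network
THRESHOLD = 0.9                                # assumed confidence cut-off

def pseudo_label(unlabelled_target):
    """Return (inputs, labels) for target-domain samples the model is confident about."""
    with torch.no_grad():
        probs = F.softmax(model(unlabelled_target), dim=-1)
    conf, labels = probs.max(dim=-1)
    keep = conf >= THRESHOLD
    return unlabelled_target[keep], labels[keep]

# Augment the target-domain training set with the confident pseudo-labelled scans.
extra_x, extra_y = pseudo_label(torch.randn(100, 64))
```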
arXiv Detail & Related papers (2021-11-23T10:33:55Z)
- Distantly-Supervised Named Entity Recognition with Noise-Robust Learning
and Language Model Augmented Self-Training [66.80558875393565]
We study the problem of training named entity recognition (NER) models using only distantly-labeled data.
We propose a noise-robust learning scheme comprised of a new loss function and a noisy label removal step.
Our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.
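The blurb does not spell out the new loss, so the sketch below substitutes a well-known noise-robust alternative (generalized cross-entropy) plus a simple high-loss-sample removal step, purely to illustrate the shape of such a scheme; it is not the paper's loss function.

```python
# Illustration only: generalized cross-entropy (GCE) as a noise-robust token loss,
# plus dropping the highest-loss tokens as a stand-in for noisy-label removal.
import torch
import torch.nn.functional as F

def gce_loss(logits, labels, q=0.7):
    """GCE of Zhang & Sabuncu (2018): (1 - p_y^q) / q, less sensitive to label noise."""
    p_y = F.softmax(logits, dim=-1).gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    return (1.0 - p_y.clamp_min(1e-8) ** q) / q

def robust_step(logits, labels, drop_ratio=0.1):
    """Average GCE over tokens after discarding the most suspicious (highest-loss) ones."""
    per_token = gce_loss(logits, labels)
    keep = per_token.argsort()[: int(per_token.numel() * (1 - drop_ratio))]
    return per_token[keep].mean()

loss = robust_step(torch.randn(32, 9), torch.randint(0, 9, (32,)))
```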
arXiv Detail & Related papers (2021-09-10T17:19:56Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant
Supervision [49.42215511723874]
We propose a new computational framework -- BOND -- to improve the prediction performance of NER models.
Specifically, we propose a two-stage training algorithm: In the first stage, we adapt the pre-trained language model to the NER tasks using the distant labels.
In the second stage, we drop the distant labels, and propose a self-training approach to further improve the model performance.
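A compressed view of the two stages as described, with a placeholder model and data; BOND's early stopping, re-initialization, and teacher-student details are omitted.

```python
# Stage 1: fit the pre-trained model to distant (KB/gazetteer-derived) labels.
# Stage 2: discard the distant labels and self-train on the model's own predictions.
# The model, optimizer settings, and iteration counts here are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(128, 5)                       # stand-in for a BERT-based tagger
opt = torch.optim.SGD(model.parameters(), lr=0.1)
tokens = torch.randn(64, 128)
distant_labels = torch.randint(0, 5, (64,))

# Stage 1: adapt to the distant labels (in BOND, stopped early to limit noise fitting).
for _ in range(3):
    opt.zero_grad()
    F.cross_entropy(model(tokens), distant_labels).backward()
    opt.step()

# Stage 2: drop the distant labels; repeatedly relabel the data with the current model
# and train on these pseudo labels.
for _ in range(3):
    with torch.no_grad():
        pseudo = model(tokens).argmax(dim=-1)
    opt.zero_grad()
    F.cross_entropy(model(tokens), pseudo).backward()
    opt.step()
```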
arXiv Detail & Related papers (2020-06-28T04:55:39Z)
- Improving Semantic Segmentation via Self-Training [75.07114899941095]
We show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm.
We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data.
Our robust training framework can digest human-annotated and pseudo labels jointly and achieve top performances on Cityscapes, CamVid and KITTI datasets.
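The recipe reads as classic teacher-student self-training; the sketch below mixes a human-annotated and a pseudo-labelled batch in one loss, with tiny stand-in modules in place of the actual segmentation networks and with equal loss weights assumed.

```python
# Teacher trained on labelled data produces pseudo labels for unlabelled images;
# the student then consumes both label sources jointly (weights are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Conv2d(3, 19, kernel_size=1)      # stand-ins for segmentation networks
student = nn.Conv2d(3, 19, kernel_size=1)      # (19 classes, as in Cityscapes)

labelled, masks = torch.randn(2, 3, 64, 64), torch.randint(0, 19, (2, 64, 64))
unlabelled = torch.randn(2, 3, 64, 64)

with torch.no_grad():                          # pseudo labels from the trained teacher
    pseudo = teacher(unlabelled).argmax(dim=1)

loss = (F.cross_entropy(student(labelled), masks)
        + F.cross_entropy(student(unlabelled), pseudo))
loss.backward()
```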
arXiv Detail & Related papers (2020-04-30T17:09:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.