Distantly-Supervised Named Entity Recognition with Adaptive Teacher
Learning and Fine-grained Student Ensemble
- URL: http://arxiv.org/abs/2212.06522v1
- Date: Tue, 13 Dec 2022 12:14:09 GMT
- Title: Distantly-Supervised Named Entity Recognition with Adaptive Teacher
Learning and Fine-grained Student Ensemble
- Authors: Xiaoye Qu, Jun Zeng, Daizong Liu, Zhefeng Wang, Baoxing Huai, Pan Zhou
- Abstract summary: Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning scheme comprising two teacher-student networks.
A fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding student fragment, which enhances consistent predictions on each model fragment against noise.
- Score: 56.705249154629264
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates
the data scarcity problem in NER by automatically generating training samples.
Unfortunately, the distant supervision may induce noisy labels, thus
undermining the robustness of the learned models and restricting the practical
application. To relieve this problem, recent works adopt self-training
teacher-student frameworks to gradually refine the training labels and improve
the generalization ability of NER models. However, we argue that the
performance of the current self-training frameworks for DS-NER is severely
underestimated by their plain designs, including both inadequate student
learning and coarse-grained teacher updating. Therefore, in this paper, we make
the first attempt to alleviate these issues by proposing (1) an adaptive teacher
learning scheme that jointly trains two teacher-student networks and considers
both consistent and inconsistent predictions between the two teachers, thus
promoting comprehensive student learning; and (2) a fine-grained student
ensemble that updates each fragment of the teacher model with a temporal moving
average of the corresponding student fragment, which enhances consistent
predictions on each model fragment against noise. To verify the effectiveness
of our proposed method, we conduct experiments on four DS-NER datasets. The
experimental results demonstrate that our method significantly surpasses
previous SOTA methods.
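To make the fine-grained student ensemble concrete, here is a minimal sketch of a per-fragment temporal moving average (EMA) update of the teacher from the student. It treats each named parameter tensor as a "fragment" and uses an illustrative decay of 0.99; the actual fragment granularity and decay schedule in the paper may differ.

```python
import torch
import torch.nn as nn

def fragment_ema_update(teacher: nn.Module, student: nn.Module, decay: float = 0.99):
    """Update every teacher fragment with a temporal moving average of the
    corresponding student fragment. A "fragment" is approximated here by a
    named parameter tensor, and the decay value is illustrative; both are
    assumptions, not the paper's exact configuration."""
    with torch.no_grad():
        for (_, t_param), (_, s_param) in zip(
            teacher.named_parameters(), student.named_parameters()
        ):
            # theta_teacher <- decay * theta_teacher + (1 - decay) * theta_student
            t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)
```

Because only the student receives gradient updates, the teacher remains a smoothed ensemble of recent student states, which is what keeps its predictions on each fragment consistent against label noise.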
Related papers
- Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling [81.00825302340984]
We introduce Speculative Knowledge Distillation (SKD) to generate high-quality training data on-the-fly.
In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution.
We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following.
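As a rough illustration of that interleaved sampling loop, the sketch below has the student propose each next token greedily and lets the teacher replace any proposal that falls outside its top-k; the acceptance rule, the greedy decoding, and the `student`/`teacher` callables (token ids in, next-token logits out) are assumptions for illustration, not SKD's exact procedure.

```python
import torch

def skd_generate(student, teacher, prefix, max_new_tokens=32, top_k=25):
    """Interleaved sampling sketch: the student proposes tokens and the
    teacher replaces poorly ranked ones. `student` and `teacher` are
    hypothetical callables mapping a list of token ids to next-token logits;
    top-k membership stands in for SKD's actual ranking criterion."""
    tokens = list(prefix)
    for _ in range(max_new_tokens):
        proposal = int(torch.argmax(student(tokens)))          # student proposes
        teacher_logits = teacher(tokens)
        top = set(torch.topk(teacher_logits, top_k).indices.tolist())
        if proposal not in top:                                 # poorly ranked
            proposal = int(torch.argmax(teacher_logits))        # teacher replaces
        tokens.append(proposal)
    return tokens
```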
arXiv Detail & Related papers (2024-10-15T06:51:25Z)
- Teaching-Assistant-in-the-Loop: Improving Knowledge Distillation from Imperfect Teacher Models in Low-Budget Scenarios [3.818273633647809]
We propose a three-component framework leveraging three signal types.
The first signal is the student's self-consistency (the agreement among the student's multiple outputs), which serves as a proxy for the student's confidence.
We show that our proposed two-stage framework brings a relative improvement of up to 20.79% compared to fine-tuning without any signals across datasets.
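One simple way to realize the self-consistency signal described above is the agreement rate of multiple sampled student answers with their majority vote; the paper's exact formulation may differ, so treat this as an illustrative sketch.

```python
from collections import Counter

def self_consistency(sampled_answers):
    """Fraction of sampled student outputs that agree with the majority
    answer -- a simple proxy for the student's confidence. The paper's
    exact definition of self-consistency may differ."""
    counts = Counter(sampled_answers)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(sampled_answers)

# Three of four samples agree, so the confidence proxy is 0.75.
print(self_consistency(["42", "42", "41", "42"]))
```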
arXiv Detail & Related papers (2024-06-08T02:17:43Z)
- Periodically Exchange Teacher-Student for Source-Free Object Detection [7.222926042027062]
Source-free object detection (SFOD) aims to adapt the source detector to unlabeled target domain data in the absence of source domain data.
Most SFOD methods follow the same self-training paradigm using the mean-teacher (MT) framework, where the student model is guided by a single teacher model.
We propose the Periodically Exchange Teacher-Student (PETS) method, a simple yet novel approach that introduces a multiple-teacher framework consisting of a static teacher, a dynamic teacher, and a student model.
arXiv Detail & Related papers (2023-11-23T11:30:54Z)
- Improving the Robustness of Distantly-Supervised Named Entity Recognition via Uncertainty-Aware Teacher Learning and Student-Student Collaborative Learning [24.733773208117363]
We propose Uncertainty-Aware Teacher Learning to reduce the number of incorrect pseudo labels in the self-training stage.
We also propose Student-Student Collaborative Learning that allows the transfer of reliable labels between two student networks.
We evaluate our proposed method on five DS-NER datasets, demonstrating that our method is superior to the state-of-the-art DS-NER methods.
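A common way to reduce incorrect pseudo labels in such a self-training stage is to drop token-level pseudo labels whose predictive entropy is high; the sketch below uses that entropy rule purely for illustration, since the paper's uncertainty estimate may be defined differently.

```python
import torch

def filter_pseudo_labels(token_probs: torch.Tensor, max_entropy: float = 0.5):
    """Keep only token-level pseudo labels whose predictive entropy is below
    a threshold. token_probs has shape (seq_len, num_labels) and holds the
    teacher's label probabilities; the entropy threshold is an illustrative
    assumption, not the paper's exact uncertainty measure.

    Returns the argmax pseudo labels and a boolean mask of tokens to keep."""
    entropy = -(token_probs * token_probs.clamp_min(1e-12).log()).sum(dim=-1)
    keep_mask = entropy < max_entropy
    pseudo_labels = token_probs.argmax(dim=-1)
    return pseudo_labels, keep_mask
```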
arXiv Detail & Related papers (2023-11-14T09:09:58Z)
- Competitive Ensembling Teacher-Student Framework for Semi-Supervised Left Atrium MRI Segmentation [8.338801567668233]
Semi-supervised learning has greatly advanced medical image segmentation since it effectively alleviates the need of acquiring abundant annotations from experts.
In this paper, we present a simple yet efficient competitive ensembling teacher-student framework for semi-supervised left atrium segmentation from 3D MR images.
arXiv Detail & Related papers (2023-10-21T09:23:34Z)
- Adversarial Dual-Student with Differentiable Spatial Warping for Semi-Supervised Semantic Segmentation [70.2166826794421]
We propose a differentiable geometric warping to conduct unsupervised data augmentation.
We also propose a novel adversarial dual-student framework to improve the Mean-Teacher approach.
Our solution significantly improves performance, achieving state-of-the-art results on both datasets.
arXiv Detail & Related papers (2022-03-05T17:36:17Z)
- Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training [66.80558875393565]
We study the problem of training named entity recognition (NER) models using only distantly-labeled data.
We propose a noise-robust learning scheme comprised of a new loss function and a noisy label removal step.
Our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.
arXiv Detail & Related papers (2021-09-10T17:19:56Z)
- Noisy Self-Knowledge Distillation for Text Summarization [83.49809205891496]
We apply self-knowledge distillation to text summarization, which we argue can alleviate problems with maximum-likelihood training.
Our student summarization model is trained with guidance from a teacher which generates smoothed labels to help regularize training.
We demonstrate experimentally on three benchmarks that our framework boosts the performance of both pretrained and non-pretrained summarizers.
arXiv Detail & Related papers (2020-09-15T12:53:09Z)
- Temporal Self-Ensembling Teacher for Semi-Supervised Object Detection [9.64328205496046]
This paper focuses on Semi-supervised Object Detection (SSOD).
The teacher model serves a dual role as a teacher and a student.
The class imbalance issue in SSOD hinders an efficient knowledge transfer from teacher to student.
arXiv Detail & Related papers (2020-07-13T01:17:25Z)
- Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.
arXiv Detail & Related papers (2020-07-09T09:06:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.