From Consensus to Disagreement: Multi-Teacher Distillation for
Semi-Supervised Relation Extraction
- URL: http://arxiv.org/abs/2112.01048v1
- Date: Thu, 2 Dec 2021 08:20:23 GMT
- Title: From Consensus to Disagreement: Multi-Teacher Distillation for
Semi-Supervised Relation Extraction
- Authors: Wanli Li and Tieyun Qian
- Abstract summary: Semi-supervised relation extraction (SSRE) has proven to be a promising way to address this problem by annotating unlabeled samples as additional training data.
However, the difference set, which contains rich information about unlabeled data, has long been neglected by prior studies.
We develop a simple and general multi-teacher distillation framework, which can be easily integrated into any existing SSRE method.
- Score: 10.513626483108126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The lack of labeled data is a major obstacle in relation extraction.
Semi-supervised relation extraction (SSRE) has proven to be a promising way to
address this problem by annotating unlabeled samples as additional training
data. Almost all prior research along this line adopts multiple models to make
the annotations more reliable by taking the intersection set of the predicted
results from these models. However, the difference set, which contains rich
information about unlabeled data, has long been neglected by prior studies.
In this paper, we propose to learn not only from the consensus but also from
the disagreement among different models in SSRE. To this end, we develop a
simple and general multi-teacher distillation (MTD) framework, which can be
easily integrated into any existing SSRE method. Specifically, we first let
the teachers correspond to the multiple models and select the samples in the
intersection set of the last iteration in SSRE methods to augment the labeled
data as usual. We then transfer the class distributions for samples in the
difference set as soft labels to guide the student. We finally perform
prediction using the trained student model. Experimental results on two public
datasets demonstrate that our framework significantly improves the performance
of the base SSRE methods at low computational cost.
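To make the procedure concrete, here is a minimal, illustrative sketch of the distillation step (an assumption-based reading of the abstract, not the authors' released code; `teacher_probs` and all helper names are hypothetical, and averaging the teachers' distributions is one plausible way to form the soft labels):

```python
import numpy as np

# Sketch of multi-teacher distillation (MTD) as described in the abstract.
# `teacher_probs` has shape (num_teachers, num_samples, num_classes): each
# teacher's predicted class distribution for every unlabeled sample.

def split_sets(teacher_probs):
    """Consensus samples become hard pseudo-labels that augment the labeled
    data; disagreement samples contribute averaged soft labels."""
    hard_preds = teacher_probs.argmax(axis=-1)            # (T, N)
    agree = (hard_preds == hard_preds[0]).all(axis=0)     # all teachers agree
    consensus_idx = np.flatnonzero(agree)
    difference_idx = np.flatnonzero(~agree)
    hard_labels = hard_preds[0, consensus_idx]            # intersection set
    soft_labels = teacher_probs[:, difference_idx].mean(axis=0)  # (|D|, C)
    return consensus_idx, hard_labels, difference_idx, soft_labels

def distillation_loss(student_log_probs, soft_labels):
    """Soft cross-entropy on the difference set: the student matches the
    teachers' averaged class distributions."""
    return -(soft_labels * student_log_probs).sum(axis=-1).mean()
```

The student is then trained on the augmented labeled data plus this soft-label loss, and only the student is used at prediction time.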
Related papers
- Data curation via joint example selection further accelerates multimodal learning [3.329535792151987]
We show that jointly selecting batches of data is more effective for learning than selecting examples independently.
We derive a simple and tractable algorithm for selecting such batches, which significantly accelerates training beyond individually-prioritized data points.
arXiv Detail & Related papers (2024-06-25T16:52:37Z)
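As a hedged illustration of what "jointly selecting batches" can mean (my reading of the summary above, not the paper's actual algorithm; all names are hypothetical):

```python
import numpy as np

def select_joint_batch(embeddings, learner_loss, ref_loss, k, redundancy=0.5):
    """Greedy joint batch selection (an assumption-based sketch): prefer
    examples whose loss is high under the learner but low under a reference
    model ("learnable"), while penalizing similarity to examples already
    chosen, so the batch is scored as a set rather than example-by-example."""
    learnability = learner_loss - ref_loss          # (N,) per-example score
    sims = embeddings @ embeddings.T                # (N, N) similarity
    chosen = []
    for _ in range(k):
        if chosen:
            penalty = sims[:, chosen].max(axis=1)   # closeness to the batch
        else:
            penalty = np.zeros_like(learnability)
        score = learnability - redundancy * penalty
        score[chosen] = -np.inf                     # never pick twice
        chosen.append(int(score.argmax()))
    return chosen
```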
- Exploring the Boundaries of Semi-Supervised Facial Expression Recognition: Learning from In-Distribution, Out-of-Distribution, and Unconstrained Data [19.442685015494316]
We present a study of 11 of the most recent semi-supervised methods in the context of facial expression recognition (FER).
Our investigation covers semi-supervised learning from in-distribution, out-of-distribution, unconstrained, and very small unlabelled data.
Our results demonstrate that FixMatch consistently achieves better performance on in-distribution unlabelled data, while ReMixMatch stands out among all methods for out-of-distribution, unconstrained, and scarce unlabelled data scenarios.
arXiv Detail & Related papers (2023-06-02T01:40:08Z)
- Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks [75.42002070547267]
We propose a self-evolution learning (SE) based mixup approach for data augmentation in text classification.
We introduce a novel instance-specific label smoothing approach, which linearly interpolates the model's output and the one-hot labels of the original samples to generate new soft labels for mixing up.
arXiv Detail & Related papers (2023-05-22T23:43:23Z)
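A minimal sketch of the instance-specific label smoothing described above (an illustration of the stated interpolation, with hypothetical names; not the authors' code):

```python
import numpy as np

def instance_specific_soft_label(model_probs, label, num_classes, lam=0.9):
    """Linearly interpolate the one-hot label with the model's own output
    to get an instance-specific soft label for mixup."""
    one_hot = np.eye(num_classes)[label]
    return lam * one_hot + (1.0 - lam) * model_probs

# e.g. a confident-but-not-certain prediction smooths its own target:
probs = np.array([0.7, 0.2, 0.1])
print(instance_specific_soft_label(probs, label=0, num_classes=3))
# -> [0.97 0.02 0.01]
```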
- Split-PU: Hardness-aware Training Strategy for Positive-Unlabeled Learning [42.26185670834855]
Positive-Unlabeled (PU) learning aims to learn a model with rare positive samples and abundant unlabeled samples.
This paper focuses on improving the commonly-used nnPU with a novel training pipeline.
arXiv Detail & Related papers (2022-11-30T05:48:31Z)
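For context, a minimal sketch of the non-negative PU (nnPU) risk estimator that Split-PU builds on (standard background from Kiryo et al., 2017, not the Split-PU pipeline itself):

```python
def nnpu_risk(loss_pos, loss_pos_as_neg, loss_unl_as_neg, prior):
    """Non-negative PU risk. `loss_pos`: mean loss of positives labeled
    positive; `loss_pos_as_neg`: mean loss of positives labeled negative;
    `loss_unl_as_neg`: mean loss of unlabeled data labeled negative;
    `prior`: class prior pi = P(y = +1)."""
    risk_p = prior * loss_pos
    risk_n = loss_unl_as_neg - prior * loss_pos_as_neg
    return risk_p + max(0.0, risk_n)   # clamp keeps the risk non-negative
```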
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We estimate intra-class variations for every class and generate adaptive synthetic samples to support hard-sample mining.
Our method significantly outperforms state-of-the-art methods on retrieval performance by 3%-6%.
arXiv Detail & Related papers (2022-11-29T14:52:38Z)
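A hedged sketch of one way to read the intra-class adaptive augmentation idea above (estimate each class's embedding spread, then synthesize samples with matching variance; names and details are hypothetical):

```python
import numpy as np

def adaptive_synthetic_samples(embeddings, labels, n_per_class=4, rng=None):
    """Per class, estimate the intra-class variation of the embeddings and
    draw synthetic points with that spread (illustrative reading of IAA)."""
    rng = rng or np.random.default_rng(0)
    synthetic, synthetic_labels = [], []
    for c in np.unique(labels):
        cls = embeddings[labels == c]
        mu, sigma = cls.mean(axis=0), cls.std(axis=0) + 1e-8
        synthetic.append(rng.normal(mu, sigma, size=(n_per_class, cls.shape[1])))
        synthetic_labels.extend([c] * n_per_class)
    return np.concatenate(synthetic), np.array(synthetic_labels)
```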
- Label-Noise Learning with Intrinsically Long-Tailed Data [65.41318436799993]
We propose a learning framework for label-noise learning with intrinsically long-tailed data.
Specifically, we propose two-stage bi-dimensional sample selection (TABASCO) to better separate clean samples from noisy samples.
arXiv Detail & Related papers (2022-08-21T07:47:05Z)
- Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing [104.630875328668]
The Mixup scheme suggests mixing a pair of samples to create an augmented training sample.
We present a novel, yet simple Mixup-variant that captures the best of both worlds.
arXiv Detail & Related papers (2021-12-16T11:27:48Z)
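For reference, the standard Mixup operation that this paper varies (Zhang et al., 2018); the saliency-guided label calibration itself is not reproduced here:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Vanilla Mixup: convexly combine two samples and their one-hot labels.
    Saliency Grafting changes how regions and label weights are chosen."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```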
- Few-shot Learning via Dependency Maximization and Instance Discriminant Analysis [21.8311401851523]
We study the few-shot learning problem, where a model learns to recognize new objects with extremely few labeled data per category.
We propose a simple approach to exploit unlabeled data accompanying the few-shot task for improving few-shot performance.
arXiv Detail & Related papers (2021-09-07T02:19:01Z)
- Training image classifiers using Semi-Weak Label Data [26.04162590798731]
In Multiple Instance Learning (MIL), weak labels are provided at the bag level, with only presence/absence information known.
This paper introduces a novel semi-weak label learning paradigm as a middle ground to mitigate the problem.
We propose a two-stage framework to address the problem of learning from semi-weak labels.
arXiv Detail & Related papers (2021-03-19T03:06:07Z)
- Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
arXiv Detail & Related papers (2020-07-21T13:27:09Z)
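As background, the Mean Teacher update underlying the framework above: the teacher is an exponential moving average (EMA) of the student's weights (standard technique; the mask-guided and perturbation-sensitive parts are not reproduced here):

```python
def ema_update(teacher_weights, student_weights, decay=0.99):
    """Mean Teacher update: the teacher tracks an exponential moving average
    of the student's weights, yielding more stable pseudo-targets."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]
```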
- ATSO: Asynchronous Teacher-Student Optimization for Semi-Supervised Medical Image Segmentation [99.90263375737362]
We propose ATSO, an asynchronous version of teacher-student optimization.
ATSO partitions the unlabeled data into two subsets and alternately uses one subset to fine-tune the model and updates the labels on the other subset.
We evaluate ATSO on two popular medical image segmentation datasets and show its superior performance in various semi-supervised settings.
arXiv Detail & Related papers (2020-06-24T04:05:12Z)
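A compact sketch of the alternating loop described in the ATSO summary (a paraphrase of the abstract, with hypothetical helpers `train` and `pseudo_label` supplied by the caller):

```python
# Illustrative ATSO-style loop: split the unlabeled pool in two and alternate
# between fine-tuning on one half and refreshing pseudo-labels on the other,
# so the model never trains on labels it just produced itself.

def atso_loop(model, labeled, unlabeled_a, unlabeled_b, rounds,
              train, pseudo_label):
    labels_a = pseudo_label(model, unlabeled_a)
    for _ in range(rounds):
        model = train(model, labeled, unlabeled_a, labels_a)
        labels_b = pseudo_label(model, unlabeled_b)   # refresh other half
        model = train(model, labeled, unlabeled_b, labels_b)
        labels_a = pseudo_label(model, unlabeled_a)   # and swap back
    return model
```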