A Sample-Based Training Method for Distantly Supervised Relation
Extraction with Pre-Trained Transformers
- URL: http://arxiv.org/abs/2104.07512v1
- Date: Thu, 15 Apr 2021 15:09:34 GMT
- Title: A Sample-Based Training Method for Distantly Supervised Relation
Extraction with Pre-Trained Transformers
- Authors: Mehrdad Nasser, Mohamad Bagher Sajadi, Behrouz Minaei-Bidgoli
- Abstract summary: We propose a novel sampling method for DSRE that relaxes hardware requirements.
In the proposed method, we limit the number of sentences in a batch by randomly sampling sentences from the bags in the batch.
To alleviate the issues caused by random sampling, we use an ensemble of trained models for prediction.
- Score: 4.726777092009553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multiple instance learning (MIL) has become the standard learning paradigm
for distantly supervised relation extraction (DSRE). However, due to relation
extraction being performed at bag level, MIL has significant hardware
requirements for training when coupled with large sentence encoders such as
deep transformer neural networks. In this paper, we propose a novel sampling
method for DSRE that relaxes these hardware requirements. In the proposed
method, we limit the number of sentences in a batch by randomly sampling
sentences from the bags in the batch. However, this comes at the cost of losing
valid sentences from bags. To alleviate the issues caused by random sampling,
we use an ensemble of trained models for prediction. We demonstrate the
effectiveness of our approach by using our proposed learning setting to
fine-tune BERT on the widely used NYT dataset. Our approach significantly
outperforms previous state-of-the-art methods in terms of AUC and P@N metrics.
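Read literally, the abstract combines two simple mechanisms: capping the per-bag sentence count by random sampling when a batch is built, and averaging the predictions of several independently trained models at test time. The sketch below illustrates that reading in PyTorch-flavoured Python; the `Bag` structure, `max_sentences_per_bag`, and the ensemble interface are assumptions for illustration, not the authors' implementation.

```python
import random
from dataclasses import dataclass
from typing import List

import torch


@dataclass
class Bag:
    """A bag of sentences mentioning the same entity pair (hypothetical structure)."""
    sentences: List[str]   # all sentences in the bag
    label: int             # distantly supervised relation label


def sample_batch(bags: List[Bag], max_sentences_per_bag: int) -> List[Bag]:
    """Cap the number of sentences per bag by random sampling, as the abstract describes."""
    sampled = []
    for bag in bags:
        if len(bag.sentences) > max_sentences_per_bag:
            kept = random.sample(bag.sentences, max_sentences_per_bag)
        else:
            kept = list(bag.sentences)
        sampled.append(Bag(sentences=kept, label=bag.label))
    return sampled


@torch.no_grad()
def ensemble_predict(models: List[torch.nn.Module], bag_features: torch.Tensor) -> torch.Tensor:
    """Average relation probabilities over an ensemble of independently trained models."""
    probs = [torch.softmax(m(bag_features), dim=-1) for m in models]
    return torch.stack(probs, dim=0).mean(dim=0)
```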
Related papers
- A Bayesian Approach to Data Point Selection [24.98069363998565]
Data point selection (DPS) is becoming a critical topic in deep learning.
Existing approaches to DPS are predominantly based on a bi-level optimisation (BLO) formulation.
We propose a novel Bayesian approach to DPS.
arXiv Detail & Related papers (2024-11-06T09:04:13Z)
- Which Pretrain Samples to Rehearse when Finetuning Pretrained Models? [60.59376487151964]
Fine-tuning pretrained models on specific tasks is now the de facto approach for text and vision tasks.
A known pitfall of this approach is the forgetting of pretraining knowledge that happens during finetuning.
We propose a novel sampling scheme, mix-cd, that identifies and prioritizes samples that actually face forgetting.
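The summary only states that mix-cd identifies and prioritizes pretraining samples that "actually face forgetting". One plausible reading, sketched below, is to rehearse pretrain samples that the original model classified correctly but the fine-tuned model now gets wrong; this illustrates that reading rather than the paper's algorithm, and every name in it is hypothetical.

```python
from typing import List, Tuple

import torch


@torch.no_grad()
def forgotten_samples(
    pretrained: torch.nn.Module,
    finetuned: torch.nn.Module,
    rehearsal_pool: List[Tuple[torch.Tensor, int]],
) -> List[Tuple[torch.Tensor, int]]:
    """Keep pretrain samples that were correct before fine-tuning but are wrong afterwards
    (one plausible notion of 'samples that actually face forgetting')."""
    selected = []
    for x, y in rehearsal_pool:
        before = pretrained(x.unsqueeze(0)).argmax(dim=-1).item()
        after = finetuned(x.unsqueeze(0)).argmax(dim=-1).item()
        if before == y and after != y:
            selected.append((x, y))
    return selected
```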
arXiv Detail & Related papers (2024-02-12T22:32:12Z)
- A Data Cartography based MixUp for Pre-trained Language Models [47.90235939359225]
MixUp is a data augmentation strategy where additional samples are generated during training by combining random pairs of training samples and their labels.
We propose TDMixUp, a novel MixUp strategy that leverages Training Dynamics and allows more informative samples to be combined for generating new data samples.
We empirically validate that our method not only achieves competitive performance using a smaller subset of the training data compared with strong baselines, but also yields lower expected calibration error on the pre-trained language model, BERT, on both in-domain and out-of-domain settings in a wide range of NLP tasks.
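The MixUp operation the entry builds on is easy to state in code: draw a mixing coefficient from a Beta distribution and interpolate both the inputs and the one-hot labels of a randomly paired batch. The sketch below shows vanilla MixUp only; TDMixUp's use of training dynamics to choose which samples to combine is not reproduced here, and for a language model such as BERT the interpolation is typically applied to embeddings or hidden states rather than to raw token ids.

```python
import numpy as np
import torch
import torch.nn.functional as F


def mixup_batch(x: torch.Tensor, y: torch.Tensor, num_classes: int, alpha: float = 0.4):
    """Vanilla MixUp: interpolate a batch with a shuffled copy of itself."""
    lam = float(np.random.beta(alpha, alpha))
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_onehot = F.one_hot(y, num_classes).float()
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix
```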
arXiv Detail & Related papers (2022-05-06T17:59:19Z)
- BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning [93.38239238988719]
We propose to enable deep neural networks with the ability to learn the sample relationships from each mini-batch.
BatchFormer is applied to the batch dimension of each mini-batch to implicitly explore sample relationships during training.
We perform extensive experiments on over ten datasets and the proposed method achieves significant improvements on different data scarcity applications.
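As described, the module treats the mini-batch itself as a sequence so that a transformer layer can attend across samples before the classifier. The minimal sketch below encodes that idea; the feature dimension, head count, and the train-only behaviour are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class BatchAttention(nn.Module):
    """Minimal sketch of attending across the samples of a mini-batch."""

    def __init__(self, feat_dim: int = 512, num_heads: int = 4):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, feat_dim); treat the batch as one sequence of length `batch`.
        if not self.training:  # assumed: the module is only active during training
            return features
        return self.encoder(features.unsqueeze(0)).squeeze(0)
```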
arXiv Detail & Related papers (2022-03-03T05:31:33Z)
- Batch Active Learning at Scale [39.26441165274027]
Batch active learning, which adaptively issues batched queries to a labeling oracle, is a common approach for reducing labeling cost.
In this work, we analyze an efficient active learning algorithm, which focuses on the large batch setting.
We show that our sampling method, which combines notions of uncertainty and diversity, easily scales to batch sizes (100K-1M) several orders of magnitude larger than those used in previous studies.
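The entry names the two ingredients of the sampling rule, uncertainty and diversity, without giving its exact form. The sketch below shows one common way to combine them (margin-based uncertainty filtering followed by clustering of embeddings); it illustrates the idea rather than the paper's specific algorithm, and the candidate-pool multiplier is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import KMeans


def select_batch(probs: np.ndarray, embeddings: np.ndarray, batch_size: int) -> np.ndarray:
    """Pick an acquisition batch that is uncertain (small margin) and diverse (spread over clusters)."""
    # Uncertainty: margin between the two largest predicted probabilities.
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    candidates = np.argsort(margin)[: batch_size * 10]  # keep the most uncertain pool (factor is arbitrary)

    # Diversity: cluster the candidates' embeddings and take the most uncertain point per cluster.
    k = min(batch_size, len(candidates))
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(embeddings[candidates])
    picked = []
    for cluster in range(k):
        members = candidates[labels == cluster]
        if len(members) > 0:
            picked.append(int(members[0]))  # candidates are already sorted by uncertainty
        if len(picked) == batch_size:
            break
    return np.array(picked)
```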
arXiv Detail & Related papers (2021-07-29T18:14:05Z)
- CIL: Contrastive Instance Learning Framework for Distantly Supervised Relation Extraction [52.94486705393062]
We go beyond typical multi-instance learning (MIL) framework and propose a novel contrastive instance learning (CIL) framework.
Specifically, we regard the initial MIL model as the relational triple encoder and constrain positive pairs against negative pairs for each instance.
Experiments demonstrate the effectiveness of our proposed framework, with significant improvements over the previous methods on NYT10, GDS and KBP.
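The summary frames CIL as keeping the MIL encoder and adding an instance-level contrastive constraint that pulls positive pairs together and pushes negatives apart. A generic InfoNCE-style version of such a constraint is sketched below; it follows the abstract's description rather than the paper's exact loss, and the tensor shapes and temperature are assumptions.

```python
import torch
import torch.nn.functional as F


def instance_contrastive_loss(anchor: torch.Tensor,
                              positive: torch.Tensor,
                              negatives: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: one positive instance against a set of negative instances.

    anchor:    (d,)   encoding of an instance
    positive:  (d,)   encoding of an instance expressing the same relation
    negatives: (n, d) encodings of instances from other bags / relations
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = (anchor * positive).sum() / temperature    # scalar similarity to the positive
    neg_sim = negatives @ anchor / temperature           # (n,) similarities to the negatives
    logits = torch.cat([pos_sim.unsqueeze(0), neg_sim])  # the positive sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))
```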
arXiv Detail & Related papers (2021-06-21T04:51:59Z)
- Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency)
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
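Following the summary, each sample is fed through two augmented views, and the resulting predictions are used to judge whether its label is clean or whether the sample is out-of-distribution. A rough version of those two measurements, based on the Jensen-Shannon divergence, is sketched below; the specific divergence, normalization, and thresholds are assumptions rather than the paper's definitions.

```python
import math

import torch
import torch.nn.functional as F


def js_divergence(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Per-sample Jensen-Shannon divergence between two class distributions, in [0, ln 2]."""
    m = 0.5 * (p + q)

    def kl(a, b):
        return (a * (a.clamp_min(1e-12).log() - b.clamp_min(1e-12).log())).sum(dim=-1)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def clean_likelihood(logits: torch.Tensor, labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Agreement between the prediction and the given (possibly noisy) label; near 1 -> likely clean."""
    pred = torch.softmax(logits, dim=-1)
    target = F.one_hot(labels, num_classes).float()
    return 1.0 - js_divergence(pred, target) / math.log(2.0)


def view_disagreement(logits_view1: torch.Tensor, logits_view2: torch.Tensor) -> torch.Tensor:
    """Disagreement between two augmented views; high values hint at out-of-distribution samples."""
    p1 = torch.softmax(logits_view1, dim=-1)
    p2 = torch.softmax(logits_view2, dim=-1)
    return js_divergence(p1, p2) / math.log(2.0)
```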
arXiv Detail & Related papers (2021-03-24T07:26:07Z)
- Multi-Scale Positive Sample Refinement for Few-Shot Object Detection [61.60255654558682]
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances.
We propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD.
MPSR generates multi-scale positive samples as object pyramids and refines the prediction at various scales.
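The core data operation named here, turning each positive object into an "object pyramid" of several scales, can be illustrated in a few lines of tensor resizing. The sketch below shows only that step; the chosen scales are placeholders, and the refinement branch that consumes the pyramid is not shown.

```python
from typing import List, Sequence

import torch
import torch.nn.functional as F


def object_pyramid(object_crop: torch.Tensor,
                   scales: Sequence[int] = (32, 64, 128, 256)) -> List[torch.Tensor]:
    """Resize one positive object crop (C, H, W) to several scales, forming an object pyramid."""
    crop = object_crop.unsqueeze(0)  # add a batch dimension for F.interpolate
    return [
        F.interpolate(crop, size=(s, s), mode="bilinear", align_corners=False).squeeze(0)
        for s in scales
    ]
```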
arXiv Detail & Related papers (2020-07-18T09:48:29Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
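The entry describes recasting a plausibility ranking task as scoring full-text sequences with a pre-trained model. One simple way to realize that idea is to score each candidate ending by its language-model log-likelihood in context and keep the highest-scoring one; the snippet below uses GPT-2 purely for illustration and is not the paper's scoring function.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()


@torch.no_grad()
def full_text_score(text: str) -> float:
    """Average token log-likelihood of the full text under the language model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss  # mean negative log-likelihood per token
    return -loss.item()


premise = "He poured water on the campfire"
candidates = ["so the fire went out.", "so the fire grew larger."]
best = max(candidates, key=lambda c: full_text_score(f"{premise} {c}"))
```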
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.