Language Model Pre-training on True Negatives
- URL: http://arxiv.org/abs/2212.00460v1
- Date: Thu, 1 Dec 2022 12:24:19 GMT
- Title: Language Model Pre-training on True Negatives
- Authors: Zhuosheng Zhang, Hai Zhao, Masao Utiyama, Eiichiro Sumita
- Abstract summary: Discriminative pre-trained language models (PLMs) learn to predict original texts from intentionally corrupted ones.
Existing PLMs simply treat all corrupted texts as equally negative samples without any examination.
We design enhanced pre-training methods to counteract false negative predictions and encourage pre-training language models on true negatives.
- Score: 109.73819321246062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discriminative pre-trained language models (PLMs) learn to predict original
texts from intentionally corrupted ones. Taking the former texts as positive and
the latter as negative samples, the PLM can be trained effectively for
contextualized representation. However, training such PLMs relies heavily on the
quality of the automatically constructed samples. Existing PLMs simply treat all
corrupted texts as equally negative samples without any examination, so the
resulting model inevitably suffers from the false negative issue, in which training
is carried out on pseudo-negative data, making the resulting PLMs less efficient
and less robust. In this work, after defining the long-ignored false negative issue
in discriminative PLMs, we design enhanced pre-training methods that counteract
false negative predictions and encourage pre-training language models on true
negatives by correcting the harmful gradient updates caused by false negative
predictions. Experimental results on the GLUE and SQuAD benchmarks show that our
counter-false-negative pre-training methods indeed bring better performance
together with stronger robustness.
Related papers
- Negative-Prompt-driven Alignment for Generative Language Model [34.191590966148816]
We propose NEgative-prompt-driven AlignmenT to guide language models away from undesirable behaviors.
NEAT explicitly penalizes the model for producing harmful outputs, steering it toward desirable behaviors and away from undesirable, biased responses.
Extensive experiments validate NEAT's effectiveness in significantly enhancing language models' alignment with human values and preferences.
arXiv Detail & Related papers (2024-10-16T03:30:09Z)
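The summary does not give NEAT's objective; purely as an illustration of how a model can be penalized for producing harmful outputs, the sketch below combines a standard likelihood term on a desirable response with an unlikelihood-style penalty on an undesirable one elicited by a negative prompt. All names and the weighting are hypothetical, and this is not claimed to be NEAT's actual formulation.

```python
import torch
import torch.nn.functional as F

def alignment_loss(pos_logits, pos_labels, neg_logits, neg_labels, penalty_weight=1.0):
    """Cross-entropy on the desirable response plus an unlikelihood-style
    penalty on the undesirable one. Logits: (seq_len, vocab); labels: (seq_len,)."""
    # Standard likelihood term: push up log p(desirable response).
    pos_loss = F.cross_entropy(pos_logits, pos_labels)

    # Penalty term: push down log p(undesirable response) by maximizing
    # log(1 - p(token)) for each token of the harmful continuation.
    neg_log_probs = F.log_softmax(neg_logits, dim=-1)
    neg_token_logp = neg_log_probs.gather(1, neg_labels.unsqueeze(1)).squeeze(1)
    unlikelihood = -torch.log1p(-neg_token_logp.exp() + 1e-6).mean()

    return pos_loss + penalty_weight * unlikelihood
```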
- Contrastive Learning with Negative Sampling Correction [52.990001829393506]
We propose a novel contrastive learning method named Positive-Unlabeled Contrastive Learning (PUCL).
PUCL treats the generated negative samples as unlabeled samples and uses information from positive samples to correct bias in contrastive loss.
PUCL can be applied to general contrastive learning problems and outperforms state-of-the-art methods on various image and graph classification tasks.
arXiv Detail & Related papers (2024-01-13T11:18:18Z)
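PUCL's exact estimator is not given in the summary; as a rough illustration of a positive-unlabeled correction, the sketch below follows the debiased contrastive loss of Chuang et al. (2020), which removes the expected positive mass from the "negative" (unlabeled) term using a class prior. The prior tau_plus and the function name are assumptions.

```python
import torch

def debiased_contrastive_loss(sim_pos, sim_neg, tau_plus=0.1, temperature=0.5):
    """Contrastive loss that treats sampled 'negatives' as unlabeled and corrects
    for the chance (tau_plus) that they are actually positives.

    sim_pos: (batch,)   cosine similarity of each anchor with its positive
    sim_neg: (batch, K) cosine similarities with K sampled (unlabeled) examples
    """
    pos = torch.exp(sim_pos / temperature)                  # (batch,)
    neg = torch.exp(sim_neg / temperature)                  # (batch, K)
    K = sim_neg.size(1)

    # PU correction: estimate the true-negative contribution by removing the
    # expected positive mass from the unlabeled similarities.
    neg_corrected = (neg.sum(dim=1) - K * tau_plus * pos) / (1.0 - tau_plus)

    # Clamp to the theoretical lower bound to keep the estimator non-negative.
    neg_corrected = torch.clamp(neg_corrected, min=K * torch.e ** (-1.0 / temperature))

    return -torch.log(pos / (pos + neg_corrected)).mean()
```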
- Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination [62.18768931714238]
We propose a novel False Negative Elimination (FNE) strategy to select negatives via sampling.
The results demonstrate the superiority of our proposed false negative elimination strategy.
arXiv Detail & Related papers (2023-08-08T16:31:43Z)
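The summary only says that negatives are selected "via sampling"; a hedged sketch of one such scheme follows, in which the most similar candidates (the most likely false negatives) are excluded and the remaining ones are sampled with softmax-weighted probabilities so that moderately hard negatives are preferred. This is an illustration, not the paper's exact sampling distribution.

```python
import numpy as np

def sample_negatives(similarities, num_negatives=8, skip_top=2, temperature=0.1, rng=None):
    """Sample negative indices for an anchor from a NumPy vector of similarity scores.

    The `skip_top` most similar candidates are excluded (they are the most likely
    to be false negatives); the rest are sampled with softmax-weighted probabilities
    so moderately hard negatives are preferred over easy ones.
    """
    rng = rng or np.random.default_rng()
    order = np.argsort(similarities)[::-1]          # most similar first
    candidates = order[skip_top:]                   # drop suspected false negatives
    logits = similarities[candidates] / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(candidates, size=num_negatives, replace=False, p=probs)
```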
- Making Pre-trained Language Models both Task-solvers and Self-calibrators [52.98858650625623]
Pre-trained language models (PLMs) serve as backbones for various real-world systems.
Previous work shows that introducing an extra calibration task can mitigate the miscalibration of their confidence estimates.
We propose a training algorithm, LM-TOAST, to tackle these challenges.
arXiv Detail & Related papers (2023-07-21T02:51:41Z)
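LM-TOAST itself is not described in the summary; purely as an illustration of what "an extra calibration task" can look like, the sketch below trains a confidence head to predict whether the main classifier is correct, alongside the usual task loss. The names and the weighting are assumptions, not the paper's design.

```python
import torch
import torch.nn.functional as F

def task_plus_calibration_loss(task_logits, labels, confidence_logits, calib_weight=0.5):
    """Joint objective: the usual classification loss plus a calibration task
    in which a confidence head predicts whether the classifier is correct.

    task_logits:       (batch, num_classes)
    labels:            (batch,)
    confidence_logits: (batch,) raw scores from a separate confidence head
    """
    task_loss = F.cross_entropy(task_logits, labels)

    # Calibration target: 1 if the main prediction is correct, else 0.
    correct = (task_logits.argmax(dim=-1) == labels).float()
    calib_loss = F.binary_cross_entropy_with_logits(confidence_logits, correct)

    return task_loss + calib_weight * calib_loss
```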
- Better Sampling of Negatives for Distantly Supervised Named Entity Recognition [39.264878763160766]
We propose a simple and straightforward approach that selects for training the top negative samples with high similarity to all the positive samples.
Our method achieves consistent performance improvements on four distantly supervised NER datasets.
arXiv Detail & Related papers (2023-05-22T15:35:39Z)
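A minimal sketch of the selection rule described above, assuming span embeddings are already available: each candidate negative is scored by its average cosine similarity to all positive spans, and the top-scoring candidates are kept for training. The embedding source and the mean aggregation are assumptions.

```python
import torch
import torch.nn.functional as F

def select_top_negatives(neg_embeddings, pos_embeddings, top_k=100):
    """Pick the negative candidates most similar to the positive set.

    neg_embeddings: (num_neg, dim) embeddings of candidate negative spans
    pos_embeddings: (num_pos, dim) embeddings of positive (entity) spans
    Returns the indices of the top_k selected negatives.
    """
    neg = F.normalize(neg_embeddings, dim=-1)
    pos = F.normalize(pos_embeddings, dim=-1)

    # Cosine similarity of every negative candidate to every positive span,
    # aggregated by the mean over positives.
    scores = (neg @ pos.t()).mean(dim=1)            # (num_neg,)

    top_k = min(top_k, scores.numel())
    return torch.topk(scores, k=top_k).indices
```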
- Improving Contrastive Learning of Sentence Embeddings with Case-Augmented Positives and Retrieved Negatives [17.90820242798732]
Unsupervised contrastive learning methods still lag far behind their supervised counterparts.
We propose switch-case augmentation to flip the case of the first letter of randomly selected words in a sentence.
For negative samples, we sample hard negatives from the whole dataset based on a pre-trained language model.
arXiv Detail & Related papers (2022-06-06T09:46:12Z)
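Switch-case augmentation as described is simple enough to sketch directly: flip the case of the first letter of randomly chosen words. The selection probability below is an assumed hyperparameter.

```python
import random

def switch_case_augment(sentence, flip_prob=0.15, rng=None):
    """Flip the case of the first letter of randomly selected words.

    E.g. "the Cat sat" may become "The cat sat"; the rest of each word is untouched.
    """
    rng = rng or random.Random()
    words = sentence.split()
    augmented = []
    for word in words:
        if word and word[0].isalpha() and rng.random() < flip_prob:
            word = word[0].swapcase() + word[1:]
        augmented.append(word)
    return " ".join(augmented)

# Example: produce a case-augmented positive for contrastive training.
print(switch_case_augment("Unsupervised contrastive learning of sentence embeddings"))
```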
- NoiER: An Approach for Training more Reliable Fine-Tuned Downstream Task Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z)
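The summary names the idea but not the objective; one plausible reading, sketched below, combines cross-entropy on in-distribution examples with an entropy term that pushes predictions on noise inputs toward the uniform distribution. How the noise samples are generated is not shown here and is an assumption.

```python
import torch
import torch.nn.functional as F

def noise_entropy_regularised_loss(id_logits, id_labels, noise_logits, reg_weight=1.0):
    """Cross-entropy on in-distribution data plus an entropy regulariser that
    encourages maximally uncertain (near-uniform) predictions on noise inputs.

    id_logits:    (batch, num_classes) logits for real training examples
    id_labels:    (batch,)
    noise_logits: (batch_noise, num_classes) logits for noise/OOD-like inputs
    """
    ce = F.cross_entropy(id_logits, id_labels)

    probs = F.softmax(noise_logits, dim=-1)
    log_probs = F.log_softmax(noise_logits, dim=-1)
    entropy = -(probs * log_probs).sum(dim=-1).mean()

    # Maximise entropy on noise => subtract it from the loss.
    return ce - reg_weight * entropy
```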
- Contrastive Learning with Adversarial Perturbations for Conditional Text Generation [49.055659008469284]
We propose a principled method to generate positive and negative samples for contrastive learning of seq2seq models.
Specifically, we generate negative examples by adding small perturbations to the input sequence to minimize its conditional likelihood.
We empirically show that our proposed method significantly improves the generalization of the seq2seq model on three text generation tasks.
arXiv Detail & Related papers (2020-12-14T06:20:27Z)
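A minimal sketch of the negative-generation recipe in the summary, assuming a HuggingFace-style seq2seq model whose forward call accepts inputs_embeds and labels and returns a .loss equal to the negative log-likelihood; the step size and L2 normalization are assumptions, not the paper's exact construction.

```python
import torch

def perturbed_negative_embeddings(model, src_embeds, tgt_tokens, epsilon=1e-2):
    """Build a 'negative' view of the input by a small perturbation that
    decreases the conditional likelihood of the target sequence.

    `model(inputs_embeds=..., labels=...)` is assumed to return an object with
    a `.loss` equal to the negative log-likelihood of `labels`.
    """
    src_embeds = src_embeds.detach().requires_grad_(True)
    nll = model(inputs_embeds=src_embeds, labels=tgt_tokens).loss
    grad, = torch.autograd.grad(nll, src_embeds)

    # Step in the direction that increases the NLL (decreases the likelihood),
    # with an L2-normalised perturbation of size epsilon.
    delta = epsilon * grad / (grad.norm(p=2) + 1e-12)
    return (src_embeds + delta).detach()
```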
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.