GAPX: Generalized Autoregressive Paraphrase-Identification X
- URL: http://arxiv.org/abs/2210.01979v1
- Date: Wed, 5 Oct 2022 01:23:52 GMT
- Title: GAPX: Generalized Autoregressive Paraphrase-Identification X
- Authors: Yifei Zhou, Renyu Li, Hayden Housen, Ser-Nam Lim
- Abstract summary: A major source of the performance drop under distribution shift comes from biases introduced by negative examples.
We introduce a perplexity-based out-of-distribution metric that we show can effectively and automatically determine how much weight the negative model should be given during inference.
- Score: 24.331570697458954
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Paraphrase Identification is a fundamental task in Natural Language
Processing. While much progress has been made in the field, the performance of
many state-of-the-art models often suffers from distribution shift at inference
time. We verify that a major source of this performance drop comes from biases
introduced by negative examples. To overcome these biases, we propose in this
paper to train two separate models, one that only utilizes the positive pairs
and the other the negative pairs. This gives us the option of deciding how much
to rely on the negative model, for which we introduce a perplexity-based
out-of-distribution metric that we show can effectively and automatically
determine how much weight it should be given during inference. We support our
findings with strong empirical results.
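To make the idea concrete, below is a minimal sketch of how inference might look under this scheme. It is an illustration under stated assumptions, not the paper's exact formulation: the `pos_model`/`neg_model` objects, their `score()` interface, the GPT-2 perplexity scorer, the reference perplexity value, and the linear blending rule are all placeholders introduced here for the example.

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Illustrative sketch only: `pos_model` and `neg_model` stand for the two
# classifiers described in the abstract (one trained only on positive pairs,
# one only on negative pairs); their training and the
# `score(s1, s2) -> P(paraphrase)` interface are assumptions for this example.

def perplexity(lm, tokenizer, text, device="cpu"):
    """Perplexity of `text` under an autoregressive LM, used as an OOD signal."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss  # mean per-token negative log-likelihood
    return math.exp(loss.item())

def combined_score(pos_model, neg_model, neg_weight, s1, s2):
    """Blend the two models; `neg_weight` in [0, 1] scales the negative model.

    The convex combination below is an illustrative choice: the more
    out-of-distribution the input looks, the less the bias-prone
    negative-pair model should count.
    """
    p_pos = pos_model.score(s1, s2)  # positive-only model's paraphrase probability
    p_neg = neg_model.score(s1, s2)  # negative-only model's paraphrase probability
    return neg_weight * p_neg + (1.0 - neg_weight) * p_pos

# Turning perplexity into a weight: compare the test input's perplexity with a
# reference value estimated on the training data (35.0 is a placeholder).
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2TokenizerFast.from_pretrained("gpt2")
train_ppl = 35.0
test_ppl = perplexity(lm, tok, "A sentence pair drawn from the test distribution.")
neg_weight = min(1.0, train_ppl / max(test_ppl, 1e-6))  # shrink toward 0 when far OOD
```

The only element taken from the abstract is that a perplexity-style out-of-distribution signal decides how heavily the negative-pair model is weighted at inference; everything else above is placeholder detail.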
Related papers
- Forcing Diffuse Distributions out of Language Models [70.28345569190388]
Despite being trained specifically to follow user instructions, today's instruction-tuned language models perform poorly when instructed to produce random outputs.
We propose a fine-tuning method that encourages language models to output distributions that are diffuse over valid outcomes.
arXiv Detail & Related papers (2024-04-16T19:17:23Z)
- Reducing the Vision and Language Bias for Temporal Sentence Grounding [22.571577672704716]
We propose a Debiasing-TSG (D-TSG) model to filter and remove the negative biases in both vision and language modalities.
We demonstrate its effectiveness by achieving the state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2022-07-27T11:18:45Z)
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margin in few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- Right for the Right Latent Factors: Debiasing Generative Models via Disentanglement [20.41752850243945]
A key assumption of most statistical machine learning methods is that they have access to independent samples from the distribution of the data they will encounter at test time.
In particular, machine learning models have been shown to exhibit Clever-Hans-like behaviour, meaning that spurious correlations in the training set are inadvertently learnt.
We propose to debias generative models by disentangling their internal representations, which is achieved via human feedback.
arXiv Detail & Related papers (2022-02-01T13:16:18Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
- Mitigating Catastrophic Forgetting in Scheduled Sampling with Elastic Weight Consolidation in Neural Machine Translation [15.581515781839656]
Autoregressive models trained with maximum likelihood estimation suffer from exposure bias.
We propose using Elastic Weight Consolidation as a trade-off between mitigating exposure bias and retaining output quality.
Experiments on two IWSLT'14 translation tasks demonstrate that our approach alleviates catastrophic forgetting and significantly improves BLEU.
arXiv Detail & Related papers (2021-09-13T20:37:58Z)
- A Generative Approach for Mitigating Structural Biases in Natural Language Inference [24.44419010439227]
In this work, we reformulate the NLI task as a generative task, where a model is conditioned on the biased subset of the input and the label.
We show that this approach is highly robust to large amounts of bias.
We find that generative models are difficult to train and they generally perform worse than discriminative baselines.
arXiv Detail & Related papers (2021-08-31T17:59:45Z)
- Understanding Hard Negatives in Noise Contrastive Estimation [21.602701327267905]
We develop analytical tools to understand the role of hard negatives.
We derive a general form of the score function that unifies various architectures used in text retrieval.
arXiv Detail & Related papers (2021-04-13T14:42:41Z)
- When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models [59.46552488974247]
This paper addresses whether an is-a relationship exists between words (x, y) with the help of large textual corpora.
Recent studies suggest that pattern-based methods are superior when large-scale Hearst pairs are extracted and fed in, provided the sparsity of unseen (x, y) pairs is relieved.
This paper is the first to quantify the non-negligible prevalence of such cases. We also demonstrate that distributional methods are well suited to compensate for pattern-based ones in these cases.
arXiv Detail & Related papers (2020-10-10T08:34:19Z)
- Decision-Making with Auto-Encoding Variational Bayes [71.44735417472043]
We show that a posterior approximation distinct from the variational distribution should be used for making decisions.
Motivated by these theoretical results, we propose learning several approximate proposals for the best model.
In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing.
arXiv Detail & Related papers (2020-02-17T19:23:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.