Memorization vs. Generalization: Quantifying Data Leakage in NLP Performance Evaluation
- URL: http://arxiv.org/abs/2102.01818v1
- Date: Wed, 3 Feb 2021 00:58:45 GMT
- Title: Memorization vs. Generalization: Quantifying Data Leakage in NLP Performance Evaluation
- Authors: Aparna Elangovan, Jiayuan He, Karin Verspoor
- Abstract summary: Public datasets are often used to evaluate the efficacy and generalizability of state-of-the-art methods for many tasks in natural language processing (NLP).
The presence of overlap between the train and test datasets can lead to inflated results, inadvertently evaluating the model's ability to memorize and interpreting it as the ability to generalize.
We identify leakage of training data into test data on several publicly available datasets used to evaluate NLP tasks, including named entity recognition and relation extraction, and study them to assess the impact of that leakage on the model's ability to memorize versus generalize.
- Score: 4.98030422694461
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Public datasets are often used to evaluate the efficacy and generalizability of state-of-the-art methods for many tasks in natural language processing (NLP). However, the presence of overlap between the train and test datasets can lead to inflated results, inadvertently evaluating the model's ability to memorize and interpreting it as the ability to generalize. In addition, such datasets may not provide an effective indicator of the performance of these methods in real-world scenarios. We identify leakage of training data into test data on several publicly available datasets used to evaluate NLP tasks, including named entity recognition and relation extraction, and study them to assess the impact of that leakage on the model's ability to memorize versus generalize.
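The paper measures such overlap empirically; as a rough illustration of the kind of lexical check involved, here is a minimal n-gram overlap sketch. The function names, the n-gram length, and the whitespace tokenization are illustrative assumptions, not the paper's actual procedure:

```python
def ngrams(text, n=8):
    # Set of word n-grams under naive whitespace tokenization.
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def leakage_rate(train_texts, test_texts, n=8):
    # Fraction of test examples sharing at least one word n-gram with
    # the training set -- a crude lexical-leakage signal.
    train_grams = set()
    for t in train_texts:
        train_grams |= ngrams(t, n)
    leaked = sum(1 for t in test_texts if ngrams(t, n) & train_grams)
    return leaked / max(len(test_texts), 1)
```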
Related papers
- Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions [20.51842378080194]
Large language models (LLMs) have demonstrated great performance across various benchmarks, showing potential as general-purpose task solvers.
A significant concern in their evaluation is data contamination, where overlap between training data and evaluation datasets inflates performance assessments.
We systematically review 47 papers on data contamination detection, categorize the underlying assumptions, and assess whether they have been rigorously validated.
arXiv Detail & Related papers (2024-10-24T17:58:22Z)
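One family of detection assumptions covered by surveys like this is likelihood-based. A sketch of the Min-K% Prob heuristic, assuming token log-probabilities come from scoring the text with the LLM under test:

```python
def min_k_percent_score(token_logprobs, k=0.2):
    # Average the k% lowest token log-probabilities; unusually high
    # values suggest the text may have been seen during training.
    k_count = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:k_count]
    return sum(lowest) / k_count
```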
- How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics [49.9329723199239]
We propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples.
We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics.
When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset.
arXiv Detail & Related papers (2024-10-04T13:39:21Z)
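A well-known way to exploit training dynamics is dataset-cartography-style statistics; a minimal sketch, with no claim that this matches the paper's exact characterization:

```python
import numpy as np

def training_dynamics_stats(prob_history):
    # prob_history[e][i]: probability assigned to the gold label of
    # example i at epoch e. Low confidence tends to mark hard examples.
    probs = np.asarray(prob_history)   # shape: (epochs, examples)
    confidence = probs.mean(axis=0)    # mean gold-label probability
    variability = probs.std(axis=0)    # spread across epochs
    return confidence, variability
```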
- Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks, and examine the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z)
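A minimal sketch of a similarity-based MI signal: members are hypothesized to yield generated summaries closer to their references. `generate` is a hypothetical callable wrapping the summarizer, and ROUGE-L stands in for whatever similarity measure the attack actually uses:

```python
def lcs_len(a, b):
    # Longest common subsequence length via dynamic programming.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def membership_signal(generate, document, reference_summary):
    # ROUGE-L F1 between the generated and reference summaries.
    c, r = generate(document).split(), reference_summary.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    p, rec = lcs / len(c), lcs / len(r)
    return 2 * p * rec / (p + rec)
```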
- On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework, which induces different responses to UAPs from normal and adversarial samples.
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z)
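A sketch of the general detection idea under stated assumptions: a Hugging-Face-style classifier accepting `inputs_embeds`, a precomputed UAP tensor, and KL divergence as the response measure, none of which come from the paper:

```python
import torch

def uap_response(model, embeddings, uap, eps=1.0):
    # Compare output distributions before and after adding a universal
    # adversarial perturbation; adversarial inputs are hypothesized to
    # respond differently from clean ones, so this score can be thresholded.
    with torch.no_grad():
        p = torch.softmax(model(inputs_embeds=embeddings).logits, dim=-1)
        q = torch.softmax(model(inputs_embeds=embeddings + eps * uap).logits, dim=-1)
    return (p * (p / q.clamp_min(1e-9)).log()).sum(dim=-1)  # KL(p || q)
```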
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
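A minimal sketch of the inverse-dynamics piece; a discrete action space and a simple concat-and-classify head are assumptions, and the reinforcement-learning half is omitted:

```python
import torch
import torch.nn as nn

class InverseDynamicsHead(nn.Module):
    # Predict the action taken between consecutive observations from
    # their encoded features, pushing the encoder to keep action-relevant
    # information in its representation.
    def __init__(self, feat_dim, num_actions):
        super().__init__()
        self.classifier = nn.Linear(2 * feat_dim, num_actions)

    def forward(self, feat_t, feat_t1):
        return self.classifier(torch.cat([feat_t, feat_t1], dim=-1))

def inverse_dynamics_loss(encoder, obs_t, obs_t1, actions, head):
    logits = head(encoder(obs_t), encoder(obs_t1))
    return nn.functional.cross_entropy(logits, actions)
```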
- Data Valuation Without Training of a Model [8.89493507314525]
We propose a training-free data valuation score, called the complexity-gap score, to quantify the influence of individual instances on the generalization of neural networks.
The proposed score can quantify the irregularity of instances and measure how much each data instance contributes to the total movement of the network parameters during training.
arXiv Detail & Related papers (2023-01-03T02:19:20Z)
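The complexity-gap score has its own definition in the paper; as a deliberately loose stand-in for "contribution to parameter movement", here is a gradient-norm-at-initialization proxy, explicitly not the paper's score:

```python
import torch

def grad_norm_at_init(model, loss_fn, x, y):
    # NOT the complexity-gap score: a rough proxy measuring how hard a
    # single instance would pull on the untrained network's parameters.
    model.zero_grad()
    loss_fn(model(x), y).backward()
    total = sum(p.grad.pow(2).sum().item()
                for p in model.parameters() if p.grad is not None)
    return total ** 0.5
```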
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
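A minimal sketch of cluster-level pseudo-labelling; k-means over target features and majority voting with the source model's predictions are assumptions about the general recipe, not the paper's exact pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_pseudo_labels(target_feats, source_preds, num_classes):
    # Cluster target-domain features, then give every member of a cluster
    # the label the source model predicts most often inside that cluster;
    # cluster-level voting smooths noisy per-sample predictions.
    clusters = KMeans(n_clusters=num_classes, n_init=10).fit_predict(target_feats)
    labels = np.empty(len(target_feats), dtype=int)
    for c in range(num_classes):
        members = clusters == c
        if members.any():
            votes = np.bincount(source_preds[members], minlength=num_classes)
            labels[members] = votes.argmax()
    return labels
```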
- On Generalisability of Machine Learning-based Network Intrusion Detection Systems [0.0]
In this paper, we evaluate seven supervised and unsupervised learning models on four benchmark NIDS datasets.
Our investigation indicates that none of the considered models is able to generalise over all studied datasets.
It also indicates that, overall, unsupervised learning methods generalise better than supervised learning models in the scenarios we consider.
arXiv Detail & Related papers (2022-05-09T08:26:48Z)
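A sketch of the cross-dataset evaluation grid such a study implies; scikit-learn-style estimators and the dataset/metric layout are assumptions:

```python
def cross_dataset_eval(models, datasets, metric):
    # models: name -> estimator factory; datasets: name -> (X_train,
    # y_train, X_test, y_test). Training on one dataset and testing on
    # every other exposes generalisation gaps that within-dataset
    # evaluation hides.
    results = {}
    for m_name, make_model in models.items():
        for train_name, (Xtr, ytr, _, _) in datasets.items():
            model = make_model().fit(Xtr, ytr)
            for test_name, (_, _, Xte, yte) in datasets.items():
                results[(m_name, train_name, test_name)] = metric(yte, model.predict(Xte))
    return results
```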
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We conduct a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
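A minimal sketch of the instance-attribution half, using gradient-similarity ranking as a cheap stand-in for influence; the feature-attribution half and the paper's exact combination are not reproduced here:

```python
import torch

def grad_vector(model, loss_fn, x, y):
    # Flattened loss gradient for a single example.
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()
                      if p.grad is not None]).detach().clone()

def rank_training_examples(model, loss_fn, test_x, test_y, train_set):
    # Score each training example by gradient similarity to the test
    # example; high-scoring examples are candidates for artifact inspection.
    g_test = grad_vector(model, loss_fn, test_x, test_y)
    return [torch.dot(g_test, grad_vector(model, loss_fn, x, y)).item()
            for x, y in train_set]
```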
- Regularizing Models via Pointwise Mutual Information for Named Entity Recognition [17.767466724342064]
We propose a Pointwise Mutual Information (PMI) based regularizer to enhance generalization ability while also improving in-domain performance.
Our approach debiases highly correlated words and labels in the benchmark datasets.
For long-named and complex-structured entities, our method can still predict them by debiasing conjunctions and special characters.
arXiv Detail & Related papers (2021-04-15T05:47:27Z)
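The statistic this regularizer builds on can be sketched directly; a minimal token/label PMI computation over a token-labelled corpus, where whitespace tokens and this exact counting scheme are illustrative assumptions:

```python
import math
from collections import Counter

def token_label_pmi(examples):
    # examples: iterable of (tokens, labels) pairs. High-PMI pairs are
    # word/label correlations a tagger can exploit by memorization
    # rather than by reading the context.
    tok_c, lab_c, pair_c, total = Counter(), Counter(), Counter(), 0
    for tokens, labels in examples:
        for tok, lab in zip(tokens, labels):
            tok_c[tok] += 1
            lab_c[lab] += 1
            pair_c[(tok, lab)] += 1
            total += 1
    return {(t, l): math.log(pair_c[(t, l)] * total / (tok_c[t] * lab_c[l]))
            for (t, l) in pair_c}
```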
- Learning Unbiased Representations via Mutual Information Backpropagation [36.383338079229695]
In particular, we consider the case where some attributes of the data (biases), if learned by the model, can severely compromise its generalization properties.
We propose a novel end-to-end optimization strategy, which simultaneously estimates and minimizes the mutual information between the learned representation and the data attributes.
arXiv Detail & Related papers (2020-03-13T18:06:31Z)
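One concrete way to realize "simultaneously estimate and minimize" is a MINE-style estimator trained adversarially against the encoder; a sketch where the architecture and the Donsker-Varadhan bound are standard choices, not necessarily the paper's:

```python
import torch
import torch.nn as nn

class MINEEstimator(nn.Module):
    # Lower-bounds I(representation z; bias attribute a). The statistics
    # network maximizes the bound; backpropagating its negative through
    # the encoder pushes representations to drop the biased attribute.
    def __init__(self, feat_dim, attr_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + attr_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, z, a):
        joint = self.net(torch.cat([z, a], dim=-1)).mean()
        marginal = self.net(torch.cat([z, a[torch.randperm(a.size(0))]], dim=-1))
        return joint - marginal.exp().mean().log()  # Donsker-Varadhan bound
```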
This list is automatically generated from the titles and abstracts of the papers on this site.