Training on the Test Model: Contamination in Ranking Distillation
- URL: http://arxiv.org/abs/2411.02284v1
- Date: Mon, 04 Nov 2024 17:11:14 GMT
- Title: Training on the Test Model: Contamination in Ranking Distillation
- Authors: Vishakha Suresh Kalal, Andrew Parry, Sean MacAvaney
- Abstract summary: We investigate the effect of a contaminated teacher model in a distillation setting.
We find that contamination occurs even when the test data represents a small fraction of the teacher's training samples.
- Score: 14.753216172912968
- License:
- Abstract: Neural approaches to ranking based on pre-trained language models are highly effective in ad-hoc search. However, the computational expense of these models can limit their application. As such, a process known as knowledge distillation is frequently applied to allow a smaller, efficient model to learn from an effective but expensive model. A key example of this is the distillation of expensive API-based commercial Large Language Models into smaller production-ready models. However, due to the opacity of training data and processes of most commercial models, one cannot ensure that a chosen test collection has not been observed previously, creating the potential for inadvertent data contamination. We, therefore, investigate the effect of a contaminated teacher model in a distillation setting. We evaluate several distillation techniques to assess the degree to which contamination occurs during distillation. By simulating a "worst-case" setting where the degree of contamination is known, we find that contamination occurs even when the test data represents a small fraction of the teacher's training samples. We, therefore, encourage caution when training using black-box teacher models where data provenance is ambiguous.
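To make the distillation setting concrete, below is a minimal, hypothetical sketch of score-based ranking distillation in the MarginMSE style: a small student ranker is trained to reproduce score margins produced by an expensive teacher. The tiny hashing encoder, the two triples, and the teacher scores are illustrative stand-ins rather than the paper's models or data; if queries or documents from the test collection appear among the teacher-scored triples, memorised relevance judgements can leak into the student in exactly this way.

```python
# Sketch of score-based ranking distillation (MarginMSE-style), assuming teacher scores
# for (query, positive, negative) triples are already available, e.g. from an API-based LLM reranker.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def featurize(query: str, doc: str, dim: int = 256) -> torch.Tensor:
    """Hash query/doc tokens into a fixed-size bag-of-words vector (toy stand-in for a real encoder)."""
    v = torch.zeros(dim)
    for tok in (query + " " + doc).lower().split():
        v[hash(tok) % dim] += 1.0
    return v

student = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))  # small, efficient ranker
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

# Hypothetical training triples with teacher relevance scores (illustrative numbers only).
triples = [
    ("what causes rain", "rain forms when water vapour condenses", "the stock market rose today", 9.1, 1.2),
    ("python list sort", "use list.sort() or sorted() in python", "recipes for sourdough bread", 8.7, 0.4),
]

for epoch in range(100):
    for query, pos_doc, neg_doc, t_pos, t_neg in triples:
        s_pos = student(featurize(query, pos_doc))
        s_neg = student(featurize(query, neg_doc))
        # MarginMSE: match the student's score margin to the teacher's margin for each triple.
        loss = F.mse_loss(s_pos - s_neg, torch.tensor([t_pos - t_neg]))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```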
Related papers
- Towards Training One-Step Diffusion Models Without Distillation [72.80423908458772]
We show that one-step generative models can be trained directly without this distillation process.
We propose a family of distillation methods that achieve competitive results without relying on score estimation.
arXiv Detail & Related papers (2025-02-11T23:02:14Z)
- Inference-Time Diffusion Model Distillation [59.350789627086456]
We introduce Distillation++, a novel inference-time distillation framework.
Inspired by recent advances in conditional sampling, our approach recasts student model sampling as a proximal optimization problem.
We integrate distillation optimization during reverse sampling, which can be viewed as teacher guidance.
arXiv Detail & Related papers (2024-12-12T02:07:17Z)
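As a rough illustration of the idea in the entry above (teacher guidance folded into reverse sampling), here is a schematic toy in plain PyTorch: an iterative refinement loop in which the student's denoised estimate is blended toward the teacher's estimate at every step, a proximal-style update. The two "denoisers", the target vector, and the guidance weight are invented stand-ins; this is not Distillation++ or a real diffusion sampler, only the shape of the computation.

```python
import torch

torch.manual_seed(0)
target = torch.tensor([1.0, -2.0, 0.5])        # hypothetical clean signal the samplers move toward

def teacher_denoise(x_t):
    return x_t + 0.9 * (target - x_t)          # stand-in for an expensive, accurate teacher denoiser

def student_denoise(x_t):
    return x_t + 0.4 * (target - x_t)          # stand-in for a cheap, weaker student denoiser

x = torch.randn(3)                             # start reverse sampling from noise
steps, guidance = 8, 0.5                       # guidance weight plays the role of the proximal strength
for t in reversed(range(steps)):
    x0_student = student_denoise(x)
    x0_teacher = teacher_denoise(x)
    # Proximal-style blend: stay close to the student's proposal while being pulled toward the teacher.
    x0 = (1.0 - guidance) * x0_student + guidance * x0_teacher
    x = x0 + (t / steps) * 0.1 * torch.randn(3)  # re-noise (less and less) for the next, earlier step
print(x)                                        # ends close to `target` thanks to the teacher guidance
```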
- Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation [2.4173424114751114]
We propose a novel method to quantify contamination without access to the full training set.
Our analysis provides evidence of significant memorisation by recent foundation models on popular reading comprehension and summarisation benchmarks, while multiple-choice benchmarks appear less contaminated. A minimal perplexity probe in this spirit is sketched after this entry.
arXiv Detail & Related papers (2023-09-19T15:02:58Z)
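A minimal sketch of the perplexity probe described in the entry above, under the assumption that we can score text with the model under test; the small open checkpoint gpt2 stands in for a recent foundation model, and the two example passages are illustrative only. Text memorised during training tends to receive conspicuously low perplexity compared with matched fresh text.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                                   # stand-in for the model being audited
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def perplexity(text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])   # loss = mean token-level negative log-likelihood
    return math.exp(out.loss.item())

benchmark_passage = "It was the best of times, it was the worst of times."              # likely seen in training
fresh_passage = "Quokka forecasts umbrella tariffs beneath seventeen lavender modems."  # unlikely seen

print("benchmark:", perplexity(benchmark_passage))
print("fresh:    ", perplexity(fresh_passage))
# A much lower perplexity on benchmark text than on matched fresh text is (weak) evidence of memorisation.
```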
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers [49.79405257763856]
This paper focuses on task-agnostic distillation.
It produces a compact pre-trained model that can be easily fine-tuned on various tasks with small computational costs and memory footprints.
We propose Homotopic Distillation (HomoDistil), a novel task-agnostic distillation approach equipped with iterative pruning.
arXiv Detail & Related papers (2023-02-19T17:37:24Z)
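The sketch below illustrates the general recipe named in the entry above, distillation combined with iterative pruning, rather than HomoDistil itself: a toy student initialised as a copy of the teacher is distilled on unlabeled inputs while its smallest-magnitude weights are zeroed on a growing schedule. The network sizes, the magnitude-pruning rule, and the schedule are assumptions for illustration; the paper's actual importance criterion and losses differ.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 8)).eval()
student = copy.deepcopy(teacher)                     # assumption: start the student as a copy of the teacher
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def prune_smallest(model: nn.Module, fraction: float) -> None:
    """Zero the given fraction of weights with the smallest magnitude (toy pruning criterion)."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, nn.Linear):
                w = module.weight
                k = int(w.numel() * fraction)
                if k == 0:
                    continue
                threshold = w.abs().flatten().kthvalue(k).values
                w.mul_((w.abs() > threshold).float())

for epoch in range(5):                               # iterative schedule: distil, then prune a bit more
    for _ in range(100):
        x = torch.randn(16, 32)                      # unlabeled inputs: task-agnostic distillation
        with torch.no_grad():
            t_out = teacher(x)
        loss = F.mse_loss(student(x), t_out)         # match teacher outputs (hidden-state losses omitted)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    prune_smallest(student, fraction=0.1 * (epoch + 1))  # note: zeroed weights may regrow here; a real
                                                         # implementation would keep a persistent mask
```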
- Recognition of Defective Mineral Wool Using Pruned ResNet Models [88.24021148516319]
We developed a visual quality control system for mineral wool.
X-ray images of wool specimens were collected to create a training set of defective and non-defective samples.
We obtained a model with more than 98% accuracy which, compared to the procedure currently used at the company, can recognize 20% more defective products.
arXiv Detail & Related papers (2022-11-01T13:58:02Z)
- Robust and Resource-Efficient Data-Free Knowledge Distillation by Generative Pseudo Replay [5.3330804968579795]
Data-Free Knowledge Distillation (KD) allows knowledge transfer from a trained neural network (teacher) to a more compact one (student) in the absence of original training data.
Existing works use a validation set to monitor the accuracy of the student over real data and report the highest performance throughout the entire process.
However, validation data may not be available at distillation time either, making it infeasible to record the student snapshot that achieved the peak accuracy.
This is challenging because the student experiences knowledge degradation due to the distribution shift of the synthetic data.
We propose to model the distribution of the previously observed synthetic samples (generative pseudo replay). A simplified data-free distillation loop with replay is sketched after this entry.
arXiv Detail & Related papers (2022-01-09T14:14:28Z)
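The entry above defines data-free knowledge distillation; the sketch below shows one common formulation of it (an adversarially trained generator synthesises inputs, and the student matches the teacher on them), plus replay of earlier synthetic batches to counter the distribution shift mentioned above. The tiny networks are stand-ins, the adversarial generator objective is a standard data-free KD choice rather than a detail taken from this paper, and the plain replay buffer simplifies the paper's generative modelling of past samples.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
teacher = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()  # toy stand-in for a trained teacher
for p in teacher.parameters():
    p.requires_grad_(False)
student = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))         # smaller student
generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16))       # maps noise -> pseudo inputs

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
replay = []  # simple buffer standing in for a generative model of previously observed synthetic samples

for step in range(200):
    # Generator step: synthesise inputs on which student and teacher disagree most.
    x_new = generator(torch.randn(32, 8))
    opt_g.zero_grad()
    disagreement = F.kl_div(F.log_softmax(student(x_new), dim=-1),
                            F.softmax(teacher(x_new), dim=-1), reduction="batchmean")
    (-disagreement).backward()
    opt_g.step()

    # Student step: distil on fresh synthetic data plus a replayed earlier batch,
    # which counteracts forgetting as the generator's distribution shifts.
    with torch.no_grad():
        x_fresh = generator(torch.randn(32, 8))
    batches = [x_fresh] + ([random.choice(replay)] if replay else [])
    opt_s.zero_grad()
    loss = sum(F.kl_div(F.log_softmax(student(x), dim=-1),
                        F.softmax(teacher(x), dim=-1), reduction="batchmean")
               for x in batches)
    loss.backward()
    opt_s.step()
    replay.append(x_fresh)
```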
- Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn.
We then show that distillation performs strongly for low churn training against a number of recent baselines.
arXiv Detail & Related papers (2021-06-04T18:03:31Z)
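To illustrate the connection stated in the entry above, here is a minimal, hypothetical sketch: an updated model is trained on labels with an added distillation term that uses the base (production) model as the teacher, and predictive churn is then measured as the fraction of examples whose predicted class flips relative to the base model. The toy data, linear models, and the single mixing weight alpha are assumptions for illustration, not the paper's formulation.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(256, 10)
y = (X[:, 0] + 0.3 * torch.randn(256) > 0).long()       # toy binary labels

base = nn.Linear(10, 2)                                  # "base" model already in production (toy)
opt = torch.optim.Adam(base.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad(); F.cross_entropy(base(X), y).backward(); opt.step()

new = copy.deepcopy(base)                                # retrained model, distilling from the base model
opt = torch.optim.Adam(new.parameters(), lr=1e-2)
alpha = 0.5                                              # weight on the distillation (anti-churn) term
for _ in range(200):
    opt.zero_grad()
    ce = F.cross_entropy(new(X), y)
    kd = F.kl_div(F.log_softmax(new(X), dim=-1),
                  F.softmax(base(X).detach(), dim=-1), reduction="batchmean")
    ((1 - alpha) * ce + alpha * kd).backward()
    opt.step()

with torch.no_grad():
    churn = (new(X).argmax(-1) != base(X).argmax(-1)).float().mean()
print(f"predictive churn vs. base model: {churn:.2%}")   # fraction of examples whose prediction flipped
```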
This list is automatically generated from the titles and abstracts of the papers on this site.