Self-Supervised Dataset Distillation for Transfer Learning
- URL: http://arxiv.org/abs/2310.06511v3
- Date: Fri, 12 Apr 2024 01:53:33 GMT
- Title: Self-Supervised Dataset Distillation for Transfer Learning
- Authors: Dong Bok Lee, Seanie Lee, Joonho Ko, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang,
- Abstract summary: We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL)
We first prove that a gradient of synthetic samples with respect to a SSL objective in naive bilevel optimization is textitbiased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
- Score: 77.4714995131992
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dataset distillation methods have achieved remarkable success in distilling a large dataset into a small set of representative samples. However, they are not designed to produce a distilled dataset that can be effectively used for facilitating self-supervised pre-training. To this end, we propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL). We first prove that a gradient of synthetic samples with respect to a SSL objective in naive bilevel optimization is \textit{biased} due to the randomness originating from data augmentations or masking. To address this issue, we propose to minimize the mean squared error (MSE) between a model's representations of the synthetic examples and their corresponding learnable target feature representations for the inner objective, which does not introduce any randomness. Our primary motivation is that the model obtained by the proposed inner optimization can mimic the \textit{self-supervised target model}. To achieve this, we also introduce the MSE between representations of the inner model and the self-supervised target model on the original full dataset for outer optimization. Lastly, assuming that a feature extractor is fixed, we only optimize a linear head on top of the feature extractor, which allows us to reduce the computational cost and obtain a closed-form solution of the head with kernel ridge regression. We empirically validate the effectiveness of our method on various applications involving transfer learning.
Related papers
- Distributionally Robust Optimization as a Scalable Framework to Characterize Extreme Value Distributions [22.765095010254118]
The goal of this paper is to develop distributionally robust optimization (DRO) estimators, specifically for multidimensional Extreme Value Theory (EVT) statistics.
In order to mitigate over-conservative estimates while enhancing out-of-sample performance, we study DRO estimators informed by semi-parametric max-stable constraints in the space of point processes.
Both approaches are validated using synthetically generated data, recovering prescribed characteristics, and verifying the efficacy of the proposed techniques.
arXiv Detail & Related papers (2024-07-31T19:45:27Z) - Aligning Large Language Models with Self-generated Preference Data [72.99676237703099]
We propose a new framework that boosts the alignment of large language models (LLMs) with human preferences.
Our key idea is leveraging the human prior knowledge within the small (seed) data.
We introduce a noise-aware preference learning algorithm to mitigate the risk of low quality within generated preference data.
arXiv Detail & Related papers (2024-06-06T18:01:02Z) - Outlier Gradient Analysis: Efficiently Identifying Detrimental Training Samples for Deep Learning Models [36.05242956018461]
In this paper, we establish a bridge between identifying detrimental training samples via influence functions and outlier gradient detection.
We first validate the hypothesis of our proposed outlier gradient analysis approach on synthetic datasets.
We then demonstrate its effectiveness in detecting mislabeled samples in vision models and selecting data samples for improving performance of natural language processing transformer models.
arXiv Detail & Related papers (2024-05-06T21:34:46Z) - Soft Preference Optimization: Aligning Language Models to Expert Distributions [40.84391304598521]
SPO is a method for aligning generative models, such as Large Language Models (LLMs), with human preferences.
SPO integrates preference loss with a regularization term across the model's entire output distribution.
We showcase SPO's methodology, its theoretical foundation, and its comparative advantages in simplicity, computational efficiency, and alignment precision.
arXiv Detail & Related papers (2024-04-30T19:48:55Z) - Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z) - Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z) - Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z) - Conservative Objective Models for Effective Offline Model-Based
Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
COMs are simple to implement and outperform a number of existing methods on a wide range of MBO problems.
arXiv Detail & Related papers (2021-07-14T17:55:28Z) - Model-based Policy Optimization with Unsupervised Model Adaptation [37.09948645461043]
We investigate how to bridge the gap between real and simulated data due to inaccurate model estimation for better policy optimization.
We propose a novel model-based reinforcement learning framework AMPO, which introduces unsupervised model adaptation.
Our approach achieves state-of-the-art performance in terms of sample efficiency on a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2020-10-19T14:19:42Z) - Neural Model-based Optimization with Right-Censored Observations [42.530925002607376]
Neural networks (NNs) have been demonstrated to work well at the core of model-based optimization procedures.
We show that our trained regression models achieve a better predictive quality than several baselines.
arXiv Detail & Related papers (2020-09-29T07:32:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.