Self-Supervised Dataset Distillation for Transfer Learning
- URL: http://arxiv.org/abs/2310.06511v3
- Date: Fri, 12 Apr 2024 01:53:33 GMT
- Title: Self-Supervised Dataset Distillation for Transfer Learning
- Authors: Dong Bok Lee, Seanie Lee, Joonho Ko, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang,
- Abstract summary: We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL)
We first prove that a gradient of synthetic samples with respect to a SSL objective in naive bilevel optimization is textitbiased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
- Score: 77.4714995131992
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dataset distillation methods have achieved remarkable success in distilling a large dataset into a small set of representative samples. However, they are not designed to produce a distilled dataset that can be effectively used for facilitating self-supervised pre-training. To this end, we propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL). We first prove that a gradient of synthetic samples with respect to a SSL objective in naive bilevel optimization is \textit{biased} due to the randomness originating from data augmentations or masking. To address this issue, we propose to minimize the mean squared error (MSE) between a model's representations of the synthetic examples and their corresponding learnable target feature representations for the inner objective, which does not introduce any randomness. Our primary motivation is that the model obtained by the proposed inner optimization can mimic the \textit{self-supervised target model}. To achieve this, we also introduce the MSE between representations of the inner model and the self-supervised target model on the original full dataset for outer optimization. Lastly, assuming that a feature extractor is fixed, we only optimize a linear head on top of the feature extractor, which allows us to reduce the computational cost and obtain a closed-form solution of the head with kernel ridge regression. We empirically validate the effectiveness of our method on various applications involving transfer learning.
Related papers
- SAIL: Self-Amplified Iterative Learning for Diffusion Model Alignment with Minimal Human Feedback [19.637094881784634]
We propose textbfSAIL (textbfSelf-textbfAmplified textbfIterative textbfLearning), a novel framework that enables diffusion models to act as their own teachers through iterative self-improvement.
arXiv Detail & Related papers (2026-02-05T06:58:38Z) - Dataset Distillation for Pre-Trained Self-Supervised Vision Models [43.50190223507616]
dataset distillation aims to find a small set of synthetic images such that training a model on them reproduces the performance of the same model trained on a much larger dataset of real samples.<n>We introduce a method of dataset distillation for this task called Linear Gradient Matching.<n>Our method yields synthetic data that outperform all real-image baselines and, remarkably, generalize across pre-trained vision models.
arXiv Detail & Related papers (2025-11-20T18:59:57Z) - Heterogeneous Self-Supervised Acoustic Pre-Training with Local Constraints [64.15709757611369]
We propose a new self-supervised pre-training approach to dealing with heterogeneous data.<n>The proposed approach can significantly improve the adaptivity of the self-supervised pre-trained model for the downstream supervised fine-tuning tasks.
arXiv Detail & Related papers (2025-08-27T15:48:50Z) - CART-based Synthetic Tabular Data Generation for Imbalanced Regression [1.342834401139078]
We propose adapting an existing CART-based synthetic data generation method, tailoring it for imbalanced regression.<n>The new method integrates relevance and density-based mechanisms to guide sampling in sparse regions of the target space.<n>Our experimental study focuses on the prediction of extreme target values across benchmark datasets.
arXiv Detail & Related papers (2025-06-03T12:42:20Z) - Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model performance.<n>This paper addresses the question of how to optimally combine the model's predictions and the provided labels.<n>Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z) - Dataset Distillation as Pushforward Optimal Quantization [2.5892916589735457]
We propose a synthetic training set that achieves similar performance to training on real data, with orders of magnitude less computational requirements.<n>In particular, we link existing disentangled dataset distillation methods to the classical optimal quantization and Wasserstein barycenter problems.<n>We achieve better performance and inter-model generalization on the ImageNet-1K dataset with trivial additional computation, and SOTA performance in higher image-per-class settings.
arXiv Detail & Related papers (2025-01-13T20:41:52Z) - Distributionally Robust Optimization as a Scalable Framework to Characterize Extreme Value Distributions [22.765095010254118]
The goal of this paper is to develop distributionally robust optimization (DRO) estimators, specifically for multidimensional Extreme Value Theory (EVT) statistics.
In order to mitigate over-conservative estimates while enhancing out-of-sample performance, we study DRO estimators informed by semi-parametric max-stable constraints in the space of point processes.
Both approaches are validated using synthetically generated data, recovering prescribed characteristics, and verifying the efficacy of the proposed techniques.
arXiv Detail & Related papers (2024-07-31T19:45:27Z) - Aligning Large Language Models with Self-generated Preference Data [72.99676237703099]
We propose a new framework that boosts the alignment of large language models (LLMs) with human preferences.
Our key idea is leveraging the human prior knowledge within the small (seed) data.
We introduce a noise-aware preference learning algorithm to mitigate the risk of low quality within generated preference data.
arXiv Detail & Related papers (2024-06-06T18:01:02Z) - Outlier Gradient Analysis: Efficiently Identifying Detrimental Training Samples for Deep Learning Models [36.05242956018461]
In this paper, we establish a bridge between identifying detrimental training samples via influence functions and outlier gradient detection.
We first validate the hypothesis of our proposed outlier gradient analysis approach on synthetic datasets.
We then demonstrate its effectiveness in detecting mislabeled samples in vision models and selecting data samples for improving performance of natural language processing transformer models.
arXiv Detail & Related papers (2024-05-06T21:34:46Z) - Soft Preference Optimization: Aligning Language Models to Expert Distributions [40.84391304598521]
SPO is a method for aligning generative models, such as Large Language Models (LLMs), with human preferences.
SPO integrates preference loss with a regularization term across the model's entire output distribution.
We showcase SPO's methodology, its theoretical foundation, and its comparative advantages in simplicity, computational efficiency, and alignment precision.
arXiv Detail & Related papers (2024-04-30T19:48:55Z) - Diffusion Models as Constrained Samplers for Optimization with Unknown Constraints [55.39203337683045]
We propose to perform optimization within the data manifold using diffusion models.<n>Depending on the differentiability of the objective function, we propose two different sampling methods.<n>Our method achieves better or comparable performance with previous state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-28T03:09:12Z) - Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z) - Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z) - Conservative Objective Models for Effective Offline Model-Based
Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
COMs are simple to implement and outperform a number of existing methods on a wide range of MBO problems.
arXiv Detail & Related papers (2021-07-14T17:55:28Z) - Model-based Policy Optimization with Unsupervised Model Adaptation [37.09948645461043]
We investigate how to bridge the gap between real and simulated data due to inaccurate model estimation for better policy optimization.
We propose a novel model-based reinforcement learning framework AMPO, which introduces unsupervised model adaptation.
Our approach achieves state-of-the-art performance in terms of sample efficiency on a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2020-10-19T14:19:42Z) - Neural Model-based Optimization with Right-Censored Observations [42.530925002607376]
Neural networks (NNs) have been demonstrated to work well at the core of model-based optimization procedures.
We show that our trained regression models achieve a better predictive quality than several baselines.
arXiv Detail & Related papers (2020-09-29T07:32:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.