Generalized Kernel Inducing Points by Duality Gap for Dataset Distillation
- URL: http://arxiv.org/abs/2502.12607v1
- Date: Tue, 18 Feb 2025 07:43:13 GMT
- Title: Generalized Kernel Inducing Points by Duality Gap for Dataset Distillation
- Authors: Tatsuya Aoyama, Hanting Yang, Hiroyuki Hanada, Satoshi Akahane, Tomonari Tanaka, Yoshito Okura, Yu Inatsu, Noriaki Hashimoto, Taro Murayama, Hanju Lee, Shinya Kojima, Ichiro Takeuchi,
- Abstract summary: Duality Gap KIP (DGKIP) is an extension of the Kernel Inducing Points (KIP) method for dataset distillation.<n> Experimental results on standard benchmarks such as MNIST and CIFAR-10 demonstrate that DGKIP retains the efficiency of KIP while offering broader applicability and robust performance.
- Score: 14.979902937312099
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose Duality Gap KIP (DGKIP), an extension of the Kernel Inducing Points (KIP) method for dataset distillation. While existing dataset distillation methods often rely on bi-level optimization, DGKIP eliminates the need for such optimization by leveraging duality theory in convex programming. The KIP method has been introduced as a way to avoid bi-level optimization; however, it is limited to the squared loss and does not support other loss functions (e.g., cross-entropy or hinge loss) that are more suitable for classification tasks. DGKIP addresses this limitation by exploiting an upper bound on parameter changes after dataset distillation using the duality gap, enabling its application to a wider range of loss functions. We also characterize theoretical properties of DGKIP by providing upper bounds on the test error and prediction consistency after dataset distillation. Experimental results on standard benchmarks such as MNIST and CIFAR-10 demonstrate that DGKIP retains the efficiency of KIP while offering broader applicability and robust performance.
Related papers
- ECPv2: Fast, Efficient, and Scalable Global Optimization of Lipschitz Functions [91.31596562378076]
ECPv2 is a scalable and theoretically grounded algorithm for global optimization of Lipschitz-continuous functions with unknown Lipschitzs.<n>We show that ECPv2 retains ECP's no-regret guarantees with optimal-time bounds and expands the acceptance region with these findings.
arXiv Detail & Related papers (2025-11-20T17:30:55Z) - Bayesian Natural Gradient Fine-Tuning of CLIP Models via Kalman Filtering [4.681301898136104]
Few fine-tuning is a major challenge to achieve optimal performance in vision-language pre-trained models.<n>We propose a Bayesian approximation of Natural Descent (NGD) using a Kalman filter for CLIP models.<n>Our algorithm consistently achieves superior--or comparable--ID performance compared to state-of-the-art baselines.
arXiv Detail & Related papers (2025-11-03T16:00:45Z) - CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training [73.46600457802693]
We introduce a new method that counteracts the loss induced by quantization.<n>CAGE significantly improves upon the state-of-theart methods in terms of accuracy, for similar computational cost.<n>For QAT pre-training of Llama models, CAGE matches the accuracy achieved at 4-bits (W4A4) with the prior best method.
arXiv Detail & Related papers (2025-10-21T16:33:57Z) - Stochastic Primal-Dual Double Block-Coordinate for Two-way Partial AUC Maximization [56.805574957824135]
Two-way partial AUCAUC is a critical performance metric for binary classification with imbalanced data.<n>Existing algorithms for TPAUC optimization remain under-explored.<n>We introduce two innovative double-coordinate block-coordinate algorithms for TPAUC optimization.
arXiv Detail & Related papers (2025-05-28T03:55:05Z) - Generalized Kullback-Leibler Divergence Loss [105.66549870868971]
We prove that the Kullback-Leibler (KL) Divergence loss is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss.
Thanks to the decoupled structure of DKL loss, we have identified two areas for improvement.
arXiv Detail & Related papers (2025-03-11T04:43:33Z) - On the Convergence of DP-SGD with Adaptive Clipping [56.24689348875711]
Gradient Descent with gradient clipping is a powerful technique for enabling differentially private optimization.<n>This paper provides the first comprehensive convergence analysis of SGD with quantile clipping (QC-SGD)<n>We show how QC-SGD suffers from a bias problem similar to constant-threshold clipped SGD but can be mitigated through a carefully designed quantile and step size schedule.
arXiv Detail & Related papers (2024-12-27T20:29:47Z) - Offline Behavior Distillation [57.6900189406964]
Massive reinforcement learning (RL) data are typically collected to train policies offline without the need for interactions.
We formulate offline behavior distillation (OBD), which synthesizes limited expert behavioral data from sub-optimal RL data.
We propose two naive OBD objectives, DBC and PBC, which measure distillation performance via the decision difference between policies trained on distilled data and either offline data or a near-expert policy.
arXiv Detail & Related papers (2024-10-30T06:28:09Z) - Primal-Dual Spectral Representation for Off-policy Evaluation [39.24759979398673]
Off-policy evaluation (OPE) is one of the most fundamental problems in reinforcement learning (RL)
We show that our algorithm, SpectralDICE, is both primal and sample efficient, the performance of which is supported by a rigorous theoretical sample complexity guarantee and a thorough empirical evaluation on various benchmarks.
arXiv Detail & Related papers (2024-10-23T03:38:31Z) - Gradient Normalization Provably Benefits Nonconvex SGD under Heavy-Tailed Noise [60.92029979853314]
We investigate the roles of gradient normalization and clipping in ensuring the convergence of Gradient Descent (SGD) under heavy-tailed noise.
Our work provides the first theoretical evidence demonstrating the benefits of gradient normalization in SGD under heavy-tailed noise.
We introduce an accelerated SGD variant incorporating gradient normalization and clipping, further enhancing convergence rates under heavy-tailed noise.
arXiv Detail & Related papers (2024-10-21T22:40:42Z) - EnsLoss: Stochastic Calibrated Loss Ensembles for Preventing Overfitting in Classification [1.3778851745408134]
We propose a novel ensemble method, namely EnsLoss, to combine loss functions within the Empirical risk minimization framework.
We first transform the CC conditions of losses into loss-derivatives, thereby bypassing the need for explicit loss functions.
We theoretically establish the statistical consistency of our approach and provide insights into its benefits.
arXiv Detail & Related papers (2024-09-02T02:40:42Z) - Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on R'enyi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z) - Proximal Point Imitation Learning [48.50107891696562]
We develop new algorithms with rigorous efficiency guarantees for infinite horizon imitation learning.
We leverage classical tools from optimization, in particular, the proximal-point method (PPM) and dual smoothing.
We achieve convincing empirical performance for both linear and neural network function approximation.
arXiv Detail & Related papers (2022-09-22T12:40:21Z) - Optimizing Partial Area Under the Top-k Curve: Theory and Practice [151.5072746015253]
We develop a novel metric named partial Area Under the top-k Curve (AUTKC)
AUTKC has a better discrimination ability, and its Bayes optimal score function could give a correct top-K ranking with respect to the conditional probability.
We present an empirical surrogate risk minimization framework to optimize the proposed metric.
arXiv Detail & Related papers (2022-09-03T11:09:13Z) - A novel Deep Learning approach for one-step Conformal Prediction
approximation [0.7646713951724009]
Conformal Prediction (CP) is a versatile solution that guarantees a maximum error rate given minimal constraints.
We propose a novel conformal loss function that approximates the traditionally two-step CP approach in a single step.
arXiv Detail & Related papers (2022-07-25T17:46:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.