Related papers: Optimization, Generalization and Differential Privacy Bounds for Gradient Descent on Kolmogorov-Arnold Networks

Optimization, Generalization and Differential Privacy Bounds for Gradient Descent on Kolmogorov-Arnold Networks

URL: http://arxiv.org/abs/2601.22409v2
Date: Wed, 04 Feb 2026 17:03:18 GMT
Title: Optimization, Generalization and Differential Privacy Bounds for Gradient Descent on Kolmogorov-Arnold Networks
Authors: Puyu Wang, Junyu Zhou, Philipp Liznerski, Marius Kloft,
Abstract summary: Kolmogorov--Arnold Networks (KANs) have recently emerged as a structured alternative to standard iterations.<n>In this paper, we analyze gradient descent (GD) for training two-layer KANs.<n>We derive general bounds that characterize their training dynamics, generalization, and utility under differential privacy.
Score: 18.128384228564883
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Kolmogorov--Arnold Networks (KANs) have recently emerged as a structured alternative to standard MLPs, yet a principled theory for their training dynamics, generalization, and privacy properties remains limited. In this paper, we analyze gradient descent (GD) for training two-layer KANs and derive general bounds that characterize their training dynamics, generalization, and utility under differential privacy (DP). As a concrete instantiation, we specialize our analysis to logistic loss under an NTK-separable assumption, where we show that polylogarithmic network width suffices for GD to achieve an optimization rate of order $1/T$ and a generalization rate of order $1/n$, with $T$ denoting the number of GD iterations and $n$ the sample size. In the private setting, we characterize the noise required for $(ε,δ)$-DP and obtain a utility bound of order $\sqrt{d}/(nε)$ (with $d$ the input dimension), matching the classical lower bound for general convex Lipschitz problems. Our results imply that polylogarithmic width is not only sufficient but also necessary under differential privacy, revealing a qualitative gap between non-private (sufficiency only) and private (necessity also emerges) training regimes. Experiments further illustrate how these theoretical insights can guide practical choices, including network width selection and early stopping.

Related papers

Regularized Online RLHF with Generalized Bilinear Preferences [68.44113000390544]
We consider the problem of contextual online RLHF with general preferences.<n>We adopt the Generalized Bilinear Preference Model to capture preferences via low-rank, skew-symmetric matrices.<n>We prove that the dual gap of the greedy policy is bounded by the square of the estimation error.
arXiv Detail & Related papers (2026-02-26T15:27:53Z)
Better Rates for Private Linear Regression in the Proportional Regime via Aggressive Clipping [19.186034457189162]
A common approach is to set the clipping constant much larger than the expected norm of the per-sample gradients.<n>While simplifying the analysis, this is however in sharp contrast with what empirical evidence suggests to optimize performance.<n>Our work bridges this gap between theory and practice by crucially operating in a regime where clipping happens frequently.
arXiv Detail & Related papers (2025-05-22T07:34:27Z)
Dyn-D$^2$P: Dynamic Differentially Private Decentralized Learning with Provable Utility Guarantee [36.82471440872803]
We propose a new Dynamic Differentially Private Decentralized DP approach (termed Dyn-D$2$P)<n>Dyn-D$2$P dynamically adjusts gradient clipping bounds and noise levels based on gradient convergence.<n>Experiments on benchmark datasets demonstrate the superiority of Dyn-D$2$ over its counterparts employing fixed-level noises.
arXiv Detail & Related papers (2025-05-10T13:57:57Z)
Smoothed Normalization for Efficient Distributed Private Optimization [54.197255548244705]
Federated learning enables machine learning models with privacy of participants.<n>There is no differentially private distributed method for training, non-feedback problems.<n>We introduce a new distributed algorithm $alpha$-$sf NormEC$ with provable convergence guarantees.
arXiv Detail & Related papers (2025-02-19T07:10:32Z)
Spatial Supply Repositioning with Censored Demand Data [10.797160099834306]
We consider a network inventory system motivated by one-way, on-demand vehicle sharing services.<n>Finding an optimal policy in such a general inventory network is analytically and computationally challenging.<n>Our work highlights the critical role of inventory in the viability of shared mobility businesses.
arXiv Detail & Related papers (2025-01-31T15:16:02Z)
Optimized Tradeoffs for Private Prediction with Majority Ensembling [59.99331405291337]
We introduce the Data-dependent Randomized Response Majority (DaRRM) algorithm.<n>DaRRM is parameterized by a data-dependent noise function $gamma$, and enables efficient utility optimization over the class of all private algorithms.<n>We show that DaRRM provably enjoys a privacy gain of a factor of 2 over common baselines, with fixed utility.
arXiv Detail & Related papers (2024-11-27T00:48:48Z)
Differentially Private Domain Adaptation with Theoretical Guarantees [46.37771025567305]
In many applications, the labeled data at the labeled data's disposal is subject to privacy constraints and is relatively limited. This is the modern problem of supervised domain adaptation from a public source to a private target domain. We make use of a general learner to benefit from favorable theoretical learning guarantees.
arXiv Detail & Related papers (2023-06-15T04:03:06Z)
Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data. We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning [34.630300910399036]
We characterize the fundamental communication cost required to obtain the best accuracy under $varepsilon$ central DP. Our results show that $tildeOleft( min(n2varepsilon2, d) right)$ bits per client are both sufficient and necessary. This provides a significant improvement relative to state-of-the-art SecAgg distributed DP schemes.
arXiv Detail & Related papers (2022-03-07T22:56:09Z)
Self-Ensembling GAN for Cross-Domain Semantic Segmentation [107.27377745720243]
This paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation. In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which together with a discriminator, forms a GAN. Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model.
arXiv Detail & Related papers (2021-12-15T09:50:25Z)
Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for bound non-dimensional optimization. We demonstrate two popular deep learning methods on the empirical advantages over standard gradient methods.
arXiv Detail & Related papers (2020-06-24T06:01:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.