Kernel Regression in Structured Non-IID Settings: Theory and Implications for Denoising Score Learning
- URL: http://arxiv.org/abs/2510.15363v1
- Date: Fri, 17 Oct 2025 06:52:40 GMT
- Title: Kernel Regression in Structured Non-IID Settings: Theory and Implications for Denoising Score Learning
- Authors: Dechen Zhang, Zhenmei Shi, Yi Zhang, Yingyu Liang, Difan Zou
- Abstract summary: We present the first systematic study of KRR generalization for non-i.i.d. data with signal-noise causal structure. We apply our results to denoising score learning, establishing generalization guarantees and providing principled guidance for sampling noisy data points.
- Score: 39.3200762517639
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kernel ridge regression (KRR) is a foundational tool in machine learning, with recent work emphasizing its connections to neural networks. However, existing theory primarily addresses the i.i.d. setting, while real-world data often exhibits structured dependencies - particularly in applications like denoising score learning where multiple noisy observations derive from shared underlying signals. We present the first systematic study of KRR generalization for non-i.i.d. data with signal-noise causal structure, where observations represent different noisy views of common signals. By developing a novel blockwise decomposition method that enables precise concentration analysis for dependent data, we derive excess risk bounds for KRR that explicitly depend on: (1) the kernel spectrum, (2) causal structure parameters, and (3) sampling mechanisms (including relative sample sizes for signals and noises). We further apply our results to denoising score learning, establishing generalization guarantees and providing principled guidance for sampling noisy data points. This work advances KRR theory while providing practical tools for analyzing dependent data in modern machine learning applications.
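The data-generating setup the abstract describes can be sketched concretely: draw a small number of underlying signals, generate several noisy views of each (so observations within a block are dependent), and fit standard closed-form KRR on the pooled data. This is an illustrative toy reconstruction only; the kernel, bandwidth, sample sizes, and target function below are assumptions, not the paper's actual experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=5.0):
    """RBF kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# m underlying signals, each observed through k noisy views:
# observations sharing a signal form a dependent (non-i.i.d.) block.
m, k, sigma = 20, 5, 0.05
signals = rng.uniform(-1, 1, size=(m, 1))
X = np.repeat(signals, k, axis=0) + sigma * rng.normal(size=(m * k, 1))
y = np.sin(3 * X[:, 0]) + 0.05 * rng.normal(size=m * k)

# Closed-form KRR fit: alpha = (K + n*lam*I)^{-1} y
lam = 1e-3
n = m * k
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + n * lam * np.eye(n), y)

# Excess risk of the fitted predictor against the clean target on a grid
X_test = np.linspace(-1, 1, 200)[:, None]
y_pred = rbf_kernel(X_test, X) @ alpha
excess_risk = np.mean((y_pred - np.sin(3 * X_test[:, 0])) ** 2)
print(round(excess_risk, 4))
```

The paper's bounds concern how this excess risk scales with the kernel spectrum, the causal structure, and the split between the number of signals `m` and views per signal `k`; the sketch only sets up that sampling mechanism.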
Related papers
- Beyond Universal Approximation Theorems: Algorithmic Uniform Approximation by Neural Networks Trained with Noisy Data [12.815892583089443]
This paper introduces an architecture-specific randomized training algorithm that constructs a uniform approximator from $N$ noisy training samples. Our trained neural networks attain the minimax-optimal quantity of trainable (non-random) parameters, subject to logarithmic factors which vanish under the idealized noiseless sampling assumed in classical UATs.
arXiv Detail & Related papers (2025-08-31T16:20:27Z) - Impact of Noisy Supervision in Foundation Model Learning [91.56591923244943]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets. We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Optimal Rates and Saturation for Noiseless Kernel Ridge Regression [4.585021053685196]
We present a comprehensive study of kernel ridge regression (KRR) in the noiseless regime. KRR is a fundamental method for learning functions from finite samples. We introduce a refined notion of degrees of freedom, which we believe has broader applicability in the analysis of kernel methods.
arXiv Detail & Related papers (2024-02-24T04:57:59Z) - Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization.
We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data.
We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
arXiv Detail & Related papers (2024-02-04T05:50:38Z) - Combating Bilateral Edge Noise for Robust Link Prediction [56.43882298843564]
We propose an information-theory-guided principle, Robust Graph Information Bottleneck (RGIB), to extract reliable supervision signals and avoid representation collapse.
Two instantiations, RGIB-SSL and RGIB-REP, are explored to leverage the merits of different methodologies.
Experiments on six datasets and three GNNs with diverse noisy scenarios verify the effectiveness of our RGIB instantiations.
arXiv Detail & Related papers (2023-11-02T12:47:49Z) - A Structured Dictionary Perspective on Implicit Neural Representations [47.35227614605095]
We show that most INR families are analogous to structured signal dictionaries whose atoms are integer harmonics of the set of initial mapping frequencies.
We explore the inductive bias of INRs by exploiting recent results about the empirical neural tangent kernel (NTK).
Our results permit to design and tune novel INR architectures, but can also be of interest for the wider deep learning theory community.
arXiv Detail & Related papers (2021-12-03T14:00:52Z) - Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z) - SreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z) - Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning [134.77207192945053]
Prior methods learn the neural-symbolic models using reinforcement learning approaches.
We introduce the grammar model as a symbolic prior to bridge neural perception and symbolic reasoning.
We propose a novel back-search algorithm which mimics the top-down human-like learning procedure to propagate the error.
arXiv Detail & Related papers (2020-06-11T17:42:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.