Related papers: Statistical Learning Theory for Distributional Classification

URL: http://arxiv.org/abs/2601.14818v1
Date: Wed, 21 Jan 2026 09:44:24 GMT
Title: Statistical Learning Theory for Distributional Classification
Authors: Christian Fiedler
Abstract summary: In supervised learning with distributional inputs, the inputs are not accessible in the learning phase, but only samples thereof. This problem is particularly amenable to kernel-based learning methods, where the distributions or samples are first embedded into a Hilbert space. We contribute to the theoretical analysis of this latter approach, with a particular focus on classification with distributional inputs using SVMs.
Score: 3.231986804142224
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In supervised learning with distributional inputs in the two-stage sampling setup, relevant to applications like learning-based medical screening or causal learning, the inputs (which are probability distributions) are not accessible in the learning phase, but only samples thereof. This problem is particularly amenable to kernel-based learning methods, where the distributions or samples are first embedded into a Hilbert space, often using kernel mean embeddings (KMEs), and then a standard kernel method like Support Vector Machines (SVMs) is applied, using a kernel defined on the embedding Hilbert space. In this work, we contribute to the theoretical analysis of this latter approach, with a particular focus on classification with distributional inputs using SVMs. We establish a new oracle inequality and derive consistency and learning rate results. Furthermore, for SVMs using the hinge loss and Gaussian kernels, we formulate a novel variant of an established noise assumption from the binary classification literature, under which we can establish learning rates. Finally, some of our technical tools like a new feature space for Gaussian kernels on Hilbert spaces are of independent interest.
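The two-stage pipeline described in the abstract (draw distributions, observe only samples from them, embed each sample via its empirical KME, then run an SVM with a Gaussian kernel on the embedding Hilbert space) can be sketched in plain Python. This is an illustrative sketch, not the paper's method or code, since the paper is theoretical. Everything concrete here is an assumption: a Gaussian base kernel on the real line, input distributions of the form N(m, 1), the hypothetical parameters `SIGMA`, `GAMMA`, `LAM` and `STEPS`, and kernelized Pegasos (a stochastic subgradient solver for the hinge-loss SVM without offset term) standing in for an exact SVM solver.

```python
import math
import random

# Hypothetical settings, chosen for illustration only.
SIGMA = 1.0    # bandwidth of the Gaussian base kernel on R
GAMMA = 5.0    # width parameter of the Gaussian kernel on the embedding space
LAM = 0.01     # SVM regularization parameter
STEPS = 2000   # number of Pegasos iterations


def base_kernel(x, y):
    """Gaussian base kernel k(x, y) on R, used to form the KMEs."""
    return math.exp(-(x - y) ** 2 / (2.0 * SIGMA ** 2))


def mean_kernel(xs, ys):
    """<mu_xs, mu_ys>_H for the empirical KMEs of two samples."""
    return sum(base_kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def embedding_gram(bags_a, bags_b):
    """K[i][j] = exp(-GAMMA * ||mu_a_i - mu_b_j||_H^2), a Gaussian kernel on H."""
    self_a = [mean_kernel(b, b) for b in bags_a]
    self_b = [mean_kernel(b, b) for b in bags_b]
    gram = []
    for i, a in enumerate(bags_a):
        row = []
        for j, b in enumerate(bags_b):
            # Squared RKHS distance between the two empirical embeddings,
            # expanded with the kernel trick; clipped against rounding error.
            dist2 = self_a[i] + self_b[j] - 2.0 * mean_kernel(a, b)
            row.append(math.exp(-GAMMA * max(dist2, 0.0)))
        gram.append(row)
    return gram


def train_pegasos(gram, labels, seed=0):
    """Kernelized Pegasos on the hinge-loss SVM objective (no offset).
    alpha[j] counts how often bag j violated the margin."""
    rng = random.Random(seed)
    n = len(labels)
    alpha = [0] * n
    for t in range(1, STEPS + 1):
        i = rng.randrange(n)
        score = sum(alpha[j] * labels[j] * gram[j][i] for j in range(n)) / (LAM * t)
        if labels[i] * score < 1.0:
            alpha[i] += 1
    return alpha


def predict(gram_test_train, alpha, labels):
    """Sign of f(P) = (1 / (LAM * STEPS)) * sum_j alpha_j y_j K(P_j, P)."""
    preds = []
    for row in gram_test_train:
        f = sum(a * y * k for a, y, k in zip(alpha, labels, row)) / (LAM * STEPS)
        preds.append(1 if f >= 0 else -1)
    return preds


def sample_bag(label, rng, n_samples=25):
    """Two-stage sampling: draw a distribution N(m, 1), then i.i.d. samples from it."""
    m = rng.uniform(-0.5, 0.5) if label == 1 else rng.uniform(2.5, 3.5)
    return [rng.gauss(m, 1.0) for _ in range(n_samples)]


rng = random.Random(42)
train_labels = [1, -1] * 20
train_bags = [sample_bag(y, rng) for y in train_labels]
test_labels = [1, -1] * 10
test_bags = [sample_bag(y, rng) for y in test_labels]

alpha = train_pegasos(embedding_gram(train_bags, train_bags), train_labels)
preds = predict(embedding_gram(test_bags, train_bags), alpha, train_labels)
accuracy = sum(p == y for p, y in zip(preds, test_labels)) / len(test_labels)
print(f"test accuracy: {accuracy:.2f}")
```

Note that the learner never sees the distributions themselves: the squared RKHS distance between two empirical embeddings needs only base-kernel evaluations on the observed samples, which is what makes the Gaussian kernel on the embedding space computable in the two-stage setting.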

Related papers

Notes on Kernel Methods in Machine Learning [0.8435614464136675]
We develop the theory of positive definite kernels, reproducing kernel Hilbert spaces (RKHS), and Hilbert-Schmidt operators. We also introduce kernel density estimation, kernel embeddings of distributions, and the Maximum Mean Discrepancy (MMD). The exposition is designed to serve as a foundation for more advanced topics, including Gaussian processes, kernel Bayesian inference, and functional analytic approaches to modern machine learning.
arXiv (2025-11-18T13:29:07Z)
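The MMD mentioned in this entry is the RKHS distance between kernel mean embeddings, MMD(P, Q) = ||mu_P - mu_Q||_H, and its square expands as E k(X, X') + E k(Y, Y') - 2 E k(X, Y). A minimal sketch of the standard unbiased (U-statistic) estimator of MMD^2, assuming a Gaussian kernel with an illustrative bandwidth sigma = 1; the helper names are hypothetical and not taken from the cited notes. For two unit-variance Gaussians the population value has a closed form, used here as a sanity check.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel on R; the bandwidth sigma is an illustrative choice."""
    return math.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))


def mmd2_unbiased(xs, ys, sigma=1.0):
    """Unbiased U-statistic estimate of MMD^2 = ||mu_P - mu_Q||_H^2."""
    m, n = len(xs), len(ys)
    # Within-sample averages exclude the diagonal terms k(x_i, x_i).
    kxx = sum(gaussian_kernel(xs[i], xs[j], sigma)
              for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    kyy = sum(gaussian_kernel(ys[i], ys[j], sigma)
              for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    kxy = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys) / (m * n)
    return kxx + kyy - 2.0 * kxy


def mmd2_gaussians_unit(delta):
    """Closed-form MMD^2 between N(a, 1) and N(a + delta, 1) when sigma = 1."""
    return (2.0 / math.sqrt(3.0)) * (1.0 - math.exp(-delta ** 2 / 6.0))


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(300)]
ys_same = [rng.gauss(0.0, 1.0) for _ in range(300)]
ys_shifted = [rng.gauss(3.0, 1.0) for _ in range(300)]

est_same = mmd2_unbiased(xs, ys_same)
est_shifted = mmd2_unbiased(xs, ys_shifted)
print(f"same: {est_same:.3f}  shifted: {est_shifted:.3f}  "
      f"closed form: {mmd2_gaussians_unit(3.0):.3f}")
```

Because the diagonal terms are dropped, the estimate can be slightly negative when both samples come from the same distribution; that is expected for an unbiased estimator of a quantity that is zero.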
On the Consistency of Kernel Methods with Dependent Observations [5.467140383171385]
We propose a new notion of empirical weak convergence (EWC) explaining such phenomena for kernel methods. EWC assumes the existence of a random data distribution and is a strict weakening of previous assumptions in the field. Our results open new classes of processes to statistical learning and can serve as a foundation for a theory of learning beyond i.i.d. and mixing.
arXiv (2024-06-10T08:35:01Z)
Improved learning theory for kernel distribution regression with two-stage sampling [10.371912403602735]
Kernel methods have become a method of choice for tackling the distribution regression problem. We introduce the novel near-unbiased condition on the Hilbertian embeddings, which enables us to provide new error bounds. We show that this near-unbiased condition holds for three important classes of kernels, based on optimal transport and mean embedding.
arXiv (2023-08-28T06:29:09Z)
Symmetric Equilibrium Learning of VAEs [56.56929742714685]
We view variational autoencoders (VAEs) as decoder-encoder pairs, which map distributions in the data space to distributions in the latent space and vice versa. We propose a Nash equilibrium learning approach, which is symmetric with respect to the encoder and decoder and allows learning VAEs in situations where both the data and the latent distributions are accessible only by sampling.
arXiv (2023-07-19T10:27:34Z)
New Equivalences Between Interpolation and SVMs: Kernels and Structured Features [22.231455330003328]
We present a new and flexible analysis framework for proving support vector proliferation (SVP) in an arbitrary reproducing kernel Hilbert space with a flexible class of generative models for the labels. We show that SVP occurs in many interesting settings not covered by prior work, and we leverage these results to prove novel generalization results for kernel SVM classification.
arXiv (2023-05-03T17:52:40Z)
Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces II: non-compact symmetric spaces [43.877478563933316]
Invariance to symmetries is one of the most fundamental forms of prior information one can consider. In this work, we develop constructive and practical techniques for building stationary Gaussian processes on a very large class of non-Euclidean spaces.
arXiv (2023-01-30T17:27:12Z)
Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics [0.4588028371034406]
We introduce algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques. A first class of algorithms is kernel flow, which was introduced in the context of classification in machine learning. A second class of algorithms is called spectral kernel ridge regression, and aims at selecting a "best" kernel such that the norm of the function to be approximated is minimal.
arXiv (2022-06-03T07:50:54Z)
On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the good generalization of estimators trained with large learning rates, a phenomenon first observed in deep learning, can be precisely characterized in the context of kernel methods. We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv (2022-02-28T13:01:04Z)
A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space. We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense. We provide algorithms to optimize such distributions in the finite-sample setting.
arXiv (2021-06-18T08:33:45Z)
Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models. A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv (2020-09-27T14:16:14Z)
Isolation Distributional Kernel: A New Tool for Point & Group Anomaly Detection [76.1522587605852]
Isolation Distributional Kernel (IDK) is a new way to measure the similarity between two distributions. We demonstrate IDK's efficacy and efficiency as a new tool for kernel based anomaly detection for both point and group anomalies.
arXiv (2020-09-24T12:25:43Z)
A Boundary Based Out-of-Distribution Classifier for Generalized Zero-Shot Learning [83.1490247844899]
Generalized Zero-Shot Learning (GZSL) is a challenging topic that has promising prospects in many realistic scenarios. We propose a boundary based Out-of-Distribution (OOD) classifier which classifies the unseen and seen domains by only using seen samples for training. We extensively validate our approach on five popular benchmark datasets including AWA1, AWA2, CUB, FLO and SUN.
arXiv (2020-08-09T11:27:19Z)
Learning Deep Kernels for Non-Parametric Two-Sample Tests [50.92621794426821]
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution. Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test power.
arXiv (2020-02-21T03:54:23Z)
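A permutation test is one common way to turn an MMD statistic into a calibrated two-sample test. The sketch below is a simplification under stated assumptions: it uses a fixed Gaussian kernel with an assumed bandwidth instead of the learned deep kernels of the cited paper, the biased (V-statistic) MMD^2 as the test statistic, and illustrative, hypothetical function names and permutation count.

```python
import math
import random


def gaussian_gram(points, sigma=1.0):
    """Gram matrix of a fixed Gaussian kernel (assumed bandwidth sigma)."""
    return [[math.exp(-(a - b) ** 2 / (2.0 * sigma ** 2)) for b in points]
            for a in points]


def mmd2_biased_from_gram(gram, idx_x, idx_y):
    """Biased (V-statistic) MMD^2 between two index sets of the pooled sample."""
    m, n = len(idx_x), len(idx_y)
    kxx = sum(gram[i][j] for i in idx_x for j in idx_x) / (m * m)
    kyy = sum(gram[i][j] for i in idx_y for j in idx_y) / (n * n)
    kxy = sum(gram[i][j] for i in idx_x for j in idx_y) / (m * n)
    return kxx + kyy - 2.0 * kxy


def permutation_mmd_test(xs, ys, sigma=1.0, n_perms=99, seed=0):
    """Return (observed MMD^2, permutation p-value) for H0: P == Q."""
    pooled = list(xs) + list(ys)
    gram = gaussian_gram(pooled, sigma)  # computed once, reused for all permutations
    m = len(xs)
    idx = list(range(len(pooled)))
    observed = mmd2_biased_from_gram(gram, idx[:m], idx[m:])
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perms):
        # Under H0 the labels "x" and "y" are exchangeable, so reshuffle them.
        rng.shuffle(idx)
        if mmd2_biased_from_gram(gram, idx[:m], idx[m:]) >= observed:
            exceed += 1
    # Counting the observed statistic itself keeps the p-value valid.
    return observed, (1 + exceed) / (1 + n_perms)


rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(50)]
ys = [rng.gauss(2.0, 1.0) for _ in range(50)]
stat, p_value = permutation_mmd_test(xs, ys)
print(f"MMD^2 = {stat:.3f}, p = {p_value:.3f}")
```

Replacing the fixed Gaussian Gram matrix with one built from a learned kernel would leave the permutation step unchanged.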

This list is automatically generated from the titles and abstracts of the papers on this site.