High-arity PAC learning via exchangeability
- URL: http://arxiv.org/abs/2402.14294v3
- Date: Mon, 16 Sep 2024 22:19:25 GMT
- Title: High-arity PAC learning via exchangeability
- Authors: Leonardo N. Coregliano, Maryanthe Malliaris
- Abstract summary: We develop a theory of high-arity PAC learning, which is statistical learning in the presence of "structured correlation".
Our main theorems establish a high-arity (agnostic) version of the fundamental theorem of statistical learning.
- Score: 1.6114012813668932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a theory of high-arity PAC learning, which is statistical learning in the presence of "structured correlation". In this theory, hypotheses are either graphs, hypergraphs or, more generally, structures in finite relational languages, and i.i.d. sampling is replaced by sampling an induced substructure, producing an exchangeable distribution. Our main theorems establish a high-arity (agnostic) version of the fundamental theorem of statistical learning.
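As a concrete illustration of the sampling model, the binary-graph case can be sketched as follows: vertices receive i.i.d. latent labels and edges are drawn from a symmetric kernel, so the resulting graph distribution is exchangeable and the induced subgraph on any vertex subset has the law of a direct sample of its size. The graphon-style sketch below is a minimal illustrative rendering in Python, not the paper's general relational-structure formalism:

```python
import numpy as np

def sample_exchangeable_graph(n, W, seed=None):
    """Sample an n-vertex graph with an exchangeable distribution.

    Each vertex i gets a latent u_i ~ Uniform[0, 1]; the edge {i, j}
    appears independently with probability W(u_i, u_j) for a symmetric
    kernel W. Relabeling vertices leaves the law unchanged, and the
    induced subgraph on any k vertices is distributed like a direct
    k-vertex sample from the same W.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)                    # latent vertex labels
    probs = W(u[:, None], u[None, :])          # pairwise edge probabilities
    draws = rng.uniform(size=(n, n)) < probs   # Bernoulli draws
    adj = np.triu(draws, k=1)                  # one independent draw per pair
    return adj | adj.T                         # symmetric adjacency matrix

# Hypothetical smooth symmetric kernel, chosen purely for illustration.
W = lambda x, y: 0.25 + 0.5 * x * y
A = sample_exchangeable_graph(8, W, seed=0)
```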
Related papers
- Overcoming Dimensional Factorization Limits in Discrete Diffusion Models through Quantum Joint Distribution Learning [79.65014491424151]
We propose a quantum Discrete Denoising Diffusion Probabilistic Model (QD3PM).
It enables joint probability learning through diffusion and denoising in exponentially large Hilbert spaces.
This paper establishes a new theoretical paradigm in generative models by leveraging the quantum advantage in joint distribution learning.
arXiv Detail & Related papers (2025-05-08T11:48:21Z)
- Transformation-Invariant Learning and Theoretical Guarantees for OOD Generalization [34.036655200677664]
This paper focuses on a distribution shift setting where train and test distributions can be related by classes of (data) transformation maps.
We establish learning rules and algorithmic reductions to Empirical Risk Minimization (ERM)
We highlight that the learning rules we derive offer a game-theoretic viewpoint on distribution shift.
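For intuition, that game-theoretic viewpoint can be caricatured as a min-max problem over a finite hypothesis pool and a finite class of transformations, with ERM as the inner evaluation. The toy setup below (the hypothesis pool, the shift transformations, and the worst-case objective) is an illustrative assumption, not the paper's actual learning rules:

```python
import numpy as np

def worst_case_erm(hypotheses, transforms, X, y):
    """Pick the hypothesis minimizing worst-case empirical risk over a
    class of data transformations: the learner minimizes, nature (the
    distribution shift) maximizes, and each evaluation is plain ERM."""
    def risk(h, T):
        return np.mean(h(T(X)) != y)
    return min(hypotheses, key=lambda h: max(risk(h, T) for T in transforms))

# Toy usage: 1-D threshold classifiers versus shift transformations.
X = np.array([-2.0, -1.0, 0.5, 1.5, 2.5])
y = np.array([0, 0, 1, 1, 1])
hypotheses = [lambda x, t=t: (x > t).astype(int) for t in (-1.5, 0.0, 1.0)]
transforms = [lambda x, s=s: x + s for s in (-0.5, 0.0, 0.5)]
h_star = worst_case_erm(hypotheses, transforms, X, y)
```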
arXiv Detail & Related papers (2024-10-30T20:59:57Z)
- Measurability in the Fundamental Theorem of Statistical Learning [0.0]
The Fundamental Theorem of Statistical Learning states that a hypothesis space is PAC learnable if and only if its VC dimension is finite.
This paper presents sufficient conditions for the PAC learnability of hypothesis spaces defined over o-minimal expansions of the reals.
This class of hypothesis spaces covers all artificial neural networks for binary classification that use commonly employed activation functions like ReLU and the sigmoid function.
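The theorem's VC-dimension condition can be checked by hand in the textbook case of threshold classifiers on the line, which shatter any single point but no pair of points, so their VC dimension is 1. This is a standard example rather than anything specific to the paper; the sketch below uses a finite grid of thresholds as a stand-in for the full class:

```python
def shatters(points, hypotheses):
    """True iff the hypothesis class realizes every labeling of `points`."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)

# Threshold classifiers h_t(x) = 1[x > t], on a coarse grid of thresholds.
H = [lambda x, t=t: int(x > t) for t in [i / 10 for i in range(-50, 51)]]

print(shatters([0.0], H))        # True: any single point is shattered
print(shatters([0.0, 1.0], H))   # False: the labeling (1, 0) is unrealizable
```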
arXiv Detail & Related papers (2024-10-14T08:03:06Z)
- Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers [49.80959223722325]
We study the distinction between feed-forward and attention layers in large language models.
We find that feed-forward layers tend to learn simple distributional associations such as bigrams, while attention layers focus on in-context reasoning.
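A "distributional association" in the bigram sense is simply a table of next-token statistics conditioned on the previous token. The toy counter below (an illustration, not the paper's probing setup) makes the contrast concrete: such a predictor answers from global corpus statistics and ignores everything in the current context except the last token.

```python
from collections import Counter, defaultdict

def fit_bigram(tokens):
    """Count next-token frequencies conditioned on the previous token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

model = fit_bigram("the cat sat on the mat the cat ran".split())
# A pure bigram predictor always answers from global statistics ...
print(model["the"].most_common(1))  # [('cat', 2)]
# ... whereas in-context reasoning would have to use the prompt itself.
```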
arXiv Detail & Related papers (2024-06-05T08:51:08Z)
- Credal Learning Theory [4.64390130376307]
We lay the foundations for a 'credal' theory of learning, using convex sets of probabilities to model the variability in the data-generating distribution.
Bounds are derived for the case of finite hypothesis spaces, as well as infinite model spaces, which directly generalize classical results.
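For reference, the classical finite-hypothesis-space bound that such credal results generalize is the standard Hoeffding-plus-union-bound statement for a single data-generating distribution; the credal variant would instead control the risk uniformly over a convex set of distributions:

```latex
% Standard finite-class bound: for a single distribution P and an
% i.i.d. sample of size n, with probability at least 1 - \delta,
% uniformly over all h in the finite class \mathcal{H}:
\bigl| R_P(h) - \widehat{R}_n(h) \bigr|
    \le \sqrt{\frac{\ln\lvert\mathcal{H}\rvert + \ln(2/\delta)}{2n}}
% A credal analogue controls R_P(h) uniformly over all P in a
% convex credal set \mathcal{P}, rather than for one fixed P.
```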
arXiv Detail & Related papers (2024-02-01T19:25:58Z)
- Learning Complete Topology-Aware Correlations Between Relations for Inductive Link Prediction [121.65152276851619]
We show that semantic correlations between relations are inherently edge-level and entity-independent.
We propose a novel subgraph-based method, namely TACO, to model Topology-Aware COrrelations between relations.
To further exploit the potential of RCN, we propose the Complete Common Neighbor induced subgraph.
arXiv Detail & Related papers (2023-09-20T08:11:58Z)
- On the Joint Interaction of Models, Data, and Features [82.60073661644435]
We introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features.
Based on these observations, we propose a conceptual framework for feature learning.
Under this framework, the expected accuracy for a single hypothesis and the agreement for a pair of hypotheses can both be derived in closed form.
arXiv Detail & Related papers (2023-06-07T21:35:26Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
- A theory of learning with constrained weight-distribution [17.492950552276067]
We develop a statistical mechanical theory of learning in neural networks that incorporates structural information as constraints.
We show that training in our algorithm can be interpreted as geodesic flows in the Wasserstein space of probability distributions.
Our theory and algorithm provide novel strategies for incorporating prior knowledge about weights into learning, and reveal a powerful connection between structure and function in neural networks.
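The Wasserstein-space view has a particularly concrete form in one dimension, where the distance between two equal-size empirical distributions reduces to matching order statistics. The sketch below illustrates this standard fact (it is not the paper's training algorithm, and the sample distributions are purely hypothetical):

```python
import numpy as np

def wasserstein_1d(a, b, p=2):
    """p-Wasserstein distance between equal-size 1-D empirical samples.

    In one dimension the optimal transport plan pairs order statistics,
    so the distance is just a norm between the sorted samples.
    """
    a, b = np.sort(a), np.sort(b)
    return float(np.mean(np.abs(a - b) ** p) ** (1 / p))

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 1.0, size=1000)   # hypothetical current weights
target = rng.normal(0.5, 0.8, size=1000)    # hypothetical target distribution
print(wasserstein_1d(weights, target))
```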
arXiv Detail & Related papers (2022-06-14T00:43:34Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- Adversarially Robust Models may not Transfer Better: Sufficient Conditions for Domain Transferability from the View of Regularization [17.825841580342715]
Machine learning robustness and domain generalization are fundamentally correlated.
Recent studies show that more robust (adversarially trained) models are more generalizable.
However, there is still a lack of theoretical understanding of their fundamental connections.
arXiv Detail & Related papers (2022-02-03T20:26:27Z)
- Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
arXiv Detail & Related papers (2021-07-09T19:48:23Z)
- Prequential MDL for Causal Structure Learning with Neural Networks [9.669269791955012]
We show that the prequential minimum description length principle can be used to derive a practical scoring function for Bayesian networks.
We obtain plausible and parsimonious graph structures without relying on sparsity inducing priors or other regularizers which must be tuned.
We discuss how the prequential score relates to recent work that infers causal structure from the speed of adaptation when the observations come from a source undergoing distributional shift.
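Schematically, a prequential score charges each observation its predictive code length under a model fit only on the past. The sketch below instantiates this generic MDL recipe with a Laplace-smoothed Bernoulli plug-in; it is a toy stand-in, far simpler than the paper's neural-network setting:

```python
import numpy as np

def prequential_score(xs, fit, log_prob):
    """Prequential MDL code length: sum over t of -log p(x_t | x_{<t}),
    refitting the model on the observed prefix before each prediction."""
    total = 0.0
    for t in range(len(xs)):
        model = fit(xs[:t])                # train on the past only
        total += -log_prob(model, xs[t])   # pay for the next observation
    return total  # nats; lower = shorter code = more parsimonious model

# Toy instantiation: Bernoulli with a Laplace-smoothed plug-in estimate
# (the empty prefix yields p = 1/2, so the first symbol is still scored).
fit = lambda prefix: (np.sum(prefix) + 1) / (len(prefix) + 2)
log_prob = lambda p, x: np.log(p if x == 1 else 1 - p)
print(prequential_score(np.array([1, 1, 0, 1, 1, 0, 1]), fit, log_prob))
```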
arXiv Detail & Related papers (2021-07-02T22:35:21Z)
- Generalization Properties of Optimal Transport GANs with Latent Distribution Learning [52.25145141639159]
We study how the interplay between the latent distribution and the complexity of the pushforward map affects performance.
Motivated by our analysis, we advocate learning the latent distribution as well as the pushforward map within the GAN paradigm.
arXiv Detail & Related papers (2020-07-29T07:31:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.