Training Guarantees of Neural Network Classification Two-Sample Tests by Kernel Analysis
- URL: http://arxiv.org/abs/2407.04806v2
- Date: Tue, 9 Jul 2024 18:45:58 GMT
- Title: Training Guarantees of Neural Network Classification Two-Sample Tests by Kernel Analysis
- Authors: Varun Khurana, Xiuyuan Cheng, Alexander Cloninger
- Abstract summary: We construct and analyze a neural network two-sample test to determine whether two datasets came from the same distribution.
We derive the theoretical minimum training time needed to ensure that the NTK two-sample test detects a given deviation level between the datasets.
We show that the statistical power of the neural network two-sample test tends to 1 as the numbers of training samples and test evaluation samples tend to infinity.
- Score: 58.435336033383145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We construct and analyze a neural network two-sample test to determine whether two datasets came from the same distribution (null hypothesis) or not (alternative hypothesis). We perform a training-time analysis of a neural tangent kernel (NTK) two-sample test. In particular, we derive the theoretical minimum training time needed to ensure that the NTK two-sample test detects a given deviation level between the datasets, and, similarly, the theoretical maximum training time before it detects that deviation level. By approximating the neural network dynamics with the NTK dynamics, we extend this time analysis to the realistic neural network two-sample test generated from time-varying training dynamics and finite training samples. A similar extension is done for the neural network two-sample test generated from time-varying training dynamics but trained on the population. To give statistical guarantees, we show that the statistical power of the neural network two-sample test tends to 1 as the numbers of training samples and test evaluation samples tend to infinity. Additionally, we prove that the training times needed to detect the same deviation level under the null and alternative hypotheses are well separated. Finally, we present experiments with a two-layer neural network two-sample test on a hard two-sample problem, and plot a heatmap of the test's statistical power as a function of training time and network complexity.
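The core object behind such a test is a kernel two-sample statistic built from the NTK. Below is a minimal sketch, not the authors' code: the empirical NTK of a finite-width two-layer ReLU network at random initialization, plugged into a biased MMD statistic. The width, sample sizes, and Gaussian mean-shift data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 1024                         # input dimension, hidden width (illustrative)

# Two-layer ReLU network f(x) = m**-0.5 * sum_r a_r * relu(w_r . x),
# with output weights a_r in {-1, +1} so that a_r**2 = 1 in the kernel below.
W = rng.normal(size=(m, d))            # hidden weights at initialization

def ntk(X, Y):
    """Empirical NTK k(x, y) = <grad_theta f(x), grad_theta f(y)>,
    gradients taken with respect to both W and a."""
    hx, hy = X @ W.T, Y @ W.T                          # pre-activations, shape (n, m)
    sx, sy = (hx > 0).astype(float), (hy > 0).astype(float)
    k_W = (X @ Y.T) * (sx @ sy.T) / m                  # contribution of grad_W
    k_a = np.maximum(hx, 0) @ np.maximum(hy, 0).T / m  # contribution of grad_a
    return k_W + k_a

def mmd2(X, Y):
    """Biased MMD^2 two-sample statistic with the empirical NTK as kernel."""
    return ntk(X, X).mean() + ntk(Y, Y).mean() - 2 * ntk(X, Y).mean()

X = rng.normal(0.0, 1.0, size=(200, d))  # sample from P
Z = rng.normal(0.0, 1.0, size=(200, d))  # second sample from P (null)
Y = rng.normal(0.5, 1.0, size=(200, d))  # mean-shifted sample (alternative)
print("null:", mmd2(X, Z), "  alternative:", mmd2(X, Y))
```

Under the alternative the statistic comes out markedly larger than under the null; the paper's training-time bounds quantify how long a trained network needs to realize this separation at a given deviation level.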
Related papers
- Computable Lipschitz Bounds for Deep Neural Networks [0.0]
We analyse three existing upper bounds written for the $l^2$ norm.
We propose two novel bounds for both feed-forward fully-connected neural networks and convolutional neural networks.
arXiv Detail & Related papers (2024-10-28T14:09:46Z)
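For context on the Lipschitz entry above, here is a minimal sketch of the classical product-of-spectral-norms upper bound on the $l^2$ Lipschitz constant of a fully-connected ReLU network, one of the standard baselines such bounds are compared against; the layer shapes and random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Random layer weights standing in for a trained network, shapes (out, in).
weights = [rng.normal(size=(64, 32)) / np.sqrt(32),
           rng.normal(size=(32, 64)) / np.sqrt(64),
           rng.normal(size=(10, 32)) / np.sqrt(32)]

def lipschitz_upper_bound(weights):
    """ReLU is 1-Lipschitz, so Lip(f) <= prod_i ||W_i||_2,
    where ||.||_2 is the spectral norm (largest singular value)."""
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))

print(lipschitz_upper_bound(weights))
```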
- Network two-sample test for block models [16.597465729143813]
We consider the two-sample testing problem for networks, where the goal is to determine whether two sets of networks originated from the same model.
We adopt the stochastic block model (SBM) for network distributions, owing to its interpretability and its potential to approximate more general models.
We introduce an efficient algorithm to match estimated network parameters, allowing us to properly combine and contrast information within and across samples, leading to a powerful test.
arXiv Detail & Related papers (2024-06-10T04:28:37Z)
- Collaborative non-parametric two-sample testing [55.98760097296213]
The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected.
We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure.
Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning.
arXiv Detail & Related papers (2024-02-08T14:43:56Z)
- Expressive probabilistic sampling in recurrent neural networks [4.3900330990701235]
We show that the firing-rate dynamics of a recurrent neural circuit with a separate set of output units can sample from an arbitrary probability distribution.
We propose an efficient training procedure based on denoising score matching that finds recurrent and output weights such that the RSN implements Langevin sampling.
arXiv Detail & Related papers (2023-08-22T22:20:39Z)
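A minimal sketch of the Langevin dynamics such a network is trained (via denoising score matching) to implement, assuming the score of the target is available in closed form; here the target is a standard Gaussian with score(x) = -x, whereas in the paper the score is realized by learned recurrent and output weights.

```python
import numpy as np

rng = np.random.default_rng(2)

def langevin_sample(score, x0, step=1e-2, n_steps=2000):
    """Unadjusted Langevin dynamics: x <- x + step*score(x) + sqrt(2*step)*noise."""
    x = x0
    for _ in range(n_steps):
        x = x + step * score(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)
    return x

# Standard Gaussian target: score(x) = grad log p(x) = -x.
samples = langevin_sample(lambda x: -x, rng.normal(size=(5000, 2)))
print(samples.mean(axis=0), samples.var(axis=0))  # approximately 0 and 1
```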
- How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis [93.37576644429578]
This work establishes the first theoretical analysis for the known iterative self-training paradigm.
We prove the benefits of unlabeled data in both training convergence and generalization ability.
Experiments ranging from shallow to deep neural networks corroborate our theoretical insights on self-training.
arXiv Detail & Related papers (2022-01-21T02:16:52Z)
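A minimal sketch of the iterative self-training loop analyzed above, assuming a confidence-threshold pseudo-labeling rule and synthetic Gaussian data; both choices are illustrative, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
# Two Gaussian classes; few labeled points, many unlabeled (illustrative data).
X_lab = np.vstack([rng.normal(-1, 1, (10, 2)), rng.normal(1, 1, (10, 2))])
y_lab = np.array([0] * 10 + [1] * 10)
X_unlab = np.vstack([rng.normal(-1, 1, (250, 2)), rng.normal(1, 1, (250, 2))])

X, y = X_lab.copy(), y_lab.copy()
for _ in range(5):                            # self-training rounds
    clf = LogisticRegression().fit(X, y)
    conf = clf.predict_proba(X_unlab).max(axis=1)
    keep = conf > 0.95                        # pseudo-label only confident points
    if not keep.any():
        break
    X = np.vstack([X, X_unlab[keep]])
    y = np.concatenate([y, clf.predict(X_unlab[keep])])
    X_unlab = X_unlab[~keep]
print(len(y), "training points after self-training")
```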
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
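A minimal sketch of global magnitude pruning, a standard way to obtain the pruned networks studied above; the 50% sparsity level and random weights are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Global magnitude pruning: zero the smallest-magnitude entries
    across all layers so that roughly a `sparsity` fraction is removed."""
    flat = np.concatenate([np.abs(W).ravel() for W in weights])
    threshold = np.quantile(flat, sparsity)
    return [np.where(np.abs(W) < threshold, 0.0, W) for W in weights]

rng = np.random.default_rng(4)
pruned = magnitude_prune([rng.normal(size=(64, 32)), rng.normal(size=(10, 64))])
print([float((W == 0).mean()) for W in pruned])   # roughly half the entries zeroed
```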
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
At a computational cost similar to that of training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
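A minimal sketch of the "line of networks" idea above: every point on the segment between two parameter vectors defines a network that can be evaluated. Random vectors stand in for trained weights here, an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(5)
# Random vectors standing in for two trained parameter vectors.
theta0, theta1 = rng.normal(size=1000), rng.normal(size=1000)

def point_on_line(t):
    """Network parameters at position t on the segment between the endpoints."""
    return (1.0 - t) * theta0 + t * theta1

for t in np.linspace(0.0, 1.0, 5):
    theta = point_on_line(t)
    # In the paper, each such theta is loaded into the network and evaluated;
    # the endpoints are trained jointly so the whole segment stays accurate.
    print(t, float(np.linalg.norm(theta)))
```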
- Statistical model-based evaluation of neural networks [74.10854783437351]
We develop an experimental setup for the evaluation of neural networks (NNs).
The setup benchmarks a set of NNs against minimum mean-square-error (MMSE) performance bounds.
This allows us to test the effects of training data size, data dimension, data geometry, noise, and mismatch between training and testing conditions.
arXiv Detail & Related papers (2020-11-18T00:33:24Z)
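A minimal sketch of benchmarking a learned estimator against an analytic MMSE bound, in the spirit of the setup above: for a scalar jointly Gaussian model the MMSE is known in closed form, so any learned regressor's test MSE can be compared to it. The variances and the least-squares stand-in for an NN are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
sx2, sn2, n = 1.0, 0.5, 10_000            # signal/noise variances (illustrative)
x = rng.normal(0.0, np.sqrt(sx2), n)      # signal
y = x + rng.normal(0.0, np.sqrt(sn2), n)  # noisy observation

mmse = sx2 * sn2 / (sx2 + sn2)            # analytic MMSE for this Gaussian model
w = (y @ x) / (y @ y)                     # least-squares "learned" estimator of x from y
test_mse = float(np.mean((x - w * y) ** 2))
print(f"learned MSE {test_mse:.4f} vs analytic MMSE bound {mmse:.4f}")
```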
- The training accuracy of two-layer neural networks: its estimation and understanding using random datasets [0.0]
We propose a novel theory based on space partitioning to estimate the training accuracy of two-layer neural networks on random datasets without training.
Our method estimates the training accuracy for two-layer fully-connected neural networks on two-class random datasets using only three arguments.
arXiv Detail & Related papers (2020-10-26T07:21:29Z)
- Tighter risk certificates for neural networks [10.462889461373226]
We present two training objectives, used here for the first time in connection with training neural networks.
We also re-implement a previously used training objective based on a classical PAC-Bayes bound.
We compute risk certificates for the learnt predictors, based on part of the data used to learn the predictors.
arXiv Detail & Related papers (2020-07-25T11:02:16Z)
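A minimal sketch of turning the classical PAC-Bayes-kl bound into a numerical risk certificate by bisection, in the spirit of the paper above: with probability 1 - delta, kl(emp_risk || true_risk) <= (KL(Q||P) + log(2*sqrt(n)/delta)) / n. The plugged-in empirical risk, KL term, and sample size are illustrative assumptions, not values from the paper.

```python
import numpy as np

def kl_bernoulli(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    q, p = min(max(q, eps), 1 - eps), min(max(p, eps), 1 - eps)
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

def risk_certificate(emp_risk, kl_qp, n, delta=0.05):
    """Invert the PAC-Bayes-kl bound by bisection: return the largest
    true risk consistent with kl(emp_risk || risk) <= rhs."""
    rhs = (kl_qp + np.log(2.0 * np.sqrt(n) / delta)) / n
    lo, hi = emp_risk, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if kl_bernoulli(emp_risk, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return hi

print(risk_certificate(emp_risk=0.03, kl_qp=50.0, n=30_000))
```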
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.