Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks
- URL: http://arxiv.org/abs/2504.08628v1
- Date: Fri, 11 Apr 2025 15:29:55 GMT
- Title: Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks
- Authors: Chenyang Zhang, Peifeng Gao, Difan Zou, Yuan Cao
- Abstract summary: We study the rank of convolutional neural networks (CNNs) trained by gradient descent. We prove that CNNs trained by gradient descent can learn the intrinsic dimension of clean images, despite the presence of relatively large background noises.
- Score: 25.691362553629588
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Modern neural networks are usually highly over-parameterized. Behind the wide usage of over-parameterized networks is the belief that, if the data are simple, then the trained network will be automatically equivalent to a simple predictor. Following this intuition, many existing works have studied different notions of "ranks" of neural networks and their relation to the rank of data. In this work, we study the rank of convolutional neural networks (CNNs) trained by gradient descent, with a specific focus on the robustness of the rank to image background noises. Specifically, we point out that, when adding background noises to images, the rank of the CNN trained with gradient descent is affected far less compared with the rank of the data. We support our claim with a theoretical case study, where we consider a particular data model to characterize low-rank clean images with added background noises. We prove that CNNs trained by gradient descent can learn the intrinsic dimension of clean images, despite the presence of relatively large background noises. We also conduct experiments on synthetic and real datasets to further validate our claim.
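A minimal sketch of the claimed effect, with an illustrative data model and architecture rather than the authors' exact construction: generate low-rank "images" with added background noise, train a small CNN with gradient descent, and compare the stable rank of the data to that of the learned filters.

```python
# Illustrative sketch (not the paper's exact data model or architecture):
# low-rank clean signals plus background noise; after training, the learned
# convolutional filters should stay close to the intrinsic rank of the
# clean data even though the noisy data matrix is full rank.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, r, sigma = 200, 64, 3, 0.5            # samples, dim, intrinsic rank, noise

basis = torch.randn(r, d) / d ** 0.5        # r clean directions
coeff = torch.randn(n, r)
clean = coeff @ basis                        # rank-r clean data
noisy = clean + sigma * torch.randn(n, d)    # background noise -> full rank
labels = torch.sign(coeff[:, 0])             # labels depend only on clean signal

def stable_rank(M: torch.Tensor) -> float:
    """||M||_F^2 / ||M||_2^2, a smooth proxy for the rank of M."""
    s = torch.linalg.svdvals(M)
    return float((s ** 2).sum() / s[0] ** 2)

model = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=d, bias=False),  # filters span the signal
    nn.Flatten(),
    nn.Linear(8, 1, bias=False),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(500):
    out = model(noisy.unsqueeze(1)).squeeze(-1)
    loss = nn.functional.soft_margin_loss(out, labels)
    opt.zero_grad(); loss.backward(); opt.step()

filters = model[0].weight.detach().squeeze(1)    # (8, d) learned filters
print("stable rank of clean data:     ", stable_rank(clean))
print("stable rank of noisy data:     ", stable_rank(noisy))
print("stable rank of learned filters:", stable_rank(filters))
```

The stable rank is used here only as a convenient differentiable stand-in for the various notions of network "rank" studied in the literature.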
Related papers
- Opening the Black Box: predicting the trainability of deep neural networks with reconstruction entropy [0.0]
We present a method for predicting the trainable regime in parameter space for deep feedforward neural networks (DNNs).
We show that a single epoch of training is sufficient to predict the trainability of the deep feedforward network on a range of datasets.
arXiv Detail & Related papers (2024-06-13T18:00:05Z)
- Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU Networks on Nearly-orthogonal Data [66.1211659120882]
The implicit bias towards solutions with favorable properties is believed to be a key reason why neural networks trained by gradient-based optimization can generalize well.
While the implicit bias of gradient flow has been widely studied for homogeneous neural networks (including ReLU and leaky ReLU networks), the implicit bias of gradient descent is currently only understood for smooth neural networks.
arXiv Detail & Related papers (2023-10-29T08:47:48Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained with stochastic gradient descent initially classify their inputs using lower-order input statistics, such as the mean and covariance of the inputs, and exploit higher-order statistics only later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
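A toy sketch of the DSB claim above; the data model (a mean shift plus a correlation pattern invisible to class means) and the first-order baseline are illustrative choices, not the paper's protocol.

```python
# Toy DSB illustration: the network should first match a classifier that only
# uses class means (first-order statistics) and only later pick up the
# correlation signal that the mean-based classifier cannot see.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 2000, 20
y = torch.randint(0, 2, (n,))
X = torch.randn(n, d)
X[:, 2] += 0.3 * (2 * y.float() - 1)         # first-order (mean) signal
# Class signal in the correlation of coordinates 0 and 1; both coordinates
# have zero mean in each class, so it is invisible to first-order statistics.
z0, z1 = torch.randn(n), torch.randn(n).abs()
X[:, 0] = z0
X[:, 1] = z1 * torch.sign(z0) * (2 * y.float() - 1)

mu0, mu1 = X[y == 0].mean(0), X[y == 1].mean(0)
w, mid = mu1 - mu0, (mu0 + mu1) / 2
lin_pred = ((X - mid) @ w > 0).long()        # mean-based ("first-order") rule

net = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(net.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()
for step in range(1, 2001):
    loss = loss_fn(net(X), y)
    opt.zero_grad(); loss.backward(); opt.step()
    if step in (10, 100, 500, 2000):
        pred = net(X).argmax(1)
        print(f"step {step:4d}  accuracy {(pred == y).float().mean():.2f}  "
              f"agreement with mean rule {(pred == lin_pred).float().mean():.2f}")
```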
- How deep convolutional neural networks lose spatial information with training [0.7328100870402177]
We show how stability to image diffeomorphisms is achieved by spatial pooling in the first half of the net, and by channel pooling in the second half.
We find that the increased sensitivity to noise is due to the perturbing noise piling up during pooling, after being rectified by ReLU units.
arXiv Detail & Related papers (2022-10-04T10:21:03Z)
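A toy illustration of the mechanism described above; the smooth test signal, noise level, and depth are arbitrary choices. Zero-mean noise is averaged away by pooling alone, but once ReLU rectifies it, part of the perturbation stops cancelling.

```python
# Compare how a small input perturbation decays through stacked pooling
# stages, with and without ReLU rectification before each pooling step.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
t = torch.linspace(0, 8 * torch.pi, 1024)
x = torch.sin(t).view(1, 1, -1)             # smooth clean signal
eps = 0.05 * torch.randn_like(x)            # small zero-mean noise

def stage(u, rectify):
    if rectify:
        u = F.relu(u)
    return F.avg_pool1d(u, kernel_size=2)

a, b = x, x + eps                           # branch with ReLU before pooling
c, e = x, x + eps                           # linear control: pooling only
for depth in range(1, 6):
    a, b = stage(a, True), stage(b, True)
    c, e = stage(c, False), stage(e, False)
    print(f"depth {depth}: relative perturbation with ReLU "
          f"{(b - a).norm() / a.norm():.4f}, pooling only "
          f"{(e - c).norm() / c.norm():.4f}")
```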
- Reconstructing Training Data from Trained Neural Networks [42.60217236418818]
We show that, in some cases, a significant fraction of the training data can in fact be reconstructed from the parameters of a trained neural network classifier.
We propose a novel reconstruction scheme that stems from recent theoretical results about the implicit bias in training neural networks with gradient-based methods.
arXiv Detail & Related papers (2022-06-15T18:35:16Z)
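A rough sketch of the reconstruction idea above. It assumes, following the implicit-bias results the paper builds on, that a well-trained bias-free (homogeneous) network sits near a margin-maximizing KKT point where theta ≈ Σ_i λ_i y_i ∇_θ f(θ; x_i); the architecture, data, and optimizer below are illustrative, not the authors' pipeline.

```python
# Search for candidate inputs x_hat and coefficients lam such that the
# trained parameters satisfy the KKT-style identity
#   theta ~= sum_i lam_i * y_i * grad_theta f(theta; x_hat_i).
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 10, 8
X, y = torch.randn(n, d), torch.sign(torch.randn(n))

net = nn.Sequential(nn.Linear(d, 32, bias=False), nn.ReLU(),
                    nn.Linear(32, 1, bias=False))   # homogeneous: no biases
params = list(net.parameters())

# 1) Train to small loss so the stationarity condition is roughly in force.
opt = torch.optim.SGD(params, lr=0.05)
for _ in range(2000):
    out = net(X).squeeze(-1)
    loss = nn.functional.soft_margin_loss(out, y)
    opt.zero_grad(); loss.backward(); opt.step()

# 2) Fit candidates by double backprop through the parameter gradients.
m = 2 * n                                     # over-provision candidates
x_hat = torch.randn(m, d, requires_grad=True)
lam = torch.rand(m, requires_grad=True)
y_hat = torch.sign(torch.randn(m))            # fixed guessed labels
opt2 = torch.optim.Adam([x_hat, lam], lr=0.01)
for _ in range(3000):
    out = net(x_hat).squeeze(-1)
    g = torch.autograd.grad((lam.clamp(min=0) * y_hat * out).sum(),
                            params, create_graph=True)
    residual = sum(((p.detach() - gi) ** 2).sum() for p, gi in zip(params, g))
    opt2.zero_grad(); residual.backward(); opt2.step()
# Candidates with large lam should now resemble training points (up to scale).
```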
- Neural Maximum A Posteriori Estimation on Unpaired Data for Motion Deblurring [87.97330195531029]
We propose a Neural Maximum A Posteriori (NeurMAP) estimation framework for training neural networks to recover blind motion information and sharp content from unpaired data.
The proposed NeurMAP can be applied to existing deblurring neural networks, and is the first framework that enables training image deblurring networks on unpaired datasets.
arXiv Detail & Related papers (2022-04-26T08:09:47Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- What Do Neural Networks Learn When Trained With Random Labels? [20.54410239839646]
We study deep neural networks (DNNs) trained on natural image data with entirely random labels.
We show analytically for convolutional and fully connected networks that an alignment between the principal components of network parameters and data takes place when training with random labels.
We show how this alignment produces a positive transfer: networks pre-trained with random labels train faster downstream compared to training from scratch.
arXiv Detail & Related papers (2020-06-18T12:07:22Z)
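A minimal sketch of the alignment effect described above; the anisotropic synthetic data and the subspace-overlap score are illustrative stand-ins for the paper's analysis.

```python
# After training on entirely random labels, the top principal directions of
# the first-layer weights should align with the top principal components of
# the inputs (overlap near k/d at random initialization, higher after training).
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, k = 1000, 50, 5
scales = torch.cat([torch.full((k,), 5.0), torch.ones(d - k)])
X = torch.randn(n, d) * scales               # a few dominant data directions
y = torch.randint(0, 2, (n,))                # entirely random labels

def overlap(W: torch.Tensor) -> float:
    """Mean squared projection of top-k weight directions onto top-k data PCs."""
    pcs = torch.linalg.svd(X, full_matrices=False).Vh[:k]
    wdirs = torch.linalg.svd(W, full_matrices=False).Vh[:k]
    return float((wdirs @ pcs.T).pow(2).sum() / k)   # 1.0 = perfectly aligned

net = nn.Sequential(nn.Linear(d, 100), nn.ReLU(), nn.Linear(100, 2))
print("overlap at initialization:", overlap(net[0].weight.detach()))

opt = torch.optim.SGD(net.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
for _ in range(2000):
    loss = loss_fn(net(X), y)
    opt.zero_grad(); loss.backward(); opt.step()
print("overlap after training:   ", overlap(net[0].weight.detach()))
```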
- The Neural Tangent Link Between CNN Denoisers and Non-Local Filters [4.254099382808598]
Convolutional Neural Networks (CNNs) are now a well-established tool for solving computational imaging problems.
We introduce a formal link between such networks, through their neural tangent kernel (NTK), and well-known non-local filtering techniques.
We evaluate our findings via extensive image denoising experiments.
arXiv Detail & Related papers (2020-06-03T16:50:54Z)
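A small sketch of the NTK viewpoint above; the empirical NTK of a toy 1-D CNN is computed directly from per-pixel parameter gradients (the paper's analysis concerns the infinite-width kernel, so this is only a finite-width illustration).

```python
# The empirical NTK between output pixels, K[i, j] = <grad_theta out_i,
# grad_theta out_j>, defines a data-dependent filter: for squared loss, one
# gradient step on the weights moves the output by roughly -lr * K @ residual.
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 32
net = nn.Sequential(nn.Conv1d(1, 8, 5, padding=2), nn.ReLU(),
                    nn.Conv1d(8, 1, 5, padding=2))
x = (torch.sin(torch.linspace(0, 6.28, d)) + 0.3 * torch.randn(d)).view(1, 1, d)

params = list(net.parameters())
out = net(x).flatten()
rows = []
for i in range(d):                       # one parameter-gradient per pixel
    g = torch.autograd.grad(out[i], params, retain_graph=True)
    rows.append(torch.cat([t.flatten() for t in g]))
J = torch.stack(rows)                    # (d, n_params) Jacobian
K = J @ J.T                              # (d, d) empirical NTK Gram matrix

# Each (normalized) row of K is the effective filter centered at a pixel.
center = K[d // 2] / K[d // 2].sum()
print("filter weights around the center pixel:", center.round(decimals=3))
```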
- Compressive sensing with un-trained neural networks: Gradient descent finds the smoothest approximation [60.80172153614544]
Un-trained convolutional neural networks have emerged as highly successful tools for image recovery and restoration.
We show that an un-trained convolutional neural network can approximately reconstruct signals and images that are sufficiently structured, from a near minimal number of random measurements.
arXiv Detail & Related papers (2020-05-07T15:57:25Z)
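A compact sketch of the recovery setup above; the smooth test signal, Gaussian measurement matrix, and small convolutional generator are illustrative choices.

```python
# Fit the weights of an *un-trained* convolutional generator so that its
# output agrees with m random linear measurements of a structured signal;
# no training data is used, only the architecture's inductive bias.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, m = 128, 40                                  # signal length, measurements
t = torch.linspace(0, 1, d)
signal = torch.sin(4 * torch.pi * t)            # smooth structured signal
A = torch.randn(m, d) / m ** 0.5                # random measurement matrix
yobs = A @ signal                               # observed measurements

z = torch.randn(1, 16, d)                       # fixed random generator input
G = nn.Sequential(nn.Conv1d(16, 16, 5, padding=2), nn.ReLU(),
                  nn.Conv1d(16, 1, 5, padding=2))
opt = torch.optim.Adam(G.parameters(), lr=1e-3)
for _ in range(3000):
    rec = G(z).flatten()
    loss = ((A @ rec - yobs) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

rec = G(z).flatten().detach()
print("relative reconstruction error:",
      float((rec - signal).norm() / signal.norm()))
```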
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embeddings of a CNN using anti-aliasing (low-pass) filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
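A sketch of the curriculum idea above; the Gaussian kernel, annealing schedule, and stand-in data below are illustrative choices rather than the paper's exact recipe.

```python
# Curriculum by smoothing: low-pass filter the feature maps with a Gaussian
# kernel whose width is annealed toward zero, so the network sees heavily
# smoothed features early in training and full detail later.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_kernel(sigma: float, size: int = 5) -> torch.Tensor:
    coords = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g2d = torch.outer(g, g)
    return g2d / g2d.sum()

def smooth_features(x: torch.Tensor, sigma: float) -> torch.Tensor:
    """Depthwise Gaussian blur of feature maps; near-identity as sigma -> 0."""
    if sigma < 1e-3:
        return x
    c = x.shape[1]
    k = gaussian_kernel(sigma).to(x).view(1, 1, 5, 5).repeat(c, 1, 1, 1)
    return F.conv2d(x, k, padding=2, groups=c)

torch.manual_seed(0)
conv = nn.Conv2d(3, 16, 3, padding=1)
head = nn.Linear(16, 10)
opt = torch.optim.SGD(list(conv.parameters()) + list(head.parameters()), lr=0.01)
for epoch in range(10):
    sigma = 2.0 * (1 - epoch / 10)               # anneal the blur toward zero
    x = torch.randn(8, 3, 32, 32)                # stand-in image batch
    y = torch.randint(0, 10, (8,))
    feats = smooth_features(F.relu(conv(x)), sigma)
    logits = head(feats.mean(dim=(2, 3)))        # global average pooling
    loss = F.cross_entropy(logits, y)
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"epoch {epoch}: sigma={sigma:.2f} loss={loss.item():.3f}")
```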