Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks
- URL: http://arxiv.org/abs/1903.10047v4
- Date: Sun, 13 Aug 2023 09:04:34 GMT
- Title: Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks
- Authors: Kenta Oono, Taiji Suzuki
- Abstract summary: We show a ResNet-type CNN can attain the minimax optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and H\"older classes.
- Score: 52.972605601174955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks (CNNs) have been shown to achieve optimal
approximation and estimation error rates (in the minimax sense) in several function
classes. However, previously analyzed optimal CNNs are unrealistically wide and
difficult to obtain via optimization due to sparse constraints in important
function classes, including the H\"older class. We show a ResNet-type CNN can
attain the minimax optimal error rates in these classes in more plausible
situations -- it can be dense, and its width, channel size, and filter size are
constant with respect to sample size. The key idea is that we can replicate the
learning ability of fully-connected neural networks (FNNs) by tailored CNNs, as
long as the FNNs have \textit{block-sparse} structures. Our theory is general
in the sense that we can automatically translate any approximation rate achieved
by block-sparse FNNs into that by CNNs. As an application, we derive
approximation and estimation error rates of the aforementioned type of CNNs for
the Barron and H\"older classes with the same strategy.
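To make the block-sparse-to-CNN idea concrete, the following minimal PyTorch sketch contrasts the two objects the abstract relates: a fully-connected layer constrained to a block-sparse (block-diagonal) weight pattern, and a ResNet-type 1D CNN whose channel size and filter size stay fixed while only the depth grows. The class names, sizes, and skip-connection layout are illustrative assumptions, not the construction analyzed in the paper.

```python
# Illustrative only: block sizes, channel counts, and layer layout are
# assumptions, not the construction analyzed in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BlockSparseLinear(nn.Module):
    """Fully-connected layer whose weight is masked to a block-diagonal pattern."""

    def __init__(self, num_blocks: int, block_in: int, block_out: int):
        super().__init__()
        self.linear = nn.Linear(num_blocks * block_in, num_blocks * block_out)
        mask = torch.zeros(num_blocks * block_out, num_blocks * block_in)
        for b in range(num_blocks):  # one dense block per diagonal position
            mask[b * block_out:(b + 1) * block_out,
                 b * block_in:(b + 1) * block_in] = 1.0
        self.register_buffer("mask", mask)

    def forward(self, x):
        return F.linear(x, self.linear.weight * self.mask, self.linear.bias)


class ResNetTypeCNN(nn.Module):
    """1D CNN with identity skip connections; channel and filter sizes stay constant."""

    def __init__(self, channels: int = 8, filter_size: int = 3, depth: int = 4):
        super().__init__()
        pad = filter_size // 2
        self.stem = nn.Conv1d(1, channels, filter_size, padding=pad)
        self.blocks = nn.ModuleList(
            nn.Conv1d(channels, channels, filter_size, padding=pad)
            for _ in range(depth)
        )
        self.head = nn.Linear(channels, 1)

    def forward(self, x):                 # x: (batch, 1, input_dim)
        h = self.stem(x)
        for conv in self.blocks:          # residual blocks of constant size
            h = h + F.relu(conv(h))
        return self.head(h.mean(dim=-1))  # global average pooling + linear head


if __name__ == "__main__":
    x_flat = torch.randn(32, 4 * 16)                  # input for the block-sparse FNN
    print(BlockSparseLinear(4, 16, 8)(x_flat).shape)  # torch.Size([32, 32])
    x_seq = x_flat.unsqueeze(1)                       # (batch, 1, 64) for the CNN
    print(ResNetTypeCNN()(x_seq).shape)               # torch.Size([32, 1])
```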
Related papers
- Model Parallel Training and Transfer Learning for Convolutional Neural Networks by Domain Decomposition [0.0]
Deep convolutional neural networks (CNNs) have been shown to be very successful in a wide range of image processing applications.
Due to their increasing number of model parameters and the increasing availability of large amounts of training data, parallelization strategies for efficiently training complex CNNs are necessary.
arXiv Detail & Related papers (2024-08-26T17:35:01Z)
- On the rates of convergence for learning with convolutional neural networks [9.772773527230134]
We study approximation and learning capacities of convolutional neural networks (CNNs) with one-side zero-padding and multiple channels.
We derive convergence rates for estimators based on CNNs in many learning problems.
It is also shown that the obtained rates for classification are minimax optimal in some common settings.
arXiv Detail & Related papers (2024-03-25T06:42:02Z)
- PICNN: A Pathway towards Interpretable Convolutional Neural Networks [12.31424771480963]
We introduce a novel pathway to alleviate the entanglement between filters and image classes.
We use Bernoulli sampling to generate the filter-cluster assignment matrix from a learnable filter-class correspondence matrix; see the sketch after this entry.
We evaluate the effectiveness of our method on ten widely used network architectures.
arXiv Detail & Related papers (2023-12-19T11:36:03Z)
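A minimal sketch of the Bernoulli sampling step mentioned in the PICNN summary above, assuming the learnable filter-class correspondence matrix holds real-valued logits and a straight-through estimator keeps it trainable; the variable names and the estimator choice are assumptions, not the paper's exact procedure.

```python
# Illustrative sketch only: tensor names and the straight-through estimator
# are assumptions, not taken from the PICNN paper.
import torch

num_filters, num_classes = 64, 10

# Learnable filter-class correspondence matrix (real-valued logits).
correspondence = torch.randn(num_filters, num_classes, requires_grad=True)

# Convert the correspondence to probabilities and draw a hard 0/1
# filter-cluster assignment matrix by Bernoulli sampling.
probs = torch.sigmoid(correspondence)
assignment = torch.bernoulli(probs.detach())

# Straight-through estimator: the forward pass uses the hard sample while the
# gradient flows through the probabilities, keeping the correspondence learnable.
assignment = assignment + probs - probs.detach()

print(assignment.shape)          # torch.Size([64, 10])
print(assignment.requires_grad)  # True
```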
- A Proximal Algorithm for Network Slimming [2.8148957592979427]
A popular channel pruning method for convolutional neural networks (CNNs) uses subgradient descent to train CNNs.
We develop an alternative algorithm called proximal NS to train CNNs towards sparse, accurate structures; see the sketch after this entry.
Our experiments demonstrate that after one round of training, proximal NS yields a CNN with competitive accuracy and compression.
arXiv Detail & Related papers (2023-07-02T23:34:12Z)
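The proximal NS summary above does not spell out the update, so this generic sketch shows a proximal (soft-thresholding) step applied to l1-penalized batch-norm scale factors, the channel-importance parameters that network slimming typically regularizes. The specific step, threshold, and training loop here are assumptions, not the algorithm from the paper.

```python
# Generic proximal (soft-thresholding) step for l1-penalized channel scales.
# The pairing with network slimming and all names/values are assumptions.
import torch
import torch.nn as nn


def l1_prox_(scale: torch.Tensor, threshold: float) -> None:
    """In-place soft-thresholding: proximal operator of threshold * ||.||_1."""
    with torch.no_grad():
        scale.copy_(torch.sign(scale) * torch.clamp(scale.abs() - threshold, min=0.0))


model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),   # its .weight holds the per-channel scaling factors
    nn.ReLU(),
)

lr, lam = 0.1, 1e-2
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

x = torch.randn(8, 3, 32, 32)
loss = model(x).pow(2).mean()          # stand-in for the task loss
optimizer.zero_grad()
loss.backward()
optimizer.step()                       # gradient step on the smooth part

for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        l1_prox_(m.weight, lr * lam)   # proximal step handles the l1 penalty

pruned = sum(int((m.weight.abs() < 1e-8).sum())
             for m in model.modules() if isinstance(m, nn.BatchNorm2d))
print(f"channels driven exactly to zero: {pruned}")
```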
- The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z)
- What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z)
- Towards a General Purpose CNN for Long Range Dependencies in $\mathrm{N}$D [49.57261544331683]
We propose a single CNN architecture equipped with continuous convolutional kernels for tasks on arbitrary resolution, dimensionality and length without structural changes.
We show the generality of our approach by applying the same CCNN to a wide set of tasks on sequential ($1\mathrm{D}$) and visual data ($2\mathrm{D}$).
Our CCNN performs competitively and often outperforms the current state-of-the-art across all tasks considered.
arXiv Detail & Related papers (2022-06-07T15:48:02Z)
- Rethinking Nearest Neighbors for Visual Classification [56.00783095670361]
k-NN is a lazy learning method that aggregates the distances between the test image and its top-k neighbors in a training set.
We adopt k-NN with pre-trained visual representations produced by either supervised or self-supervised methods in two steps; see the sketch after this entry.
Via extensive experiments on a wide range of classification tasks, our study reveals the generality and flexibility of k-NN integration.
arXiv Detail & Related papers (2021-12-15T20:15:01Z)
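A minimal sketch of the two-step recipe in the Rethinking Nearest Neighbors summary above: extract features with a frozen pre-trained backbone, then classify queries with k-NN over those features. The toy backbone, feature dimension, and choice of scikit-learn classifier are assumptions, not the paper's pipeline.

```python
# Two-step sketch: (1) frozen pre-trained backbone -> features,
# (2) k-NN over the features. Backbone, dimensions, and k are assumptions.
import torch
import torch.nn as nn
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for any supervised or self-supervised pre-trained encoder.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
backbone.eval()


@torch.no_grad()
def extract_features(model: nn.Module, images: torch.Tensor) -> torch.Tensor:
    return model(images)


# Toy data standing in for a labelled training set and a query batch.
train_images = torch.randn(100, 3, 32, 32)
train_labels = torch.randint(0, 10, (100,))
test_images = torch.randn(5, 3, 32, 32)

train_feats = extract_features(backbone, train_images).numpy()
test_feats = extract_features(backbone, test_images).numpy()

knn = KNeighborsClassifier(n_neighbors=5)  # lazy learner: just stores the features
knn.fit(train_feats, train_labels.numpy())
print(knn.predict(test_feats))             # predicted labels of the 5 query images
```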
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study the robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- Multichannel CNN with Attention for Text Classification [5.1545224296246275]
This paper proposes Attention-based Multichannel Convolutional Neural Network (AMCNN) for text classification.
AMCNN uses a bi-directional long short-term memory to encode the history and future information of words into high-dimensional representations; see the sketch after this entry.
The experimental results on the benchmark datasets demonstrate that AMCNN achieves better performance than state-of-the-art methods.
arXiv Detail & Related papers (2020-06-29T16:37:51Z)
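A heavily simplified sketch of the ingredients named in the AMCNN summary above: a bi-directional LSTM encoder, a scalar attention re-weighting over time steps, and several convolutional channels with different kernel sizes. The layer sizes, the attention form, and the channel design are assumptions and do not reproduce the paper's architecture.

```python
# Simplified BiLSTM + attention + multichannel CNN classifier; all sizes and
# the attention mechanism are assumptions, not AMCNN's actual design.
import torch
import torch.nn as nn


class SimpleAMCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden=64, num_classes=2,
                 kernel_sizes=(3, 4, 5), num_filters=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # BiLSTM encodes past/future context of each word.
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        # One convolutional "channel" per kernel size.
        self.convs = nn.ModuleList(
            nn.Conv1d(2 * hidden, num_filters, k, padding=k // 2) for k in kernel_sizes
        )
        self.attn = nn.Linear(2 * hidden, 1)  # scalar attention over time steps
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, tokens):                        # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))          # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # (batch, seq_len, 1)
        h = (h * weights).transpose(1, 2)             # (batch, 2*hidden, seq_len)
        pooled = [conv(h).amax(dim=-1) for conv in self.convs]  # max over time
        return self.fc(torch.cat(pooled, dim=-1))


if __name__ == "__main__":
    model = SimpleAMCNN(vocab_size=5000)
    tokens = torch.randint(0, 5000, (4, 20))  # a batch of 4 token sequences
    print(model(tokens).shape)                # torch.Size([4, 2])
```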
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.