Channel-Wise Early Stopping without a Validation Set via NNK Polytope Interpolation
- URL: http://arxiv.org/abs/2107.12972v1
- Date: Tue, 27 Jul 2021 17:33:30 GMT
- Title: Channel-Wise Early Stopping without a Validation Set via NNK Polytope Interpolation
- Authors: David Bonet, Antonio Ortega, Javier Ruiz-Hidalgo, Sarath Shekkizhar
- Abstract summary: Convolutional neural networks (ConvNets) comprise high-dimensional feature spaces formed by the aggregation of multiple channels.
We present channel-wise DeepNNK (CW-DeepNNK), a novel generalization estimate based on non-negative kernel regression (NNK) graphs.
- Score: 36.479195100553085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art neural network architectures continue to scale in size and
deliver impressive generalization results, although this comes at the expense
of limited interpretability. In particular, a key challenge is to determine
when to stop training the model, as this has a significant impact on
generalization. Convolutional neural networks (ConvNets) comprise
high-dimensional feature spaces formed by the aggregation of multiple channels,
where analyzing intermediate data representations and the model's evolution can
be challenging owing to the curse of dimensionality. We present channel-wise
DeepNNK (CW-DeepNNK), a novel channel-wise generalization estimate based on
non-negative kernel regression (NNK) graphs with which we perform local
polytope interpolation on low-dimensional channels. This method leads to
instance-based interpretability of both the learned data representations and
the relationship between channels. Motivated by our observations, we use
CW-DeepNNK to propose a novel early stopping criterion that (i) does not
require a validation set, (ii) is based on a task performance metric, and (iii)
allows stopping to be reached at different points for each channel. Our
experiments demonstrate that our proposed method has advantages as compared to
the standard criterion based on validation set performance.
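As a rough illustration of the approach, the sketch below estimates a per-channel leave-one-out (LOO) classification error using non-negative interpolation weights over each point's nearest neighbors; training can then be stopped per channel once that error plateaus. This is a minimal sketch, not the authors' released implementation: the Gaussian kernel, the neighborhood size, and the use of scipy's NNLS as a stand-in for the exact NNK solver are all assumptions, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def nnk_weights(K_nn, k_ni, reg=1e-8):
    """Non-negative interpolation weights for one query point.

    Solves min_{theta >= 0} 0.5 theta^T K_nn theta - k_ni^T theta
    via NNLS on a Cholesky factor (a simplified stand-in for the
    exact NNK solver in the paper).
    """
    L = np.linalg.cholesky(K_nn + reg * np.eye(len(k_ni)))
    b = np.linalg.solve(L, k_ni)   # chosen so ||L^T theta - b||^2 matches the objective
    theta, _ = nnls(L.T, b)
    return theta

def cw_loo_error(feats, labels, k=15, sigma=1.0):
    """Leave-one-out NNK interpolation error, one value per channel.

    feats:  (n, C, d) per-channel activations for n training points
    labels: (n,) integer class labels
    """
    n, C, _ = feats.shape
    onehot = np.eye(labels.max() + 1)[labels]
    errors = np.zeros(C)
    for c in range(C):
        X = feats[:, c, :]
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-d2 / (2 * sigma ** 2))          # Gaussian kernel (assumption)
        wrong = 0
        for i in range(n):
            nbrs = np.argsort(-K[i])[1:k + 1]       # k nearest points, excluding i
            theta = nnk_weights(K[np.ix_(nbrs, nbrs)], K[nbrs, i])
            if theta.sum() == 0:
                wrong += 1                          # query has no polytope support
                continue
            pred = (theta / theta.sum()) @ onehot[nbrs]  # local polytope interpolation
            wrong += int(pred.argmax() != labels[i])
        errors[c] = wrong / n
    return errors

# Channel-wise early stopping: stop monitoring a channel once its LOO error
# stops improving between epochs, and stop training when all channels have
# stopped. Only training data is used -- no validation set is required.
```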
Related papers
- Information-Theoretic Generalization Bounds for Deep Neural Networks [22.87479366196215]
Deep neural networks (DNNs) exhibit an exceptional capacity for generalization in practical applications.
This work aims to capture the effect and benefits of depth for supervised learning via information-theoretic generalization bounds.
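For context, a representative bound from this line of work is the Xu-Raginsky mutual-information bound (shown here for orientation; it is not necessarily the exact depth-dependent bound derived in the cited paper):

$$
\left|\,\mathbb{E}\!\left[L_\mu(W) - L_S(W)\right]\right| \;\le\; \sqrt{\frac{2\sigma^2\, I(W; S)}{n}},
$$

where $S$ is the training set of $n$ samples, $W$ the learned weights, $I(W;S)$ their mutual information, and the loss is assumed $\sigma$-sub-Gaussian.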
arXiv Detail & Related papers (2024-04-04T03:20:35Z)
- Revealing Decurve Flows for Generalized Graph Propagation [108.80758541147418]
This study addresses the limitations of the traditional analysis of message-passing, central to graph learning, by defining generalized propagation with directed and weighted graphs.
We include a preliminary exploration of learned propagation patterns in datasets, a first in the field.
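As a hedged sketch of what propagation over a directed, weighted graph looks like in its simplest form (the cited paper's unified scheme is considerably richer), one step can be written as follows:

```python
import numpy as np

def propagate(A, X, steps=2):
    """Row-normalized message passing X <- D^{-1} A X over a directed,
    weighted graph. A is a non-negative, possibly asymmetric weight
    matrix; a simplified illustration, not the paper's learned scheme."""
    deg = A.sum(axis=1, keepdims=True)
    P = A / np.maximum(deg, 1e-12)   # row-stochastic transition matrix
    for _ in range(steps):
        X = P @ X
    return X
```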
arXiv Detail & Related papers (2024-02-13T14:13:17Z)
- Interpolation-based Correlation Reduction Network for Semi-Supervised Graph Learning [49.94816548023729]
We propose a novel graph contrastive learning method, termed Interpolation-based Correlation Reduction Network (ICRN).
In our method, we improve the discriminative capability of the latent features by enlarging the margin of decision boundaries.
By combining the two settings, we extract rich supervision information from both the abundant unlabeled nodes and the rare yet valuable labeled nodes for discriminative representation learning.
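A minimal sketch of the correlation-reduction idea, in the spirit of decorrelating two augmented views of the same nodes (the loss below is a generic cross-correlation objective, an assumption rather than ICRN's published formulation):

```python
import torch

def correlation_reduction_loss(z1, z2, off_weight=5e-3, eps=1e-8):
    """Push the cross-view correlation matrix toward the identity:
    matched feature dimensions should agree, mismatched ones should
    decorrelate. z1, z2: (n, d) embeddings of two views."""
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + eps)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + eps)
    c = (z1.T @ z2) / z1.shape[0]                    # (d, d) cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + off_weight * off_diag
```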
arXiv Detail & Related papers (2022-06-06T14:26:34Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
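Concretely, gradient flow is the continuous-time limit of gradient descent, and a linear rate means the training loss decays exponentially in time; in hedged notation, with $\lambda > 0$ a problem-dependent constant:

$$
\dot{\theta}(t) = -\nabla L(\theta(t)), \qquad L(\theta(t)) \le L(\theta(0))\, e^{-\lambda t}.
$$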
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- Low Complexity Channel estimation with Neural Network Solutions [1.0499453838486013]
We deploy a general residual convolutional neural network to achieve channel estimation in a downlink scenario.
Compared with other deep learning methods for channel estimation, our results suggest an improvement in mean squared error.
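A minimal sketch of a residual ConvNet channel estimator of this kind (the layer sizes and the treatment of the complex channel grid as two real input planes are assumptions, not the cited architecture):

```python
import torch
import torch.nn as nn

class ResidualChannelEstimator(nn.Module):
    """Maps a coarse (e.g., pilot-interpolated) channel estimate to a
    refined one; the global residual connection means the network only
    has to learn the correction. A simplified illustration."""
    def __init__(self, channels=2, width=64, depth=4):
        super().__init__()
        layers = [nn.Conv2d(channels, width, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(width, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, h_coarse):                 # (B, 2, subcarriers, symbols)
        return h_coarse + self.body(h_coarse)    # learn only the residual
```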
arXiv Detail & Related papers (2022-01-24T19:55:10Z)
- Exploring Gradient Flow Based Saliency for DNN Model Compression [21.993801817422572]
Model pruning aims to reduce the deep neural network (DNN) model size or computational overhead.
Traditional model pruning methods that evaluate channel significance for DNNs pay too much attention to the local analysis of each channel.
In this paper, we propose a new model pruning method from the perspective of gradient flow.
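A hedged sketch of gradient-based channel saliency (a common first-order Taylor-style score, not necessarily the exact criterion of the cited paper): rank channels by the magnitude of loss gradient times activation, then prune the lowest-scoring ones.

```python
import torch

def channel_saliency(activations, loss):
    """First-order saliency per channel: |dL/da * a| summed over batch
    and spatial dims. activations: (B, C, H, W) tensor that is part of
    the autograd graph. A generic score, assumed for illustration."""
    grads, = torch.autograd.grad(loss, activations, retain_graph=True)
    return (grads * activations).abs().sum(dim=(0, 2, 3))  # (C,)

# Pruning step: keep the highest-scoring channels, e.g.
#   keep = torch.topk(scores, k=int(0.7 * scores.numel())).indices
```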
arXiv Detail & Related papers (2021-10-24T16:09:40Z)
- Channel redundancy and overlap in convolutional neural networks with channel-wise NNK graphs [36.479195100553085]
Feature spaces in the deep layers of convolutional neural networks (CNNs) are often very high-dimensional and difficult to interpret.
We theoretically analyze channel-wise non-negative kernel (CW-NNK) regression graphs to quantify the overlap between channels.
We find that redundancy between channels is significant and varies with the layer depth and the level of regularization.
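A rough way to quantify overlap between two channels is sketched below, using k-nearest-neighbor sets as a simplified stand-in for the NNK neighbor sets analyzed in the paper:

```python
import numpy as np

def channel_overlap(Xa, Xb, k=15):
    """Average fraction of shared neighbors between two channels'
    feature spaces. Xa, Xb: (n, d) per-channel features. kNN is a
    simplified proxy for NNK polytope neighbors."""
    def knn_sets(X):
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return [set(np.argsort(row)[1:k + 1]) for row in d2]
    na, nb = knn_sets(Xa), knn_sets(Xb)
    return np.mean([len(a & b) / k for a, b in zip(na, nb)])
```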
arXiv Detail & Related papers (2021-10-18T22:50:07Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
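For intuition, in the simplest (two-layer) mean-field picture a network is represented by a probability measure $\rho$ over its features rather than by individual neurons; the deep framework in the cited paper generalizes this layer by layer:

$$
f(x) = \int a\, \sigma(w^\top x)\, \mathrm{d}\rho(a, w),
$$

the continuous limit of $f(x) = \frac{1}{m}\sum_{j=1}^{m} a_j\, \sigma(w_j^\top x)$ as the width $m \to \infty$.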
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Bayesian Graph Neural Networks with Adaptive Connection Sampling [62.51689735630133]
We propose a unified framework for adaptive connection sampling in graph neural networks (GNNs).
The proposed framework not only alleviates over-smoothing and over-fitting tendencies of deep GNNs, but also enables learning with uncertainty in graph analytic tasks with GNNs.
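A minimal sketch of connection sampling in a GNN layer (a fixed Bernoulli edge-drop is shown here; the cited framework makes the drop rates adaptive and learned):

```python
import torch

def sample_edges(edge_index, drop_prob=0.3, training=True):
    """Randomly drop graph edges during training (DropEdge-style).
    edge_index: (2, E) tensor of source/target node ids."""
    if not training:
        return edge_index
    keep = torch.rand(edge_index.shape[1]) > drop_prob
    return edge_index[:, keep]
```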
arXiv Detail & Related papers (2020-06-07T07:06:35Z)
- Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
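A hedged sketch of a depthwise non-local operation, where attention is computed within each channel over spatial positions so channels never mix (a simplified illustration of the general idea, not the exact module in the paper):

```python
import torch

def depthwise_nonlocal(x):
    """x: (B, C, H, W). Each channel attends over its own spatial
    positions (the 'depthwise' part), keeping parameters low. The
    dense n x n attention here is a simplification for clarity."""
    B, C, H, W = x.shape
    n = H * W
    seq = x.reshape(B * C, n, 1)                    # one channel = one sequence
    attn = torch.softmax(seq @ seq.transpose(1, 2) / n ** 0.5, dim=-1)
    out = attn @ seq                                # per-channel aggregation
    return x + out.reshape(B, C, H, W)              # residual connection
```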
arXiv Detail & Related papers (2020-01-22T15:23:48Z)