Takeuchi's Information Criteria as Generalization Measures for DNNs Close to NTK Regime
- URL: http://arxiv.org/abs/2602.23219v1
- Date: Thu, 26 Feb 2026 17:01:14 GMT
- Title: Takeuchi's Information Criteria as Generalization Measures for DNNs Close to NTK Regime
- Authors: Hiroki Naganuma, Taiji Suzuki, Rio Yokota, Masahiro Nomura, Kohta Ishikawa, Ikuro Sato
- Abstract summary: Generalization measures have been studied extensively in the machine learning community to better characterize generalization gaps. This study focuses on Takeuchi's information criterion (TIC) to investigate the conditions under which this classical measure can effectively explain the generalization gaps of deep neural networks (DNNs).
- Score: 56.89793618576349
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalization measures have been studied extensively in the machine learning community to better characterize generalization gaps. However, establishing a reliable generalization measure for statistically singular models such as deep neural networks (DNNs) is difficult due to their complex nature. This study focuses on Takeuchi's information criterion (TIC) to investigate the conditions under which this classical measure can effectively explain the generalization gaps of DNNs. Importantly, the developed theory indicates the applicability of TIC near the neural tangent kernel (NTK) regime. In a series of experiments, we trained more than 5,000 DNN models with 12 architectures, including large models (e.g., VGG-16), on four datasets, and estimated the corresponding TIC values to examine the relationship between the generalization gap and the TIC estimates. We applied several TIC approximation methods with feasible computational costs and assessed the accuracy trade-off. Our experimental results indicate that the estimated TIC values correlate well with the generalization gap under conditions close to the NTK regime. However, we show both theoretically and empirically that outside the NTK regime such correlation disappears. Finally, we demonstrate that TIC provides better trial pruning ability than existing methods for hyperparameter optimization.
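The abstract centers on Takeuchi's information criterion. As a minimal sketch (not the paper's estimator for DNNs), TIC corrects the training risk by tr(J H^{-1})/n, where J is the per-sample gradient covariance and H the average per-sample Hessian of the loss at the fitted parameters. The linear-regression example below uses synthetic data and squared loss purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: linear regression y = x @ w + noise, per-sample loss
# l_i(w) = 0.5 * (x_i @ w - y_i)**2.
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

# Fit by least squares (the empirical risk minimizer).
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = X @ w_hat - y                # r_i = x_i @ w - y_i
grads = resid[:, None] * X           # per-sample gradients r_i * x_i
J = grads.T @ grads / n              # score covariance, shape (d, d)
H = X.T @ X / n                      # average per-sample Hessian, shape (d, d)

# TIC correction: tr(J H^{-1}) / n estimates the generalization gap
# (expected risk minus training risk) near a regular optimum.
tic_penalty = np.trace(J @ np.linalg.inv(H)) / n
train_risk = 0.5 * np.mean(resid**2)
tic = train_risk + tic_penalty
print(f"train risk {train_risk:.4f}  TIC penalty {tic_penalty:.4f}  TIC {tic:.4f}")
```

For a well-specified model, J is approximately the noise variance times H, so the penalty reduces to roughly (noise variance) * d / n, recovering AIC's complexity term as a special case.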
Related papers
- Deep Neural Networks as Iterated Function Systems and a Generalization Bound [2.7920304852537536]
We show that two important deep architectures can be viewed as, or canonically associated with, place-dependent IFS. We derive a Wasserstein bound for generative modeling that controls the collage-type approximation error between the data distribution and its image.
arXiv Detail & Related papers (2026-01-27T07:32:49Z) - Generalization Performance of Hypergraph Neural Networks [21.483543928698676]
We develop margin-based generalization bounds for four representative classes of hypergraph neural networks. Our results reveal the manner in which hypergraph structure and spectral norms of the learned weights can affect the generalization bounds. Our empirical study examines the relationship between the practical performance and theoretical bounds of the models over synthetic and real-world datasets.
arXiv Detail & Related papers (2025-01-22T00:20:26Z) - A practical generalization metric for deep networks benchmarking [4.111474233685893]
This paper introduces a practical generalization metric for benchmarking different deep networks and proposes a novel testbed for the verification of theoretical estimations.
Our findings indicate that a deep network's generalization capacity in classification tasks is contingent upon both classification accuracy and the diversity of unseen data.
It is discouraging to note that most of the available generalization estimations do not correlate with the practical measurements obtained using our proposed practical metric.
arXiv Detail & Related papers (2024-09-02T23:38:25Z) - Error Bounds of Supervised Classification from Information-Theoretic Perspective [0.0]
We explore bounds on the expected risk when using deep neural networks for supervised classification from an information theoretic perspective.
We introduce model risk and fitting error, which are derived from further decomposing the empirical risk.
arXiv Detail & Related papers (2024-06-07T01:07:35Z) - NTK-Guided Few-Shot Class Incremental Learning [47.92720244138099]
We present a novel conceptualization of anti-amnesia in terms of mathematical generalization, leveraging the Neural Tangent Kernel (NTK) perspective.
Our method focuses on two key aspects: ensuring optimal NTK convergence and minimizing NTK-related generalization loss.
Our NTK-FSCIL surpasses contemporary state-of-the-art approaches, elevating end-session accuracy by 2.9% to 9.3%.
arXiv Detail & Related papers (2024-03-19T06:43:46Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Efficient kernel surrogates for neural network-based regression [0.8030359871216615]
We study the performance of the Conjugate Kernel (CK), an efficient approximation to the Neural Tangent Kernel (NTK).
We show that the CK performance is only marginally worse than that of the NTK and, in certain cases, is even superior.
In addition to providing a theoretical grounding for using CKs instead of NTKs, our framework suggests a recipe for improving DNN accuracy inexpensively.
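As a rough illustration of the NTK/CK distinction (a sketch with an assumed two-layer ReLU network, not the paper's setup): the empirical NTK is the inner product of the network's parameter gradients at two inputs, while the Conjugate Kernel keeps only the contribution of the last-layer features.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-layer ReLU net f(x) = v . relu(W x) / sqrt(m) in NTK parameterization.
d, m = 5, 2000
W = rng.normal(size=(m, d))
v = rng.normal(size=m)

def features(x):
    # Hidden-layer activations relu(W x).
    return np.maximum(W @ x, 0.0)

def ntk(x1, x2):
    # Empirical NTK: inner product of gradients wrt all parameters.
    # grad wrt v is relu(W x) / sqrt(m); grad wrt row w_j of W is
    # v_j * 1[w_j . x > 0] * x / sqrt(m).
    h1, h2 = features(x1), features(x2)
    g1 = (W @ x1 > 0).astype(float) * v
    g2 = (W @ x2 > 0).astype(float) * v
    return (h1 @ h2 + (g1 @ g2) * (x1 @ x2)) / m

def ck(x1, x2):
    # Conjugate Kernel: inner product of last-layer features only.
    return features(x1) @ features(x2) / m

x, xp = rng.normal(size=d), rng.normal(size=d)
print(f"NTK {ntk(x, xp):.4f}   CK {ck(x, xp):.4f}")
```

The CK is cheaper because it needs only a forward pass, whereas the NTK also requires the gradient terms from the hidden weights; for this architecture the NTK kernel value at x equals the CK value plus a nonnegative gradient contribution.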
arXiv Detail & Related papers (2023-10-28T06:41:47Z) - Towards Demystifying the Generalization Behaviors When Neural Collapse Emerges [132.62934175555145]
Neural Collapse (NC) is a well-known phenomenon of deep neural networks in the terminal phase of training (TPT).
We propose a theoretical explanation for why continued training can still improve accuracy on the test set, even after the train accuracy has reached 100%.
We refer to this newly discovered property as "non-conservative generalization".
arXiv Detail & Related papers (2023-10-12T14:29:02Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Adversarial Estimators [0.0]
We develop a theory of adversarial estimators ("A-estimators").
We present results characterizing the convergence rates of A-estimators under both point-wise and partial identification.
Our theory also yields the normality of general functionals of neural network M-estimators.
arXiv Detail & Related papers (2022-04-22T04:39:44Z) - Consistency and Monotonicity Regularization for Neural Knowledge Tracing [50.92661409499299]
Knowledge Tracing (KT), which tracks a human's knowledge acquisition, is a central component in online learning and AI in Education.
We propose three types of novel data augmentation, coined replacement, insertion, and deletion, along with corresponding regularization losses.
Extensive experiments on various KT benchmarks show that our regularization scheme consistently improves the model performances.
arXiv Detail & Related papers (2021-05-03T02:36:29Z) - On Connections between Regularizations for Improving DNN Robustness [67.28077776415724]
This paper analyzes regularization terms proposed recently for improving the adversarial robustness of deep neural networks (DNNs).
We study possible connections between several effective methods, including input-gradient regularization, Jacobian regularization, curvature regularization, and a cross-Lipschitz functional.
arXiv Detail & Related papers (2020-07-04T23:43:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.