The Local Learning Coefficient: A Singularity-Aware Complexity Measure
- URL: http://arxiv.org/abs/2308.12108v2
- Date: Mon, 30 Sep 2024 23:30:37 GMT
- Title: The Local Learning Coefficient: A Singularity-Aware Complexity Measure
- Authors: Edmund Lau, Zach Furman, George Wang, Daniel Murfet, Susan Wei
- Abstract summary: The Local Learning Coefficient (LLC) is introduced as a novel complexity measure for deep neural networks (DNNs).
This paper provides an extensive exploration of the LLC's theoretical underpinnings, offering both a clear definition and intuitive insights into its application.
Ultimately, the LLC emerges as a crucial tool for reconciling the apparent contradiction between deep learning's complexity and the principle of parsimony.
- Score: 2.1670528702668648
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Local Learning Coefficient (LLC) is introduced as a novel complexity measure for deep neural networks (DNNs). Recognizing the limitations of traditional complexity measures, the LLC leverages Singular Learning Theory (SLT), which has long recognized the significance of singularities in the loss landscape geometry. This paper provides an extensive exploration of the LLC's theoretical underpinnings, offering both a clear definition and intuitive insights into its application. Moreover, we propose a new scalable estimator for the LLC, which is then effectively applied across diverse architectures including deep linear networks up to 100M parameters, ResNet image models, and transformer language models. Empirical evidence suggests that the LLC provides valuable insights into how training heuristics might influence the effective complexity of DNNs. Ultimately, the LLC emerges as a crucial tool for reconciling the apparent contradiction between deep learning's complexity and the principle of parsimony.
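The scalable estimator mentioned in the abstract can be illustrated with a minimal toy sketch. In the SLT-based recipe, the LLC at a local minimum w* is estimated as nβ(E[L_n(w)] − L_n(w*)), where the expectation is over a posterior localized at w*, sampled here with SGLD. The singular toy loss and all hyperparameters (γ, ε, step counts, inverse temperature β) below are illustrative choices, not the paper's experimental settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy empirical loss L_n(w) = w0^2 * w1^2: singular at the origin, so the
# local learning coefficient there (1/2) is below the regular value d/2 = 1.
def loss(w):
    return (w[0] ** 2) * (w[1] ** 2)

def grad_loss(w):
    return np.array([2 * w[0] * w[1] ** 2, 2 * (w[0] ** 2) * w[1]])

def estimate_llc(w_star, n=1000, gamma=100.0, eps=1e-4,
                 steps=20000, burn_in=5000):
    """Estimate n*beta*(E[L_n(w)] - L_n(w_star)) by SGLD sampling from a
    posterior ~ exp(-n*beta*L_n(w) - gamma/2 * |w - w_star|^2)."""
    beta = 1.0 / np.log(n)  # inverse temperature, a common SLT choice
    w = w_star.copy()
    samples = []
    for t in range(steps):
        grad = n * beta * grad_loss(w) + gamma * (w - w_star)
        w = w - 0.5 * eps * grad + np.sqrt(eps) * rng.normal(size=w.shape)
        if t >= burn_in:
            samples.append(loss(w))
    return n * beta * (np.mean(samples) - loss(w_star))

w_star = np.zeros(2)
lam_hat = estimate_llc(w_star)
print(lam_hat)  # non-negative; well below the regular value d/2 = 1
```

At this toy scale the estimate is biased by the localizing prior; the point is only the shape of the computation, not a calibrated value.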
Related papers
- Binarized Neural Networks Converge Toward Algorithmic Simplicity: Empirical Support for the Learning-as-Compression Hypothesis [36.24954635616374]
We propose a shift toward algorithmic information theory, using Binarized Neural Networks (BNNs) as a first proxy.
We apply the Block Decomposition Method (BDM) and demonstrate that it more closely tracks structural changes during training than entropy.
These results support the view of training as a process of algorithmic compression, where learning corresponds to the progressive internalization of structured regularities.
arXiv Detail & Related papers (2025-05-27T02:51:36Z)
- Sparse Mixture-of-Experts for Compositional Generalization: Empirical Evidence and Theoretical Foundations of Optimal Sparsity [89.81738321188391]
This study investigates the relationship between task complexity and optimal sparsity in SMoE models.
We show that the optimal sparsity lies between minimal activation (1-2 experts) and full activation, with the exact number scaling proportionally to task complexity.
arXiv Detail & Related papers (2024-10-17T18:40:48Z)
- Convergence Analysis for Deep Sparse Coding via Convolutional Neural Networks [7.956678963695681]
We introduce a novel class of Deep Sparse Coding (DSC) models.
We derive convergence rates for CNNs in their ability to extract sparse features.
Inspired by the strong connection between sparse coding and CNNs, we explore training strategies to encourage neural networks to learn more sparse features.
arXiv Detail & Related papers (2024-08-10T12:43:55Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in a SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- Less is More: Rethinking Few-Shot Learning and Recurrent Neural Nets [2.824895388993495]
We provide theoretical guarantees for reliable learning under the information-theoretic AEP.
We then focus on a highly efficient recurrent neural net (RNN) framework and propose a reduced-entropy algorithm for few-shot learning.
Our experimental results demonstrate significant potential for improving learning models' sample efficiency, generalization, and time complexity.
arXiv Detail & Related papers (2022-09-28T17:33:11Z)
- Semi-Parametric Inducing Point Networks and Neural Processes [15.948270454686197]
Semi-parametric inducing point networks (SPIN) can query the training set at inference time in a compute-efficient manner.
SPIN attains linear complexity via a cross-attention mechanism between datapoints inspired by inducing point methods.
In our experiments, SPIN reduces memory requirements, improves accuracy across a range of meta-learning tasks, and improves state-of-the-art performance on an important practical problem, genotype imputation.
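The linear complexity claimed above can be sketched generically: cross-attention from a small set of m inducing points to n datapoints costs O(nm) rather than the O(n^2) of full self-attention over the training set. The function below is an illustrative single-head sketch of this idea, not SPIN's actual architecture; names and shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inducing_cross_attention(X, H):
    """Cross-attention from m inducing points H (m x d) to n datapoints
    X (n x d): the score matrix is (m x n), so cost is O(n*m), linear in n."""
    d = X.shape[1]
    scores = H @ X.T / np.sqrt(d)   # (m, n) attention logits
    A = softmax(scores, axis=-1)    # each inducing point attends over data
    return A @ X                    # (m, d) compressed summary of the dataset

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))  # n = 1000 datapoints
H = rng.normal(size=(8, 16))     # m = 8 learnable inducing points
Z = inducing_cross_attention(X, H)
print(Z.shape)  # (8, 16)
```

In a real model the queries, keys, and values would be learned projections; here they are identities to keep the complexity argument visible.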
arXiv Detail & Related papers (2022-05-24T01:42:46Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- An Information-Theoretic Framework for Supervised Learning [22.280001450122175]
We propose a novel information-theoretic framework with its own notions of regret and sample complexity.
We study the sample complexity of learning from data generated by deep neural networks with ReLU activation units.
We conclude by corroborating our theoretical results with experimental analysis of random single-hidden-layer neural networks.
arXiv Detail & Related papers (2022-03-01T05:58:28Z)
- Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks [19.99615698375829]
We show that generalization error can be equivalently bounded in terms of a notion called the 'persistent homology dimension' (PHD).
We develop an efficient algorithm to estimate PHD in the scale of modern deep neural networks.
Our experiments show that the proposed approach can efficiently compute a network's intrinsic dimension in a variety of settings.
arXiv Detail & Related papers (2021-11-25T17:06:15Z)
- Reinforcement Learning with External Knowledge by using Logical Neural Networks [67.46162586940905]
A recent neuro-symbolic framework called Logical Neural Networks (LNNs) can simultaneously provide key properties of both neural networks and symbolic logic.
We propose an integrated method that enables model-free reinforcement learning from external knowledge sources.
arXiv Detail & Related papers (2021-03-03T12:34:59Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges to reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
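The edge-learning idea above can be sketched as a forward pass over a complete DAG in which each node aggregates gated contributions from all earlier nodes, with a learnable scalar per edge. The gating and transforms below are hypothetical simplifications, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dag_forward(x, alphas, transforms):
    """Forward pass over a complete DAG of N nodes: node j aggregates
    sigmoid-gated inputs from all earlier nodes, where alphas[i, j] is a
    learnable edge parameter (differentiable, so edges can be trained)."""
    nodes = [x]
    N = alphas.shape[0]
    for j in range(1, N):
        agg = sum(sigmoid(alphas[i, j]) * nodes[i] for i in range(j))
        nodes.append(transforms[j - 1](agg))
    return nodes[-1]

rng = np.random.default_rng(0)
alphas = rng.normal(size=(4, 4))  # edge parameters for a 4-node graph
transforms = [np.tanh] * 3        # stand-ins for per-node operations
y = dag_forward(np.ones(5), alphas, transforms)
print(y.shape)  # (5,)
```

Because the gates are smooth, gradient descent can drive an edge weight toward zero, effectively pruning that connection.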
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Neural Complexity Measures [96.06344259626127]
We propose Neural Complexity (NC), a meta-learning framework for predicting generalization.
Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way.
arXiv Detail & Related papers (2020-08-07T02:12:10Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
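The inference primitive underlying the BP-Layer, max-product belief propagation (min-sum in negative log space) on a chain of labels, can be sketched as follows. The cost values are illustrative, and this omits the truncation and learnable parameters of the actual layer.

```python
import numpy as np

def viterbi_min_sum(unary, pairwise):
    """Min-sum (max-product in -log space) message passing on a chain MRF:
    unary is (T, K) per-node label costs, pairwise (K, K) transition costs.
    Returns the label sequence minimizing the total cost."""
    T, K = unary.shape
    cost = unary[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = cost[:, None] + pairwise   # (from-label, to-label) candidates
        back[t] = np.argmin(cand, axis=0) # best predecessor per label
        cost = cand.min(axis=0) + unary[t]
    labels = np.zeros(T, dtype=int)
    labels[-1] = int(np.argmin(cost))
    for t in range(T - 1, 0, -1):         # backtrack the optimal path
        labels[t - 1] = back[t, labels[t]]
    return labels

# Potts-style smoothness: switching labels costs 1, staying costs 0.
unary = np.array([[0.0, 2.0], [1.5, 0.1], [0.0, 2.0]])
pairwise = np.array([[0.0, 1.0], [1.0, 0.0]])
labels = viterbi_min_sum(unary, pairwise)
print(labels)  # [0 0 0]
```

Note how the pairwise smoothness term overrides the middle node's weak preference for label 1; making such costs learnable is what turns this primitive into a trainable layer.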
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.