A statistical physics framework for optimal learning
- URL: http://arxiv.org/abs/2507.07907v1
- Date: Thu, 10 Jul 2025 16:39:46 GMT
- Title: A statistical physics framework for optimal learning
- Authors: Francesca Mignacco, Francesco Mori
- Abstract summary: We combine statistical physics with control theory in a unified theoretical framework to identify optimal protocols in neural network models. We formulate the design of learning protocols as an optimal control problem directly on the dynamics of the order parameters. This framework encompasses a variety of learning scenarios, optimization constraints, and control budgets.
- Score: 1.243080988483032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning is a complex dynamical process shaped by a range of interconnected decisions. Careful design of hyperparameter schedules for artificial neural networks or efficient allocation of cognitive resources by biological learners can dramatically affect performance. Yet, theoretical understanding of optimal learning strategies remains sparse, especially due to the intricate interplay between evolving meta-parameters and nonlinear learning dynamics. The search for optimal protocols is further hindered by the high dimensionality of the learning space, often resulting in predominantly heuristic, difficult to interpret, and computationally demanding solutions. Here, we combine statistical physics with control theory in a unified theoretical framework to identify optimal protocols in prototypical neural network models. In the high-dimensional limit, we derive closed-form ordinary differential equations that track online stochastic gradient descent through low-dimensional order parameters. We formulate the design of learning protocols as an optimal control problem directly on the dynamics of the order parameters with the goal of minimizing the generalization error at the end of training. This framework encompasses a variety of learning scenarios, optimization constraints, and control budgets. We apply it to representative cases, including optimal curricula, adaptive dropout regularization and noise schedules in denoising autoencoders. We find nontrivial yet interpretable strategies highlighting how optimal protocols mediate crucial learning tradeoffs, such as maximizing alignment with informative input directions while minimizing noise fitting. Finally, we show how to apply our framework to real datasets. Our results establish a principled foundation for understanding and designing optimal learning protocols and suggest a path toward a theory of meta-learning grounded in statistical physics.
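To make the control formulation concrete, here is a minimal sketch of the problem structure the abstract describes; the notation (order parameters $Q$, protocol $u$, drift $f$) is generic rather than quoted from the paper:

\[
\min_{u(\cdot)} \; \epsilon_g\big(Q(T)\big)
\quad \text{subject to} \quad
\frac{dQ}{dt} = f\big(Q(t), u(t)\big), \qquad Q(0) = Q_0,
\]

where $Q(t)$ collects the low-dimensional order parameters that track online SGD in the high-dimensional limit, $u(t)$ is the learning protocol (a learning-rate, dropout, or noise schedule, or a curriculum), and $\epsilon_g$ is the generalization error at the end of training $T$. Introducing an adjoint $p(t)$ and the Hamiltonian $H = p \cdot f(Q, u)$, Pontryagin's principle gives $\dot{p} = -\partial H / \partial Q$ with terminal condition $p(T) = \partial \epsilon_g / \partial Q(T)$, and the optimal protocol extremizes $H$ pointwise in time.

A toy numerical illustration of this pipeline is sketched below, assuming invented scalar dynamics rather than the paper's derived ODEs; `simulate` and `optimize_schedule` are hypothetical helpers written for this example:

```python
import numpy as np

# Toy order-parameter dynamics (an illustrative stand-in, NOT the paper's
# derived equations): the teacher-student overlap m(t) grows at a rate set
# by the learning rate eta(t), while the eta**2 term mimics noise fitting.
def simulate(etas, m0=0.0, T=1.0, noise=0.5):
    """Euler-integrate dm/dt = eta*(1 - m) - noise*eta**2 over [0, T]
    and return the terminal generalization-error proxy 1 - m(T)."""
    dt = T / len(etas)
    m = m0
    for eta in etas:
        m += dt * (eta * (1.0 - m) - noise * eta**2)
    return 1.0 - m

def optimize_schedule(n_steps=20, iters=2000, lr=0.05, eps=1e-4):
    """Gradient descent on a piecewise-constant schedule, with the
    gradient of the terminal error estimated by finite differences."""
    etas = np.full(n_steps, 0.5)
    for _ in range(iters):
        base = simulate(etas)
        grad = np.zeros_like(etas)
        for i in range(n_steps):
            bumped = etas.copy()
            bumped[i] += eps
            grad[i] = (simulate(bumped) - base) / eps
        etas = np.clip(etas - lr * grad, 0.0, None)  # keep rates nonnegative
    return etas

if __name__ == "__main__":
    schedule = optimize_schedule()
    print("optimized schedule:", np.round(schedule, 3))
    print("terminal error proxy:", simulate(schedule))
```

In this toy model the optimized schedule anneals the learning rate as the overlap grows, echoing the tradeoff the abstract highlights between aligning with informative directions and fitting noise.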
Related papers
- Online Decision-Focused Learning [63.83903681295497]
Decision-focused learning (DFL) is an increasingly popular paradigm for training predictive models whose outputs are used in decision-making tasks. We investigate DFL in dynamic environments where the objective function evolves over time. We establish bounds on the expected dynamic regret, both when the decision space is a simplex and when it is a general bounded convex polytope.
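For reference, the dynamic-regret notion being bounded compares the learner against the best time-varying comparator; this is the standard definition, with notation chosen here rather than taken from the paper:

\[
\mathbb{E}\big[R_T^{\mathrm{dyn}}\big] = \mathbb{E}\!\left[\sum_{t=1}^{T} f_t(x_t)\right] - \sum_{t=1}^{T} \min_{x \in \mathcal{X}} f_t(x),
\]

where $x_t \in \mathcal{X}$ is the decision at round $t$ and the decision space $\mathcal{X}$ is a simplex or a bounded convex polytope in the two regimes analyzed.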
arXiv Detail & Related papers (2025-05-19T10:40:30Z)
- End-to-End Learning Framework for Solving Non-Markovian Optimal Control [9.156265463755807]
We propose an innovative system identification method and control strategy for fractional-order linear time-invariant (FOLTI) systems. We also develop the first end-to-end data-driven learning framework, Fractional-Order Learning for Optimal Control (FOLOC).
arXiv Detail & Related papers (2025-02-07T04:18:56Z)
- Integrating Optimization Theory with Deep Learning for Wireless Network Design [38.257335693563554]
Traditional wireless network design relies on optimization algorithms derived from domain-specific mathematical models. Deep learning has emerged as a promising alternative to overcome complexity and adaptability concerns. This paper introduces a novel approach that integrates optimization theory with deep learning methodologies to address these issues.
arXiv Detail & Related papers (2024-12-11T20:27:48Z)
- Optimal Protocols for Continual Learning via Statistical Physics and Control Theory [6.69087470775851]
Artificial neural networks often struggle with catastrophic forgetting when learning multiple tasks sequentially. Recent theoretical work has addressed this issue by analysing learning curves in synthetic frameworks under predefined training protocols, but has stopped short of identifying optimal protocols. We fill this gap by combining exact equations for training dynamics, derived using statistical physics techniques, with optimal control methods. Our theoretical analysis offers non-trivial yet interpretable strategies for mitigating catastrophic forgetting.
arXiv Detail & Related papers (2024-09-26T17:01:41Z)
- Meta-Learning Strategies through Value Maximization in Neural Networks [7.285835869818669]
We present a learning effort framework capable of efficiently optimizing control signals on a fully normative objective.
We apply this framework to investigate the effect of approximations in common meta-learning algorithms.
Across settings, we find that control effort is most beneficial when applied to easier aspects of a task early in learning.
arXiv Detail & Related papers (2023-10-30T18:29:26Z)
- Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns online the optimal source placement in large-scale networks.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for distributed training of neural networks.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Smoothed Online Learning for Prediction in Piecewise Affine Systems [43.64498536409903]
This paper builds on the recently developed smoothed online learning framework.
It provides the first algorithms for prediction and simulation in piecewise affine systems.
arXiv Detail & Related papers (2023-01-26T15:54:14Z)
- Annealing Optimization for Progressive Learning with Stochastic Approximation [0.0]
We introduce a learning model designed to meet the needs of applications in which computational resources are limited.
We develop an online prototype-based learning algorithm that is formulated as an online gradient-free stochastic approximation algorithm.
The learning model can be viewed as an interpretable and progressively growing competitive neural network model to be used for supervised, unsupervised, and reinforcement learning.
arXiv Detail & Related papers (2022-09-06T21:31:01Z)
- Nonparametric Estimation of Heterogeneous Treatment Effects: From Theory to Learning Algorithms [91.3755431537592]
We analyze four broad meta-learning strategies which rely on plug-in estimation and pseudo-outcome regression.
We highlight how this theoretical reasoning can be used to guide principled algorithm design and translate our analyses into practice.
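As a concrete instance of pseudo-outcome regression, the widely used doubly robust construction (the notation below is a standard choice, not necessarily the paper's) regresses

\[
\tilde{Y} = \frac{A - \hat{e}(X)}{\hat{e}(X)\big(1 - \hat{e}(X)\big)}\Big(Y - \hat{\mu}_{A}(X)\Big) + \hat{\mu}_{1}(X) - \hat{\mu}_{0}(X)
\]

on $X$, where $\hat{e}$ is an estimated propensity score and $\hat{\mu}_0$, $\hat{\mu}_1$ are estimated outcome regressions; the resulting regression function targets the conditional average treatment effect.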
arXiv Detail & Related papers (2021-01-26T17:11:40Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model. This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
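A minimal sketch of the kind of inference such a layer wraps is given below: a single max-product (min-sum) sweep along one scanline with a truncated linear pairwise cost. The cost model, shapes, and single sweep direction are simplifying assumptions for illustration, not the paper's exact design:

```python
import numpy as np

# One left-to-right max-product (min-sum) sweep over a scanline, the kind
# of recursion a BP-Layer wraps (illustrative assumptions, not the paper's
# exact parameterization).
def maxprod_sweep(unary, smooth=1.0, trunc=2.0):
    """unary: (width, n_labels) costs; the pairwise cost between
    neighboring labels is min(smooth * |l - l'|, trunc). Returns the
    per-pixel labels minimizing the accumulated cost."""
    width, n_labels = unary.shape
    labels = np.arange(n_labels)
    pairwise = np.minimum(smooth * np.abs(labels[:, None] - labels[None, :]), trunc)
    cost = unary.astype(float).copy()
    for x in range(1, width):
        # Message into pixel x: min over previous labels of cost + pairwise.
        cost[x] += np.min(cost[x - 1][:, None] + pairwise, axis=0)
    return cost.argmin(axis=1)

# Example: an 8-pixel scanline with 4 labels and unaries biased toward label 1.
rng = np.random.default_rng(0)
unary = rng.random((8, 4))
unary[:, 1] -= 0.5
print(maxprod_sweep(unary))
```

In the paper's setting such sweeps are made differentiable and trained end-to-end; the sketch only shows the underlying min-sum recursion.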
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information-theoretic model predictive control (MPC) and entropy-regularized reinforcement learning (RL).
We develop a Q-learning algorithm that can leverage biased models.
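The bridge between the two views is commonly stated through the soft (free-energy) value function; the entropy-regularized form below is standard, with notation chosen here rather than quoted from the paper:

\[
V_{\mathrm{soft}}(s) = \lambda \log \mathbb{E}_{a \sim \pi_0}\!\left[\exp\!\big(Q(s, a)/\lambda\big)\right],
\]

where $\pi_0$ is a reference policy and $\lambda > 0$ a temperature; information-theoretic MPC methods such as MPPI optimize precisely this free energy over action sequences, matching the entropy-regularized RL objective.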
arXiv Detail & Related papers (2019-12-31T00:29:22Z)