How Feature Learning Can Improve Neural Scaling Laws
- URL: http://arxiv.org/abs/2409.17858v1
- Date: Thu, 26 Sep 2024 14:05:32 GMT
- Title: How Feature Learning Can Improve Neural Scaling Laws
- Authors: Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan
- Score: 86.9540615081759
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We develop a solvable model of neural scaling laws beyond the kernel limit.
Theoretical analysis of this model shows how performance scales with model
size, training time, and the total amount of available data. We identify three
scaling regimes corresponding to varying task difficulties: hard, easy, and
super-easy tasks. For easy and super-easy target functions, which lie in the
reproducing kernel Hilbert space (RKHS) defined by the initial infinite-width
Neural Tangent Kernel (NTK), the scaling exponents remain unchanged between
feature learning and kernel regime models. For hard tasks, defined as those
outside the RKHS of the initial NTK, we demonstrate both analytically and
empirically that feature learning can improve scaling with training time and
compute, nearly doubling the exponent for hard tasks. This leads to a different
compute-optimal strategy for scaling parameters and training time in the feature
learning regime. We support our finding that feature learning improves the
scaling law for hard tasks, but not for easy and super-easy tasks, with
experiments in which nonlinear MLPs fit functions with power-law Fourier spectra
on the circle and CNNs learn vision tasks.
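To make the experimental setup concrete, here is a minimal sketch of the kind of MLP experiment the abstract describes: a target with a power-law Fourier spectrum on the circle, fit by a small ReLU network while the test loss is tracked over training time. The spectral exponent, width, and optimizer settings below are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (not the paper's exact setup): fit a target with a
# power-law Fourier spectrum on the circle and watch the test loss decay.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
alpha = 1.0                      # spectral decay exponent (assumed)
ks = torch.arange(1, 64, dtype=torch.float32)

def target(x):
    # f(x) = sum_k k^{-alpha} cos(k x): power-law Fourier spectrum
    return (ks ** -alpha * torch.cos(torch.outer(x, ks))).sum(dim=1)

x_train = 2 * np.pi * torch.rand(2048)
x_test = 2 * np.pi * torch.rand(512)
# Feed (cos x, sin x) so the input respects the circle's periodicity.
feats = lambda x: torch.stack([torch.cos(x), torch.sin(x)], dim=1)

model = nn.Sequential(nn.Linear(2, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(1, 5001):
    opt.zero_grad()
    loss = loss_fn(model(feats(x_train)).squeeze(), target(x_train))
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            test = loss_fn(model(feats(x_test)).squeeze(), target(x_test))
        print(f"step {step:5d}  test loss {test:.4e}")
```

Fitting a line to log(test loss) versus log(step) gives an empirical time-scaling exponent; the abstract's claim is that this exponent nearly doubles for hard targets when the network learns features rather than staying in the kernel regime.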
Related papers
- Trainability, Expressivity and Interpretability in Gated Neural ODEs [0.0]
We introduce a novel measure of expressivity which probes the capacity of a neural network to generate complex trajectories.
We show how reduced-dimensional gnODEs retain their modeling power while greatly improving interpretability.
We also demonstrate the benefit of gating in nODEs on several real-world tasks.
arXiv Detail & Related papers (2023-07-12T18:29:01Z)
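As a rough illustration of the gating mechanism, the sketch below adds a learned sigmoid gate that modulates how quickly each hidden unit evolves under the ODE flow. This is one common way to gate a neural ODE; the paper's exact gnODE parameterization may differ.

```python
# Hypothetical sketch of gating in a neural ODE with Euler integration.
import torch
import torch.nn as nn

class GatedODECell(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())
        self.g = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, h, dt=0.1):
        # dh/dt = g(h) * (-h + f(h)); a gate near 0 freezes a unit,
        # a gate near 1 lets it relax toward f(h).
        return h + dt * self.g(h) * (-h + self.f(h))

h = torch.zeros(1, 8)
cell = GatedODECell(8)
for _ in range(50):          # simple Euler integration of the flow
    h = cell(h)
print(h.shape)               # torch.Size([1, 8])
```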
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
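The core idea, starting from a group-balanced subset so spurious features cannot dominate early training and then progressively expanding the training pool, can be sketched as follows. The schedule and group construction here are placeholder assumptions, not the paper's algorithm.

```python
# Schematic of progressive data expansion (illustrative schedule only).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
groups = rng.integers(0, 4, size=n)            # e.g. (class, spurious attr) pairs

def balanced_subset(groups, per_group):
    idx = [rng.choice(np.where(groups == g)[0], per_group, replace=False)
           for g in np.unique(groups)]
    return np.concatenate(idx)

active = balanced_subset(groups, per_group=50)  # warm-up: balanced core set
for epoch in range(10):
    # train_one_epoch(model, data[active])      # placeholder training step
    if epoch >= 2:                              # expansion phase
        extra = rng.choice(n, size=500, replace=False)
        active = np.union1d(active, extra)      # grow the training pool
print(len(active))
```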
- Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which makes convergence faster, more stable, and more accurate.
Our model's network parameters are reduced to only 37% of the state-of-the-art model's, and the solution gap of our model towards the expert solutions decreases from 6.8% to 1.3% on average.
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
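A minimal behavior-cloning step conveys the expert-driven imitation idea: score the candidate jobs and apply cross-entropy against the expert's choice. The generic policy network below is a stand-in for the paper's graph-based model.

```python
# Illustrative imitation-learning update: supervised targets from an
# expert give lower-variance gradients than reinforcement learning.
import torch
import torch.nn as nn

n_jobs, feat_dim = 20, 16
policy = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

job_feats = torch.randn(32, n_jobs, feat_dim)   # batch of scheduling states
expert_choice = torch.randint(n_jobs, (32,))    # expert's next-job labels

logits = policy(job_feats).squeeze(-1)          # score every schedulable job
loss = nn.functional.cross_entropy(logits, expert_choice)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```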
- TKIL: Tangent Kernel Approach for Class Balanced Incremental Learning [4.822598110892847]
Class incremental learning methods aim to keep a memory of a few exemplars from previously learned tasks and to distill knowledge from them.
Existing methods struggle to balance the performance across classes since they typically overfit the model to the latest task.
We introduce a novel methodology, Tangent Kernel for Incremental Learning (TKIL), that achieves class-balanced performance.
arXiv Detail & Related papers (2022-06-17T00:20:54Z)
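The exemplar-memory-plus-distillation pattern described above can be sketched as follows; this is the generic class-incremental recipe, not TKIL's tangent-kernel machinery.

```python
# Generic class-incremental loss: cross-entropy on the new task plus
# knowledge distillation from a frozen copy of the previous model on
# stored exemplars.
import torch
import torch.nn as nn
import torch.nn.functional as F

def incremental_loss(model, old_model, x_new, y_new, x_mem, T=2.0):
    ce = F.cross_entropy(model(x_new), y_new)        # learn the new task
    with torch.no_grad():
        old_logits = old_model(x_mem)                # frozen old-task teacher
    kd = F.kl_div(F.log_softmax(model(x_mem) / T, dim=1),
                  F.softmax(old_logits / T, dim=1),
                  reduction="batchmean") * T * T     # preserve old knowledge
    return ce + kd

model, old_model = nn.Linear(8, 10), nn.Linear(8, 10)
loss = incremental_loss(model, old_model,
                        torch.randn(4, 8), torch.randint(10, (4,)),
                        torch.randn(4, 8))
loss.backward()
```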
- Semi-Parametric Inducing Point Networks and Neural Processes [15.948270454686197]
Semi-parametric inducing point networks (SPIN) can query the training set at inference time in a compute-efficient manner.
SPIN attains linear complexity via a cross-attention mechanism between datapoints inspired by inducing point methods.
In our experiments, SPIN reduces memory requirements, improves accuracy across a range of meta-learning tasks, and improves state-of-the-art performance on an important practical problem, genotype imputation.
arXiv Detail & Related papers (2022-05-24T01:42:46Z)
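The linear-complexity mechanism can be illustrated with a small set of learned inducing points that cross-attend over the datapoints, reducing cost from O(n^2) to O(nm). This sketch follows the general inducing-point attention idea rather than SPIN's exact architecture.

```python
# m learned inducing points attend over n datapoints: cost O(n*m).
import torch
import torch.nn as nn

class InducingCrossAttention(nn.Module):
    def __init__(self, dim, m=16, heads=4):
        super().__init__()
        self.inducing = nn.Parameter(torch.randn(m, dim))  # learned queries
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, data):                  # data: (batch, n, dim)
        q = self.inducing.expand(data.size(0), -1, -1)
        out, _ = self.attn(q, data, data)     # queries attend to the dataset
        return out                            # (batch, m, dim) summary

layer = InducingCrossAttention(dim=32)
print(layer(torch.randn(2, 1000, 32)).shape)  # torch.Size([2, 16, 32])
```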
- NeuralEF: Deconstructing Kernels by Deep Neural Networks [47.54733625351363]
Traditional nonparametric solutions based on the Nyström formula suffer from scalability issues.
Recent work has resorted to a parametric approach, i.e., training neural networks to approximate the eigenfunctions.
We show that these problems can be fixed by using a new series of objective functions that generalize to the space of supervised and unsupervised learning problems.
arXiv Detail & Related papers (2022-04-30T05:31:07Z)
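For context, the Nyström baseline works as follows: eigendecompose the kernel matrix on an n-point sample (the cubic-cost bottleneck) and extend the eigenfunctions to new inputs. This is the standard formula, shown under an assumed RBF kernel.

```python
# Nystrom eigenfunction approximation: the nonparametric baseline whose
# O(n^3) eigendecomposition motivates parametric alternatives.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                    # sample defining the basis

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

K = rbf(X, X)
evals, evecs = np.linalg.eigh(K)                 # O(n^3): the bottleneck
evals, evecs = evals[::-1], evecs[:, ::-1]       # sort descending

def eigenfunction(x_new, i):
    # psi_i(x) ~ sqrt(n)/lambda_i * k(x, X) @ v_i  (Nystrom extension)
    return np.sqrt(len(X)) / evals[i] * rbf(x_new, X) @ evecs[:, i]

print(eigenfunction(rng.normal(size=(3, 2)), 0))
```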
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
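A toy version of the idea of learning inducing points directly in a learned feature space: map inputs through a feature network and place trainable inducing points (and weights) in that space. The paper's Gaussian process objective is richer than this kernel-predictor sketch.

```python
# Illustrative head with trainable inducing points in a learned feature
# space; all parameters train jointly by backprop.
import torch
import torch.nn as nn

class InducingHead(nn.Module):
    def __init__(self, in_dim, feat_dim=16, m=32):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, feat_dim))
        self.z = nn.Parameter(torch.randn(m, feat_dim))   # inducing points
        self.w = nn.Parameter(torch.zeros(m))             # their weights

    def forward(self, x):
        f = self.phi(x)                                   # learned features
        k = torch.exp(-torch.cdist(f, self.z) ** 2)       # RBF similarities
        return k @ self.w                                 # kernel predictor

head = InducingHead(in_dim=4)
print(head(torch.randn(8, 4)).shape)  # torch.Size([8])
```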
- FG-Net: Fast Large-Scale LiDAR Point Clouds Understanding Network Leveraging Correlated Feature Mining and Geometric-Aware Modelling [15.059508985699575]
FG-Net is a general deep learning framework for large-scale point cloud understanding without voxelization.
We propose a deep convolutional neural network leveraging correlated feature mining and deformable-convolution-based geometric-aware modelling.
Our approach outperforms state-of-the-art methods in terms of accuracy and efficiency.
arXiv Detail & Related papers (2020-12-17T08:20:09Z)
- Learning the Linear Quadratic Regulator from Nonlinear Observations [135.66883119468707]
We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR.
In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs.
Our results constitute the first provable sample complexity guarantee for continuous control with an unknown nonlinearity in the system model and general function approximation.
arXiv Detail & Related papers (2020-10-08T07:02:47Z)
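The RichLQR generative model can be written down directly: linear latent dynamics with quadratic cost, observed only through an unknown nonlinearity. The decoder below is an arbitrary stand-in for that nonlinearity.

```python
# RichLQR-style environment: low-dimensional linear-quadratic latent
# dynamics observed through a nonlinear decoder the agent never sees.
import numpy as np

rng = np.random.default_rng(0)
d_latent, d_obs, d_act = 2, 32, 1
A = np.array([[0.9, 0.1], [0.0, 0.9]])     # latent linear dynamics
B = np.array([[0.0], [1.0]])
Q, R = np.eye(d_latent), 0.1 * np.eye(d_act)
W = rng.normal(size=(d_obs, d_latent))     # stand-in for the unknown decoder

x = rng.normal(size=d_latent)
total_cost = 0.0
for t in range(100):
    y = np.tanh(W @ x)                     # rich nonlinear observation
    u = rng.normal(size=d_act)             # agent sees only y, picks u
    total_cost += x @ Q @ x + u @ R @ u    # quadratic cost in latent state
    x = A @ x + B @ u + 0.01 * rng.normal(size=d_latent)
print(total_cost)
```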
- Learning to Learn Kernels with Variational Random Features [118.09565227041844]
We introduce kernels with random Fourier features in the meta-learning framework to leverage their strong few-shot learning ability.
We formulate the optimization of MetaVRF as a variational inference problem.
We show that MetaVRF delivers much better, or at least competitive, performance compared to existing meta-learning alternatives.
arXiv Detail & Related papers (2020-06-11T18:05:29Z)
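Random Fourier features, which MetaVRF builds on, approximate a shift-invariant kernel with an explicit finite feature map; a standard Gaussian-kernel sketch:

```python
# Random Fourier features: z(x) . z(x') approximates exp(-gamma ||x-x'||^2).
import numpy as np

rng = np.random.default_rng(0)
d, D, gamma = 4, 512, 0.5
Omega = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))  # spectral samples

def rff(X):
    proj = X @ Omega
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(D)

x1, x2 = rng.normal(size=(2, d))
approx = rff(x1[None]) @ rff(x2[None]).T
exact = np.exp(-gamma * np.sum((x1 - x2) ** 2))
print(float(approx), exact)                 # the two should be close
```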