Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex
- URL: http://arxiv.org/abs/2411.03541v1
- Date: Tue, 05 Nov 2024 22:42:49 GMT
- Title: Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex
- Authors: Tanishq Kumar, Blake Bordelon, Cengiz Pehlevan, Venkatesh N. Murthy, Samuel J. Gershman
- Abstract summary: We find evidence for such learning in mice following continued training on a task, long after behavior saturates at near-ceiling performance ("overtraining").
We demonstrate that class representations in cortex continue to separate during overtraining, so that examples that were incorrectly classified at the beginning of overtraining can abruptly be correctly classified later on, despite no changes in behavior during that time.
We conclude by showing how this model of late-time feature learning implies an explanation for the empirical puzzle of overtraining reversal in animal learning.
- Score: 32.79706360108185
- Abstract: Does learning of task-relevant representations stop when behavior stops changing? Motivated by recent theoretical advances in machine learning and the intuitive observation that human experts continue to learn from practice even after mastery, we hypothesize that task-specific representation learning can continue, even when behavior plateaus. In a novel reanalysis of recently published neural data, we find evidence for such learning in posterior piriform cortex of mice following continued training on a task, long after behavior saturates at near-ceiling performance ("overtraining"). This learning is marked by an increase in decoding accuracy from piriform neural populations and improved performance on held-out generalization tests. We demonstrate that class representations in cortex continue to separate during overtraining, so that examples that were incorrectly classified at the beginning of overtraining can abruptly be correctly classified later on, despite no changes in behavior during that time. We hypothesize this hidden yet rich learning takes the form of approximate margin maximization; we validate this and other predictions in the neural data, as well as build and interpret a simple synthetic model that recapitulates these phenomena. We conclude by showing how this model of late-time feature learning implies an explanation for the empirical puzzle of overtraining reversal in animal learning, where task-specific representations are more robust to particular task changes because the learned features can be reused.
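The paper's synthetic model is not reproduced here, but the core phenomenon the abstract describes, margin growth that continues after accuracy saturates, is easy to see in a toy setting. Below is a minimal sketch (not the authors' code): full-batch gradient descent on logistic loss over linearly separable synthetic data, where the data dimensions, class means, and step counts are all arbitrary choices. Accuracy hits ceiling almost immediately, yet the normalized margin keeps growing, a known implicit bias of gradient descent on separable problems.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linearly separable synthetic "classes" (arbitrary stand-ins for task stimuli).
X = np.vstack([rng.normal(+1.0, 1.0, (200, 50)),
               rng.normal(-1.0, 1.0, (200, 50))])
y = np.concatenate([np.ones(200), -np.ones(200)])

w = np.zeros(50)
lr = 0.1

for step in range(1, 50001):
    margins = y * (X @ w)
    # Gradient of the mean logistic loss log(1 + exp(-margin)).
    sig = 1.0 / (1.0 + np.exp(np.clip(margins, -500, 500)))
    w -= lr * -(X * (y * sig)[:, None]).mean(axis=0)
    if step in (10, 100, 1000, 10000, 50000):
        acc = (margins > 0).mean()
        norm_margin = margins.min() / np.linalg.norm(w)
        print(f"step {step:6d}  train acc {acc:.2f}  margin {norm_margin:+.4f}")
```

The behavioral readout (accuracy) saturates within a few steps, while the normalized-margin column keeps increasing across the later prints; this is the flavor of hidden progress the abstract attributes to overtraining.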
Related papers
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
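For reference, a minimal sketch of the unhinged loss, assuming the standard definition from the noise-robust-loss literature, l(y, f(x)) = 1 - y*f(x); the paper's exact variant may differ.

```python
import numpy as np

def unhinged_loss(y, f):
    """Unhinged (linear) loss: l(y, f(x)) = 1 - y * f(x).

    Unlike the hinge loss max(0, 1 - y*f), there is no clamp at zero,
    so the loss is linear in the score and its gradient w.r.t. f is
    the constant -y, independent of the current prediction.
    """
    return 1.0 - y * f

y = np.array([+1, -1, +1])
f = np.array([0.5, -2.0, 3.0])
print(unhinged_loss(y, f))  # [ 0.5 -1.  -2. ] -- can go negative
```

Because the gradient with respect to the score is constant, the induced training dynamics are linear in the model output, which is what makes closed-form analysis tractable.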
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
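The paper's analysis concerns nonlinear networks, but the qualitative threshold is visible even in an overparameterized linear toy. The sketch below (all sizes and rates are arbitrary choices, not the paper's setting) applies one-shot magnitude pruning at initialization at several fractions, then runs gradient descent on the surviving weights; training loss reaches (near) zero only while enough weights survive.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 200                      # overparameterized: d > n
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d)

def train_pruned(prune_frac, steps=5000, lr=0.05):
    w = rng.normal(size=d) * 0.1
    # One-shot magnitude pruning: permanently zero the smallest weights.
    mask = np.ones(d)
    mask[np.argsort(np.abs(w))[:int(prune_frac * d)]] = 0.0
    for _ in range(steps):
        grad = X.T @ (X @ (w * mask) - y) / n
        w -= lr * grad * mask        # pruned coordinates never update
    return np.mean((X @ (w * mask) - y) ** 2)

for p in (0.0, 0.5, 0.9, 0.99):
    print(f"pruning fraction {p:.2f}: final train MSE {train_pruned(p):.2e}")
```

Below roughly the fraction where the surviving weights still outnumber the training samples, the loss is driven to zero; above it, a large residual remains.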
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
- Feature Forgetting in Continual Representation Learning [48.89340526235304]
Representations do not suffer from "catastrophic forgetting" even in plain continual learning, but little else is known about their characteristics.
We devise a protocol for evaluating representation in continual learning, and then use it to present an overview of the basic trends of continual representation learning.
To study the feature forgetting problem, we create a synthetic dataset to identify and visualize the prevalence of feature forgetting in neural networks.
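A common version of such an evaluation protocol is sketched below, with synthetic features standing in for a network's activations; the attenuation used to mimic forgetting is an arbitrary choice, not the paper's dataset.

```python
import numpy as np

rng = np.random.default_rng(2)

def probe_accuracy(feats, labels, n_train=200):
    """Fit a fresh least-squares linear probe on frozen features and
    report held-out accuracy -- a standard proxy for how much
    task-relevant information a representation retains."""
    Y = np.where(labels[:n_train, None] == np.arange(labels.max() + 1),
                 1.0, -1.0)
    W, *_ = np.linalg.lstsq(feats[:n_train], Y, rcond=None)
    preds = (feats[n_train:] @ W).argmax(axis=1)
    return (preds == labels[n_train:]).mean()

# Toy "task A" representation: 3 classes in a 20-dim feature space.
labels = rng.integers(0, 3, 300)
means = rng.normal(size=(3, 20)) * 3.0
feats_before = means[labels] + rng.normal(size=(300, 20))
# Stand-in for feature forgetting: later training attenuates the
# old-task signal and overwrites it with unrelated activity (noise).
feats_after = 0.1 * feats_before + rng.normal(size=(300, 20))

print("probe accuracy before:", probe_accuracy(feats_before, labels))
print("probe accuracy after :", probe_accuracy(feats_after, labels))
```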
arXiv Detail & Related papers (2022-05-26T13:38:56Z)
- Probing Representation Forgetting in Supervised and Unsupervised Continual Learning [14.462797749666992]
Catastrophic forgetting is associated with an abrupt loss of knowledge previously learned by a model.
We show that representation forgetting can lead to new insights on the effect of model capacity and loss function used in continual learning.
arXiv Detail & Related papers (2022-03-24T23:06:08Z)
- An Empirical Investigation of the Role of Pre-training in Lifelong Learning [21.995593026269578]
We show that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially.
We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima.
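"Wider minima" is typically probed by checking how quickly the loss rises when the weights are perturbed. Below is a crude, self-contained proxy using toy quadratic losses; it illustrates the measurement, not the paper's actual landscape analysis.

```python
import numpy as np

rng = np.random.default_rng(3)

def sharpness(loss_fn, w, n_probes=100, radius=0.1):
    """Crude flatness proxy: average increase in loss under random
    weight perturbations of fixed radius. Wider (flatter) minima
    show a smaller increase."""
    base = loss_fn(w)
    deltas = []
    for _ in range(n_probes):
        u = rng.normal(size=w.shape)
        u *= radius / np.linalg.norm(u)
        deltas.append(loss_fn(w + u) - base)
    return float(np.mean(deltas))

# Toy quadratic losses: a sharp minimum vs. a flat one at the same point.
H_sharp = np.diag(np.full(10, 50.0))
H_flat = np.diag(np.full(10, 0.5))
w0 = np.zeros(10)
print("sharp minimum:", sharpness(lambda w: 0.5 * w @ H_sharp @ w, w0))
print("flat  minimum:", sharpness(lambda w: 0.5 * w @ H_flat @ w, w0))
```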
arXiv Detail & Related papers (2021-12-16T19:00:55Z)
- Learning Curves for Sequential Training of Neural Networks: Self-Knowledge Transfer and Forgetting [9.734033555407406]
We consider neural networks in the neural tangent kernel regime that continually learn target functions from task to task.
We investigate a variant of continual learning where the model learns the same target function in multiple tasks.
Even for the same target, the trained model shows some transfer and forgetting depending on the sample size of each task.
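For context, in the NTK (lazy) regime a network's predictions under gradient flow on squared loss admit a closed form; this is the standard result from the lazy-training literature, and the paper's sequential-task analysis builds on dynamics of this kind, though its exact expressions may differ.

```latex
f_t(x) = f_0(x) + \Theta(x, X)\,\Theta(X, X)^{-1}
         \left(I - e^{-\eta\,\Theta(X, X)\,t}\right)\bigl(y - f_0(X)\bigr)
```

Here \Theta is the fixed neural tangent kernel, X and y are the current task's inputs and targets, and \eta is the learning rate; transfer and forgetting across tasks enter through how each task's data resets f_0 and X in this expression.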
arXiv Detail & Related papers (2021-12-03T00:25:01Z)
- When and how epochwise double descent happens [7.512375012141203]
An 'epochwise double descent' effect exists in which the generalization error initially drops, then rises, and finally drops again with increasing training time.
This presents a practical problem in that the amount of time required for training is long, and early stopping based on validation performance may result in suboptimal generalization.
We show that epochwise double descent requires a critical amount of noise to occur, but above a second critical noise level early stopping remains effective.
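A protocol sketch for observing this effect follows, using an arbitrary tiny network and synthetic data; whether the rise and second drop actually appear depends on the noise level, model size, and training time, exactly as the paper's analysis predicts.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic binary task with a controllable label-noise level.
d, n_train, n_test, noise = 20, 200, 1000, 0.2
w_true = rng.normal(size=d)

def make_split(n, noisy):
    X = rng.normal(size=(n, d))
    y = np.sign(X @ w_true)
    if noisy:
        y[rng.random(n) < noise] *= -1   # flip a fraction of labels
    return X, y

Xtr, ytr = make_split(n_train, noisy=True)    # noisy training labels
Xte, yte = make_split(n_test, noisy=False)    # clean test labels

# Tiny two-layer tanh network trained by full-batch gradient descent.
h = 100
W1 = rng.normal(size=(d, h)) / np.sqrt(d)
w2 = rng.normal(size=h) / np.sqrt(h)
lr = 0.05

for epoch in range(1, 3001):
    A = np.tanh(Xtr @ W1)
    r = A @ w2 - ytr                          # squared-loss residual
    w2 -= lr * (A.T @ r) / n_train
    W1 -= lr * (Xtr.T @ ((r[:, None] * w2) * (1 - A**2))) / n_train
    if epoch % 500 == 0:
        test_err = (np.sign(np.tanh(Xte @ W1) @ w2) != yte).mean()
        print(f"epoch {epoch:5d}: test error {test_err:.3f}")
```

Tracking the printed test error across epochs at several noise levels is the measurement the paper's critical-noise claims are about.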
arXiv Detail & Related papers (2021-08-26T19:19:17Z)
- A study on the plasticity of neural networks [21.43675319928863]
We discuss the implications of losing plasticity for continual learning.
We show that a pretrained model on data from the same distribution as the one it is fine-tuned on might not reach the same generalisation as a freshly initialised model.
arXiv Detail & Related papers (2021-05-31T18:21:06Z)
- Reducing Representation Drift in Online Continual Learning [87.71558506591937]
We study the online continual learning paradigm, where agents must learn from a changing distribution with constrained memory and compute.
In this work we instead focus on the change in representations of previously observed data due to the introduction of previously unobserved class samples in the incoming data stream.
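One simple way to quantify that change is sketched below, with random vectors in place of real activations; the cosine-distance metric is illustrative, not necessarily the paper's.

```python
import numpy as np

def representation_drift(feats_before, feats_after):
    """Mean cosine distance between each old sample's feature vector
    before and after an update -- one simple way to quantify the
    drift this line of work targets."""
    num = (feats_before * feats_after).sum(axis=1)
    den = (np.linalg.norm(feats_before, axis=1)
           * np.linalg.norm(feats_after, axis=1))
    return float(np.mean(1.0 - num / den))

rng = np.random.default_rng(5)
old = rng.normal(size=(100, 64))
small_update = old + 0.1 * rng.normal(size=old.shape)
large_update = old + 2.0 * rng.normal(size=old.shape)
print("drift after small update:", representation_drift(old, small_update))
print("drift after large update:", representation_drift(old, large_update))
```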
arXiv Detail & Related papers (2021-04-11T15:19:30Z)
- Usable Information and Evolution of Optimal Representations During Training [79.38872675793813]
In particular, we find that semantically meaningful but ultimately irrelevant information is encoded in the early transient dynamics of training.
We show these effects on both perceptual decision-making tasks inspired by literature, as well as on standard image classification tasks.
arXiv Detail & Related papers (2020-10-06T03:50:19Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead, the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
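The "generated on the fly" mechanism is in the spirit of activation maximization / model inversion: optimize an input so the current model strongly evokes a chosen class, and use that input as replay. A minimal sketch with an untrained stand-in network follows; in the method itself the samples come from the model actually being trained, and the objective is more elaborate.

```python
import numpy as np

rng = np.random.default_rng(6)

# Stand-in for the model under training: a fixed two-layer network.
W1 = rng.normal(size=(32, 64)) / np.sqrt(32)
W2 = rng.normal(size=(64, 10)) / np.sqrt(64)

def logits(x):
    return np.tanh(x @ W1) @ W2

def recall_sample(target_class, steps=200, lr=0.5, weight_decay=0.01):
    """Synthesize a replay input by gradient ascent on the target
    class logit, starting from noise -- internal replay with no
    stored data buffer."""
    x = rng.normal(size=32) * 0.1
    for _ in range(steps):
        h = np.tanh(x @ W1)
        # d logit_c / dx = W1 @ ((1 - h^2) * W2[:, c])
        grad = W1 @ ((1 - h**2) * W2[:, target_class])
        x += lr * grad - weight_decay * x
    return x

x_recalled = recall_sample(target_class=3)
print("target logit after recall:", logits(x_recalled)[3].round(3))
print("mean of other logits:     ",
      np.delete(logits(x_recalled), 3).mean().round(3))
```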
This list is automatically generated from the titles and abstracts of the papers on this site.