ResMem: Learn what you can and memorize the rest
- URL: http://arxiv.org/abs/2302.01576v2
- Date: Fri, 20 Oct 2023 22:52:08 GMT
- Title: ResMem: Learn what you can and memorize the rest
- Authors: Zitong Yang, Michal Lukasik, Vaishnavh Nagarajan, Zonglin Li, Ankit
Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Sanjiv Kumar
- Abstract summary: We propose the residual-memorization (ResMem) algorithm to augment an existing prediction model.
By construction, ResMem can explicitly memorize the training labels.
We show that ResMem consistently improves the test set generalization of the original prediction model.
- Score: 79.19649788662511
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The impressive generalization performance of modern neural networks is
attributed in part to their ability to implicitly memorize complex training
patterns. Inspired by this, we explore a novel mechanism to improve model
generalization via explicit memorization. Specifically, we propose the
residual-memorization (ResMem) algorithm, a new method that augments an
existing prediction model (e.g. a neural network) by fitting the model's
residuals with a $k$-nearest neighbor based regressor. The final prediction is
then the sum of the original model and the fitted residual regressor. By
construction, ResMem can explicitly memorize the training labels. Empirically,
we show that ResMem consistently improves the test set generalization of the
original prediction model across various standard vision and natural language
processing benchmarks. Theoretically, we formulate a stylized linear regression
problem and rigorously show that ResMem results in a more favorable test risk
over the base predictor.
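Method sketch (not from the paper's code): as described above, the ResMem prediction is the base model's output plus a $k$-nearest-neighbor regressor fitted to the base model's training residuals, i.e. $\hat{y}(x) = f(x) + \hat{r}(x)$ with $\hat{r}$ trained on pairs $(x_i, y_i - f(x_i))$. The snippet below is a minimal regression-flavored illustration using scikit-learn; the helper names (`fit_resmem`, `predict_resmem`) and the plain-feature kNN are illustrative assumptions and do not reflect the paper's actual setup (embeddings, soft labels, neighbor weighting).

```python
from sklearn.neighbors import KNeighborsRegressor

def fit_resmem(base_model, X_train, y_train, k=1):
    """Fit a kNN regressor on the residuals of an already-trained base model."""
    residuals = y_train - base_model.predict(X_train)  # r_i = y_i - f(x_i)
    residual_regressor = KNeighborsRegressor(n_neighbors=k)
    residual_regressor.fit(X_train, residuals)
    return residual_regressor

def predict_resmem(base_model, residual_regressor, X):
    """Final prediction: base model output plus the fitted residual correction."""
    return base_model.predict(X) + residual_regressor.predict(X)
```

With k=1 and distinct training inputs, the residual regressor returns each training point's own residual, so the combined predictor reproduces the training labels exactly, consistent with the abstract's "by construction" memorization claim; larger k smooths the correction.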
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Recurrent Reinforcement Learning with Memoroids [11.302674177386383]
We study memory models such as Recurrent Neural Networks (RNNs) and Transformers, by mapping trajectories to latent Markov states.
Neither model scales particularly well to long sequences, especially compared to an emerging class of memory models called Linear Recurrent Models.
We reformulate existing models using a novel monoid-based framework that we call memoroids.
arXiv Detail & Related papers (2024-02-15T11:56:53Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- Return of the RNN: Residual Recurrent Networks for Invertible Sentence Embeddings [0.0]
This study presents a novel model for invertible sentence embeddings using a residual recurrent network trained on an unsupervised encoding task.
Rather than the probabilistic outputs common to neural machine translation models, our approach employs a regression-based output layer to reconstruct the input sequence's word vectors.
The model achieves high accuracy and fast training with the Adam optimizer, a significant finding given that RNNs typically require memory units, such as LSTMs, or second-order optimization methods.
arXiv Detail & Related papers (2023-03-23T15:59:06Z)
- Measuring and Reducing Model Update Regression in Structured Prediction for NLP [31.86240946966003]
Backward compatibility requires that the new model does not regress on cases that were correctly handled by its predecessor.
This work studies model update regression in structured prediction tasks.
We propose a simple and effective method, Backward-Congruent Re-ranking (BCR), by taking into account the characteristics of structured output.
arXiv Detail & Related papers (2022-02-07T07:04:54Z)
- Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing, and analyzing regression errors in NLP model updates.
We formulate regression-free model updates as a constrained optimization problem.
We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead, the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
- AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity [8.594811303203581]
We present an improved method for symbolic regression that seeks to fit data to formulas that are Pareto-optimal.
It improves on the previous state of the art by typically being orders of magnitude more robust to noise and bad data.
We develop a method for discovering generalized symmetries from gradient properties of a neural network fit.
arXiv Detail & Related papers (2020-06-18T18:01:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.