Instance-Conditional Timescales of Decay for Non-Stationary Learning
- URL: http://arxiv.org/abs/2212.05908v2
- Date: Wed, 20 Dec 2023 09:26:38 GMT
- Title: Instance-Conditional Timescales of Decay for Non-Stationary Learning
- Authors: Nishant Jain, Pradeep Shenoy
- Abstract summary: Slow concept drift is a ubiquitous, yet under-studied problem in machine learning systems.
We propose an optimization-driven approach towards balancing instance importance over large training windows.
Experiments on a large real-world dataset of 39M photos over a 9-year period show up to 15% relative gains in accuracy.
- Score: 11.90763787610444
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Slow concept drift is a ubiquitous, yet under-studied problem in practical
machine learning systems. In such settings, although recent data is more
indicative of future data, naively prioritizing recent instances runs the risk
of losing valuable information from the past. We propose an optimization-driven
approach towards balancing instance importance over large training windows.
First, we model instance relevance using a mixture of multiple timescales of
decay, allowing us to capture rich temporal trends. Second, we learn an
auxiliary scorer model that recovers the appropriate mixture of timescales as a
function of the instance itself. Finally, we propose a nested optimization
objective for learning the scorer, by which it maximizes forward transfer for
the learned model. Experiments on a large real-world dataset of 39M photos over
a 9-year period show up to 15% relative gains in accuracy compared to other
robust learning baselines. We replicate our gains on two collections of
real-world datasets for non-stationary learning, and extend our work to
continual learning settings where, too, we beat SOTA methods by large margins.
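As a rough illustration of the abstract's idea, the sketch below (a minimal sketch, not the paper's implementation) models each instance's weight as a softmax mixture over a fixed bank of exponential-decay timescales, with the mixture produced by a small auxiliary scorer network. The class names, the feature-based scorer input, and the single-level weighted training step are assumptions for illustration; the paper's nested objective for training the scorer (maximizing forward transfer) is not reproduced here.

```python
import torch
import torch.nn as nn

class TimescaleScorer(nn.Module):
    """Illustrative auxiliary scorer: maps instance features to a mixture over
    a fixed bank of exponential-decay timescales (names are assumptions)."""
    def __init__(self, feat_dim: int, timescales: list):
        super().__init__()
        # Decay timescales, in the same units as instance age (e.g. days).
        self.register_buffer("timescales", torch.tensor(timescales, dtype=torch.float32))
        self.mixture_head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, len(timescales)),
        )

    def forward(self, feats: torch.Tensor, age: torch.Tensor) -> torch.Tensor:
        mix = torch.softmax(self.mixture_head(feats), dim=-1)      # [B, K], rows sum to 1
        decays = torch.exp(-age.unsqueeze(-1) / self.timescales)   # [B, K], exp(-age / tau)
        return (mix * decays).sum(dim=-1)                          # [B] instance weights


def weighted_training_step(model, scorer, optimizer, feats, targets, age, loss_fn):
    """One weighted-ERM step with scorer-provided instance weights.
    loss_fn must return per-example losses, e.g. nn.CrossEntropyLoss(reduction="none").
    The scorer itself would be trained with the paper's nested objective
    (maximizing forward transfer on future data), which is omitted here."""
    with torch.no_grad():                        # treat the scorer as fixed for this step
        weights = scorer(feats, age)
    losses = loss_fn(model(feats), targets)
    loss = (weights * losses).sum() / weights.sum().clamp_min(1e-8)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```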
Related papers
- Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves average performance increases of 2.7% and 2.6% on the CIFAR-100 and Imagenet-R datasets, respectively.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- Adaptive Memory Replay for Continual Learning [29.333341368722653]
Updating Foundation Models as new data becomes available can lead to "catastrophic forgetting".
We introduce a framework of adaptive memory replay for continual learning, where sampling of past data is phrased as a multi-armed bandit problem.
We demonstrate the effectiveness of our approach, which maintains high performance while reducing forgetting by up to 10% at no training efficiency cost.
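As a generic sketch of the bandit framing (not the paper's exact algorithm), the code below implements an EXP3-style sampler over partitions of the replay buffer, e.g. one arm per past task. The reward signal (say, the normalized reduction in current-task loss after replaying from an arm) is an assumption for illustration.

```python
import numpy as np

class EXP3ReplaySampler:
    """Illustrative EXP3 bandit over replay-buffer partitions (e.g. one arm per
    past task). Arms that yield higher reward are replayed more often."""
    def __init__(self, num_arms: int, gamma: float = 0.1):
        self.num_arms = num_arms
        self.gamma = gamma                       # exploration rate
        self.weights = np.ones(num_arms)

    def probabilities(self) -> np.ndarray:
        w = self.weights / self.weights.sum()
        return (1.0 - self.gamma) * w + self.gamma / self.num_arms

    def sample_arm(self, rng: np.random.Generator) -> int:
        return int(rng.choice(self.num_arms, p=self.probabilities()))

    def update(self, arm: int, reward: float) -> None:
        # Standard importance-weighted EXP3 update; reward should be scaled to [0, 1].
        p = self.probabilities()[arm]
        self.weights[arm] *= np.exp(self.gamma * reward / (p * self.num_arms))
```

In a training loop one would sample an arm, draw a replay minibatch from that partition, and feed back a reward such as the normalized drop in loss on current-task data.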
arXiv Detail & Related papers (2024-04-18T22:01:56Z)
- Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding [9.112203072394648]
Power-law scaling indicates that large-scale training with uniform sampling is prohibitively slow.
Active learning methods aim to increase data efficiency by prioritizing learning on the most relevant examples.
arXiv Detail & Related papers (2023-12-08T19:26:13Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method produces models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- Correlated Time Series Self-Supervised Representation Learning via Spatiotemporal Bootstrapping [13.988624652592259]
Time series analysis plays an important role in many real-world industries.
In this paper, we propose a time-step-level representation learning framework for individual instances.
A linear regression model trained on top of the learned representations demonstrates that our model performs best in most cases.
arXiv Detail & Related papers (2023-06-12T09:42:16Z)
- Adaptive Cross Batch Normalization for Metric Learning [75.91093210956116]
Metric learning is a fundamental problem in computer vision.
We show that it is equally important to ensure that the accumulated embeddings are up to date.
In particular, it is necessary to circumvent the representational drift between the accumulated embeddings and the feature embeddings at the current training iteration.
arXiv Detail & Related papers (2023-03-30T03:22:52Z)
- Time Series Contrastive Learning with Information-Aware Augmentations [57.45139904366001]
A key component of contrastive learning is to select appropriate augmentations imposing some priors to construct feasible positive samples.
How to find the desired augmentations of time series data that are meaningful for given contrastive learning tasks and datasets remains an open question.
We propose a new contrastive learning approach with information-aware augmentations, InfoTS, which adaptively selects optimal augmentations for time series representation learning.
arXiv Detail & Related papers (2023-03-21T15:02:50Z)
- Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning -- simultaneously training a single model on multiple "upstream" and "downstream" tasks.
We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data.
We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
arXiv Detail & Related papers (2022-07-08T10:25:47Z)
- Deep Learning on a Data Diet: Finding Important Examples Early in Training [35.746302913918484]
In vision datasets, simple scores can be used to identify important examples very early in training.
We propose two such scores -- the Gradient Normed (GraNd) and the Error L2-Norm (EL2N) scores.
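Both scores have compact definitions: EL2N is the L2 norm of the error vector (softmax probabilities minus the one-hot label), and GraNd is the norm of the per-example loss gradient. The PyTorch sketch below computes both; the explicit per-example gradient loop is written for clarity rather than efficiency and is an illustrative approximation of the authors' protocol, not their exact implementation.

```python
import torch
import torch.nn.functional as F

def el2n_scores(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """EL2N: L2 norm of the error vector (softmax probabilities minus one-hot labels)."""
    probs = F.softmax(logits, dim=-1)
    onehot = F.one_hot(labels, num_classes=logits.shape[-1]).float()
    return (probs - onehot).norm(dim=-1)

def grand_scores(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """GraNd: per-example norm of the loss gradient w.r.t. the model parameters,
    computed with an explicit per-example loop (slow but simple)."""
    params = [p for p in model.parameters() if p.requires_grad]
    scores = []
    for xi, yi in zip(x, y):
        loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        scores.append(torch.sqrt(sum((g ** 2).sum() for g in grads)))
    return torch.stack(scores)
```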
arXiv Detail & Related papers (2021-07-15T02:12:20Z)
- One Backward from Ten Forward, Subsampling for Large-Scale Deep Learning [35.0157090322113]
Large-scale machine learning systems are often continuously trained with enormous data from production environments.
The sheer volume of streaming data poses a significant challenge to real-time training subsystems, and ad-hoc sampling is the standard practice.
We propose to record a constant amount of information per instance from these forward passes. The extra information measurably improves the selection of which data instances should participate in forward and backward passes.
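A minimal sketch of this general pattern is shown below: one scalar per instance (its most recent loss) is recorded during forward passes and used to decide which instances get a backward pass. The class name, the "keep the hardest fraction" rule, and the keep_ratio parameter are illustrative assumptions, not the paper's exact selection policy.

```python
import torch

class LossHistory:
    """Keeps one scalar per training instance (its most recent loss), updated
    cheaply on every forward pass and used to pick instances for backward passes."""
    def __init__(self, num_examples: int):
        self.last_loss = torch.zeros(num_examples)

    def record(self, indices: torch.Tensor, losses: torch.Tensor) -> None:
        # indices: CPU LongTensor of dataset positions; losses: per-example (reduction="none").
        self.last_loss[indices] = losses.detach().cpu()

    def select_for_backward(self, indices: torch.Tensor, keep_ratio: float = 0.1) -> torch.Tensor:
        # Keep the hardest fraction of the batch (highest recorded loss).
        k = max(1, int(keep_ratio * len(indices)))
        topk = torch.topk(self.last_loss[indices], k).indices
        return indices[topk]
```

A training loop would compute per-example losses with reduction="none", call record(), and then run the backward pass only on the subset returned by select_for_backward().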
arXiv Detail & Related papers (2021-04-27T11:29:02Z)