Efficient Learning Under Density Shift in Incremental Settings Using Cramér-Rao-Based Regularization
- URL: http://arxiv.org/abs/2502.12949v1
- Date: Tue, 18 Feb 2025 16:00:10 GMT
- Title: Efficient Learning Under Density Shift in Incremental Settings Using Cramér-Rao-Based Regularization
- Authors: Behraj Khan, Behroz Mirza, Nouman Durrani, Tahir Syed
- Abstract summary: This work takes a distributed density estimation angle to the problem where data are temporally distributed.
It processes data in batches and allows a neural network to treat a batch as training data.
$C^{2}A$ improves accuracy by up to $19\%$ over state-of-the-art methods.
- Abstract: The continuous surge in data volume and velocity is often dealt with through data orchestration and distributed processing, abstracting away the machine learning challenges that exist at the algorithmic level. With growing interest in automating the learning loop, training on data that arrive in a sequence rather than in the classical in-memory form poses a machine learning challenge, because evolving feature distributions across batches of training data bias the cross-validation step (\cite{sugiyama2012machine}). This work takes a distributed density estimation angle on the problem, where data are temporally distributed. It processes data in batches and allows a neural network to treat each batch as training data. The method accumulates knowledge about the data density via posterior probability absorption using the Fisher Information Matrix, which captures the local optimization gradients for the batch. This is then used as a regularizer for the loss on the following batch, so the density estimate for the entire dataset becomes progressively more robust to the non-iid distribution shift. Only a pair of batches needs to be in memory at a time, so the space cost is not a function of the size of the complete, distributed dataset. We propose a novel regularization-based approach, Covariate Shift Correction ($C^{2}A$), that leverages Fisher information and Kullback-Leibler divergence to adapt to both natural and sequential covariate shift caused by dataset fragmentation. $C^{2}A$ improves accuracy by up to $19\%$ over state-of-the-art methods.
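The abstract names the mechanism but not the exact objective. A minimal PyTorch sketch of the idea, in the spirit of elastic weight consolidation, might look as follows; the diagonal-Fisher approximation, the function names, and the penalty weight `lam` are illustrative assumptions, not the authors' implementation:

```python
import torch.nn.functional as F

def diag_fisher(model, batch):
    """Estimate the diagonal of the Fisher Information Matrix from
    squared loss gradients on a single batch (an assumed approximation)."""
    x, y = batch
    model.zero_grad()
    F.cross_entropy(model(x), y).backward()
    return {n: p.grad.detach() ** 2
            for n, p in model.named_parameters() if p.grad is not None}

def train_stream(model, optimizer, batches, lam=1.0):
    """Process batches sequentially; only the previous batch's Fisher
    diagonal and parameter snapshot stay in memory, so the space cost
    does not grow with the size of the full (distributed) dataset."""
    fisher, anchor = None, None
    for x, y in batches:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        if fisher is not None:
            # Quadratic penalty: resist moving parameters that carried
            # high Fisher information for the previous batch.
            loss = loss + lam * sum(
                (fisher[n] * (p - anchor[n]) ** 2).sum()
                for n, p in model.named_parameters() if n in fisher)
        loss.backward()
        optimizer.step()
        # Absorb this batch's density information for the next step.
        fisher = diag_fisher(model, (x, y))
        anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
```

Only `fisher` and `anchor` from the previous batch persist across iterations, consistent with the abstract's claim that the space cost is independent of the complete dataset size.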
Related papers
- Capturing the Temporal Dependence of Training Data Influence
We formalize the concept of trajectory-specific leave-one-out influence, which quantifies the impact of removing a data point during training.
We propose data value embedding, a novel technique enabling efficient approximation of trajectory-specific LOO.
As data value embedding captures training data ordering, it offers valuable insights into model training dynamics.
arXiv Detail & Related papers (2024-12-12T18:28:55Z)
- Mitigating covariate shift in non-colocated data with learned parameter priors
We present Fragmentation-induced covariate shift remediation ($FIcsR$), which minimizes an $f$-divergence between a fragment's covariate distribution and that of the standard cross-validation baseline.
We run extensive classification experiments on multiple data classes, over $40$ datasets, and with data batched over multiple sequence lengths.
The results are promising under all these conditions, with accuracy improvements of more than $5\%$ and $10\%$ over the batch and fold state-of-the-art, respectively.
arXiv Detail & Related papers (2024-11-10T15:48:29Z)
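As a toy illustration of the divergence involved, one can fit a Gaussian to each side's covariates and compute a closed-form KL divergence; the Gaussian assumption and all names below are illustrative and not taken from $FIcsR$:

```python
import numpy as np

def gaussian_kl(x_frag, x_base, eps=1e-6):
    """KL( N(mu_f, S_f) || N(mu_b, S_b) ) between Gaussian fits to the
    fragment's and the baseline's covariates (rows = samples)."""
    mu_f, mu_b = x_frag.mean(axis=0), x_base.mean(axis=0)
    d = mu_f.shape[0]
    S_f = np.cov(x_frag, rowvar=False) + eps * np.eye(d)  # regularized
    S_b = np.cov(x_base, rowvar=False) + eps * np.eye(d)
    S_b_inv = np.linalg.inv(S_b)
    diff = mu_b - mu_f
    _, logdet_b = np.linalg.slogdet(S_b)
    _, logdet_f = np.linalg.slogdet(S_f)
    return 0.5 * (np.trace(S_b_inv @ S_f) + diff @ S_b_inv @ diff
                  - d + logdet_b - logdet_f)
```

A fragment whose covariates drift away from the baseline yields a larger value; a remediation penalty would drive this quantity down.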
- FedUV: Uniformity and Variance for Heterogeneous Federated Learning
Federated learning is a promising framework to train neural networks with widely distributed data.
Recent work has shown that the performance degradation seen with heterogeneous data is largely due to the final layer of the network being most prone to local bias.
We investigate the training dynamics of the classifier by applying SVD to the weights, motivated by the observation that freezing weights results in constant singular values.
arXiv Detail & Related papers (2024-02-27T15:53:15Z)
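The cited observation about singular values can be checked directly; treating `model.fc` as the classifier head is an assumption for illustration:

```python
import torch

def classifier_singular_values(model):
    """Singular values of the final linear layer's weight matrix;
    tracking these across training rounds reveals local bias in the
    classifier. Assumes the head is exposed as `model.fc` (hypothetical)."""
    w = model.fc.weight.detach()
    return torch.linalg.svdvals(w)
```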
- Informative regularization for a multi-layer perceptron RR Lyrae classifier under data shift
We propose a scalable and easily adaptable approach based on an informative regularization and an ad-hoc training procedure to mitigate the shift problem.
Our method provides a new path to incorporate knowledge from characteristic features into artificial neural networks to manage the underlying data shift problem.
arXiv Detail & Related papers (2023-03-12T02:49:19Z)
- Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations
We develop a convenient gradient-based method for selecting the data augmentation.
We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
arXiv Detail & Related papers (2022-02-22T02:51:11Z)
- Transformers Can Do Bayesian Inference
We present Prior-Data Fitted Networks (PFNs).
PFNs leverage in-context learning and large-scale machine learning techniques to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z)
- Online Missing Value Imputation and Change Point Detection with the Gaussian Copula
Missing value imputation is crucial for real-world data science.
We develop an online imputation algorithm for mixed data using the Gaussian copula.
arXiv Detail & Related papers (2020-09-25T16:27:47Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
The proposed algorithms offer robustness with little computational overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
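The paper's exact algorithm is not reproduced here; a generic sketch of the inner maximization used in Wasserstein-style distributionally robust training, with all names and hyperparameters assumed, is:

```python
import torch

def dro_loss(model, loss_fn, x, y, gamma=1.0, steps=5, lr=0.1):
    """Inner maximization: ascend on a perturbed copy of the inputs,
    penalized by its squared distance to the clean batch (a
    Wasserstein-ball surrogate). The outer loop minimizes the result."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        obj = loss_fn(model(x_adv), y) - gamma * ((x_adv - x) ** 2).mean()
        grad, = torch.autograd.grad(obj, x_adv)
        with torch.no_grad():
            x_adv += lr * grad  # gradient ascent on the perturbation
    # Train the model against the worst-case batch found above.
    return loss_fn(model(x_adv.detach()), y)
```

A training step would then call `dro_loss(...)`, backpropagate, and update the model parameters as usual.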
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift
We describe a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
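The technique itself is straightforward to emulate: recompute normalization statistics from the test batch instead of using stored running averages. A minimal PyTorch sketch, assuming a model with standard BatchNorm layers (not the paper's code):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def predict_with_batch_stats(model, x):
    """Run inference using the current test batch's statistics in every
    BatchNorm layer rather than the stored running mean/variance.
    Side effect: train mode also updates the running statistics, so
    deep-copy the model first if that matters."""
    model.eval()
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.train()  # train mode makes BN use batch statistics
    out = model(x)
    model.eval()  # restore normal eval behavior
    return out
```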
- Unshuffling Data for Improved Generalization
Generalization beyond the training distribution is a core challenge in machine learning.
We show that partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple training environments can guide the learning of models with better out-of-distribution generalization.
arXiv Detail & Related papers (2020-02-27T03:07:41Z)