Derivative-based regularization for regression
- URL: http://arxiv.org/abs/2405.00555v1
- Date: Wed, 1 May 2024 14:57:59 GMT
- Title: Derivative-based regularization for regression
- Authors: Enrico Lopedoto, Maksim Shekhunov, Vitaly Aksenov, Kizito Salako, Tillman Weyde,
- Abstract summary: We introduce a novel approach to regularization in multivariable regression problems.
Our regularizer, called DLoss, penalises differences between the model's derivatives and derivatives of the data generating function as estimated from the training data.
- Score: 3.0408645115035036
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work, we introduce a novel approach to regularization in multivariable regression problems. Our regularizer, called DLoss, penalises differences between the model's derivatives and derivatives of the data generating function as estimated from the training data. We call these estimated derivatives data derivatives. The goal of our method is to align the model to the data, not only in terms of target values but also in terms of the derivatives involved. To estimate data derivatives, we select (from the training data) 2-tuples of input-value pairs, using either nearest neighbour or random, selection. On synthetic and real datasets, we evaluate the effectiveness of adding DLoss, with different weights, to the standard mean squared error loss. The experimental results show that with DLoss (using nearest neighbour selection) we obtain, on average, the best rank with respect to MSE on validation data sets, compared to no regularization, L2 regularization, and Dropout.
Related papers
- Loss-to-Loss Prediction: Scaling Laws for All Datasets [17.078832037614397]
We derive a strategy for predicting one loss from another and apply it to predict across different pre-training datasets.
Our predictions extrapolate well even at 20x the largest FLOP budget used to fit the curves.
arXiv Detail & Related papers (2024-11-19T23:23:16Z) - Adaptive Optimization for Prediction with Missing Data [6.800113478497425]
We show that some adaptive linear regression models are equivalent to learning an imputation rule and a downstream linear regression model simultaneously.
In settings where data is strongly not missing at random, our methods achieve a 2-10% improvement in out-of-sample accuracy.
arXiv Detail & Related papers (2024-02-02T16:35:51Z) - The Power and Limitation of Pretraining-Finetuning for Linear Regression
under Covariate Shift [127.21287240963859]
We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data.
For a large class of linear regression instances, transfer learning with $O(N2)$ source data is as effective as supervised learning with $N$ target data.
arXiv Detail & Related papers (2022-08-03T05:59:49Z) - DIVA: Dataset Derivative of a Learning Task [108.18912044384213]
We present a method to compute the derivative of a learning task with respect to a dataset.
A learning task is a function from a training set to the validation error, which can be represented by a trained deep neural network (DNN)
The "dataset derivative" is a linear operator, computed around the trained model, that informs how outliers of the weight of each training sample affect the validation error.
arXiv Detail & Related papers (2021-11-18T16:33:12Z) - X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To take the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z) - Variation-Incentive Loss Re-weighting for Regression Analysis on Biased
Data [8.115323786541078]
We aim to improve the accuracy of the regression analysis by addressing the data skewness/bias during model training.
We propose a Variation-Incentive Loss re-weighting method (VILoss) to optimize the gradient descent-based model training for regression analysis.
arXiv Detail & Related papers (2021-09-14T10:22:21Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance of guided gradient descent (IGSGD) method to train inference from inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - HYDRA: Hypergradient Data Relevance Analysis for Interpreting Deep
Neural Networks [51.143054943431665]
We propose Hypergradient Data Relevance Analysis, or HYDRA, which interprets predictions made by deep neural networks (DNNs) as effects of their training data.
HYDRA assesses the contribution of training data toward test data points throughout the training trajectory.
In addition, we quantitatively demonstrate that HYDRA outperforms influence functions in accurately estimating data contribution and detecting noisy data labels.
arXiv Detail & Related papers (2021-02-04T10:00:13Z) - Advanced Dropout: A Model-free Methodology for Bayesian Dropout
Optimization [62.8384110757689]
Overfitting ubiquitously exists in real-world applications of deep neural networks (DNNs)
The advanced dropout technique applies a model-free and easily implemented distribution with parametric prior, and adaptively adjusts dropout rate.
We evaluate the effectiveness of the advanced dropout against nine dropout techniques on seven computer vision datasets.
arXiv Detail & Related papers (2020-10-11T13:19:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.