Least-Squares Linear Dilation-Erosion Regressor Trained using Stochastic
Descent Gradient or the Difference of Convex Methods
- URL: http://arxiv.org/abs/2107.05682v1
- Date: Mon, 12 Jul 2021 18:41:59 GMT
- Title: Least-Squares Linear Dilation-Erosion Regressor Trained using Stochastic
Descent Gradient or the Difference of Convex Methods
- Authors: Angelica Lourenço Oliveira and Marcos Eduardo Valle
- Abstract summary: We present a hybrid morphological neural network for regression tasks called linear dilation-erosion regression ($\ell$-DER).
An $\ell$-DER model is given by a convex combination of the composition of linear and elementary morphological operators.
- Score: 2.055949720959582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a hybrid morphological neural network for regression
tasks called linear dilation-erosion regression ($\ell$-DER). In a few words, an
$\ell$-DER model is given by a convex combination of the composition of linear
and elementary morphological operators. As a result, they yield continuous
piecewise linear functions and, thus, are universal approximators. Apart from
introducing the $\ell$-DER models, we present three approaches for training
these models: one based on stochastic gradient descent (SGD) and two based on the
difference of convex programming problems. Finally, we evaluate the performance
of the $\ell$-DER model using 14 regression tasks. Although the approach based
on SGD proved faster than the other two, the $\ell$-DER trained using a disciplined
convex-concave programming approach outperformed the others, achieving the lowest
mean absolute error.
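To make the abstract's description concrete, the following is a minimal sketch of one possible $\ell$-DER parameterization: a dilation (maximum of affine maps) and an erosion (minimum of affine maps) are combined convexly, and the resulting regressor is fit by least squares with stochastic gradient descent. The layer sizes, the sigmoid parameterization of the convex weight, the synthetic data, and the training loop are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a linear dilation-erosion regressor (l-DER):
# a convex combination of a dilation (max of affine maps) and an
# erosion (min of affine maps), fit by least squares with SGD.
import torch
import torch.nn as nn


class LinearDilationErosion(nn.Module):
    def __init__(self, in_features: int, hidden: int = 8):
        super().__init__()
        self.dilation_arms = nn.Linear(in_features, hidden)  # affine maps feeding the max
        self.erosion_arms = nn.Linear(in_features, hidden)   # affine maps feeding the min
        self.beta_logit = nn.Parameter(torch.zeros(1))       # sigmoid keeps the weight in (0, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        dilation = self.dilation_arms(x).max(dim=1).values   # convex, piecewise linear
        erosion = self.erosion_arms(x).min(dim=1).values     # concave, piecewise linear
        beta = torch.sigmoid(self.beta_logit)
        return beta * dilation + (1.0 - beta) * erosion      # convex combination


# Toy regression task with synthetic data (illustrative only).
torch.manual_seed(0)
X = torch.randn(256, 3)
y = X.abs().sum(dim=1) + 0.1 * torch.randn(256)

model = LinearDilationErosion(in_features=3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

for epoch in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print(f"training MSE after SGD: {loss.item():.4f}")
```

The max of affine maps is convex and the min is concave, so any convex combination of the two is a continuous piecewise linear function, consistent with the universal-approximation remark in the abstract.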
Related papers
- Analysis of Interpolating Regression Models and the Double Descent
Phenomenon [3.883460584034765]
It is commonly assumed that models which interpolate noisy training data generalize poorly.
The best models obtained are overparametrized and the testing error exhibits the double descent behavior as the model order increases.
We derive a result based on the behavior of the smallest singular value of the regression matrix that explains the peak location and the double descent shape of the testing error as a function of model order.
arXiv Detail & Related papers (2023-04-17T09:44:33Z) - Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z) - Git Re-Basin: Merging Models modulo Permutation Symmetries [3.5450828190071655]
We show how simple algorithms can be used to fit large networks in practice.
We provide the first (to our knowledge) demonstration of zero-barrier linear mode connectivity between independently trained models.
We also discuss shortcomings in the linear mode connectivity hypothesis.
arXiv Detail & Related papers (2022-09-11T10:44:27Z) - Nonparametric regression with modified ReLU networks [77.34726150561087]
We consider regression estimation with modified ReLU neural networks in which network weight matrices are first modified by a function $\alpha$ before being multiplied by input vectors.
arXiv Detail & Related papers (2022-07-17T21:46:06Z) - Surprises in adversarially-trained linear regression [12.33259114006129]
Adversarial training is one of the most effective approaches to defend against adversarial examples.
We show that for linear regression problems, adversarial training can be formulated as a convex problem.
We show that for sufficiently many features or sufficiently small regularization parameters, the learned model perfectly interpolates the training data.
arXiv Detail & Related papers (2022-05-25T11:54:42Z) - Learning Sparse Graph with Minimax Concave Penalty under Gaussian Markov
Random Fields [51.07460861448716]
This paper presents a convex-analytic framework for learning sparse graphs from data.
We show that a triangular convexity decomposition is guaranteed by a transform corresponding to its upper part.
arXiv Detail & Related papers (2021-09-17T17:46:12Z) - Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning based on Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable is subject to a sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z) - Maximin Optimization for Binary Regression [24.351803097593887]
Regression problems with binary weights are ubiquitous in quantized learning models and digital communication systems.
The Lagrangian method also performs well in regression with cross-entropy loss, as well as in non-neural multi-layer saddle-point optimization.
arXiv Detail & Related papers (2020-10-10T19:47:40Z) - Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that fits a difference of convex functions (DC functions) to the data (a minimal sketch of this idea appears after this list).
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
arXiv Detail & Related papers (2020-07-05T18:58:47Z) - The Impact of the Mini-batch Size on the Variance of Gradients in
Stochastic Gradient Descent [28.148743710421932]
The mini-batch stochastic gradient descent (SGD) algorithm is widely used in training machine learning models.
We study SGD dynamics under linear regression and two-layer linear networks, with an easy extension to deeper linear networks.
arXiv Detail & Related papers (2020-04-27T20:06:11Z)
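As referenced in the piecewise-linear-regression entry above, a common concrete way to fit a difference of convex functions is to represent the regressor as the difference of two max-affine functions. The sketch below fits such a model by least squares using autograd-based gradient descent; the number of affine pieces, the synthetic target, and the optimizer are illustrative assumptions, not the cited paper's algorithm.

```python
# Hypothetical sketch: piecewise linear regression as a difference of two
# convex (max-affine) functions, f(x) = max_i(a_i.x + b_i) - max_j(c_j.x + d_j),
# fit by least squares with autograd-based gradient descent.
import torch
import torch.nn as nn


class DCPiecewiseLinear(nn.Module):
    def __init__(self, in_features: int, pieces: int = 6):
        super().__init__()
        self.convex_part = nn.Linear(in_features, pieces)    # affine maps of the first max-affine term
        self.concave_part = nn.Linear(in_features, pieces)   # affine maps of the subtracted term

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (self.convex_part(x).max(dim=1).values
                - self.concave_part(x).max(dim=1).values)


# Toy one-dimensional target approximated by a piecewise linear fit.
torch.manual_seed(0)
X = torch.linspace(-2.0, 2.0, 200).unsqueeze(1)
y = torch.sin(2.0 * X.squeeze(1))

model = DCPiecewiseLinear(in_features=1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.02)

for step in range(1000):
    optimizer.zero_grad()
    loss = ((model(X) - y) ** 2).mean()
    loss.backward()
    optimizer.step()

print(f"training MSE of the DC fit: {loss.item():.4f}")
```

Each max-affine term is convex and piecewise linear, and any continuous piecewise linear function can be written as a difference of two such terms, which is what makes this representation a natural target for DC-programming methods.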
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.