A generalized quadratic loss for SVM and Deep Neural Networks
- URL: http://arxiv.org/abs/2102.07606v1
- Date: Mon, 15 Feb 2021 15:49:08 GMT
- Title: A generalized quadratic loss for SVM and Deep Neural Networks
- Authors: Filippo Portera
- Abstract summary: We consider some supervised binary classification tasks and a regression task, where SVM and Deep Learning currently exhibit the best generalization performances.
We extend the work of [3] on a generalized quadratic loss for learning problems, which exploits pattern correlations in order to concentrate the learning problem on input-space regions where patterns are more densely distributed.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We consider some supervised binary classification tasks and a regression task, where SVM and Deep Learning currently exhibit the best generalization performances. We extend the work of [3] on a generalized quadratic loss for learning problems, which exploits pattern correlations in order to concentrate the learning problem on input-space regions where patterns are more densely distributed. From a shallow-methods point of view (e.g. SVM), since the mathematical derivation of problem (9) in [3] is incorrect, we restart from problem (8) in [3] and solve it with a procedure that iterates over the dual variables until the primal and dual objective functions converge. In addition, we propose another algorithm that attempts to solve the classification problem directly from the primal formulation. We also make use of Multiple Kernel Learning to improve generalization performance. Moreover, we introduce for the first time a custom loss that takes pattern correlations into consideration for both a shallow and a Deep Learning task. We propose some pattern selection criteria and report the results on 4 UCI data-sets for the SVM method. We also report results on a larger binary classification data-set based on Twitter, again drawn from UCI, combined with shallow Neural Networks, with and without the generalized quadratic loss. Finally, we test our loss with a Deep Neural Network on a larger regression task taken from UCI. We compare the results of our optimizers with the well-known solver SVMlight and with Keras multi-layer Neural Networks using both standard losses and a parameterized generalized quadratic loss, obtaining comparable results.
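The abstract does not state the loss explicitly; the sketch below illustrates one plausible reading, following the generalized quadratic loss of [3], in which the usual squared residuals are replaced by a correlation-weighted penalty of the form xi^T S xi, where xi are the residuals and S is a pattern-similarity matrix. Everything in the snippet (the RBF choice of S, the full-batch similarity matrix, and the names rbf_similarity, generalized_quadratic_loss, gamma) is an illustrative assumption, not the paper's actual implementation.

```python
# Hypothetical sketch of a correlation-weighted ("generalized") quadratic loss
# for a Keras regression model. The form xi^T S xi follows [3]; the RBF choice
# of S and all names here are illustrative assumptions, not the paper's code.

import numpy as np
import tensorflow as tf


def rbf_similarity(x, gamma=0.1):
    """S_ij = exp(-gamma * ||x_i - x_j||^2) over a batch of input patterns."""
    sq = tf.reduce_sum(tf.square(x), axis=1, keepdims=True)
    d2 = sq - 2.0 * tf.matmul(x, x, transpose_b=True) + tf.transpose(sq)
    return tf.exp(-gamma * d2)


def generalized_quadratic_loss(y_true, y_pred, s):
    """Return xi^T S xi / n with residuals xi = y_true - y_pred."""
    xi = tf.reshape(y_true - y_pred, [-1, 1])
    n = tf.cast(tf.shape(xi)[0], xi.dtype)
    return tf.squeeze(tf.matmul(xi, tf.matmul(s, xi), transpose_a=True)) / n


# Toy regression data; in practice S would be computed on each mini-batch.
x = np.random.randn(64, 8).astype("float32")
y = np.random.randn(64, 1).astype("float32")
x_t, y_t = tf.constant(x), tf.constant(y)
s = rbf_similarity(x_t)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.Adam(1e-3)

# Custom training loop, because the loss depends on the inputs through S.
for step in range(100):
    with tf.GradientTape() as tape:
        pred = model(x_t, training=True)
        loss = generalized_quadratic_loss(y_t, pred, s)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

Since an RBF similarity matrix is positive semi-definite, the penalty stays non-negative and reduces to the ordinary quadratic loss when S is the identity.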
Related papers
- What to Do When Your Discrete Optimization Is the Size of a Neural Network? [24.546550334179486]
Machine learning applications using neural networks involve solving discrete optimization problems.
Classical approaches used in discrete settings do not scale well to large neural networks.
We take continuation path (CP) methods as representative of the former and Monte Carlo (MC) methods as representative of the latter.
arXiv Detail & Related papers (2024-02-15T21:57:43Z)
- Learning To Dive In Branch And Bound [95.13209326119153]
We propose L2Dive to learn specific diving heuristics with graph neural networks.
We train generative models to predict variable assignments and leverage the duality of linear programs to make diving decisions.
arXiv Detail & Related papers (2023-01-24T12:01:45Z)
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
- Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z)
- Network Support for High-performance Distributed Machine Learning [17.919773898228716]
We propose a system model that captures both learning nodes (that perform computations) and information nodes (that provide data).
We then formulate the problem of selecting (i) which learning and information nodes should cooperate to complete the learning task, and (ii) the number of iterations to perform.
We devise an algorithm, named DoubleClimb, that can find a 1+1/|I|-competitive solution with cubic worst-case complexity.
arXiv Detail & Related papers (2021-02-05T19:38:57Z)
- Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model the data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
- Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case [93.37576644429578]
Graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice.
We provide a theoretically-grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems.
arXiv Detail & Related papers (2020-06-25T00:45:52Z)
- An Efficient Framework for Clustered Federated Learning [26.24231986590374]
We address the problem of federated learning (FL) where users are distributed into clusters.
We propose the Iterative Federated Clustering Algorithm (IFCA).
We show that our algorithm is efficient in non-convex problems such as neural networks.
arXiv Detail & Related papers (2020-06-07T08:48:59Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study a distributed algorithm for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires a much smaller number of communication rounds in theory.
Our experiments on several datasets show the effectiveness of the method and also confirm the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)