Accelerated Neural Network Training with Rooted Logistic Objectives
- URL: http://arxiv.org/abs/2310.03890v1
- Date: Thu, 5 Oct 2023 20:49:48 GMT
- Title: Accelerated Neural Network Training with Rooted Logistic Objectives
- Authors: Zhu Wang, Praveen Raj Veluswami, Harsh Mishra, Sathya N. Ravi
- Abstract summary: We derive a novel sequence of strictly convex functions that are at least as strict as logistic loss.
Our results illustrate that training with the rooted loss function converges faster and yields performance improvements.
- Score: 13.400503928962756
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Many neural networks deployed in real-world scenarios are trained using
cross-entropy-based loss functions. From the optimization perspective, it is
known that the behavior of first-order methods such as gradient descent
crucially depends on the separability of datasets. In fact, even in the
simplest case of binary classification, the rate of convergence depends on two
factors: (1) the condition number of the data matrix, and (2) the separability of
the dataset. Without further pre-processing techniques such as
over-parametrization or data augmentation, separability is an intrinsic
quantity of the data distribution under consideration. We focus on the
landscape design of the logistic function and derive a novel sequence of
strictly convex functions that are at least as strict as logistic loss. The
minimizers of these functions coincide with those of the minimum norm solution
wherever possible. The strict convexity of the derived function can be extended
to finetune state-of-the-art models and applications. In our empirical
analysis, we apply our proposed rooted logistic objective to multiple deep
models, e.g., fully-connected neural networks and transformers, on various
classification benchmarks. Our results illustrate that training with the rooted
loss function converges faster and yields performance improvements.
Furthermore, we illustrate applications of our novel rooted loss function in
generative-modeling-based downstream applications, such as finetuning a StyleGAN
model with the rooted loss. The code implementing our losses and models can be
found here for open source software development purposes:
https://anonymous.4open.science/r/rooted_loss.
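The abstract does not state the formula of the rooted logistic objective, so the following is only a minimal sketch under an assumption: it replaces the logarithm in the logistic loss with the standard k-th-root approximation, log(x) = lim k (x^(1/k) - 1) as k grows. The function names and the parameter `k` are hypothetical and are not taken from the authors' code. Under this assumed form the loss upper-bounds the logistic loss, approaches it for large `k`, and has larger curvature, which is one way to read "at least as strict as logistic loss".

```python
import math


def logistic_loss(margin: float) -> float:
    """Standard logistic loss log(1 + exp(-margin)), computed stably."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def rooted_logistic_loss(margin: float, k: float) -> float:
    """ASSUMED rooted form: k * ((1 + exp(-margin)) ** (1 / k) - 1).

    This is an illustrative guess, not the paper's confirmed definition.
    Since (1 + exp(-m)) ** (1 / k) == exp(logistic_loss(m) / k),
    expm1 gives the same value with better precision for large k.
    """
    return k * math.expm1(logistic_loss(margin) / k)


def curvature(f, x: float, h: float = 1e-3) -> float:
    """Central finite-difference estimate of the second derivative of f at x."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


if __name__ == "__main__":
    for m in (-2.0, 0.0, 2.0):
        base = logistic_loss(m)
        rooted = rooted_logistic_loss(m, k=2.0)
        print(f"margin={m:+.1f}  logistic={base:.4f}  rooted(k=2)={rooted:.4f}")
```

Here `margin` stands for the label times the model score in binary classification. Smaller `k` gives a steeper, more strongly curved loss, while large `k` recovers the ordinary logistic loss.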
Related papers
- Deep Loss Convexification for Learning Iterative Models [11.36644967267829]
Iterative methods such as iterative closest point (ICP) for point cloud registration often suffer from bad local optimality.
We propose learning to form a convex landscape around each ground truth.
arXiv Detail & Related papers (2024-11-16T01:13:04Z) - Towards Robust Out-of-Distribution Generalization: Data Augmentation and Neural Architecture Search Approaches [4.577842191730992]
We study ways toward robust OoD generalization for deep learning.
We first propose a novel and effective approach to disentangle the spurious correlation between features that are not essential for recognition.
We then study the problem of strengthening neural architecture search in OoD scenarios.
arXiv Detail & Related papers (2024-10-25T20:50:32Z) - Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions via our training procedure, including the gradient and regularizers, limiting flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z) - Nonlinear functional regression by functional deep neural network with
kernel embedding [20.306390874610635]
We propose a functional deep neural network with an efficient and fully data-dependent dimension reduction method.
The architecture of our functional net consists of a kernel embedding step, a projection step, and a deep ReLU neural network for the prediction.
The utilization of smooth kernel embedding enables our functional net to be discretization invariant, efficient, and robust to noisy observations.
arXiv Detail & Related papers (2024-01-05T16:43:39Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, adversarial labels.
Our results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the "NTK regime".
arXiv Detail & Related papers (2022-12-05T14:47:52Z) - Counterfactual Intervention Feature Transfer for Visible-Infrared Person
Re-identification [69.45543438974963]
We find graph-based methods in the visible-infrared person re-identification task (VI-ReID) suffer from bad generalization because of two issues.
The well-trained input features weaken the learning of graph topology, making it not generalized enough during the inference process.
We propose a Counterfactual Intervention Feature Transfer (CIFT) method to tackle these problems.
arXiv Detail & Related papers (2022-08-01T16:15:31Z) - Critical Investigation of Failure Modes in Physics-informed Neural
Networks [0.9137554315375919]
We show that a physics-informed neural network with a composite formulation produces highly non-convex loss surfaces that are difficult to optimize.
We also assess the training of both approaches on two elliptic problems with increasingly complex target solutions.
arXiv Detail & Related papers (2022-06-20T18:43:35Z) - The Multiscale Structure of Neural Network Loss Functions: The Effect on
Optimization and Origin [12.092361450994318]
We study the structure of neural network loss functions and its implication on optimization in a region beyond the reach of good quadratic approximation.
We show that training data with different magnitudes give rise to different scales of the loss function, producing subquadratic growth or multiple separate scales.
arXiv Detail & Related papers (2022-04-24T17:34:12Z) - Mitigating Performance Saturation in Neural Marked Point Processes:
Architectures and Loss Functions [50.674773358075015]
We propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers.
We show that GCHP can significantly reduce training time, and that the likelihood ratio loss with interarrival time probability assumptions can greatly improve model performance.
arXiv Detail & Related papers (2021-07-07T16:59:14Z) - Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.