Neural Network Training and Non-Differentiable Objective Functions
- URL: http://arxiv.org/abs/2305.02024v1
- Date: Wed, 3 May 2023 10:28:23 GMT
- Title: Neural Network Training and Non-Differentiable Objective Functions
- Authors: Yash Patel
- Abstract summary: This thesis makes four main contributions toward bridging the gap between the non-differentiable objective and the training loss function.
The contributions of this thesis make the training of neural networks more scalable.
- Score: 2.3351527694849574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many important computer vision tasks are naturally formulated to have a
non-differentiable objective. Therefore, the standard, dominant training
procedure of a neural network is not applicable since back-propagation requires
the gradients of the objective with respect to the output of the model. Most
deep learning methods side-step the problem sub-optimally by using a proxy loss
for training, which was originally designed for another task and is not
tailored to the specifics of the objective. The proxy loss functions may or may
not align well with the original non-differentiable objective. An appropriate
proxy has to be designed for a novel task, which may not be feasible for a
non-specialist. This thesis makes four main contributions toward bridging the
gap between the non-differentiable objective and the training loss function.
Throughout the thesis, we refer to a loss function as a surrogate loss if it is
a differentiable approximation of the non-differentiable objective. Note that
we use the terms objective and evaluation metric interchangeably.
The contributions of this thesis make the training of neural networks more
scalable -- to new tasks in a nearly labor-free manner when the evaluation
metric is decomposable, which will help researchers with novel tasks. For
non-decomposable evaluation metrics, the differentiable components developed
for the recall@k surrogate, such as sorting and counting, can also be used for
creating new surrogates.
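As an illustration of the decomposable-metric setting, the sketch below shows one common way to relax recall@k into a differentiable surrogate: the hard step function inside the rank computation is replaced by a temperature-scaled sigmoid, so that the "sorting" and "counting" operations become smooth. This is a minimal sketch of the general idea, not the thesis's exact formulation; the temperature `tau` and the max over positives are illustrative assumptions.

```python
import torch

def smooth_recall_at_k(sim_pos, sim_neg, k, tau=0.01):
    """Differentiable surrogate for recall@k for a single query.

    sim_pos: (P,) similarities between the query and its positives.
    sim_neg: (N,) similarities between the query and its negatives.
    The hard step function in the exact rank is replaced by a
    temperature-scaled sigmoid so gradients can flow; `tau` is an
    illustrative smoothing temperature, not a value from the thesis.
    """
    # Smooth rank of each positive: 1 + (soft) count of negatives scored above it.
    diff = sim_neg.unsqueeze(0) - sim_pos.unsqueeze(1)         # (P, N)
    smooth_rank = 1.0 + torch.sigmoid(diff / tau).sum(dim=1)   # (P,)
    # Soft indicator that a positive lands inside the top-k.
    in_top_k = torch.sigmoid((k - smooth_rank) / tau)          # (P,)
    # Recall@k asks whether at least one positive is in the top-k.
    return in_top_k.max()

# A training loss would then be 1 - smooth_recall_at_k(...), averaged over queries.
```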
Related papers
- A Novel Information-Theoretic Objective to Disentangle Representations for Fair Classification [46.884905701771004] (2023-10-21)
One of the main applications of such disentangled representations is fair classification.
We adopt an information-theoretic view of this problem which motivates a novel family of regularizers.
The resulting set of losses, called CLINIC, is parameter-free and therefore easier and faster to train.
- A Simple Contrastive Learning Objective for Alleviating Neural Text Degeneration [56.64703901898937] (2022-05-05)
We propose a new contrastive token learning objective that inherits the advantages of cross-entropy and unlikelihood training.
Comprehensive experiments on language modeling and open-domain dialogue generation tasks show that the proposed contrastive token objective yields less repetitive texts.
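A minimal sketch of how a repetition-discouraging term can be combined with cross-entropy, assuming an unlikelihood-style penalty on tokens already present in the context; the paper's contrastive token objective is defined differently in detail, and `alpha` is an illustrative weight.

```python
import torch
import torch.nn.functional as F

def repetition_penalized_loss(logits, target, prev_tokens, alpha=0.5):
    """Cross-entropy for the gold next token plus a penalty on probability
    mass assigned to already-generated tokens. Hypothetical sketch, not the
    paper's exact contrastive token objective.

    logits: (V,) next-token logits; target: scalar long tensor with the gold id;
    prev_tokens: (M,) ids of tokens in the preceding context.
    """
    ce = F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))
    probs = torch.softmax(logits, dim=-1)
    neg = prev_tokens[prev_tokens != target]        # negatives: seen tokens, gold excluded
    unlikelihood = -torch.log(1.0 - probs[neg] + 1e-6).sum()
    return ce + alpha * unlikelihood
```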
- Relational Surrogate Loss Learning [41.61184221367546] (2022-02-26)
This paper revisits surrogate loss learning, where a deep neural network is employed to approximate the evaluation metrics.
We show that it suffices for the surrogate loss to preserve the relative ranking of models under the evaluation metric, rather than to approximate the metric values directly.
Our method is much easier to optimize and enjoys significant efficiency and performance gains.
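A rough sketch of the relational idea: instead of regressing metric values, the learned surrogate is trained so that its scores correlate with the true metric across a batch of predictions. Pearson correlation is used here as a simple differentiable stand-in; the paper's ranking-correlation objective differs in detail.

```python
import torch

def correlation_loss(surrogate_scores, metric_values, eps=1e-8):
    """Push the surrogate's scores to correlate with the evaluation metric
    over a batch, rather than to match its values. Pearson correlation is a
    stand-in for the ranking correlation used in the paper.
    """
    s = surrogate_scores - surrogate_scores.mean()
    m = metric_values - metric_values.mean()
    corr = (s * m).sum() / (s.norm() * m.norm() + eps)
    return 1.0 - corr  # minimise to push the correlation toward 1
```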
- Generative multitask learning mitigates target-causing confounding [61.21582323566118] (2022-02-08)
We propose a simple and scalable approach to causal representation learning for multitask learning.
The improvement comes from mitigating unobserved confounders that cause the targets, but not the input.
Our results on the Attributes of People and Taskonomy datasets reflect the conceptual improvement in robustness to prior probability shift.
- Learning Curves for Sequential Training of Neural Networks: Self-Knowledge Transfer and Forgetting [9.734033555407406] (2021-12-03)
We consider neural networks in the neural tangent kernel regime that continually learn target functions from task to task.
We investigate a variant of continual learning where the model learns the same target function in multiple tasks.
Even for the same target, the trained model shows some transfer and forgetting depending on the sample size of each task.
- Outcome-Driven Reinforcement Learning via Variational Inference [95.82770132618862] (2021-04-20)
We discuss a new perspective on reinforcement learning, recasting it as the problem of inferring actions that achieve desired outcomes, rather than a problem of maximizing rewards.
To solve the resulting outcome-directed inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function.
We empirically demonstrate that this method eliminates the need to design reward functions and leads to effective goal-directed behaviors.
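Under an assumed Gaussian outcome model, an outcome-likelihood reward reduces to a scaled negative squared distance to the desired outcome; the sketch below illustrates only that reduction and is not the paper's variational derivation. The function name and `sigma` are illustrative.

```python
import torch

def outcome_log_likelihood_reward(next_state, desired_outcome, sigma=1.0):
    """Reward shaped as the log-likelihood of the desired outcome under an
    assumed Gaussian outcome model (constant terms dropped). Illustrative
    only; the paper derives its well-shaped reward from a specific
    variational inference formulation.
    """
    diff = next_state - desired_outcome
    return -0.5 * (diff ** 2).sum(dim=-1) / sigma ** 2
```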
- Why Do Better Loss Functions Lead to Less Transferable Features? [93.47297944685114] (2020-10-30)
This paper studies how the choice of training objective affects the transferability of the hidden representations of convolutional neural networks trained on ImageNet.
We show that many objectives lead to statistically significant improvements in ImageNet accuracy over vanilla softmax cross-entropy, but the resulting fixed feature extractors transfer substantially worse to downstream tasks.
- Multi-task Supervised Learning via Cross-learning [102.64082402388192] (2020-10-24)
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected in one domain helps improve learning performance on the other tasks.
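A hypothetical sketch of the coupling idea: each task keeps its own parameters but pays a penalty for drifting away from the shared centroid. The helper name and the coupling strength `lam` are illustrative assumptions; the paper's formulation may couple parameters differently.

```python
import torch

def cross_learning_penalty(task_weights, lam=0.1):
    """Penalise the distance of each task-specific parameter tensor to the
    mean across tasks, so each task fits its own data while staying close
    to the others. task_weights: list of same-shaped tensors, one per task.
    """
    stacked = torch.stack(task_weights)                 # (T, ...)
    centroid = stacked.mean(dim=0, keepdim=True)
    return lam * ((stacked - centroid) ** 2).sum()
```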
- Uniform Priors for Data-Efficient Transfer [65.086680950871] (2020-06-30)
We show that features that are most transferable have high uniformity in the embedding space.
We evaluate the regularization on its ability to facilitate adaptation to unseen tasks and data.
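One generic way to encourage uniformity of features on the unit hypersphere is a pairwise Gaussian-potential penalty, sketched below as an illustrative stand-in for the regularizer evaluated in the paper; the temperature `t` is an assumption.

```python
import torch
import torch.nn.functional as F

def uniformity_penalty(embeddings, t=2.0):
    """Lower when normalised embeddings spread uniformly over the unit
    hypersphere. Generic stand-in, not the paper's exact regularizer.
    embeddings: (B, D) feature vectors.
    """
    z = F.normalize(embeddings, dim=-1)
    sq_dists = torch.cdist(z, z).pow(2)   # pairwise squared distances (zero diagonal included)
    return torch.log(torch.exp(-t * sq_dists).mean())
```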
- Anti-Transfer Learning for Task Invariance in Convolutional Neural Networks for Speech Processing [6.376852004129252] (2020-06-11)
We introduce the novel concept of anti-transfer learning for speech processing with convolutional neural networks.
We show that anti-transfer actually leads to the intended invariance to the task and to more appropriate features for the target task at hand.
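A hedged sketch of the anti-transfer idea: penalise the similarity between the target network's activations and those of a frozen network pre-trained on the task it should be invariant to. The layer choice, similarity measure, and weight `beta` are illustrative assumptions rather than the paper's exact definition.

```python
import torch
import torch.nn.functional as F

def anti_transfer_penalty(feat_target_net, feat_pretrained_net, beta=1.0):
    """Discourage the target-task network from reproducing features of a
    frozen network trained on the task it should be invariant to.
    feat_*: (B, D) activations from a chosen layer of each network.
    """
    sim = F.cosine_similarity(feat_target_net, feat_pretrained_net.detach(), dim=-1)
    return beta * sim.abs().mean()
```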
This list is automatically generated from the titles and abstracts of the papers on this site.