Jacobian Descent for Multi-Objective Optimization
- URL: http://arxiv.org/abs/2406.16232v2
- Date: Fri, 04 Oct 2024 13:48:21 GMT
- Title: Jacobian Descent for Multi-Objective Optimization
- Authors: Pierre Quinton, Valérian Rey
- Abstract summary: Gradient descent is limited to single-objective optimization.
Jacobian descent (JD) iteratively updates parameters using the Jacobian matrix of a vector-valued objective function.
- Score: 0.6138671548064355
- Abstract: Many optimization problems require balancing multiple conflicting objectives. As gradient descent is limited to single-objective optimization, we introduce its direct generalization: Jacobian descent (JD). This algorithm iteratively updates parameters using the Jacobian matrix of a vector-valued objective function, in which each row is the gradient of an individual objective. While several methods to combine gradients already exist in the literature, they are generally hindered when the objectives conflict. In contrast, we propose projecting gradients to fully resolve conflict while ensuring that they preserve an influence proportional to their norm. We prove significantly stronger convergence guarantees with this approach, supported by our empirical results. Our method also enables instance-wise risk minimization (IWRM), a novel learning paradigm in which the loss of each training example is considered a separate objective. Applied to simple image classification tasks, IWRM exhibits promising results compared to the direct minimization of the average loss. Additionally, we outline an efficient implementation of JD using the Gramian of the Jacobian matrix to reduce time and memory requirements.
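The update rule described in the abstract can be sketched on a toy bi-objective problem. This is a minimal illustration, not the paper's implementation: the `resolve_conflict` projection below is a hypothetical, simplified aggregator (each gradient is projected onto the half-space where the other objective does not increase), standing in for the paper's more general projection-based method.

```python
import numpy as np

# Toy bi-objective: f1(x) = (x0-1)^2 + x1^2,  f2(x) = x0^2 + (x1+1)^2
def jacobian(x):
    g1 = np.array([2 * (x[0] - 1), 2 * x[1]])  # gradient of f1
    g2 = np.array([2 * x[0], 2 * (x[1] + 1)])  # gradient of f2
    return np.stack([g1, g2])                  # rows = per-objective gradients

def resolve_conflict(g, h):
    # Project g onto the half-space {d : <d, h> >= 0}, so that stepping
    # along g never increases the objective whose gradient is h.
    # Hypothetical simplified aggregator, not the paper's exact method.
    dot = g @ h
    return g if dot >= 0 else g - (dot / (h @ h)) * h

def jd_step(x, lr=0.1):
    J = jacobian(x)
    d = resolve_conflict(J[0], J[1]) + resolve_conflict(J[1], J[0])
    return x - lr * d

x = np.array([2.0, 2.0])
for _ in range(500):
    x = jd_step(x)
# x settles at a Pareto-stationary point between the minima (1, 0) and (0, -1)
```

Note that the resulting direction has a non-negative inner product with every row of the Jacobian, so no objective increases. For many parameters and few objectives, the abstract's Gramian trick (working with the small matrix J Jᵀ instead of the full Jacobian) reduces time and memory cost.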
Related papers
- Enhancing Generalization of Universal Adversarial Perturbation through Gradient Aggregation [40.18851174642427]
Deep neural networks are vulnerable to universal adversarial perturbations (UAPs).
In this paper, we examine the serious dilemma of UAP generation methods from a generalization perspective.
We propose a simple and effective method called Stochastic Gradient Aggregation (SGA), which alleviates gradient vanishing and escapes poor local optima at the same time.
arXiv Detail & Related papers (2023-08-11T08:44:58Z) - Direction-oriented Multi-objective Learning: Simple and Provable Stochastic Algorithms [12.776767874217663]
We propose a new direction-oriented multi-objective problem by regularizing the common descent direction within a neighborhood of a direction.
We demonstrate the superior performance of the proposed methods in a series of tasks on multi-task supervised learning and reinforcement learning.
arXiv Detail & Related papers (2023-05-28T16:13:59Z) - Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
The implicit Bayesian meta-learning (iBaML) method not only broadens the scope of learnable priors but also quantifies the associated uncertainty.
Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
arXiv Detail & Related papers (2023-03-31T02:10:30Z) - Global and Preference-based Optimization with Mixed Variables using Piecewise Affine Surrogates [0.6083861980670925]
This paper proposes a novel surrogate-based global optimization algorithm to solve linearly constrained mixed-variable problems.
We assume the objective function is black-box and expensive to evaluate, while the linear constraints are quantifiable, unrelaxable, and known a priori.
We introduce two types of exploration functions to efficiently search the feasible domain via mixed-integer linear programming solvers.
arXiv Detail & Related papers (2023-02-09T15:04:35Z) - Supervised Contrastive Learning as Multi-Objective Optimization for Fine-Tuning Large Pre-trained Language Models [3.759936323189417]
Supervised Contrastive Learning (SCL) has been shown to achieve excellent performance in most classification tasks.
In this work, we formulate the SCL problem as a Multi-Objective Optimization problem for the fine-tuning phase of the RoBERTa language model.
arXiv Detail & Related papers (2022-09-28T15:13:58Z) - Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance [53.49803579981569]
We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point.
Existing methods such as SimCLR require a large batch size in order to achieve a satisfactory result.
We propose a memory-efficient optimization algorithm for solving the Global Contrastive Learning of Representations, named SogCLR.
arXiv Detail & Related papers (2022-02-24T22:16:53Z) - Distribution-aware Margin Calibration for Semantic Segmentation in Images [78.65312390695038]
The Jaccard index, also known as Intersection-over-Union (IoU), is one of the most critical evaluation metrics in image semantic segmentation.
Direct optimization of the IoU score is very difficult because the learning objective is neither differentiable nor decomposable.
We propose a margin calibration method, which can be used directly as a learning objective, to improve the generalization of IoU over the data distribution.
arXiv Detail & Related papers (2021-12-21T22:38:25Z) - Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the conflicting gradients.
We introduce Conflict-Averse Gradient descent (CAGrad), which minimizes the average loss while constraining the update direction by the worst local improvement across tasks.
CAGrad balances the objectives automatically and still provably converges to a minimum of the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z) - Follow the bisector: a simple method for multi-objective optimization [65.83318707752385]
We consider optimization problems where multiple differentiable losses have to be minimized.
The presented method computes a descent direction in every iteration that guarantees an equal relative decrease of the objective functions.
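For two objectives, the bisector idea can be sketched as the sum of unit gradients: along this direction, every objective decreases at a rate equal after normalizing by its gradient norm. This is a hypothetical two-objective simplification of the paper's general method, not its actual algorithm.

```python
import numpy as np

def bisector_direction(grads):
    # Sum of unit gradients: the bisector of the gradient directions.
    # Along this direction, each objective's decrease rate divided by its
    # gradient norm is identical.  Toy two-objective illustration.
    return sum(g / np.linalg.norm(g) for g in grads)

g1 = np.array([3.0, 0.0])
g2 = np.array([0.0, 4.0])
d = bisector_direction([g1, g2])
# d = (1, 1): equal angle to both gradients
```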
arXiv Detail & Related papers (2020-07-14T09:50:33Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address bilinear optimization problems.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.