Supervised Contrastive Learning as Multi-Objective Optimization for
Fine-Tuning Large Pre-trained Language Models
- URL: http://arxiv.org/abs/2209.14161v1
- Date: Wed, 28 Sep 2022 15:13:58 GMT
- Title: Supervised Contrastive Learning as Multi-Objective Optimization for
Fine-Tuning Large Pre-trained Language Models
- Authors: Youness Moukafih, Mounir Ghogho, Kamel Smaili
- Abstract summary: Supervised Contrastive Learning (SCL) has been shown to achieve excellent performance in most classification tasks.
In this work, we formulate the SCL problem as a Multi-Objective Optimization problem for the fine-tuning phase of RoBERTa language model.
- Score: 3.759936323189417
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, Supervised Contrastive Learning (SCL) has been shown to achieve
excellent performance in most classification tasks. In SCL, a neural network is
trained to optimize two objectives: pull an anchor and positive samples
together in the embedding space, and push the anchor apart from the negatives.
However, these two different objectives may conflict, requiring trade-offs
between them during optimization. In this work, we formulate the SCL problem as
a Multi-Objective Optimization problem for the fine-tuning phase of RoBERTa
language model. Two methods are utilized to solve the optimization problem: (i)
the linear scalarization (LS) method, which minimizes a weighted linear
combination of pertask losses; and (ii) the Exact Pareto Optimal (EPO) method
which finds the intersection of the Pareto front with a given preference
vector. We evaluate our approach on several GLUE benchmark tasks, without using
data augmentations, memory banks, or generating adversarial examples. The
empirical results show that the proposed learning strategy significantly
outperforms a strong competitive contrastive learning baseline
Related papers
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z) - Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness [27.43137305486112]
We propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss.
The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods to achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-09-26T12:37:26Z) - Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z) - UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL) agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
arXiv Detail & Related papers (2024-05-01T09:34:42Z) - Learning Constrained Optimization with Deep Augmented Lagrangian Methods [54.22290715244502]
A machine learning (ML) model is trained to emulate a constrained optimization solver.
This paper proposes an alternative approach, in which the ML model is trained to predict dual solution estimates directly.
It enables an end-to-end training scheme is which the dual objective is as a loss function, and solution estimates toward primal feasibility, emulating a Dual Ascent method.
arXiv Detail & Related papers (2024-03-06T04:43:22Z) - Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions.
We propose deep negative correlation classification (DNCC)
DNCC yields a deep classification ensemble where the individual estimator is both accurate and negatively correlated.
arXiv Detail & Related papers (2022-12-14T07:35:20Z) - Provable Stochastic Optimization for Global Contrastive Learning: Small
Batch Does Not Harm Performance [53.49803579981569]
We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point.
Existing methods such as SimCLR requires a large batch size in order to achieve a satisfactory result.
We propose a memory-efficient optimization algorithm for solving the Global Contrastive Learning of Representations, named SogCLR.
arXiv Detail & Related papers (2022-02-24T22:16:53Z) - Decoupled Contrastive Learning [23.25775900388382]
We identify a noticeable negative-positive-coupling (NPC) effect in the widely used cross-entropy (InfoNCE) loss.
By properly addressing the NPC effect, we reach a decoupled contrastive learning (DCL) objective function.
Our approach achieves $66.9%$ ImageNet top-1 accuracy using batch size 256 within 200 epochs pre-training, outperforming its baseline SimCLR by $5.1%$.
arXiv Detail & Related papers (2021-10-13T16:38:43Z) - LoOp: Looking for Optimal Hard Negative Embeddings for Deep Metric
Learning [17.571160136568455]
We propose a novel approach that looks for optimal hard negatives (LoOp) in the embedding space.
Unlike mining-based methods, our approach considers the entire space between pairs of embeddings to calculate the optimal hard negatives.
arXiv Detail & Related papers (2021-08-20T19:21:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.