Related papers: Supervised Contrastive Learning as Multi-Objective Optimization for Fine-Tuning Large Pre-trained Language Models

Supervised Contrastive Learning as Multi-Objective Optimization for Fine-Tuning Large Pre-trained Language Models

URL: http://arxiv.org/abs/2209.14161v1
Date: Wed, 28 Sep 2022 15:13:58 GMT
Title: Supervised Contrastive Learning as Multi-Objective Optimization for Fine-Tuning Large Pre-trained Language Models
Authors: Youness Moukafih, Mounir Ghogho, Kamel Smaili
Abstract summary: Supervised Contrastive Learning (SCL) has been shown to achieve excellent performance in most classification tasks. In this work, we formulate the SCL problem as a Multi-Objective Optimization problem for the fine-tuning phase of RoBERTa language model.
Score: 3.759936323189417
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently, Supervised Contrastive Learning (SCL) has been shown to achieve excellent performance in most classification tasks. In SCL, a neural network is trained to optimize two objectives: pull an anchor and positive samples together in the embedding space, and push the anchor apart from the negatives. However, these two different objectives may conflict, requiring trade-offs between them during optimization. In this work, we formulate the SCL problem as a Multi-Objective Optimization problem for the fine-tuning phase of RoBERTa language model. Two methods are utilized to solve the optimization problem: (i) the linear scalarization (LS) method, which minimizes a weighted linear combination of pertask losses; and (ii) the Exact Pareto Optimal (EPO) method which finds the intersection of the Pareto front with a given preference vector. We evaluate our approach on several GLUE benchmark tasks, without using data augmentations, memory banks, or generating adversarial examples. The empirical results show that the proposed learning strategy significantly outperforms a strong competitive contrastive learning baseline

Related papers

Make Optimization Once and for All with Fine-grained Guidance [78.14885351827232]
Learning to Optimize (L2O) enhances optimization efficiency with integrated neural networks. L2O paradigms achieve great outcomes, e.g., refitting, generating unseen solutions iteratively or directly. Our analyses explore general framework for learning optimization, called Diff-L2O, focusing on augmenting solutions from a wider view.
arXiv Detail & Related papers (2025-03-14T14:48:12Z)
Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives. We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness [27.43137305486112]
We propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss. The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods to achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-09-26T12:37:26Z)
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost. We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion. By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL) agents are tasked with optimising decision-making behaviours. We focus on the case of linear utility functions parameterised by weight vectors w. We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
Learning Constrained Optimization with Deep Augmented Lagrangian Methods [54.22290715244502]
A machine learning (ML) model is trained to emulate a constrained optimization solver. This paper proposes an alternative approach, in which the ML model is trained to predict dual solution estimates directly. It enables an end-to-end training scheme is which the dual objective is as a loss function, and solution estimates toward primal feasibility, emulating a Dual Ascent method.
arXiv Detail & Related papers (2024-03-06T04:43:22Z)
Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions. We propose deep negative correlation classification (DNCC) DNCC yields a deep classification ensemble where the individual estimator is both accurate and negatively correlated.
arXiv Detail & Related papers (2022-12-14T07:35:20Z)
Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance [53.49803579981569]
We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point. Existing methods such as SimCLR requires a large batch size in order to achieve a satisfactory result. We propose a memory-efficient optimization algorithm for solving the Global Contrastive Learning of Representations, named SogCLR.
arXiv Detail & Related papers (2022-02-24T22:16:53Z)
Decoupled Contrastive Learning [23.25775900388382]
We identify a noticeable negative-positive-coupling (NPC) effect in the widely used cross-entropy (InfoNCE) loss. By properly addressing the NPC effect, we reach a decoupled contrastive learning (DCL) objective function. Our approach achieves $66.9%$ ImageNet top-1 accuracy using batch size 256 within 200 epochs pre-training, outperforming its baseline SimCLR by $5.1%$.
arXiv Detail & Related papers (2021-10-13T16:38:43Z)
LoOp: Looking for Optimal Hard Negative Embeddings for Deep Metric Learning [17.571160136568455]
We propose a novel approach that looks for optimal hard negatives (LoOp) in the embedding space. Unlike mining-based methods, our approach considers the entire space between pairs of embeddings to calculate the optimal hard negatives.
arXiv Detail & Related papers (2021-08-20T19:21:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.