Efficient Domain Adaptation of Multimodal Embeddings using Constrastive Learning
- URL: http://arxiv.org/abs/2502.02048v1
- Date: Tue, 04 Feb 2025 06:30:12 GMT
- Title: Efficient Domain Adaptation of Multimodal Embeddings using Constrastive Learning
- Authors: Georgios Margaritis, Periklis Petridis, Dimitris J. Bertsimas,
- Abstract summary: Current approaches either yield subpar results when using pretrained models without task-specific adaptation, or require substantial computational resources for fine-tuning.
We propose a novel method for adapting foundational, multimodal embeddings to downstream tasks, without the need of expensive fine-tuning processes.
- Score: 0.08192907805418582
- License:
- Abstract: Recent advancements in machine learning (ML), natural language processing (NLP), and foundational models have shown promise for real-life applications in critical, albeit compute-constrainted fields like healthcare. In such areas, combining foundational models with supervised ML offers potential for automating tasks like diagnosis and treatment planning, but the limited availability of onsite computational resources pose significant challenges before applying these technologies effectively: Current approaches either yield subpar results when using pretrained models without task-specific adaptation, or require substantial computational resources for fine-tuning, which is often a barrier to entry in such environments. This renders them inaccessible in applications where performance and quality standards are high, but computational resources are scarce. To bridge the gap between best-in-class performance and accessibility, we propose a novel method for adapting foundational, multimodal embeddings to downstream tasks, without the need of expensive fine-tuning processes. Our method leverages frozen embeddings from Large Language Models (LLMs) and Vision Models, and uses contrastive learning to train a small, task-specific nonlinear projection that can be used in the downstream task, without having to fine-tune the original foundational models. We show that this efficient procedure leads to significant performance improvements across various downstream tasks, and perhaps more importantly with minimal computational overhead, offering a practical solution for the use of advanced, foundational ML models in resource-constrained settings.
Related papers
- Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning [62.984693936073974]
Value-based reinforcement learning can learn effective policies for a wide range of multi-turn problems.
Current value-based RL methods have proven particularly challenging to scale to the setting of large language models.
We propose a novel offline RL algorithm that addresses these drawbacks, casting Q-learning as a modified supervised fine-tuning problem.
arXiv Detail & Related papers (2024-11-07T21:36:52Z) - MIREncoder: Multi-modal IR-based Pretrained Embeddings for Performance Optimizations [6.919817502555546]
In this paper, we propose MIREncoder, a Multi-modal IR-based Auto-Encoder that can be pre-trained to generate a learned embedding space.
A multi-modal approach enables us to better extract features from compilable programs.
Our evaluations will show that our proposed approach can outperform the state of the art while reducing overhead.
arXiv Detail & Related papers (2024-07-02T13:00:19Z) - AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language
Model Outputs [20.772266479533776]
AXOLOTL is a novel post-processing framework that operates agnostically across tasks and models.
It identifies biases, proposes resolutions, and guides the model to self-debias its outputs.
This approach minimizes computational costs and preserves model performance.
arXiv Detail & Related papers (2024-03-01T00:02:37Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Multi-Objective Optimization for Sparse Deep Multi-Task Learning [0.0]
We present a Multi-Objective Optimization algorithm using a modified Weighted Chebyshev scalarization for training Deep Neural Networks (DNNs)
Our work aims to address the (economical and also ecological) sustainability issue of DNN models, with particular focus on Deep Multi-Task models.
arXiv Detail & Related papers (2023-08-23T16:42:27Z) - Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z) - Improving Multi-task Learning via Seeking Task-based Flat Regions [38.28600737969538]
Multi-Task Learning (MTL) is a powerful learning paradigm for training deep neural networks that allows learning more than one objective by a single backbone.
There is an emerging line of work in MTL that focuses on manipulating the task gradient to derive an ultimate gradient descent direction.
We propose to leverage a recently introduced training method, named Sharpness-aware Minimization, which can enhance model generalization ability on single-task learning.
arXiv Detail & Related papers (2022-11-24T17:19:30Z) - Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language
Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z) - Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning [65.268245109828]
In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models.
Deep learning in resource-limited domains still faces multiple challenges including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning.
Model reprogramming enables resource-efficient cross-domain machine learning by repurposing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning.
arXiv Detail & Related papers (2022-02-22T02:33:54Z) - Offline Model-Based Optimization via Normalized Maximum Likelihood
Estimation [101.22379613810881]
We consider data-driven optimization problems where one must maximize a function given only queries at a fixed set of points.
This problem setting emerges in many domains where function evaluation is a complex and expensive process.
We propose a tractable approximation that allows us to scale our method to high-capacity neural network models.
arXiv Detail & Related papers (2021-02-16T06:04:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.