Cross-Modal Fine-Tuning: Align then Refine
- URL: http://arxiv.org/abs/2302.05738v2
- Date: Sat, 18 Mar 2023 17:05:04 GMT
- Title: Cross-Modal Fine-Tuning: Align then Refine
- Authors: Junhong Shen, Liam Li, Lucio M. Dery, Corey Staten, Mikhail Khodak,
Graham Neubig, Ameet Talwalkar
- Abstract summary: ORCA is a cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities.
We show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities.
- Score: 83.37294254884446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning large-scale pretrained models has led to tremendous progress in
well-studied modalities such as vision and NLP. However, similar gains have not
been observed in many other modalities due to a lack of relevant pretrained
models. In this work, we propose ORCA, a general cross-modal fine-tuning
framework that extends the applicability of a single large-scale pretrained
model to diverse modalities. ORCA adapts to a target task via an
align-then-refine workflow: given the target input, ORCA first learns an
embedding network that aligns the embedded feature distribution with the
pretraining modality. The pretrained model is then fine-tuned on the embedded
data to exploit the knowledge shared across modalities. Through extensive
experiments, we show that ORCA obtains state-of-the-art results on 3 benchmarks
containing over 60 datasets from 12 modalities, outperforming a wide range of
hand-designed, AutoML, general-purpose, and task-specific methods. We highlight
the importance of data alignment via a series of ablation studies and
demonstrate ORCA's utility in data-limited regimes.
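To make the workflow concrete, below is a minimal PyTorch-style sketch of the align-then-refine idea described in the abstract. The embedder architecture, the moment-matching alignment loss (a simple stand-in for the paper's actual alignment metric), and all names such as `Embedder` and `align_then_refine` are illustrative assumptions, not ORCA's released implementation.

```python
# Minimal two-stage "align then refine" sketch (illustrative, not the official ORCA code).
# Stage 1: train an embedder so target-task embeddings match the statistics of
#          features from the pretraining modality.
# Stage 2: fine-tune the pretrained model body on the embedded target data.
import torch
import torch.nn as nn

class Embedder(nn.Module):
    """Maps raw target-modality inputs into the pretrained model's embedding space."""
    def __init__(self, in_dim: int, embed_dim: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, embed_dim), nn.GELU(),
                                  nn.Linear(embed_dim, embed_dim))

    def forward(self, x):
        return self.proj(x)

def moment_matching_loss(target_feats, source_feats):
    """First/second-moment alignment; a stand-in for the paper's alignment objective."""
    mean_gap = (target_feats.mean(0) - source_feats.mean(0)).pow(2).sum()
    var_gap = (target_feats.var(0) - source_feats.var(0)).pow(2).sum()
    return mean_gap + var_gap

def align_then_refine(embedder, pretrained_body, head, target_loader, source_feat_loader,
                      align_epochs=5, refine_epochs=20, lr=1e-4):
    # Stage 1: align -- only the embedder is updated against the alignment objective.
    opt = torch.optim.Adam(embedder.parameters(), lr=lr)
    for _ in range(align_epochs):
        # target_loader yields (inputs, labels); source_feat_loader yields precomputed
        # feature batches from the pretraining modality (an assumption of this sketch).
        for (x_tgt, _), feats_src in zip(target_loader, source_feat_loader):
            loss = moment_matching_loss(embedder(x_tgt), feats_src)
            opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: refine -- fine-tune embedder, pretrained body, and task head jointly.
    params = (list(embedder.parameters()) + list(pretrained_body.parameters())
              + list(head.parameters()))
    opt = torch.optim.Adam(params, lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(refine_epochs):
        for x_tgt, y in target_loader:
            # Assumes pretrained_body accepts embedding tensors directly.
            logits = head(pretrained_body(embedder(x_tgt)))
            loss = ce(logits, y)
            opt.zero_grad(); loss.backward(); opt.step()
    return embedder, pretrained_body, head
```

The key point the sketch captures is the two-stage split: only the embedding network is trained against the alignment objective first, and the full stack is fine-tuned on the target labels afterwards.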
Related papers
- On Machine Learning Approaches for Protein-Ligand Binding Affinity Prediction [2.874893537471256]
This study evaluates the performance of classical tree-based models and advanced neural networks in protein-ligand binding affinity prediction.
We show that combining 2D and 3D model strengths improves active learning outcomes beyond current state-of-the-art approaches.
arXiv Detail & Related papers (2024-07-15T13:06:00Z)
- Towards a Generalist and Blind RGB-X Tracker [91.36268768952755]
We develop a single-model tracker that remains blind to any modality X at inference time.
Our training process is extremely simple, integrating a multi-label classification loss with a routing function.
Our generalist, blind tracker achieves performance competitive with well-established modality-specific models.
arXiv Detail & Related papers (2024-05-28T03:00:58Z)
- A Three-Phases SFT Hybrid Model Integrated Strong Prior Module and Data Overlap Estimation in the Education Context [0.0]
We propose an end-to-end, prior-based, three-phase supervised fine-tuning model.
Our model realizes structural disassembly and incrementally guided output of educational knowledge.
Our model also achieves state-of-the-art code abilities compared with open-source models.
arXiv Detail & Related papers (2024-03-13T05:38:39Z)
- Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning -- simultaneously training a single model on multiple "upstream" and "downstream" tasks.
We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data.
We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
arXiv Detail & Related papers (2022-07-08T10:25:47Z)
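As a rough illustration of the co-finetuning recipe summarized above, the sketch below trains one shared backbone on interleaved upstream and downstream batches with separate task heads. The loader setup, per-task heads, and equal loss weighting are assumptions for the example, not the paper's exact configuration.

```python
# Illustrative co-finetuning loop: one shared backbone trained simultaneously on
# an "upstream" dataset and a "downstream" dataset (names and weighting are assumptions).
import itertools
import torch
import torch.nn as nn

def cofinetune(backbone, upstream_head, downstream_head,
               upstream_loader, downstream_loader, steps=1000, lr=1e-4):
    params = (list(backbone.parameters()) + list(upstream_head.parameters())
              + list(downstream_head.parameters()))
    opt = torch.optim.Adam(params, lr=lr)
    ce = nn.CrossEntropyLoss()
    up_iter = itertools.cycle(upstream_loader)      # cycle so dataset sizes need not match
    down_iter = itertools.cycle(downstream_loader)
    for _ in range(steps):
        (x_up, y_up), (x_dn, y_dn) = next(up_iter), next(down_iter)
        # Both tasks share the backbone; each task keeps its own classification head.
        loss = (ce(upstream_head(backbone(x_up)), y_up)
                + ce(downstream_head(backbone(x_dn)), y_dn))
        opt.zero_grad(); loss.backward(); opt.step()
    return backbone, downstream_head
```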
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss-function method in which the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We establish new state-of-the-art results in both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets without Superior Knowledge [55.32035138692167]
Cross-modal knowledge distillation deals with transferring knowledge from a model trained with superior modalities to another model trained with weak modalities.
We propose a novel scheme to train the Student on a Target dataset where the Teacher is unavailable.
arXiv Detail & Related papers (2020-04-01T00:28:15Z)
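For context, the sketch below shows the standard cross-modal distillation setup that this line of work starts from: a teacher trained on the superior modality supervises a student that only sees the weak modality. It does not capture the paper's actual contribution, which removes the need for a teacher on the target dataset; the function name, temperature, and loss weighting are illustrative assumptions.

```python
# Minimal sketch of standard cross-modal knowledge distillation (the setting this
# paper builds on, not its teacher-free contribution): a teacher trained on a
# "superior" modality guides a student that only sees the "weak" modality.
import torch
import torch.nn.functional as F

def distill_step(student, teacher, x_weak, x_superior, y, optimizer,
                 temperature=4.0, alpha=0.5):
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(x_superior)          # teacher uses the superior modality
    s_logits = student(x_weak)                  # student only sees the weak modality
    # Soft-label distillation term plus ordinary supervised cross-entropy.
    kd = F.kl_div(F.log_softmax(s_logits / temperature, dim=-1),
                  F.softmax(t_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(s_logits, y)
    loss = alpha * kd + (1 - alpha) * ce
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```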
This list is automatically generated from the titles and abstracts of the papers in this site.