SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality
- URL: http://arxiv.org/abs/2409.08083v1
- Date: Thu, 12 Sep 2024 14:38:21 GMT
- Title: SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality
- Authors: Chenyang Lei, Liyi Chen, Jun Cen, Xiao Chen, Zhen Lei, Felix Heide, Ziwei Liu, Qifeng Chen, Zhaoxiang Zhang,
- Abstract summary: Foundation models like ChatGPT and Sora, trained on data at enormous scale, have had a revolutionary social impact.
However, for sensors in many fields it is extremely challenging to collect natural-image datasets of a comparable scale for training strong foundation models.
This work presents SimMAT, a simple and effective framework for studying an open problem: the transferability of vision foundation models trained on natural RGB images to other image modalities with different physical properties.
- Score: 136.82569085134554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models like ChatGPT and Sora, trained on data at enormous scale, have had a revolutionary social impact. However, for sensors in many fields it is extremely challenging to collect natural-image datasets of a comparable scale for training strong foundation models. To this end, this work presents SimMAT, a simple and effective framework for studying an open problem: the transferability of vision foundation models trained on natural RGB images to other image modalities with different physical properties (e.g., polarization). SimMAT consists of a modality-agnostic transfer layer (MAT) and a pretrained foundation model. We apply SimMAT to a representative vision foundation model, the Segment Anything Model (SAM), to support each evaluated new image modality. Given the absence of relevant benchmarks, we construct a new benchmark to evaluate transfer learning performance. Our experiments confirm the intriguing potential of transferring vision foundation models to enhance other sensors' performance. Specifically, SimMAT improves segmentation performance (mIoU) from 22.15% to 53.88% on average across the evaluated modalities and consistently outperforms other baselines. We hope that SimMAT can raise awareness of cross-modal transfer learning and help various fields achieve better results with vision foundation models.
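The core idea in the abstract — a modality-agnostic transfer layer (MAT) that projects a new sensor modality into the input space of a frozen RGB foundation model — can be sketched in a few lines. The code below is a minimal illustration, not the paper's actual implementation: the per-pixel linear projection, the toy frozen backbone, and all names (`MATLayer`, `transfer_forward`) are assumptions made for the sketch; the real SimMAT uses SAM as the backbone and a learned transfer layer.

```python
import numpy as np

class MATLayer:
    """Hypothetical modality-agnostic transfer layer: a 1x1 convolution
    (a per-pixel linear map over channels) that projects an (H, W, C_in)
    modality image, e.g. 4-channel polarization, into the 3-channel space
    a frozen RGB foundation model expects."""

    def __init__(self, in_channels, out_channels=3, seed=0):
        rng = np.random.default_rng(seed)
        # Small random init; in training this is the only new learnable part.
        self.weight = rng.standard_normal((in_channels, out_channels)) * 0.01
        self.bias = np.zeros(out_channels)

    def __call__(self, x):
        # x: (H, W, C_in) -> (H, W, out_channels)
        return x @ self.weight + self.bias


def transfer_forward(mat, frozen_backbone, modality_image):
    """Project the new modality into an RGB-like input, then run the
    frozen pretrained model on the result."""
    rgb_like = mat(modality_image)
    return frozen_backbone(rgb_like)


# Toy stand-in for a frozen foundation model: global average pooling
# over spatial positions, yielding a 3-dim "embedding".
frozen_backbone = lambda img: img.mean(axis=(0, 1))

polar_image = np.ones((8, 8, 4))  # mock 4-channel polarization input
mat = MATLayer(in_channels=4)
embedding = transfer_forward(mat, frozen_backbone, polar_image)
print(embedding.shape)  # (3,)
```

During transfer, only the MAT weights would be trained (or the backbone lightly fine-tuned), so the bulk of the RGB-pretrained knowledge is reused for the new modality.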
Related papers
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
- MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
However, transferring the pretrained models to downstream tasks may encounter a task discrepancy, because pretraining is formulated as image classification or object discrimination.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z)
- Diversity is Definitely Needed: Improving Model-Agnostic Zero-shot Classification via Stable Diffusion [22.237426507711362]
Model-Agnostic Zero-Shot Classification (MA-ZSC) refers to training non-specific classification architectures to classify real images without using any real images during training.
Recent research has demonstrated that generating synthetic training images using diffusion models provides a potential solution to address MA-ZSC.
We propose modifications to the text-to-image generation process using a pre-trained diffusion model to enhance diversity.
arXiv Detail & Related papers (2023-02-07T07:13:53Z)
- sim2real: Cardiac MR Image Simulation-to-Real Translation via Unsupervised GANs [0.4433315630787158]
We provide image simulation on virtual XCAT subjects with varying anatomies.
We propose sim2real translation network to improve image realism.
Our usability experiments suggest that sim2real data exhibits a good potential to augment training data and boost the performance of a segmentation algorithm.
arXiv Detail & Related papers (2022-08-09T16:06:06Z)
- Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model [97.9548609175831]
We resort to plain vision transformers with about 100 million parameters and make the first attempt to propose large vision models customized for remote sensing tasks.
Specifically, to handle the large image size and objects of various orientations in RS images, we propose a new rotated varied-size window attention.
Experiments on detection tasks demonstrate the superiority of our model over all state-of-the-art models, achieving 81.16% mAP on the DOTA-V1.0 dataset.
arXiv Detail & Related papers (2022-08-08T09:08:40Z)
- Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data [74.66568380558172]
We study the transferability of pre-trained models based on synthetic data generated by graphics simulators to downstream tasks.
We introduce Task2Sim, a unified model mapping downstream task representations to optimal simulation parameters.
It learns this mapping by training to find the set of best parameters on a set of "seen" tasks.
Once trained, it can then be used to predict best simulation parameters for novel "unseen" tasks in one shot.
arXiv Detail & Related papers (2021-11-30T19:25:27Z)
- Contrastive Learning Meets Transfer Learning: A Case Study in Medical Image Analysis [2.4050073971195003]
Annotated medical images are typically rarer than labeled natural images since they are limited by domain knowledge and privacy constraints.
Recent advances in transfer and contrastive learning have provided effective solutions to tackle such issues from different perspectives.
Given that slow convergence is a critical limitation of modern contrastive learning approaches, it is appealing to accelerate contrastive learning with transfer learning.
arXiv Detail & Related papers (2021-03-04T17:19:54Z)
- SimAug: Learning Robust Representations from Simulation for Trajectory Prediction [78.91518036949918]
We propose a novel approach to learn robust representation through augmenting the simulation training data.
We show that SimAug achieves promising results on three real-world benchmarks using zero real training data.
arXiv Detail & Related papers (2020-04-04T21:22:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.