Deep Multimodal Transfer-Learned Regression in Data-Poor Domains
- URL: http://arxiv.org/abs/2006.09310v1
- Date: Tue, 16 Jun 2020 16:52:44 GMT
- Title: Deep Multimodal Transfer-Learned Regression in Data-Poor Domains
- Authors: Levi McClenny, Mulugeta Haile, Vahid Attari, Brian Sadler, Ulisses
Braga-Neto, Raymundo Arroyave
- Abstract summary: We propose a Deep Multimodal Transfer-Learned Regressor (DMTL-R) for multimodal learning of image and feature data.
Our model is capable of fine-tuning a given set of pre-trained CNN weights on a small amount of training image data.
We present results using phase-field simulation microstructure images with an accompanying set of physical features, using pre-trained weights from various well-known CNN architectures.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many real-world applications of deep learning, estimation of a target may
rely on various types of input data modes, such as audio-video, image-text,
etc. This task can be further complicated by a lack of sufficient data. Here we
propose a Deep Multimodal Transfer-Learned Regressor (DMTL-R) for multimodal
learning of image and feature data in a deep regression architecture effective
at predicting target parameters in data-poor domains. Our model is capable of
fine-tuning a given set of pre-trained CNN weights on a small amount of
training image data, while simultaneously conditioning on feature information
from a complementary data mode during network training, yielding more accurate
single-target or multi-target regression than can be achieved using the images
or the features alone. We present results using phase-field simulation
microstructure images with an accompanying set of physical features, using
pre-trained weights from various well-known CNN architectures, which
demonstrate the efficacy of the proposed multimodal approach.
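The abstract describes a two-branch architecture: a pre-trained CNN backbone fine-tuned on a small image set, fused with an MLP over the complementary feature mode before a shared regression head. A minimal sketch of that fusion pattern is below; the `TinyCNN` backbone and all layer sizes are illustrative stand-ins, not the paper's actual architecture (which uses pre-trained weights from well-known CNNs such as VGG or ResNet).

```python
# Sketch of a two-branch multimodal regressor in the spirit of DMTL-R:
# an image branch (stand-in for a pre-trained CNN backbone) and a feature
# branch, concatenated before a regression head. Layer sizes are
# illustrative assumptions only.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Stand-in for a pre-trained backbone (e.g. VGG/ResNet)."""
    def __init__(self, out_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),  # pool to a fixed 4x4 spatial grid
        )
        self.fc = nn.Linear(8 * 4 * 4, out_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

class MultimodalRegressor(nn.Module):
    def __init__(self, n_features=10, n_targets=3):
        super().__init__()
        self.backbone = TinyCNN(out_dim=32)        # fine-tuned on few images
        self.feat_mlp = nn.Sequential(             # complementary data mode
            nn.Linear(n_features, 16), nn.ReLU())
        self.head = nn.Linear(32 + 16, n_targets)  # fused regression head

    def forward(self, image, features):
        z = torch.cat([self.backbone(image), self.feat_mlp(features)], dim=1)
        return self.head(z)

model = MultimodalRegressor()
y = model(torch.randn(4, 1, 28, 28), torch.randn(4, 10))
print(y.shape)  # torch.Size([4, 3])
```

In practice the backbone would load pre-trained weights and be fine-tuned jointly with the feature branch, so the gradients from the regression loss condition the CNN on both modes at once.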
Related papers
- Few-shot Online Anomaly Detection and Segmentation [29.693357653538474]
This paper focuses on addressing the challenging yet practical few-shot online anomaly detection and segmentation (FOADS) task.
Under the FOADS framework, models are trained on a few-shot normal dataset, then inspect and improve their capabilities by leveraging unlabeled streaming data that contains both normal and abnormal samples.
In order to achieve improved performance with limited training samples, we employ multi-scale feature embedding extracted from a CNN pre-trained on ImageNet to obtain a robust representation.
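The multi-scale embedding described above pools feature maps from several depths of a pretrained CNN and concatenates them into one descriptor. A hedged sketch under that assumption follows; the three-stage backbone is an illustrative stand-in for an ImageNet-pretrained network, not the paper's model.

```python
# Sketch of multi-scale feature embedding: globally pool feature maps at
# several depths of a CNN (normally ImageNet-pretrained) and concatenate
# them into one robust descriptor. The three stages below are illustrative
# stand-ins only.
import torch
import torch.nn as nn

stages = nn.ModuleList([
    nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
    nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
    nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
])

def multiscale_embedding(x):
    feats = []
    for stage in stages:
        x = stage(x)                      # feature map at this scale
        feats.append(x.mean(dim=(2, 3)))  # global average pool per scale
    return torch.cat(feats, dim=1)        # concat: 8 + 16 + 32 = 56 dims

emb = multiscale_embedding(torch.randn(2, 3, 32, 32))
print(emb.shape)  # torch.Size([2, 56])
```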
arXiv Detail & Related papers (2024-03-27T02:24:00Z) - MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
Transferring the pretrained models to downstream tasks may encounter task discrepancy due to the formulation of pretraining as an image classification or object discrimination task.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z) - Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amounts of unlabelled data.
In this paper, we revisit transformer pre-training and leverage multi-scale information that is effectively utilized with multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z) - Delving Deeper into Data Scaling in Masked Image Modeling [145.36501330782357]
We conduct an empirical study on the scaling capability of masked image modeling (MIM) methods for visual recognition.
Specifically, we utilize the web-collected Coyo-700M dataset.
Our goal is to investigate how the performance changes on downstream tasks when scaling with different sizes of data and models.
arXiv Detail & Related papers (2023-05-24T15:33:46Z) - Learning with Multigraph Convolutional Filters [153.20329791008095]
We introduce multigraph convolutional neural networks (MGNNs) as stacked and layered structures where information is processed according to an MSP model.
We also develop a procedure for tractable computation of filter coefficients in the MGNNs and a low cost method to reduce the dimensionality of the information transferred between layers.
arXiv Detail & Related papers (2022-10-28T17:00:50Z) - Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z) - Multiresolution Convolutional Autoencoders [5.0169726108025445]
We propose a multi-resolution convolutional autoencoder architecture that integrates and leverages three successful mathematical architectures.
Basic learning techniques are applied to ensure information learned from previous training steps can be rapidly transferred to the larger network.
The performance gains are illustrated through a sequence of numerical experiments on synthetic examples and real-world spatial data.
arXiv Detail & Related papers (2020-04-10T08:31:59Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for the image restoration task.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.