Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers
- URL: http://arxiv.org/abs/2403.06601v2
- Date: Thu, 05 Dec 2024 15:19:47 GMT
- Title: Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers
- Authors: Alexander H. Berger, Laurin Lux, Suprosanna Shit, Ivan Ezhov, Georgios Kaissis, Martin J. Menten, Daniel Rueckert, Johannes C. Paetzold
- Abstract summary: Direct image-to-graph transformation is a challenging task that involves solving object detection and relationship prediction in a single model.
Due to this task's complexity, large training datasets are rare in many domains, making the training of deep-learning methods challenging.
We introduce a set of methods enabling cross-domain and cross-dimension learning for image-to-graph transformers.
- Score: 48.74331852418905
- Abstract: Direct image-to-graph transformation is a challenging task that involves solving object detection and relationship prediction in a single model. Due to this task's complexity, large training datasets are rare in many domains, making the training of deep-learning methods challenging. This data sparsity necessitates transfer learning strategies akin to the state-of-the-art in general computer vision. In this work, we introduce a set of methods enabling cross-domain and cross-dimension learning for image-to-graph transformers. We propose (1) a regularized edge sampling loss to effectively learn object relations in multiple domains with different numbers of edges, (2) a domain adaptation framework for image-to-graph transformers aligning image- and graph-level features from different domains, and (3) a projection function that allows using 2D data for training 3D transformers. We demonstrate our method's utility in cross-domain and cross-dimension experiments, where we utilize labeled data from 2D road networks for simultaneous learning in vastly different target domains. Our method consistently outperforms standard transfer learning and self-supervised pretraining on challenging benchmarks, such as retinal or whole-brain vessel graph extraction.
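To make contribution (1) more concrete, here is a minimal sketch of one way to sample a bounded, balanced set of candidate edges per image, so that a relation loss sees a comparable number of edges whether a domain's graphs are dense or sparse. It is an illustration only, not the paper's loss: the function `sample_edges` and the parameters `max_edges` and `pos_ratio` are hypothetical names, and the fixed cap and ratio are assumptions.

```python
import itertools
import random


def sample_edges(num_nodes, true_edges, max_edges=8, pos_ratio=0.5, seed=0):
    """Sample a bounded set of node pairs for a relation loss.

    Hypothetical helper (not from the paper). Returns (pair, label) tuples,
    where label is 1 for an existing edge and 0 for a non-edge. Edges are
    treated as undirected and are assumed to contain no self-loops.
    """
    rng = random.Random(seed)
    # Normalize undirected edges so (u, v) and (v, u) compare equal.
    positives = sorted({tuple(sorted(e)) for e in true_edges})
    pos_set = set(positives)
    # Every unordered node pair that is not a true edge is a negative.
    negatives = [p for p in itertools.combinations(range(num_nodes), 2)
                 if p not in pos_set]

    # Cap the number of positives, then fill the remaining budget with negatives.
    n_pos = min(len(positives), int(max_edges * pos_ratio))
    n_neg = min(len(negatives), max_edges - n_pos)
    sampled = [(p, 1) for p in rng.sample(positives, n_pos)]
    sampled += [(p, 0) for p in rng.sample(negatives, n_neg)]
    return sampled


# A dense graph (6-node ring) and a sparse graph (single edge) both yield
# the same number of sampled pairs.
dense = sample_edges(6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)])
sparse = sample_edges(6, [(0, 1)])
print(len(dense), len(sparse))
```

Under these assumptions, both graphs contribute the same number of pairs to the loss, which is the property that matters when training on domains with very different edge counts.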
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from transformers that are well trained on massive image datasets.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- Progressive Transformation Learning for Leveraging Virtual Images in Training [21.590496842692744]
We introduce Progressive Transformation Learning (PTL) to augment a training dataset by adding transformed virtual images with enhanced realism.
PTL takes a novel approach that progressively iterates the following three steps: 1) select a subset from a pool of virtual images according to the domain gap, 2) transform the selected virtual images to enhance realism, and 3) add the transformed virtual images to the training set while removing them from the pool.
Experiments show that PTL results in a substantial performance increase over the baseline, especially in the small data and the cross-domain regime.
arXiv Detail & Related papers (2022-11-03T13:04:15Z)
- Unsupervised Domain Adaptation with Histogram-gated Image Translation for Delayered IC Image Analysis [2.720699926154399]
Histogram-gated Image Translation (HGIT) is an unsupervised domain adaptation framework which transforms images from a given source dataset to the domain of a target dataset.
Our method achieves the best performance compared to the reported domain adaptation techniques, and is also reasonably close to the fully supervised benchmark.
arXiv Detail & Related papers (2022-09-27T15:53:22Z)
- Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z)
- Self-Supervised Learning of Domain Invariant Features for Depth Estimation [35.74969527929284]
We tackle the problem of unsupervised synthetic-to-realistic domain adaptation for single image depth estimation.
An essential building block of single image depth estimation is an encoder-decoder task network that takes RGB images as input and produces depth maps as output.
We propose a novel training strategy to force the task network to learn domain invariant representations in a self-supervised manner.
arXiv Detail & Related papers (2021-06-04T16:45:48Z)
- Six-channel Image Representation for Cross-domain Object Detection [17.854940064699985]
Deep learning models are data-driven, and their performance depends heavily on abundant and diverse datasets.
Image-to-image translation techniques are sometimes used to generate fake data of specific scenes to train the models.
We propose to combine the original 3-channel images with their corresponding GAN-generated fake images to form 6-channel representations of the dataset.
arXiv Detail & Related papers (2021-01-03T04:50:03Z)
- Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent works on visual domain adaptation that leverage adversarial learning to unify the source and target video representations are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z)
- Domain Adaptation with Morphologic Segmentation [8.0698976170854]
We present a novel domain adaptation framework that uses morphologic segmentation to translate images from arbitrary input domains (real and synthetic) into a uniform output domain.
Our goal is to establish a preprocessing step that unifies data from multiple sources into a common representation.
We showcase the effectiveness of our approach by qualitatively and quantitatively evaluating our method on four data sets of simulated and real data of urban scenes.
arXiv Detail & Related papers (2020-06-16T17:06:02Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
- CrDoCo: Pixel-level Domain Transfer with Cross-Domain Consistency [119.45667331836583]
Unsupervised domain adaptation algorithms aim to transfer the knowledge learned from one domain to another.
We present a novel pixel-wise adversarial domain adaptation algorithm.
arXiv Detail & Related papers (2020-01-09T19:00:35Z)
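The three-step loop described in the Progressive Transformation Learning entry above (select by domain gap, transform, move into the training set) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the function name, the callables `domain_gap` and `transform`, and the choice to select the items with the smallest gap first are all hypothetical.

```python
def progressive_transformation_learning(train_set, pool, domain_gap, transform,
                                        subset_size, num_rounds):
    """Toy sketch of an iterative select / transform / add loop.

    Hypothetical signature: domain_gap(item, train_set) returns a number
    (lower means closer to the training domain) and transform(item) returns
    a more realistic version of a virtual item.
    """
    train_set = list(train_set)
    pool = list(pool)
    for _ in range(num_rounds):
        if not pool:
            break
        # Step 1: rank the pool by domain gap to the current training set
        # and select the closest items (an assumption of this sketch).
        ranked = sorted(pool, key=lambda item: domain_gap(item, train_set))
        selected = ranked[:subset_size]
        # Step 2: transform the selected virtual items.
        transformed = [transform(item) for item in selected]
        # Step 3: add them to the training set and remove them from the pool.
        train_set.extend(transformed)
        pool = ranked[subset_size:]
    return train_set, pool


# Toy usage: items are numbers, the gap is the distance to the training mean,
# and the stand-in "transformation" halves the value.
new_train, remaining = progressive_transformation_learning(
    train_set=[0.0],
    pool=[1.0, 5.0, 2.0],
    domain_gap=lambda x, ts: abs(x - sum(ts) / len(ts)),
    transform=lambda x: x / 2,
    subset_size=1,
    num_rounds=2,
)
print(new_train, remaining)  # [0.0, 0.5, 1.0] [5.0]
```

Because the gap is recomputed against the growing training set in every round, the items selected later depend on what was added earlier, which is what makes the procedure progressive.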
This list is automatically generated from the titles and abstracts of the papers on this site.