InterNet: Unsupervised Cross-modal Homography Estimation Based on Interleaved Modality Transfer and Self-supervised Homography Prediction
- URL: http://arxiv.org/abs/2409.17993v2
- Date: Fri, 27 Sep 2024 02:35:47 GMT
- Title: InterNet: Unsupervised Cross-modal Homography Estimation Based on Interleaved Modality Transfer and Self-supervised Homography Prediction
- Authors: Junchen Yu, Si-Yuan Cao, Runmin Zhang, Chenghao Zhang, Jianxin Hu, Zhu Yu, Beinan Yu, Hui-liang Shen
- Abstract summary: InterNet integrates modality transfer and self-supervised homography estimation.
InterNet achieves the state-of-the-art (SOTA) performance among unsupervised methods.
- Score: 9.313783457777125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel unsupervised cross-modal homography estimation framework, based on interleaved modality transfer and self-supervised homography prediction, named InterNet. InterNet integrates modality transfer and self-supervised homography estimation, introducing an innovative interleaved optimization framework to alternately promote both components. The modality transfer gradually narrows the modality gaps, facilitating the self-supervised homography estimation to fully leverage the synthetic intra-modal data. The self-supervised homography estimation progressively achieves reliable predictions, thereby providing robust cross-modal supervision for the modality transfer. To further boost the estimation accuracy, we also formulate a fine-grained homography feature loss to improve the connection between two components. Furthermore, we employ a simple yet effective distillation training technique to reduce model parameters and improve cross-domain generalization ability while maintaining comparable performance. Experiments reveal that InterNet achieves the state-of-the-art (SOTA) performance among unsupervised methods, and even outperforms many supervised methods such as MHN and LocalTrans.
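The abstract does not say how the "synthetic intra-modal data" are produced. A widely used recipe in self-supervised homography estimation is to perturb the four corners of a patch at random and take the resulting homography as a free ground-truth label. The sketch below illustrates that recipe only, as an assumption rather than InterNet's confirmed procedure. The function names, the patch size of 128, and the maximum shift of 32 pixels are all hypothetical choices made for illustration.

```python
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_corners(src, dst):
    """Return a 3x3 H (with h33 fixed to 1) mapping four src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31*x + h32*y + 1) = h11*x + h12*y + h13, and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, point):
    """Map a 2-D point through H using homogeneous coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def sample_pair_label(size=128, max_shift=32, rng=None):
    """Sample random corner offsets (assumed recipe) and the matching H."""
    rng = rng or random.Random()
    corners = [(0.0, 0.0), (size - 1.0, 0.0),
               (size - 1.0, size - 1.0), (0.0, size - 1.0)]
    offsets = [(rng.uniform(-max_shift, max_shift),
                rng.uniform(-max_shift, max_shift)) for _ in corners]
    warped = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(corners, offsets)]
    return corners, offsets, homography_from_corners(corners, warped)
```

In a training loop, the eight offset values would typically serve as the regression target, and `H` would be used to warp the source image into the second view of the pair. Warping and the network itself are outside the scope of this sketch.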
Related papers
- An Intra- and Cross-frame Topological Consistency Scheme for Semi-supervised Atherosclerotic Coronary Plaque Segmentation [9.902034502749501]
Coronary Atherosclerosis Analysis (CAA) relies on the analysis of vessel cross-section images reconstructed via Curved Planar Reformation.
This task presents significant challenges due to the indistinct boundaries and structures of plaques and blood vessels.
We propose a novel dual-consistency semi-supervised framework that integrates Intra-frame Topological Consistency (ITC) and Cross-frame Topological Consistency (CTC).
Our method surpasses existing semi-supervised methods and approaches the performance of supervised methods on CAA.
arXiv Detail & Related papers (2025-01-14T05:23:42Z)
- Efficient Text-driven Motion Generation via Latent Consistency Training [21.348658259929053]
We propose a motion latent consistency training framework (MLCT) to solve nonlinear reverse diffusion trajectories.
By combining these enhancements, we achieve stable consistency training in non-pixel modality and latent representation spaces.
arXiv Detail & Related papers (2024-05-05T02:11:57Z)
- Motion-Scenario Decoupling for Rat-Aware Video Position Prediction: Strategy and Benchmark [49.58762201363483]
We introduce RatPose, a bio-robot motion prediction dataset constructed by considering the influence factors of individuals and environments.
We propose a Dual-stream Motion-Scenario Decoupling framework that effectively separates scenario-oriented and motion-oriented features.
We demonstrate significant performance improvements of the proposed DMSD framework on tasks of different difficulty levels.
arXiv Detail & Related papers (2023-05-17T14:14:31Z)
- Interpolation-based Correlation Reduction Network for Semi-Supervised Graph Learning [49.94816548023729]
We propose a novel graph contrastive learning method, termed Interpolation-based Correlation Reduction Network (ICRN).
In our method, we improve the discriminative capability of the latent feature by enlarging the margin of decision boundaries.
By combining the two settings, we extract rich supervision information from both the abundant unlabeled nodes and the rare yet valuable labeled nodes for discriminative representation learning.
arXiv Detail & Related papers (2022-06-06T14:26:34Z)
- Learning Relation Alignment for Calibrated Cross-modal Retrieval [52.760541762871505]
We propose a novel metric, Intra-modal Self-attention Distance (ISD), to quantify the relation consistency by measuring the semantic distance between linguistic and visual relations.
We present Inter-modal Alignment on Intra-modal Self-attentions (IAIS), a regularized training method to optimize the ISD and calibrate intra-modal self-attentions mutually via inter-modal alignment.
arXiv Detail & Related papers (2021-05-28T14:25:49Z)
- Self-Supervised Multi-Frame Monocular Scene Flow [61.588808225321735]
We introduce a multi-frame monocular scene flow network based on self-supervised learning.
We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
arXiv Detail & Related papers (2021-05-05T17:49:55Z)
- Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation [39.95831985522991]
We propose a framework integrated with more reliable supervision guided by semantic co-segmentation and data-augmentation.
Our proposed methods achieve the state-of-the-art performance among unsupervised methods, and even compete on par with supervised methods.
arXiv Detail & Related papers (2021-04-12T11:48:54Z)
- Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and inherent correlations in multi-modal data for gesture recognition.
Results show that our approach recovers performance with large gains, up to 12.91% in ACC and 20.16% in F1 score, without using any annotations from the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z)
- Dual-Teacher++: Exploiting Intra-domain and Inter-domain Knowledge with Reliable Transfer for Cardiac Segmentation [69.09432302497116]
We propose a cutting-edge semi-supervised domain adaptation framework, namely Dual-Teacher++.
We design novel dual teacher models, including an inter-domain teacher model to explore cross-modality priors from source domain (e.g., MR) and an intra-domain teacher model to investigate the knowledge beneath unlabeled target domain.
In this way, the student model can obtain reliable dual-domain knowledge and yield improved performance on target domain data.
arXiv Detail & Related papers (2021-01-07T05:17:38Z)
- Intervention Generative Adversarial Networks [21.682592654097352]
We propose a novel approach for stabilizing the training process of Generative Adversarial Networks.
We refer to the resulting generative model as Intervention Generative Adversarial Networks (IVGAN).
arXiv Detail & Related papers (2020-08-09T11:51:54Z)