InterNet: Unsupervised Cross-modal Homography Estimation Based on Interleaved Modality Transfer and Self-supervised Homography Prediction
- URL: http://arxiv.org/abs/2409.17993v2
- Date: Fri, 27 Sep 2024 02:35:47 GMT
- Title: InterNet: Unsupervised Cross-modal Homography Estimation Based on Interleaved Modality Transfer and Self-supervised Homography Prediction
- Authors: Junchen Yu, Si-Yuan Cao, Runmin Zhang, Chenghao Zhang, Jianxin Hu, Zhu Yu, Beinan Yu, Hui-liang Shen,
- Abstract summary: InterNet integrates modality transfer and self-supervised homography estimation.
InterNet achieves the state-of-the-art (SOTA) performance among unsupervised methods.
- Score: 9.313783457777125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel unsupervised cross-modal homography estimation framework, based on interleaved modality transfer and self-supervised homography prediction, named InterNet. InterNet integrates modality transfer and self-supervised homography estimation, introducing an innovative interleaved optimization framework to alternately promote both components. The modality transfer gradually narrows the modality gaps, facilitating the self-supervised homography estimation to fully leverage the synthetic intra-modal data. The self-supervised homography estimation progressively achieves reliable predictions, thereby providing robust cross-modal supervision for the modality transfer. To further boost the estimation accuracy, we also formulate a fine-grained homography feature loss to improve the connection between two components. Furthermore, we employ a simple yet effective distillation training technique to reduce model parameters and improve cross-domain generalization ability while maintaining comparable performance. Experiments reveal that InterNet achieves the state-of-the-art (SOTA) performance among unsupervised methods, and even outperforms many supervised methods such as MHN and LocalTrans.
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - TransDAE: Dual Attention Mechanism in a Hierarchical Transformer for Efficient Medical Image Segmentation [7.013315283888431]
Medical image segmentation is crucial for accurate disease diagnosis and the development of effective treatment strategies.
We introduce TransDAE: a novel approach that reimagines the self-attention mechanism to include both spatial and channel-wise associations.
Remarkably, TransDAE outperforms existing state-of-the-art methods on the Synaps multi-organ dataset.
arXiv Detail & Related papers (2024-09-03T16:08:48Z) - DAWN: Domain-Adaptive Weakly Supervised Nuclei Segmentation via Cross-Task Interactions [17.68742587885609]
Current weakly supervised nuclei segmentation approaches follow a two-stage pseudo-label generation and network training process.
This paper introduces a novel domain-adaptive weakly supervised nuclei segmentation framework using cross-task interaction strategies.
To validate the effectiveness of our proposed method, we conduct extensive comparative and ablation experiments on six datasets.
arXiv Detail & Related papers (2024-04-23T12:01:21Z) - Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework - Adrial Modality Modulation Network (AMMNet)
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z) - MCU-Net: A Multi-prior Collaborative Deep Unfolding Network with Gates-controlled Spatial Attention for Accelerated MR Image Reconstruction [9.441882492801174]
Deep unfolding networks (DUNs) have demonstrated significant potential in accelerating magnetic resonance imaging (MRI)
However, they often encounter high computational costs and slow convergence rates.
We propose a multi-prior collaborative DUN, termed MCU-Net, to address these limitations.
arXiv Detail & Related papers (2024-02-04T07:29:00Z) - Probabilistic Self-supervised Learning via Scoring Rules Minimization [19.347097627898876]
We propose a novel probabilistic self-supervised learning via Scoring Rule Minimization (ProSMIN) to enhance representation quality and mitigate collapsing representations.
Our method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments on large-scale datasets.
arXiv Detail & Related papers (2023-09-05T08:48:25Z) - Probabilistic MIMO U-Net: Efficient and Accurate Uncertainty Estimation
for Pixel-wise Regression [1.4528189330418977]
Uncertainty estimation in machine learning is paramount for enhancing the reliability and interpretability of predictive models.
We present an adaptation of the Multiple-Input Multiple-Output (MIMO) framework for pixel-wise regression tasks.
arXiv Detail & Related papers (2023-08-14T22:08:28Z) - Motion-Scenario Decoupling for Rat-Aware Video Position Prediction:
Strategy and Benchmark [49.58762201363483]
We introduce RatPose, a bio-robot motion prediction dataset constructed by considering the influence factors of individuals and environments.
We propose a Dual-stream Motion-Scenario Decoupling framework that effectively separates scenario-oriented and motion-oriented features.
We demonstrate significant performance improvements of the proposed textitDMSD framework on different difficulty-level tasks.
arXiv Detail & Related papers (2023-05-17T14:14:31Z) - Multi-Level Contrastive Learning for Dense Prediction Task [59.591755258395594]
We present Multi-Level Contrastive Learning for Dense Prediction Task (MCL), an efficient self-supervised method for learning region-level feature representation for dense prediction tasks.
Our method is motivated by the three key factors in detection: localization, scale consistency and recognition.
Our method consistently outperforms the recent state-of-the-art methods on various datasets with significant margins.
arXiv Detail & Related papers (2023-04-04T17:59:04Z) - Interpolation-based Correlation Reduction Network for Semi-Supervised
Graph Learning [49.94816548023729]
We propose a novel graph contrastive learning method, termed Interpolation-based Correlation Reduction Network (ICRN)
In our method, we improve the discriminative capability of the latent feature by enlarging the margin of decision boundaries.
By combining the two settings, we extract rich supervision information from both the abundant unlabeled nodes and the rare yet valuable labeled nodes for discnative representation learning.
arXiv Detail & Related papers (2022-06-06T14:26:34Z) - FasterPose: A Faster Simple Baseline for Human Pose Estimation [65.8413964785972]
We propose a design paradigm for cost-effective network with LR representation for efficient pose estimation, named FasterPose.
We study the training behavior of FasterPose, and formulate a novel regressive cross-entropy (RCE) loss function for accelerating the convergence.
Compared with the previously dominant network of pose estimation, our method reduces 58% of the FLOPs and simultaneously gains 1.3% improvement of accuracy.
arXiv Detail & Related papers (2021-07-07T13:39:08Z) - Learning Relation Alignment for Calibrated Cross-modal Retrieval [52.760541762871505]
We propose a novel metric, Intra-modal Self-attention Distance (ISD), to quantify the relation consistency by measuring the semantic distance between linguistic and visual relations.
We present Inter-modal Alignment on Intra-modal Self-attentions (IAIS), a regularized training method to optimize the ISD and calibrate intra-modal self-attentions mutually via inter-modal alignment.
arXiv Detail & Related papers (2021-05-28T14:25:49Z) - Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training.
Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results in KITTI, and it generalizes well to the KAIST dataset without additional training.
arXiv Detail & Related papers (2021-05-25T02:17:56Z) - Self-Supervised Multi-Frame Monocular Scene Flow [61.588808225321735]
We introduce a multi-frame monocular scene flow network based on self-supervised learning.
We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
arXiv Detail & Related papers (2021-05-05T17:49:55Z) - Self-supervised Multi-view Stereo via Effective Co-Segmentation and
Data-Augmentation [39.95831985522991]
We propose a framework integrated with more reliable supervision guided by semantic co-segmentation and data-augmentation.
Our proposed methods achieve the state-of-the-art performance among unsupervised methods, and even compete on par with supervised methods.
arXiv Detail & Related papers (2021-04-12T11:48:54Z) - Domain Adaptive Robotic Gesture Recognition with Unsupervised
Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos, and inherent correlations in multi-modal towards recognizing gesture.
Results show that our approach recovers the performance with great improvement gains, up to 12.91% in ACC and 20.16% in F1score without using any annotations in real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z) - SGD-Net: Efficient Model-Based Deep Learning with Theoretical Guarantees [35.01173046356158]
We propose SGD-Net as a new methodology for improving the efficiency of deep unfolding networks.
Our theoretical analysis shows that SGD-Net can be trained to approximate batch deep unfolding networks to an arbitrary precision.
Our numerical results on intensity diffraction tomography and sparse-view computed tomography show that SGD-Net can match the performance of the batch network at a fraction of training and testing complexity.
arXiv Detail & Related papers (2021-01-22T23:33:11Z) - Dual-Teacher++: Exploiting Intra-domain and Inter-domain Knowledge with
Reliable Transfer for Cardiac Segmentation [69.09432302497116]
We propose a cutting-edge semi-supervised domain adaptation framework, namely Dual-Teacher++.
We design novel dual teacher models, including an inter-domain teacher model to explore cross-modality priors from source domain (e.g., MR) and an intra-domain teacher model to investigate the knowledge beneath unlabeled target domain.
In this way, the student model can obtain reliable dual-domain knowledge and yield improved performance on target domain data.
arXiv Detail & Related papers (2021-01-07T05:17:38Z) - Intervention Generative Adversarial Networks [21.682592654097352]
We propose a novel approach for stabilizing the training process of Generative Adversarial Networks.
We refer to the resulting generative model as Intervention Generative Adversarial Networks (IVGAN)
arXiv Detail & Related papers (2020-08-09T11:51:54Z) - Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral
Super-Resolution [79.97180849505294]
We propose a novel coupled unmixing network with a cross-attention mechanism, CUCaNet, to enhance the spatial resolution of HSI.
Experiments are conducted on three widely-used HS-MS datasets in comparison with state-of-the-art HSI-SR models.
arXiv Detail & Related papers (2020-07-10T08:08:20Z) - A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable
Optimization Via Overparameterization From Depth [19.866928507243617]
Training deep neural networks with gradient descent (SGD) can often achieve zero training loss on real-world landscapes.
We propose a new limit of infinity deep residual networks, which enjoys a good training in the sense that everyr is global.
arXiv Detail & Related papers (2020-03-11T20:14:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.