Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer
- URL: http://arxiv.org/abs/2208.12527v3
- Date: Fri, 19 Jul 2024 15:05:09 GMT
- Title: Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer
- Authors: Jiaming Liu, Qizhe Zhang, Xiaoqi Li, Jianing Li, Guanqun Wang, Ming Lu, Tiejun Huang, Shanghang Zhang
- Abstract summary: We introduce open-source RGB data to support spike depth estimation, leveraging its annotations and spatial information.
We propose a cross-modality cross-domain (BiCross) framework to realize unsupervised spike depth estimation.
Our method achieves state-of-the-art (SOTA) performance compared with RGB-oriented unsupervised depth estimation methods.
- Score: 53.413305467674434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neuromorphic spike data, an emerging modality with high temporal resolution, has shown promising potential in autonomous driving by mitigating the challenges posed by high-velocity motion blur. However, training a spike depth estimation network is challenging in two respects: spike streams carry sparse spatial information for pixel-wise tasks, and paired depth labels are difficult to obtain for temporally dense spike streams. Therefore, we introduce open-source RGB data to support spike depth estimation, leveraging its annotations and spatial information. The inherent differences in modality and data distribution make it difficult to directly transfer learning from open-source RGB to target spike data. To this end, we propose a cross-modality cross-domain (BiCross) framework that realizes unsupervised spike depth estimation by introducing simulated source spike data as an intermediary. Specifically, we design a Coarse-to-Fine Knowledge Distillation (CFKD) approach that facilitates comprehensive cross-modality knowledge transfer while preserving the unique strengths of both modalities, using a spike-oriented uncertainty scheme. We then propose a Self-Correcting Teacher-Student (SCTS) mechanism that screens out reliable pixel-wise pseudo labels and eases the domain shift of the student model, avoiding error accumulation on target spike data. To verify the effectiveness of BiCross, we conduct extensive experiments on four scenarios: Synthetic to Real, Extreme Weather, Scene Changing, and Real Spike. Our method achieves state-of-the-art (SOTA) performance compared with RGB-oriented unsupervised depth estimation methods. Code and dataset: https://github.com/Theia-4869/BiCross
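For intuition, here is a minimal PyTorch-style sketch of the two ideas the abstract describes: uncertainty-weighted feature distillation (CFKD) and confidence-screened pseudo depth labels (SCTS). All function names, shapes, and thresholds below are illustrative assumptions, not the released BiCross code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feat, teacher_feat, uncertainty):
    """CFKD-style idea: match student and teacher features, but down-weight
    pixels where the spike-oriented uncertainty map is high.
    student_feat, teacher_feat: (B, C, H, W); uncertainty: (B, H, W) in [0, 1]."""
    per_pixel = F.mse_loss(student_feat, teacher_feat, reduction="none").mean(dim=1)
    return ((1.0 - uncertainty) * per_pixel).mean()

def pseudo_label_loss(student_depth, teacher_depth, uncertainty, thresh=0.2):
    """SCTS-style idea: keep only pixels whose teacher prediction looks
    reliable enough to serve as pseudo labels, avoiding error accumulation.
    student_depth, teacher_depth: (B, H, W)."""
    mask = (uncertainty < thresh).float()
    per_pixel = F.l1_loss(student_depth, teacher_depth.detach(), reduction="none")
    return (mask * per_pixel).sum() / mask.sum().clamp(min=1.0)
```

In a typical teacher-student setup the teacher would be updated as an exponential moving average of the student while the student trains on target spike data; the paper's exact update rule may differ.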
Related papers
- PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network [24.54269823691119]
We present an advanced study on more challenging high-resolution salient object detection (HRSOD) from both dataset and network framework perspectives.
To compensate for the lack of HRSOD datasets, we collect a large-scale high-resolution salient object detection dataset, called UHRSD.
All the images are finely annotated at the pixel level, far exceeding previous low-resolution SOD datasets.
arXiv Detail & Related papers (2024-08-02T09:31:21Z)
- CrossDF: Improving Cross-Domain Deepfake Detection with Deep Information Decomposition [53.860796916196634]
We propose a Deep Information Decomposition (DID) framework to enhance the performance of Cross-dataset Deepfake Detection (CrossDF).
Unlike most existing deepfake detection methods, our framework prioritizes high-level semantic features over specific visual artifacts.
It adaptively decomposes facial features into deepfake-related and irrelevant information, only using the intrinsic deepfake-related information for real/fake discrimination.
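As a rough illustration of the decomposition idea (an assumed mechanism, not the paper's actual DID module), one could split a feature into a deepfake-related branch used for classification and an irrelevant branch pushed to be de-correlated from it:

```python
import torch
import torch.nn as nn

class InfoDecomposition(nn.Module):
    """Illustrative two-branch decomposition: only the deepfake-related part
    feeds the real/fake classifier; a cosine penalty keeps the two parts apart."""
    def __init__(self, dim):
        super().__init__()
        self.fake_head = nn.Linear(dim, dim)
        self.irrelevant_head = nn.Linear(dim, dim)
        self.classifier = nn.Linear(dim, 2)  # real / fake logits

    def forward(self, feat):
        f_fake = self.fake_head(feat)
        f_irr = self.irrelevant_head(feat)
        logits = self.classifier(f_fake)  # irrelevant part is never classified
        decouple = torch.cosine_similarity(f_fake, f_irr, dim=1).abs().mean()
        return logits, decouple  # add decouple to the training loss
```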
arXiv Detail & Related papers (2023-09-30T12:30:25Z)
- Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers [58.802352477207094]
We explore the great potential of a pre-trained vision Transformer (ViT) to bridge the vast distribution gap between two modalities.
We propose a mask modeling strategy that randomly masks tokens of a specific modality, enforcing proactive interaction between tokens from different modalities.
Experiments demonstrate that our plug-and-play training augmentation techniques can significantly boost state-of-the-art one-stream and two-stream trackers in terms of both tracking precision and success rate.
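A minimal sketch of such modality masking on ViT tokens might look as follows; the function name, token shapes, and the 0.5 modality-choice probability are assumptions for illustration:

```python
import torch

def mask_tokens(tokens, mask_ratio=0.3):
    """Zero out a random subset of tokens so the tracker must recover the
    missing content from the other modality's tokens. tokens: (B, N, C)."""
    b, n, _ = tokens.shape
    drop = (torch.rand(b, n, device=tokens.device) < mask_ratio).unsqueeze(-1)
    return tokens.masked_fill(drop, 0.0)

# During training, pick one modality at random and mask only its tokens:
rgb_tokens = torch.randn(2, 196, 768)    # dummy RGB tokens
event_tokens = torch.randn(2, 196, 768)  # dummy event tokens
if torch.rand(()) < 0.5:
    rgb_tokens = mask_tokens(rgb_tokens)
else:
    event_tokens = mask_tokens(event_tokens)
```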
arXiv Detail & Related papers (2023-07-09T08:58:47Z) - A Dimensional Structure based Knowledge Distillation Method for
Cross-Modal Learning [15.544134849816528]
We discover the correlation between feature discriminability and dimensional structure (DS) by analyzing and observing features extracted from simple and hard tasks.
We propose a novel cross-modal knowledge distillation (CMKD) method for better supervised cross-modal learning (CML) performance.
The proposed method enforces output features to be channel-wise independent and intermediate ones to be uniformly distributed, thereby learning semantically irrelevant features from the hard task to boost its accuracy.
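The two constraints can be written down directly; below is a hedged sketch where channel independence is approximated by decorrelation and uniformity by the Wang & Isola hypersphere-uniformity loss (stand-in choices, since the paper's exact formulation may differ):

```python
import torch
import torch.nn.functional as F

def channel_decorrelation(feat):
    """Approximate channel-wise independence of output features (B, C) by
    penalising off-diagonal entries of the channel covariance matrix."""
    f = feat - feat.mean(dim=0, keepdim=True)
    cov = (f.t() @ f) / (feat.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diagonal(cov))
    return (off_diag ** 2).sum() / feat.shape[1]

def uniformity(feat, t=2.0):
    """Encourage intermediate features (B, C) to spread uniformly on the unit
    hypersphere: the loss is low when pairwise distances are large."""
    z = F.normalize(feat, dim=1)
    return torch.pdist(z, p=2).pow(2).mul(-t).exp().mean().log()
```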
arXiv Detail & Related papers (2023-06-28T07:29:26Z) - Multi-Dimensional Refinement Graph Convolutional Network with Robust
Decouple Loss for Fine-Grained Skeleton-Based Action Recognition [19.031036881780107]
We propose a flexible attention block called Channel-Variable Spatial-Temporal Attention (CVSTA) to enhance the discriminative power of spatial-temporal joints.
Based on CVSTA, we construct a Multi-Dimensional Refinement Graph Convolutional Network (MDR-GCN), which can improve the discrimination among channel-, joint- and frame-level features.
Furthermore, we propose a Robust Decouple Loss (RDL), which significantly boosts the effect of the CVSTA and reduces the impact of noise.
arXiv Detail & Related papers (2023-06-27T09:23:36Z) - Context-Preserving Instance-Level Augmentation and Deformable
Convolution Networks for SAR Ship Detection [50.53262868498824]
Shape deformation of targets in SAR images, caused by random orientation and partial information loss, is a key challenge in SAR ship detection.
We propose a data augmentation method to train a deep network that is robust to partial information loss within the targets.
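The abstract does not spell out the augmentation, but a simple way to simulate partial information loss within a target is instance-level random erasing inside each ground-truth box; the sketch below is that assumption, not the paper's exact method:

```python
import numpy as np

def occlude_within_box(image, box, max_frac=0.4, rng=None):
    """Erase a random sub-rectangle inside a target's bounding box so the
    detector learns to cope with partial information loss within targets.
    image: (H, W) SAR intensity array; box: (x1, y1, x2, y2) integer pixels."""
    rng = rng or np.random.default_rng()
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    ew = int(w * rng.uniform(0.1, max_frac))  # erased width
    eh = int(h * rng.uniform(0.1, max_frac))  # erased height
    ex = rng.integers(x1, max(x1 + 1, x2 - ew))
    ey = rng.integers(y1, max(y1 + 1, y2 - eh))
    image[ey:ey + eh, ex:ex + ew] = image.mean()  # fill with mean intensity
    return image
```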
arXiv Detail & Related papers (2022-02-14T07:01:01Z)
- RGB-D Salient Object Detection with Ubiquitous Target Awareness [37.6726410843724]
We make the first attempt to solve the RGB-D salient object detection problem with a novel depth-awareness framework.
We propose a Ubiquitous Target Awareness (UTA) network to solve three important challenges in the RGB-D SOD task.
Our proposed UTA network is depth-free at inference and runs in real time at 43 FPS.
arXiv Detail & Related papers (2021-09-08T04:27:29Z)
- Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection [145.4919781325014]
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection.
Many models adopt feature fusion strategies but are limited by low-order point-to-point fusion methods.
We propose a novel mutual attention model by fusing attention and contexts from different modalities.
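One way to picture fusing attention from different modalities is a mutual cross-attention in which each modality's queries attend over the other modality's features; this sketch is an illustrative reading, not the paper's exact selective-attention design:

```python
import torch
import torch.nn.functional as F

def mutual_attention(feat_rgb, feat_depth):
    """Each modality queries the other: RGB queries attend over depth features
    and vice versa, and each stream is enhanced with what it retrieved.
    feat_rgb, feat_depth: (B, C, H, W)."""
    b, c, h, w = feat_rgb.shape
    r = feat_rgb.flatten(2).transpose(1, 2)    # (B, HW, C)
    d = feat_depth.flatten(2).transpose(1, 2)  # (B, HW, C)
    scale = c ** -0.5
    attn_r = F.softmax(r @ d.transpose(1, 2) * scale, dim=-1)  # RGB -> depth
    attn_d = F.softmax(d @ r.transpose(1, 2) * scale, dim=-1)  # depth -> RGB
    out_r = (attn_r @ d).transpose(1, 2).reshape(b, c, h, w)
    out_d = (attn_d @ r).transpose(1, 2).reshape(b, c, h, w)
    return feat_rgb + out_r, feat_depth + out_d
```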
arXiv Detail & Related papers (2020-10-12T08:50:10Z)
- DPANet: Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection [107.96418568008644]
We propose a novel network named DPANet to explicitly model the potentiality of the depth map and effectively integrate the cross-modal complementarity.
By introducing the depth potentiality perception, the network can perceive the potentiality of depth information in a learning-based manner.
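As an illustration of potentiality-aware gating (a plausible reading, not DPANet's exact module), a learned gate can estimate how trustworthy the depth features are and scale their contribution before fusion:

```python
import torch
import torch.nn as nn

class GatedDepthFusion(nn.Module):
    """Minimal sketch: a 1x1-conv gate scores depth reliability per location
    and down-weights unreliable depth before fusing it with RGB features."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        g = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))  # (B, C, H, W)
        return rgb_feat + g * depth_feat
```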
arXiv Detail & Related papers (2020-03-19T07:27:54Z)