Translate to Adapt: RGB-D Scene Recognition across Domains
- URL: http://arxiv.org/abs/2103.14672v1
- Date: Fri, 26 Mar 2021 18:20:29 GMT
- Title: Translate to Adapt: RGB-D Scene Recognition across Domains
- Authors: Andrea Ferreri and Silvia Bucci and Tatiana Tommasi
- Abstract summary: In this work we put the spotlight on a possibly severe domain shift issue within multi-modality scene recognition datasets.
We present a method based on self-supervised inter-modality translation able to adapt across different camera domains.
- Score: 18.40373730109694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene classification is one of the basic problems in computer vision research,
with extensive applications in robotics. When available, depth images provide
helpful geometric cues that complement the RGB texture information and help to
identify more discriminative scene image features. Depth sensing technology has
developed rapidly in recent years, and a great variety of 3D cameras have been
introduced, each with different acquisition properties. However, when building
big data collections, multi-modal images are often gathered with no regard for
which camera acquired them. In this work we put the spotlight on a possibly
severe domain shift issue within multi-modality scene recognition datasets. We
design an experimental testbed to study this problem and present a method based
on self-supervised inter-modality translation that is able to adapt across
different camera domains. Our extensive experimental analysis confirms the
effectiveness of the proposed approach.
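As a concrete illustration of the inter-modality translation idea, here is a minimal PyTorch sketch. All module names and the toy architecture are our own assumptions, not the paper's actual design: an RGB-to-depth translation head provides a self-supervised reconstruction loss that needs no class labels, so it can be optimized on unlabeled target-domain RGB-D pairs alongside the supervised classification loss on the source domain.

```python
# Minimal sketch of self-supervised inter-modality translation as an
# auxiliary adaptation objective. Hypothetical names and a tiny
# architecture for illustration only.
import torch
import torch.nn as nn

class RGBDAdaptNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # One small convolutional encoder per modality.
        self.rgb_enc = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.depth_enc = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Scene classifier on the fused (concatenated) features.
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, num_classes)
        )
        # Translation head: reconstructs the depth image from RGB features.
        # Its loss requires no labels, so it can also be trained on
        # unlabeled target-domain RGB-D pairs.
        self.rgb2depth = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, rgb, depth):
        f_rgb, f_depth = self.rgb_enc(rgb), self.depth_enc(depth)
        logits = self.classifier(torch.cat([f_rgb, f_depth], dim=1))
        depth_hat = self.rgb2depth(f_rgb)
        return logits, depth_hat

model = RGBDAdaptNet(num_classes=10)
rgb_s, depth_s = torch.randn(4, 3, 64, 64), torch.randn(4, 1, 64, 64)
labels_s = torch.randint(0, 10, (4,))
rgb_t, depth_t = torch.randn(4, 3, 64, 64), torch.randn(4, 1, 64, 64)

# Supervised loss on labeled source data plus self-supervised translation
# loss on both domains (the only signal available on the target).
logits_s, dhat_s = model(rgb_s, depth_s)
_, dhat_t = model(rgb_t, depth_t)
loss = (nn.functional.cross_entropy(logits_s, labels_s)
        + nn.functional.mse_loss(dhat_s, depth_s)
        + nn.functional.mse_loss(dhat_t, depth_t))
loss.backward()
```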
Related papers
- 3D Multimodal Image Registration for Plant Phenotyping [0.6697966247860049]
The use of multiple camera technologies in a combined multimodal monitoring system for plant phenotyping offers promising benefits.
The effective utilization of cross-modal patterns is dependent on precise image registration to achieve pixel-accurate alignment.
We propose a novel multimodal 3D image registration method that addresses these challenges by integrating depth information from a time-of-flight camera into the registration process.
arXiv Detail & Related papers (2024-07-03T09:29:46Z)
- AnyView: Generalizable Indoor 3D Object Detection with Variable Frames [63.51422844333147]
We present a novel 3D detection framework named AnyView for practical applications.
Our method achieves both great generalizability and high detection accuracy with a simple and clean architecture.
arXiv Detail & Related papers (2023-10-09T02:15:45Z)
- Two Approaches to Supervised Image Segmentation [55.616364225463066]
The present work develops comparison experiments between deep learning and multiset-neuron approaches.
The deep learning approach confirmed its potential for performing image segmentation.
The alternative multiset methodology allowed for enhanced accuracy while requiring modest computational resources.
arXiv Detail & Related papers (2023-07-19T16:42:52Z)
- A Multi-modal Approach to Single-modal Visual Place Classification [2.580765958706854]
Multi-sensor fusion approaches combining RGB and depth (D) have gained popularity in recent years.
We reformulate the single-modal RGB image classification task as a pseudo multi-modal RGB-D classification problem.
A practical, fully self-supervised framework for training, appropriately processing, fusing, and classifying these two modalities is described; a toy sketch of the pseudo RGB-D idea follows this entry.
arXiv Detail & Related papers (2023-05-10T14:04:21Z)
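The reformulation above can be illustrated with a toy sketch: a stand-in monocular depth estimator synthesizes the missing depth channel, and a two-stream network classifies the resulting pseudo RGB-D pair. All names and the simplified pipeline (including the frozen estimator) are assumptions; the paper's fully self-supervised framework is more elaborate.

```python
# Toy sketch of recasting RGB-only classification as pseudo RGB-D.
# Hypothetical names; the paper's actual pipeline may differ.
import torch
import torch.nn as nn

def make_stream(in_ch: int) -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class PseudoRGBDClassifier(nn.Module):
    def __init__(self, depth_estimator: nn.Module, num_classes: int):
        super().__init__()
        # Any off-the-shelf monocular depth network could serve here;
        # we keep it frozen for simplicity.
        self.depth_estimator = depth_estimator
        for p in self.depth_estimator.parameters():
            p.requires_grad = False
        self.rgb_stream = make_stream(3)
        self.depth_stream = make_stream(1)
        self.head = nn.Linear(128, num_classes)

    def forward(self, rgb):
        with torch.no_grad():
            pseudo_depth = self.depth_estimator(rgb)  # (B, 1, H, W)
        fused = torch.cat([self.rgb_stream(rgb),
                           self.depth_stream(pseudo_depth)], dim=1)
        return self.head(fused)

# Stand-in estimator so the sketch runs end to end.
toy_estimator = nn.Conv2d(3, 1, 3, padding=1)
model = PseudoRGBDClassifier(toy_estimator, num_classes=10)
print(model(torch.randn(2, 3, 64, 64)).shape)  # torch.Size([2, 10])
```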
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- Multimodal Across Domains Gaze Target Detection [18.41238482101682]
This paper addresses the gaze target detection problem in single images captured from the third-person perspective.
We present a multimodal deep architecture to infer where a person in a scene is looking.
arXiv Detail & Related papers (2022-08-23T09:09:00Z)
- Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection.
We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via instance-level augmentation.
Our method called DGMono3D achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z)
- ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations.
We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings (a simplified sketch follows this entry).
We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z)
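A simplified sketch of the multimodal patch-embedding idea: a high-pass residual (image minus a blurred copy) serves as the high-frequency modality, and each modality is patchified and projected into tokens. The filter choice and module names below are assumptions, not ObjectFormer's exact design.

```python
# Illustrative sketch of pairing high-frequency residuals with RGB patch
# embeddings as a multimodal token sequence.
import torch
import torch.nn as nn
import torch.nn.functional as F

def high_freq(x: torch.Tensor) -> torch.Tensor:
    """High-pass residual: image minus a simple low-pass (blurred) copy."""
    blur = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
    return x - blur

class MultimodalPatchEmbed(nn.Module):
    def __init__(self, dim: int = 64, patch: int = 8):
        super().__init__()
        # One patch projection per modality (RGB / high-frequency).
        self.rgb_proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.hf_proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, x):
        tok_rgb = self.rgb_proj(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        tok_hf = self.hf_proj(high_freq(x)).flatten(2).transpose(1, 2)
        # Concatenate along the token axis -> multimodal token sequence.
        return torch.cat([tok_rgb, tok_hf], dim=1)

tokens = MultimodalPatchEmbed()(torch.randn(2, 3, 64, 64))
print(tokens.shape)  # torch.Size([2, 128, 64]): 64 RGB + 64 high-freq tokens
```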
- Multi-Scale Iterative Refinement Network for RGB-D Salient Object Detection [7.062058947498447]
Salient visual cues appear in various scales and resolutions of RGB images due to semantic gaps at different feature levels.
Similar salient patterns are available in cross-modal depth images as well as their multi-scale versions.
We devise an attention-based fusion module (ABF) to exploit cross-modal correlation (a simplified sketch follows this entry).
arXiv Detail & Related papers (2022-01-24T10:33:00Z)
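A simplified sketch of attention-based cross-modal fusion in the spirit of ABF (the real module almost certainly differs): RGB tokens query depth tokens, so each spatial location can pull in correlated geometric cues from the other modality.

```python
# Simplified cross-modal attention fusion; names are hypothetical.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_feat, depth_feat):
        # RGB tokens attend to depth tokens to gather cross-modal cues.
        fused, _ = self.attn(query=rgb_feat, key=depth_feat, value=depth_feat)
        return self.norm(rgb_feat + fused)  # residual keeps RGB info intact

rgb_tokens = torch.randn(2, 196, 64)    # e.g. a 14x14 feature map, flattened
depth_tokens = torch.randn(2, 196, 64)
print(CrossModalFusion()(rgb_tokens, depth_tokens).shape)  # (2, 196, 64)
```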
- Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training (sketched after this entry).
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
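The multi-view consistency idea can be sketched as a loss that warps one camera's predictions into another view and penalizes disagreement. The affine warp below is a stand-in for the full calibrated-camera geometry, and all names are hypothetical.

```python
# Rough sketch of a multi-view consistency loss between two camera views.
import torch
import torch.nn.functional as F

def multiview_consistency(pred_a, pred_b, theta_ba):
    """pred_*: (B, 1, H, W) detection heatmaps. theta_ba: (B, 2, 3) affine
    map sending normalized view-B coordinates into view A (a stand-in for
    the full homography between calibrated cameras)."""
    grid = F.affine_grid(theta_ba, pred_b.shape, align_corners=False)
    # Resample view A's prediction in view B's frame, then compare.
    pred_a_in_b = F.grid_sample(pred_a, grid, align_corners=False)
    return F.mse_loss(pred_a_in_b, pred_b)

heat_a = torch.rand(2, 1, 32, 32)
heat_b = torch.rand(2, 1, 32, 32)
identity = torch.tensor([[[1., 0., 0.], [0., 1., 0.]]]).repeat(2, 1, 1)
print(multiview_consistency(heat_a, heat_b, identity))
```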
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.