Implicit Ray-Transformers for Multi-view Remote Sensing Image Segmentation
- URL: http://arxiv.org/abs/2303.08401v1
- Date: Wed, 15 Mar 2023 07:05:07 GMT
- Title: Implicit Ray-Transformers for Multi-view Remote Sensing Image Segmentation
- Authors: Zipeng Qi, Hao Chen, Chenyang Liu, Zhenwei Shi and Zhengxia Zou
- Abstract summary: We propose the "Implicit Ray-Transformer (IRT)", based on Implicit Neural Representation (INR), for RS scene semantic segmentation with sparse labels.
The proposed method includes a two-stage learning process. In the first stage, we optimize a neural field to encode the color and 3D structure of the remote sensing scene.
In the second stage, we design a Ray Transformer to leverage the relations between the neural field 3D features and 2D texture features for learning better semantic representations.
- Score: 26.726658200149544
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The mainstream CNN-based remote sensing (RS) image semantic segmentation
approaches typically rely on massive labeled training data. Such a paradigm
struggles with RS multi-view scene segmentation under limited labeled views
because it does not consider the 3D information within the scene.
In this paper, we propose the "Implicit Ray-Transformer (IRT)", based on Implicit
Neural Representation (INR), for RS scene semantic segmentation with sparse
labels (such as 4-6 labels per 100 images). We explore a new way of introducing
multi-view 3D structure priors to the task for accurate and view-consistent
semantic segmentation. The proposed method includes a two-stage learning
process. In the first stage, we optimize a neural field to encode the color and
3D structure of the remote sensing scene based on multi-view images. In the
second stage, we design a Ray Transformer to leverage the relations between the
neural field 3D features and 2D texture features for learning better semantic
representations. Unlike previous methods that consider only a 3D prior or only
2D features, we incorporate both 2D texture information and the 3D prior by
broadcasting CNN features to the point features sampled along each ray. To
verify the effectiveness of the proposed method, we construct a challenging
dataset containing six synthetic sub-datasets collected from the Carla platform
and three real sub-datasets from Google Maps. Experiments show that the
proposed method outperforms the CNN-based methods and the state-of-the-art
INR-based segmentation methods both quantitatively and qualitatively.
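The second-stage fusion lends itself to a compact illustration. Below is a minimal, hypothetical PyTorch sketch of the mechanism described in the abstract: a per-pixel CNN texture feature is broadcast to every sample point along the corresponding ray, concatenated with the stage-1 neural-field features, and passed through a small transformer over the ray samples before being pooled into a per-ray semantic prediction. This is not the authors' implementation; all module names, feature dimensions, layer counts, and the weighted pooling are assumptions for illustration.

```python
# Hypothetical sketch of IRT-style ray-level feature fusion (stage 2).
# Not the authors' code: names, dimensions, and pooling are illustrative.
import torch
import torch.nn as nn

class RayTransformerSketch(nn.Module):
    def __init__(self, d_field=64, d_cnn=128, d_model=128, n_classes=10):
        super().__init__()
        # Fuse per-point 3D field features with the broadcast 2D CNN feature.
        self.proj = nn.Linear(d_field + d_cnn, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)  # per-ray semantic logits

    def forward(self, point_feats, cnn_feats, weights):
        # point_feats: (R, S, d_field)  stage-1 field features, S samples per ray
        # cnn_feats:   (R, d_cnn)       2D texture feature at each ray's pixel
        # weights:     (R, S)           volume-rendering weights along the ray
        R, S, _ = point_feats.shape
        # Broadcast the single 2D feature to all S samples along the ray.
        cnn_broadcast = cnn_feats[:, None, :].expand(R, S, -1)
        tokens = self.proj(torch.cat([point_feats, cnn_broadcast], dim=-1))
        tokens = self.encoder(tokens)                        # attend across ray samples
        ray_feat = (weights[..., None] * tokens).sum(dim=1)  # weight-pooled ray feature
        return self.head(ray_feat)                           # (R, n_classes)

# Usage: 1024 rays, 64 samples per ray.
model = RayTransformerSketch()
logits = model(torch.randn(1024, 64, 64), torch.randn(1024, 128),
               torch.softmax(torch.randn(1024, 64), dim=-1))
print(logits.shape)  # torch.Size([1024, 10])
```

Broadcasting the same 2D feature to every sample lets the attention layers decide at which depths the texture evidence should attach to the 3D structure, which is one plausible reading of how the 2D and 3D cues interact along each ray.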
Related papers
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102] (2024-10-24)
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
- NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows [60.291277312569285] (2024-06-15)
We present a method for automatically modifying a NeRF representation based on a single observation.
Our method defines the transformation as a 3D flow, specifically as a weighted linear blending of rigid transformations.
We also introduce a new dataset for exploring the problem of modifying a NeRF scene through a single observation.
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796] (2023-11-29)
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596] (2023-06-28)
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
- Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation [55.9577535403381] (2022-10-02)
We present a novel approach to segmenting objects in 3D during reconstruction, given only unlabeled multi-view images of a scene.
The core of our method is a novel propagation strategy for individual objects' radiance fields with a bidirectional photometric loss.
To the best of our knowledge, RFP is the first unsupervised approach to 3D scene object segmentation for neural radiance fields (NeRF).
- Neural Volumetric Object Selection [126.04480613166194] (2022-05-30)
We introduce an approach for selecting objects in neural volumetric 3D representations, such as multi-plane images (MPI) and neural radiance fields (NeRF).
Our approach takes a set of foreground and background 2D user scribbles in one view and automatically estimates a 3D segmentation of the desired object, which can be rendered into novel views.
- Exploring Deep 3D Spatial Encodings for Large-Scale 3D Scene Understanding [19.134536179555102] (2020-11-29)
We propose an alternative approach that overcomes the limitations of CNN-based methods by encoding the spatial features of raw 3D point clouds into undirected graph models.
The proposed method achieves accuracy on par with the state of the art, with improved training time and model stability, indicating strong potential for further research.