Semantics-aware Multi-modal Domain Translation: From LiDAR Point Clouds to Panoramic Color Images
- URL: http://arxiv.org/abs/2106.13974v1
- Date: Sat, 26 Jun 2021 08:52:17 GMT
- Title: Semantics-aware Multi-modal Domain Translation: From LiDAR Point Clouds to Panoramic Color Images
- Authors: Tiago Cortinhal, Fatih Kurnaz, Eren Aksoy
- Abstract summary: Our framework can synthesize a panoramic color image from a given full 3D LiDAR point cloud.
We provide a thorough quantitative evaluation on the SemanticKITTI dataset and show that our proposed framework outperforms other strong baseline models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we present a simple yet effective framework to address the
domain translation problem between different sensor modalities with unique data
formats. By relying only on the semantics of the scene, our modular generative
framework can, for the first time, synthesize a panoramic color image from a
given full 3D LiDAR point cloud. The framework starts with semantic
segmentation of the point cloud, which is initially projected onto a spherical
surface. The same semantic segmentation is applied to the corresponding camera
image. Next, our new conditional generative model adversarially learns to
translate the predicted LiDAR segment maps to the camera image counterparts.
Finally, generated image segments are processed to render the panoramic scene
images. We provide a thorough quantitative evaluation on the SemanticKITTI
dataset and show that our proposed framework outperforms other strong baseline
models.
Our source code is available at
https://github.com/halmstad-University/TITAN-NET
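
The first stage described above, projecting the raw point cloud onto a spherical surface so that a 2D segmentation network can process it, corresponds to the standard range-image projection used for SemanticKITTI-style LiDAR data. The sketch below is a minimal, self-contained illustration of that projection step; the resolution and vertical field of view are illustrative values for a 64-beam sensor and are not taken from the paper or the TITAN-NET repository.

```python
import numpy as np

def spherical_projection(points, height=64, width=2048,
                         fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR point cloud onto a spherical range image.

    The field of view and resolution are illustrative values, not the
    paper's exact setup. Returns a (height, width) range image plus the
    pixel coordinates of every point, which can be reused to rasterize
    per-point semantic labels into a segment map.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points[:, :3], axis=1) + 1e-8

    yaw = np.arctan2(y, x)        # azimuth in [-pi, pi]
    pitch = np.arcsin(z / depth)  # elevation

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    # Normalise the angles to [0, 1] and scale to image coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * width          # column
    v = (1.0 - (pitch - fov_down) / fov) * height  # row

    u = np.clip(np.floor(u), 0, width - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, height - 1).astype(np.int32)

    # Keep the closest point per pixel by writing far points first.
    order = np.argsort(depth)[::-1]
    range_image = np.full((height, width), -1.0, dtype=np.float32)
    range_image[v[order], u[order]] = depth[order]
    return range_image, (v, u)
```

The returned pixel coordinates can be reused to rasterize per-point semantic labels into a LiDAR segment map, which is the representation the conditional generative model then translates into camera-view segments.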
Related papers
- TextPSG: Panoptic Scene Graph Generation from Textual Descriptions [78.1140391134517]
We study a new problem of Panoptic Scene Graph Generation from Purely Textual Descriptions (Caption-to-PSG).
The key idea is to leverage the large collection of free image-caption data on the Web alone to generate panoptic scene graphs.
We propose a new framework TextPSG consisting of four modules, i.e., a region grouper, an entity grounder, a segment merger, and a label generator.
arXiv Detail & Related papers (2023-10-10T22:36:15Z)
- LadleNet: A Two-Stage UNet for Infrared Image to Visible Image Translation Guided by Semantic Segmentation [5.125530969984795]
We propose an improved algorithm for image translation based on U-Net, called LadleNet.
LadleNet+ replaces the Handle module in LadleNet with a pre-trained DeepLabv3+ network, enabling the model to have a more powerful capability in constructing semantic space.
Compared to existing methods, LadleNet and LadleNet+ achieved an average improvement of 12.4% and 15.2% in SSIM metrics, and 37.9% and 50.6% in MS-SSIM metrics, respectively.
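
For reference, SSIM compares local luminance, contrast, and structure between a translated image and its target. The snippet below is a minimal, hypothetical example of computing it with scikit-image on synthetic data, only to make the reported metric concrete; it does not reproduce LadleNet's actual evaluation protocol.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

# Two synthetic grayscale images standing in for a translated image
# and its visible-spectrum reference (values in [0, 1]).
rng = np.random.default_rng(0)
reference = rng.random((256, 256)).astype(np.float32)
translated = np.clip(reference + 0.05 * rng.standard_normal((256, 256)),
                     0.0, 1.0).astype(np.float32)

# An SSIM close to 1.0 means the translation preserves local structure well.
score = ssim(reference, translated, data_range=1.0)
print(f"SSIM: {score:.3f}")
```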
arXiv Detail & Related papers (2023-08-12T16:14:44Z)
- LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment [63.83894701779067]
We propose LCPS, the first LiDAR-Camera Panoptic Segmentation network.
In our approach, we conduct LiDAR-Camera fusion in three stages.
Our fusion strategy improves PQ by about 6.9% over the LiDAR-only baseline on the NuScenes dataset.
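
The reported gain refers to panoptic quality (PQ), the standard metric for panoptic segmentation; for reference, it decomposes into segmentation quality (SQ) and recognition quality (RQ):

```latex
\mathrm{PQ}
  = \frac{\sum_{(p,g)\in \mathrm{TP}} \mathrm{IoU}(p,g)}
         {|\mathrm{TP}| + \tfrac{1}{2}|\mathrm{FP}| + \tfrac{1}{2}|\mathrm{FN}|}
  = \underbrace{\frac{\sum_{(p,g)\in \mathrm{TP}} \mathrm{IoU}(p,g)}{|\mathrm{TP}|}}_{\mathrm{SQ}}
    \times
    \underbrace{\frac{|\mathrm{TP}|}{|\mathrm{TP}| + \tfrac{1}{2}|\mathrm{FP}| + \tfrac{1}{2}|\mathrm{FN}|}}_{\mathrm{RQ}}
```

where TP, FP, and FN are the matched, unmatched predicted, and unmatched ground-truth segments, with matches defined by an IoU greater than 0.5.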
arXiv Detail & Related papers (2023-08-03T10:57:58Z)
- Depth- and Semantics-aware Multi-modal Domain Translation: Generating 3D Panoramic Color Images from LiDAR Point Clouds [0.7234862895932991]
This work presents a new conditional generative model, named TITAN-Next, for cross-domain image-to-image translation in a multi-modal setup between LiDAR and camera sensors.
We claim that this is the first framework of its kind and it has practical applications in autonomous vehicles such as providing a fail-safe mechanism and augmenting available data in the target image domain.
arXiv Detail & Related papers (2023-02-15T13:48:10Z)
- Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation [32.33170182669095]
This work investigates learning pixel-wise semantic image segmentation in urban scenes without any manual annotation, just from the raw non-curated data collected by cars.
We propose a novel method for cross-modal unsupervised learning of semantic image segmentation by leveraging synchronized LiDAR and image data.
arXiv Detail & Related papers (2022-03-21T17:35:46Z)
- Fully Context-Aware Image Inpainting with a Learned Semantic Pyramid [102.24539566851809]
Restoring reasonable and realistic content for arbitrary missing regions in images is an important yet challenging task.
Recent image inpainting models have made significant progress in generating vivid visual details, but they can still lead to texture blurring or structural distortions.
We propose the Semantic Pyramid Network (SPN) motivated by the idea that learning multi-scale semantic priors can greatly benefit the recovery of locally missing content in images.
arXiv Detail & Related papers (2021-12-08T04:33:33Z)
- Improving Semantic Image Segmentation via Label Fusion in Semantically Textured Meshes [10.645137380835994]
We present a label fusion framework that is capable of improving semantic pixel labels of video sequences in an unsupervised manner.
We use a 3D mesh representation of the environment and fuse the predictions of different frames into a consistent representation using semantic mesh textures.
We evaluate our method on the ScanNet dataset, where we improve annotations produced by the state-of-the-art segmentation network ESANet from 52.05% to 58.25% pixel accuracy.
arXiv Detail & Related papers (2021-11-22T10:47:32Z)
- Shape and Viewpoint without Keypoints [63.26977130704171]
We present a learning framework that recovers the 3D shape, pose, and texture from a single image.
The model is trained on an image collection without any ground-truth 3D shape, multi-view, camera-viewpoint, or keypoint supervision.
We obtain state-of-the-art camera prediction results and show that we can learn to predict diverse shapes and textures across objects.
arXiv Detail & Related papers (2020-07-21T17:58:28Z)
- Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve objects' inner consistency by modeling the global context, or refine object details along their boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that strong semantic segmentation performance requires explicitly modeling the object body and edge, which correspond to the high- and low-frequency components of the image.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
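
Since edges carry the high-frequency content of an image while object interiors are comparatively smooth, the decoupling idea can be illustrated with a simple low-pass/residual split. The sketch below uses a Gaussian filter purely to visualize that intuition; it is not the paper's learned decoupling module, and the sigma value is an arbitrary choice.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def body_edge_split(image, sigma=3.0):
    """Split an image (or feature map) into a smooth 'body' part and a
    residual 'edge' part. Gaussian low-pass filtering only illustrates
    the low/high-frequency intuition behind decoupled body/edge
    supervision; the paper learns this decomposition instead.
    """
    body = gaussian_filter(image.astype(np.float32), sigma=sigma)  # low frequency
    edge = image.astype(np.float32) - body                         # high-frequency residual
    return body, edge

# Toy step image: the edge map concentrates around the boundary at column 32.
img = np.zeros((64, 64), dtype=np.float32)
img[:, 32:] = 1.0
body, edge = body_edge_split(img)
print(np.abs(edge[:, 30:34]).max() > np.abs(edge[:, :16]).max())  # True
```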
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
- Controllable Image Synthesis via SegVAE [89.04391680233493]
A semantic map is a commonly used intermediate representation for conditional image generation.
In this work, we specifically target at generating semantic maps given a label-set consisting of desired categories.
The proposed framework, SegVAE, synthesizes semantic maps in an iterative manner using a conditional variational autoencoder.
arXiv Detail & Related papers (2020-07-16T15:18:53Z)