SePaint: Semantic Map Inpainting via Multinomial Diffusion
- URL: http://arxiv.org/abs/2303.02737v1
- Date: Sun, 5 Mar 2023 18:04:28 GMT
- Title: SePaint: Semantic Map Inpainting via Multinomial Diffusion
- Authors: Zheng Chen, Deepak Duggirala, David Crandall, Lei Jiang, Lantao Liu
- Abstract summary: We propose SePaint, an inpainting model for semantic data based on generative multinomial diffusion.
We propose a novel and efficient condition strategy, Look-Back Condition (LB-Con), which performs one-step look-back operations.
We have conducted extensive experiments on different datasets, showing our proposed model outperforms commonly used methods in various robotic applications.
- Score: 12.217566404643033
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prediction beyond partial observations is crucial for robots to navigate in
unknown environments because it can provide extra information regarding the
surroundings beyond the current sensing range or resolution. In this work, we
consider the inpainting of semantic Bird's-Eye-View maps. We propose SePaint,
an inpainting model for semantic data based on generative multinomial
diffusion. To maintain semantic consistency, we need to condition the
prediction for the missing regions on the known regions. We propose a novel and
efficient condition strategy, Look-Back Condition (LB-Con), which performs
one-step look-back operations during the reverse diffusion process. By doing
so, we are able to strengthen the harmonization between unknown and known
parts, leading to better completion performance. We have conducted extensive
experiments on different datasets, showing our proposed model outperforms
commonly used interpolation methods in various robotic applications.
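
The abstract does not spell out the diffusion steps or the LB-Con procedure. As a rough illustration only, the sketch below assumes the standard multinomial diffusion forward step, which mixes one-hot class labels with a uniform distribution over K classes, plus a simple mask-based composition of known and generated cells. It is not the authors' LB-Con, and the names `forward_step`, `compose`, and `beta` are hypothetical.

```python
import numpy as np


def forward_step(x_onehot, beta, rng):
    """One assumed multinomial-diffusion forward step.

    x_onehot: one-hot semantic map of shape (H, W, K).
    beta:     noise level in [0, 1] for this step (hypothetical parameter).
    """
    k = x_onehot.shape[-1]
    # Mix the current one-hot labels with a uniform distribution over K classes.
    probs = (1.0 - beta) * x_onehot + beta / k
    # Sample one label per cell by inverting the cumulative distribution.
    cdf = np.cumsum(probs, axis=-1)
    u = rng.random(probs.shape[:-1] + (1,))
    labels = (u >= cdf).sum(axis=-1)
    # Guard against cdf[-1] falling slightly below 1 due to rounding.
    labels = np.minimum(labels, k - 1)
    return np.eye(k)[labels]


def compose(known_onehot, generated_onehot, known_mask):
    """Keep observed cells and fill the remaining cells with generated ones."""
    return np.where(known_mask[..., None], known_onehot, generated_onehot)
```

With `beta = 0` the map is returned unchanged, and with `beta = 1` every cell is resampled uniformly, which is why the known region has to be re-imposed (here through `compose`) for the completed map to stay consistent with the observations.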
Related papers
- Diff-Mosaic: Augmenting Realistic Representations in Infrared Small Target Detection via Diffusion Prior [63.64088590653005]
We propose Diff-Mosaic, a data augmentation method based on the diffusion model.
We introduce an enhancement network called Pixel-Prior, which generates highly coordinated and realistic Mosaic images.
In the second stage, we propose an image enhancement strategy named Diff-Prior. This strategy utilizes diffusion priors to model images in the real-world scene.
arXiv Detail & Related papers (2024-06-02T06:23:05Z)
- SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation [69.42764583465508]
We explore the potential of generative image diffusion to address the scarcity of annotated data in earth observation tasks.
To the best of our knowledge, we are the first to generate both images and corresponding masks for satellite segmentation.
arXiv Detail & Related papers (2024-03-25T10:30:22Z)
- Cross-attention Spatio-temporal Context Transformer for Semantic Segmentation of Historical Maps [18.016789471815855]
Historical maps provide useful spatio-temporal information on the Earth's surface before modern earth observation techniques came into being.
Aleatoric uncertainty, also known as data-dependent uncertainty, is inherent in the drawing/fading defects of the original map sheets.
We propose a U-Net-based network that fuses maps, aggregating information over a larger range as well as through a temporal sequence.
arXiv Detail & Related papers (2023-10-19T09:49:58Z)
- TempSAL -- Uncovering Temporal Information for Deep Saliency Prediction [64.63645677568384]
We introduce a novel saliency prediction model that learns to output saliency maps in sequential time intervals.
Our approach locally modulates the saliency predictions by combining the learned temporal maps.
Our code will be publicly available on GitHub.
arXiv Detail & Related papers (2023-01-05T22:10:16Z)
- PRISM: Probabilistic Real-Time Inference in Spatial World Models [52.878769723544615]
PRISM is a method for real-time filtering in a probabilistic generative model of agent motion and visual perception.
The proposed solution runs in real time at 10 Hz and is comparably accurate to state-of-the-art SLAM in small to medium-sized indoor environments.
arXiv Detail & Related papers (2022-12-06T13:59:06Z)
- CAMERAS: Enhanced Resolution And Sanity preserving Class Activation Mapping for image saliency [61.40511574314069]
Backpropagation image saliency aims at explaining model predictions by estimating model-centric importance of individual pixels in the input.
We propose CAMERAS, a technique to compute high-fidelity backpropagation saliency maps without requiring any external priors.
arXiv Detail & Related papers (2021-06-20T08:20:56Z)
- Latent World Models For Intrinsically Motivated Exploration [140.21871701134626]
We present a self-supervised representation learning method for image-based observations.
We consider episodic and life-long uncertainties to guide the exploration of partially observable environments.
arXiv Detail & Related papers (2020-10-05T19:47:04Z)
- Domain Siamese CNNs for Sparse Multispectral Disparity Estimation [15.065764374430783]
We propose a new CNN architecture that performs disparity estimation between images from different spectra.
Our method was tested using the publicly available LITIV 2014 and LITIV 2018 datasets.
arXiv Detail & Related papers (2020-04-30T20:29:59Z)
- Learning Discrete State Abstractions With Deep Variational Inference [7.273663549650618]
We propose a method for learning approximate bisimulations, a type of state abstraction.
We use a deep neural encoder to map states onto continuous embeddings.
We map these embeddings onto a discrete representation using an action-conditioned hidden Markov model.
arXiv Detail & Related papers (2020-03-09T17:58:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.