Related papers: Filling Missing Values Matters for Range Image-Based Point Cloud Segmentation

Filling Missing Values Matters for Range Image-Based Point Cloud Segmentation

URL: http://arxiv.org/abs/2405.10175v2
Date: Sat, 25 May 2024 13:40:50 GMT
Title: Filling Missing Values Matters for Range Image-Based Point Cloud Segmentation
Authors: Bike Chen, Chen Gong, Juha Röning,
Abstract summary: Point cloud segmentation (PCS) plays an essential role in robot perception and navigation tasks. To efficiently understand large-scale outdoor point clouds, their range image representation is commonly adopted. However, undesirable missing values in the range images damage the shapes and patterns of objects. This problem creates difficulty for the models in learning coherent and complete geometric information from the objects.
Score: 12.62718910894575
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Point cloud segmentation (PCS) plays an essential role in robot perception and navigation tasks. To efficiently understand large-scale outdoor point clouds, their range image representation is commonly adopted. This image-like representation is compact and structured, making range image-based PCS models practical. However, undesirable missing values in the range images damage the shapes and patterns of objects. This problem creates difficulty for the models in learning coherent and complete geometric information from the objects. Consequently, the PCS models only achieve inferior performance. Delving deeply into this issue, we find that the use of unreasonable projection approaches and deskewing scans mainly leads to unwanted missing values in the range images. Besides, almost all previous works fail to consider filling in the unexpected missing values in the PCS task. To alleviate this problem, we first propose a new projection method, namely scan unfolding++ (SU++), to avoid massive missing values in the generated range images. Then, we introduce a simple yet effective approach, namely range-dependent $K$-nearest neighbor interpolation ($K$NNI), to further fill in missing values. Finally, we introduce the Filling Missing Values Network (FMVNet) and Fast FMVNet. Extensive experimental results on SemanticKITTI, SemanticPOSS, and nuScenes datasets demonstrate that by employing the proposed SU++ and $K$NNI, existing range image-based PCS models consistently achieve better performance than the baseline models. Besides, both FMVNet and Fast FMVNet achieve state-of-the-art performance in terms of the speed-accuracy trade-off. The proposed methods can be applied to other range image-based tasks and practical applications.

Related papers

CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image [86.75098349480014]
This paper tackles category-level pose estimation of articulated objects in robotic manipulation tasks. We propose a single-stage Network, CAP-Net, for estimating the 6D poses and sizes of Categorical Articulated Parts. We introduce the RGBD-Art dataset, the largest RGB-D articulated dataset to date, featuring RGB images and depth noise simulated from real sensors.
arXiv Detail & Related papers (2025-04-15T14:30:26Z)
Scale-Invariant Object Detection by Adaptive Convolution with Unified Global-Local Context [3.061662434597098]
We propose an object detection model using a Switchable (adaptive) Atrous Convolutional Network (SAC-Net) based on the efficientDet model. The proposed SAC-Net encapsulates the benefits of both low-level and high-level features to achieve improved performance on multi-scale object detection tasks. Our experiments on benchmark datasets demonstrate that the proposed SAC-Net outperforms the state-of-the-art models by a significant margin in terms of accuracy.
arXiv Detail & Related papers (2024-09-17T10:08:37Z)
Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration. We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z)
Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components. CNNs are used to augment the local texture information of coarse priors. DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
Efficient Context Integration through Factorized Pyramidal Learning for Ultra-Lightweight Semantic Segmentation [1.0499611180329804]
We propose a novel Factorized Pyramidal Learning (FPL) module to aggregate rich contextual information in an efficient manner. We decompose the spatial pyramid into two stages which enables a simple and efficient feature fusion within the module to solve the notorious checkerboard effect. Based on the FPL module and FIR unit, we propose an ultra-lightweight real-time network, called FPLNet, which achieves state-of-the-art accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-02-23T05:34:51Z)
Glance and Focus Networks for Dynamic Visual Recognition [36.26856080976052]
We formulate the image recognition problem as a sequential coarse-to-fine feature learning process, mimicking the human visual system. The proposed Glance and Focus Network (GFNet) first extracts a quick global representation of the input image at a low resolution scale, and then strategically attends to a series of salient (small) regions to learn finer features. It reduces the average latency of the highly efficient MobileNet-V3 on an iPhone XS Max by 1.3x without sacrificing accuracy.
arXiv Detail & Related papers (2022-01-09T14:00:56Z)
SAPNet: Segmentation-Aware Progressive Network for Perceptual Contrastive Deraining [2.615176171489612]
We present a segmentation-aware progressive network (SAPNet) based upon contrastive learning for single image deraining. Our model surpasses top-performing methods and aids object detection and semantic segmentation with considerable efficacy.
arXiv Detail & Related papers (2021-11-17T03:57:11Z)
Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets. This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets. We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
Contemplating real-world object classification [53.10151901863263]
We reanalyze the ObjectNet dataset recently proposed by Barbu et al. containing objects in daily life situations. We find that applying deep models to the isolated objects, rather than the entire scene as is done in the original paper, results in around 20-30% performance improvement.
arXiv Detail & Related papers (2021-03-08T23:29:59Z)
Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation. We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths. In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
Local Grid Rendering Networks for 3D Object Detection in Point Clouds [98.02655863113154]
CNNs are powerful but it would be computationally costly to directly apply convolutions on point data after voxelizing the entire point clouds to a dense regular 3D grid. We propose a novel and principled Local Grid Rendering (LGR) operation to render the small neighborhood of a subset of input points into a low-resolution 3D grid independently. We validate LGR-Net for 3D object detection on the challenging ScanNet and SUN RGB-D datasets.
arXiv Detail & Related papers (2020-07-04T13:57:43Z)
A deep primal-dual proximal network for image restoration [8.797434238081372]
We design a deep network, named DeepPDNet, built from primal-dual iterations associated with the minimization of a standard penalized likelihood with an analysis prior. Two different learning strategies: "Full learning" and "Partial learning" are proposed, the first one is the most efficient numerically. Extensive results show that the proposed DeepPDNet demonstrates excellent performance on the MNIST and the more complex BSD68, BSD100, and SET14 datasets for image restoration and single image super-resolution task.
arXiv Detail & Related papers (2020-07-02T08:29:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.