Towards Point Cloud Compression for Machine Perception: A Simple and Strong Baseline by Learning the Octree Depth Level Predictor
- URL: http://arxiv.org/abs/2406.00791v1
- Date: Sun, 2 Jun 2024 16:13:57 GMT
- Title: Towards Point Cloud Compression for Machine Perception: A Simple and Strong Baseline by Learning the Octree Depth Level Predictor
- Authors: Lei Liu, Zhihao Hu, Zhenghao Chen, et al.
- Abstract summary: We propose a point cloud compression framework that simultaneously handles both human and machine vision tasks.
Our framework learns a scalable bit-stream, using only subsets for different machine vision tasks to save bit-rate.
A new octree depth-level predictor adaptively determines the optimal depth level for each octree constructed from a point cloud.
- Score: 12.510990055381452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point cloud compression has garnered significant interest in computer vision. However, existing algorithms primarily cater to human vision, while most point cloud data is utilized for machine vision tasks. To address this, we propose a point cloud compression framework that simultaneously handles both human and machine vision tasks. Our framework learns a scalable bit-stream, using only subsets for different machine vision tasks to save bit-rate, while employing the entire bit-stream for human vision tasks. Building on mainstream octree-based frameworks like VoxelContext-Net, OctAttention, and G-PCC, we introduce a new octree depth-level predictor. This predictor adaptively determines the optimal depth level for each octree constructed from a point cloud, controlling the bit-rate for machine vision tasks. For simpler tasks (e.g., classification) or objects/scenarios, we use fewer depth levels with fewer bits, saving bit-rate. Conversely, for more complex tasks (e.g., segmentation) or objects/scenarios, we use deeper depth levels with more bits to enhance performance. Experimental results on various datasets (e.g., ModelNet10, ModelNet40, ShapeNet, ScanNet, and KITTI) show that our point cloud compression approach improves performance for machine vision tasks without compromising human vision quality.
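The depth-selection idea admits a compact sketch. Below is a minimal PyTorch illustration, not the authors' implementation: the class name, the PointNet-style encoder, and the candidate depth range are assumptions, and the abstract does not specify how the predictor is trained, so the sketch stops at inference-time selection.

```python
import torch
import torch.nn as nn

class DepthLevelPredictor(nn.Module):
    """Hypothetical sketch: score candidate octree depth levels for a point cloud.

    A shallow PointNet-style encoder pools per-point features into one global
    vector; a linear head scores each candidate depth level. Shallower levels
    mean fewer octree nodes and fewer bits; deeper levels keep more geometry
    for harder tasks such as segmentation.
    """
    def __init__(self, num_levels: int = 8, feat_dim: int = 128):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.head = nn.Linear(feat_dim, num_levels)

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) raw xyz coordinates
        feats = self.point_mlp(points)          # (B, N, feat_dim)
        global_feat = feats.max(dim=1).values   # order-invariant pooling
        return self.head(global_feat)           # (B, num_levels) logits

predictor = DepthLevelPredictor(num_levels=8)
cloud = torch.randn(2, 1024, 3)                 # two toy point clouds
depth = predictor(cloud).argmax(dim=-1) + 1     # chosen depth in [1, 8]
```

Training such a selector end-to-end would require a differentiable relaxation (e.g., Gumbel-softmax) or a rate-accuracy reward; the abstract leaves this choice open.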
Related papers
- FeatSharp: Your Vision Model Features, Sharper [64.25786703202414]
We introduce a novel method to coherently and cheaply upsample the feature maps of low-res vision encoders.
We demonstrate the effectiveness of this approach on core perception tasks as well as within agglomerative model (RADIO) training.
arXiv Detail & Related papers (2025-02-22T00:54:49Z)
- CLIP-based Point Cloud Classification via Point Cloud to Image Translation [19.836264118079573]
The Contrastive Vision-Language Pre-training (CLIP) based point cloud classification model, PointCLIP, has opened a new direction in the point cloud classification research domain.
We propose a Pretrained Point Cloud to Image Translation Network (PPCITNet) that produces generalized colored images, along with additional salient visual cues, from point cloud depth maps.
arXiv Detail & Related papers (2024-08-07T04:50:05Z)
- Point Cloud Compression with Implicit Neural Representations: A Unified Framework [54.119415852585306]
We present a pioneering point cloud compression framework capable of handling both geometry and attribute components.
Our framework utilizes two coordinate-based neural networks to implicitly represent a voxelized point cloud.
Our method exhibits high universality compared with existing learning-based techniques.
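As a rough illustration of what a coordinate-based implicit representation looks like, here is a toy occupancy-field sketch; the network shape, the sigmoid occupancy head, and all names are assumptions rather than the paper's architecture. A second, analogous network would map coordinates to attributes such as color.

```python
import torch
import torch.nn as nn

class OccupancyField(nn.Module):
    """Toy coordinate-based network: maps a 3D coordinate to the probability
    that the corresponding voxel is occupied. Compression then amounts to
    storing (quantized) network weights instead of the points themselves."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(xyz))  # (N, 1) occupancy probability

field = OccupancyField()
coords = torch.rand(4096, 3) * 2 - 1  # query coordinates in [-1, 1]^3
occupancy = field(coords)
```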
arXiv Detail & Related papers (2024-05-19T09:19:40Z)
- Scalable Human-Machine Point Cloud Compression [29.044369073873465]
In this paper, we present a scalable codec for point-cloud data that is specialized for the machine task of classification, while also providing a mechanism for human viewing.
In the proposed scalable codec, the "base" bitstream supports the machine task, and an "enhancement" bitstream may be used for better input reconstruction for human viewing.
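The layered idea is easy to show with a toy quantize-and-residual scheme; this is purely conceptual, with invented names, and not the paper's codec. The base layer carries a coarse voxelization sufficient for a classifier, and the enhancement layer carries residuals that restore precision for viewing.

```python
import numpy as np

def encode_scalable(points: np.ndarray, grid: int = 32):
    """Toy layered encoder: base = coarse voxel indices, enhancement =
    per-point residuals inside each voxel."""
    offset = points.min()
    scale = grid / (points.max() - offset + 1e-9)
    coarse = np.floor((points - offset) * scale)     # base layer
    residual = (points - offset) * scale - coarse    # enhancement layer
    return coarse.astype(np.int32), residual.astype(np.float16), offset, scale

def decode(coarse, residual=None, offset=0.0, scale=1.0):
    q = coarse.astype(np.float64)
    if residual is not None:      # human path: base + enhancement
        q = q + residual
    return q / scale + offset     # machine path when residual is None

pts = np.random.rand(1024, 3)
base, enh, off, s = encode_scalable(pts)
machine_view = decode(base, None, off, s)  # coarse geometry, fewer bits
human_view = decode(base, enh, off, s)     # near-lossless reconstruction
```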
arXiv Detail & Related papers (2024-02-19T20:43:10Z)
- ViPFormer: Efficient Vision-and-Pointcloud Transformer for Unsupervised Pointcloud Understanding [3.7966094046587786]
We propose a lightweight Vision-and-Pointcloud Transformer (ViPFormer) to unify image and point cloud processing in a single architecture.
ViPFormer learns in an unsupervised manner by optimizing intra-modal and cross-modal contrastive objectives.
Experiments on different datasets show ViPFormer surpasses previous state-of-the-art unsupervised methods with higher accuracy, lower model complexity and runtime latency.
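The intra-modal and cross-modal objectives are presumably InfoNCE-style losses; the sketch below shows that standard formulation with hypothetical embedding names, not ViPFormer's actual code.

```python
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Standard InfoNCE: matched rows of `a` and `b` are positive pairs,
    every other pairing in the batch is a negative."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / tau                 # (B, B) similarity matrix
    targets = torch.arange(a.size(0))        # positives on the diagonal
    return F.cross_entropy(logits, targets)

# Hypothetical embeddings from a shared backbone:
img_emb = torch.randn(16, 256)   # image branch
pc_emb  = torch.randn(16, 256)   # point-cloud branch
pc_emb2 = torch.randn(16, 256)   # second augmented view of the same clouds

loss = info_nce(img_emb, pc_emb) + info_nce(pc_emb, pc_emb2)
#      cross-modal alignment       intra-modal agreement
```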
arXiv Detail & Related papers (2023-03-25T06:47:12Z)
- Ponder: Point Cloud Pre-training via Neural Rendering [93.34522605321514]
We propose a novel approach to self-supervised learning of point cloud representations via differentiable neural rendering.
The learned point-cloud encoder can be easily integrated into various downstream tasks, including not only high-level tasks like 3D detection and segmentation, but also low-level tasks like 3D reconstruction and image rendering.
arXiv Detail & Related papers (2022-12-31T08:58:39Z)
- A Deeper Look into DeepCap [96.67706102518238]
We propose a novel deep learning approach for monocular dense human performance capture.
Our method is trained in a weakly supervised manner based on multi-view supervision.
Our approach outperforms the state of the art in terms of quality and robustness.
arXiv Detail & Related papers (2021-11-20T11:34:33Z)
- Video Coding for Machine: Compact Visual Representation Compression for Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging the largely separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academia and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z)
- SE-PSNet: Silhouette-based Enhancement Feature for Panoptic Segmentation Network [5.353718408751182]
We propose a solution to the panoptic segmentation task.
The architecture combines the bottom-up method and the top-down method.
The network pays particular attention to the quality of the predicted masks.
arXiv Detail & Related papers (2021-07-11T17:20:32Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Auxiliary Networks (SANs), a new module enabling monodepth networks to perform both the tasks of depth prediction and completion.
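A toy sketch of the shared-backbone pattern follows; the real SAN module targets sparse inputs and efficiency, so this dense version with invented layer sizes only illustrates how one network can serve both prediction (RGB only) and completion (RGB plus sparse depth).

```python
import torch
import torch.nn as nn

class MonoDepthWithAux(nn.Module):
    """Toy model: one backbone predicts dense depth from RGB; an auxiliary
    branch injects features from an optional sparse depth map."""
    def __init__(self):
        super().__init__()
        self.rgb_enc = nn.Conv2d(3, 32, 3, padding=1)
        self.aux_enc = nn.Conv2d(1, 32, 3, padding=1)  # sparse-depth branch
        self.decoder = nn.Conv2d(32, 1, 3, padding=1)

    def forward(self, rgb, sparse_depth=None):
        feat = torch.relu(self.rgb_enc(rgb))
        if sparse_depth is not None:                   # completion mode
            feat = feat + torch.relu(self.aux_enc(sparse_depth))
        return self.decoder(feat)                      # dense depth map

net = MonoDepthWithAux()
rgb = torch.randn(1, 3, 64, 64)
prediction = net(rgb)                                  # depth prediction
completion = net(rgb, torch.randn(1, 1, 64, 64))       # depth completion
```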
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- Compositional Prototype Network with Multi-view Comparision for Few-Shot Point Cloud Semantic Segmentation [47.0611707526858]
A fully supervised point cloud segmentation network often requires a large amount of data with point-wise annotations.
We present the Compositional Prototype Network that can undertake point cloud segmentation with only a few labeled training data.
Inspired by the few-shot learning literature in images, our network directly transfers label information from the limited training data to unlabeled test data for prediction.
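Transferring labels from a few annotated points to unlabeled ones is commonly done by prototype matching; the sketch below shows that standard formulation, which may differ from the paper's compositional, multi-view variant.

```python
import torch

def prototype_segment(support_feats, support_labels, query_feats, num_classes):
    """Label each query point by its nearest class prototype.

    support_feats: (S, D) features of labeled points, labels in (S,)
    query_feats:   (Q, D) features of unlabeled points
    Assumes every class appears at least once in the support set.
    """
    protos = torch.stack([
        support_feats[support_labels == c].mean(dim=0)   # class prototype
        for c in range(num_classes)
    ])                                                   # (C, D)
    dists = torch.cdist(query_feats, protos)             # (Q, C) distances
    return dists.argmin(dim=-1)                          # label per point

support = torch.randn(200, 64)
labels = torch.randint(0, 3, (200,))
query = torch.randn(500, 64)
pred = prototype_segment(support, labels, query, num_classes=3)
```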
arXiv Detail & Related papers (2020-12-28T15:01:34Z)