3D-QAE: Fully Quantum Auto-Encoding of 3D Point Clouds
- URL: http://arxiv.org/abs/2311.05604v1
- Date: Thu, 9 Nov 2023 18:58:33 GMT
- Title: 3D-QAE: Fully Quantum Auto-Encoding of 3D Point Clouds
- Authors: Lakshika Rathi and Edith Tretschk and Christian Theobalt and Rishabh Dabral and Vladislav Golyanik
- Abstract summary: Existing methods for learning 3D representations are deep neural networks trained and tested on classical hardware.
This paper introduces the first quantum auto-encoder for 3D point clouds.
- Score: 71.39129855825402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing methods for learning 3D representations are deep neural networks
trained and tested on classical hardware. Quantum machine learning
architectures, despite their theoretically predicted advantages in terms of
speed and representational capacity, have so far not been considered for
this problem nor for tasks involving 3D data in general. This paper thus
introduces the first quantum auto-encoder for 3D point clouds. Our 3D-QAE
approach is fully quantum, i.e. all its data processing components are designed
for quantum hardware. It is trained on collections of 3D point clouds to
produce their compressed representations. Along with finding a suitable
architecture, the core challenges in designing such a fully quantum model
include 3D data normalisation and parameter optimisation, and we propose
solutions for both these tasks. Experiments on simulated gate-based quantum
hardware demonstrate that our method outperforms simple classical baselines,
paving the way for a new research direction in 3D computer vision. The source
code is available at https://4dqv.mpi-inf.mpg.de/QAE3D/.
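The abstract names 3D data normalisation as a core challenge but does not say how 3D-QAE solves it. One general constraint explains why it matters. If coordinates are loaded by amplitude encoding, an n-qubit state holds exactly 2^n amplitudes whose squared magnitudes sum to 1. The point cloud must therefore be padded and normalised before it can be loaded.

The sketch below is a minimal, hypothetical illustration of that constraint and is not the authors' method. The following choices are all assumptions made only for illustration:

- centring on the centroid,
- x, y, z interleaving,
- zero-padding,
- the helper names `amplitude_encode` and `amplitude_decode`.

```python
import math


def amplitude_encode(points):
    """Hypothetical sketch: map a 3D point cloud to a unit-norm amplitude vector.

    Assumptions (not from the paper): the cloud is centred on its centroid,
    flattened as x1, y1, z1, x2, ..., zero-padded to the next power of two,
    and L2-normalised so it could serve as the amplitudes of an n-qubit state.
    """
    if not points:
        raise ValueError("point cloud is empty")
    n = len(points)
    # Centre the cloud so the encoding does not depend on its absolute position.
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    flat = []
    for x, y, z in points:
        flat.extend((x - cx, y - cy, z - cz))
    # An n-qubit state has 2**n amplitudes, so pad with zeros up to that length.
    num_qubits = max(1, math.ceil(math.log2(len(flat))))
    flat.extend([0.0] * (2 ** num_qubits - len(flat)))
    norm = math.sqrt(sum(v * v for v in flat))
    if norm == 0.0:
        raise ValueError("all points coincide; cannot normalise")
    amplitudes = [v / norm for v in flat]
    # Return the norm and centroid so a decoded output can be mapped back.
    return amplitudes, num_qubits, norm, (cx, cy, cz)


def amplitude_decode(amplitudes, n_points, norm, centroid):
    """Invert amplitude_encode for the first n_points points (padding is ignored)."""
    cx, cy, cz = centroid
    out = []
    for i in range(n_points):
        x, y, z = amplitudes[3 * i : 3 * i + 3]
        out.append((x * norm + cx, y * norm + cy, z * norm + cz))
    return out


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    amps, qubits, norm, centroid = amplitude_encode(cloud)
    print(qubits, len(amps))  # 12 coordinates padded to 16 amplitudes on 4 qubits
    print(amplitude_decode(amps, len(cloud), norm, centroid))
```

In this sketch the scale (`norm`) and centroid are kept outside the quantum state. Normalisation discards them, so a real pipeline needs some way to restore the original size and position after decoding.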
Related papers
- CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer [42.68740105997167]
We introduce two frameworks for 3D object detection with minimal hand-crafted design.
Firstly, we propose CT3D, which sequentially performs raw-point-based embedding, a standard Transformer encoder, and a channel-wise decoder for point features within each proposal.
Secondly, we present an enhanced network called CT3D++, which incorporates geometric and semantic fusion-based embedding to extract more valuable and comprehensive proposal-aware information.
(2024-06-12T12:40:28Z)
- OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding [54.981605111365056]
This paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) capable of 3D point-level open vocabulary understanding.
Our primary motivation stems from observing that existing 3DGS-based open vocabulary methods mainly focus on 2D pixel-level parsing.
(2024-06-04T07:42:33Z)
- Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding [83.63231467746598]
We introduce Any2Point, a parameter-efficient method to empower any-modality large models (vision, language, audio) for 3D understanding.
We propose a 3D-to-any (1D or 2D) virtual projection strategy that correlates the input 3D points to the original 1D or 2D positions within the source modality.
(2024-04-11T17:59:45Z)
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation.
For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
(2023-10-12T17:59:57Z)
- Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition [25.364148451584356]
3D convolutional neural networks (CNNs) have been the prevailing option for video recognition.
We propose to automatically design efficient 3D CNN architectures via a novel training-free neural architecture search approach.
Experiments on Something-Something V1&V2 and Kinetics400 demonstrate that the E3D family achieves state-of-the-art performance.
(2023-03-05T15:11:53Z)
- SNAKE: Shape-aware Neural 3D Keypoint Field [62.91169625183118]
Detecting 3D keypoints from point clouds is important for shape reconstruction.
This work investigates the dual question: can shape reconstruction benefit 3D keypoint detection?
We propose a novel unsupervised paradigm named SNAKE, which is short for shape-aware neural 3D keypoint field.
(2022-06-03T17:58:43Z)
- PVNAS: 3D Neural Architecture Search with Point-Voxel Convolution [26.059213743430192]
We study 3D deep learning from the efficiency perspective.
We propose a novel hardware-efficient 3D primitive, Point-Voxel Convolution (PVConv).
(2022-04-25T17:13:55Z)
- Ground-aware Monocular 3D Object Detection for Autonomous Driving [6.5702792909006735]
Estimating the 3D position and orientation of objects in the environment with a single RGB camera is a challenging task for low-cost urban autonomous driving and mobile robots.
Most of the existing algorithms are based on the geometric constraints in 2D-3D correspondence, which stems from generic 6D object pose estimation.
We introduce a novel neural network module to fully utilize such application-specific priors in the framework of deep learning.
(2021-02-01T08:18:24Z)
- Making a Case for 3D Convolutions for Object Segmentation in Videos [16.167397418720483]
We show that 3D convolutional networks can be effectively applied to dense video prediction tasks such as salient object segmentation.
We propose a 3D decoder architecture that comprises novel 3D Global Convolution layers and 3D Refinement modules.
Our approach outperforms the existing state of the art by a large margin on the DAVIS'16 Unsupervised, FBMS and ViSal benchmarks.
(2020-08-26T12:24:23Z)
- Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion [53.885984328273686]
Implicit Feature Networks (IF-Nets) deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data.
IF-Nets clearly outperform prior work in 3D object reconstruction in ShapeNet, and obtain significantly more accurate 3D human reconstructions.
(2020-03-03T11:14:29Z)