Auto-MVCNN: Neural Architecture Search for Multi-view 3D Shape
Recognition
- URL: http://arxiv.org/abs/2012.05493v1
- Date: Thu, 10 Dec 2020 07:40:28 GMT
- Title: Auto-MVCNN: Neural Architecture Search for Multi-view 3D Shape
Recognition
- Authors: Zhaoqun Li, Hongren Wang, Jinxing Li
- Abstract summary: In 3D shape recognition, multi-view based methods leverage the human perspective to analyze 3D shapes and have achieved significant results.
We propose a neural architecture search method named Auto-MVCNN, specifically designed to optimize architectures for multi-view 3D shape recognition.
- Score: 16.13826056628379
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In 3D shape recognition, multi-view based methods leverage the human
perspective to analyze 3D shapes and have achieved significant results. Most
existing deep learning work adopts handcrafted networks as backbones because of
their strong feature extraction capacity and the benefit of ImageNet
pretraining. However, whether these network architectures are actually suitable
for 3D analysis remains unclear. In this paper, we propose a neural
architecture search method named Auto-MVCNN, specifically designed to optimize
architectures for multi-view 3D shape recognition. Auto-MVCNN extends
gradient-based NAS frameworks to multi-view images by automatically searching a
fusion cell that captures the intrinsic correlations among view features.
Moreover, we develop an end-to-end scheme that enhances retrieval performance
through trade-off parameter search. Extensive experimental results show that
the searched architectures significantly outperform manually designed
counterparts in various respects, while our method achieves state-of-the-art
performance.
Related papers
- Multi-Objective Neural Architecture Search for In-Memory Computing [0.5892638927736115]
We employ neural architecture search (NAS) to enhance the efficiency of deploying diverse machine learning (ML) tasks on in-memory computing architectures.
Our evaluation of this NAS approach for IMC architecture deployment spans three distinct image classification datasets.
arXiv Detail & Related papers (2024-06-10T19:17:09Z)
- Deep Models for Multi-View 3D Object Recognition: A Review [16.500711021549947]
Multi-view 3D representations for object recognition have thus far demonstrated the most promising results for achieving state-of-the-art performance.
This review paper comprehensively covers recent progress in multi-view 3D object recognition methods for 3D classification and retrieval tasks.
arXiv Detail & Related papers (2024-04-23T16:54:31Z)
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality within 2D-3D networks-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- HKNAS: Classification of Hyperspectral Imagery Based on Hyper Kernel Neural Architecture Search [104.45426861115972]
We propose to directly generate structural parameters by utilizing the specifically designed hyper kernels.
We obtain three kinds of networks to separately conduct pixel-level or image-level classifications with 1-D or 3-D convolutions.
A series of experiments on six public datasets demonstrate that the proposed methods achieve state-of-the-art results.
arXiv Detail & Related papers (2023-04-23T17:27:40Z)
- MVTN: Learning Multi-View Transformations for 3D Understanding [60.15214023270087]
We introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal view-points for 3D shape recognition.
MVTN can be trained end-to-end with any multi-view network for 3D shape recognition.
Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks.
arXiv Detail & Related papers (2022-12-27T12:09:16Z)
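MVTN's central idea is that the camera viewpoints themselves are learnable: a small regressor predicts view angles from a coarse shape encoding, a differentiable renderer produces the corresponding images, and the loss of the downstream multi-view classifier back-propagates all the way into the viewpoint regressor. The sketch below shows only the regressor; the dimensions and architecture are illustrative assumptions, and the rendering/classification stages are summarized in comments since a differentiable renderer is outside the scope of this snippet.

```python
import torch
import torch.nn as nn

class ViewpointRegressor(nn.Module):
    """Predicts azimuth/elevation offsets for each of V cameras from a coarse
    global feature of the shape, so view selection can be trained end-to-end
    with the downstream multi-view classifier (the MVTN idea; details here
    are illustrative assumptions, not the authors' exact architecture)."""
    def __init__(self, feat_dim=256, num_views=12):
        super().__init__()
        self.num_views = num_views
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_views * 2), nn.Tanh(),   # bounded offsets in [-1, 1]
        )

    def forward(self, global_feat):                     # (B, feat_dim)
        offsets = self.mlp(global_feat)                 # (B, V*2)
        azim, elev = offsets.view(-1, self.num_views, 2).unbind(-1)
        return azim * 180.0, elev * 90.0                # angles in degrees

# In the full pipeline (not shown), a differentiable renderer turns these
# camera angles into V images per shape, a multi-view CNN classifies them,
# and the classification loss back-propagates through the renderer into the
# regressor, so the viewpoints themselves are learned.
regressor = ViewpointRegressor()
azim, elev = regressor(torch.randn(4, 256))             # 4 shapes -> 12 views each
```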
- Improving 3D Object Detection with Channel-wise Transformer [58.668922561622466]
We propose a two-stage 3D object detection framework (CT3D) with minimal hand-crafted design.
CT3D simultaneously performs proposal-aware embedding and channel-wise context aggregation.
It achieves 81.77% AP in the moderate car category on the KITTI test 3D detection benchmark.
arXiv Detail & Related papers (2021-08-23T02:03:40Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
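The data-association step can be pictured as neural message passing on a graph whose nodes are tracks and detections and whose edges are candidate matches: edge features are refined from their endpoint nodes and then scored. The sketch below is a generic single-round illustration with assumed feature sizes, sum aggregation, and an edge classifier; it is not the paper's exact network.

```python
import torch
import torch.nn as nn

class EdgeMessagePassing(nn.Module):
    """One round of message passing on a track-detection association graph.
    Edge features are updated from their endpoint nodes, then nodes aggregate
    incoming edge messages; an edge classifier scores each candidate match."""
    def __init__(self, node_dim=64, edge_dim=32):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * node_dim + edge_dim, edge_dim), nn.ReLU())
        self.node_mlp = nn.Sequential(nn.Linear(node_dim + edge_dim, node_dim), nn.ReLU())
        self.edge_cls = nn.Linear(edge_dim, 1)           # match / no-match logit

    def forward(self, nodes, edges, edge_index):
        # nodes: (N, node_dim), edges: (E, edge_dim), edge_index: (2, E) src/dst ids
        src, dst = edge_index
        edges = self.edge_mlp(torch.cat([nodes[src], nodes[dst], edges], dim=-1))
        # aggregate incoming edge messages at destination nodes (sum aggregation)
        agg = torch.zeros(nodes.size(0), edges.size(-1), device=nodes.device)
        agg.index_add_(0, dst, edges)
        nodes = self.node_mlp(torch.cat([nodes, agg], dim=-1))
        return nodes, edges, self.edge_cls(edges).squeeze(-1)

# toy example: 3 track nodes (0-2) and 2 detection nodes (3-4), fully connected across sets
nodes = torch.randn(5, 64)
edge_index = torch.tensor([[0, 0, 1, 1, 2, 2],
                           [3, 4, 3, 4, 3, 4]])
edges = torch.randn(edge_index.size(1), 32)
mp = EdgeMessagePassing()
nodes, edges, match_logits = mp(nodes, edges, edge_index)   # one logit per candidate match
```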
- Graph Stacked Hourglass Networks for 3D Human Pose Estimation [1.0660480034605242]
We propose a novel graph convolutional network architecture, Graph Stacked Hourglass Networks, for 2D-to-3D human pose estimation tasks.
The proposed architecture consists of repeated encoder-decoder stages, in which graph-structured features are processed across three different scales of human skeletal representations.
arXiv Detail & Related papers (2021-03-30T14:25:43Z)
- A Multisensory Learning Architecture for Rotation-invariant Object Recognition [0.0]
This study presents a multisensory machine learning architecture for object recognition by employing a novel dataset that was constructed with the iCub robot.
The proposed architecture combines convolutional neural networks to form representations (i.e., features) for grayscale color images and a multi-layer perceptron algorithm to process depth data.
arXiv Detail & Related papers (2020-09-14T09:39:48Z)
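The described combination, a CNN branch for images and an MLP branch for depth, can be sketched as a small two-stream model with concatenation-based fusion; the layer sizes, input shapes, and fusion choice are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class MultisensoryNet(nn.Module):
    """A CNN branch embeds the image, an MLP branch embeds the depth vector,
    and the concatenated features feed a classification head."""
    def __init__(self, num_classes=10, depth_dim=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),        # -> (B, 32)
        )
        self.mlp = nn.Sequential(nn.Linear(depth_dim, 32), nn.ReLU())
        self.head = nn.Linear(32 + 32, num_classes)

    def forward(self, image, depth):
        return self.head(torch.cat([self.cnn(image), self.mlp(depth)], dim=-1))

model = MultisensoryNet()
logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 64))   # -> (4, 10)
```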
- Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition [89.0152015268929]
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family, and 2) optimized backbones for multi-modal-rate branches and lateral connections.
The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
arXiv Detail & Related papers (2020-08-21T10:45:09Z)
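A 3D Central Difference Convolution augments vanilla 3D convolution with a central-difference term, so the layer responds to local gradients as well as raw intensities. The sketch below follows the commonly used CDC formulation, y = conv3d(x) − θ · (kernel-sum applied to the center voxel); the θ value and this particular variant are assumptions for illustration, not necessarily the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CDC3D(nn.Module):
    """3D convolution plus a central-difference term:
    y = conv3d(x) - theta * conv3d_center(x), where conv3d_center applies the
    spatio-temporal sum of each kernel to the center voxel only."""
    def __init__(self, in_ch, out_ch, kernel_size=3, theta=0.7):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)
        if self.theta > 0:
            # 1x1x1 kernel holding the sum of each filter's weights
            k_sum = self.conv.weight.sum(dim=(2, 3, 4), keepdim=True)
            out = out - self.theta * F.conv3d(x, k_sum)
        return out

# usage: a clip of 8 frames at 32x32 resolution with 3 channels
layer = CDC3D(3, 16)
y = layer(torch.randn(2, 3, 8, 32, 32))    # -> (2, 16, 8, 32, 32)
```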
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.