CAKES: Channel-wise Automatic KErnel Shrinking for Efficient 3D Networks
- URL: http://arxiv.org/abs/2003.12798v3
- Date: Wed, 16 Dec 2020 19:03:30 GMT
- Title: CAKES: Channel-wise Automatic KErnel Shrinking for Efficient 3D Networks
- Authors: Qihang Yu, Yingwei Li, Jieru Mei, Yuyin Zhou, Alan L. Yuille
- Abstract summary: 3D Convolutional Neural Networks (CNNs) have been widely applied to 3D scene understanding, such as video analysis and volumetric image recognition.
We propose Channel-wise Automatic KErnel Shrinking (CAKES) to enable efficient 3D learning by shrinking standard 3D convolutions into a set of economical operations.
- Score: 87.02416370081123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D Convolutional Neural Networks (CNNs) have been widely applied to 3D scene
understanding, such as video analysis and volumetric image recognition.
However, 3D networks easily become over-parameterized, which incurs
expensive computation costs. In this paper, we propose Channel-wise Automatic
KErnel Shrinking (CAKES) to enable efficient 3D learning by shrinking standard
3D convolutions into a set of economical operations, e.g., 1D and 2D convolutions.
Unlike previous methods, CAKES performs channel-wise kernel shrinkage, which
enjoys the following benefits: 1) the operations deployed in each layer can be
heterogeneous, so that they extract diverse and complementary information
that benefits the learning process; and 2) the replacement design is efficient
and flexible, generalizing to both spatio-temporal and volumetric data.
Further, we propose a new search space based on CAKES, so that the replacement
configuration can be determined automatically to simplify 3D networks. CAKES
outperforms other methods of similar model size, and it also achieves
performance comparable to the state of the art with far fewer parameters and
lower computational cost on tasks including 3D medical image segmentation and
video action recognition. Code and models are available at
https://github.com/yucornetto/CAKES
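
The core idea can be illustrated with a short, hypothetical PyTorch sketch (not the authors' implementation; see the repository above for the real code): the output channels of one standard 3D convolution are split into groups, and each group is produced by a cheaper operation such as a 1D or 2D convolution, while the rest keep full 3D kernels. The fixed even split below is only for illustration; in CAKES the per-channel assignment is determined automatically by the proposed search.

```python
import torch
import torch.nn as nn

class CAKESConv(nn.Module):
    """Hypothetical sketch of channel-wise kernel shrinking: the output
    channels of one 3D conv are split across cheaper heterogeneous
    operations (1D, 2D, and full 3D convolutions) run in parallel."""

    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        p = k // 2
        c1, c2 = c_out // 3, c_out // 3
        c3 = c_out - c1 - c2  # remaining channels keep full 3D kernels
        # 1D conv along the depth/temporal axis: kernel (k, 1, 1)
        self.conv1d = nn.Conv3d(c_in, c1, (k, 1, 1), padding=(p, 0, 0))
        # 2D conv in the spatial plane: kernel (1, k, k)
        self.conv2d = nn.Conv3d(c_in, c2, (1, k, k), padding=(0, p, p))
        # standard 3D conv: kernel (k, k, k)
        self.conv3d = nn.Conv3d(c_in, c3, k, padding=p)

    def forward(self, x):  # x: (N, C_in, D, H, W)
        # heterogeneous per-channel operations, concatenated channel-wise
        return torch.cat(
            [self.conv1d(x), self.conv2d(x), self.conv3d(x)], dim=1
        )

# Drop-in stand-in for nn.Conv3d(16, 32, 3, padding=1)
out = CAKESConv(16, 32)(torch.randn(2, 16, 8, 32, 32))  # (2, 32, 8, 32, 32)
```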
Related papers
- Large Generative Model Assisted 3D Semantic Communication [51.17527319441436]
We propose a Generative AI Model assisted 3D Semantic Communication (GAM-3DSC) system.
First, we introduce a 3D Semantic Extractor (3DSE) to extract key semantics from a 3D scenario based on user requirements.
We then present an Adaptive Semantic Compression Model (ASCM) for encoding these multi-perspective images.
Finally, we design a conditional Generative adversarial network and Diffusion model aided Channel Estimation (GDCE) scheme to estimate and refine the Channel State Information (CSI) of physical channels.
arXiv Detail & Related papers (2024-03-09T03:33:07Z)
- Segment Any 3D Gaussians [85.93694310363325]
This paper presents SAGA, a highly efficient 3D promptable segmentation method based on 3D Gaussian Splatting (3D-GS).
Given 2D visual prompts as input, SAGA can segment the corresponding 3D target represented by 3D Gaussians within 4 ms.
We show that SAGA achieves real-time multi-granularity segmentation with quality comparable to state-of-the-art methods.
arXiv Detail & Related papers (2023-12-01T17:15:24Z)
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation.
PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks for the first time, demonstrating its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- Spatiotemporal Modeling Encounters 3D Medical Image Analysis: Slice-Shift UNet with Multi-View Fusion [0.0]
We propose a new 2D-based model, dubbed Slice SHift UNet, which encodes three-dimensional features at the complexity of a 2D CNN.
More precisely, multi-view features are collaboratively learned by performing 2D convolutions along the three planes of a volume.
The effectiveness of our approach is validated on the Multi-Modality Abdominal Multi-Organ Segmentation (AMOS) and Multi-Atlas Labeling Beyond the Cranial Vault (BTCV) datasets.
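The multi-view mechanism can be sketched in a few lines of PyTorch (an illustration of 2D convolutions along the three planes of a volume; the averaging fusion and the module name below are assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

class MultiViewConv(nn.Module):
    """Sketch of multi-view learning: 2D kernels oriented along the
    axial, coronal, and sagittal planes of a (D, H, W) volume, fused
    by averaging (the fusion rule here is an assumption)."""

    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        p = k // 2
        # each branch is a 2D kernel embedded in a 3D conv, oriented
        # along one plane of the volume
        self.axial = nn.Conv3d(c_in, c_out, (1, k, k), padding=(0, p, p))
        self.coronal = nn.Conv3d(c_in, c_out, (k, 1, k), padding=(p, 0, p))
        self.sagittal = nn.Conv3d(c_in, c_out, (k, k, 1), padding=(p, p, 0))

    def forward(self, x):  # x: (N, C, D, H, W)
        return (self.axial(x) + self.coronal(x) + self.sagittal(x)) / 3.0

vol = torch.randn(1, 1, 64, 96, 96)  # e.g. a CT sub-volume
feat = MultiViewConv(1, 8)(vol)      # -> (1, 8, 64, 96, 96)
```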
arXiv Detail & Related papers (2023-07-24T14:53:23Z)
- SpATr: MoCap 3D Human Action Recognition based on Spiral Auto-encoder and Transformer Network [1.4732811715354455]
We introduce a novel approach for 3D human action recognition, denoted as SpATr (Spiral Auto-encoder and Transformer Network).
A lightweight auto-encoder, based on spiral convolutions, is employed to extract spatial geometrical features from each 3D mesh.
The proposed method is evaluated on three prominent 3D human action datasets: Babel, MoVi, and BMLrub.
arXiv Detail & Related papers (2023-06-30T11:49:00Z)
- 2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition [84.697097472401]
We introduce Ada3D, a conditional computation framework that learns instance-specific 3D usage policies to determine frames and convolution layers to be used in a 3D network.
We demonstrate that our method achieves similar accuracies to state-of-the-art 3D models while requiring 20%-50% less computation across different datasets.
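A toy version of such an instance-conditional policy is sketched below (a generic conditional-computation illustration; Ada3D's actual policy network, frame selection, and differentiable training, e.g. with Gumbel-softmax, are not reproduced here):

```python
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    """Toy instance-conditional gate: a cheap policy head decides per
    clip whether the expensive 3D convolution runs or is skipped
    (identity). The hard threshold is for inference; training such a
    gate end-to-end needs a differentiable relaxation."""

    def __init__(self, c):
        super().__init__()
        self.conv3d = nn.Conv3d(c, c, 3, padding=1)
        self.policy = nn.Sequential(
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(c, 1)
        )

    def forward(self, x):  # x: (N, C, T, H, W)
        gate = torch.sigmoid(self.policy(x))            # (N, 1) in [0, 1]
        use = (gate > 0.5).float().view(-1, 1, 1, 1, 1)
        # dense form for clarity: a real implementation would execute
        # the conv only for the selected clips to actually save compute
        return use * self.conv3d(x) + (1.0 - use) * x

out = GatedBlock(16)(torch.randn(4, 16, 8, 56, 56))  # shape preserved
```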
arXiv Detail & Related papers (2020-12-29T21:40:38Z)
- Making a Case for 3D Convolutions for Object Segmentation in Videos [16.167397418720483]
We show that 3D convolutional networks can be effectively applied to dense video prediction tasks such as salient object segmentation.
We propose a 3D decoder architecture that comprises novel 3D Global Convolution layers and 3D Refinement modules.
Our approach outperforms the existing state of the art by a large margin on the DAVIS'16 Unsupervised, FBMS, and ViSal benchmarks.
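"Global convolution" blocks are usually large-kernel layers made affordable through factorization; a hypothetical 3D analogue of that general idea (an assumption for illustration, not the paper's exact 3D Global Convolution layer) could look like:

```python
import torch
import torch.nn as nn

class GlobalConv3d(nn.Module):
    """Hypothetical factorized large-kernel 3D block: a k x k x k
    receptive field approximated by three cheap axis-aligned branches
    (k x 1 x 1, 1 x k x 1, 1 x 1 x k) whose outputs are summed."""

    def __init__(self, c_in, c_out, k=7):
        super().__init__()
        p = k // 2
        self.branch_t = nn.Conv3d(c_in, c_out, (k, 1, 1), padding=(p, 0, 0))
        self.branch_h = nn.Conv3d(c_in, c_out, (1, k, 1), padding=(0, p, 0))
        self.branch_w = nn.Conv3d(c_in, c_out, (1, 1, k), padding=(0, 0, p))

    def forward(self, x):  # x: (N, C, T, H, W)
        # parameter count scales with 3k per channel pair instead of k**3
        return self.branch_t(x) + self.branch_h(x) + self.branch_w(x)

y = GlobalConv3d(8, 8)(torch.randn(1, 8, 16, 64, 64))  # spatial size kept
```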
arXiv Detail & Related papers (2020-08-26T12:24:23Z)
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while maintaining a high processing speed.
On the UCF101 action recognition benchmark, our method improves on state-of-the-art real-time methods by 5.4% in accuracy and runs 2x faster at inference, with a model requiring less than 5 MB of storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
- 3D Self-Supervised Methods for Medical Imaging [7.65168530693281]
We propose 3D versions for five different self-supervised methods, in the form of proxy tasks.
Our methods facilitate neural network feature learning from unlabeled 3D images, aiming to reduce the cost of expert annotation.
The developed algorithms are 3D Contrastive Predictive Coding, 3D Rotation prediction, 3D Jigsaw puzzles, Relative 3D patch location, and 3D Exemplar networks.
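As one concrete example, a 3D rotation-prediction proxy task can be set up roughly as follows (a generic sketch with an assumed rotation set, not the paper's exact configuration):

```python
import torch

# Six axis-aligned rotations of a cubic (C, D, H, W) volume via
# torch.rot90; the exact rotation set is an illustrative assumption.
ROTATIONS = [
    lambda v: v,
    lambda v: torch.rot90(v, 1, dims=(1, 2)),
    lambda v: torch.rot90(v, 1, dims=(1, 3)),
    lambda v: torch.rot90(v, 1, dims=(2, 3)),
    lambda v: torch.rot90(v, 2, dims=(1, 2)),
    lambda v: torch.rot90(v, 2, dims=(2, 3)),
]

def rotation_pretext_batch(volumes):
    """Rotate each volume by a random class; return (inputs, labels)."""
    labels = torch.randint(len(ROTATIONS), (volumes.size(0),))
    rotated = torch.stack(
        [ROTATIONS[int(l)](v) for v, l in zip(volumes, labels)]
    )
    return rotated, labels

# Any 3D encoder with a 6-way classification head can now be trained
# with plain cross-entropy on unlabeled volumes.
x, y = rotation_pretext_batch(torch.randn(8, 1, 32, 32, 32))
```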
arXiv Detail & Related papers (2020-06-06T09:56:58Z)
- Self-Supervised Feature Extraction for 3D Axon Segmentation [7.181047714452116]
Existing learning-based methods to automatically trace axons in 3D brain imagery often rely on manually annotated segmentation labels.
We propose a self-supervised auxiliary task that utilizes the tube-like structure of axons to build a feature extractor from unlabeled data.
We demonstrate improved segmentation performance over the 3D U-Net model on both the SHIELD PVGPe dataset and the BigNeuron Project single-neuron Janelia dataset.
arXiv Detail & Related papers (2020-04-20T20:46:04Z)