3DPyranet Features Fusion for Spatio-temporal Feature Learning
- URL: http://arxiv.org/abs/2504.18977v1
- Date: Sat, 26 Apr 2025 17:32:37 GMT
- Title: 3DPyranet Features Fusion for Spatio-temporal Feature Learning
- Authors: Ihsan Ullah, Alfredo Petrosino
- Abstract summary: A 3D pyramidal neural network called 3DPyraNet and a discriminative approach for spatio-temporal feature learning called 3DPyraNet-F are proposed. 3DPyraNet-F extracts the feature maps of the highest layer of the learned network, fuses them into a single vector, and provides it as input to a linear-SVM classifier. Encouraging results are reported with 3DPyraNet in real-world environments, especially in the presence of camera-induced motion.
- Score: 2.327279581393927
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A convolutional neural network (CNN) slides a kernel over the whole image to produce an output map. This kernel scheme reduces the number of parameters with respect to a fully connected neural network (NN). While CNNs have proven effective in recognizing handwritten characters, traffic signal sign boards, and similar targets, their deep variants have recently proven effective in similar as well as more challenging applications such as object, scene, and action recognition. Deep CNNs add more layers and kernels to the classical CNN, increasing the number of parameters and partly eroding the main advantage of CNNs, namely fewer parameters. In this paper, a 3D pyramidal neural network called 3DPyraNet and a discriminative approach for spatio-temporal feature learning based on it, called 3DPyraNet-F, are proposed. 3DPyraNet introduces a new weighting scheme that learns features from both spatial and temporal dimensions by analyzing multiple adjacent frames while keeping a biologically plausible structure. It preserves the spatial topology of the input image and has fewer parameters and lower computational and memory costs than both fully connected NNs and recent deep CNNs. 3DPyraNet-F extracts the feature maps of the highest layer of the learned network, fuses them into a single vector, and provides it as input to a linear-SVM classifier, which enhances the recognition of human actions and dynamic scenes from videos. Encouraging results are reported with 3DPyraNet in real-world environments, especially in the presence of camera-induced motion. Further, 3DPyraNet-F clearly outperforms the state of the art on three benchmark datasets and shows comparable results on the fourth.
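The fusion step described in the abstract (take the feature maps of the highest layer and fuse them into a single vector) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `flatten` and `fuse_feature_maps`, the toy map sizes, and the choice of plain concatenation as the fusion rule are not taken from the paper, which may fuse or normalize the maps differently. The resulting vector is the kind of input one would then hand to a linear SVM (for example scikit-learn's `LinearSVC`), which is left out here to keep the sketch dependency-free.

```python
from typing import Any, List, Sequence


def flatten(values: Any) -> List[float]:
    """Recursively flatten a nested sequence of numbers into a flat list."""
    if isinstance(values, (int, float)):
        return [float(values)]
    flat: List[float] = []
    for item in values:
        flat.extend(flatten(item))
    return flat


def fuse_feature_maps(feature_maps: Sequence[Any]) -> List[float]:
    """Concatenate every top-layer feature map into one 1-D vector.

    Plain concatenation is an assumed fusion rule, chosen for simplicity.
    """
    fused: List[float] = []
    for fmap in feature_maps:
        fused.extend(flatten(fmap))
    return fused


# Hypothetical top-layer output: 2 feature maps, each 2 frames x 2 rows x 2 cols.
top_layer_maps = [
    [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8]]],
    [[[1.0, 1.1], [1.2, 1.3]], [[1.4, 1.5], [1.6, 1.7]]],
]

fused_vector = fuse_feature_maps(top_layer_maps)
print(len(fused_vector))  # 16 = 2 maps * 2 frames * 2 rows * 2 cols
```

In a full pipeline, one fused vector per video clip would form a row of the training matrix for the linear classifier.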
Related papers
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition [25.364148451584356]
3D convolution neural networks (CNNs) have been the prevailing option for video recognition.
We propose to automatically design efficient 3D CNN architectures via a novel training-free neural architecture search approach.
Experiments on Something-Something V1&V2 and Kinetics400 demonstrate that the E3D family achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-03-05T15:11:53Z)
- NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z)
- CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems [0.0]
A Convolutional Neural Network (CNN) is a class of Deep Neural Network (DNN) widely used in the analysis of visual images captured by an image sensor.
In this paper, we propose a neoteric variant of deep convolutional neural network architecture to ameliorate the performance of existing CNN architectures for real-time inference on embedded systems.
arXiv Detail & Related papers (2021-12-01T18:20:52Z)
- PocketNet: A Smaller Neural Network for 3D Medical Image Segmentation [0.0]
We derive a new CNN architecture called PocketNet that achieves segmentation results comparable to conventional CNNs while using less than 3% of the number of parameters.
arXiv Detail & Related papers (2021-04-21T20:10:30Z)
- RANP: Resource Aware Neuron Pruning at Initialization for 3D CNNs [32.054160078692036]
We introduce a Resource Aware Neuron Pruning (RANP) algorithm that prunes 3D CNNs to high sparsity levels.
Our algorithm leads to roughly 50%-95% reduction in FLOPs and 35%-80% reduction in memory with negligible loss in accuracy compared to the unpruned networks.
arXiv Detail & Related papers (2021-02-09T04:35:29Z)
- Learning Hybrid Representations for Automatic 3D Vessel Centerline Extraction [57.74609918453932]
Automatic blood vessel extraction from 3D medical images is crucial for vascular disease diagnoses.
Existing methods may suffer from discontinuities of extracted vessels when segmenting such thin tubular structures from 3D images.
We argue that preserving the continuity of extracted vessels requires taking the global geometry into account.
We propose a hybrid representation learning approach to address this challenge.
arXiv Detail & Related papers (2020-12-14T05:22:49Z)
- 3D CNNs with Adaptive Temporal Feature Resolutions [83.43776851586351]
The Similarity Guided Sampling (SGS) module can be plugged into any existing 3D CNN architecture.
SGS empowers 3D CNNs by learning the similarity of temporal features and grouping similar features together.
Our evaluations show that the proposed module improves the state-of-the-art by reducing the computational cost (GFLOPs) by half while preserving or even improving the accuracy.
arXiv Detail & Related papers (2020-11-17T14:34:05Z)
- RANP: Resource Aware Neuron Pruning at Initialization for 3D CNNs [32.431100361351675]
We introduce a Resource Aware Neuron Pruning (RANP) algorithm that prunes 3D CNNs at high sparsity levels.
Specifically, the core idea is to obtain an importance score for each neuron based on its sensitivity to the loss function.
This neuron importance is then reweighted according to the neuron resource consumption related to FLOPs or memory.
arXiv Detail & Related papers (2020-10-06T05:34:39Z)
- Local Grid Rendering Networks for 3D Object Detection in Point Clouds [98.02655863113154]
CNNs are powerful, but directly applying convolutions to point data after voxelizing entire point clouds into a dense regular 3D grid would be computationally costly.
We propose a novel and principled Local Grid Rendering (LGR) operation to render the small neighborhood of a subset of input points into a low-resolution 3D grid independently.
We validate LGR-Net for 3D object detection on the challenging ScanNet and SUN RGB-D datasets.
arXiv Detail & Related papers (2020-07-04T13:57:43Z)
- PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection [76.30585706811993]
We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN).
Our proposed method deeply integrates both 3D voxel Convolutional Neural Network (CNN) and PointNet-based set abstraction.
It takes advantage of the efficient learning and high-quality proposals of the 3D voxel CNN and the flexible receptive fields of the PointNet-based networks.
arXiv Detail & Related papers (2019-12-31T06:34:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.