Explainable 3D Convolutional Neural Networks by Learning Temporal
Transformations
- URL: http://arxiv.org/abs/2006.15983v1
- Date: Mon, 29 Jun 2020 12:29:30 GMT
- Title: Explainable 3D Convolutional Neural Networks by Learning Temporal
Transformations
- Authors: Gabriëlle Ras, Luca Ambrogioni, Pim Haselager, Marcel A.J. van Gerven, Umut Güçlü
- Abstract summary: We introduce the temporally factorized 3D convolution (3TConv) as an interpretable alternative to the regular 3D convolution (3DConv).
In a 3TConv the 3D convolutional filter is obtained by learning a 2D filter and a set of temporal transformation parameters.
We demonstrate that 3TConv learns temporal transformations that afford a direct interpretation.
- Score: 6.477885112149906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we introduce the temporally factorized 3D convolution (3TConv)
as an interpretable alternative to the regular 3D convolution (3DConv). In a
3TConv the 3D convolutional filter is obtained by learning a 2D filter and a
set of temporal transformation parameters, resulting in a sparse filter where
the 2D slices are sequentially dependent on each other in the temporal
dimension. We demonstrate that 3TConv learns temporal transformations that
afford a direct interpretation. The temporal parameters can be used in
combination with various existing 2D visualization methods. We also show that
insight about what the model learns can be achieved by analyzing the
transformation parameter statistics on a layer and model level. Finally, we
implicitly demonstrate that, in popular ConvNets, the 2DConv can be replaced
with a 3TConv and that the weights can be transferred to yield pretrained
3TConvs. Pretrained 3TConvNets leverage more than a decade of work on
traditional 2DConvNets by being able to make use of features that have been
proven to deliver excellent results on image classification benchmarks.
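
The factorization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes, purely for illustration, that each temporal transformation is an integer translation (dy, dx), and the names `shift2d` and `build_3t_filter` are hypothetical. It only shows how a 3D filter can be assembled from one learned 2D filter plus per-step transformation parameters, with each slice derived from the previous one.

```python
import numpy as np

def shift2d(k, dy, dx):
    """Shift a 2D array by integer offsets (dy, dx); vacated cells are zero."""
    h, w = k.shape
    out = np.zeros_like(k)
    if abs(dy) >= h or abs(dx) >= w:
        return out  # shifted completely out of the window
    src = k[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    r0, c0 = max(0, dy), max(0, dx)
    out[r0:r0 + src.shape[0], c0:c0 + src.shape[1]] = src
    return out

def build_3t_filter(base, shifts):
    """Assemble a (T, H, W) filter from a 2D filter and T-1 transformations.

    Slice 0 is `base`; slice t is slice t-1 transformed by shifts[t-1],
    so the slices depend on each other sequentially along time.
    Translation is an illustrative stand-in for the transformation.
    """
    slices = [base]
    for dy, dx in shifts:
        slices.append(shift2d(slices[-1], dy, dx))
    return np.stack(slices, axis=0)

# Hypothetical 3x3 filter with a single active weight at the centre.
base = np.zeros((3, 3))
base[1, 1] = 1.0

# Two temporal steps: move right by one, then down by one.
filt = build_3t_filter(base, [(0, 1), (1, 0)])
print(filt.shape)  # (3, 3, 3)

# Parameters in this sketch: 9 filter weights + 2 steps * 2 offsets = 13,
# versus 27 free weights for an unconstrained 3x3x3 filter.
```

With identity transformations such as `[(0, 0), (0, 0)]`, every slice equals the 2D filter. In this sketch, that is one way to picture how weights from a pretrained 2D convolution could initialize the factorized filter.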
Related papers
- Dynamic 3D Point Cloud Sequences as 2D Videos [81.46246338686478]
3D point cloud sequences serve as one of the most common and practical representation modalities of real-world environments.
We propose a novel generic representation called Structured Point Cloud Videos (SPCVs).
SPCVs re-organize a point cloud sequence as a 2D video with spatial smoothness and temporal consistency, where the pixel values correspond to the 3D coordinates of points.
arXiv Detail & Related papers (2024-03-02T08:18:57Z)
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
- 3D-to-2D Distillation for Indoor Scene Parsing [78.36781565047656]
We present a new approach that enables us to leverage 3D features extracted from a large-scale 3D data repository to enhance 2D features extracted from RGB images.
First, we distill 3D knowledge from a pretrained 3D network to supervise a 2D network to learn simulated 3D features from 2D features during training.
Second, we design a two-stage dimension normalization scheme to calibrate the 2D and 3D features for better integration.
Third, we design a semantic-aware adversarial training model to extend our framework for training with unpaired 3D data.
arXiv Detail & Related papers (2021-04-06T02:22:24Z)
- The Devils in the Point Clouds: Studying the Robustness of Point Cloud Convolutions [15.997907568429177]
This paper investigates different variants of PointConv, a convolution network on point clouds, to examine their robustness to input scale and rotation changes.
We derive a novel viewpoint-invariant descriptor by utilizing 3D geometric properties as the input to PointConv.
Experiments are conducted on the 2D MNIST & CIFAR-10 datasets as well as the 3D Semantic KITTI & ScanNet datasets.
arXiv Detail & Related papers (2021-01-19T19:32:38Z)
- 2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition [84.697097472401]
We introduce Ada3D, a conditional computation framework that learns instance-specific 3D usage policies to determine frames and convolution layers to be used in a 3D network.
We demonstrate that our method achieves similar accuracies to state-of-the-art 3D models while requiring 20%-50% less computation across different datasets.
arXiv Detail & Related papers (2020-12-29T21:40:38Z)
- Making a Case for 3D Convolutions for Object Segmentation in Videos [16.167397418720483]
We show that 3D convolutional networks can be effectively applied to dense video prediction tasks such as salient object segmentation.
We propose a 3D decoder architecture that comprises novel 3D Global Convolution layers and 3D Refinement modules.
Our approach outperforms existing state-of-the-art methods by a large margin on the DAVIS'16 Unsupervised, FBMS and ViSal benchmarks.
arXiv Detail & Related papers (2020-08-26T12:24:23Z)
- Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation [87.54570024320354]
State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space.
A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space.
We develop a framework based on 3D cylinder partition and 3D cylinder convolution, termed Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds.
arXiv Detail & Related papers (2020-08-04T13:56:19Z)
- Learning Local Neighboring Structure for Robust 3D Shape Representation [143.15904669246697]
Representation learning for 3D meshes is important in many computer vision and graphics applications.
We propose a local structure-aware anisotropic convolutional operation (LSA-Conv).
Our model produces significant improvement in 3D shape reconstruction compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-04-21T13:40:03Z)
- Anisotropic Convolutional Networks for 3D Semantic Scene Completion [24.9671648682339]
Semantic scene completion (SSC) tries to simultaneously infer the occupancy and semantic labels for a scene from a single depth and/or RGB image.
We propose a novel module called anisotropic convolution, which offers flexibility and power impossible for competing methods.
In contrast to the standard 3D convolution that is limited to a fixed 3D receptive field, our module is capable of modeling the dimensional anisotropy voxel-wise.
arXiv Detail & Related papers (2020-04-05T07:57:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.