A3D: Adaptive 3D Networks for Video Action Recognition
- URL: http://arxiv.org/abs/2011.12384v1
- Date: Tue, 24 Nov 2020 21:01:11 GMT
- Title: A3D: Adaptive 3D Networks for Video Action Recognition
- Authors: Sijie Zhu and Taojiannan Yang and Matias Mendieta and Chen Chen
- Abstract summary: A3D is an adaptive 3D network that can infer at a wide range of computational constraints with one-time training.
It generates good configurations by trading off between network width and spatio-temporal resolution.
Even under the same computational constraints, the performance of our adaptive networks can be significantly boosted over the baseline counterparts.
- Score: 17.118351068420086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents A3D, an adaptive 3D network that can infer at a wide
range of computational constraints with one-time training. Instead of training
multiple models in a grid-search manner, it generates good configurations by
trading off between network width and spatio-temporal resolution. Furthermore,
the computation cost can be adapted after the model is deployed to meet
variable constraints, for example, on edge devices. Even under the same
computational constraints, the performance of our adaptive networks can be
significantly boosted over the baseline counterparts by the mutual training
along three dimensions. When a multiple pathway framework, e.g. SlowFast, is
adopted, our adaptive method encourages a better trade-off between pathways
than manual designs. Extensive experiments on the Kinetics dataset show the
effectiveness of the proposed framework. The performance gain is also verified
to transfer well between datasets and tasks. Code will be made available.
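To make the adaptation idea concrete, below is a minimal, hedged sketch of sampling sub-network configurations along the width, temporal, and spatial dimensions and training them jointly in one step. The configuration grid, the slimmable `width_mult` argument, and the backbone interface are illustrative assumptions, not the authors' released code.

```python
# Sketch of mutual training over width / temporal / spatial configurations.
# The configuration grid and the slimmable backbone interface are assumptions.
import random
import torch
import torch.nn.functional as F

WIDTH_MULTS = [0.5, 0.75, 1.0]        # fraction of channels kept
TEMPORAL_STRIDES = [1, 2, 4]          # frame subsampling factors
SPATIAL_SIZES = [112, 168, 224]       # input resolutions

def subsample_clip(clip, t_stride, size):
    """clip: (B, C, T, H, W) -> cheaper view along time and space."""
    clip = clip[:, :, ::t_stride]                                  # temporal subsample
    b, c, t, h, w = clip.shape
    clip = F.interpolate(clip.reshape(b, c * t, h, w), size=size)  # spatial resize
    return clip.reshape(b, c, t, size, size)

def mutual_training_step(model, clip, labels, optimizer, n_configs=3):
    """One optimisation step shared by several sampled configurations."""
    optimizer.zero_grad()
    for _ in range(n_configs):
        width = random.choice(WIDTH_MULTS)
        cheap = subsample_clip(clip, random.choice(TEMPORAL_STRIDES),
                               random.choice(SPATIAL_SIZES))
        # `width_mult` assumes a slimmable backbone that can drop channels on the fly.
        logits = model(cheap, width_mult=width)
        loss = F.cross_entropy(logits, labels) / n_configs
        loss.backward()                          # gradients from all configs accumulate
    optimizer.step()
```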
Related papers
- Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training [44.790636524264]
Point Prompt Training is a novel framework for multi-dataset synergistic learning in the context of 3D representation learning.
It can overcome the negative transfer associated with synergistic learning and produce generalizable representations.
It achieves state-of-the-art performance on each dataset using a single weight-shared model with supervised multi-dataset training.
arXiv Detail & Related papers (2023-08-18T17:59:57Z)
- Fast-SNARF: A Fast Deformer for Articulated Neural Fields [92.68788512596254]
We propose a new articulation module for neural fields, Fast-SNARF, which finds accurate correspondences between canonical space and posed space.
Fast-SNARF is a drop-in replacement for our previous work, SNARF, and significantly improves its computational efficiency.
Because learning of deformation maps is a crucial component in many 3D human avatar methods, we believe that this work represents a significant step towards the practical creation of 3D virtual humans.
arXiv Detail & Related papers (2022-11-28T17:55:34Z)
- Transformation-Equivariant 3D Object Detection for Autonomous Driving [44.17100476968737]
Transformation-Equivariant 3D Detector (TED) is an efficient way to detect 3D objects in autonomous driving.
TED ranks 1st among all submissions on KITTI 3D car detection leaderboard.
arXiv Detail & Related papers (2022-11-22T02:51:56Z)
- SVNet: Where SO(3) Equivariance Meets Binarization on Point Cloud Representation [65.4396959244269]
The paper tackles the challenge by designing a general framework to construct 3D learning architectures.
The proposed approach can be applied to general backbones like PointNet and DGCNN.
Experiments on ModelNet40, ShapeNet, and the real-world dataset ScanObjectNN demonstrate that the method achieves a good trade-off between efficiency, rotation robustness, and accuracy.
arXiv Detail & Related papers (2022-09-13T12:12:19Z)
- GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
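A minimal sketch of the unrolled structure described above, alternating a physics-based data-consistency step with a learned regularizer. The sampling operator, per-iteration CNN denoiser, and step size are illustrative assumptions; GLEAM's contribution is a greedy, memory-efficient way of training such unrolled networks, which is not reproduced here.

```python
# Sketch of an unrolled MRI reconstruction network: data consistency + learned denoiser.
import torch
import torch.nn as nn

class UnrolledRecon(nn.Module):
    def __init__(self, n_iters=5, step=0.5):
        super().__init__()
        self.step = step
        # One small CNN regularizer per unrolled iteration (real/imag stacked as 2 channels).
        self.denoisers = nn.ModuleList(
            nn.Sequential(nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(32, 2, 3, padding=1))
            for _ in range(n_iters))

    def forward(self, y, mask):
        """y: under-sampled k-space (B, 1, H, W), complex; mask: (B, 1, H, W) binary."""
        x = torch.fft.ifft2(y)                            # zero-filled initial image
        for denoiser in self.denoisers:
            # Data-consistency gradient step on ||M F x - y||^2.
            residual = mask * torch.fft.fft2(x) - y
            x = x - self.step * torch.fft.ifft2(mask * residual)
            # Learned regularization applied to stacked real/imag channels.
            ri = torch.cat([x.real, x.imag], dim=1)
            ri = ri + denoiser(ri)                        # residual denoising
            x = torch.complex(ri[:, :1], ri[:, 1:])
        return x
```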
arXiv Detail & Related papers (2022-07-18T06:01:29Z)
- Dual Octree Graph Networks for Learning Adaptive Volumetric Shape Representations [21.59311861556396]
Our method encodes the volumetric field of a 3D shape with an adaptive feature volume organized by an octree.
An encoder-decoder network is designed to learn the adaptive feature volume based on the graph convolutions over the dual graph of octree nodes.
Our method effectively encodes shape details, enables fast 3D shape reconstruction, and exhibits good generality for modeling 3D shapes out of training categories.
arXiv Detail & Related papers (2022-05-05T17:56:34Z)
- Domain Adaptor Networks for Hyperspectral Image Recognition [35.95313368586933]
We consider the problem of adapting a network trained on three-channel color images to a hyperspectral domain with a large number of channels.
We propose domain adaptor networks that map the input to be compatible with a network trained on large-scale color image datasets such as ImageNet.
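As a rough illustration, a 1x1-convolution adaptor can map the many spectral channels into the 3-channel input space of an ImageNet-pretrained backbone. The specific adaptor design, backbone, and channel count below are assumptions, not necessarily the paper's best-performing variant.

```python
# Sketch of a spectral adaptor feeding an ImageNet-pretrained backbone.
import torch
import torch.nn as nn
from torchvision import models

class HyperspectralAdaptor(nn.Module):
    def __init__(self, in_channels=128, n_classes=10):
        super().__init__()
        # 1x1 conv mixes spectral bands down to an RGB-like 3-channel input (assumed design).
        self.adaptor = nn.Conv2d(in_channels, 3, kernel_size=1)
        self.backbone = models.resnet18(weights="IMAGENET1K_V1")  # pretrained color-image model
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, n_classes)

    def forward(self, x):            # x: (B, in_channels, H, W)
        return self.backbone(self.adaptor(x))

logits = HyperspectralAdaptor()(torch.randn(2, 128, 224, 224))
```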
arXiv Detail & Related papers (2021-08-03T15:06:39Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
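A hedged sketch of early-exit inference in this spirit: lightweight segmentation heads are attached along the backbone and prediction stops at the first exit whose confidence passes a threshold. The head design and confidence rule are illustrative assumptions, not the paper's learned exit policy.

```python
# Sketch of confidence-thresholded early-exit inference for segmentation.
import torch
import torch.nn as nn

class EarlyExitSegmenter(nn.Module):
    def __init__(self, stages, exit_heads, threshold=0.9):
        super().__init__()
        self.stages = nn.ModuleList(stages)     # backbone split into sequential stages
        self.exits = nn.ModuleList(exit_heads)  # one segmentation head per stage
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):
        for stage, head in zip(self.stages, self.exits):
            x = stage(x)
            probs = head(x).softmax(dim=1)      # (B, classes, H, W)
            # Exit early when the mean per-pixel confidence is high enough (assumed rule).
            if probs.max(dim=1).values.mean() > self.threshold:
                return probs
        return probs                            # fall through to the final exit
```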
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- Adjoint Rigid Transform Network: Task-conditioned Alignment of 3D Shapes [86.2129580231191]
Adjoint Rigid Transform (ART) Network is a neural module which can be integrated with a variety of 3D networks.
ART learns to rotate input shapes to a learned canonical orientation, which is crucial for many downstream tasks.
We will release our code and pre-trained models for further research.
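A rough sketch of the canonicalisation idea: a small order-invariant network predicts a rotation (here via a 6D representation orthonormalised with Gram-Schmidt) that maps the input point cloud to a learned canonical orientation. The pooling network and rotation parameterisation are assumptions, not ART's architecture.

```python
# Sketch of a module that predicts a canonicalising rotation for a point cloud.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CanonicalRotation(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, 128), nn.ReLU())
        self.head = nn.Linear(128, 6)           # 6D rotation representation (assumed)

    def forward(self, pts):                     # pts: (B, N, 3)
        feat = self.net(pts).max(dim=1).values  # order-invariant global feature
        a, b = self.head(feat).chunk(2, dim=-1)
        # Gram-Schmidt: build an orthonormal frame (rotation matrix) from two vectors.
        x = F.normalize(a, dim=-1)
        y = F.normalize(b - (x * b).sum(-1, keepdim=True) * x, dim=-1)
        z = torch.cross(x, y, dim=-1)
        rot = torch.stack([x, y, z], dim=1)     # (B, 3, 3)
        return pts @ rot.transpose(1, 2), rot   # canonically rotated points and the rotation
```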
arXiv Detail & Related papers (2021-02-01T20:58:45Z)
- Gram Regularization for Multi-view 3D Shape Retrieval [3.655021726150368]
We propose a novel regularization term called Gram regularization.
By forcing the variance between weight kernels to be large, the regularizer can help to extract discriminative features.
The proposed Gram regularization is data independent and can converge stably and quickly without bells and whistles.
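A minimal sketch of a Gram-style regularizer consistent with this summary: compute pairwise similarities between a layer's flattened weight kernels and penalise kernels that are too alike, encouraging diverse, discriminative filters. The exact loss form is an assumption, not necessarily the paper's formulation.

```python
# Sketch of a data-independent regularizer that pushes weight kernels apart.
import torch
import torch.nn.functional as F

def gram_regularizer(weight):
    """weight: conv kernels of shape (out_channels, in_channels, k, k)."""
    w = F.normalize(weight.flatten(start_dim=1), dim=1)   # one unit-norm row per kernel
    gram = w @ w.t()                                       # cosine similarities between kernels
    off_diag = gram - torch.eye(gram.size(0), device=gram.device)
    return off_diag.pow(2).mean()                          # small when kernels differ

# Usage: total_loss = task_loss + lambda_reg * gram_regularizer(conv.weight)
```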
arXiv Detail & Related papers (2020-11-16T05:37:24Z)
- Deep Adaptive Inference Networks for Single Image Super-Resolution [72.7304455761067]
Single image super-resolution (SISR) has witnessed tremendous progress in recent years owing to the deployment of deep convolutional neural networks (CNNs).
In this paper, we take a step forward to address this issue by leveraging adaptive inference networks for deep SISR (AdaDSR).
Our AdaDSR involves an SISR model as backbone and a lightweight adapter module which takes image features and resource constraint as input and predicts a map of local network depth.
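A simplified, hedged sketch of the depth-adaptive idea: a lightweight adapter predicts a per-pixel depth map from image features and a resource budget, and residual blocks contribute only where the predicted depth allows. The dense masking below only illustrates the control signal; it does not realise the sparse-execution savings of the actual AdaDSR.

```python
# Sketch of depth-adaptive super-resolution with a budget-conditioned adapter.
import torch
import torch.nn as nn

class DepthAdaptiveSR(nn.Module):
    def __init__(self, n_blocks=8, channels=64):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(channels, channels, 3, padding=1))
            for _ in range(n_blocks))
        self.adapter = nn.Conv2d(channels + 1, 1, 3, padding=1)  # features + budget -> depth map
        self.tail = nn.Sequential(nn.Conv2d(channels, 3 * 4, 3, padding=1),
                                  nn.PixelShuffle(2))             # x2 upscaling
        self.n_blocks = n_blocks

    def forward(self, lr_img, budget):                # budget: scalar in [0, 1]
        feat = self.head(lr_img)
        b, _, h, w = feat.shape
        budget_map = torch.full((b, 1, h, w), float(budget), device=feat.device)
        depth = self.n_blocks * torch.sigmoid(self.adapter(torch.cat([feat, budget_map], 1)))
        for i, block in enumerate(self.blocks):
            mask = (depth > i).float()                # use block i only where depth allows
            feat = feat + mask * block(feat)
        return self.tail(feat)
```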
arXiv Detail & Related papers (2020-04-08T10:08:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.