V4D: 4D Convolutional Neural Networks for Video-level Representation
Learning
- URL: http://arxiv.org/abs/2002.07442v1
- Date: Tue, 18 Feb 2020 09:27:41 GMT
- Title: V4D: 4D Convolutional Neural Networks for Video-level Representation
Learning
- Authors: Shiwen Zhang and Sheng Guo and Weilin Huang and Matthew R. Scott and
Limin Wang
- Abstract summary: Most 3D CNNs for video representation learning are clip-based, and thus do not consider video-level temporal evolution of features.
We propose Video-level 4D Convolutional Neural Networks, or V4D, to model long-range representation with 4D convolutions.
V4D achieves excellent results, surpassing recent 3D CNNs by a large margin.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing 3D CNNs for video representation learning are clip-based
methods, and thus do not consider video-level temporal evolution of
spatio-temporal features. In this paper, we propose Video-level 4D
Convolutional Neural Networks, referred to as V4D, to model the evolution of
long-range spatio-temporal representation with 4D convolutions, and at the same
time, to preserve strong 3D spatio-temporal representation with residual
connections. Specifically, we design a new 4D residual block able to capture
inter-clip interactions, which could enhance the representation power of the
original clip-level 3D CNNs. The 4D residual blocks can be easily integrated
into the existing 3D CNNs to perform long-range modeling hierarchically. We
further introduce the training and inference methods for the proposed V4D.
Extensive experiments are conducted on three video recognition benchmarks,
where V4D achieves excellent results, surpassing recent 3D CNNs by a large
margin.
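
The core idea of the 4D residual block described above, capturing inter-clip interactions while preserving the clip-level representation through a residual connection, can be illustrated with a minimal sketch. This is not the authors' implementation: it flattens each clip's 3D spatio-temporal features into a plain vector and applies a learned kernel over clip offsets (a 1D convolution along the clip axis, standing in for the full 4D convolution), with the offsets, zero padding, and `weights` structure all being illustrative assumptions.

```python
def residual_4d_block(clips, weights):
    """Sketch of a V4D-style residual block over the clip dimension.

    clips:   list of per-clip feature vectors (each a list of floats),
             standing in for per-clip 3D spatio-temporal features.
    weights: dict mapping clip offsets to scalar kernel weights,
             e.g. {-1: 0.25, 0: 0.5, 1: 0.25} (assumed, for illustration).
    """
    num_clips = len(clips)
    dim = len(clips[0])
    out = []
    for u in range(num_clips):
        # Aggregate features from neighboring clips (inter-clip interaction).
        agg = [0.0] * dim
        for offset, w in weights.items():
            v = u + offset
            if 0 <= v < num_clips:  # zero padding outside the video
                for d in range(dim):
                    agg[d] += w * clips[v][d]
        # Residual connection preserves the original clip-level features.
        out.append([clips[u][d] + agg[d] for d in range(dim)])
    return out
```

In the paper's formulation the kernel also spans space and intra-clip time (a true 4D convolution) and the block is inserted into a 3D CNN backbone; this sketch only conveys the inter-clip aggregation plus residual structure.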
Related papers
- Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos [76.07894127235058]
We present a system for mining high-quality 4D reconstructions from internet stereoscopic, wide-angle videos.
We use this method to generate large-scale data in the form of world-consistent, pseudo-metric 3D point clouds.
We demonstrate the utility of this data by training a variant of DUSt3R to predict structure and 3D motion from real-world image pairs.
arXiv Detail & Related papers (2024-12-12T18:59:54Z)
- Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models [116.31344506738816]
We present a novel framework, Diffusion4D, for efficient and scalable 4D content generation.
We develop a 4D-aware video diffusion model capable of synthesizing orbital views of dynamic 3D assets.
Our method surpasses prior state-of-the-art techniques in terms of generation efficiency and 4D geometry consistency.
arXiv Detail & Related papers (2024-05-26T17:47:34Z)
- Efficient4D: Fast Dynamic 3D Object Generation from a Single-view Video [42.10482273572879]
We propose an efficient video-to-4D object generation framework called Efficient4D.
It generates high-quality spacetime-consistent images under different camera views, and then uses them as labeled data.
Experiments on both synthetic and real videos show that Efficient4D offers a remarkable 10-fold increase in speed.
arXiv Detail & Related papers (2024-01-16T18:58:36Z) - 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
We present 4DGen, a novel framework for grounded 4D content creation.
Our pipeline facilitates controllable 4D generation, enabling users to specify the motion via monocular video or adopt image-to-video generations.
Compared to existing video-to-4D baselines, our approach yields superior results in faithfully reconstructing input signals.
arXiv Detail & Related papers (2023-12-28T18:53:39Z) - F4D: Factorized 4D Convolutional Neural Network for Efficient
Video-level Representation Learning [4.123763595394021]
Most existing 3D convolutional neural network (CNN)-based methods for video-level representation learning are clip-based.
We propose a factorized 4D CNN architecture with attention (F4D) that is capable of learning more effective, finer-grained, long-term temporal video representations.
arXiv Detail & Related papers (2023-11-28T19:21:57Z) - Consistent4D: Consistent 360{\deg} Dynamic Object Generation from
Monocular Video [15.621374353364468]
Consistent4D is a novel approach for generating 4D dynamic objects from uncalibrated monocular videos.
We cast the 360-degree dynamic object reconstruction as a 4D generation problem, eliminating the need for tedious multi-view data collection and camera calibration.
arXiv Detail & Related papers (2023-11-06T03:26:43Z) - Learning Parallel Dense Correspondence from Spatio-Temporal Descriptors
for Efficient and Robust 4D Reconstruction [43.60322886598972]
This paper focuses on the task of 4D shape reconstruction from a sequence of point clouds.
We present a novel pipeline to learn a temporal evolution of the 3D human shape through capturing continuous transformation functions among cross-frame occupancy fields.
arXiv Detail & Related papers (2021-03-30T13:36:03Z) - Learning Compositional Representation for 4D Captures with Neural ODE [72.56606274691033]
We introduce a compositional representation for 4D captures, that disentangles shape, initial state, and motion respectively.
To model the motion, a neural Ordinary Differential Equation (ODE) is trained to update the initial state conditioned on the learned motion code.
A decoder takes the shape code and the updated pose code to reconstruct 4D captures at each time stamp.
arXiv Detail & Related papers (2021-03-15T10:55:55Z)
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high process speed.
Our method achieves clear improvements on the UCF101 action recognition benchmark over state-of-the-art real-time methods: 5.4% higher accuracy and 2x faster inference, with a storage model under 5 MB.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.