T4DT: Tensorizing Time for Learning Temporal 3D Visual Data
- URL: http://arxiv.org/abs/2208.01421v1
- Date: Tue, 2 Aug 2022 12:57:08 GMT
- Title: T4DT: Tensorizing Time for Learning Temporal 3D Visual Data
- Authors: Mikhail Usvyatsov, Rafael Ballester-Ripoll, Lina Bashaeva, Konrad
Schindler, Gonzalo Ferrer, Ivan Oseledets
- Abstract summary: We show that low-rank tensor compression provides an extremely compact way to store and query time-varying signed distance functions.
Unlike existing iterative learning-based approaches like DeepSDF and NeRF, our method uses a closed-form algorithm with theoretical guarantees.
- Score: 19.418308324435916
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unlike 2D raster images, there is no single dominant representation for 3D
visual data processing. Different formats like point clouds, meshes, or
implicit functions each have their strengths and weaknesses. Still, grid
representations such as signed distance functions have attractive properties
also in 3D. In particular, they offer constant-time random access and are
eminently suitable for modern machine learning. Unfortunately, the storage size
of a grid grows exponentially with its dimension. Hence they often exceed
memory limits even at moderate resolution. This work explores various low-rank
tensor formats, including the Tucker, tensor train, and quantics tensor train
decompositions, to compress time-varying 3D data. Our method iteratively
computes, voxelizes, and compresses each frame's truncated signed distance
function and applies tensor rank truncation to condense all frames into a
single, compressed tensor that represents the entire 4D scene. We show that
low-rank tensor compression provides an extremely compact way to store and
query time-varying signed distance functions. It significantly reduces the memory
footprint of 4D scenes while surprisingly preserving their geometric quality.
Unlike existing iterative learning-based approaches like DeepSDF and NeRF, our
method uses a closed-form algorithm with theoretical guarantees.
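The pipeline described in the abstract, voxelizing each frame's TSDF and truncating tensor ranks to condense all frames into one compressed tensor, builds on the standard TT-SVD algorithm. Below is a minimal NumPy sketch of that generic procedure, assuming a dense input grid and a single maximum rank; the function names and rank-selection rule are illustrative, not the authors' implementation.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Compress an N-D array into tensor-train cores via sequential truncated SVDs."""
    dims = tensor.shape
    cores = []
    r_prev = 1
    mat = tensor.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(S))
        # core k holds the left singular vectors, folded to (r_prev, n_k, r)
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        # push the remaining factor S @ Vt on to the next mode
        mat = (S[:r, None] * Vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into the full array."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape([c.shape[1] for c in cores])
```

Storing the cores takes `sum(c.size for c in cores)` numbers instead of the full `tensor.size`; for smooth fields such as TSDF grids, modest ranks often suffice, which is where the compression comes from.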
Related papers
- Is 3D Convolution with 5D Tensors Really Necessary for Video Analysis? [4.817356884702073]
We present several novel techniques for implementing 3D convolutional blocks using 2D and/or 1D convolutions with only 4D and/or 3D tensors.
Our motivation is that 3D convolutions with 5D tensors are computationally expensive and they may not be supported by some of the edge devices used in real-time applications such as robots.
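As an illustration of the idea (a sketch under the simplifying assumption of a single-channel volume and a separable kernel, not the paper's techniques): whenever a 3D kernel factorizes into a 1D temporal and a 2D spatial part, the 3D convolution can be computed as a per-frame 2D pass followed by a per-pixel 1D temporal pass, so no 5D feature tensor is ever materialized.

```python
import numpy as np

def conv3d_valid(vol, kernel):
    """Naive 'valid' 3D correlation of a (T, H, W) volume with a (kt, kh, kw) kernel."""
    kt, kh, kw = kernel.shape
    T, H, W = vol.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[t, y, x] = np.sum(vol[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out

def conv2plus1d(vol, k_time, k_space):
    """Same result for a separable kernel: a 2D spatial pass, then a 1D temporal pass."""
    T, H, W = vol.shape
    kh, kw = k_space.shape
    spatial = np.zeros((T, H - kh + 1, W - kw + 1))
    for t in range(T):
        for y in range(spatial.shape[1]):
            for x in range(spatial.shape[2]):
                spatial[t, y, x] = np.sum(vol[t, y:y + kh, x:x + kw] * k_space)
    kt = len(k_time)
    out = np.zeros((T - kt + 1,) + spatial.shape[1:])
    for t in range(out.shape[0]):
        # weighted sum of kt consecutive spatially filtered frames
        out[t] = np.tensordot(k_time, spatial[t:t + kt], axes=(0, 0))
    return out
```

The two functions agree exactly whenever `kernel == k_time[:, None, None] * k_space[None, :, :]`; non-separable kernels can be approximated by a sum of a few such terms.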
arXiv Detail & Related papers (2024-07-23T14:30:51Z)
- Coarse-To-Fine Tensor Trains for Compact Visual Representations [19.216356079910533]
'Prolongation Upsampling Train' is a novel method for learning tensor train representations in a coarse-to-fine manner.
We evaluate our representation along three axes: (1) compression, (2) denoising capability, and (3) image completion capability.
arXiv Detail & Related papers (2024-06-06T17:59:23Z)
- 3D Compression Using Neural Fields [90.24458390334203]
We propose a novel NF-based compression algorithm for 3D data.
We demonstrate that our method excels at geometry compression on 3D point clouds as well as meshes.
It is straightforward to extend our compression algorithm to compress both the geometry and attributes (e.g., color) of 3D data.
arXiv Detail & Related papers (2023-11-21T21:36:09Z)
- TensorCodec: Compact Lossy Compression of Tensors without Strong Data Assumptions [22.937900567884796]
TENSORCODEC is a lossy compression algorithm for general tensors that do not necessarily adhere to strong input data assumptions.
Our analysis and experiments on 8 real-world datasets demonstrate that TENSORCODEC is concise: it gives up to 7.38x more compact compression than the best competitor with similar reconstruction error.
arXiv Detail & Related papers (2023-09-19T04:48:01Z)
- Smaller3d: Smaller Models for 3D Semantic Segmentation Using Minkowski Engine and Knowledge Distillation Methods [0.0]
This paper proposes the application of knowledge distillation techniques, especially for sparse tensors in 3D deep learning, to reduce model sizes while maintaining performance.
We analyze and propose different loss functions, including standard methods and combinations of various losses, to mimic the performance of state-of-the-art sparse convolutional networks.
arXiv Detail & Related papers (2023-05-04T22:19:25Z)
- Lightweight integration of 3D features to improve 2D image segmentation [1.3799488979862027]
We show that image segmentation can benefit from 3D geometric information without requiring 3D ground truth.
Our method can be applied to many 2D segmentation networks, significantly improving their performance.
arXiv Detail & Related papers (2022-12-16T08:22:55Z)
- Low-Rank Tensor Function Representation for Multi-Dimensional Data Recovery [52.21846313876592]
Low-rank tensor function representation (LRTFR) can continuously represent data beyond meshgrid with infinite resolution.
We develop two fundamental concepts for tensor functions, i.e., the tensor function rank and low-rank tensor function factorization.
Experiments substantiate the superiority and versatility of our method compared with state-of-the-art methods.
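The notion of a rank for functions rather than discrete tensors can be seen in a tiny hand-worked example (a sketch, unrelated to the LRTFR implementation): sin(x + y) = sin(x)cos(y) + cos(x)sin(y), so it has function rank 2 and can be evaluated at arbitrary continuous coordinates from two pairs of 1D factor functions, with no meshgrid needed.

```python
import numpy as np

# f(x, y) = sin(x + y) factorizes exactly into two separable terms:
# sin(x + y) = sin(x) * cos(y) + cos(x) * sin(y), i.e. function rank 2.
factors_x = [np.sin, np.cos]  # u_r(x)
factors_y = [np.cos, np.sin]  # v_r(y)

def low_rank_eval(x, y):
    """Evaluate f at arbitrary continuous coordinates from its 1D factor functions."""
    return sum(u(x) * v(y) for u, v in zip(factors_x, factors_y))
```

Because the factors are functions rather than stored vectors, the representation has no fixed resolution: any point, on or off a grid, is a valid query.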
arXiv Detail & Related papers (2022-12-01T04:00:38Z)
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
- Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
arXiv Detail & Related papers (2020-12-01T23:58:16Z)
- Learning Deformable Tetrahedral Meshes for 3D Reconstruction [78.0514377738632]
3D shape representations that accommodate learning-based 3D reconstruction are an open problem in machine learning and computer graphics.
Previous work on neural 3D reconstruction demonstrated benefits, but also limitations, of point cloud, voxel, surface mesh, and implicit function representations.
We introduce Deformable Tetrahedral Meshes (DefTet) as a particular parameterization that utilizes volumetric tetrahedral meshes for the reconstruction problem.
arXiv Detail & Related papers (2020-11-03T02:57:01Z)
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high process speed.
Our method improves on state-of-the-art real-time methods on the UCF101 action recognition benchmark by 5.4% in accuracy, runs 2 times faster at inference, and requires less than 5 MB of model storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.