T4DT: Tensorizing Time for Learning Temporal 3D Visual Data
- URL: http://arxiv.org/abs/2208.01421v1
- Date: Tue, 2 Aug 2022 12:57:08 GMT
- Title: T4DT: Tensorizing Time for Learning Temporal 3D Visual Data
- Authors: Mikhail Usvyatsov, Rafael Ballester-Ripoll, Lina Bashaeva, Konrad
Schindler, Gonzalo Ferrer, Ivan Oseledets
- Abstract summary: We show that low-rank tensor compression provides an extremely compact way to store and query time-varying signed distance functions.
Unlike existing iterative learning-based approaches like DeepSDF and NeRF, our method uses a closed-form algorithm with theoretical guarantees.
- Score: 19.418308324435916
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unlike 2D raster images, there is no single dominant representation for 3D
visual data processing. Different formats like point clouds, meshes, or
implicit functions each have their strengths and weaknesses. Still, grid
representations such as signed distance functions have attractive properties
also in 3D. In particular, they offer constant-time random access and are
eminently suitable for modern machine learning. Unfortunately, the storage size
of a grid grows exponentially with its dimension. Hence they often exceed
memory limits even at moderate resolution. This work explores various low-rank
tensor formats, including the Tucker, tensor train, and quantics tensor train
decompositions, to compress time-varying 3D data. Our method iteratively
computes, voxelizes, and compresses each frame's truncated signed distance
function and applies tensor rank truncation to condense all frames into a
single, compressed tensor that represents the entire 4D scene. We show that
low-rank tensor compression provides an extremely compact way to store and
query time-varying signed distance functions. It significantly reduces the memory
footprint of 4D scenes while surprisingly preserving their geometric quality.
Unlike existing iterative learning-based approaches like DeepSDF and NeRF, our
method uses a closed-form algorithm with theoretical guarantees.
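The pipeline described in the abstract, voxelizing each frame's TSDF and truncating tensor ranks to condense all frames into one compressed tensor, builds on the standard TT-SVD algorithm. Below is a minimal NumPy sketch of that generic procedure, assuming a dense input grid and a single maximum rank; the function names and rank-selection rule are illustrative, not the authors' implementation.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Compress an N-D array into tensor-train cores via sequential truncated SVDs."""
    dims = tensor.shape
    cores = []
    r_prev = 1
    mat = tensor.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(S))
        # core k holds the left singular vectors, folded to (r_prev, n_k, r)
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        # push the remaining factor S @ Vt on to the next mode
        mat = (S[:r, None] * Vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into the full array."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape([c.shape[1] for c in cores])
```

Storing the cores takes `sum(c.size for c in cores)` numbers instead of the full `tensor.size`; for smooth fields such as TSDF grids, modest ranks often suffice, which is where the compression comes from.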
Related papers
- Is 3D Convolution with 5D Tensors Really Necessary for Video Analysis? [4.817356884702073]
We present several novel techniques for implementing 3D convolutional blocks using 2D and/or 1D convolutions with only 4D and/or 3D tensors.
Our motivation is that 3D convolutions with 5D tensors are computationally expensive and they may not be supported by some of the edge devices used in real-time applications such as robots.
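As an illustration of the idea (a sketch under the simplifying assumption of a single-channel volume and a separable kernel, not the paper's techniques): whenever a 3D kernel factorizes into a 1D temporal and a 2D spatial part, the 3D convolution can be computed as a per-frame 2D pass followed by a per-pixel 1D temporal pass, so no 5D feature tensor is ever materialized.

```python
import numpy as np

def conv3d_valid(vol, kernel):
    """Naive 'valid' 3D correlation of a (T, H, W) volume with a (kt, kh, kw) kernel."""
    kt, kh, kw = kernel.shape
    T, H, W = vol.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[t, y, x] = np.sum(vol[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out

def conv2plus1d(vol, k_time, k_space):
    """Same result for a separable kernel: a 2D spatial pass, then a 1D temporal pass."""
    T, H, W = vol.shape
    kh, kw = k_space.shape
    spatial = np.zeros((T, H - kh + 1, W - kw + 1))
    for t in range(T):
        for y in range(spatial.shape[1]):
            for x in range(spatial.shape[2]):
                spatial[t, y, x] = np.sum(vol[t, y:y + kh, x:x + kw] * k_space)
    kt = len(k_time)
    out = np.zeros((T - kt + 1,) + spatial.shape[1:])
    for t in range(out.shape[0]):
        # weighted sum of kt consecutive spatially filtered frames
        out[t] = np.tensordot(k_time, spatial[t:t + kt], axes=(0, 0))
    return out
```

The two functions agree exactly whenever `kernel == k_time[:, None, None] * k_space[None, :, :]`; non-separable kernels can be approximated by a sum of a few such terms.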
arXiv Detail & Related papers (2024-07-23T14:30:51Z)
- Coarse-To-Fine Tensor Trains for Compact Visual Representations [19.216356079910533]
'Prolongation Upsampling Train' is a novel method for learning tensor train representations in a coarse-to-fine manner.
We evaluate our representation along three axes: (1) compression, (2) denoising capability, and (3) image completion capability.
arXiv Detail & Related papers (2024-06-06T17:59:23Z)
- 3D Compression Using Neural Fields [90.24458390334203]
We propose a novel NF-based compression algorithm for 3D data.
We demonstrate that our method excels at geometry compression on 3D point clouds as well as meshes.
It is straightforward to extend our compression algorithm to compress both the geometry and attributes (e.g., color) of 3D data.
arXiv Detail & Related papers (2023-11-21T21:36:09Z)
- TensorCodec: Compact Lossy Compression of Tensors without Strong Data Assumptions [22.937900567884796]
TENSORCODEC is a lossy compression algorithm for general tensors that do not necessarily adhere to strong input data assumptions.
Our analysis and experiments on 8 real-world datasets demonstrate that TENSORCODEC is concise: it gives up to 7.38x more compact compression than the best competitor with similar reconstruction error.
arXiv Detail & Related papers (2023-09-19T04:48:01Z)
- Smaller3d: Smaller Models for 3D Semantic Segmentation Using Minkowski Engine and Knowledge Distillation Methods [0.0]
This paper proposes the application of knowledge distillation techniques, especially for sparse tensors in 3D deep learning, to reduce model sizes while maintaining performance.
We analyze and propose different loss functions, including standard methods and combinations of various losses, to mimic the performance of state-of-the-art sparse convolutional networks.
arXiv Detail & Related papers (2023-05-04T22:19:25Z)
- Lightweight integration of 3D features to improve 2D image segmentation [1.3799488979862027]
We show that image segmentation can benefit from 3D geometric information without requiring 3D ground truth.
Our method can be applied to many 2D segmentation networks, significantly improving their performance.
arXiv Detail & Related papers (2022-12-16T08:22:55Z)
- Low-Rank Tensor Function Representation for Multi-Dimensional Data Recovery [52.21846313876592]
Low-rank tensor function representation (LRTFR) can continuously represent data beyond meshgrid with infinite resolution.
We develop two fundamental concepts for tensor functions, i.e., the tensor function rank and low-rank tensor function factorization.
Experiments substantiate the superiority and versatility of our method compared with state-of-the-art methods.
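The notion of a rank for functions rather than discrete tensors can be seen in a tiny hand-worked example (a sketch, unrelated to the LRTFR implementation): sin(x + y) = sin(x)cos(y) + cos(x)sin(y), so it has function rank 2 and can be evaluated at arbitrary continuous coordinates from two pairs of 1D factor functions, with no meshgrid needed.

```python
import numpy as np

# f(x, y) = sin(x + y) factorizes exactly into two separable terms:
# sin(x + y) = sin(x) * cos(y) + cos(x) * sin(y), i.e. function rank 2.
factors_x = [np.sin, np.cos]  # u_r(x)
factors_y = [np.cos, np.sin]  # v_r(y)

def low_rank_eval(x, y):
    """Evaluate f at arbitrary continuous coordinates from its 1D factor functions."""
    return sum(u(x) * v(y) for u, v in zip(factors_x, factors_y))
```

Because the factors are functions rather than stored vectors, the representation has no fixed resolution: any point, on or off a grid, is a valid query.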
arXiv Detail & Related papers (2022-12-01T04:00:38Z)
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
- Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
arXiv Detail & Related papers (2020-12-01T23:58:16Z)
- Learning Deformable Tetrahedral Meshes for 3D Reconstruction [78.0514377738632]
3D shape representations that accommodate learning-based 3D reconstruction are an open problem in machine learning and computer graphics.
Previous work on neural 3D reconstruction demonstrated benefits, but also limitations, of point cloud, voxel, surface mesh, and implicit function representations.
We introduce Deformable Tetrahedral Meshes (DefTet) as a particular parameterization that utilizes volumetric tetrahedral meshes for the reconstruction problem.
arXiv Detail & Related papers (2020-11-03T02:57:01Z)
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high process speed.
Our method improves on state-of-the-art real-time methods on the UCF101 action recognition benchmark by 5.4% in accuracy, runs 2 times faster at inference, and requires less than 5 MB of model storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.