Equivariant Light Field Convolution and Transformer
- URL: http://arxiv.org/abs/2212.14871v2
- Date: Wed, 7 Jun 2023 18:00:48 GMT
- Title: Equivariant Light Field Convolution and Transformer
- Authors: Yinshuang Xu, Jiahui Lei, Kostas Daniilidis
- Abstract summary: Deep learning of geometric priors from 2D images often requires each image to be represented in a $2D$ canonical frame.
We show how to learn priors from multiple views equivariant to coordinate frame transformations by proposing an $SE(3)$-equivariant convolution and transformer in the space of rays in 3D.
- Score: 40.840098156362316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D reconstruction and novel view rendering can greatly benefit from geometric
priors when the input views are not sufficient in terms of coverage and
inter-view baselines. Deep learning of geometric priors from 2D images often
requires each image to be represented in a $2D$ canonical frame and the prior
to be learned in a given or learned $3D$ canonical frame. In this paper, given
only the relative poses of the cameras, we show how to learn priors from
multiple views equivariant to coordinate frame transformations by proposing an
$SE(3)$-equivariant convolution and transformer in the space of rays in 3D.
This enables the creation of a light field that remains equivariant to the
choice of coordinate frame. The light field, as defined in our work, refers to
both the radiance field and the feature field defined on the ray space. We model
the ray space, the domain of the light field, as a homogeneous space of $SE(3)$
and introduce the $SE(3)$-equivariant convolution in ray space. Depending on
the output domain of the convolution, we present convolution-based
$SE(3)$-equivariant maps from ray space to ray space and to $\mathbb{R}^3$. Our
mathematical framework allows us to go beyond convolution to
$SE(3)$-equivariant attention in the ray space. We demonstrate how to tailor
and adapt the equivariant convolution and transformer in the tasks of
equivariant neural rendering and $3D$ reconstruction from multiple views. We
demonstrate $SE(3)$-equivariance by obtaining robust results in roto-translated
datasets without performing transformation augmentation.
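To make the equivariance requirement concrete, the minimal sketch below uses the standard Plücker parameterization of rays and checks numerically that a pairwise quantity on rays (the reciprocal product) is unchanged when both rays are moved by the same rigid motion. This is only an illustration of the kind of frame-independent quantities an $SE(3)$-equivariant kernel on ray space can be built from; it is not the paper's implementation, and the function names and the choice of invariant are ours.
```python
import numpy as np

def random_rotation():
    """Sample a random 3x3 rotation matrix (det = +1) via QR decomposition."""
    q, _ = np.linalg.qr(np.random.randn(3, 3))
    return q * np.sign(np.linalg.det(q))

def ray_plucker(point, direction):
    """Plücker coordinates (d, m) of the ray through `point` with direction `direction`."""
    d = direction / np.linalg.norm(direction)
    return d, np.cross(point, d)

def transform_ray(R, t, d, m):
    """Action of the rigid motion (R, t) on a ray given in Plücker coordinates."""
    return R @ d, R @ m + np.cross(t, R @ d)

def reciprocal_product(d1, m1, d2, m2):
    """Pairwise quantity d1·m2 + d2·m1; unchanged when both rays undergo the same rigid motion."""
    return d1 @ m2 + d2 @ m1

# Two arbitrary rays and a random change of world frame g = (R, t).
d1, m1 = ray_plucker(np.array([0.3, -1.0, 2.0]), np.array([0.0, 0.5, 1.0]))
d2, m2 = ray_plucker(np.array([1.0, 0.2, -0.5]), np.array([1.0, -1.0, 0.3]))
R, t = random_rotation(), np.random.randn(3)

before = reciprocal_product(d1, m1, d2, m2)
after = reciprocal_product(*transform_ray(R, t, d1, m1), *transform_ray(R, t, d2, m2))
assert np.allclose(before, after)  # the quantity does not depend on the coordinate frame
```
A kernel or attention weight built from such frame-independent quantities yields features that transform consistently with the chosen world frame, which is the property the abstract describes for roto-translated inputs.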
Related papers
- $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning [50.80418813055225]
$π^3$ is a feed-forward neural network that offers a novel approach to visual geometry reconstruction. $π^3$ employs a fully permutation-equivariant architecture to predict affine-invariant camera poses and scale-invariant local point maps.
arXiv Detail & Related papers (2025-07-17T17:59:53Z) - You Need a Transition Plane: Bridging Continuous Panoramic 3D Reconstruction with Perspective Gaussian Splatting [57.44295803750027]
We present a novel framework, named TPGS, to bridge continuous panoramic 3D scene reconstruction with perspective Gaussian splatting.
Specifically, we optimize 3D Gaussians within individual cube faces and then fine-tune them in the stitched panoramic space.
Experiments on indoor and outdoor, egocentric, and roaming benchmark datasets demonstrate that our approach outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2025-04-12T03:42:50Z) - ODGS: 3D Scene Reconstruction from Omnidirectional Images with 3D Gaussian Splattings [48.72040500647568]
We present ODGS, a novel rasterization pipeline for omnidirectional images with a geometric interpretation.
The entire pipeline is parallelized, achieving optimization and rendering speeds 100 times faster than NeRF-based methods.
Results show ODGS restores fine details effectively, even when reconstructing large 3D scenes.
arXiv Detail & Related papers (2024-10-28T02:45:13Z) - Learning Naturally Aggregated Appearance for Efficient 3D Editing [94.47518916521065]
We propose to replace the color field with an explicit 2D appearance aggregation, also called canonical image.
To avoid the distortion effect and facilitate convenient editing, we complement the canonical image with a projection field that maps 3D points onto 2D pixels for texture lookup.
Our representation, dubbed AGAP, well supports various ways of 3D editing (e.g., stylization, interactive drawing, and content extraction) with no need of re-optimization.
arXiv Detail & Related papers (2023-12-11T18:59:31Z) - MoDA: Modeling Deformable 3D Objects from Casual Videos [84.29654142118018]
We propose neural dual quaternion blend skinning (NeuDBS) to achieve 3D point deformation without skin-collapsing artifacts.
In the endeavor to register 2D pixels across different frames, we establish a correspondence between canonical feature embeddings that encodes 3D points within the canonical space.
Our approach can reconstruct 3D models for humans and animals with better qualitative and quantitative performance than state-of-the-art methods.
arXiv Detail & Related papers (2023-04-17T13:49:04Z) - Equivalence Between SE(3) Equivariant Networks via Steerable Kernels and
Group Convolution [90.67482899242093]
A wide range of techniques have been proposed in recent years for designing neural networks for 3D data that are equivariant under rotation and translation of the input.
We provide an in-depth analysis of both methods and their equivalence and relate the two constructions to multiview convolutional networks.
We also derive new TFN non-linearities from our equivalence principle and test them on practical benchmark datasets.
arXiv Detail & Related papers (2022-11-29T03:42:11Z) - EpiGRAF: Rethinking training of 3D GANs [60.38818140637367]
We show that it is possible to obtain a high-resolution 3D generator with SotA image quality by following a completely different route of simply training the model patch-wise.
The resulting model, named EpiGRAF, is an efficient, high-resolution, pure 3D generator.
arXiv Detail & Related papers (2022-06-21T17:08:23Z) - Rotation Equivariant 3D Hand Mesh Generation from a Single RGB Image [1.8692254863855962]
We develop a rotation equivariant model for generating 3D hand meshes from 2D RGB images.
This guarantees that as the input image of a hand is rotated the generated mesh undergoes a corresponding rotation.
arXiv Detail & Related papers (2021-11-25T11:07:27Z) - i3dLoc: Image-to-range Cross-domain Localization Robust to Inconsistent
Environmental Conditions [9.982307144353713]
We present a method for localizing a single camera with respect to a point cloud map in indoor and outdoor scenes.
Our method can match equirectangular images to the 3D range projections by extracting cross-domain symmetric place descriptors.
With a single trained model, i3dLoc can demonstrate reliable visual localization in random conditions.
arXiv Detail & Related papers (2021-05-27T00:13:11Z) - Equivariant Point Network for 3D Point Cloud Analysis [17.689949017410836]
We propose an effective and practical SE(3) (3D translation and rotation) equivariant network for point cloud analysis.
First, we present SE(3) separable point convolution, a novel framework that breaks down the 6D convolution into two separable convolutional operators.
Second, we introduce an attention layer to effectively harness the expressiveness of the equivariant features.
arXiv Detail & Related papers (2021-03-25T21:57:10Z) - Rotation-Invariant Autoencoders for Signals on Spheres [10.406659081400354]
We study the problem of unsupervised learning of rotation-invariant representations for spherical images.
In particular, we design an autoencoder architecture consisting of $S^2$ and $SO(3)$ convolutional layers.
Experiments on multiple datasets demonstrate the usefulness of the learned representations on clustering, retrieval and classification applications.
arXiv Detail & Related papers (2020-12-08T15:15:03Z) - Generalizing Spatial Transformers to Projective Geometry with
Applications to 2D/3D Registration [11.219924013808852]
Differentiable rendering is a technique to connect 3D scenes with corresponding 2D images.
We propose a novel Projective Spatial Transformer module that generalizes spatial transformers to projective geometry.
arXiv Detail & Related papers (2020-03-24T17:26:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.