Related papers: STR-GQN: Scene Representation and Rendering for Unknown Cameras Based on Spatial Transformation Routing

URL: http://arxiv.org/abs/2108.03072v1
Date: Fri, 6 Aug 2021 12:10:22 GMT
Title: STR-GQN: Scene Representation and Rendering for Unknown Cameras Based on Spatial Transformation Routing
Authors: Wen-Cheng Chen, Min-Chun Hu, Chu-Song Chen
Abstract summary: We propose a Spatial Transformation Routing (STR) mechanism to model the spatial properties without applying any geometric prior. STR treats the spatial transformation as the message passing process, and the relation between the view poses and the routing weights is modeled by an end-to-end trainable neural network.
Score: 18.954990006113114
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Geometry-aware modules are widely applied in recent deep learning architectures for scene representation and rendering. However, these modules require intrinsic camera information that might not be obtained accurately. In this paper, we propose a Spatial Transformation Routing (STR) mechanism to model the spatial properties without applying any geometric prior. The STR mechanism treats the spatial transformation as a message passing process, and the relation between the view poses and the routing weights is modeled by an end-to-end trainable neural network. In addition, an Occupancy Concept Mapping (OCM) framework is proposed to provide explainable rationales for scene-fusion processes. We conducted experiments on several datasets and show that the proposed STR mechanism improves the performance of the Generative Query Network (GQN). The visualization results reveal that the routing process can pass the observed information from one location of some view to the associated location in the other view, which demonstrates the advantage of the proposed model in terms of spatial cognition.
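
The routing idea in the abstract (pose-dependent weights that pass information from cells of one view to cells of another) can be illustrated with a toy sketch. This is not the authors' architecture: the single linear layer, the row-wise softmax, and every name below (routing_weights, route, params, the cell counts) are assumptions made only for illustration, and the random parameters stand in for weights that would be learned end to end.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def routing_weights(pose, params, n_src, n_tgt):
    """Map a pose vector to an (n_src x n_tgt) routing matrix.

    Hypothetical stand-in for the trainable network: one linear layer,
    logits[i][j] = sum_k pose[k] * params[k][i][j].
    Each row is softmax-normalised, so every source cell spreads its
    message over the target cells with weights that sum to 1.
    """
    rows = []
    for i in range(n_src):
        logits = [
            sum(p * params[k][i][j] for k, p in enumerate(pose))
            for j in range(n_tgt)
        ]
        rows.append(softmax(logits))
    return rows


def route(features, weights):
    """Message passing: target[j] = sum_i weights[i][j] * features[i]."""
    n_tgt = len(weights[0])
    n_ch = len(features[0])
    out = [[0.0] * n_ch for _ in range(n_tgt)]
    for i, feat in enumerate(features):
        for j in range(n_tgt):
            w = weights[i][j]
            for c in range(n_ch):
                out[j][c] += w * feat[c]
    return out


# Toy sizes (assumed): 3-D pose, 4 source cells, 5 target cells, 2 channels.
random.seed(0)
pose_dim, n_src, n_tgt, n_ch = 3, 4, 5, 2
params = [
    [[random.gauss(0, 1) for _ in range(n_tgt)] for _ in range(n_src)]
    for _ in range(pose_dim)
]
pose = [0.1, -0.4, 0.7]
features = [[float(i), 1.0] for i in range(n_src)]

W = routing_weights(pose, params, n_src, n_tgt)
routed = route(features, W)
```

Because each row of W sums to 1, the total feature mass is preserved when it is routed to the target cells; only its spatial placement changes with the pose. No camera intrinsics or projection geometry appear anywhere in this sketch, which is the property the abstract emphasizes.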

Related papers

Review of Feed-forward 3D Reconstruction: From DUSt3R to VGGT [10.984522161856955]
3D reconstruction is a cornerstone technology for numerous applications, including augmented/virtual reality, autonomous driving, and robotics. Deep learning has catalyzed a paradigm shift in 3D reconstruction. New models employ a unified deep network to jointly infer camera poses and dense geometry directly from an unconstrained set of images in a single forward pass.
arXiv Detail & Related papers (2025-07-11T09:41:54Z)
Spectral Architecture Search for Neural Network Models [0.0]
We present a novel architecture search protocol which exploits the spectral attributes of the inter-layer transfer matrices. We show that the newly proposed method yields a self-emerging architecture with a minimal degree of expressivity to handle the task under investigation.
arXiv Detail & Related papers (2025-04-01T15:14:30Z)
Interpretable deformable image registration: A geometric deep learning perspective [9.13809412085203]
We present a theoretical foundation for designing an interpretable registration framework. We formulate an end-to-end process that refines transformations in a coarse-to-fine fashion. We conclude by showing significant improvement in performance metrics over state-of-the-art approaches.
arXiv Detail & Related papers (2024-12-17T19:47:10Z)
The Cooperative Network Architecture: Learning Structured Networks as Representation of Sensory Patterns [3.9848584845601014]
We introduce the Cooperative Network Architecture (CNA), a model that represents sensory signals using structured, recurrently connected networks of neurons, termed "nets". We demonstrate that net fragments can be learned without supervision and flexibly recombined to encode novel patterns, enabling figure completion and resilience to noise.
arXiv Detail & Related papers (2024-07-08T06:22:10Z)
T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers [9.284740716447342]
The "black box" nature of neural networks is a barrier to adoption in applications where explainability is essential. This paper presents T-TAME, Transformer-compatible Trainable Attention Mechanism for Explanations. The proposed architecture and training technique can be easily applied to any convolutional or Vision Transformer-like neural network.
arXiv Detail & Related papers (2024-03-07T14:25:03Z)
The Multiscale Surface Vision Transformer [10.833580445244094]
We introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning. Results demonstrate that the MS-SiT outperforms existing surface deep learning methods for neonatal phenotyping prediction tasks.
arXiv Detail & Related papers (2023-03-21T15:00:17Z)
Learning Detail-Structure Alternative Optimization for Blind Super-Resolution [69.11604249813304]
We propose an effective and kernel-free network, namely DSSR, which enables recurrent detail-structure alternative optimization without blur kernel prior incorporation for blind SR. In our DSSR, a detail-structure modulation module (DSMM) is built to exploit the interaction and collaboration of image details and structures. Our method achieves state-of-the-art performance compared with existing methods.
arXiv Detail & Related papers (2022-12-03T14:44:17Z)
SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into the transformer for enhancing discriminative representation learning. The two proposed modules are lightweight and can be plugged into any transformer network and trained end-to-end easily. Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z)
Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS). The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture. It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
Combining Local and Global Pose Estimation for Precise Tracking of Similar Objects [2.861848675707602]
We present a multi-object 6D detection and tracking pipeline for potentially similar and non-textured objects. A new network architecture, trained solely with synthetic images, allows simultaneous pose estimation of multiple objects. We show how the system can be used in a real AR assistance application within the field of construction.
arXiv Detail & Related papers (2022-01-31T14:36:57Z)
Self-supervised Correlation Mining Network for Person Image Generation [9.505343361614928]
Person image generation aims to perform non-rigid deformation on source images. We propose a Self-supervised Correlation Mining Network (SCM-Net) to rearrange the source images in the feature space. For improving the fidelity of cross-scale pose transformation, we propose a graph based Body Structure Retaining Loss.
arXiv Detail & Related papers (2021-11-26T03:57:46Z)
Retrieval and Localization with Observation Constraints [12.010135672015704]
We propose an integrated visual re-localization method called RLOCS. It combines image retrieval, semantic consistency and geometry verification to achieve accurate estimations. Our method achieves many performance improvements on the challenging localization benchmarks.
arXiv Detail & Related papers (2021-08-19T06:14:33Z)
Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model Alignments [81.38641691636847]
We rethink the problem of scene reconstruction from an embodied agent's perspective. We reconstruct an interactive scene using an RGB-D data stream. This reconstructed scene replaces the object meshes in the dense panoptic map with part-based articulated CAD models.
arXiv Detail & Related papers (2021-03-30T05:56:58Z)
Unsupervised Discovery of Disentangled Manifolds in GANs [74.24771216154105]
An interpretable generation process is beneficial to various image editing applications. We propose a framework to discover interpretable directions in the latent space given arbitrary pre-trained generative adversarial networks.
arXiv Detail & Related papers (2020-11-24T02:18:08Z)
Visual Concept Reasoning Networks [93.99840807973546]
A split-transform-merge strategy has been broadly used as an architectural constraint in convolutional neural networks for visual recognition tasks. We propose to exploit this strategy and combine it with our Visual Concept Reasoning Networks (VCRNet) to enable reasoning between high-level visual concepts. Our proposed model, VCRNet, consistently improves the performance by increasing the number of parameters by less than 1%.
arXiv Detail & Related papers (2020-08-26T20:02:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.