STR-GQN: Scene Representation and Rendering for Unknown Cameras Based on
Spatial Transformation Routing
- URL: http://arxiv.org/abs/2108.03072v1
- Date: Fri, 6 Aug 2021 12:10:22 GMT
- Title: STR-GQN: Scene Representation and Rendering for Unknown Cameras Based on
Spatial Transformation Routing
- Authors: Wen-Cheng Chen, Min-Chun Hu, Chu-Song Chen
- Abstract summary: We propose a Spatial Transformation Routing (STR) mechanism to model the spatial properties without applying any geometric prior.
STR treats the spatial transformation as the message passing process, and the relation between the view poses and the routing weights is modeled by an end-to-end trainable neural network.
- Score: 18.954990006113114
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Geometry-aware modules are widely applied in recent deep learning
architectures for scene representation and rendering. However, these modules
require intrinsic camera information that might not be obtained accurately. In
this paper, we propose a Spatial Transformation Routing (STR) mechanism to
model the spatial properties without applying any geometric prior. The STR
mechanism treats the spatial transformation as the message passing process, and
the relation between the view poses and the routing weights is modeled by an
end-to-end trainable neural network. Besides, an Occupancy Concept Mapping
(OCM) framework is proposed to provide explainable rationales for scene-fusion
processes. We conduct experiments on several datasets and show that the
proposed STR mechanism improves the performance of the Generative Query Network
(GQN). The visualization results reveal that the routing process can pass the
observed information from one location of some view to the associated location
in the other view, which demonstrates the advantage of the proposed model in
terms of spatial cognition.
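The abstract describes STR only at a high level: routing weights depend on the view pose, and features are passed from view locations to associated locations elsewhere. The following is a minimal sketch of that idea, not the authors' implementation. The names `make_routing_net`, `routing_weights`, and `route` are hypothetical, and so are the cell counts, the single random linear layer standing in for the trained network, and the choice to normalize each view cell's weights with a softmax over world cells.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_routing_net(pose_dim, n_view, n_world, seed=0):
    """Random linear map standing in for a trained routing network (assumption)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(pose_dim)]
             for _ in range(n_world)]
            for _ in range(n_view)]


def routing_weights(net, pose):
    """For each view cell, compute a pose-conditioned distribution over world cells."""
    weights = []
    for per_view in net:
        logits = [sum(w * p for w, p in zip(row, pose)) for row in per_view]
        weights.append(softmax(logits))
    return weights  # n_view rows, each of length n_world


def route(view_features, weights):
    """Message passing: each view cell sends its feature to every world cell,
    scaled by the corresponding routing weight."""
    n_world = len(weights[0])
    channels = len(view_features[0])
    world = [[0.0] * channels for _ in range(n_world)]
    for feat, w_row in zip(view_features, weights):
        for m, w in enumerate(w_row):
            for c, value in enumerate(feat):
                world[m][c] += w * value
    return world


# Toy example: 4 view cells, 6 world cells, 2 feature channels, 3-D pose vector.
pose = [0.5, -1.0, 0.25]
net = make_routing_net(pose_dim=3, n_view=4, n_world=6)
weights = routing_weights(net, pose)
view_features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.5, 0.5]]
world_features = route(view_features, weights)
```

Because each view cell's weights sum to one in this sketch, the total feature mass per channel is preserved when moving from view cells to world cells. In a trained model the linear map would be replaced by a network learned end-to-end, as the abstract states.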
Related papers
- T-TAME: Trainable Attention Mechanism for Explaining Convolutional
Networks and Vision Transformers [9.284740716447342]
The "black box" nature of neural networks is a barrier to adoption in applications where explainability is essential.
This paper presents T-TAME, Transformer-compatible Trainable Attention Mechanism for Explanations.
The proposed architecture and training technique can be easily applied to any convolutional or Vision Transformer-like neural network.
arXiv Detail & Related papers (2024-03-07T14:25:03Z)
- The Multiscale Surface Vision Transformer [10.833580445244094]
We introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning.
Results demonstrate that the MS-SiT outperforms existing surface deep learning methods for neonatal phenotyping prediction tasks.
arXiv Detail & Related papers (2023-03-21T15:00:17Z)
- Learning Detail-Structure Alternative Optimization for Blind
Super-Resolution [69.11604249813304]
We propose an effective and kernel-free network, namely DSSR, which enables recurrent detail-structure alternative optimization without blur kernel prior incorporation for blind SR.
In our DSSR, a detail-structure modulation module (DSMM) is built to exploit the interaction and collaboration of image details and structures.
Our method achieves state-of-the-art performance compared with existing methods.
arXiv Detail & Related papers (2022-12-03T14:44:17Z)
- SIM-Trans: Structure Information Modeling Transformer for Fine-grained
Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into the transformer to enhance discriminative representation learning.
The two proposed modules are lightweight and can be plugged into any transformer network and trained end-to-end easily.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z)
- Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
- Combining Local and Global Pose Estimation for Precise Tracking of
Similar Objects [2.861848675707602]
We present a multi-object 6D detection and tracking pipeline for potentially similar and non-textured objects.
A new network architecture, trained solely with synthetic images, allows simultaneous pose estimation of multiple objects.
We show how the system can be used in a real AR assistance application within the field of construction.
arXiv Detail & Related papers (2022-01-31T14:36:57Z)
- Self-supervised Correlation Mining Network for Person Image Generation [9.505343361614928]
Person image generation aims to perform non-rigid deformation on source images.
We propose a Self-supervised Correlation Mining Network (SCM-Net) to rearrange the source images in the feature space.
To improve the fidelity of cross-scale pose transformation, we propose a graph-based Body Structure Retaining Loss.
arXiv Detail & Related papers (2021-11-26T03:57:46Z)
- Retrieval and Localization with Observation Constraints [12.010135672015704]
We propose an integrated visual re-localization method called RLOCS.
It combines image retrieval, semantic consistency and geometry verification to achieve accurate estimations.
Our method achieves performance improvements on challenging localization benchmarks.
arXiv Detail & Related papers (2021-08-19T06:14:33Z)
- Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model
Alignments [81.38641691636847]
We rethink the problem of scene reconstruction from an embodied agent's perspective.
We reconstruct an interactive scene using an RGB-D data stream.
This reconstructed scene replaces the object meshes in the dense panoptic map with part-based articulated CAD models.
arXiv Detail & Related papers (2021-03-30T05:56:58Z)
- Unsupervised Discovery of Disentangled Manifolds in GANs [74.24771216154105]
An interpretable generation process is beneficial to various image editing applications.
We propose a framework to discover interpretable directions in the latent space given arbitrary pre-trained generative adversarial networks.
arXiv Detail & Related papers (2020-11-24T02:18:08Z)
- Visual Concept Reasoning Networks [93.99840807973546]
A split-transform-merge strategy has been broadly used as an architectural constraint in convolutional neural networks for visual recognition tasks.
We propose to exploit this strategy and combine it with our Visual Concept Reasoning Networks (VCRNet) to enable reasoning between high-level visual concepts.
Our proposed model, VCRNet, consistently improves performance while increasing the number of parameters by less than 1%.
arXiv Detail & Related papers (2020-08-26T20:02:40Z)