RePAST: Relative Pose Attention Scene Representation Transformer
- URL: http://arxiv.org/abs/2304.00947v2
- Date: Mon, 10 Apr 2023 13:11:13 GMT
- Title: RePAST: Relative Pose Attention Scene Representation Transformer
- Authors: Aleksandr Safin, Daniel Duckworth, Mehdi S. M. Sajjadi
- Abstract summary: Scene Representation Transformer (SRT) is a recent method to render novel views at interactive rates.
We propose Relative Pose Attention SRT (RePAST): Instead of fixing a reference frame at the input, we inject pairwise relative camera pose information directly into the attention mechanism of the Transformers.
- Score: 78.33038881681018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Scene Representation Transformer (SRT) is a recent method to render novel
views at interactive rates. Since SRT uses camera poses with respect to an
arbitrarily chosen reference camera, it is not invariant to the order of the
input views. As a result, SRT is not directly applicable to large-scale scenes
where the reference frame would need to be changed regularly. In this work, we
propose Relative Pose Attention SRT (RePAST): Instead of fixing a reference
frame at the input, we inject pairwise relative camera pose information
directly into the attention mechanism of the Transformers. This leads to a
model that is by definition invariant to the choice of any global reference
frame, while still retaining the full capabilities of the original method.
Empirical results show that adding this invariance to the model does not lead
to a loss in quality. We believe that this is a step towards applying fully
latent transformer-based rendering methods to large-scale scenes.
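The mechanism lends itself to a compact illustration. Below is a minimal NumPy sketch of pairwise relative-pose attention in the spirit of the abstract, not the paper's actual architecture: the flattened-4x4 pose encoding, the projection Wp, and injecting the pose feature into the keys are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relative_pose_attention(tokens, poses, Wq, Wk, Wv, Wp):
    """tokens: (V, N, D) per-view tokens; poses: (V, 4, 4) camera-to-world."""
    V, N, D = tokens.shape
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    out = np.zeros_like(tokens)
    for i in range(V):                                   # query view
        scores, vals = [], []
        for j in range(V):                               # key/value view
            # Pairwise relative pose: depends only on views i and j,
            # never on any globally chosen reference frame.
            rel = np.linalg.inv(poses[i]) @ poses[j]     # (4, 4)
            pose_feat = rel.reshape(-1) @ Wp             # (D,) hypothetical encoding
            scores.append(q[i] @ (k[j] + pose_feat).T / np.sqrt(D))
            vals.append(v[j])
        attn = softmax(np.concatenate(scores, axis=-1))  # (N, V*N)
        out[i] = attn @ np.concatenate(vals, axis=0)     # (N, D)
    return out

# Replacing every pose T with G @ T for a rigid G leaves
# inv(G @ Ti) @ (G @ Tj) = inv(Ti) @ Tj unchanged, so the layer is
# invariant to the choice of global reference frame by construction.
rng = np.random.default_rng(0)
V, N, D = 3, 5, 16
tokens = rng.normal(size=(V, N, D))
poses = np.stack([np.eye(4)] * V) + 0.01 * rng.normal(size=(V, 4, 4))
Wq, Wk, Wv = [rng.normal(size=(D, D)) for _ in range(3)]
Wp = rng.normal(size=(16, D))                            # 16 = flattened 4x4
out = relative_pose_attention(tokens, poses, Wq, Wk, Wv, Wp)
```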
Related papers
- Pose-Free Generalizable Rendering Transformer [72.47072706742065]
PF-GRT is a Pose-Free framework for Generalizable Rendering Transformer.
PF-GRT is parameterized using a local relative coordinate system.
Experiments with zero-shot rendering across datasets show that it produces photo-realistic images of superior quality.
arXiv Detail & Related papers (2023-10-05T17:24:36Z)
- CNN Injected Transformer for Image Exposure Correction [20.282217209520006]
Previous exposure correction methods based on convolutions often produce exposure deviation in images.
We propose a CNN Injected Transformer (CIT) to harness the individual strengths of CNN and Transformer simultaneously.
In addition to the hybrid architecture design for exposure correction, we apply a set of carefully formulated loss functions to improve the spatial coherence and rectify potential color deviations.
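As a loose sketch of what such a hybrid block could look like (the branch layout and fusion-by-addition are my assumptions, not the CIT design): a local convolution branch and a global self-attention branch share a residual stream.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hybrid_block(x, Wq, Wk, Wv, conv_kernel):
    """x: (N, D) tokens from a flattened feature map; conv_kernel: odd-length 1-D taps."""
    N, D = x.shape
    # Global branch: self-attention captures image-wide exposure statistics.
    attn = softmax((x @ Wq) @ (x @ Wk).T / np.sqrt(D))
    global_out = attn @ (x @ Wv)
    # Local branch: a 1-D convolution along the token axis models local texture.
    k = len(conv_kernel)
    xp = np.pad(x, ((k // 2, k // 2), (0, 0)), mode="edge")
    local_out = sum(conv_kernel[t] * xp[t:t + N] for t in range(k))
    return x + global_out + local_out  # residual fusion of both branches
```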
arXiv Detail & Related papers (2023-09-08T14:53:00Z)
- Coarse-to-Fine Multi-Scene Pose Regression with Transformers [19.927662512903915]
A convolutional backbone with a multi-layer perceptron (MLP) head is trained using images and pose labels to embed a single reference scene at a time.
We propose to learn multi-scene absolute camera pose regression with Transformers, where encoders are used to aggregate activation maps with self-attention.
Our method is evaluated on commonly benchmarked indoor and outdoor datasets and has been shown to exceed both multi-scene and state-of-the-art single-scene absolute pose regressors.
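One plausible reading of this design, sketched below under stated assumptions: learned per-scene queries aggregate the backbone's activation tokens with attention, and a head regresses position plus an orientation quaternion from the selected scene's latent. The query-based scene selection and the 7-D head are illustrative, not verified against the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def regress_pose(feat_tokens, scene_queries, scene_id, W_head):
    """feat_tokens: (N, D) backbone activations; scene_queries: (S, D) learned."""
    D = feat_tokens.shape[1]
    attn = softmax(scene_queries @ feat_tokens.T / np.sqrt(D))  # (S, N)
    latents = attn @ feat_tokens                                # one latent per scene
    out = latents[scene_id] @ W_head                            # (7,) pose vector
    t, q = out[:3], out[3:]
    return t, q / np.linalg.norm(q)   # position + unit quaternion
```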
arXiv Detail & Related papers (2023-08-22T20:43:31Z)
- Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation [59.91357714415056]
We propose two Transformer variants: Context-Sharing Transformer (CST) and Semantic Gathering-Scattering Transformer (SGST).
CST learns the global-shared contextual information within image frames with a lightweight computation; SGST models the semantic correlation separately for the foreground and background.
Compared with the baseline that uses vanilla Transformers for multi-stage fusion, ours significantly increases the speed by 13 times and achieves new state-of-the-art ZVOS performance.
arXiv Detail & Related papers (2023-08-13T06:12:00Z)
- RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images.
Existing methods invert video frames individually, often leading to undesired inconsistent results over time.
We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID).
Our framework learns the inherent coherence between input frames in an end-to-end manner.
arXiv Detail & Related papers (2023-08-11T12:17:24Z)
- Learning to Localize in Unseen Scenes with Relative Pose Regressors [5.672132510411465]
Relative pose regressors (RPRs) localize a camera by estimating its relative translation and rotation to a pose-labelled reference.
In practice, however, the performance of RPRs is significantly degraded in unseen scenes.
We implement aggregation with concatenation, projection, and attention operations (Transformers) and learn to regress the relative pose parameters from the resulting latent codes.
Compared to state-of-the-art RPRs, our model is shown to localize significantly better in unseen environments, across both indoor and outdoor benchmarks, while maintaining competitive performance in seen scenes.
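Read literally, that pipeline could look like the following sketch (the mean pooling and 7-D head are illustrative assumptions): query-image tokens attend over reference-image tokens, the aggregated features are pooled into a latent code, and a linear head regresses the relative pose.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relative_pose(query_tokens, ref_tokens, Wq, Wk, Wv, W_head):
    """query_tokens: (N, D), ref_tokens: (M, D) -> relative (translation, rotation)."""
    D = query_tokens.shape[1]
    # Cross-attention aggregates reference features for each query token.
    attn = softmax((query_tokens @ Wq) @ (ref_tokens @ Wk).T / np.sqrt(D))
    latent = (attn @ (ref_tokens @ Wv)).mean(axis=0)  # pool to one latent code
    out = latent @ W_head                             # (7,): translation + quaternion
    return out[:3], out[3:] / np.linalg.norm(out[3:])
```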
arXiv Detail & Related papers (2023-03-05T17:12:50Z)
- Overparameterization Improves StyleGAN Inversion [66.8300251627992]
Existing inversion approaches obtain promising yet imperfect results.
We show that overparameterizing the latent space allows us to obtain near-perfect image reconstruction without the need for encoders.
Our approach also retains editability, which we demonstrate by realistically interpolating between images.
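The encoder-free claim can be demystified with a toy: invert by direct gradient descent on an overparameterized latent, one code per layer in the spirit of W+. The linear toy generator and analytic gradient below stand in for StyleGAN and autodiff; none of this is the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
L, D, P = 4, 8, 64                      # layers, latent dim, "pixels"
A = rng.normal(size=(L, D, P))          # fixed per-layer toy generator maps

def generate(w_plus):                   # w_plus: (L, D), one code per layer
    return sum(w_plus[l] @ A[l] for l in range(L))        # (P,) toy image

target = generate(rng.normal(size=(L, D)))                # image to invert
w = np.zeros((L, D))
for step in range(2000):                # plain gradient descent, no encoder
    residual = generate(w) - target
    grad = np.stack([A[l] @ residual for l in range(L)])  # d(0.5*||r||^2)/dw
    w -= 1e-3 * grad
print("reconstruction error:", np.linalg.norm(generate(w) - target))
```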
arXiv Detail & Related papers (2022-05-12T18:42:43Z)
- TransCamP: Graph Transformer for 6-DoF Camera Pose Estimation [77.09542018140823]
We propose a neural network approach with a graph transformer backbone, namely TransCamP, to address the camera relocalization problem.
TransCamP effectively fuses the image features, camera pose information and inter-frame relative camera motions into encoded graph attributes.
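A minimal sketch of that fusion, assuming (hypothetically) that encoded relative motions enter as edge attributes added to the attention keys of a frame graph:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention(node_feats, edge_feats, Wq, Wk, Wv, We):
    """node_feats: (F, D) per-frame image features;
    edge_feats: (F, F, E) encoded inter-frame relative camera motions."""
    F, D = node_feats.shape
    q = node_feats @ Wq
    k = node_feats @ Wk + edge_feats @ We    # (F, F, D): keys vary per edge
    v = node_feats @ Wv
    scores = np.einsum("id,ijd->ij", q, k) / np.sqrt(D)
    return softmax(scores) @ v               # (F, D) fused node embeddings
```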
arXiv Detail & Related papers (2021-05-28T19:08:43Z)