HyperPose: Hypernetwork-Infused Camera Pose Localization and an Extended Cambridge Landmarks Dataset
- URL: http://arxiv.org/abs/2303.02610v2
- Date: Tue, 04 Mar 2025 19:46:58 GMT
- Title: HyperPose: Hypernetwork-Infused Camera Pose Localization and an Extended Cambridge Landmarks Dataset
- Authors: Ron Ferens, Yosi Keller
- Abstract summary: We propose HyperPose, which utilizes hyper-networks in absolute camera pose regressors. The inherent appearance variations in natural scenes, attributable to environmental conditions, perspective, and lighting, induce a significant domain disparity between the training and test datasets. During inference, the hypernetwork dynamically computes adaptive weights for the localization regression heads based on the particular input image.
- Score: 15.055091971627832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose HyperPose, which utilizes hyper-networks in absolute camera pose regressors. The inherent appearance variations in natural scenes, attributable to environmental conditions, perspective, and lighting, induce a significant domain disparity between the training and test datasets. This disparity degrades the precision of contemporary localization networks. To mitigate this, we advocate for incorporating hypernetworks into single-scene and multi-scene camera pose regression models. During inference, the hypernetwork dynamically computes adaptive weights for the localization regression heads based on the particular input image, effectively narrowing the domain gap. Using indoor and outdoor datasets, we evaluate the HyperPose methodology across multiple established absolute pose regression architectures. We also introduce and share the Extended Cambridge Landmarks (ECL) dataset, a novel localization dataset derived from the Cambridge Landmarks dataset that depicts its scenes across multiple seasons with significantly varying appearance conditions. Our empirical experiments demonstrate that HyperPose yields notable performance enhancements for single- and multi-scene architectures. We have made our source code, pre-trained models, and the ECL dataset openly available.
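To make the mechanism described in the abstract concrete, the following is a minimal, PyTorch-style sketch of a hypernetwork that maps an image's backbone embedding to the weights of a pose-regression head, so that the head's effective weights differ per input image. It illustrates the general idea only and is not the HyperPose reference implementation; the module names, layer sizes, and the 3D-position-plus-quaternion output format are assumptions.

```python
# Minimal sketch of hypernetwork-infused absolute pose regression (illustrative only;
# names, dimensions, and head structure are assumptions, not the HyperPose code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class HyperPoseHead(nn.Module):
    """A hypernetwork predicts the pose-regression head's weights per input image."""

    def __init__(self, feat_dim=2048, hidden_dim=256, out_dim=7):  # 3D position + 4D quaternion
        super().__init__()
        self.hidden_dim, self.out_dim = hidden_dim, out_dim
        # Hypernetwork: maps the image embedding to the head's weight matrix and bias.
        n_params = hidden_dim * out_dim + out_dim
        self.hypernet = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, n_params),
        )
        # Shared projection from backbone features to the head's input space.
        self.proj = nn.Linear(feat_dim, hidden_dim)

    def forward(self, feat):                      # feat: (B, feat_dim) backbone embedding
        params = self.hypernet(feat)              # (B, hidden_dim * out_dim + out_dim)
        w = params[:, : self.hidden_dim * self.out_dim].view(-1, self.out_dim, self.hidden_dim)
        b = params[:, self.hidden_dim * self.out_dim :]           # (B, out_dim)
        h = F.relu(self.proj(feat))                               # (B, hidden_dim)
        pose = torch.bmm(w, h.unsqueeze(-1)).squeeze(-1) + b      # per-image linear head
        x, q = pose[:, :3], F.normalize(pose[:, 3:], dim=-1)      # position, unit quaternion
        return x, q
```

In a full model, `feat` would come from the CNN or Transformer backbone of a standard absolute pose regressor; the per-image head weights computed at inference time are what the abstract describes as narrowing the domain gap.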
Related papers
- FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video [52.33896173943054]
Egocentric motion capture with a head-mounted body-facing stereo camera is crucial for VR and AR applications.
Existing methods rely on synthetic pretraining and struggle to generate smooth and accurate predictions in real-world settings.
We propose FRAME, a simple yet effective architecture that combines device pose and camera feeds for state-of-the-art body pose prediction.
arXiv Detail & Related papers (2025-03-29T14:26:06Z) - Redundancy-Aware Camera Selection for Indoor Scene Neural Rendering [54.468355408388675]
We build a similarity matrix that incorporates both the spatial diversity of the cameras and the semantic variation of the images.
We apply a diversity-based sampling algorithm to optimize the camera selection.
We also develop a new dataset, IndoorTraj, which includes long and complex camera movements captured by humans in virtual indoor environments.
arXiv Detail & Related papers (2024-09-11T08:36:49Z) - Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation [74.65906322148997]
We introduce a new object detection method that integrates hypergraph computations to capture the complex high-order correlations among visual features.
Hyper-YOLO significantly outperforms the advanced YOLOv8-N and YOLOv9-T, with 12% and 9% $\mathrm{AP}^{val}$ improvements, respectively.
arXiv Detail & Related papers (2024-08-09T01:21:15Z) - ReGround: Improving Textual and Spatial Grounding at No Cost [12.944046673902415]
Spatial grounding often outweighs textual grounding due to the sequential flow from gated self-attention to cross-attention.
We demonstrate that such bias can be significantly mitigated without sacrificing accuracy in either grounding by simply rewiring the network architecture.
arXiv Detail & Related papers (2024-03-20T13:37:29Z) - Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z) - Coarse-to-Fine Multi-Scene Pose Regression with Transformers [19.927662512903915]
A convolutional backbone with a multi-layer perceptron (MLP) head is trained using images and pose labels to embed a single reference scene at a time.
We propose to learn multi-scene absolute camera pose regression with Transformers, where encoders are used to aggregate activation maps with self-attention.
Our method is evaluated on commonly benchmarked indoor and outdoor datasets and has been shown to exceed both multi-scene and state-of-the-art single-scene absolute pose regressors.
arXiv Detail & Related papers (2023-08-22T20:43:31Z) - Bilevel Fast Scene Adaptation for Low-Light Image Enhancement [50.639332885989255]
Enhancing images captured in low-light scenes is a challenging but widely studied task in computer vision.
The main obstacle lies in the modeling conundrum arising from the distribution discrepancy across different scenes.
We introduce the bilevel paradigm to model the above latent correspondence.
A bilevel learning framework is constructed to endow the encoder with scene-irrelevant generality across diverse scenes.
arXiv Detail & Related papers (2023-06-02T08:16:21Z) - Alignment-free HDR Deghosting with Semantics Consistent Transformer [76.91669741684173]
High dynamic range imaging aims to retrieve information from multiple low-dynamic range inputs to generate realistic output.
Existing methods often focus on the spatial misalignment across input frames caused by the foreground and/or camera motion.
We propose a novel alignment-free network, the Semantics Consistent Transformer (SCTNet), which combines both spatial and channel attention modules.
arXiv Detail & Related papers (2023-05-29T15:03:23Z) - HyperE2VID: Improving Event-Based Video Reconstruction via Hypernetworks [16.432164340779266]
We propose HyperE2VID, a dynamic neural network architecture for event-based video reconstruction.
Our approach uses hypernetworks to generate per-pixel adaptive filters guided by a context fusion module; a generic sketch of this per-pixel filtering idea is shown after this list.
arXiv Detail & Related papers (2023-05-10T18:00:06Z) - Magnitude Invariant Parametrizations Improve Hypernetwork Learning [0.0]
Hypernetworks are powerful neural networks that predict the parameters of another neural network.
Training them typically converges far more slowly than training non-hypernetwork models.
We identify a fundamental and previously unidentified problem that contributes to the challenge of training hypernetworks.
We present a simple solution to this problem using a revised hypernetwork formulation that we call Magnitude Invariant Parametrizations (MIP).
arXiv Detail & Related papers (2023-04-15T22:18:29Z) - DLGSANet: Lightweight Dynamic Local and Global Self-Attention Networks for Image Super-Resolution [83.47467223117361]
We propose an effective lightweight dynamic local and global self-attention network (DLGSANet) to solve image super-resolution.
Motivated by the network designs of Transformers, we develop a simple yet effective multi-head dynamic local self-attention (MHDLSA) module to extract local features efficiently.
Because not all token similarities are useful for reconstruction, we develop a sparse global self-attention (SparseGSA) module to select the most useful similarity values.
arXiv Detail & Related papers (2023-01-05T12:06:47Z) - A Lightweight Domain Adaptive Absolute Pose Regressor Using Barlow Twins Objective [0.6193838300896449]
In this paper, a domain adaptive training framework for absolute pose regression is introduced.
In the proposed framework, the scene image is augmented for different domains by using generative methods to train parallel branches.
The results demonstrate that, even while using roughly 24 times fewer FLOPs, 12 times fewer activations, and 5 times fewer parameters than MS-Transformer, our approach outperforms all the CNN-based architectures.
arXiv Detail & Related papers (2022-11-20T12:18:53Z) - HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing [2.362412515574206]
HyperStyle learns to modulate StyleGAN's weights to faithfully express a given image in editable regions of the latent space.
HyperStyle yields reconstructions comparable to those of optimization techniques with the near real-time inference capabilities of encoders.
arXiv Detail & Related papers (2021-11-30T18:56:30Z) - Global and Local Alignment Networks for Unpaired Image-to-Image Translation [170.08142745705575]
The goal of unpaired image-to-image translation is to produce an output image reflecting the target domain's style.
Because existing methods pay little attention to content changes, semantic information from the source images degrades during translation.
We introduce a novel approach, Global and Local Alignment Networks (GLA-Net).
Our method effectively generates sharper and more realistic images than existing approaches.
arXiv Detail & Related papers (2021-11-19T18:01:54Z) - LENS: Localization enhanced by NeRF synthesis [3.4386226615580107]
We demonstrate improved camera pose regression thanks to an additional synthetic dataset rendered by the NeRF class of algorithms.
We further improve the localization accuracy of pose regressors by using synthesized, realistic, and geometry-consistent images as data augmentation during training.
arXiv Detail & Related papers (2021-10-13T08:15:08Z) - Domain-invariant Similarity Activation Map Contrastive Learning for Retrieval-based Long-term Visual Localization [30.203072945001136]
In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation.
A novel gradient-weighted similarity activation mapping loss (Grad-SAM) is then incorporated for finer localization with high accuracy.
Extensive experiments have been conducted to validate the effectiveness of the proposed approach on the CMU-Seasons dataset.
Our method performs on par with, or even outperforms, the state-of-the-art image-based localization baselines in the medium- and high-precision regimes.
arXiv Detail & Related papers (2020-09-16T14:43:22Z) - 6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference [67.70859730448473]
We present a multimodal camera relocalization framework that captures ambiguities and uncertainties.
We predict multiple camera pose hypotheses as well as the respective uncertainty for each prediction.
We introduce a new dataset specifically designed to foster camera localization research in ambiguous environments.
arXiv Detail & Related papers (2020-04-09T20:55:06Z) - Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently-used VGG feature-matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z) - Molecule Property Prediction and Classification with Graph Hypernetworks [113.38181979662288]
We show that the replacement of the underlying networks with hypernetworks leads to a boost in performance.
A major difficulty in the application of hypernetworks is their lack of stability.
A recent work has tackled the training instability of hypernetworks in the context of error correcting codes.
arXiv Detail & Related papers (2020-02-01T16:44:34Z)
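As referenced in the HyperE2VID entry above, hypernetworks can also be used to generate per-pixel adaptive filters. The sketch below illustrates that generic idea only, under assumed names, shapes, and a plain convolutional hypernetwork; it is not the HyperE2VID architecture, and the context fusion module and event-based decoder are omitted.

```python
# Generic sketch of hypernetwork-predicted per-pixel filters (illustrative only;
# all names and shapes are assumptions, not the HyperE2VID implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class PerPixelDynamicConv(nn.Module):
    """Applies a KxK filter predicted for each output pixel by a small hypernetwork."""

    def __init__(self, channels=32, ctx_channels=32, k=3):
        super().__init__()
        self.k = k
        # Hypernetwork: predicts one KxK kernel per spatial location from a context tensor.
        self.hypernet = nn.Conv2d(ctx_channels, k * k, kernel_size=3, padding=1)

    def forward(self, x, ctx):                    # x: (B, C, H, W), ctx: (B, Cc, H, W)
        b, c, h, w = x.shape
        kernels = self.hypernet(ctx)              # (B, K*K, H, W)
        kernels = F.softmax(kernels, dim=1)       # normalize each per-pixel filter
        patches = F.unfold(x, self.k, padding=self.k // 2)       # (B, C*K*K, H*W)
        patches = patches.view(b, c, self.k * self.k, h * w)
        kernels = kernels.view(b, 1, self.k * self.k, h * w)
        out = (patches * kernels).sum(dim=2)      # filter each pixel's neighborhood
        return out.view(b, c, h, w)
```

A decoder could call such a layer with its current feature map and a fused context tensor of the same spatial size, so that the filtering adapts to each input rather than using fixed convolution weights.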