F$^3$Loc: Fusion and Filtering for Floorplan Localization
- URL: http://arxiv.org/abs/2403.03370v1
- Date: Tue, 5 Mar 2024 23:32:26 GMT
- Title: F$^3$Loc: Fusion and Filtering for Floorplan Localization
- Authors: Changan Chen, Rui Wang, Christoph Vogel, Marc Pollefeys
- Abstract summary: We propose an efficient data-driven solution to self-localization within a floorplan.
Our method does not require retraining per map or location, nor a large database of images of the area of interest.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we propose an efficient data-driven solution to self-localization within a floorplan. Floorplan data is readily available, long-term persistent, and inherently robust to changes in visual appearance. Our method does not require retraining per map or location, nor does it demand a large database of images of the area of interest. We propose a novel probabilistic model consisting of an observation module and a novel temporal filtering module. Operating internally on an efficient ray-based representation, the observation module combines a single-view and a multiview component that predict horizontal depth from images, fusing their results to benefit from the advantages of either methodology. Our method runs on conventional consumer hardware and overcomes a common limitation of competing methods, which often demand upright images. Our full system meets real-time requirements while outperforming the state of the art by a significant margin.
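The two ideas the abstract names, fusing single- and multi-view horizontal-depth predictions into one ray-based observation and filtering poses over time, can be pictured with a minimal sketch. Everything below (the inverse-variance fusion rule, the Gaussian ray likelihood, and the histogram filter over discretized floorplan poses) is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of ray fusion + temporal filtering.
import numpy as np

def fuse_rays(depth_single, var_single, depth_multi, var_multi):
    """Inverse-variance fusion of two horizontal-depth ray predictions
    (an assumed fusion rule; the paper learns the fusion)."""
    w_s, w_m = 1.0 / var_single, 1.0 / var_multi
    fused = (w_s * depth_single + w_m * depth_multi) / (w_s + w_m)
    return fused, 1.0 / (w_s + w_m)

def observation_likelihood(pred_rays, map_rays, sigma=0.3):
    """Gaussian likelihood of the predicted rays against rays cast from
    the floorplan at every candidate pose. map_rays: [num_poses, num_rays]."""
    err = map_rays - pred_rays[None, :]           # broadcast over poses
    return np.exp(-0.5 * np.sum(err ** 2, axis=1) / sigma ** 2)

def histogram_filter_step(belief, transition, pred_rays, map_rays):
    """One predict/update cycle of a Bayesian histogram filter over
    discretized poses. transition[i, j] = p(pose_j | pose_i)."""
    predicted = belief @ transition               # motion / prediction step
    posterior = predicted * observation_likelihood(pred_rays, map_rays)
    return posterior / posterior.sum()            # renormalize
```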
Related papers
- Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models (arXiv, 2024-06-12)
We propose a new framework: LMM with Sophisticated Tasks, Local image compression, and Mixture of global Experts (SliME).
We extract contextual information from the global view using a mixture of adapters, based on the observation that different adapters excel at different tasks.
The proposed method achieves leading performance across various benchmarks with only 2 million training samples.
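As a rough illustration of the "mixture of adapters" idea described above, the sketch below routes global-view tokens through several small adapters with a learned gate. The adapter shape, gating, and dimensions are assumptions, not SliME's actual design.

```python
# Hedged sketch of a mixture of adapters over global-view tokens.
import torch
import torch.nn as nn

class MixtureOfAdapters(nn.Module):
    def __init__(self, dim=1024, num_adapters=4, bottleneck=256):
        super().__init__()
        # Each adapter is a small bottleneck MLP, intended to specialize.
        self.adapters = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, bottleneck), nn.GELU(),
                          nn.Linear(bottleneck, dim))
            for _ in range(num_adapters))
        self.gate = nn.Linear(dim, num_adapters)   # routes tokens to adapters

    def forward(self, tokens):                      # tokens: [B, N, dim]
        weights = self.gate(tokens).softmax(dim=-1)            # [B, N, A]
        outputs = torch.stack([a(tokens) for a in self.adapters], dim=-1)
        return (outputs * weights.unsqueeze(-2)).sum(dim=-1)   # [B, N, dim]
```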
- Multi-view Aggregation Network for Dichotomous Image Segmentation (arXiv, 2024-04-11)
Dichotomous Image Segmentation (DIS) has recently emerged as a task for high-precision object segmentation from high-resolution natural images.
Existing methods rely on multiple tedious encoder-decoder streams and stages to gradually complete global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
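The "multi-view" framing above can be pictured as follows: local crops plus a downsampled global view of a single high-resolution image are treated as views of one object and their features are fused. The cropping grid and mean-pooling fusion below are illustrative assumptions, not MVANet's architecture.

```python
# Hedged sketch: one high-res image -> global + local views -> fused feature.
import torch
import torch.nn.functional as F

def make_views(image, grid=2, size=256):
    """image: [B, C, H, W] -> one global view plus grid*grid local crops,
    all resized to [B, C, size, size] and stacked along a view axis."""
    B, C, H, W = image.shape
    views = [F.interpolate(image, (size, size), mode="bilinear",
                           align_corners=False)]
    for i in range(grid):
        for j in range(grid):
            crop = image[:, :, i*H//grid:(i+1)*H//grid,
                               j*W//grid:(j+1)*W//grid]
            views.append(F.interpolate(crop, (size, size), mode="bilinear",
                                       align_corners=False))
    return torch.stack(views, dim=1)               # [B, 1+grid*grid, C, s, s]

def aggregate(features):
    """features: [B, V, D] per-view embeddings -> fused embedding [B, D].
    Simple averaging stands in for the paper's aggregation module."""
    return features.mean(dim=1)
```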
- Few-shot Object Localization (arXiv, 2024-03-19)
This paper defines a novel task named Few-Shot Object Localization (FSOL), which aims to achieve precise localization with limited samples.
This task achieves generalized object localization by leveraging a small number of labeled support samples to query the positional information of objects within corresponding images.
Experimental results demonstrate a significant performance improvement of our approach in the FSOL task, establishing an efficient benchmark for further research.
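One common way to realize the support/query mechanism this summary describes is to correlate pooled features of the labeled support exemplars with the query feature map to obtain a location heatmap. The sketch below uses cosine correlation; the function names and that choice are assumptions, not the paper's design.

```python
# Hedged sketch of support-query correlation for few-shot localization.
import torch
import torch.nn.functional as F

def localize(query_feat, support_feats):
    """query_feat: [B, D, H, W] feature map of the query image.
    support_feats: [B, K, D] pooled features of K labeled support samples.
    Returns a [B, H, W] heatmap of likely object locations."""
    q = F.normalize(query_feat, dim=1)
    s = F.normalize(support_feats, dim=2)
    corr = torch.einsum("bdhw,bkd->bkhw", q, s)   # cosine sim. per exemplar
    return corr.max(dim=1).values                  # best exemplar per pixel
```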
- FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views (arXiv, 2023-03-25)
We present FlexNeRF, a method for photorealistic free-viewpoint rendering of humans in motion from monocular videos.
Our approach works well with sparse views, which is a challenging scenario when the subject is exhibiting fast/complex motions.
Thanks to our novel temporal and cyclic consistency constraints, our approach provides high-quality outputs even as the observed views become sparser.
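Temporal and cyclic consistency constraints of this kind are typically expressed as losses on a learned deformation field. A minimal sketch, assuming hypothetical forward/backward deformation functions deform_fwd and deform_bwd rather than FlexNeRF's actual parametrization:

```python
# Hedged sketch of cyclic and temporal consistency losses.
import torch

def cyclic_consistency_loss(deform_fwd, deform_bwd, points, t):
    """Deforming points forward to time t and back should be the identity."""
    cycled = deform_bwd(deform_fwd(points, t), t)
    return torch.mean((cycled - points) ** 2)

def temporal_smoothness_loss(deform_fwd, points, t, dt=1.0):
    """Deformations at nearby timestamps should stay close to each other."""
    return torch.mean((deform_fwd(points, t) - deform_fwd(points, t + dt)) ** 2)
```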
- Monocular BEV Perception of Road Scenes via Front-to-Top View Projection (arXiv, 2022-11-15)
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, making it efficient and applicable to real-time panorama HD map reconstruction.
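A common recipe for front-to-top projection is to re-map each front-view feature column to depth bins along the corresponding ray in the top-down grid with a learned layer. The sketch below shows that generic pattern; it is not necessarily the paper's exact projection module.

```python
# Hedged sketch of a learned column-wise front-view -> BEV projection.
import torch
import torch.nn as nn

class FrontToTop(nn.Module):
    def __init__(self, img_h=64, bev_depth=100, dim=128):
        super().__init__()
        # Maps the vertical extent of one image-feature column to depth
        # bins along the matching BEV ray (assumes inputs of size img_h/dim).
        self.column_proj = nn.Linear(img_h * dim, bev_depth * dim)
        self.bev_depth = bev_depth

    def forward(self, feat):                        # feat: [B, dim, H, W]
        B, D, H, W = feat.shape
        cols = feat.permute(0, 3, 1, 2).reshape(B, W, D * H)  # per-column vec
        bev = self.column_proj(cols).reshape(B, W, self.bev_depth, D)
        return bev.permute(0, 3, 2, 1)              # [B, dim, depth, W] grid
```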
- Content-aware Warping for View Synthesis (arXiv, 2022-01-22)
We propose content-aware warping, which adaptively learns the weights for pixels of a relatively large neighborhood from their contextual information via a lightweight neural network.
Based on this learnable warping module, we propose a new end-to-end learning-based framework for novel view synthesis from two source views.
Experimental results on structured light field datasets with wide baselines and unstructured multi-view datasets show that the proposed method significantly outperforms state-of-the-art methods both quantitatively and visually.
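The stated mechanism, a lightweight network predicting blending weights over a large neighborhood instead of using fixed interpolation weights, can be sketched as below. The 5x5 window, the convolutional weight predictor, and applying it after an initial geometric warp are illustrative assumptions, not the paper's exact module.

```python
# Hedged sketch of content-aware warping weights over a k x k neighborhood.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentAwareWarp(nn.Module):
    def __init__(self, feat_dim=32, k=5):
        super().__init__()
        self.k = k
        # Lightweight predictor of k*k neighborhood weights per target pixel.
        self.weight_net = nn.Conv2d(feat_dim, k * k, kernel_size=3, padding=1)

    def forward(self, source, context):
        """source: [B, C, H, W] pre-warped source image; context:
        [B, feat_dim, H, W] contextual features driving the weights."""
        B, C, H, W = source.shape
        weights = self.weight_net(context).softmax(dim=1)        # [B,k*k,H,W]
        patches = F.unfold(source, self.k, padding=self.k // 2)  # [B,C*k*k,HW]
        patches = patches.view(B, C, self.k * self.k, H, W)
        return (patches * weights.unsqueeze(1)).sum(dim=2)       # [B,C,H,W]
```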
- BoundarySqueeze: Image Segmentation as Boundary Squeezing (arXiv, 2021-05-25)
We propose a novel method for fine-grained high-quality image segmentation of both objects and scenes.
Inspired by dilation and erosion from morphological image processing, we treat pixel-level segmentation as squeezing the object boundary.
Our method yields large gains on COCO and Cityscapes for both instance and semantic segmentation, and outperforms the previous state-of-the-art PointRend in both accuracy and speed under the same setting.
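The morphological intuition can be made concrete with pooling-based dilation and erosion: their difference yields a boundary band within which a refinement head would re-classify ("squeeze") pixels. The operators below are generic illustrations, not the paper's exact formulation.

```python
# Hedged sketch: dilation/erosion via pooling, then a boundary band.
import torch
import torch.nn.functional as F

def dilate(mask, k=3):
    """Morphological dilation of a soft mask via max pooling."""
    return F.max_pool2d(mask, k, stride=1, padding=k // 2)

def erode(mask, k=3):
    """Morphological erosion, implemented as dilation of the negated mask."""
    return -F.max_pool2d(-mask, k, stride=1, padding=k // 2)

def boundary_band(mask, k=3):
    """mask: [B, 1, H, W] in [0, 1]. Returns the uncertain rim around the
    object edge, the region a refinement head would squeeze."""
    return dilate(mask, k) - erode(mask, k)
```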
- Multitarget Tracking with Transformers (arXiv, 2021-04-01)
Multitarget Tracking (MTT) is the problem of tracking the states of an unknown number of objects using noisy measurements.
In this paper, we propose a high-performing deep-learning method for MTT based on the Transformer architecture.
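A DETR-style reading of this idea: noisy measurements become input tokens and learned object queries decode one state estimate per potential target. The dimensions, existence head, and query mechanism below are assumptions about the general recipe, not the paper's architecture.

```python
# Hedged sketch of a transformer for multitarget tracking.
import torch
import torch.nn as nn

class TransformerMTT(nn.Module):
    def __init__(self, meas_dim=2, state_dim=4, d_model=64, max_targets=10):
        super().__init__()
        self.embed = nn.Linear(meas_dim, d_model)   # measurements -> tokens
        self.queries = nn.Parameter(torch.randn(max_targets, d_model))
        self.transformer = nn.Transformer(d_model, nhead=4, batch_first=True)
        self.head = nn.Linear(d_model, state_dim)   # per-slot state estimate
        self.exist = nn.Linear(d_model, 1)          # does this slot track anything?

    def forward(self, measurements):                # [B, N, meas_dim]
        memory = self.embed(measurements)
        queries = self.queries.expand(measurements.size(0), -1, -1)
        decoded = self.transformer(memory, queries)
        return self.head(decoded), self.exist(decoded).sigmoid()
```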
- Monocular Real-Time Volumetric Performance Capture (arXiv, 2020-07-28)
We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video.
Our system reconstructs a fully textured 3D human from each frame by leveraging the Pixel-Aligned Implicit Function (PIFu).
We also introduce an Online Hard Example Mining (OHEM) technique that effectively suppresses failure modes due to the rare occurrence of challenging examples.
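Online Hard Example Mining, as named in the summary, is commonly implemented by backpropagating only the highest-loss samples in a batch so that rare, difficult cases dominate the gradient. A minimal sketch, with the keep ratio as an illustrative assumption:

```python
# Hedged sketch of Online Hard Example Mining (OHEM).
import torch

def ohem_loss(per_sample_loss, keep_ratio=0.25):
    """per_sample_loss: [N] unreduced losses for one batch.
    Averages only the hardest keep_ratio fraction of samples."""
    k = max(1, int(keep_ratio * per_sample_loss.numel()))
    hard, _ = torch.topk(per_sample_loss, k)       # highest-loss samples
    return hard.mean()
```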