GPS as a Control Signal for Image Generation
- URL: http://arxiv.org/abs/2501.12390v2
- Date: Wed, 22 Jan 2025 05:07:28 GMT
- Title: GPS as a Control Signal for Image Generation
- Authors: Chao Feng, Ziyang Chen, Aleksander Holynski, Alexei A. Efros, Andrew Owens
- Abstract summary: We show that the GPS tags contained in photo metadata provide a useful control signal for image generation.
We train GPS-to-image models and use them for tasks that require a fine-grained understanding of how images vary within a city.
- Score: 95.43433150105385
- Abstract: We show that the GPS tags contained in photo metadata provide a useful control signal for image generation. We train GPS-to-image models and use them for tasks that require a fine-grained understanding of how images vary within a city. In particular, we train a diffusion model to generate images conditioned on both GPS and text. The learned model generates images that capture the distinctive appearance of different neighborhoods, parks, and landmarks. We also extract 3D models from 2D GPS-to-image models through score distillation sampling, using GPS conditioning to constrain the appearance of the reconstruction from each viewpoint. Our evaluations suggest that our GPS-conditioned models successfully learn to generate images that vary based on location, and that GPS conditioning improves estimated 3D structure.
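As a concrete illustration of the conditioning described above, here is a minimal PyTorch sketch of one plausible way to embed a GPS coordinate alongside text for a diffusion model. The Fourier-feature design, the dimensions, and the `GPSEmbedding` name are assumptions for illustration, not the paper's implementation.

```python
import math
import torch
import torch.nn as nn

class GPSEmbedding(nn.Module):
    """Map a (lat, lon) pair to a conditioning vector via Fourier features.

    Hypothetical sketch: the paper conditions a diffusion model on GPS and
    text, but this particular embedding architecture is an assumption.
    """
    def __init__(self, num_freqs: int = 16, dim: int = 768):
        super().__init__()
        # Log-spaced frequencies, as in common positional encodings.
        self.register_buffer("freqs", 2.0 ** torch.arange(num_freqs) * math.pi)
        # 2 coords * 2 (sin, cos) * num_freqs features -> model dim.
        self.proj = nn.Linear(2 * 2 * num_freqs, dim)

    def forward(self, latlon: torch.Tensor) -> torch.Tensor:
        # latlon: (batch, 2), normalized to [-1, 1] over the region of interest.
        angles = latlon[..., None] * self.freqs           # (batch, 2, F)
        feats = torch.cat([angles.sin(), angles.cos()], dim=-1)
        return self.proj(feats.flatten(start_dim=1))      # (batch, dim)

# The GPS embedding can then be appended to the text-encoder tokens so that
# cross-attention in the diffusion U-Net sees both conditioning signals:
# cond = torch.cat([text_tokens, gps_embed(latlon)[:, None, :]], dim=1)
```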
Related papers
- GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers [53.80009458891537]
Cross-view video geo-localization aims to derive GPS trajectories from street-view videos by aligning them with aerial-view images.
Current CVGL methods use camera and odometry data, which are typically absent in real-world scenarios.
We propose GAReT, a fully transformer-based method for CVGL that does not require camera or odometry data (a minimal adapter sketch follows this entry).
arXiv Detail & Related papers (2024-08-05T21:29:33Z)
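The GAReT summary mentions adapters on a transformer backbone. Below is a minimal PyTorch sketch of a standard bottleneck adapter; the module name, bottleneck width, and placement are assumptions, since the summary does not specify the design.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter, a common way to adapt a frozen transformer.

    Hypothetical sketch following the generic down-project / nonlinearity /
    up-project pattern with a residual connection; not GAReT's exact design.
    """
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual keeps the frozen backbone's features intact at
        # initialization if `up` is zero-initialized.
        return x + self.up(self.act(self.down(x)))
```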
- G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition [19.95047010486547]
We develop a software pipeline that exploits the wealth of readily available 2D videos to generate realistic radar data.
It addresses the challenge of simulating diversified and fine-grained reflection properties of user gestures.
We implement and evaluate G3R using 2D videos from public data sources and self-collected real-world radar data.
arXiv Detail & Related papers (2024-04-23T11:22:59Z)
- Parsing is All You Need for Accurate Gait Recognition in the Wild [51.206166843375364]
This paper presents a novel gait representation, named Gait Parsing Sequence (GPS).
GPSs are sequences of fine-grained human segmentations extracted from video frames, so they have much higher information entropy.
We also propose a novel human parsing-based gait recognition framework, named ParsingGait.
The experimental results show a significant accuracy improvement from the GPS representation and the superiority of ParsingGait (an entropy sketch follows this entry).
arXiv Detail & Related papers (2023-08-31T13:57:38Z)
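To make the information-entropy claim in the ParsingGait entry concrete, here is a small NumPy sketch comparing the label entropy of a binary silhouette with that of a multi-class parsing map. The helper name and random test data are illustrative only, not from the paper.

```python
import numpy as np

def label_map_entropy(labels: np.ndarray) -> float:
    """Shannon entropy (bits) of the class distribution in a label map.

    Illustrates why a multi-class parsing map carries more information than
    a binary silhouette: more classes -> higher attainable entropy.
    `labels` is an (H, W) integer array of part labels.
    """
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# A binary silhouette tops out at 1 bit per pixel, while a K-class
# parsing map can reach log2(K) bits:
rng = np.random.default_rng(0)
silhouette = rng.integers(0, 2, size=(64, 64))
parsing = rng.integers(0, 8, size=(64, 64))
print(label_map_entropy(silhouette))  # ~1.0
print(label_map_entropy(parsing))     # ~3.0
```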
- GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding [42.780417042750315]
Multi-view camera-based 3D detection is a challenging problem in computer vision.
Recent works leverage a pretrained LiDAR detection model to transfer knowledge to a camera-based student network.
We propose Geometry Enhanced Masked Image Modeling (GeoMIM) to transfer the LiDAR model's knowledge in a pretrain-finetune paradigm (a minimal masking sketch follows this entry).
arXiv Detail & Related papers (2023-03-20T17:59:03Z)
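The GeoMIM entry relies on masked image modeling pretraining. Here is a minimal PyTorch sketch of the generic uniform-random patch masking used in such pipelines; GeoMIM's actual masking strategy and reconstruction targets are not specified in the summary, so this is an assumption.

```python
import torch

def random_patch_mask(batch: int, num_patches: int, mask_ratio: float = 0.75,
                      device: str = "cpu") -> torch.Tensor:
    """Boolean mask (True = masked) over image patches, as in masked image
    modeling pretraining.

    Hypothetical sketch: the generic uniform-random strategy, not
    necessarily GeoMIM's.
    """
    scores = torch.rand(batch, num_patches, device=device)
    k = int(num_patches * mask_ratio)
    # Mask the k patches with the lowest random scores.
    idx = scores.argsort(dim=1)[:, :k]
    mask = torch.zeros(batch, num_patches, dtype=torch.bool, device=device)
    mask.scatter_(1, idx, True)
    return mask

# During pretraining, the student encodes only visible patches and a decoder
# reconstructs targets (e.g., LiDAR-model features) at masked positions.
```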
- 3D Data Augmentation for Driving Scenes on Camera [50.41413053812315]
We propose a 3D data augmentation approach termed Drive-3DAug, aiming to augment camera driving scenes in 3D space.
We first utilize Neural Radiance Field (NeRF) to reconstruct the 3D models of background and foreground objects.
Then, augmented driving scenes can be obtained by placing the 3D objects, with adapted location and orientation, in pre-defined valid regions of the backgrounds (a minimal placement sketch follows this entry).
arXiv Detail & Related papers (2023-03-18T05:51:05Z)
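The "adapted location and orientation" step in the Drive-3DAug entry amounts to a rigid placement transform. Here is a minimal NumPy sketch; the function name, the yaw-about-z convention, and the example values are assumptions for illustration.

```python
import numpy as np

def place_object(points: np.ndarray, yaw: float, translation: np.ndarray) -> np.ndarray:
    """Rigidly place an object's 3D points into a background scene.

    Hypothetical sketch: a yaw rotation about the up axis (z) followed by a
    translation into a valid region of the background.
    """
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T + translation

# e.g. drop a reconstructed car 10 m ahead, rotated 30 degrees:
car = np.random.rand(1000, 3)
augmented = place_object(car, yaw=np.deg2rad(30.0),
                         translation=np.array([10.0, 0.0, 0.0]))
```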
- 3D generation on ImageNet [76.0440752186121]
We develop a 3D generator with Generic Priors (3DGP): a 3D synthesis framework with more general assumptions about the training data.
Our model is based on three new ideas.
We explore our model on four datasets: SDIP Dogs 256x256, SDIP Elephants 256x256, LSUN Horses 256x256, and ImageNet 256x256.
arXiv Detail & Related papers (2023-03-02T17:06:57Z)
- Unsupervised Visual Odometry and Action Integration for PointGoal Navigation in Indoor Environment [14.363948775085534]
PointGoal navigation in indoor environments is a fundamental task for personal robots: navigating to a specified point.
To improve PointGoal navigation accuracy without a GPS signal, we use visual odometry (VO) and propose a novel action integration module (AIM) trained in an unsupervised manner.
Experiments show that the proposed system achieves satisfactory results and outperforms partially supervised learning algorithms on the popular Gibson dataset (a minimal pose-integration sketch follows this entry).
arXiv Detail & Related papers (2022-10-02T03:12:03Z)
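The visual-odometry entry above hinges on integrating relative egomotion estimates so the agent can track the goal without GPS. Here is a minimal Python sketch of chaining 2D pose deltas; the names and frame conventions are assumptions, not the paper's API.

```python
import math

def integrate_pose(pose, delta):
    """Chain a relative egomotion estimate onto a global 2D pose.

    pose = (x, y, theta) in the start frame; delta = (dx, dy, dtheta) in the
    agent's current frame, e.g. from a visual-odometry model. Integrating
    these estimates lets a PointGoal agent localize without GPS.
    """
    x, y, th = pose
    dx, dy, dth = delta
    # Rotate the body-frame displacement into the start frame, then add.
    gx = x + dx * math.cos(th) - dy * math.sin(th)
    gy = y + dx * math.sin(th) + dy * math.cos(th)
    return (gx, gy, th + dth)

# Accumulate over a trajectory of VO estimates:
pose = (0.0, 0.0, 0.0)
for delta in [(0.25, 0.0, 0.0), (0.25, 0.0, math.pi / 2), (0.25, 0.0, 0.0)]:
    pose = integrate_pose(pose, delta)
print(pose)  # final pose relative to the episode start
```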
- Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation [1.1470070927586016]
Self-supervised approaches on monocular videos suffer from scale-inconsistency across long sequences.
We propose a dynamically-weighted GPS-to-Scale (g2s) loss to complement the appearance-based losses.
We demonstrate scale-consistent and scale-aware depth estimation during inference, improving performance even when training with low-frequency GPS data (a sketch of one plausible loss form follows this entry).
arXiv Detail & Related papers (2021-03-03T15:39:41Z)
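The g2s entry describes a dynamically weighted GPS-to-Scale loss. Here is a minimal PyTorch sketch of one plausible form, penalizing the gap between the predicted inter-frame translation magnitude and the GPS-derived distance; this is an assumption, not the paper's exact formulation, and the dynamic weights are assumed to be supplied externally.

```python
import torch

def g2s_loss(pred_translation: torch.Tensor, gps_distance: torch.Tensor,
             weight: torch.Tensor) -> torch.Tensor:
    """One plausible form of a GPS-to-Scale loss (a sketch).

    pred_translation: (B, 3) translations from the pose network.
    gps_distance:     (B,) metric distances between consecutive GPS fixes.
    weight:           (B,) dynamic per-sample weights, e.g. reflecting
                      GPS reliability (assumed, not specified here).
    """
    pred_norm = pred_translation.norm(dim=1)
    return (weight * (pred_norm - gps_distance).abs()).mean()
```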
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning with the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.