CROVIA: Seeing Drone Scenes from Car Perspective via Cross-View
Adaptation
- URL: http://arxiv.org/abs/2304.07199v1
- Date: Fri, 14 Apr 2023 15:20:40 GMT
- Title: CROVIA: Seeing Drone Scenes from Car Perspective via Cross-View
Adaptation
- Authors: Thanh-Dat Truong, Chi Nhan Duong, Ashley Dowling, Son Lam Phung,
Jackson Cothren, Khoa Luu
- Abstract summary: We propose a novel Cross-View Adaptation (CROVIA) approach to adapt the knowledge learned from on-road vehicle views to UAV views.
First, a novel geometry-based constraint on cross-view adaptation is introduced, based on the geometric correlation between views.
Second, cross-view correlations from image space are effectively transferred to segmentation space without requiring paired on-road and UAV view data.
- Score: 20.476683921252867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding semantic scene segmentation of urban scenes captured from the
Unmanned Aerial Vehicles (UAV) perspective plays a vital role in building a
perception model for UAVs. Given the scarcity of large-scale, densely labeled
UAV data, semantic scene segmentation for UAV views requires a broad
understanding of an object from both its top and side views. Adapting from
well-annotated
autonomous driving data to unlabeled UAV data is challenging due to the
cross-view differences between the two data types. Our work proposes a novel
Cross-View Adaptation (CROVIA) approach to effectively adapt the knowledge
learned from on-road vehicle views to UAV views. First, a novel
geometry-based constraint on cross-view adaptation is introduced, based on the
geometric correlation between views. Second, cross-view correlations from
image space are effectively transferred to segmentation space, without
requiring paired on-road and UAV view data, via a new Geometry-Constraint
Cross-View (GeiCo)
loss. Third, multi-modal bijective networks are introduced to enforce
global structural modeling across views. Experimental results on new cross-view
adaptation benchmarks introduced in this work, i.e., SYNTHIA to UAVID and GTA5
to UAVID, show the State-of-the-Art (SOTA) performance of our approach over
prior adaptation methods.
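
To make the idea concrete, below is a minimal, hypothetical sketch of a geometry-style cross-view constraint: it matches the statistics of pairwise segmentation-feature correlations between unpaired on-road and UAV batches. The abstract does not specify the GeiCo loss formulation, so the function names, the moment-matching objective, and the shapes here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def pairwise_correlation(feats: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between every pair of spatial positions.

    feats: (B, C, H, W) segmentation logits or decoder features.
    returns: (B, H*W, H*W) correlation matrices.
    """
    flat = F.normalize(feats.flatten(2), dim=1)   # (B, C, H*W), unit channel norm
    return torch.bmm(flat.transpose(1, 2), flat)  # (B, H*W, H*W)

def geico_style_loss(road_seg: torch.Tensor, uav_seg: torch.Tensor) -> torch.Tensor:
    """Align correlation *statistics* across unpaired views (hypothetical).

    Because on-road and UAV images are unpaired, we compare view-internal
    correlation statistics rather than pixel-to-pixel values; the two
    batches may even differ in spatial size.
    """
    road_corr = pairwise_correlation(road_seg)
    uav_corr = pairwise_correlation(uav_seg)
    # Simple first/second-moment matching; the paper's geometry constraint
    # is richer, but the abstract does not give its exact form.
    return ((road_corr.mean() - uav_corr.mean()).abs()
            + (road_corr.std() - uav_corr.std()).abs())
```

Because only correlation statistics are compared, the two batches never need to be paired or aligned, which mirrors the unpaired setting described above.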
Related papers
- PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation [18.585299793391748]
We introduce PPTFormer, a novel Pseudo Multi-Perspective Transformer network.
Our approach circumvents the need for actual multi-perspective data by creating pseudo perspectives for enhanced multi-perspective learning.
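
One simple way to approximate "pseudo perspectives" from a single view is random perspective warping; the sketch below uses torchvision's RandomPerspective purely as an illustrative stand-in, since PPTFormer generates pseudo perspectives inside the network rather than by input augmentation like this.

```python
import torch
from torchvision import transforms

# Random perspective warp as a stand-in "pseudo perspective" generator.
make_pseudo_view = transforms.RandomPerspective(distortion_scale=0.5, p=1.0)

def pseudo_views(image: torch.Tensor, n_views: int = 4) -> torch.Tensor:
    """image: (C, H, W) -> (n_views, C, H, W) of randomly warped copies."""
    return torch.stack([make_pseudo_view(image) for _ in range(n_views)])
```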
arXiv Detail & Related papers (2024-06-28T03:43:49Z)
- View-Centric Multi-Object Tracking with Homographic Matching in Moving UAV [43.37259596065606]
We address the challenge of multi-object tracking (MOT) in moving Unmanned Aerial Vehicle (UAV) scenarios.
Changes in the scene background not only render traditional frame-to-frame object IoU association methods ineffective but also introduce significant view shifts in the objects.
We propose a novel universal HomView-MOT framework, which for the first time harnesses the view homography inherent in changing scenes to solve these MOT challenges.
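
A minimal sketch of the general idea, assuming OpenCV feature matching: estimate the background homography between consecutive frames and warp the previous frame's boxes into the current frame before IoU association. HomView-MOT's actual pipeline is not specified in this summary; this is only the textbook version of homographic motion compensation.

```python
import cv2
import numpy as np

def frame_homography(prev_gray: np.ndarray, cur_gray: np.ndarray) -> np.ndarray:
    """Estimate the background homography between consecutive frames."""
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(cur_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    src = np.float32([kp1[m.queryIdx].pt for m in matches])
    dst = np.float32([kp2[m.trainIdx].pt for m in matches])
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # RANSAC rejects movers
    return H

def warp_boxes(boxes_xyxy: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Warp (N, 4) boxes from the previous frame into the current frame,
    so that plain IoU association becomes meaningful again."""
    corners = boxes_xyxy[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(-1, 4, 2)
    warped = cv2.perspectiveTransform(corners.astype(np.float32), H)
    return np.concatenate([warped.min(axis=1), warped.max(axis=1)], axis=1)
```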
arXiv Detail & Related papers (2024-03-16T06:48:33Z)
- View Distribution Alignment with Progressive Adversarial Learning for UAV Visual Geo-Localization [10.442998017077795]
Unmanned Aerial Vehicle (UAV) visual geo-localization aims to match images of the same geographic target captured from different views, i.e., the UAV view and the satellite view.
Previous works map images captured by UAVs and satellites to a shared feature space and employ a classification framework to learn location-dependent features.
This paper introduces distribution alignment of the two views to shorten their distance in a common space.
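
A minimal sketch of such distribution alignment under the common domain-adversarial formulation: a discriminator learns to tell UAV features from satellite features, and the encoder is trained to fool it. The paper's progressive learning schedule and architecture are not given in the summary, so everything below is a generic illustration.

```python
import torch
import torch.nn as nn

class ViewDiscriminator(nn.Module):
    """Predicts which view a feature vector came from (UAV=1, satellite=0)."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(256, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

bce = nn.BCEWithLogitsLoss()

def alignment_losses(disc: ViewDiscriminator,
                     uav_feat: torch.Tensor,
                     sat_feat: torch.Tensor):
    # Discriminator step: learn to separate the two views.
    uav_logit = disc(uav_feat.detach())
    sat_logit = disc(sat_feat.detach())
    d_loss = (bce(uav_logit, torch.ones_like(uav_logit))
              + bce(sat_logit, torch.zeros_like(sat_logit)))
    # Encoder step: make UAV features indistinguishable from satellite ones.
    g_loss = bce(disc(uav_feat), torch.zeros_like(uav_logit))
    return d_loss, g_loss
```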
arXiv Detail & Related papers (2024-01-03T06:58:09Z)
- Towards Viewpoint Robustness in Bird's Eye View Segmentation [85.99907496019972]
We study how AV perception models are affected by changes in camera viewpoint.
Small changes to pitch, yaw, depth, or height of the camera at inference time lead to large drops in performance.
We introduce a technique for novel view synthesis and use it to transform collected data to the viewpoint of target rigs.
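
For rotation-only viewpoint changes such as pitch or yaw, the perturbation can be simulated exactly with the pure-rotation homography H = K R K^-1; depth and height changes also require scene geometry, which is why the paper resorts to novel view synthesis. A small sketch with placeholder intrinsics:

```python
import cv2
import numpy as np

# Placeholder camera intrinsics; real values come from rig calibration.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])

def pitch_warp(img: np.ndarray, pitch_deg: float) -> np.ndarray:
    """Simulate a small camera pitch change via H = K @ R @ K^-1."""
    a = np.deg2rad(pitch_deg)
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, np.cos(a), -np.sin(a)],
                  [0.0, np.sin(a), np.cos(a)]])
    H = K @ R @ np.linalg.inv(K)
    return cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))
```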
arXiv Detail & Related papers (2023-09-11T02:10:07Z)
- SGDViT: Saliency-Guided Dynamic Vision Transformer for UAV Tracking [12.447854608181833]
This work presents a novel saliency-guided dynamic vision Transformer (SGDViT) for UAV tracking.
The proposed method designs a new task-specific object saliency mining network to refine the cross-correlation operation.
A lightweight saliency filtering Transformer further refines saliency information and increases the focus on appearance information.
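
One plausible reading of saliency-guided refinement is sketched below: a depthwise cross-correlation (the standard Siamese-tracking operator) whose response map is re-weighted by a saliency mask. SGDViT's actual design is not detailed in this summary, so treat names and shapes here as assumptions.

```python
import torch
import torch.nn.functional as F

def saliency_xcorr(template: torch.Tensor, search: torch.Tensor,
                   saliency: torch.Tensor) -> torch.Tensor:
    """template: (B, C, h, w); search: (B, C, H, W); saliency: (B, 1, H, W)."""
    b, c, h, w = template.shape
    # Depthwise cross-correlation: one template kernel per batch element.
    out = F.conv2d(search.reshape(1, b * c, *search.shape[-2:]),
                   template.reshape(b * c, 1, h, w), groups=b * c)
    out = out.reshape(b, c, *out.shape[-2:]).sum(dim=1, keepdim=True)
    # Re-weight the response map by the (resized) saliency mask.
    sal = F.interpolate(saliency, size=out.shape[-2:], mode='bilinear',
                        align_corners=False)
    return out * sal
```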
arXiv Detail & Related papers (2023-03-08T05:01:00Z)
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
- Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
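
The classical geometric baseline for front-to-top projection is inverse perspective mapping (IPM): a homography that warps road-plane pixels onto a top-down canvas. The paper learns this projection instead; the sketch below, with placeholder calibration points, shows only the baseline idea.

```python
import cv2
import numpy as np

# Four road-plane points in the front-view image (pixels) and their target
# positions on a bird's-eye-view canvas; these values are illustrative
# placeholders that would normally come from camera calibration.
src = np.float32([[560, 460], [720, 460], [1100, 700], [180, 700]])
dst = np.float32([[300, 0], [500, 0], [500, 800], [300, 800]])

H = cv2.getPerspectiveTransform(src, dst)

def front_to_top(front_img: np.ndarray) -> np.ndarray:
    """Warp a front-view frame onto an 800x800 top-down canvas."""
    return cv2.warpPerspective(front_img, H, (800, 800))
```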
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
- Self-aligned Spatial Feature Extraction Network for UAV Vehicle Re-identification [3.449626476434765]
Vehicles of the same color and type look extremely similar from the UAV's perspective.
Recent works tend to extract distinguishing information using regional features and component features.
To extract efficient fine-grained features while avoiding tedious annotation work, this letter develops an unsupervised self-aligned network.
arXiv Detail & Related papers (2022-01-08T14:25:54Z)
- Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images [128.881857704338]
We study the problem of extracting a directed graph representing the local road network in BEV coordinates, from a single onboard camera image.
We show that the method can be extended to detect dynamic objects on the BEV plane.
We validate our approach against powerful baselines and show that our network achieves superior performance.
arXiv Detail & Related papers (2021-10-05T12:40:33Z)
- Visual Relationship Forecasting in Videos [56.122037294234865]
We present a new task, Visual Relationship Forecasting (VRF) in videos, to explore the prediction of visual relationships through reasoning.
Given a subject-object pair with H existing frames, VRF aims to predict their future interactions for the next T frames without visual evidence.
To evaluate the VRF task, we introduce two video datasets named VRF-AG and VRF-VidOR, with a series of temporally localized visual relation annotations in each video.
arXiv Detail & Related papers (2021-07-02T16:43:19Z)
- Parsing-based View-aware Embedding Network for Vehicle Re-Identification [138.11983486734576]
We propose a parsing-based view-aware embedding network (PVEN) to achieve the view-aware feature alignment and enhancement for vehicle ReID.
The experiments conducted on three datasets show that our model outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-04-10T13:06:09Z)