Quantifying the synthetic and real domain gap in aerial scene understanding
- URL: http://arxiv.org/abs/2411.19913v1
- Date: Fri, 29 Nov 2024 18:18:26 GMT
- Title: Quantifying the synthetic and real domain gap in aerial scene understanding
- Authors: Alina Marcu
- Abstract summary: This paper introduces a novel methodology for scene complexity assessment using Multi-Model Consensus Metric (MMCM) and depth-based structural metrics. Our experimental analysis, utilizing real-world (Dronescapes) and synthetic (Skyscenes) datasets, demonstrates that real-world scenes generally exhibit higher consensus among state-of-the-art vision transformers. The results underline the inherent complexities and domain gaps, emphasizing the need for enhanced simulation fidelity and model generalization.
- Score: 1.696456370910212
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Quantifying the gap between synthetic and real-world imagery is essential for improving both transformer-based models - that rely on large volumes of data - and datasets, especially in underexplored domains like aerial scene understanding where the potential impact is significant. This paper introduces a novel methodology for scene complexity assessment using Multi-Model Consensus Metric (MMCM) and depth-based structural metrics, enabling a robust evaluation of perceptual and structural disparities between domains. Our experimental analysis, utilizing real-world (Dronescapes) and synthetic (Skyscenes) datasets, demonstrates that real-world scenes generally exhibit higher consensus among state-of-the-art vision transformers, while synthetic scenes show greater variability and challenge model adaptability. The results underline the inherent complexities and domain gaps, emphasizing the need for enhanced simulation fidelity and model generalization. This work provides critical insights into the interplay between domain characteristics and model performance, offering a pathway for improved domain adaptation strategies in aerial scene understanding.
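The abstract describes the MMCM only at a high level. As a minimal sketch of the general idea, a consensus score can be computed as the mean pairwise agreement among label maps predicted by several models for the same image (the function names and the plain pixel-agreement measure below are illustrative assumptions, not the paper's actual formulation):

```python
import numpy as np

def pairwise_agreement(pred_a, pred_b):
    """Fraction of pixels on which two label maps agree."""
    return float(np.mean(pred_a == pred_b))

def consensus_metric(predictions):
    """Mean pairwise agreement across all model pairs.

    `predictions` is a list of integer label maps (H, W),
    one per model, for the same input image.
    """
    n = len(predictions)
    scores = [
        pairwise_agreement(predictions[i], predictions[j])
        for i in range(n) for j in range(i + 1, n)
    ]
    return sum(scores) / len(scores)

# Toy example: three "models" labeling the same 2x2 scene
preds = [
    np.array([[0, 1], [1, 1]]),
    np.array([[0, 1], [1, 0]]),
    np.array([[0, 1], [1, 1]]),
]
print(consensus_metric(preds))  # higher = stronger consensus
```

Under this reading, the paper's finding is that such a score is typically higher on real-world imagery than on synthetic imagery, where model disagreement is larger.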
Related papers
- Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition [63.55828203989405]
We introduce a novel Topology-Aware Modeling (TAM) framework for Sim2Real UDA on object point clouds. Our approach mitigates the domain gap by leveraging global spatial topology, characterized by low-level, high-frequency 3D structures. We propose an advanced self-training strategy that combines cross-domain contrastive learning with self-training.
arXiv Detail & Related papers (2025-06-26T11:53:59Z) - Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation [50.31351006532924]
Human pose estimation (HPE) has received increasing attention recently due to its wide application in motion analysis, virtual reality, healthcare, etc.
It suffers from the lack of labeled diverse real-world datasets due to the time- and labor-intensive annotation.
We introduce a novel framework that capitalizes on both representation aggregation and segregation for domain adaptive human pose estimation.
arXiv Detail & Related papers (2024-12-29T17:59:45Z) - Feature Based Methods in Domain Adaptation for Object Detection: A Review Paper [0.6437284704257459]
Domain adaptation aims to enhance the performance of machine learning models when deployed in target domains with distinct data distributions.
This review delves into advanced methodologies for domain adaptation, including adversarial learning, discrepancy-based, multi-domain, teacher-student, ensemble, and Vision Language Models.
Special attention is given to strategies that minimize the reliance on extensive labeled data, particularly in scenarios involving synthetic-to-real domain shifts.
arXiv Detail & Related papers (2024-12-23T06:34:23Z) - VLPose: Bridging the Domain Gap in Pose Estimation with Language-Vision Tuning [53.35114015288077]
We bridge the domain gap between natural and artificial scenarios with efficient tuning strategies.
We develop a novel framework called VLPose to extend the generalization and robustness of pose estimation models.
Our approach has demonstrated improvements of 2.26% and 3.74% on HumanArt and MSCOCO, respectively.
arXiv Detail & Related papers (2024-02-22T11:21:54Z) - Towards Full-scene Domain Generalization in Multi-agent Collaborative Bird's Eye View Segmentation for Connected and Autonomous Driving [49.03947018718156]
We propose a unified domain generalization framework to be utilized during the training and inference stages of collaborative perception.
We also introduce an intra-system domain alignment mechanism to reduce or potentially eliminate the domain discrepancy among connected and autonomous vehicles.
arXiv Detail & Related papers (2023-11-28T12:52:49Z) - Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z) - Domain-Adaptive Full-Face Gaze Estimation via Novel-View-Synthesis and Feature Disentanglement [12.857137513211866]
We propose an effective model training pipeline consisting of a training data synthesis and a gaze estimation model for unsupervised domain adaptation.
The proposed data synthesis leverages the single-image 3D reconstruction to expand the range of the head poses from the source domain without requiring a 3D facial shape dataset.
We propose a disentangling autoencoder network to separate gaze-related features and introduce background augmentation consistency loss to utilize the characteristics of the synthetic source domain.
arXiv Detail & Related papers (2023-05-25T15:15:03Z) - Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances [76.34037366117234]
We introduce a new dataset called Robot Control Gestures (RoCoG-v2)
The dataset is composed of both real and synthetic videos from seven gesture classes.
We present results using state-of-the-art action recognition and domain adaptation algorithms.
arXiv Detail & Related papers (2023-03-17T23:23:55Z) - Domain Adaptation of Synthetic Driving Datasets for Real-World Autonomous Driving [0.11470070927586014]
Networks trained with synthetic data for certain computer vision tasks degrade significantly when tested on real-world data.
In this paper, we propose and evaluate novel ways for the betterment of such approaches.
We propose a novel method to efficiently incorporate semantic supervision into this pair selection, which helps in boosting the performance of the model.
arXiv Detail & Related papers (2023-02-08T15:51:54Z) - One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data.
Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation.
We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z) - Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation [117.3856882511919]
We propose the Style-HAllucinated Dual consistEncy learning (SHADE) framework to handle domain shift.
Our SHADE yields significant improvement and outperforms state-of-the-art methods by 5.07% and 8.35% on the average mIoU of three real-world datasets.
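SHADE's full recipe is not detailed in this summary. As a rough illustration of the style-hallucination idea, unseen styles can be approximated by re-normalizing a feature map with jittered per-channel statistics (an AdaIN-style approximation; the function name and Gaussian noise model are assumptions, since SHADE itself samples from stored source-domain style statistics):

```python
import numpy as np

def hallucinate_style(features, sigma=0.1, eps=1e-5, rng=None):
    """Re-normalize a (C, H, W) feature map with perturbed channel
    statistics to mimic an unseen "style"."""
    rng = rng or np.random.default_rng()
    mu = features.mean(axis=(1, 2), keepdims=True)         # (C, 1, 1)
    std = features.std(axis=(1, 2), keepdims=True) + eps   # (C, 1, 1)
    # Jitter the per-channel statistics to hallucinate a new style.
    new_mu = mu * (1 + sigma * rng.standard_normal(mu.shape))
    new_std = std * (1 + sigma * rng.standard_normal(std.shape))
    return (features - mu) / std * new_std + new_mu

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 16, 16))
styled = hallucinate_style(feats, rng=rng)
print(styled.shape)  # same shape, shifted channel statistics
```

A consistency loss between predictions on the original and style-perturbed features then encourages style-invariant representations.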
arXiv Detail & Related papers (2022-04-06T02:49:06Z) - Content Disentanglement for Semantically Consistent Synthetic-to-Real Domain Adaptation in Urban Traffic Scenes [39.38387505091648]
Synthetic data generation is an appealing approach to generate novel traffic scenarios in autonomous driving.
Deep learning techniques trained solely on synthetic data encounter dramatic performance drops when they are tested on real data.
We propose a new, unsupervised, end-to-end domain adaptation network architecture that enables semantically consistent domain adaptation between synthetic and real data.
arXiv Detail & Related papers (2021-05-18T17:42:26Z) - Generative Adversarial Transformers [13.633811200719627]
We introduce the GANsformer, a novel and efficient type of transformer, and explore it for the task of visual generative modeling.
The network employs a bipartite structure that enables long-range interactions across the image while maintaining linearly efficient computation.
We show it achieves state-of-the-art results in terms of image quality and diversity, while enjoying fast learning and better data-efficiency.
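The bipartite structure mentioned above can be sketched as cross-attention between a small set of latent vectors and the image features: positions interact only through the latents, so the cost scales as O(N·k) rather than O(N²) in the number of image positions N (this NumPy sketch is a simplification of the GANsformer's duplex attention, with hypothetical names and no learned projections):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bipartite_attention(latents, features):
    """Cross-attention between k latent vectors (k, d) and
    N image features (N, d); cost is O(N * k)."""
    d = latents.shape[-1]
    # Latents aggregate information from all image positions...
    attn = softmax(latents @ features.T / np.sqrt(d))            # (k, N)
    latents_new = attn @ features                                # (k, d)
    # ...then broadcast it back to every position.
    attn_back = softmax(features @ latents_new.T / np.sqrt(d))   # (N, k)
    return attn_back @ latents_new                               # (N, d)

rng = np.random.default_rng(0)
out = bipartite_attention(rng.normal(size=(4, 8)),
                          rng.normal(size=(100, 8)))
print(out.shape)  # (100, 8)
```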
arXiv Detail & Related papers (2021-03-01T18:54:04Z) - Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation [16.153683223016973]
We develop an attention module that learns to identify and remove difficult out-of-domain regions in real images.
Visualizing the removed regions provides interpretable insights into the synthetic-real domain gap.
arXiv Detail & Related papers (2020-02-27T14:28:56Z)
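The decluttering idea above, an attention module that removes difficult out-of-domain regions, can be sketched as masking pixels whose predicted "difficulty" exceeds a threshold (the scores would come from a learned module; here they are a stand-in input, and all names are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def declutter(image, region_scores, threshold=0.5):
    """Zero out pixels whose difficulty score exceeds `threshold`.

    `image` is (H, W, C); `region_scores` is (H, W) raw logits
    standing in for a learned attention module's output.
    """
    keep = sigmoid(region_scores) < threshold   # (H, W) boolean mask
    return image * keep[..., None]              # broadcast over channels

rng = np.random.default_rng(0)
img = rng.uniform(size=(8, 8, 3))
scores = rng.normal(size=(8, 8))
out = declutter(img, scores)
print(out.shape)  # (8, 8, 3), with difficult regions zeroed
```

Visualizing which regions get masked is what provides the interpretable view of the synthetic-real gap.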
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.