SynthGenNet: a self-supervised approach for test-time generalization using synthetic multi-source domain mixing of street view images
- URL: http://arxiv.org/abs/2509.02287v1
- Date: Tue, 02 Sep 2025 13:08:03 GMT
- Title: SynthGenNet: a self-supervised approach for test-time generalization using synthetic multi-source domain mixing of street view images
- Authors: Pushpendra Dhakara, Prachi Chachodhia, Vaibhav Kumar,
- Abstract summary: We introduce SynthGenNet, a self-supervised student-teacher architecture to enable robust test-time domain generalization.<n>Our contributions include the novel ClassMix++ algorithm, which blends labeled data from various synthetic sources.<n>We show our model outperforms the state-of-the-art (relying on single source) by achieving 50% Mean Intersection-Over-Union (mIoU) value on real-world datasets.
- Score: 8.23277995673829
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unstructured urban environments present unique challenges for scene understanding and generalization due to their complex and diverse layouts. We introduce SynthGenNet, a self-supervised student-teacher architecture designed to enable robust test-time domain generalization using synthetic multi-source imagery. Our contributions include the novel ClassMix++ algorithm, which blends labeled data from various synthetic sources while maintaining semantic integrity, enhancing model adaptability. We further employ Grounded Mask Consistency Loss (GMC), which leverages source ground truth to improve cross-domain prediction consistency and feature alignment. The Pseudo-Label Guided Contrastive Learning (PLGCL) mechanism is integrated into the student network to facilitate domain-invariant feature learning through iterative knowledge distillation from the teacher network. This self-supervised strategy improves prediction accuracy, addresses real-world variability, bridges the sim-to-real domain gap, and reliance on labeled target data, even in complex urban areas. Outcomes show our model outperforms the state-of-the-art (relying on single source) by achieving 50% Mean Intersection-Over-Union (mIoU) value on real-world datasets like Indian Driving Dataset (IDD).
Related papers
- ALIGN-FL: Architecture-independent Learning through Invariant Generative component sharing in Federated Learning [4.073149631618324]
We present ALIGN-FL, a novel approach to distributed learning that addresses the challenge of learning from disjoint data distributions.<n>Instead of exchanging full model parameters, our framework enables privacy-preserving learning by transferring only generative capabilities across clients.
arXiv Detail & Related papers (2025-12-15T13:35:27Z) - Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms [1.7842332554022695]
This study proposes an enhanced RAG architecture that integrates a factual signal derived from Entity Linking.<n>It implements three re-ranking strategies to combine semantic and entity-based information: a hybrid score weighting model, reciprocal rank fusion, and a cross-encoder re-ranker.<n>Results show that, in domain-specific contexts, the hybrid schema based on reciprocal rank fusion significantly outperforms both the baseline and the cross-encoder approach.
arXiv Detail & Related papers (2025-12-05T18:59:18Z) - Transfer Learning Under High-Dimensional Network Convolutional Regression Model [20.18595334666282]
We propose a high-dimensional transfer learning framework based on network convolutional regression ( NCR)<n>Our methodology includes a two-step transfer learning algorithm that addresses domain shift between source and target networks.<n> Empirical evaluations, including simulations and a real-world application using Sina Weibo data, demonstrate substantial improvements in prediction accuracy.
arXiv Detail & Related papers (2025-04-28T16:52:28Z) - FissionVAE: Federated Non-IID Image Generation with Latent Space and Decoder Decomposition [8.444515700910879]
Federated learning enables decentralized clients to collaboratively learn a shared model while keeping all the training data local.<n>In this paper, we address the challenges of non-IID data environments featuring multiple groups of images of different types.<n>We introduce FissionVAE that decouples the latent space and constructs decoder branches tailored to individual client groups.
arXiv Detail & Related papers (2024-08-30T08:22:30Z) - StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization [85.18995948334592]
Single domain generalization (single DG) aims at learning a robust model generalizable to unseen domains from only one training domain.
State-of-the-art approaches have mostly relied on data augmentations, such as adversarial perturbation and style enhancement, to synthesize new data.
We propose emphStyDeSty, which explicitly accounts for the alignment of the source and pseudo domains in the process of data augmentation.
arXiv Detail & Related papers (2024-06-01T02:41:34Z) - Domain Adaptation of Synthetic Driving Datasets for Real-World
Autonomous Driving [0.11470070927586014]
Network trained with synthetic data for certain computer vision tasks degrade significantly when tested on real world data.
In this paper, we propose and evaluate novel ways for the betterment of such approaches.
We propose a novel method to efficiently incorporate semantic supervision into this pair selection, which helps in boosting the performance of the model.
arXiv Detail & Related papers (2023-02-08T15:51:54Z) - Style-Hallucinated Dual Consistency Learning: A Unified Framework for
Visual Domain Generalization [113.03189252044773]
We propose a unified framework, Style-HAllucinated Dual consistEncy learning (SHADE), to handle domain shift in various visual tasks.
Our versatile SHADE can significantly enhance the generalization in various visual recognition tasks, including image classification, semantic segmentation and object detection.
arXiv Detail & Related papers (2022-12-18T11:42:51Z) - One-Shot Domain Adaptive and Generalizable Semantic Segmentation with
Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data.
Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation.
We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z) - Style-Hallucinated Dual Consistency Learning for Domain Generalized
Semantic Segmentation [117.3856882511919]
We propose the Style-HAllucinated Dual consistEncy learning (SHADE) framework to handle domain shift.
Our SHADE yields significant improvement and outperforms state-of-the-art methods by 5.07% and 8.35% on the average mIoU of three real-world datasets.
arXiv Detail & Related papers (2022-04-06T02:49:06Z) - GenURL: A General Framework for Unsupervised Representation Learning [58.59752389815001]
Unsupervised representation learning (URL) learns compact embeddings of high-dimensional data without supervision.
We propose a unified similarity-based URL framework, GenURL, which can smoothly adapt to various URL tasks.
Experiments demonstrate that GenURL achieves consistent state-of-the-art performance in self-supervised visual learning, unsupervised knowledge distillation (KD), graph embeddings (GE), and dimension reduction.
arXiv Detail & Related papers (2021-10-27T16:24:39Z) - Exploring Data Aggregation and Transformations to Generalize across
Visual Domains [0.0]
This thesis contributes to research on Domain Generalization (DG), Domain Adaptation (DA) and their variations.
We propose new frameworks for Domain Generalization and Domain Adaptation which make use of feature aggregation strategies and visual transformations.
We show how our proposed solutions outperform competitive state-of-the-art approaches in established DG and DA benchmarks.
arXiv Detail & Related papers (2021-08-20T14:58:14Z) - Rethinking Architecture Design for Tackling Data Heterogeneity in
Federated Learning [53.73083199055093]
We show that attention-based architectures (e.g., Transformers) are fairly robust to distribution shifts.
Our experiments show that replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices.
arXiv Detail & Related papers (2021-06-10T21:04:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.