Prompt-Driven Dynamic Object-Centric Learning for Single Domain
Generalization
- URL: http://arxiv.org/abs/2402.18447v1
- Date: Wed, 28 Feb 2024 16:16:51 GMT
- Title: Prompt-Driven Dynamic Object-Centric Learning for Single Domain
Generalization
- Authors: Deng Li, Aming Wu, Yaowei Wang and Yahong Han
- Abstract summary: Single-domain generalization aims to learn a model from single source domain data to achieve generalized performance on other unseen target domains.
We propose a dynamic object-centric perception network based on prompt learning, aiming to adapt to the variations in image complexity.
- Score: 61.64304227831361
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-domain generalization aims to learn a model from single source domain
data to achieve generalized performance on other unseen target domains.
Existing works primarily focus on improving the generalization ability of
static networks. However, static networks are unable to dynamically adapt to
the diverse variations in different image scenes, leading to limited
generalization capability. Different scenes exhibit varying levels of
complexity, and the complexity of images further varies significantly in
cross-domain scenarios. In this paper, we propose a dynamic object-centric
perception network based on prompt learning, aiming to adapt to the variations
in image complexity. Specifically, we propose an object-centric gating module
based on prompt learning to focus attention on the object-centric features
guided by the various scene prompts. Then, with the object-centric gating
masks, the dynamic selective module dynamically selects highly correlated
feature regions in both spatial and channel dimensions enabling the model to
adaptively perceive object-centric relevant features, thereby enhancing the
generalization capability. Extensive experiments were conducted on
single-domain generalization tasks in image classification and object
detection. The experimental results demonstrate that our approach outperforms
state-of-the-art methods, which validates the effectiveness and generality of
our proposed method.
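The two-stage idea in the abstract (prompt-guided gating masks, then selection over spatial and channel dimensions) can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the cosine-similarity gate, the quantile-based top-fraction selection, the `keep_ratio` parameter, and all function names are assumptions made only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def gating_masks(features, prompts):
    """Hypothetical prompt-guided gate.

    features: (C, H, W) feature map; prompts: (K, C) scene prompt embeddings.
    Returns a spatial mask (H, W) and a channel mask (C,), both in (0, 1).
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)                      # (C, HW)
    f_unit = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-8)
    p_unit = prompts / (np.linalg.norm(prompts, axis=1, keepdims=True) + 1e-8)
    # Cosine similarity between every prompt and every spatial location.
    sim = p_unit @ f_unit                                  # (K, HW)
    # Assumption: a location is gated by its best-matching prompt.
    spatial = sigmoid(sim.max(axis=0)).reshape(h, w)
    # Assumption: channels are gated by a prompt mixture weighted by how well
    # each prompt matches the globally pooled feature.
    pooled = flat.mean(axis=1)                             # (C,)
    weights = softmax(p_unit @ pooled)                     # (K,)
    channel = sigmoid(weights @ prompts)                   # (C,)
    return spatial, channel


def top_fraction(mask, keep_ratio):
    """Boolean mask that keeps the highest-valued fraction of entries."""
    threshold = np.quantile(mask, 1.0 - keep_ratio)
    return mask >= threshold


def dynamic_select(features, spatial, channel, keep_ratio=0.5):
    """Zero out low-gate locations and channels, scale the rest by the gate."""
    s = spatial * top_fraction(spatial, keep_ratio)        # (H, W)
    ch = channel * top_fraction(channel, keep_ratio)       # (C,)
    return features * s[None, :, :] * ch[:, None, None]


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4, 4))   # toy feature map
prompts = rng.normal(size=(3, 8))       # three toy scene prompts
spatial, channel = gating_masks(features, prompts)
selected = dynamic_select(features, spatial, channel, keep_ratio=0.5)
print(selected.shape)
```

In a trained network the prompts and gates would be learned end to end; here they are random, so the sketch only shows how masks computed from prompts can select features along both dimensions.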
Related papers
- GOOD: Towards Domain Generalized Orientated Object Detection [39.76969237020444]
Oriented object detection has been rapidly developed in the past few years, but most of these methods assume the training and testing images are under the same statistical distribution.
We propose the task of domain generalized oriented object detection, which intends to explore the generalization of oriented object detectors on arbitrary unseen target domains.
arXiv Detail & Related papers (2024-02-20T07:12:22Z)
- HCVP: Leveraging Hierarchical Contrastive Visual Prompt for Domain
Generalization [69.33162366130887]
Domain Generalization (DG) endeavors to create machine learning models that excel in unseen scenarios by learning invariant features.
We introduce a novel method designed to supplement the model with domain-level and task-specific characteristics.
This approach aims to guide the model in more effectively separating invariant features from specific characteristics, thereby boosting the generalization.
arXiv Detail & Related papers (2024-01-18T04:23:21Z)
- Aligning and Prompting Everything All at Once for Universal Visual
Perception [79.96124061108728]
APE is a universal visual perception model for aligning and prompting everything all at once in an image to perform diverse tasks.
APE advances the convergence of detection and grounding by reformulating language-guided grounding as open-vocabulary detection.
Experiments on over 160 datasets demonstrate that APE outperforms state-of-the-art models.
arXiv Detail & Related papers (2023-12-04T18:59:50Z)
- Single Domain Dynamic Generalization for Iris Presentation Attack
Detection [41.126916126040655]
Iris presentation attack detection has achieved great success under intra-domain settings but easily degrades on unseen domains.
We propose a Single Domain Dynamic Generalization (SDDG) framework, which exploits domain-invariant and domain-specific features on a per-sample basis.
The proposed method is effective and outperforms the state-of-the-art on LivDet-Iris 2017 dataset.
arXiv Detail & Related papers (2023-05-22T07:54:13Z)
- AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical of which is foreground-background imbalance.
We propose Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while running as fast as the mainstream method.
arXiv Detail & Related papers (2022-02-18T10:14:45Z)
- AFAN: Augmented Feature Alignment Network for Cross-Domain Object
Detection [90.18752912204778]
Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications.
We propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training.
Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations.
arXiv Detail & Related papers (2021-06-10T05:01:20Z)
- Multi-modal Visual Place Recognition in Dynamics-Invariant Perception
Space [23.43468556831308]
This letter explores the use of multi-modal fusion of semantic and visual modalities to improve place recognition in dynamic environments.
We achieve this by first designing a novel deep learning architecture to generate the static semantic segmentation.
We then innovatively leverage the spatial-pyramid-matching model to encode the static semantic segmentation into feature vectors.
In parallel, the static image is encoded using the popular Bag-of-words model.
arXiv Detail & Related papers (2021-05-17T13:14:52Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
- Generalizable Model-agnostic Semantic Segmentation via Target-specific
Normalization [24.14272032117714]
We propose a novel domain generalization framework for the generalizable semantic segmentation task.
We exploit the model-agnostic learning to simulate the domain shift problem.
Considering the data-distribution discrepancy between seen source and unseen target domains, we develop the target-specific normalization scheme.
arXiv Detail & Related papers (2020-03-27T09:25:19Z)