Discovering Spatial Relationships by Transformers for Domain
Generalization
- URL: http://arxiv.org/abs/2108.10046v1
- Date: Mon, 23 Aug 2021 10:35:38 GMT
- Title: Discovering Spatial Relationships by Transformers for Domain
Generalization
- Authors: Cuicui Kang and Karthik Nandakumar
- Abstract summary: Domain generalization is a challenging problem, but it has seen rapid progress thanks to the fast development of AI techniques in computer vision.
Most advanced algorithms are proposed with deep architectures based on convolutional neural networks (CNNs).
- Score: 8.106918528575267
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the rapid increase in the diversity of image data, the problem of
domain generalization has received increased attention recently. While domain
generalization is a challenging problem, it has seen great progress thanks to
the fast development of AI techniques in computer vision. Most of these
advanced algorithms are proposed with deep architectures based on convolutional
neural networks (CNNs). However, although CNNs have a strong ability to find
discriminative features, they do a poor job of modeling the relations between
different locations in the image, because the responses of CNN filters are
mostly local. Since these local and global spatial relationships characterize
the object under consideration, they play a critical role in improving
generalization ability across the domain gap. To capture the relationships
between object parts and thereby gain better domain generalization, this work
proposes to use a self-attention model. However, attention models were designed
for sequences and are not adept at extracting discriminative features from 2D
images. Considering this, we propose a hybrid architecture to discover the
spatial relationships between these local features and derive a composite
representation that encodes both the discriminative features and their
relationships, thereby improving domain generalization. Evaluation on three
well-known benchmarks demonstrates the benefit of modeling relationships
between the features of an image with the proposed method, which achieves
state-of-the-art domain generalization performance. More specifically, the
proposed algorithm outperforms the state-of-the-art by $2.2\%$ and $3.4\%$ on
the PACS and Office-Home databases, respectively.
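As a rough illustration of the kind of hybrid design described in the abstract, the PyTorch sketch below extracts local features with a CNN backbone, treats each spatial location as a token for a transformer encoder so self-attention can model the relationships between locations, and concatenates the two pooled representations into a composite feature. The backbone choice, feature dimensions, pooling, and classifier head are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of a hybrid CNN + self-attention model for domain generalization.
# Assumes a ResNet-18 backbone and a standard transformer encoder; details are
# illustrative, not the paper's implementation.
import torch
import torch.nn as nn
from torchvision.models import resnet18


class HybridCNNTransformer(nn.Module):
    def __init__(self, num_classes: int, d_model: int = 512, nhead: int = 8):
        super().__init__()
        backbone = resnet18(weights=None)  # untrained backbone, for illustration only
        # Keep everything up to the last conv stage: outputs (B, 512, H/32, W/32).
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Composite representation: pooled CNN features + pooled attention features.
        self.classifier = nn.Linear(2 * d_model, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(x)                        # (B, C, H', W') local CNN features
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H'*W', C): one token per location
        # (A positional encoding for the spatial tokens is omitted for brevity.)
        related = self.transformer(tokens)         # self-attention over spatial positions
        local = feats.mean(dim=(2, 3))             # discriminative CNN features
        relational = related.mean(dim=1)           # relationship-aware features
        return self.classifier(torch.cat([local, relational], dim=1))


# Usage example (hypothetical 7-class setting, e.g. PACS):
model = HybridCNNTransformer(num_classes=7)
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 7])
```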
Related papers
- Semantic Segmentation for Real-World and Synthetic Vehicle's Forward-Facing Camera Images [0.8562182926816566]
This is a solution for the semantic segmentation problem in both real-world and synthetic images from a vehicle's forward-facing camera.
We concentrate on building a robust model that performs well across various domains of different outdoor situations.
This paper studies the effectiveness of employing real-world and synthetic data to handle domain adaptation in the semantic segmentation problem.
arXiv Detail & Related papers (2024-07-07T17:28:45Z) - Domain Generalization for In-Orbit 6D Pose Estimation [14.624172952608653]
We introduce a novel, end-to-end, neural-based architecture for spacecraft pose estimation.
We demonstrate that our method effectively closes the domain gap, achieving state-of-the-art accuracy on the widely used SPEED+ dataset.
arXiv Detail & Related papers (2024-06-17T17:01:20Z) - AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical of which is foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while remaining as fast as mainstream methods.
arXiv Detail & Related papers (2022-02-18T10:14:45Z) - Domain Generalization via Frequency-based Feature Disentanglement and
Interaction [23.61154228837516]
Domain generalization aims at mining domain-irrelevant knowledge from multiple source domains.
We introduce (i) an encoder-decoder structure for high-frequency and low-frequency feature disentangling, and (ii) an information interaction mechanism that ensures helpful knowledge from both parts can cooperate effectively (a minimal frequency-splitting sketch is given after this list).
The proposed method obtains state-of-the-art results on three widely used domain generalization benchmarks.
arXiv Detail & Related papers (2022-01-20T07:42:12Z) - A Unified Architecture of Semantic Segmentation and Hierarchical
Generative Adversarial Networks for Expression Manipulation [52.911307452212256]
We develop a unified architecture of semantic segmentation and hierarchical GANs.
A unique advantage of our framework is that, on the forward pass, the semantic segmentation network conditions the generative model.
We evaluate our method on two challenging facial expression translation benchmarks, AffectNet and RaFD, and a semantic segmentation benchmark, CelebAMask-HQ.
arXiv Detail & Related papers (2021-12-08T22:06:31Z) - An attention-driven hierarchical multi-scale representation for visual
recognition [3.3302293148249125]
Convolutional Neural Networks (CNNs) have revolutionized the understanding of visual content.
We propose a method to capture high-level long-range dependencies by exploring Graph Convolutional Networks (GCNs).
Our approach is simple yet extremely effective in solving both the fine-grained and generic visual classification problems.
arXiv Detail & Related papers (2021-10-23T09:22:22Z) - Cross-Domain Facial Expression Recognition: A Unified Evaluation
Benchmark and Adversarial Graph Learning [85.6386289476598]
We develop a novel adversarial graph representation adaptation (AGRA) framework for cross-domain holistic-local feature co-adaptation.
We conduct extensive and fair evaluations on several popular benchmarks and show that the proposed AGRA framework outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2020-08-03T15:00:31Z) - Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent works on visual domain adaptation that leverage adversarial learning to unify the source and target video representations are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z) - Domain Conditioned Adaptation Network [90.63261870610211]
We propose a Domain Conditioned Adaptation Network (DCAN) to excite distinct convolutional channels with a domain conditioned channel attention mechanism.
This is the first work to explore the domain-wise convolutional channel activation for deep DA networks.
arXiv Detail & Related papers (2020-05-14T04:23:24Z) - Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently-used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
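Relating to the frequency-based feature disentanglement entry referenced above, the sketch below shows one common way to split an image batch into low- and high-frequency components with an FFT low-pass mask; the cutoff radius and circular mask are illustrative assumptions, not the cited paper's actual disentangling mechanism.

```python
# Minimal sketch of splitting an image batch into low- and high-frequency parts
# with a circular low-pass mask in the Fourier domain. The cutoff radius is an
# arbitrary illustrative choice, not taken from the cited paper.
import torch


def frequency_split(images: torch.Tensor, cutoff: float = 0.1):
    """images: (B, C, H, W) -> (low_freq, high_freq), both (B, C, H, W)."""
    B, C, H, W = images.shape
    spectrum = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    # Build a centered low-pass mask: keep frequencies within `cutoff` of the center.
    ys = torch.linspace(-0.5, 0.5, H).view(H, 1)
    xs = torch.linspace(-0.5, 0.5, W).view(1, W)
    mask = ((ys ** 2 + xs ** 2).sqrt() <= cutoff).to(images.dtype)
    low = torch.fft.ifft2(torch.fft.ifftshift(spectrum * mask, dim=(-2, -1))).real
    high = images - low  # high-frequency residual
    return low, high


# Usage example on random data:
low, high = frequency_split(torch.randn(2, 3, 64, 64))
print(low.shape, high.shape)  # torch.Size([2, 3, 64, 64]) each
```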
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.