An Explicit Local and Global Representation Disentanglement Framework
with Applications in Deep Clustering and Unsupervised Object Detection
- URL: http://arxiv.org/abs/2001.08957v2
- Date: Mon, 24 Feb 2020 10:11:49 GMT
- Title: An Explicit Local and Global Representation Disentanglement Framework
with Applications in Deep Clustering and Unsupervised Object Detection
- Authors: Rujikorn Charakorn, Yuttapong Thawornwattana, Sirawaj Itthipuripat,
Nick Pawlowski, Poramate Manoonpong, Nat Dilokthanakul
- Abstract summary: We propose a framework, called SPLIT, which allows us to disentangle local and global information.
Our framework adds a generative assumption to the variational autoencoder (VAE) framework.
We show that the framework can effectively disentangle local and global information within these models.
- Score: 9.609936822226633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual data can be understood at different levels of granularity, where
global features correspond to semantic-level information and local features
correspond to texture patterns. In this work, we propose a framework, called
SPLIT, which allows us to disentangle local and global information into two
separate sets of latent variables within the variational autoencoder (VAE)
framework. Our framework adds a generative assumption to the VAE by requiring a
subset of the latent variables to generate an auxiliary set of observable data.
This additional generative assumption primes this subset of latent variables to
capture local information and encourages the other latent variables to represent global
information. We examine three different flavours of VAEs with different
generative assumptions. We show that the framework can effectively disentangle
local and global information within these models, leading to improved
representations with better clustering and unsupervised object detection
benchmarks. Finally, we establish connections between SPLIT and recent research
in cognitive neuroscience regarding the disentanglement in human visual
perception. The code for our experiments is at
https://github.com/51616/split-vae .
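The core idea of the abstract can be sketched in a toy objective: the latent code is split into a local and a global subset, the full code reconstructs the observation, and only the local subset is asked to reconstruct an auxiliary observation. The linear "networks", dimensions, and choice of auxiliary data below are illustrative assumptions for a minimal sketch, not the authors' actual architecture; see the repository linked above for the real implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed for illustration, not from the paper).
D, D_AUX, Z_L, Z_G = 64, 16, 4, 4

# Hypothetical linear maps standing in for the encoder and the two decoders.
W_enc = rng.normal(scale=0.1, size=(D, 2 * (Z_L + Z_G)))
W_dec = rng.normal(scale=0.1, size=(Z_L + Z_G, D))      # decodes x from all latents
W_aux = rng.normal(scale=0.1, size=(Z_L, D_AUX))        # decodes x_aux from local latents only

def split_loss(x, x_aux):
    """Negative SPLIT-style objective for one sample (toy linear VAE)."""
    h = x @ W_enc
    mu, log_var = h[: Z_L + Z_G], h[Z_L + Z_G :]
    # Reparameterization trick: sample z from the approximate posterior.
    z = mu + np.exp(0.5 * log_var) * rng.normal(size=mu.shape)
    z_local = z[:Z_L]  # z[Z_L:] is the global subset, used only via z

    recon = np.mean((x - z @ W_dec) ** 2)                # x uses local + global
    recon_aux = np.mean((x_aux - z_local @ W_aux) ** 2)  # aux data uses local only
    # KL divergence of a diagonal Gaussian posterior from the standard normal prior.
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
    return recon + recon_aux + kl

x = rng.normal(size=D)
x_aux = rng.normal(size=D_AUX)  # e.g. a texture patch derived from x
loss = split_loss(x, x_aux)
```

Because the auxiliary reconstruction term depends only on `z_local`, gradient descent on this loss pressures the local subset to carry the information shared with `x_aux` (texture-level detail), leaving the global subset free to encode the remaining semantic content.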
Related papers
- GSTran: Joint Geometric and Semantic Coherence for Point Cloud Segmentation [33.72549134362884]
We propose GSTran, a novel transformer network tailored for the segmentation task.
The proposed network mainly consists of two principal components: a local geometric transformer and a global semantic transformer.
Experiments on ShapeNetPart and S3DIS benchmarks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-08-21T12:12:37Z)
- Other Tokens Matter: Exploring Global and Local Features of Vision Transformers for Object Re-Identification [63.147482497821166]
We first explore the influence of global and local features of ViT and then propose a novel Global-Local Transformer (GLTrans) for high-performance object Re-ID.
Our proposed method achieves superior performance on four object Re-ID benchmarks.
arXiv Detail & Related papers (2024-04-23T12:42:07Z)
- DistFormer: Enhancing Local and Global Features for Monocular Per-Object Distance Estimation [35.6022448037063]
Per-object distance estimation is crucial in safety-critical applications such as autonomous driving, surveillance, and robotics.
Existing approaches rely on one of two scales: local information (i.e., bounding box proportions) or global information.
Our work aims to strengthen both local and global cues.
arXiv Detail & Related papers (2024-01-06T10:56:36Z)
- Style-Hallucinated Dual Consistency Learning: A Unified Framework for Visual Domain Generalization [113.03189252044773]
We propose a unified framework, Style-HAllucinated Dual consistEncy learning (SHADE), to handle domain shift in various visual tasks.
Our versatile SHADE can significantly enhance the generalization in various visual recognition tasks, including image classification, semantic segmentation and object detection.
arXiv Detail & Related papers (2022-12-18T11:42:51Z)
- SphereVLAD++: Attention-based and Signal-enhanced Viewpoint Invariant Descriptor [6.326554177747699]
We develop SphereVLAD++, an attention-enhanced viewpoint invariant place recognition method.
We show that SphereVLAD++ outperforms all relative state-of-the-art 3D place recognition methods under small or even totally reversed viewpoint differences.
arXiv Detail & Related papers (2022-07-06T20:32:43Z)
- GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
It is a promising solution to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
- SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation [94.11915008006483]
We propose SemAffiNet for point cloud semantic segmentation.
We conduct extensive experiments on the ScanNetV2 and NYUv2 datasets.
arXiv Detail & Related papers (2022-05-26T17:00:23Z)
- Federated and Generalized Person Re-identification through Domain and Feature Hallucinating [88.77196261300699]
We study the problem of federated domain generalization (FedDG) for person re-identification (re-ID).
We propose a novel method, called "Domain and Feature Hallucinating (DFH)", to produce diverse features for learning generalized local and global models.
Our method achieves the state-of-the-art performance for FedDG on four large-scale re-ID benchmarks.
arXiv Detail & Related papers (2022-03-05T09:15:13Z)
- LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization [38.376238216214524]
Weakly supervised object localization (WSOL) aims to learn an object localizer solely from image-level labels.
We propose a novel framework built upon the transformer, termed LCTR, which targets at enhancing the local perception capability of global features.
arXiv Detail & Related papers (2021-12-10T01:48:40Z)
- Unsupervised Learning of Global Factors in Deep Generative Models [6.362733059568703]
We present a novel deep generative model based on non-i.i.d. variational autoencoders.
We show that the model performs domain alignment to find correlations and interpolate between different databases.
We also study the ability of the global space to discriminate between groups of observations with non-trivial underlying structures.
arXiv Detail & Related papers (2020-12-15T11:55:31Z)
- Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds [109.0016923028653]
We learn point cloud representation by bidirectional reasoning between the local structures and the global shape without human supervision.
We show that our unsupervised model surpasses the state-of-the-art supervised methods on both synthetic and real-world 3D object classification datasets.
arXiv Detail & Related papers (2020-03-29T08:26:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.