DHOG: Deep Hierarchical Object Grouping
- URL: http://arxiv.org/abs/2003.08821v1
- Date: Fri, 13 Mar 2020 14:11:48 GMT
- Title: DHOG: Deep Hierarchical Object Grouping
- Authors: Luke Nicholas Darlow, Amos Storkey
- Abstract summary: We show that greedy or local methods of maximising mutual information (such as gradient optimisation) discover local optima of the mutual information criterion.
We introduce deep hierarchical object grouping (DHOG) that computes a number of distinct discrete representations of images in a hierarchical order.
We find that these representations align better with the downstream task of grouping into underlying object classes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, a number of competitive methods have tackled unsupervised
representation learning by maximising the mutual information between the
representations produced from augmentations. The resulting representations are
then invariant to stochastic augmentation strategies, and can be used for
downstream tasks such as clustering or classification. Yet data augmentations
preserve many properties of an image and so there is potential for a suboptimal
choice of representation that relies on matching easy-to-find features in the
data. We demonstrate that greedy or local methods of maximising mutual
information (such as stochastic gradient optimisation) discover local optima of
the mutual information criterion; the resulting representations are also
less-ideally suited to complex downstream tasks. Earlier work has not
specifically identified or addressed this issue. We introduce deep hierarchical
object grouping (DHOG) that computes a number of distinct discrete
representations of images in a hierarchical order, eventually generating
representations that better optimise the mutual information objective. We also
find that these representations align better with the downstream task of
grouping into underlying object classes. We tested DHOG on unsupervised
clustering, which is a natural downstream test as the target representation is
a discrete labelling of the data. We achieved new state-of-the-art results on
the three main benchmarks without any prefiltering or Sobel-edge detection that
proved necessary for many previous methods to work. We obtain accuracy
improvements of: 4.3% on CIFAR-10, 1.5% on CIFAR-100-20, and 7.2% on SVHN.
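The abstract rests on two quantities: the mutual information between discrete representations (cluster assignments) of two augmentations of the same image, and the clustering accuracy used to score the resulting labelling. The sketch below computes both under stated assumptions. It assumes an IIC-style estimator (Invariant Information Clustering), in which the joint distribution over cluster pairs is the batch average of outer products of the two views' soft assignments; it is not DHOG's hierarchical procedure, and the function names are hypothetical.

```python
import math
from itertools import permutations


def discrete_mutual_information(p_a, p_b):
    """Estimate I(z; z') between cluster assignments of two augmented views.

    Assumption (IIC-style estimator): p_a[i][c] and p_b[i][c] are soft
    cluster probabilities for sample i under two augmentations, each row
    summing to 1. The joint P(c, d) is the batch mean of outer products.
    """
    n, k = len(p_a), len(p_a[0])
    joint = [[0.0] * k for _ in range(k)]
    for pa, pb in zip(p_a, p_b):
        for c in range(k):
            for d in range(k):
                joint[c][d] += pa[c] * pb[d] / n
    # Symmetrise so the two views play interchangeable roles.
    joint = [[(joint[c][d] + joint[d][c]) / 2 for d in range(k)] for c in range(k)]
    # For a symmetric joint, row and column marginals are identical.
    marg = [sum(row) for row in joint]
    mi = 0.0
    for c in range(k):
        for d in range(k):
            if joint[c][d] > 0:  # 0 * log 0 is taken as 0
                mi += joint[c][d] * math.log(joint[c][d] / (marg[c] * marg[d]))
    return mi


def clustering_accuracy(y_true, y_pred, k):
    """Unsupervised clustering accuracy: best one-to-one map from cluster
    ids to class ids. Brute force over all k! maps, so only for tiny k."""
    best = 0
    for perm in permutations(range(k)):
        hits = sum(1 for t, p in zip(y_true, y_pred) if perm[p] == t)
        best = max(best, hits)
    return best / len(y_true)


if __name__ == "__main__":
    confident = [[1.0, 0.0], [0.0, 1.0]]
    print(discrete_mutual_information(confident, confident))  # log 2
    print(clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 1], 2))  # 0.75
```

Two confident, agreeing views over two balanced clusters reach log 2, the maximum for k = 2 (since I(z; z') is bounded by H(z), which is at most log k), while identical uniform predictions give 0. The criterion therefore rewards assignments that are confident, balanced, and consistent across augmentations. The brute-force matching is purely illustrative; in practice the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) solves the same assignment problem in polynomial time.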
Related papers
- Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
Date: 2023-10-30T00:32:47Z
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
Date: 2023-09-01T11:15:50Z
- From Chaos Comes Order: Ordering Event Representations for Object Recognition and Detection [29.653946064645705]
We show how to select the appropriate representation for a task based on the Gromov-Wasserstein Discrepancy (GWD) between raw events and their representation.
It is about 200 times faster to compute than training a neural network and preserves the task performance ranking of event representations.
Our optimized representations outperform existing representations by 1.7 mAP on the 1 Mpx dataset and 0.3 mAP on the Gen1 dataset, two established object detection benchmarks, and reach a 3.8% higher classification score on the mini N-ImageNet benchmark.
Date: 2023-04-26T11:27:34Z
- C3: Cross-instance guided Contrastive Clustering [8.953252452851862]
Clustering is the task of gathering similar data samples into clusters without using any predefined labels.
We propose a novel contrastive clustering method, Cross-instance guided Contrastive Clustering (C3).
Our proposed method can outperform state-of-the-art algorithms on benchmark computer vision datasets.
Date: 2022-11-14T06:28:07Z
- On the Eigenvalues of Global Covariance Pooling for Fine-grained Visual Recognition [65.67315418971688]
We show that truncating small eigenvalues of Global Covariance Pooling (GCP) can yield smoother gradients.
On fine-grained datasets, truncating the small eigenvalues would make the model fail to converge.
Inspired by this observation, we propose a network branch dedicated to magnifying the importance of small eigenvalues.
Date: 2022-05-26T11:41:36Z
- PointInst3D: Segmenting 3D Instances by Points [136.7261709896713]
We propose a fully-convolutional 3D point cloud instance segmentation method that works in a per-point prediction fashion.
We find the key to its success is assigning a suitable target to each sampled point.
Our approach achieves promising results on both ScanNet and S3DIS benchmarks.
Date: 2022-04-25T02:41:46Z
- Top-Down Deep Clustering with Multi-generator GANs [0.0]
Deep clustering (DC) learns embedding spaces that are optimal for cluster analysis.
We propose HC-MGAN, a new technique based on GANs with multiple generators (MGANs).
Our method is inspired by the observation that each generator of an MGAN tends to generate data that correlates with a sub-region of the real data distribution.
Date: 2021-12-06T22:53:12Z
- Clustering by Maximizing Mutual Information Across Views [62.21716612888669]
We propose a novel framework for image clustering that incorporates joint representation learning and clustering.
Our method significantly outperforms state-of-the-art single-stage clustering methods across a variety of image datasets.
Date: 2021-07-24T15:36:49Z
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
Date: 2021-03-22T20:36:34Z
- Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection [33.15192824888279]
We propose a novel Enhanced Feature Aggregation and Selection network (EFASNet) for multi-person 2D human pose estimation.
Our method handles crowded, cluttered and occluded scenes well.
Comprehensive experiments demonstrate that the proposed approach outperforms the state-of-the-art methods.
Date: 2020-03-20T08:33:25Z