Learning a Deep Part-based Representation by Preserving Data
Distribution
- URL: http://arxiv.org/abs/2009.08246v1
- Date: Thu, 17 Sep 2020 12:49:36 GMT
- Title: Learning a Deep Part-based Representation by Preserving Data
Distribution
- Authors: Anyong Qin and Zhaowei Shang and Zhuolin Tan and Taiping Zhang and
Yuan Yan Tang
- Abstract summary: Unsupervised dimensionality reduction is one of the commonly used techniques in the field of high dimensional data recognition problems.
In this paper, by preserving the data distribution, a deep part-based representation can be learned, and the novel algorithm is called Distribution Preserving Network Embedding.
The experimental results on the real-world data sets show that the proposed algorithm has good performance in terms of cluster accuracy and AMI.
- Score: 21.13421736154956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised dimensionality reduction is one of the commonly used techniques
in the field of high dimensional data recognition problems. A deep
autoencoder network whose weights are constrained to be non-negative can learn
a low dimensional part-based representation of the data. On the other hand, the
inherent structure of each data cluster can be described by the
distribution of its intraclass samples. One therefore hopes to learn a new low
dimensional representation that preserves the intrinsic structure embedded
in the original high dimensional data space. In this paper, by
preserving the data distribution, a deep part-based representation can be
learned, and the novel algorithm is called Distribution Preserving Network
Embedding (DPNE). In DPNE, we first estimate the distribution of the
original high dimensional data using $k$-nearest neighbor kernel density
estimation, and then seek a part-based representation that respects the
above distribution. The experimental results on the real-world data sets show
that the proposed algorithm has good performance in terms of cluster accuracy
and AMI. It turns out that the manifold structure in the raw data can be well
preserved in the low dimensional feature space.
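To make the two stages concrete, here is a minimal, hedged PyTorch sketch of a DPNE-style pipeline: estimate the input distribution with a $k$-nearest neighbor kernel density estimate, then train an autoencoder with non-negative weights whose embedding is pushed to reproduce that distribution. The layer sizes, the Gaussian kernel with an adaptive $k$-th-neighbor bandwidth, the KL-style matching term, and its weight are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of a DPNE-style pipeline (assumptions noted in comments):
#   1) k-NN kernel density estimate of the data distribution,
#   2) autoencoder with non-negative weights (part-based codes) whose
#      embedding is encouraged to reproduce that distribution.
import torch
import torch.nn as nn

def knn_kde(Z, k=10):
    """k-NN kernel density estimate: per-point Gaussian kernels whose squared
    bandwidth is the squared distance to the k-th nearest neighbor; returns a
    normalized discrete distribution over the samples."""
    d2 = torch.cdist(Z, Z) ** 2
    knn_d2, _ = torch.topk(d2, k + 1, largest=False)   # k+1 smallest, incl. self (0)
    h2 = knn_d2[:, -1:] + 1e-12                         # adaptive bandwidth per point
    dens = torch.exp(-d2 / (2 * h2)).mean(dim=1)
    return dens / dens.sum()

class NonNegAE(nn.Module):
    """Autoencoder whose weight matrices are clamped to be non-negative."""
    def __init__(self, d_in, d_hid=64, d_code=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU(),
                                 nn.Linear(d_hid, d_code), nn.ReLU())
        self.dec = nn.Sequential(nn.Linear(d_code, d_hid), nn.ReLU(),
                                 nn.Linear(d_hid, d_in))

    def clamp_nonneg(self):
        with torch.no_grad():
            for p in self.parameters():
                if p.dim() > 1:                          # weight matrices only
                    p.clamp_(min=0.0)

x = torch.rand(500, 784)                                 # placeholder data
p_high = knn_kde(x)                                      # fixed target distribution

model = NonNegAE(d_in=784)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(200):
    z = model.enc(x)
    recon = model.dec(z)
    p_low = knn_kde(z)                                   # distribution in the embedding
    kl = torch.sum(p_high * torch.log(p_high / (p_low + 1e-12)))
    loss = ((recon - x) ** 2).mean() + 0.1 * kl          # reconstruction + matching
    opt.zero_grad()
    loss.backward()
    opt.step()
    model.clamp_nonneg()                                 # enforce part-based weights
```

Clamping the weight matrices after each update is one simple way to keep a non-negative, part-based factorization; the paper's precise constraint handling and loss may differ.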
Related papers
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Deep Manifold Graph Auto-Encoder for Attributed Graph Embedding [51.75091298017941]
This paper proposes a novel Deep Manifold (Variational) Graph Auto-Encoder (DMVGAE/DMGAE) for attributed graph data.
The proposed method surpasses state-of-the-art baseline algorithms by a significant margin on different downstream tasks across popular datasets.
arXiv Detail & Related papers (2024-01-12T17:57:07Z)
- Relative intrinsic dimensionality is intrinsic to learning [49.5738281105287]
We introduce a new notion of the intrinsic dimension of a data distribution, which precisely captures the separability properties of the data.
For this intrinsic dimension, the rule of thumb above becomes a law: high intrinsic dimension guarantees highly separable data.
We show that this relative intrinsic dimension provides both upper and lower bounds on the probability of successfully learning and generalising in a binary classification problem.
arXiv Detail & Related papers (2023-10-10T10:41:45Z)
- Learning Structure Aware Deep Spectral Embedding [11.509692423756448]
We propose a novel structure-aware deep spectral embedding by combining a spectral embedding loss and a structure preservation loss.
A deep neural network architecture is proposed that simultaneously encodes both types of information and aims to generate structure-aware spectral embedding.
The proposed algorithm is evaluated on six publicly available real-world datasets.
arXiv Detail & Related papers (2023-05-14T18:18:05Z)
- Side-effects of Learning from Low Dimensional Data Embedded in an Euclidean Space [3.093890460224435]
We study the potential regularization effects associated with the network's depth and noise in the codimension of the data manifold.
We also present additional side effects in training due to the presence of noise.
arXiv Detail & Related papers (2022-03-01T16:55:51Z)
- DeHIN: A Decentralized Framework for Embedding Large-scale Heterogeneous Information Networks [64.62314068155997]
We present the Decentralized Embedding Framework for Heterogeneous Information Network (DeHIN) in this paper.
DeHIN presents a context preserving partition mechanism that innovatively formulates a large HIN as a hypergraph.
Our framework then adopts a decentralized strategy to efficiently partition HINs using a tree-like pipeline.
arXiv Detail & Related papers (2022-01-08T04:08:36Z)
- Index $t$-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings [1.7188280334580195]
This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved.
The proposed algorithm has the same complexity as the original $t$-SNE to embed new items, and a lower one when embedding a dataset sliced into sub-pieces (a generic warm-start illustration of this embedding-reuse idea is sketched after this list).
arXiv Detail & Related papers (2021-09-22T06:45:37Z)
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points.
Experiments on data classification and clustering on eight real datasets show that Vec2vec outperforms several classical dimensionality reduction methods under statistical hypothesis testing.
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
- Kernel Two-Dimensional Ridge Regression for Subspace Clustering [45.651770340521786]
We propose a novel subspace clustering method for 2D data.
It directly uses 2D data as inputs such that the learning of representations benefits from inherent structures and relationships of the data.
arXiv Detail & Related papers (2020-11-03T04:52:46Z)
- Improving Generative Adversarial Networks with Local Coordinate Coding [150.24880482480455]
Generative adversarial networks (GANs) have shown remarkable success in generating realistic data from some predefined prior distribution.
In practice, semantic information might be represented by some latent distribution learned from data.
We propose an LCCGAN model with local coordinate coding (LCC) to improve the performance of generating data.
arXiv Detail & Related papers (2020-07-28T09:17:50Z)
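As a rough illustration of the embedding-reuse idea in the Index $t$-SNE entry above (and not that paper's actual algorithm), the sketch below warm-starts scikit-learn's TSNE: new points are initialized at the 2-D coordinates of their nearest already-embedded neighbor, so the previous cluster layout tends to be preserved. The synthetic data, the single-neighbor rule, and the perplexity value are illustrative assumptions.

```python
# Hedged sketch: warm-starting t-SNE so that a re-embedding keeps clusters
# roughly where a previous embedding placed them. Generic illustration with
# scikit-learn, not the Index t-SNE algorithm itself.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_old = rng.normal(size=(300, 50))      # previously embedded data (placeholder)
X_new = rng.normal(size=(50, 50))       # newly arrived items (placeholder)

# First embedding, computed once and kept as the reference layout.
Y_old = TSNE(n_components=2, perplexity=30, init="pca",
             random_state=0).fit_transform(X_old)

# Initialize each new point at the 2-D position of its nearest old neighbor,
# then re-run t-SNE on the combined data from that warm start.
nn = NearestNeighbors(n_neighbors=1).fit(X_old)
_, idx = nn.kneighbors(X_new)
Y_init = np.vstack([Y_old, Y_old[idx[:, 0]]])

Y_all = TSNE(n_components=2, perplexity=30, init=Y_init,
             random_state=0).fit_transform(np.vstack([X_old, X_new]))
```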