Autoencoders with Intrinsic Dimension Constraints for Learning Low
Dimensional Image Representations
- URL: http://arxiv.org/abs/2304.07686v1
- Date: Sun, 16 Apr 2023 03:43:08 GMT
- Title: Autoencoders with Intrinsic Dimension Constraints for Learning Low
Dimensional Image Representations
- Authors: Jianzhang Zheng, Hao Shen, Jian Yang, Xuan Tang, Mingsong Chen, Hui
Yu, Jielong Guo, Xian Wei
- Abstract summary: We propose a novel deep representation learning approach with autoencoder, which incorporates regularization of the global and local ID constraints into the reconstruction of data representations.
This approach not only preserves the global manifold structure of the whole dataset, but also maintains the local manifold structure of the feature maps of each point.
- Score: 27.40298734517967
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autoencoders have achieved great success in various computer vision
applications. The autoencoder learns appropriate low dimensional image
representations through the self-supervised paradigm, i.e., reconstruction.
Existing studies mainly focus on minimizing the reconstruction error at the
pixel level of the image, while ignoring the preservation of the Intrinsic
Dimension (ID), which is a fundamental geometric property of data
representations in Deep Neural Networks (DNNs). Motivated by the important role
of ID, in this paper we propose a novel autoencoder-based deep representation
learning approach, which incorporates regularization of global and local ID
constraints into
the reconstruction of data representations. This approach not only preserves
the global manifold structure of the whole dataset, but also maintains the
local manifold structure of the feature maps of each point, which makes the
learned low-dimensional features more discriminative and improves the
performance of downstream algorithms. To the best of our knowledge, few
existing works exploit both global and local ID-invariance properties for the
regularization of autoencoders. Numerical experimental results on benchmark
datasets (Extended Yale B, Caltech101 and ImageNet) show that the resulting
regularized learning models achieve better discriminative representations for
downstream tasks including image classification and clustering.
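The abstract describes the approach only at a high level. A minimal sketch of the basic recipe, a reconstruction loss plus a penalty that keeps the intrinsic dimension (ID) of the learned codes close to that of the data, is given below. This is an illustration, not the authors' implementation: the TwoNN estimator, the penalty weight lambda_id, and the layer sizes are all assumptions, and the paper's local per-point feature-map constraint is omitted.

```python
# Minimal sketch (not the authors' code): an autoencoder whose loss adds a global
# ID penalty, estimated with TwoNN, keeping the ID of the latent codes close to
# the ID of the input batch. lambda_id, the TwoNN choice, and all layer sizes are
# illustrative assumptions; the local per-point feature-map constraint is not shown.
import torch
import torch.nn as nn
import torch.nn.functional as F

def twonn_id(x: torch.Tensor) -> torch.Tensor:
    """Maximum-likelihood TwoNN estimate of the intrinsic dimension of a batch (N, D)."""
    n = x.shape[0]
    dist = torch.cdist(x, x)                                # pairwise distances (N, N)
    dist = dist + torch.eye(n, device=x.device) * 1e12      # push self-distances out of the top-k
    r, _ = torch.topk(dist, k=2, dim=1, largest=False)      # 1st and 2nd nearest-neighbour distances
    mu = r[:, 1] / r[:, 0].clamp_min(1e-12)                 # distance ratios mu_i = r2 / r1
    return n / torch.log(mu).sum().clamp_min(1e-12)         # MLE estimate: N / sum(log mu_i)

class IDRegularizedAE(nn.Module):
    def __init__(self, in_dim: int = 784, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def id_regularized_loss(model: IDRegularizedAE, x: torch.Tensor, lambda_id: float = 0.1):
    x_hat, z = model(x)
    recon = F.mse_loss(x_hat, x)                            # pixel-level reconstruction error
    id_gap = (twonn_id(z) - twonn_id(x).detach()).abs()     # global ID constraint on the batch
    return recon + lambda_id * id_gap
```
For example, calling id_regularized_loss(model, batch.view(batch.size(0), -1)) during training minimizes the pixel-level error while nudging the latent codes toward the estimated ID of the data.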
Related papers
- GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion [27.35300492569507]
We present GRIN, an efficient diffusion model designed to ingest sparse unstructured training data.
We show that GRIN establishes a new state of the art in zero-shot metric monocular depth estimation even when trained from scratch.
arXiv Detail & Related papers (2024-09-15T23:32:04Z)
- UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z)
- Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
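The LRSA entry above describes computing self-attention in a fixed low-resolution space regardless of the input size. Below is a minimal sketch of that idea, assuming full-resolution queries attend to keys and values pooled onto a fixed grid; the pool size, head count, and this particular query/key split are assumptions, not LRFormer's exact design.

```python
# Minimal sketch of low-resolution self-attention: keys and values are pooled to a
# fixed spatial size, so the attention cost no longer grows quadratically with the
# input resolution. Pool size, head count, and keeping queries at full resolution
# are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowResSelfAttention(nn.Module):
    def __init__(self, dim: int, pool_size: int = 16, heads: int = 4):
        super().__init__()
        # dim must be divisible by heads
        self.pool_size = pool_size
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map at arbitrary resolution
        b, c, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)                    # (B, H*W, C) queries at full resolution
        kv = F.adaptive_avg_pool2d(x, self.pool_size)       # fixed pool_size x pool_size grid
        kv = kv.flatten(2).transpose(1, 2)                  # (B, pool_size^2, C) keys/values
        out, _ = self.attn(q, kv, kv)                       # cost ~ O(H*W * pool_size^2)
        return out.transpose(1, 2).reshape(b, c, h, w)
```
Because keys and values live on a fixed pool_size x pool_size grid, the attention cost scales with H*W rather than (H*W)^2, which is what keeps the computation cheap at high input resolutions.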
- ClusVPR: Efficient Visual Place Recognition with Clustering-based Weighted Transformer [13.0858576267115]
We present ClusVPR, a novel approach that tackles the specific issues of redundant information in duplicate regions and representations of small objects.
ClusVPR introduces a unique paradigm called the Clustering-based Weighted Transformer Network (CWTNet).
We also introduce the optimized-VLAD layer that significantly reduces the number of parameters and enhances model efficiency.
arXiv Detail & Related papers (2023-10-06T09:01:15Z)
- Deep face recognition with clustering based domain adaptation [57.29464116557734]
We propose a new clustering-based domain adaptation method designed for face recognition task in which the source and target domain do not share any classes.
Our method effectively learns the discriminative target feature by aligning the feature domain globally and, at the same time, distinguishing the target clusters locally.
arXiv Detail & Related papers (2022-05-27T12:29:11Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- Single Image Internal Distribution Measurement Using Non-Local Variational Autoencoder [11.985083962982909]
This paper proposes a novel image-specific solution, namely the non-local variational autoencoder (NLVAE).
NLVAE is introduced as a self-supervised strategy that reconstructs high-resolution images using disentangled information from the non-local neighbourhood.
Experimental results from seven benchmark datasets demonstrate the effectiveness of the NLVAE model.
arXiv Detail & Related papers (2022-04-02T18:43:55Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for the image restoration task.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
- A U-Net Based Discriminator for Generative Adversarial Networks [86.67102929147592]
We propose an alternative U-Net based discriminator architecture for generative adversarial networks (GANs).
The proposed architecture allows providing detailed per-pixel feedback to the generator while maintaining the global coherence of synthesized images.
The novel discriminator improves over the state of the art in terms of the standard distribution and image quality metrics.
arXiv Detail & Related papers (2020-02-28T11:16:54Z)
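The U-Net discriminator entry above pairs a global real/fake decision with per-pixel feedback to the generator. Below is a minimal sketch of that structure, with an encoder head for the image-level score and a decoder head for the pixel-level map; the depths, channel widths, and loss choice are assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a U-Net-style GAN discriminator: the encoder bottleneck yields a
# global real/fake score and the decoder yields a per-pixel decision map, so the
# generator receives both image-level and pixel-level feedback. Widths and depth
# are illustrative assumptions; input H and W should be divisible by 4.
import torch
import torch.nn as nn

class UNetDiscriminator(nn.Module):
    def __init__(self, in_ch: int = 3, base: int = 64):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2))
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 4, 2, 1), nn.LeakyReLU(0.2))
        self.global_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                         nn.Linear(base * 2, 1))            # image-level score
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(base * 2, base, 4, 2, 1), nn.LeakyReLU(0.2))
        self.pixel_head = nn.ConvTranspose2d(base * 2, 1, 4, 2, 1)          # per-pixel score map

    def forward(self, x):
        e1 = self.enc1(x)                                   # (B, base,   H/2, W/2)
        e2 = self.enc2(e1)                                  # (B, 2*base, H/4, W/4)
        global_score = self.global_head(e2)                 # (B, 1)
        d1 = self.dec1(e2)                                  # (B, base,   H/2, W/2)
        pixel_map = self.pixel_head(torch.cat([d1, e1], 1)) # (B, 1, H, W), skip connection from e1
        return global_score, pixel_map
```
Both outputs can be trained with a standard adversarial objective, e.g. binary cross-entropy applied to the global score and to every entry of the pixel map.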
- Structural Deep Clustering Network [45.370272344031285]
We propose a Structural Deep Clustering Network (SDCN) to integrate the structural information into deep clustering.
Specifically, we design a delivery operator to transfer the representations learned by autoencoder to the corresponding GCN layer.
In this way, the multiple structures of data, from low-order to high-order, are naturally combined with the multiple representations learned by autoencoder.
arXiv Detail & Related papers (2020-02-05T04:33:40Z)
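The SDCN entry above centres on a delivery operator that passes each autoencoder layer's representation to the corresponding GCN layer. Below is a minimal sketch of one plausible reading, mixing the two representations with a fixed weight; the dense GCN layer, the mixing weight eps, and the layer pairing are assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a delivery-operator scheme: after each GCN layer, the hidden
# representation of the autoencoder layer at the same depth is mixed in, combining
# graph structure with attribute features layer by layer. The fixed mixing weight
# eps and the simple dense GCN layer are illustrative assumptions.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """A simple dense GCN layer: propagate linearly transformed features over a normalized adjacency."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, z: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # z: (N, in_dim) node features, adj: (N, N) normalized adjacency matrix
        return torch.relu(adj @ self.lin(z))

def delivery(z_gcn: torch.Tensor, h_ae: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
    """Mix a GCN layer's output with the autoencoder hidden state of the same depth."""
    return (1.0 - eps) * z_gcn + eps * h_ae

def sdcn_style_forward(x, adj, gcn_layers, h_ae_layers, eps=0.5):
    """Run the GCN, injecting the autoencoder representation after every layer.

    h_ae_layers must have the same widths as the corresponding GCN layer outputs.
    """
    z = x
    for gcn, h_ae in zip(gcn_layers, h_ae_layers):
        z = gcn(z, adj)              # structural propagation over the graph
        z = delivery(z, h_ae, eps)   # inject attribute information from the autoencoder
    return z
```
In this way the representation flowing through the GCN carries both the graph structure (via adj) and the low-order to high-order attribute features learned by the autoencoder, which is the combination the summary above describes.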