Hilbert Flattening: a Locality-Preserving Matrix Unfolding Method for
Visual Discrimination
- URL: http://arxiv.org/abs/2202.10240v7
- Date: Tue, 30 Jan 2024 06:56:43 GMT
- Title: Hilbert Flattening: a Locality-Preserving Matrix Unfolding Method for
Visual Discrimination
- Authors: Qingsong Zhao, Yi Wang, Zhipeng Zhou, Duoqian Miao, Limin Wang, Yu
Qiao, Cairong Zhao
- Abstract summary: We propose Hilbert curve flattening as an innovative method to preserve locality in flattened matrices.
We also introduce the Localformer, a vision transformer architecture that incorporates token sampling with a token aggregator to enhance its locality bias.
- Score: 51.432453379052724
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Flattening is essential in computer vision: it converts multi-dimensional
feature maps or images into one-dimensional vectors. However, existing
flattening approaches neglect the preservation of local smoothness, which can
impact the representational learning capacity of vision models. In this paper,
we propose Hilbert curve flattening as an innovative method to preserve
locality in flattened matrices. We compare it with the commonly used Zigzag
operation and demonstrate that Hilbert curve flattening can better retain the
spatial relationships and local smoothness of the original grid structure,
while remaining robust to variance in input scale. In addition, we
introduce the Localformer, a vision transformer architecture that incorporates
Hilbert token sampling with a token aggregator to enhance its locality bias.
Extensive experiments on image classification and semantic segmentation tasks
demonstrate that the Localformer outperforms baseline models consistently. We
also show it brings consistent performance boosts for other popular
architectures (e.g. MLP-Mixer).
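The contrast the abstract draws between Hilbert curve flattening and the Zigzag operation can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the standard index-to-coordinate conversion for the Hilbert curve, assumes a square grid whose side is a power of two, and takes "Zigzag" to mean a plain row-major scan (an assumption). All function names are hypothetical.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current sub-square so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_flatten(grid):
    """Flatten a square nested list (side a power of two) along a Hilbert curve."""
    n = len(grid)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in grid):
        raise ValueError("expected a square grid whose side is a power of two")
    order = n.bit_length() - 1
    flat = []
    for d in range(n * n):
        x, y = hilbert_d2xy(order, d)
        flat.append(grid[y][x])  # assumption: x indexes columns, y indexes rows
    return flat


def max_step(coords):
    """Largest Manhattan distance between consecutive grid positions in a scan."""
    return max(
        abs(x1 - x0) + abs(y1 - y0)
        for (x0, y0), (x1, y1) in zip(coords, coords[1:])
    )


if __name__ == "__main__":
    side = 4
    hilbert_path = [hilbert_d2xy(2, d) for d in range(side * side)]
    raster_path = [(c, r) for r in range(side) for c in range(side)]
    print("Hilbert max step:  ", max_step(hilbert_path))
    print("Row-major max step:", max_step(raster_path))
```

In this sketch, consecutive elements of the Hilbert-flattened sequence always come from neighboring grid cells, whereas a row-major scan jumps across the grid at the end of every row. That is one simple way to see the locality property the abstract refers to.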
Related papers
- Compressing Image-to-Image Translation GANs Using Local Density
Structures on Their Learned Manifold [69.33930972652594]
Generative Adversarial Networks (GANs) have shown remarkable success in modeling complex data distributions for image-to-image translation.
Existing GAN compression methods mainly rely on knowledge distillation or convolutional classifiers' pruning techniques.
We propose a new approach by explicitly encouraging the pruned model to preserve the density structure of the original parameter-heavy model on its learned manifold.
Our experiments on image translation GAN models, Pix2Pix and CycleGAN, with various benchmark datasets and architectures demonstrate our method's effectiveness.
arXiv Detail & Related papers (2023-12-22T15:43:12Z)
- Projected Randomized Smoothing for Certified Adversarial Robustness [9.771011198361865]
Randomized smoothing is the current state-of-the-art method for producing provably robust classifiers.
Recent research has generalized provable robustness to different norm balls as well as anisotropic regions.
We show that our method improves on the state-of-the-art by many orders of magnitude.
arXiv Detail & Related papers (2023-09-25T01:12:55Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud
Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- GraphFit: Learning Multi-scale Graph-Convolutional Representation for
Point Cloud Normal Estimation [31.40738037512243]
We propose a precise and efficient normal estimation method for unstructured 3D point clouds.
We learn a graph convolutional feature representation for normal estimation, which places more emphasis on local neighborhood geometry.
Our method outperforms competitors, achieving state-of-the-art accuracy on various benchmark datasets.
arXiv Detail & Related papers (2022-07-23T10:29:26Z)
- ReF -- Rotation Equivariant Features for Local Feature Matching [30.459559206664427]
We propose an alternative, complementary approach that centers on inducing bias in the model architecture itself to generate "rotation-specific" features.
We demonstrate that this high performance, rotation-specific coverage from the steerable CNNs can be expanded to all rotation angles.
We present a detailed analysis of the performance effects of ensembling, robust estimation, network architecture variations, and the use of rotation priors.
arXiv Detail & Related papers (2022-03-10T07:36:09Z)
- Spatial-spectral Hyperspectral Image Classification via Multiple Random
Anchor Graphs Ensemble Learning [88.60285937702304]
This paper proposes a novel spatial-spectral HSI classification method via multiple random anchor graphs ensemble learning (RAGE).
Firstly, the local binary pattern is adopted to extract the more descriptive features on each selected band, which preserves local structures and subtle changes of a region.
Secondly, the adaptive neighbors assignment is introduced in the construction of anchor graph, to reduce the computational complexity.
arXiv Detail & Related papers (2021-03-25T09:31:41Z)
- Graph Convolution with Low-rank Learnable Local Filters [32.00396411583352]
This paper introduces a new type of graph convolution with learnable low-rank local filters.
It is provably more expressive than previous spectral graph convolution methods.
The representation stability against input graph data perturbation is theoretically proved, making use of the graph filter locality and the local graph regularization.
arXiv Detail & Related papers (2020-08-04T20:34:59Z)
- ProAlignNet : Unsupervised Learning for Progressively Aligning Noisy
Contours [12.791313859673187]
"ProAlignNet" accounts for large scale misalignments and complex transformations between the contour shapes.
It learns by training with a novel loss function, which is derived as an upper bound of a proximity-sensitive and local shape-dependent similarity metric.
In two real-world applications, the proposed models consistently outperform state-of-the-art methods.
arXiv Detail & Related papers (2020-05-23T14:56:14Z)
- Neural Subdivision [58.97214948753937]
This paper introduces Neural Subdivision, a novel framework for data-driven coarse-to-fine geometry modeling.
We optimize for the same set of network weights across all local mesh patches, thus providing an architecture that is not constrained to a specific input mesh, fixed genus, or category.
We demonstrate that even when trained on a single high-resolution mesh our method generates reasonable subdivisions for novel shapes.
arXiv Detail & Related papers (2020-05-04T20:03:21Z)
- Spatial Pyramid Based Graph Reasoning for Semantic Segmentation [67.47159595239798]
We apply graph convolution to the semantic segmentation task and propose an improved Laplacian.
The graph reasoning is directly performed in the original feature space organized as a spatial pyramid.
We achieve comparable performance with advantages in computational and memory overhead.
arXiv Detail & Related papers (2020-03-23T12:28:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.