Rethinking the Zigzag Flattening for Image Reading
- URL: http://arxiv.org/abs/2202.10240v8
- Date: Tue, 20 Aug 2024 03:31:41 GMT
- Title: Rethinking the Zigzag Flattening for Image Reading
- Authors: Qingsong Zhao, Yi Wang, Zhipeng Zhou, Duoqian Miao, Limin Wang, Yu Qiao, Cairong Zhao
- Abstract summary: We investigate Hilbert fractal flattening (HF) as an alternative method for sequence ordering in computer vision.
HF has proven superior to other curves at maintaining spatial locality.
It can be easily plugged into most deep neural networks (DNNs).
- Score: 48.976491898131265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The sequence ordering of word vectors matters greatly for text reading, as has been shown in natural language processing (NLP). However, the rules governing different sequence orderings in computer vision (CV) have not been well explored; for example, it is unclear why "zigzag" flattening (ZF) is commonly used as the default way to order image patches in vision networks. Notably, when images are decomposed at multiple scales, ZF does not preserve the invariance of feature-point positions. To address this, we investigate Hilbert fractal flattening (HF) as an alternative sequence-ordering method in CV and contrast it with ZF. HF has proven superior to other curves at maintaining spatial locality when performing multi-scale transformations of the dimensional space, and it can be easily plugged into most deep neural networks (DNNs). Extensive experiments demonstrate that it yields consistent and significant performance boosts for a variety of architectures. Finally, we hope that our studies spark further research on the flattening strategy of image reading.
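The contrast between ZF and HF can be made concrete with a small sketch. It is not the authors' code, and it rests on stated assumptions: ZF is taken to mean the row-major raster order that a plain reshape produces, the patch grid is square with a power-of-two side, and "multi-scale decomposition" is modelled as merging each 2x2 block of patches into one. The Hilbert index uses the classic iterative xy-to-d conversion, and all helper names (`hilbert_index`, `zigzag_index`, `flatten`, `max_jump`, `scale_consistent`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
IndexFn = Callable[[int, int, int], int]  # (side, x, y) -> position in sequence


def hilbert_index(side: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve on a side x side grid.

    `side` must be a power of two. Classic iterative xy -> d conversion.
    """
    d = 0
    s = side // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # index of this quadrant in visit order (0..3)
        # Rotate/reflect so the sub-curve inside this quadrant is canonical.
        if ry == 0:
            if rx == 1:
                x, y = side - 1 - x, side - 1 - y
            x, y = y, x
        s //= 2
    return d


def zigzag_index(side: int, x: int, y: int) -> int:
    """Assumption: ZF is the row-major raster order of a plain reshape."""
    return y * side + x


def flatten(grid: Sequence[Sequence[T]], index_fn: IndexFn) -> List[T]:
    """Flatten a square grid (indexed grid[y][x]) into a 1-D patch sequence."""
    side = len(grid)
    cells = [(x, y) for y in range(side) for x in range(side)]
    cells.sort(key=lambda c: index_fn(side, c[0], c[1]))
    return [grid[y][x] for x, y in cells]


def max_jump(index_fn: IndexFn, side: int) -> int:
    """Largest Manhattan distance between consecutive cells of the ordering."""
    coords = [[(x, y) for x in range(side)] for y in range(side)]
    order: List[Tuple[int, int]] = flatten(coords, index_fn)
    return max(abs(x1 - x2) + abs(y1 - y2)
               for (x1, y1), (x2, y2) in zip(order, order[1:]))


def scale_consistent(index_fn: IndexFn, side: int) -> bool:
    """True if merging 2x2 blocks maps every index d to d // 4 on the coarse grid."""
    return all(index_fn(side, x, y) // 4 == index_fn(side // 2, x // 2, y // 2)
               for y in range(side) for x in range(side))


if __name__ == "__main__":
    for name, fn in [("zigzag", zigzag_index), ("hilbert", hilbert_index)]:
        print(name, max_jump(fn, 8), scale_consistent(fn, 8))
```

On an 8x8 grid, consecutive Hilbert positions are always 4-neighbours (`max_jump` is 1), while the raster order jumps 8 cells at every row wrap. Dividing a Hilbert index by 4 also yields the parent cell's index on the half-resolution grid, which does not hold for the raster order; this is one way to read the abstract's claim that ZF fails to keep feature-point positions invariant across scales.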
Related papers
- Vector Field Attention for Deformable Image Registration [9.852055065890479]
Deformable image registration establishes non-linear spatial correspondences between fixed and moving images.
Most existing deep learning-based methods require neural networks to encode location information in their feature maps.
We present Vector Field Attention (VFA), a novel framework that enhances the efficiency of the existing network design by enabling direct retrieval of location correspondences.
Date: 2024-07-14T14:06:58Z
- Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP.
VOP processes co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone.
Our approach uses a voting mechanism to assess overlap scores for potential database images.
Date: 2024-06-23T20:00:20Z
- Towards Better Gradient Consistency for Neural Signed Distance Functions via Level Set Alignment [50.892158511845466]
We show that gradient consistency in the field, indicated by the parallelism of level sets, is the key factor affecting the inference accuracy.
We propose a level set alignment loss to evaluate the parallelism of level sets, which can be minimized to achieve better gradient consistency.
Date: 2023-05-19T11:28:05Z
- A Geometrically Constrained Point Matching based on View-invariant Cross-ratios, and Homography [2.050924050557755]
A geometrically constrained algorithm is proposed to verify the correctness of initially matched SIFT keypoints based on view-invariant cross-ratios (CRs).
By randomly forming pentagons from these keypoints and matching their shape and location among images with CRs, robust planar region estimation can be achieved efficiently.
Experimental results show that satisfactory results can be obtained for various scenes with single as well as multiple planar regions.
Date: 2022-11-06T01:55:35Z
- Neural Space-filling Curves [47.852964985588486]
We present a data-driven approach to infer a context-based scan order for a set of images.
Our work learns a spatially coherent linear ordering of pixels from the dataset of images using a graph-based neural network.
We show the advantage of using Neural SFCs in downstream applications such as image compression.
Date: 2022-04-18T17:59:01Z
- UltraSR: Spatial Encoding is a Missing Key for Implicit Image Function-based Arbitrary-Scale Super-Resolution [74.82282301089994]
In this work, we propose UltraSR, a simple yet effective new network design based on implicit image functions.
We show that spatial encoding is indeed a missing key towards the next-stage high-accuracy implicit image function.
Our UltraSR sets new state-of-the-art performance on the DIV2K benchmark under all super-resolution scales.
Date: 2021-03-23T17:36:42Z
- Scalable Visual Transformers with Hierarchical Pooling [61.05787583247392]
We propose a Hierarchical Visual Transformer (HVT) which progressively pools visual tokens to shrink the sequence length.
Because the shorter sequence saves computation, depth, width, resolution, or patch size can be scaled up without introducing extra computational complexity.
Our HVT outperforms the competitive baselines on ImageNet and CIFAR-100 datasets.
Date: 2021-03-19T03:55:58Z
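The HVT summary above says visual tokens are pooled progressively to shrink the sequence length. A minimal sketch of that idea follows, assuming plain 1-D max pooling over the token axis with kernel and stride 2 between stages; HVT's actual kernel size, padding, and stage layout are not given here, and `pool_tokens` and `stage_lengths` are hypothetical names.

```python
from typing import List

Token = List[float]  # one token = its channel values


def pool_tokens(tokens: List[Token], kernel: int = 2, stride: int = 2) -> List[Token]:
    """1-D max pooling over the sequence axis, applied per channel.

    Assumed setting (kernel = stride = 2, no padding); a trailing token that
    does not fill a complete window is dropped.
    """
    pooled = []
    for start in range(0, len(tokens) - kernel + 1, stride):
        window = tokens[start:start + kernel]
        pooled.append([max(channel) for channel in zip(*window)])
    return pooled


def stage_lengths(num_tokens: int, num_stages: int, dim: int = 4) -> List[int]:
    """Sequence length after each pooling stage (transformer blocks omitted)."""
    tokens = [[0.0] * dim for _ in range(num_tokens)]
    lengths = []
    for _ in range(num_stages):
        tokens = pool_tokens(tokens)
        lengths.append(len(tokens))
    return lengths
```

Starting from 196 tokens (a 14x14 patch grid), three stages give lengths 98, 49, and 24, so later blocks attend over a much shorter sequence; since self-attention cost grows quadratically with sequence length, this is where the savings come from.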