Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer
- URL: http://arxiv.org/abs/2508.14187v1
- Date: Tue, 19 Aug 2025 18:21:59 GMT
- Title: Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer
- Authors: Md Ashiqur Rahman, Chiao-An Yang, Michael N. Cheng, Lim Jun Hao, Jeremiah Jiang, Teck-Yian Lim, Raymond A. Yeh
- Abstract summary: We present a deep equilibrium canonicalizer (DEC) to improve the local scale equivariance of a model. DEC can be easily incorporated into existing network architectures and can be adapted to a pre-trained model. We show that on the competitive ImageNet benchmark, DEC improves both model performance and local scale consistency.
- Score: 10.546719498732102
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scale variation is a fundamental challenge in computer vision. Objects of the same class can have different sizes, and their perceived size is further affected by the distance from the camera. These variations are local to the objects, i.e., different objects may change size differently within the same image. To effectively handle scale variations, we present a deep equilibrium canonicalizer (DEC) to improve the local scale equivariance of a model. DEC can be easily incorporated into existing network architectures and can be adapted to a pre-trained model. Notably, we show that on the competitive ImageNet benchmark, DEC improves both model performance and local scale consistency across four popular pre-trained deep-nets: ViT, DeiT, Swin, and BEiT. Our code is available at https://github.com/ashiq24/local-scale-equivariance.
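The core mechanism of a deep equilibrium layer is to define its output implicitly as a fixed point z* = f(z*, x) and solve for it numerically. The sketch below illustrates that idea for a latent canonicalizer; the dimensions, weights, and update rule `f` are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; the paper's actual latent dimensions are not given here.
d_latent, d_input = 16, 32
W = rng.normal(scale=0.05, size=(d_latent, d_latent))  # small weights keep f contractive
U = rng.normal(scale=0.05, size=(d_latent, d_input))

def f(z, x):
    """One hypothetical canonicalizer update: z' = tanh(W z + U x)."""
    return np.tanh(W @ z + U @ x)

def solve_equilibrium(x, tol=1e-9, max_iter=500):
    """Solve z* = f(z*, x) by damped fixed-point iteration,
    a simple stand-in for the root solvers used by deep equilibrium models."""
    z = np.zeros(d_latent)
    for _ in range(max_iter):
        z_next = 0.5 * z + 0.5 * f(z, x)  # damping stabilizes the iteration
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

x = rng.normal(size=d_input)
z_star = solve_equilibrium(x)
residual = np.linalg.norm(z_star - f(z_star, x))  # near zero at the equilibrium
```

Training such a layer would differentiate through z* via the implicit function theorem rather than unrolling the loop; that machinery is omitted from this sketch.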
Related papers
- Scale-Equivariant Deep Learning for 3D Data [44.52688267348063]
Convolutional neural networks (CNNs) recognize objects regardless of their position in the image.
We propose a scale-equivariant convolutional network layer for three-dimensional data.
Our experiments demonstrate the effectiveness of the proposed method in achieving scale equivariance for 3D medical image analysis.
arXiv Detail & Related papers (2023-04-12T13:56:12Z) - Self-similarity Driven Scale-invariant Learning for Weakly Supervised Person Search [66.95134080902717]
We propose a novel one-step framework, named Self-similarity driven Scale-invariant Learning (SSL)
We introduce a Multi-scale Exemplar Branch to guide the network in concentrating on the foreground and learning scale-invariant features.
Experiments on PRW and CUHK-SYSU databases demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-02-25T04:48:11Z) - Just a Matter of Scale? Reevaluating Scale Equivariance in Convolutional Neural Networks [3.124871781422893]
Convolutional networks are not equivariant to variations in scale and fail to generalize to objects of different sizes.
We introduce a new family of models that applies many re-scaled kernels with shared weights in parallel and then selects the most appropriate one.
Our experimental results on STIR show that both the existing and proposed approaches can improve generalization across scales compared to standard convolutions.
arXiv Detail & Related papers (2022-11-18T15:27:05Z) - The Lie Derivative for Measuring Learned Equivariance [84.29366874540217]
We study the equivariance properties of hundreds of pretrained models, spanning CNNs, transformers, and Mixer architectures.
We find that many violations of equivariance can be linked to spatial aliasing in ubiquitous network layers, such as pointwise non-linearities.
For example, transformers can be more equivariant than convolutional neural networks after training.
arXiv Detail & Related papers (2022-10-06T15:20:55Z) - Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation [69.45176408639483]
We reform the conv layer by resorting to the scale-space theory.
We build a novel network named SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, the inference is more efficient than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z) - ScaleNet: A Shallow Architecture for Scale Estimation [25.29257353644138]
We design a new architecture, ScaleNet, that exploits dilated convolutions and self and cross-correlation layers to predict the scale between images.
We show how ScaleNet can be combined with sparse local features and dense correspondence networks to improve camera pose estimation, 3D reconstruction, or dense geometric matching.
arXiv Detail & Related papers (2021-12-09T11:32:01Z) - Scale Equivariance Improves Siamese Tracking [1.7188280334580197]
Siamese trackers turn tracking into similarity estimation between a template and the candidate regions in the frame.
Non-translation-equivariant architectures induce a positional bias during training.
We present SE-SiamFC, a scale-equivariant variant of SiamFC built according to the recipe.
arXiv Detail & Related papers (2020-07-17T16:55:51Z) - Learning to Learn Parameterized Classification Networks for Scalable Input Images [76.44375136492827]
Convolutional Neural Networks (CNNs) do not behave predictably when the input resolution changes.
We employ meta learners to generate convolutional weights of main networks for various input scales.
We further utilize knowledge distillation on the fly over model predictions based on different input resolutions.
arXiv Detail & Related papers (2020-07-13T04:27:25Z) - Multiscale Deep Equilibrium Models [162.15362280927476]
We propose a new class of implicit networks, the multiscale deep equilibrium model (MDEQ).
An MDEQ directly solves for and backpropagates through the equilibrium points of multiple feature resolutions simultaneously.
We illustrate the effectiveness of this approach on two large-scale vision tasks: ImageNet classification and semantic segmentation on high-resolution images from the Cityscapes dataset.
arXiv Detail & Related papers (2020-06-15T18:07:44Z) - Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.