ScaleNet: A Shallow Architecture for Scale Estimation
- URL: http://arxiv.org/abs/2112.04846v1
- Date: Thu, 9 Dec 2021 11:32:01 GMT
- Title: ScaleNet: A Shallow Architecture for Scale Estimation
- Authors: Axel Barroso-Laguna, Yurun Tian and Krystian Mikolajczyk
- Abstract summary: We design a new architecture, ScaleNet, that exploits dilated convolutions and self and cross-correlation layers to predict the scale between images.
We show how ScaleNet can be combined with sparse local features and dense correspondence networks to improve camera pose estimation, 3D reconstruction, or dense geometric matching.
- Score: 25.29257353644138
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we address the problem of estimating scale factors between
images. We formulate the scale estimation problem as a prediction of a
probability distribution over scale factors. We design a new architecture,
ScaleNet, that exploits dilated convolutions as well as self and
cross-correlation layers to predict the scale between images. We demonstrate
that rectifying images with estimated scales leads to significant performance
improvements for various tasks and methods. Specifically, we show how ScaleNet
can be combined with sparse local features and dense correspondence networks to
improve camera pose estimation, 3D reconstruction, or dense geometric matching
in different benchmarks and datasets. We provide an extensive evaluation on
several tasks and analyze the computational overhead of ScaleNet. The code,
evaluation protocols, and trained models are publicly available at
https://github.com/axelBarroso/ScaleNet.
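As a rough illustration of the abstract's formulation, a predicted probability distribution over a discrete set of scale factors can be collapsed into a single scale estimate and used to rectify one image before matching. This is only a minimal sketch: the bin values, the softmax head, the log-space expectation, and the nearest-neighbour resize are illustrative assumptions, not ScaleNet's actual layers or training setup.

```python
import math

# Hypothetical discrete scale bins; ScaleNet's actual bin values differ.
SCALE_BINS = [0.25, 0.5, 1.0, 2.0, 4.0]

def softmax(logits):
    """Turn raw network scores into a probability distribution over bins."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def expected_scale(logits, bins=SCALE_BINS):
    """Collapse the distribution into one scale factor.

    Averaging in log-space keeps the estimate symmetric: a scale of 2
    and a scale of 1/2 are equally far from the identity scale of 1.
    """
    p = softmax(logits)
    return math.exp(sum(pi * math.log(b) for pi, b in zip(p, bins)))

def rectify(image, scale):
    """Nearest-neighbour resize of a 2D list by `scale` (a toy stand-in
    for the bilinear rectification a real pipeline would use)."""
    h, w = len(image), len(image[0])
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    rows = [min(h - 1, int(r / scale)) for r in range(nh)]
    cols = [min(w - 1, int(c / scale)) for c in range(nw)]
    return [[image[r][c] for c in cols] for r in rows]
```

With logits strongly peaked at the 2.0 bin, `expected_scale` returns a value close to 2, and `rectify` then upsamples the smaller image by that factor so both views are at a comparable scale before feature extraction.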
Related papers
- SCAResNet: A ResNet Variant Optimized for Tiny Object Detection in Transmission and Distribution Towers [0.42028553027796633]
Traditional deep learning-based object detection networks often resize images during the data pre-processing stage to achieve a uniform size and scale in the feature map.
We introduce Positional-Multi-head Criss-Cross Imagery to capture contextual information and learn from multiple representation subspaces.
This approach allows images of different sizes and scales to generate feature maps with uniform dimensions and can be employed in feature map propagation.
arXiv Detail & Related papers (2024-04-05T15:48:36Z)
- Scale-Equivariant Deep Learning for 3D Data [44.52688267348063]
Convolutional neural networks (CNNs) recognize objects regardless of their position in the image.
We propose a scale-equivariant convolutional network layer for three-dimensional data.
Our experiments demonstrate the effectiveness of the proposed method in achieving scale equivariance for 3D medical image analysis.
arXiv Detail & Related papers (2023-04-12T13:56:12Z)
- Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning [55.762840052788945]
We present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales.
We find that tasking the network with reconstructing both low- and high-frequency images leads to robust multiscale representations for remote sensing imagery.
arXiv Detail & Related papers (2022-12-30T03:15:34Z)
- Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation [69.45176408639483]
We reformulate the convolution layer by resorting to scale-space theory.
We build a novel architecture named SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, the inference is more efficient than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z)
- Scale-Net: Learning to Reduce Scale Differences for Large-Scale Invariant Image Matching [7.297352404640492]
We propose a scale-difference-aware image matching method (SDAIM) that reduces image scale differences before local feature extraction.
To accurately estimate the scale ratio, we propose a covisibility-attention-reinforced matching module (CVARM) and then design a novel neural network, termed Scale-Net.
arXiv Detail & Related papers (2021-12-20T12:35:36Z)
- Pixel-Perfect Structure-from-Motion with Featuremetric Refinement [96.73365545609191]
We refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views.
This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors.
Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale.
arXiv Detail & Related papers (2021-08-18T17:58:55Z)
- Scale-covariant and scale-invariant Gaussian derivative networks [0.0]
This paper presents a hybrid approach between scale-space theory and deep learning, where a deep learning architecture is constructed by coupling parameterized scale-space operations in cascade.
It is demonstrated that the resulting approach allows for scale generalization, enabling good performance for classifying patterns at scales not present in the training data.
arXiv Detail & Related papers (2020-11-30T13:15:10Z)
- Weighing Counts: Sequential Crowd Counting by Reinforcement Learning [84.39624429527987]
We formulate counting as a sequential decision problem and present a novel crowd counting model solvable by deep reinforcement learning.
We propose a novel 'counting scale', termed LibraNet, in which the count value is analogized to a weight.
We show that LibraNet exactly implements scale weighing by visualizing the decision process by which it chooses actions.
arXiv Detail & Related papers (2020-07-16T11:16:12Z)
- On the Predictability of Pruning Across Scales [29.94870276983399]
We show that the error of magnitude-pruned networks empirically follows a scaling law with interpretable coefficients that depend on the architecture and task.
As neural networks become ever larger and costlier to train, our findings suggest a framework for reasoning conceptually and analytically about a standard method for unstructured pruning.
arXiv Detail & Related papers (2020-06-18T15:41:46Z)
- Crowd Counting via Hierarchical Scale Recalibration Network [61.09833400167511]
We propose a novel Hierarchical Scale Recalibration Network (HSRNet) to tackle the task of crowd counting.
HSRNet models rich contextual dependencies and recalibrates multiple scale-associated information.
Our approach selectively ignores various sources of noise and automatically focuses on appropriate crowd scales.
arXiv Detail & Related papers (2020-03-07T10:06:47Z)
- Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.