PMatch: Paired Masked Image Modeling for Dense Geometric Matching
- URL: http://arxiv.org/abs/2303.17342v1
- Date: Thu, 30 Mar 2023 12:53:22 GMT
- Title: PMatch: Paired Masked Image Modeling for Dense Geometric Matching
- Authors: Shengjie Zhu, Xiaoming Liu
- Abstract summary: We propose a novel cross-frame global matching module (CFGM) for geometric matching.
To be robust to textureless areas, we propose a homography loss to further regularize its learning.
We achieve state-of-the-art (SoTA) performance on geometric matching.
- Score: 18.64065915021511
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dense geometric matching determines the dense pixel-wise correspondence
between a source and a support image corresponding to the same 3D structure.
Prior works employ an encoder of transformer blocks to correlate the two-frame
features. However, existing monocular pretraining tasks, such as image
classification and masked image modeling (MIM), cannot pretrain the
cross-frame module, yielding suboptimal performance. To resolve this, we
reformulate MIM from reconstructing a single masked image to reconstructing
a pair of masked images, enabling the pretraining of the transformer module.
Additionally, we incorporate a decoder into pretraining for improved upsampling
results. Further, to be robust to textureless areas, we propose a novel
cross-frame global matching module (CFGM). Since most textureless areas are
planar surfaces, we propose a homography loss to further regularize its
learning. Combined, these yield state-of-the-art (SoTA) performance
on geometric matching. Code and models are available at
https://github.com/ShngJZ/PMatch.
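The paired reformulation of MIM described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the patch size, the masking ratio, the zero-fill of hidden patches, the mean-squared-error objective, and all function names (`random_patch_mask`, `mask_pair`, `paired_reconstruction_loss`) are assumptions made for illustration, and the cross-frame transformer and decoder that would produce the predictions are omitted.

```python
import numpy as np


def random_patch_mask(height, width, patch, ratio, rng):
    """Return a boolean (height, width) mask; True marks hidden pixels."""
    assert height % patch == 0 and width % patch == 0, "image must tile into patches"
    grid_h, grid_w = height // patch, width // patch
    n_masked = int(round(ratio * grid_h * grid_w))
    flat = np.zeros(grid_h * grid_w, dtype=bool)
    flat[rng.choice(grid_h * grid_w, size=n_masked, replace=False)] = True
    grid = flat.reshape(grid_h, grid_w)
    # Expand each patch-level decision to pixel resolution.
    return grid.repeat(patch, axis=0).repeat(patch, axis=1)


def mask_pair(source, support, patch=16, ratio=0.5, seed=0):
    """Mask both frames of a pair independently (assumed zero-fill)."""
    rng = np.random.default_rng(seed)
    height, width = source.shape[:2]
    masks = [random_patch_mask(height, width, patch, ratio, rng) for _ in range(2)]
    masked = [
        np.where(mask[..., None], 0.0, image)
        for mask, image in zip(masks, (source, support))
    ]
    return masked, masks


def paired_reconstruction_loss(pred_pair, target_pair, masks):
    """Average the masked-pixel MSE over both images of the pair."""
    losses = [
        ((pred - target) ** 2)[mask].mean()
        for pred, target, mask in zip(pred_pair, target_pair, masks)
    ]
    return float(np.mean(losses))
```

In an actual pretraining loop, a model would receive both masked frames jointly and predict both originals, so the loss above would supervise the cross-frame module as well as the per-frame encoder; here the masked inputs can stand in for predictions just to exercise the functions.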
Related papers
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
- 3D Geometric Shape Assembly via Efficient Point Cloud Matching [59.241448711254485]
We introduce Proxy Match Transform (PMT), an approximate high-order feature transform layer that enables reliable matching between mating surfaces of parts.
Building upon PMT, we introduce a new framework, dubbed Proxy Match TransformeR (PMTR), for the geometric assembly task.
We evaluate the proposed PMTR on the large-scale 3D geometric shape assembly benchmark dataset of Breaking Bad.
arXiv Detail & Related papers (2024-07-15T08:50:02Z)
- 3DMiner: Discovering Shapes from Large-Scale Unannotated Image Datasets [34.610546020800236]
3DMiner is a pipeline for mining 3D shapes from challenging datasets.
Our method is capable of producing significantly better results than state-of-the-art unsupervised 3D reconstruction techniques.
We show how 3DMiner can be applied to in-the-wild data by reconstructing shapes present in images from the LAION-5B dataset.
arXiv Detail & Related papers (2023-10-29T23:08:19Z)
- Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation [78.13793505707952]
Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook.
We propose a novel two-stage framework, which consists of Masked Quantization VAE (MQ-VAE) and Stackformer, to relieve the model from modeling redundancy.
arXiv Detail & Related papers (2023-05-23T02:15:53Z)
- Learning Accurate Template Matching with Differentiable Coarse-to-Fine Correspondence Refinement [28.00275083733545]
We propose an accurate template matching method based on differentiable coarse-to-fine correspondence refinement.
An initial warp is estimated using coarse correspondences based on novel structure-aware information provided by transformers.
Our method is significantly better than state-of-the-art methods and baselines, providing good generalization ability and visually plausible results even on unseen real data.
arXiv Detail & Related papers (2023-03-15T08:24:10Z)
- PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling [83.67628239775878]
Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT.
This paper undertakes a fundamental analysis of MIM from the perspective of pixel reconstruction.
We propose a remarkably simple and effective method, PixMIM, that entails two strategies.
arXiv Detail & Related papers (2023-03-04T13:38:51Z)
- Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling [23.164631160130092]
We extend the success of BERT-style pre-training, or masked image modeling, to convolutional networks (convnets).
We treat unmasked pixels as sparse voxels of 3D point clouds and use sparse convolution to encode.
This is the first use of sparse convolution for 2D masked modeling.
arXiv Detail & Related papers (2023-01-09T18:59:50Z)
- Stare at What You See: Masked Image Modeling without Reconstruction [154.74533119863864]
Masked Autoencoders (MAE) have been prevailing paradigms for large-scale vision representation pre-training.
Recent approaches apply semantic-rich teacher models to extract image features as the reconstruction target, leading to better performance.
We argue the features extracted by powerful teacher models already encode rich semantic correlation across regions in an intact image.
arXiv Detail & Related papers (2022-11-16T12:48:52Z)
- Self-supervised Correlation Mining Network for Person Image Generation [9.505343361614928]
Person image generation aims to perform non-rigid deformation on source images.
We propose a Self-supervised Correlation Mining Network (SCM-Net) to rearrange the source images in the feature space.
To improve the fidelity of cross-scale pose transformation, we propose a graph-based Body Structure Retaining Loss.
arXiv Detail & Related papers (2021-11-26T03:57:46Z)
- Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization [59.19214040221055]
We propose a novel spatial-separated curve rendering network (S$^2$CRNet) for efficient and high-resolution image harmonization.
The proposed method reduces the number of parameters by more than 90% compared with previous methods.
Our method runs smoothly on higher-resolution images in real time, more than 10$\times$ faster than existing methods.
arXiv Detail & Related papers (2021-09-13T07:20:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.