Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
- URL: http://arxiv.org/abs/2311.17034v2
- Date: Mon, 25 Mar 2024 01:21:18 GMT
- Title: Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
- Authors: Junyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, Ming-Hsuan Yang,
- Abstract summary: This paper identifies the importance of being geometry-aware for semantic correspondence.
We show that incorporating this information can markedly enhance semantic correspondence performance.
Our method achieves a PCK@0.10 score of 65.4 (zero-shot) and 85.6 (supervised) on the challenging SPair-71k dataset.
- Score: 80.6840060272386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While pre-trained large-scale vision models have shown significant promise for semantic correspondence, their features often struggle to grasp the geometry and orientation of instances. This paper identifies the importance of being geometry-aware for semantic correspondence and reveals a limitation of the features of current foundation models under simple post-processing. We show that incorporating this information can markedly enhance semantic correspondence performance with simple but effective solutions in both zero-shot and supervised settings. We also construct a new challenging benchmark for semantic correspondence built from an existing animal pose estimation dataset, for both pre-training validating models. Our method achieves a PCK@0.10 score of 65.4 (zero-shot) and 85.6 (supervised) on the challenging SPair-71k dataset, outperforming the state of the art by 5.5p and 11.0p absolute gains, respectively. Our code and datasets are publicly available at: https://telling-left-from-right.github.io/.
Related papers
- Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps [39.00415825387414]
We propose a new approach for semantic correspondence estimation that supplements discriminative features with 3D understanding via a weak geometric spherical prior.
Compared to more involved 3D pipelines, our model only requires weak viewpoint information, and the simplicity of our spherical representation enables us to inject informative geometric priors into the model during training.
We present results on the challenging SPair-71k dataset, where our approach demonstrates is capable of distinguishing between symmetric views and repeated parts across many object categories.
arXiv Detail & Related papers (2023-12-20T17:35:24Z) - iMatching: Imperative Correspondence Learning [5.568520539073218]
We introduce a new self-supervised scheme, imperative learning (IL), for training feature correspondence.
It enables correspondence learning on arbitrary uninterrupted videos without any camera pose or depth labels.
We demonstrate superior performance on tasks including feature matching and pose estimation.
arXiv Detail & Related papers (2023-12-04T18:58:20Z) - Match me if you can: Semi-Supervised Semantic Correspondence Learning with Unpaired Images [76.47980643420375]
This paper builds on the hypothesis that there is an inherent data-hungry matter in learning semantic correspondences.
We demonstrate a simple machine annotator reliably enriches paired key points via machine supervision.
Our models surpass current state-of-the-art models on semantic correspondence learning benchmarks like SPair-71k, PF-PASCAL, and PF-WILLOW.
arXiv Detail & Related papers (2023-11-30T13:22:15Z) - Delving into Shape-aware Zero-shot Semantic Segmentation [18.51025849474123]
We present textbfshape-aware zero-shot semantic segmentation.
Inspired by classical spectral methods, we propose to leverage the eigen vectors of Laplacian matrices constructed with self-supervised pixel-wise features.
Our method sets new state-of-the-art performance for zero-shot semantic segmentation on both Pascal and COCO.
arXiv Detail & Related papers (2023-04-17T17:59:46Z) - Learning Context-aware Classifier for Semantic Segmentation [88.88198210948426]
In this paper, contextual hints are exploited via learning a context-aware classifier.
Our method is model-agnostic and can be easily applied to generic segmentation models.
With only negligible additional parameters and +2% inference time, decent performance gain has been achieved on both small and large models.
arXiv Detail & Related papers (2023-03-21T07:00:35Z) - Fixing Model Bugs with Natural Language Patches [38.67529353406759]
We explore natural language patches that allow developers to provide corrective feedback at the right level of abstraction.
We show that with a small amount of synthetic data, we can teach models to effectively use real patches on real data.
We also show that finetuning on as many as 100 labeled examples may be needed to match the performance of a small set of language patches.
arXiv Detail & Related papers (2022-11-07T05:49:19Z) - Cross-modal Representation Learning for Zero-shot Action Recognition [67.57406812235767]
We present a cross-modal Transformer-based framework, which jointly encodes video data and text labels for zero-shot action recognition (ZSAR)
Our model employs a conceptually new pipeline by which visual representations are learned in conjunction with visual-semantic associations in an end-to-end manner.
Experiment results show our model considerably improves upon the state of the arts in ZSAR, reaching encouraging top-1 accuracy on UCF101, HMDB51, and ActivityNet benchmark datasets.
arXiv Detail & Related papers (2022-05-03T17:39:27Z) - A Graph-based Interactive Reasoning for Human-Object Interaction
Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z) - Pairwise Similarity Knowledge Transfer for Weakly Supervised Object
Localization [53.99850033746663]
We study the problem of learning localization model on target classes with weakly supervised image labels.
In this work, we argue that learning only an objectness function is a weak form of knowledge transfer.
Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of pairwise similarity function.
arXiv Detail & Related papers (2020-03-18T17:53:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.