Weakly-Supervised Learning of Dense Functional Correspondences
- URL: http://arxiv.org/abs/2509.03893v1
- Date: Thu, 04 Sep 2025 05:39:16 GMT
- Title: Weakly-Supervised Learning of Dense Functional Correspondences
- Authors: Stefan Stojanov, Linan Zhao, Yunzhi Zhang, Daniel L. K. Yamins, Jiajun Wu,
- Abstract summary: We propose a weakly-supervised learning paradigm to tackle the prediction task.<n>The main insight behind our approach is that we can leverage vision-language models to obtain functional parts.<n>We then integrate this with dense contrastive learning from pixel correspondences to distill both functional and spatial knowledge into a new model.
- Score: 23.794395724229762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Establishing dense correspondences across image pairs is essential for tasks such as shape reconstruction and robot manipulation. In the challenging setting of matching across different categories, the function of an object, i.e., the effect that an object can cause on other objects, can guide how correspondences should be established. This is because object parts that enable specific functions often share similarities in shape and appearance. We derive the definition of dense functional correspondence based on this observation and propose a weakly-supervised learning paradigm to tackle the prediction task. The main insight behind our approach is that we can leverage vision-language models to pseudo-label multi-view images to obtain functional parts. We then integrate this with dense contrastive learning from pixel correspondences to distill both functional and spatial knowledge into a new model that can establish dense functional correspondence. Further, we curate synthetic and real evaluation datasets as task benchmarks. Our results demonstrate the advantages of our approach over baseline solutions consisting of off-the-shelf self-supervised image representations and grounded vision language models.
Related papers
- Structure-Aware Correspondence Learning for Relative Pose Estimation [65.44234975976451]
Relative pose estimation provides a promising way for achieving object-agnostic pose estimation.<n>Existing 3D correspondence-based methods suffer from small overlaps in visible regions and unreliable feature estimation for invisible regions.<n>We propose a novel Structure-Aware Correspondence Learning method for Relative Pose Estimation, which consists of two key modules.
arXiv Detail & Related papers (2025-03-24T13:43:44Z) - Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z) - Zero-Shot Image Feature Consensus with Deep Functional Maps [20.988872402347756]
We show that a better correspondence strategy is available, which directly imposes structure on the correspondence field: the functional map.
We demonstrate that our technique yields correspondences that are not only smoother but also more accurate, with the possibility of better reflecting the knowledge embedded in the large-scale vision models that we are studying.
arXiv Detail & Related papers (2024-03-18T17:59:47Z) - Learning-based Relational Object Matching Across Views [63.63338392484501]
We propose a learning-based approach which combines local keypoints with novel object-level features for matching object detections between RGB images.
We train our object-level matching features based on appearance and inter-frame and cross-frame spatial relations between objects in an associative graph neural network.
arXiv Detail & Related papers (2023-05-03T19:36:51Z) - Matching Multiple Perspectives for Efficient Representation Learning [0.0]
We present an approach that combines self-supervised learning with a multi-perspective matching technique.
We show that the availability of multiple views of the same object combined with a variety of self-supervised pretraining algorithms can lead to improved object classification performance.
arXiv Detail & Related papers (2022-08-16T10:33:13Z) - TopicFM: Robust and Interpretable Feature Matching with Topic-assisted [8.314830611853168]
We propose an architecture for image matching which is efficient, robust, and interpretable.
We introduce a novel feature matching module called TopicFM which can roughly organize same spatial structure across images into a topic.
Our method can only perform matching in co-visibility regions to reduce computations.
arXiv Detail & Related papers (2022-07-01T10:39:14Z) - Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z) - Understanding Synonymous Referring Expressions via Contrastive Features [105.36814858748285]
We develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels.
We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets.
arXiv Detail & Related papers (2021-04-20T17:56:24Z) - Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.