Dual Pose-invariant Embeddings: Learning Category and Object-specific
Discriminative Representations for Recognition and Retrieval
- URL: http://arxiv.org/abs/2403.00272v1
- Date: Fri, 1 Mar 2024 04:20:13 GMT
- Title: Dual Pose-invariant Embeddings: Learning Category and Object-specific
Discriminative Representations for Recognition and Retrieval
- Authors: Rohan Sarkar, Avinash Kak
- Abstract summary: We show that it is possible to achieve significant improvements in performance if both the category-based and the object-identity-based embeddings are learned simultaneously during training.
This paper presents an attention-based dual-encoder architecture with specially designed loss functions that optimize the inter- and intra-class distances.
We demonstrate the power of our approach with three challenging multi-view datasets.
- Score: 0.7770029179741429
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the context of pose-invariant object recognition and retrieval, we
demonstrate that it is possible to achieve significant improvements in
performance if both the category-based and the object-identity-based embeddings
are learned simultaneously during training. In hindsight, that sounds intuitive
because learning about the categories is more fundamental than learning about
the individual objects that correspond to those categories. However, to the
best of what we know, no prior work in pose-invariant learning has demonstrated
this effect. This paper presents an attention-based dual-encoder architecture
with specially designed loss functions that optimize the inter- and intra-class
distances simultaneously in two different embedding spaces, one for the
category embeddings and the other for the object-level embeddings. The loss
functions we have proposed are pose-invariant ranking losses that are designed
to minimize the intra-class distances and maximize the inter-class distances in
the dual representation spaces. We demonstrate the power of our approach with
three challenging multi-view datasets, ModelNet-40, ObjectPI, and FG3D. With
our dual approach, for single-view object recognition, we outperform the
previous best by 20.0% on ModelNet40, 2.0% on ObjectPI, and 46.5% on FG3D. On
the other hand, for single-view object retrieval, we outperform the previous
best by 33.7% on ModelNet40, 18.8% on ObjectPI, and 56.9% on FG3D.
Related papers
- Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and
Motion Estimation [49.56131393810713]
We present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner.
Our method excels in both model performance and computational efficiency, with only 0.25M parameters and 0.92G FLOPs.
arXiv Detail & Related papers (2023-06-08T22:55:32Z) - Class Anchor Margin Loss for Content-Based Image Retrieval [97.81742911657497]
We propose a novel repeller-attractor loss that falls in the metric learning paradigm, yet directly optimize for the L2 metric without the need of generating pairs.
We evaluate the proposed objective in the context of few-shot and full-set training on the CBIR task, by using both convolutional and transformer architectures.
arXiv Detail & Related papers (2023-06-01T12:53:10Z) - CARD: Semantic Segmentation with Efficient Class-Aware Regularized
Decoder [31.223271128719603]
We propose a universal Class-Aware Regularization (CAR) approach to optimize the intra-class variance and inter-class distance during feature learning.
CAR can be directly applied to most existing segmentation models during training, and can largely improve their accuracy at no additional inference overhead.
arXiv Detail & Related papers (2023-01-11T01:41:37Z) - DcnnGrasp: Towards Accurate Grasp Pattern Recognition with Adaptive
Regularizer Learning [13.08779945306727]
Current state-of-the-art methods ignore category information of objects which is crucial for grasp pattern recognition.
This paper presents a novel dual-branch convolutional neural network (DcnnGrasp) to achieve joint learning of object category classification and grasp pattern recognition.
arXiv Detail & Related papers (2022-05-11T00:34:27Z) - The Overlooked Classifier in Human-Object Interaction Recognition [82.20671129356037]
We encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs.
We propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.
Our simple yet effective method enables detection-free HOI classification, outperforming the state-of-the-arts that require object detection and human pose by a clear margin.
arXiv Detail & Related papers (2022-03-10T23:35:00Z) - Dual Prototypical Contrastive Learning for Few-shot Semantic
Segmentation [55.339405417090084]
We propose a dual prototypical contrastive learning approach tailored to the few-shot semantic segmentation (FSS) task.
The main idea is to encourage the prototypes more discriminative by increasing inter-class distance while reducing intra-class distance in prototype feature space.
We demonstrate that the proposed dual contrastive learning approach outperforms state-of-the-art FSS methods on PASCAL-5i and COCO-20i datasets.
arXiv Detail & Related papers (2021-11-09T08:14:50Z) - Single-stage Keypoint-based Category-level Object Pose Estimation from
an RGB Image [27.234658117816103]
We propose a single-stage, keypoint-based approach for category-level object pose estimation.
The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative bounding cuboid dimensions.
We conduct extensive experiments on the challenging Objectron benchmark, outperforming state-of-the-art methods on the 3D IoU metric.
arXiv Detail & Related papers (2021-09-13T17:55:00Z) - Objectness-Aware Few-Shot Semantic Segmentation [31.13009111054977]
We show how to increase overall model capacity to achieve improved performance.
We introduce objectness, which is class-agnostic and so not prone to overfitting.
Given only one annotated example of an unseen category, experiments show that our method outperforms state-of-art methods with respect to mIoU.
arXiv Detail & Related papers (2020-04-06T19:12:08Z) - Learning What to Learn for Video Object Segmentation [157.4154825304324]
We introduce an end-to-end trainable VOS architecture that integrates a differentiable few-shot learning module.
This internal learner is designed to predict a powerful parametric model of the target.
We set a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5.
arXiv Detail & Related papers (2020-03-25T17:58:43Z) - Pairwise Similarity Knowledge Transfer for Weakly Supervised Object
Localization [53.99850033746663]
We study the problem of learning localization model on target classes with weakly supervised image labels.
In this work, we argue that learning only an objectness function is a weak form of knowledge transfer.
Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of pairwise similarity function.
arXiv Detail & Related papers (2020-03-18T17:53:33Z) - Triangle-Net: Towards Robustness in Point Cloud Learning [0.0]
We propose a novel approach for 3D classification that can simultaneously achieve invariance towards rotation, positional shift, scaling, and is robust to point sparsity.
We show that our approach outperforms PointNet and 3DmFV by 35.0% and 28.1% respectively in ModelNet 40 classification tasks.
arXiv Detail & Related papers (2020-02-27T20:42:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.