Improving Viewpoint-Independent Object-Centric Representations through Active Viewpoint Selection
- URL: http://arxiv.org/abs/2411.00402v1
- Date: Fri, 01 Nov 2024 07:01:44 GMT
- Title: Improving Viewpoint-Independent Object-Centric Representations through Active Viewpoint Selection
- Authors: Yinxuan Huang, Chengmin Gao, Bin Li, Xiangyang Xue
- Abstract summary: We propose a novel active viewpoint selection strategy for object-centric learning.
It predicts images from unknown viewpoints based on information from observation images for each scene.
Our method can accurately predict images from unknown viewpoints.
- Score: 41.419853273742746
- Abstract: Given the complexities inherent in visual scenes, such as object occlusion, a comprehensive understanding often requires observation from multiple viewpoints. Existing multi-viewpoint object-centric learning methods typically employ random or sequential viewpoint selection strategies. While applicable across various scenes, these strategies may not always be ideal, as certain scenes could benefit more from specific viewpoints. To address this limitation, we propose a novel active viewpoint selection strategy. This strategy predicts images from unknown viewpoints based on information from observation images for each scene. It then compares the object-centric representations extracted from both viewpoints and selects the unknown viewpoint with the largest disparity, indicating the greatest gain in information, as the next observation viewpoint. Through experiments on various datasets, we demonstrate the effectiveness of our active viewpoint selection strategy, significantly enhancing segmentation and reconstruction performance compared to random viewpoint selection. Moreover, our method can accurately predict images from unknown viewpoints.
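The selection loop described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `predict_image` (the viewpoint-conditioned image predictor) and `encode_slots` (the object-centric encoder) are hypothetical stand-ins, and Euclidean distance is assumed as the disparity measure between representations.

```python
import math

def select_next_viewpoint(obs_images, candidate_viewpoints, predict_image, encode_slots):
    """Pick the unknown viewpoint whose predicted image yields the
    object-centric representation most dissimilar from the observations,
    i.e. the viewpoint expected to add the most information."""
    # Average the slot representations over all observed viewpoints.
    obs_slots_list = [encode_slots(img) for img in obs_images]
    dim = len(obs_slots_list[0])
    obs_slots = [sum(s[d] for s in obs_slots_list) / len(obs_slots_list)
                 for d in range(dim)]

    best_vp, best_gap = None, -math.inf
    for vp in candidate_viewpoints:
        # Predict the image at the unknown viewpoint, then encode it.
        pred_slots = encode_slots(predict_image(obs_images, vp))
        # Disparity between predicted and observed representations.
        gap = math.sqrt(sum((p - o) ** 2 for p, o in zip(pred_slots, obs_slots)))
        if gap > best_gap:
            best_vp, best_gap = vp, gap
    return best_vp
```

In use, the selected viewpoint would be observed, appended to `obs_images`, and the loop repeated until the viewpoint budget is exhausted.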
Related papers
- Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Videos [66.1935609072708]
Key hypothesis is that the more accurately an individual view can predict a view-agnostic text summary, the more informative it is.
We propose a framework that uses the relative accuracy of view-dependent caption predictions as a proxy for best view pseudo-labels.
During inference, our model takes as input only a multi-view video -- no language or camera poses -- and returns the best viewpoint to watch at each timestep.
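The pseudo-labeling idea above reduces to an argmax over per-view caption scores. The sketch below is an assumption-laden illustration: `similarity` stands in for whatever caption-vs-summary accuracy measure the framework uses (here, simple word-overlap Jaccard similarity).

```python
def jaccard(a, b):
    # Hypothetical similarity: word overlap between caption and summary.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def best_view_pseudolabel(view_captions, summary, similarity=jaccard):
    """The view whose caption best predicts the view-agnostic summary
    is taken as the pseudo-label for the best view."""
    scores = [similarity(cap, summary) for cap in view_captions]
    return max(range(len(scores)), key=lambda i: scores[i])
```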
arXiv Detail & Related papers (2024-11-13T16:31:08Z)
- Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval [85.73149096516543]
We address the choice of viewpoint during sketch creation in Fine-Grained Sketch-Based Image Retrieval (FG-SBIR).
A pilot study highlights the system's struggle when query-sketches differ in viewpoint from target instances.
To reconcile this, we advocate for a view-aware system, seamlessly accommodating both view-agnostic and view-specific tasks.
arXiv Detail & Related papers (2024-07-01T21:20:44Z)
- Unsupervised Object-Centric Learning from Multiple Unspecified Viewpoints [45.88397367354284]
We consider a novel problem of learning compositional scene representations from multiple unspecified viewpoints without using any supervision.
We propose a deep generative model which separates latent representations into a viewpoint-independent part and a viewpoint-dependent part to solve this problem.
Experiments on several specifically designed synthetic datasets have shown that the proposed method can effectively learn from multiple unspecified viewpoints.
arXiv Detail & Related papers (2024-01-03T15:09:25Z)
- Learning to Select Camera Views: Efficient Multiview Understanding at Few Glances [59.34619548026885]
We propose a view selection approach that analyzes the target object or scenario from given views and selects the next best view for processing.
Our approach features a reinforcement learning based camera selection module, MVSelect, that not only selects views but also facilitates joint training with the task network.
arXiv Detail & Related papers (2023-03-10T18:59:10Z)
- A Study on Self-Supervised Object Detection Pretraining [14.38896715041483]
We study different approaches to self-supervised pretraining of object detection models.
We first design a general framework to learn a spatially consistent dense representation from an image.
We study existing design choices in the literature, such as box generation, feature extraction strategies, and using multiple views.
arXiv Detail & Related papers (2022-07-09T03:30:44Z)
- Unsupervised Learning of Compositional Scene Representations from Multiple Unspecified Viewpoints [41.07379505694274]
We consider a novel problem of learning compositional scene representations from multiple unspecified viewpoints without using any supervision.
We propose a deep generative model which separates latent representations into a viewpoint-independent part and a viewpoint-dependent part to solve this problem.
Experiments on several specifically designed synthetic datasets have shown that the proposed method is able to effectively learn from multiple unspecified viewpoints.
arXiv Detail & Related papers (2021-12-07T08:45:21Z)
- Weakly Supervised Keypoint Discovery [27.750244813890262]
We propose a method for keypoint discovery from a 2D image using image-level supervision.
Motivated by the weakly-supervised learning approach, our method exploits image-level supervision to identify discriminative parts.
Our approach achieves state-of-the-art performance on keypoint estimation in limited-supervision scenarios.
arXiv Detail & Related papers (2021-09-28T01:26:53Z)
- A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection [56.82077636126353]
We take advantage of object-centric images to improve object detection in scene-centric images.
We present a simple yet surprisingly effective framework to do so.
Our approach can improve the object detection (and instance segmentation) accuracy of rare objects by 50% (and 33%) relatively.
arXiv Detail & Related papers (2021-02-17T17:27:21Z)
- Random Forest for Dissimilarity-based Multi-view Learning [8.185807285320553]
We show that the Random Forest proximity measure can be used to build the dissimilarity representations.
We then propose a Dynamic View Selection method to better combine the view-specific dissimilarity representations.
arXiv Detail & Related papers (2020-07-16T14:52:52Z)
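The Random Forest proximity measure in the last entry has a standard form: two samples are proximate in proportion to how often they fall in the same leaf across the trees of a forest, and the dissimilarity is one minus that proximity. A minimal sketch, assuming each sample is given as its list of leaf indices (one per tree), as returned by forest implementations such as scikit-learn's `RandomForestClassifier.apply`:

```python
def rf_dissimilarity(leaves_a, leaves_b):
    """Dissimilarity between two samples from their per-tree leaf indices:
    1 - (fraction of trees in which both samples reach the same leaf)."""
    if len(leaves_a) != len(leaves_b):
        raise ValueError("samples must be described by the same forest")
    proximity = sum(la == lb for la, lb in zip(leaves_a, leaves_b)) / len(leaves_a)
    return 1.0 - proximity
```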
This list is automatically generated from the titles and abstracts of the papers in this site.