Learning to Select Camera Views: Efficient Multiview Understanding at
Few Glances
- URL: http://arxiv.org/abs/2303.06145v1
- Date: Fri, 10 Mar 2023 18:59:10 GMT
- Title: Learning to Select Camera Views: Efficient Multiview Understanding at
Few Glances
- Authors: Yunzhong Hou, Stephen Gould, Liang Zheng
- Abstract summary: We propose a view selection approach that analyzes the target object or scenario from given views and selects the next best view for processing.
Our approach features a reinforcement learning based camera selection module, MVSelect, that not only selects views but also facilitates joint training with the task network.
- Score: 59.34619548026885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multiview camera setups have proven useful in many computer vision
applications for reducing ambiguities, mitigating occlusions, and increasing
field-of-view coverage. However, the high computational cost associated with
multiple views poses a significant challenge for end devices with limited
computational resources. To address this issue, we propose a view selection
approach that analyzes the target object or scenario from given views and
selects the next best view for processing. Our approach features a
reinforcement learning based camera selection module, MVSelect, that not only
selects views but also facilitates joint training with the task network.
Experimental results on multiview classification and detection tasks show that
our approach achieves promising performance while using only 2 or 3 out of N
available views, significantly reducing computational costs. Furthermore,
analysis on the selected views reveals that certain cameras can be shut off
with minimal performance impact, shedding light on future camera layout
optimization for multiview systems. Code is available at
https://github.com/hou-yz/MVSelect.
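As a rough illustration of how such a module could work, the sketch below couples a small policy network with a REINFORCE-style update; the feature fusion, the accuracy-based reward, and all names are illustrative assumptions rather than the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class ViewSelector(nn.Module):
    """Illustrative MVSelect-style policy over N candidate cameras."""
    def __init__(self, feat_dim, num_views):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, num_views),   # one logit per camera
        )

    def forward(self, feat, visited):
        logits = self.policy(feat)
        # mask out cameras that were already processed
        logits = logits.masked_fill(visited, float("-inf"))
        return torch.distributions.Categorical(logits=logits)

def reinforce_step(selector, task_net, view_feats, first, labels, optim):
    """One policy-gradient step: given a first view, pick the next best one."""
    B, N, _ = view_feats.shape               # (batch, cameras, feat_dim)
    rows = torch.arange(B)
    visited = torch.zeros(B, N, dtype=torch.bool)
    visited[rows, first] = True
    dist = selector(view_feats[rows, first], visited)
    action = dist.sample()                   # sampled next view
    fused = view_feats[rows, first] + view_feats[rows, action]  # naive fusion
    reward = (task_net(fused).argmax(-1) == labels).float()     # task quality
    loss = -(dist.log_prob(action) * (reward - reward.mean())).mean()
    optim.zero_grad()
    loss.backward()
    optim.step()
```

In joint training, the task network would also receive its usual supervised loss alongside this update; the reward here is only a stand-in for task performance.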
Related papers
- Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Videos [66.1935609072708]
The key hypothesis is that the more accurately an individual view can predict a view-agnostic text summary, the more informative it is.
We propose a framework that uses the relative accuracy of view-dependent caption predictions as a proxy for best-view pseudo-labels.
During inference, our model takes as input only a multi-view video -- no language or camera poses -- and returns the best viewpoint to watch at each timestep.
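A minimal sketch of that proxy, assuming a hypothetical caption_loss_fn that returns a per-sample loss for how well one view's features predict the shared summary:

```python
import torch

def best_view_pseudo_labels(view_feats, summary_tokens, caption_loss_fn):
    # view_feats: (B, V, D); caption_loss_fn -> (B,) loss per sample
    losses = torch.stack(
        [caption_loss_fn(view_feats[:, v], summary_tokens)
         for v in range(view_feats.size(1))],
        dim=1,
    )                                # (B, V): lower loss = more informative view
    return losses.argmin(dim=1)      # best-view pseudo-label per clip
```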
arXiv Detail & Related papers (2024-11-13T16:31:08Z)
- Multi-View People Detection in Large Scenes via Supervised View-Wise Contribution Weighting [44.48514301889318]
This paper focuses on improving multi-view people detection by developing a supervised view-wise contribution weighting approach.
A large synthetic dataset is adopted to enhance the model's generalization ability.
Experimental results demonstrate the effectiveness of our approach in achieving promising cross-scene multi-view people detection performance.
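One plausible reading of view-wise contribution weighting, with illustrative names: score each camera's projected feature map, normalize the scores across views, and fuse. The paper supervises these contributions; this sketch simply learns them end-to-end.

```python
import torch
import torch.nn as nn

class ViewWeightedFusion(nn.Module):
    """Weight each camera's ground-plane feature map before fusing them."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-pixel view score

    def forward(self, per_view_maps):             # (B, V, C, H, W)
        B, V, C, H, W = per_view_maps.shape
        w = self.score(per_view_maps.flatten(0, 1)).view(B, V, 1, H, W)
        w = torch.softmax(w, dim=1)               # contributions sum to 1 per pixel
        return (w * per_view_maps).sum(dim=1)     # fused (B, C, H, W) map
```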
arXiv Detail & Related papers (2024-05-30T11:03:27Z)
- What is Point Supervision Worth in Video Instance Segmentation? [119.71921319637748]
Video instance segmentation (VIS) is a challenging vision task that aims to detect, segment, and track objects in videos.
We reduce the human annotations to only one point for each object in a video frame during training, and obtain high-quality mask predictions close to fully supervised models.
Comprehensive experiments on three VIS benchmarks demonstrate competitive performance of the proposed framework, nearly matching fully supervised methods.
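A toy sketch of the core idea of supervising a mask only at the single annotated pixel; the names and the loss choice are assumptions, and the actual framework is more involved:

```python
import torch
import torch.nn.functional as F

def point_supervised_loss(mask_logits, point_xy, point_label):
    # mask_logits: (B, H, W); point_xy: (B, 2) integer (x, y) coordinates
    rows = torch.arange(mask_logits.size(0))
    logit_at_point = mask_logits[rows, point_xy[:, 1], point_xy[:, 0]]
    return F.binary_cross_entropy_with_logits(logit_at_point, point_label.float())
```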
arXiv Detail & Related papers (2024-04-01T17:38:25Z)
- Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios [35.32285779434823]
Multi-view clustering (MVC) aims to explore category structures among multi-view data in a self-supervised manner.
However, clustering performance might seriously degenerate when the views are noisy in practical multi-view scenarios.
We propose a theoretically grounded deep MVC method (namely MVCAN) to address this issue.
arXiv Detail & Related papers (2023-03-30T09:22:17Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
- Cross-View Cross-Scene Multi-View Crowd Counting [56.83882084112913]
Multi-view crowd counting has previously been proposed to utilize multiple cameras to extend the field-of-view of a single camera.
We propose a cross-view cross-scene (CVCS) multi-view crowd counting paradigm, where the training and testing occur on different scenes with arbitrary camera layouts.
arXiv Detail & Related papers (2022-05-03T15:03:44Z)
- Few-Shot Learning for Video Object Detection in a Transfer-Learning Scheme [70.45901040613015]
We study the new problem of few-shot learning for video object detection.
We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects.
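The recipe reads as a standard two-stage transfer. A schematic sketch, where detector.backbone, train_fn, and the choice to freeze the backbone are all assumptions for illustration:

```python
def transfer_finetune(detector, base_loader, novel_loader, train_fn):
    # Stage 1: train the full detector on abundant base-class videos.
    train_fn(detector, base_loader, epochs=12)
    # Stage 2: freeze shared features, adapt the heads on a few novel clips.
    for p in detector.backbone.parameters():
        p.requires_grad = False
    train_fn(detector, novel_loader, epochs=2)
```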
arXiv Detail & Related papers (2021-03-26T20:37:55Z)
- Generalized Multi-view Shared Subspace Learning using View Bootstrapping [43.027427742165095]
The key objective in multi-view learning is to model the information common to multiple parallel views of a class of objects/events to improve downstream learning tasks.
We present a neural method based on multi-view correlation to capture the information shared across a large number of views by subsampling them in a view-agnostic manner during training.
Experiments on spoken word recognition, 3D object classification and pose-invariant face recognition demonstrate the robustness of view bootstrapping to model a large number of views.
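A rough sketch of the subsampling step: draw a random subset of views, embed them with one shared (view-agnostic) encoder, and pull their embeddings together. The paper's objective is correlation-based; the cosine-similarity loss below is a simplified stand-in:

```python
import torch
import torch.nn.functional as F

def bootstrap_loss(encoder, views, k):
    # views: (B, V, D) parallel views of the same instances; encoder is shared
    idx = torch.randperm(views.size(1))[:k]           # view-agnostic subsample
    z = F.normalize(encoder(views[:, idx]), dim=-1)   # (B, k, D') embeddings
    sim = torch.einsum("bid,bjd->bij", z, z)          # pairwise cosine matrix
    mask = ~torch.eye(k, dtype=torch.bool)            # drop self-similarity
    return -sim[:, mask].mean()                       # maximize cross-view agreement
```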
arXiv Detail & Related papers (2020-05-12T20:35:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.