IntelliCap: Intelligent Guidance for Consistent View Sampling
- URL: http://arxiv.org/abs/2508.13043v1
- Date: Mon, 18 Aug 2025 16:00:31 GMT
- Title: IntelliCap: Intelligent Guidance for Consistent View Sampling
- Authors: Ayaka Yasunaga, Hideo Saito, Dieter Schmalstieg, Shohei Mori
- Abstract summary: High-quality view synthesis requires uniform and dense view sampling. Existing approaches to guide humans during image acquisition concentrate on single objects. We propose a novel situated visualization technique for scanning at multiple scales.
- Score: 14.791526418738218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Novel view synthesis from images, for example, with 3D Gaussian splatting, has made great progress. Rendering fidelity and speed are now sufficient even for demanding virtual reality applications. However, the problem of assisting humans in collecting the input images for these rendering algorithms has received much less attention. High-quality view synthesis requires uniform and dense view sampling. Unfortunately, these requirements are not easily met by human camera operators, who may be hurried or impatient, or who may lack understanding of the scene structure and the photographic process. Existing approaches to guide humans during image acquisition concentrate on single objects or neglect view-dependent material characteristics. We propose a novel situated visualization technique for scanning at multiple scales. During the scanning of a scene, our method identifies important objects that need extended image coverage to properly represent view-dependent appearance. To this end, we leverage semantic segmentation and category identification, ranked by a vision-language model. Spherical proxies are generated around highly ranked objects to guide the user during scanning. Our results show superior performance in real scenes compared to conventional view sampling strategies.
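For intuition, here is a minimal sketch of the guidance idea described in the abstract: rank segmented object categories with a vision-language-model score and place candidate viewpoints on a spherical proxy around the top-ranked objects. The function names, the Fibonacci-sphere placement, and the toy scorer are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rank_objects(categories, vlm_score):
    """Rank object categories by how view-dependent their appearance is likely to be.
    `vlm_score` is any callable returning a relevance score, e.g. a CLIP-style
    similarity against prompts such as 'glossy, reflective surface'."""
    return sorted(categories, key=vlm_score, reverse=True)

def spherical_proxy(center, radius, n_views=64):
    """Fibonacci-sphere sampling of candidate viewpoints around one object."""
    i = np.arange(n_views)
    phi = np.arccos(1.0 - 2.0 * (i + 0.5) / n_views)    # polar angle
    theta = np.pi * (1.0 + 5**0.5) * i                   # golden-angle azimuth
    dirs = np.stack([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)], axis=-1)
    return np.asarray(center) + radius * dirs            # world-space view targets

# Toy scorer standing in for the vision-language model (made-up numbers).
shiny = {"mirror": 1.0, "car": 0.9, "sofa": 0.3, "wall": 0.1}
ranked = rank_objects(list(shiny), vlm_score=lambda c: shiny[c])
views = spherical_proxy(center=[0.0, 0.0, 1.0], radius=1.5, n_views=32)
print(ranked[:2], views.shape)   # ['mirror', 'car'] (32, 3)
```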
Related papers
- Exploiting Radiance Fields for Grasp Generation on Novel Synthetic Views [7.305342793164903]
We show initial results which indicate that novel view synthesis can provide additional context in generating grasp poses. Our experiments on the Graspnet-1billion dataset show that novel views contributed force-closure grasps. In the future we hope this work can be extended to improve grasp extraction from radiance fields constructed with a single input image.
arXiv Detail & Related papers (2025-05-16T17:23:09Z)
- Knowledge-Guided Prompt Learning for Deepfake Facial Image Detection [54.26588902144298]
We propose a knowledge-guided prompt learning method for deepfake facial image detection. Specifically, we retrieve forgery-related prompts from large language models as expert knowledge to guide the optimization of learnable prompts. Our proposed approach notably outperforms state-of-the-art methods.
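A rough sketch of the knowledge-guided objective summarized above, under the assumption that classification is done by embedding similarity and that the LLM-retrieved expert prompts act as a regularizer on the learnable prompt; the loss form and names are illustrative, not the paper's.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def objective(img_feat, learned_prompt, expert_prompt, label, weight=0.1):
    """label: 1 for fake, 0 for real. The guidance term keeps the learnable
    prompt embedding close to the LLM-retrieved expert prompt embedding."""
    fake_score = cosine(img_feat, learned_prompt)             # task term
    task_loss = (fake_score - label) ** 2                     # simple surrogate loss
    guide_loss = 1.0 - cosine(learned_prompt, expert_prompt)  # knowledge guidance
    return task_loss + weight * guide_loss

rng = np.random.default_rng(0)
img, learned, expert = rng.normal(size=(3, 512))
print(objective(img, learned, expert, label=1))
```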
arXiv Detail & Related papers (2025-01-01T02:18:18Z)
- Caption-Driven Explorations: Aligning Image and Text Embeddings through Human-Inspired Foveated Vision [3.3295510777293837]
We introduce CapMIT1003, a dataset with captions and click-contingent image explorations, to study human attention during the captioning task.
We also present NevaClip, a zero-shot method for predicting visual scanpaths by combining CLIP models with NeVA algorithms.
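A hedged sketch of the zero-shot scanpath idea: score candidate fixations by the similarity between a foveated view and the caption embedding under a CLIP-like model, with simple inhibition of return. The encoders are stubbed here and the mechanics are assumptions, not the released NevaClip code.

```python
import numpy as np

def predict_scanpath(image, caption_emb, embed_foveated, candidates, n_fixations=5):
    """embed_foveated(image, xy) -> L2-normalized embedding of a foveated view
    centred at xy; caption_emb is the L2-normalized caption embedding."""
    remaining = list(candidates)
    path = []
    for _ in range(min(n_fixations, len(remaining))):
        scores = [float(embed_foveated(image, xy) @ caption_emb) for xy in remaining]
        path.append(remaining.pop(int(np.argmax(scores))))  # inhibition of return
    return path

# Toy stand-ins for the CLIP encoders (random embeddings, illustration only).
rng = np.random.default_rng(0)
caption = rng.normal(size=8); caption /= np.linalg.norm(caption)
def stub_embed(image, xy):
    v = rng.normal(size=8)
    return v / np.linalg.norm(v)
print(predict_scanpath(None, caption, stub_embed, [(10, 10), (50, 80), (120, 40)]))
```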
arXiv Detail & Related papers (2024-08-19T12:41:46Z)
- Sampling for View Synthesis: From Local Light Field Fusion to Neural Radiance Fields and Beyond [27.339452004523082]
Local light field fusion proposes an algorithm for practical view synthesis from an irregular grid of sampled views.
We achieve the perceptual quality of Nyquist rate view sampling while using up to 4000x fewer views.
We reprise some of the recent results on sparse and even single image view synthesis.
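A back-of-the-envelope sketch of the sampling trade-off mentioned above: the maximum disparity between adjacent views is roughly focal_px * baseline / z_near, so a per-view disparity budget (about 1 px for Nyquist-like sampling, roughly the number of depth planes for a layered representation) translates into a maximum camera spacing. This is an approximation for intuition, not the paper's exact derivation.

```python
def max_baseline(focal_px, z_near, disparity_budget_px=1.0):
    """Largest spacing between adjacent cameras given a disparity budget,
    using max disparity ~= focal_px * baseline / z_near at the nearest depth."""
    return disparity_budget_px * z_near / focal_px

f, z_near = 800.0, 1.0                      # focal length in pixels, nearest scene depth in metres
nyquist = max_baseline(f, z_near, 1.0)      # dense, 'Nyquist-like' spacing
layered = max_baseline(f, z_near, 64.0)     # spacing tolerated by a 64-plane representation
print(nyquist, layered, layered / nyquist)  # ~64x wider spacing per axis
```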
arXiv Detail & Related papers (2024-08-08T16:56:03Z)
- MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering [91.76893697171117]
We propose a method for efficient and high-quality geometry recovery and novel view synthesis given very sparse views or even a single view of the human.
Our key idea is to meta-learn the radiance field weights solely from potentially sparse multi-view videos.
We collect a new dataset, WildDynaCap, which contains subjects captured in both a dense camera dome and in-the-wild sparse camera rigs.
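A generic Reptile-style sketch of "meta-learning the radiance field weights" from per-subject data; the inner-loop `adapt` stands in for a few steps of radiance-field fitting on one subject's views, and none of this reflects MetaCap's exact algorithm.

```python
import numpy as np

def meta_train(theta, tasks, adapt, meta_lr=0.1, epochs=20):
    """theta: flat parameter vector (the learned prior);
    adapt(theta, task): a few inner-loop steps of fitting on one subject."""
    for _ in range(epochs):
        for task in tasks:
            theta_adapted = adapt(theta.copy(), task)          # inner-loop fitting
            theta = theta + meta_lr * (theta_adapted - theta)  # move prior toward solution
    return theta

# Toy usage: each 'task' pulls the weights toward its own optimum.
tasks = [np.full(4, t) for t in (0.0, 1.0, 2.0)]
adapt = lambda th, task: th + 0.5 * (task - th)
print(meta_train(np.zeros(4), tasks, adapt))   # drifts toward the tasks' common region
```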
arXiv Detail & Related papers (2024-03-27T17:59:54Z)
- HawkI: Homography & Mutual Information Guidance for 3D-free Single Image to Aerial View [67.8213192993001]
We present HawkI for synthesizing aerial-view images from text and an exemplar image.
HawkI blends the visual features from the input image within a pretrained text-to-2D-image stable diffusion model.
At inference, HawkI employs a unique mutual information guidance formulation to steer the generated image towards faithfully replicating the semantic details of the input image.
arXiv Detail & Related papers (2023-11-27T01:41:25Z)
- SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image [60.52991173059486]
We introduce SAMPLING, a Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image.
Our method demonstrates considerable performance gains in large-scale unbounded outdoor scenes using a single image on the KITTI dataset.
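For context, a minimal sketch of rendering the multiplane-image representation named in the title: composite fronto-parallel RGBA planes back to front with standard 'over' blending. The scene-adaptive hierarchy of SAMPLING itself is not reproduced here.

```python
import numpy as np

def composite_mpi(rgb, alpha):
    """rgb: (D, H, W, 3), alpha: (D, H, W, 1); plane 0 is nearest the camera."""
    out = np.zeros(rgb.shape[1:])
    for d in reversed(range(rgb.shape[0])):                  # back to front
        out = alpha[d] * rgb[d] + (1.0 - alpha[d]) * out     # 'over' blending
    return out

planes = np.random.rand(32, 4, 4, 3)
alphas = np.random.rand(32, 4, 4, 1)
print(composite_mpi(planes, alphas).shape)                   # (4, 4, 3)
```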
arXiv Detail & Related papers (2023-09-12T15:33:09Z)
- Real-Time Neural Character Rendering with Pose-Guided Multiplane Images [75.62730144924566]
We propose pose-guided multiplane image (MPI) synthesis, which can render an animatable character in real scenes with photorealistic quality.
We use a portable camera rig to capture the multi-view images along with the driving signal for the moving subject.
arXiv Detail & Related papers (2022-04-25T17:51:38Z)
- Modeling human visual search: A combined Bayesian searcher and saliency map approach for eye movement guidance in natural scenes [0.0]
We propose a unified Bayesian model for visual search guided by saliency maps as prior information.
We show that state-of-the-art saliency models perform well in predicting the first two fixations in a visual search task, but their performance degrades to chance afterward.
This suggests that saliency maps alone model bottom-up first impressions well, but are not enough to explain scanpaths when top-down task information is critical.
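A simplified sketch of a Bayesian searcher with a saliency-map prior: the posterior over target location starts from the normalized saliency map and, after each unsuccessful fixation, locations the observer could resolve well are downweighted before the next fixation is chosen. The Gaussian detectability model and its parameters are assumptions, not the paper's.

```python
import numpy as np

def search(saliency, target_xy, fov_sigma=2.0, max_fix=10):
    """saliency: 2D map used as the prior; target_xy: true target location (row, col)."""
    h, w = saliency.shape
    ys, xs = np.mgrid[0:h, 0:w]
    post = saliency / saliency.sum()               # prior = normalized saliency map
    fixations = []
    for _ in range(max_fix):
        fix = np.unravel_index(np.argmax(post), post.shape)
        fixations.append(fix)
        if fix == tuple(target_xy):                # target found
            break
        # Unsuccessful fixation: P(not found | target at x) is low near the
        # fixation and ~1 far away, so nearby locations lose posterior mass.
        d2 = (ys - fix[0]) ** 2 + (xs - fix[1]) ** 2
        post = post * (1.0 - np.exp(-d2 / (2.0 * fov_sigma ** 2)))
        post = post / post.sum()
    return fixations

rng = np.random.default_rng(0)
sal = rng.random((32, 32))
print(search(sal, target_xy=(10, 20))[:3])
```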
arXiv Detail & Related papers (2020-09-17T15:38:23Z)
- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis [78.5281048849446]
We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes.
Our algorithm represents a scene using a fully-connected (non-convolutional) deep network.
Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses.
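A short sketch of the standard differentiable volume-rendering quadrature that such a representation is optimized through (network and ray sampling omitted): C = sum_i T_i (1 - exp(-sigma_i * delta_i)) c_i with T_i = exp(-sum_{j<i} sigma_j delta_j).

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """sigmas: (N,) densities, colors: (N, 3), deltas: (N,) sample spacings along one ray."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                                 # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1] + 1e-10]))  # transmittance T_i
    weights = trans * alphas                                                # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)                          # expected color C

n = 64
rng = np.random.default_rng(0)
print(render_ray(rng.random(n), rng.random((n, 3)), np.full(n, 0.02)))
```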
arXiv Detail & Related papers (2020-03-19T17:57:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.