Multi-Round Region-Based Optimization for Scene Sketching
- URL: http://arxiv.org/abs/2410.04072v1
- Date: Sat, 5 Oct 2024 08:04:26 GMT
- Title: Multi-Round Region-Based Optimization for Scene Sketching
- Authors: Yiqi Liang, Ying Liu, Dandan Long, Ruihui Li
- Abstract summary: Scene sketching requires semantic understanding of the scene and consideration of different regions within the scene.
We optimize the different regions of the input scene in multiple rounds.
A novel CLIP-Based Semantic loss and a VGG-Based Feature loss are utilized to guide our multi-round optimization.
- Score: 7.281215486388827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene sketching converts a scene into a simplified, abstract representation that captures the essential elements and composition of the original scene. It requires semantic understanding of the scene and consideration of its different regions. Since scenes often contain diverse visual information across various regions, such as foreground objects, background elements, and spatial divisions, handling these different regions poses unique difficulties. In this paper, we define a sketch as a set of Bezier curves. We optimize the different regions of the input scene in multiple rounds. In each round of optimization, strokes sampled from the next region are seamlessly integrated into the sketch generated in the previous round of optimization. We propose an additional stroke initialization method to ensure the integrity of the scene and the convergence of optimization. A novel CLIP-Based Semantic loss and a VGG-Based Feature loss are utilized to guide our multi-round optimization. Extensive qualitative and quantitative experiments on the generated sketches confirm the effectiveness of our method.
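The pipeline described in the abstract, with strokes as cubic Bezier curves that are integrated region by region and refined against a loss, can be sketched in a toy form. The snippet below is a minimal illustration and not the authors' implementation: it substitutes a simple point-matching loss for the CLIP-Based Semantic and VGG-Based Feature losses, uses finite-difference descent rather than a differentiable rasterizer, and the names `cubic_bezier`, `stroke_loss`, and `optimize_round` are hypothetical.

```python
import numpy as np

def cubic_bezier(ctrl, t):
    """Evaluate a cubic Bezier stroke at parameters t in [0, 1].

    ctrl: (4, 2) array of control points; returns (len(t), 2) points."""
    t = np.asarray(t)[:, None]
    return ((1 - t) ** 3 * ctrl[0]
            + 3 * (1 - t) ** 2 * t * ctrl[1]
            + 3 * (1 - t) * t ** 2 * ctrl[2]
            + t ** 3 * ctrl[3])

def stroke_loss(ctrl, target_pts):
    """Toy stand-in for the paper's perceptual losses: mean squared
    distance between points sampled on the stroke and target points."""
    t = np.linspace(0.0, 1.0, len(target_pts))
    return float(np.mean((cubic_bezier(ctrl, t) - target_pts) ** 2))

def optimize_round(sketch, new_strokes, targets, steps=300, lr=0.2, eps=1e-5):
    """One round of region-based optimization: strokes initialized for the
    next region are refined by finite-difference gradient descent, then
    appended to the sketch from previous rounds, which is kept fixed."""
    strokes = [s.astype(float).copy() for s in new_strokes]
    for _ in range(steps):
        for s, tgt in zip(strokes, targets):
            base = stroke_loss(s, tgt)
            grad = np.zeros_like(s)
            # Forward-difference gradient over the 4x2 control points.
            for idx in np.ndindex(*s.shape):
                s[idx] += eps
                grad[idx] = (stroke_loss(s, tgt) - base) / eps
                s[idx] -= eps
            s -= lr * grad
    return sketch + strokes
```

A multi-round run would call `optimize_round` once per region, e.g. first for the background and then for a foreground object, passing the strokes from earlier rounds in unchanged as `sketch` so the new region's strokes are integrated into the existing drawing.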
Related papers
- SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality [50.179377002092416]
We propose an efficient visual localization method capable of high-quality rendering with fewer parameters.
Our method achieves superior or comparable rendering and localization performance to state-of-the-art implicit-based visual localization approaches.
arXiv Detail & Related papers (2024-09-21T08:46:16Z)
- Efficient Scene Appearance Aggregation for Level-of-Detail Rendering [42.063285161104474]
We present a novel volumetric representation for the aggregated appearance of complex scenes.
We tackle the challenge of capturing the correlation existing locally within a voxel and globally across different parts of the scene.
arXiv Detail & Related papers (2024-08-19T01:01:12Z)
- Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation [39.08243715525956]
Inferring scene geometry from images via Structure from Motion is a long-standing and fundamental problem in computer vision.
With the popularity of neural radiance fields (NeRFs), implicit representations also became popular for scene completion.
We propose to fuse the scene reconstruction from multiple images and distill this knowledge into a more accurate single-view scene reconstruction.
arXiv Detail & Related papers (2024-04-11T17:30:24Z)
- Adaptive Region Selection for Active Learning in Whole Slide Image Semantic Segmentation [3.1392713791311766]
Region-based active learning (AL) involves training the model on a limited number of annotated image regions.
We introduce a novel technique for selecting annotation regions adaptively, mitigating the reliance on this AL hyperparameter.
We evaluate our method using the task of breast cancer segmentation on the public CAMELYON16 dataset.
arXiv Detail & Related papers (2023-07-14T05:34:13Z)
- Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives [70.32817882783608]
We present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives.
Unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images.
We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points.
arXiv Detail & Related papers (2023-07-11T17:58:31Z)
- Partially Does It: Towards Scene-Level FG-SBIR with Partial Input [106.59164595640704]
A significant portion of scene sketches is "partial".
We propose a set-based approach to model cross-modal region associativity in a partially-aware fashion.
Our proposed method is not only robust to partial scene-sketches but also yields state-of-the-art performance on existing datasets.
arXiv Detail & Related papers (2022-03-28T14:44:45Z)
- IBRNet: Learning Multi-View Image-Based Rendering [67.15887251196894]
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views.
By drawing on source views at render time, our method hearkens back to classic work on image-based rendering.
arXiv Detail & Related papers (2021-02-25T18:56:21Z)
- Image Stitching Based on Planar Region Consensus [22.303750435673752]
We propose a new image stitching method which stitches images by allowing for the alignment of a set of matched dominant planar regions.
We use rich semantic information directly from RGB images to extract planar image regions with a deep Convolutional Neural Network (CNN).
Our method can deal with different situations and outperforms the state of the art on challenging scenes.
arXiv Detail & Related papers (2020-07-06T13:07:20Z)
- Multi-View Optimization of Local Feature Geometry [70.18863787469805]
We address the problem of refining the geometry of local image features from multiple views without known scene or camera geometry.
Our proposed method naturally complements the traditional feature extraction and matching paradigm.
We show that our method consistently improves the triangulation and camera localization performance for both hand-crafted and learned local features.
arXiv Detail & Related papers (2020-03-18T17:22:11Z)
- Depth Based Semantic Scene Completion with Position Importance Aware Loss [52.06051681324545]
PALNet is a novel hybrid network for semantic scene completion.
It extracts both 2D and 3D features from multi-stages using fine-grained depth information.
It is beneficial for recovering key details like the boundaries of objects and the corners of the scene.
arXiv Detail & Related papers (2020-01-29T07:05:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.