Object-level Geometric Structure Preserving for Natural Image Stitching
- URL: http://arxiv.org/abs/2402.12677v3
- Date: Fri, 9 Aug 2024 13:59:42 GMT
- Title: Object-level Geometric Structure Preserving for Natural Image Stitching
- Authors: Wenxiao Cai, Wankou Yang
- Abstract summary: We endeavour to safeguard the overall OBJect-level structures within images based on the Global Similarity Prior (OBJ-GSP).
Triangular meshes are employed in image transformation to protect the overall shapes of objects within images.
We propose StitchBench, the most comprehensive image stitching benchmark to date.
- Score: 11.884195814743249
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stitching images into globally natural structures is of paramount significance, with two main goals: alignment and distortion prevention. Existing approaches align well yet fall short in maintaining object structures. In this paper, we endeavour to safeguard the overall OBJect-level structures within images based on the Global Similarity Prior (OBJ-GSP), while retaining good alignment performance. Our approach leverages semantic segmentation models, such as the Segment Anything Model family, to extract the contours of arbitrary objects in a scene. Triangular meshes are employed in image transformation to protect the overall shapes of objects within images. The trade-off between alignment and distortion prevention is managed by allowing the object meshes to strike a balance between similarity and projective transformation. We also demonstrate the importance of segmentation in low-altitude aerial image stitching. Additionally, we propose StitchBench, the most comprehensive image stitching benchmark to date. Extensive experimental results demonstrate that OBJ-GSP outperforms existing methods in both alignment and shape preservation. Code and dataset are publicly available at \url{https://github.com/RussRobin/OBJ-GSP}.
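The abstract's core idea, blending a shape-preserving similarity transform with an alignment-oriented projective transform on mesh vertices, can be illustrated with a minimal sketch. This is a conceptual illustration, not the authors' implementation: the function names, the per-vertex linear blend, and the weight `lam` are assumptions for exposition only (in OBJ-GSP the balance is resolved by mesh optimization, not a fixed blend).

```python
import numpy as np

def similarity_transform(points, scale, theta, t):
    """Apply a 2D similarity transform (rotation, uniform scale,
    translation) to an (N, 2) array of mesh vertices."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return scale * points @ R.T + t

def projective_transform(points, H):
    """Apply a 3x3 homography H to (N, 2) points via homogeneous coords."""
    homog = np.hstack([points, np.ones((len(points), 1))])
    warped = homog @ H.T
    return warped[:, :2] / warped[:, 2:3]  # divide out the w coordinate

def blend_mesh(vertices, H, scale, theta, t, lam):
    """Illustrative balance between alignment and shape preservation:
    lam = 1.0 -> pure projective warp (best alignment),
    lam = 0.0 -> pure similarity warp (best object-shape preservation)."""
    p = projective_transform(vertices, H)
    s = similarity_transform(vertices, scale, theta, t)
    return lam * p + (1.0 - lam) * s

# Toy example: with identity transforms, vertices are unchanged.
verts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
warped = blend_mesh(verts, np.eye(3), 1.0, 0.0, np.zeros(2), 0.5)
```

In the paper itself, vertices belonging to segmented object meshes are biased toward the similarity side to protect object shapes, while background regions follow the projective alignment more closely.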
Related papers
- Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models [79.96917782423219]
Orient Anything is the first expert and foundational model designed to estimate object orientation in a single image.
By developing a pipeline to annotate the front face of 3D objects, we collect 2M images with precise orientation annotations.
Our model achieves state-of-the-art orientation estimation accuracy in both rendered and real images.
arXiv Detail & Related papers (2024-12-24T18:58:43Z) - SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects [20.978091381109294]
We propose a method to generate articulated objects from a single image.
Our method generates an articulated object that is visually consistent with the input image.
Our experiments show that our method outperforms the state-of-the-art in articulated object creation.
arXiv Detail & Related papers (2024-10-21T20:41:32Z) - FoundPose: Unseen Object Pose Estimation with Foundation Features [11.32559845631345]
FoundPose is a model-based method for 6D pose estimation of unseen objects from a single RGB image.
The method can quickly onboard new objects using their 3D models without requiring any object- or task-specific training.
arXiv Detail & Related papers (2023-11-30T18:52:29Z) - Generative Category-Level Shape and Pose Estimation with Semantic Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image.
To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space.
We show that the proposed method achieves SOTA pose estimation performance and better generalization in the real-world dataset.
arXiv Detail & Related papers (2022-10-03T17:51:54Z) - Learning Object Placement via Dual-path Graph Completion [28.346027247882354]
Object placement aims to place a foreground object over a background image with a suitable location and size.
In this work, we treat object placement as a graph completion problem and propose a novel graph completion module (GCM)
The foreground object is encoded as a special node that should be inserted at a reasonable place in this graph.
arXiv Detail & Related papers (2022-07-23T08:39:39Z) - Towards Self-Supervised Category-Level Object Pose and Size Estimation [121.28537953301951]
This work presents a self-supervised framework for category-level object pose and size estimation from a single depth image.
We leverage the geometric consistency residing in point clouds of the same shape for self-supervision.
arXiv Detail & Related papers (2022-03-06T06:02:30Z) - ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description [9.734266860544663]
This paper proposes an expressive yet compact model for joint object pose and shape optimization.
It infers an object-level map from multi-view RGB-D camera observations.
Our approach is evaluated on the large-scale real-world ScanNet dataset and compared against state-of-the-art methods.
arXiv Detail & Related papers (2021-08-01T03:07:31Z) - DONet: Learning Category-Level 6D Object Pose and Size Estimation from Depth Observation [53.55300278592281]
We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image.
Our framework makes inferences based on the rich geometric information of the object in the depth channel alone.
Our framework competes with state-of-the-art approaches that require labeled real-world images.
arXiv Detail & Related papers (2021-06-27T10:41:50Z) - Scene Graph to Image Generation with Contextualized Object Layout Refinement [92.85331019618332]
We propose a novel method to generate images from scene graphs.
Our approach improves the layout coverage by almost 20 points and drops object overlap to negligible amounts.
arXiv Detail & Related papers (2020-09-23T06:27:54Z) - Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve an object's inner consistency by modeling the global context, or refine object details along boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that appealing performance of semantic segmentation requires explicitly modeling the object body and edge, which correspond to the high and low frequency of the image.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
arXiv Detail & Related papers (2020-07-20T12:11:22Z) - Perspective Plane Program Induction from a Single Image [85.28956922100305]
We study the inverse graphics problem of inferring a holistic representation for natural images.
We formulate this problem as jointly finding the camera pose and scene structure that best describe the input image.
Our proposed framework, Perspective Plane Program Induction (P3I), combines search-based and gradient-based algorithms to efficiently solve the problem.
arXiv Detail & Related papers (2020-06-25T21:18:58Z) - UCLID-Net: Single View Reconstruction in Object Space [60.046383053211215]
We show that building a geometry preserving 3-dimensional latent space helps the network concurrently learn global shape regularities and local reasoning in the object coordinate space.
We demonstrate, both on ShapeNet synthetic images, which are often used for benchmarking, and on real-world images, that our approach outperforms state-of-the-art ones.
arXiv Detail & Related papers (2020-06-06T09:15:56Z) - Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.