World-Shaper: A Unified Framework for 360° Panoramic Editing
- URL: http://arxiv.org/abs/2602.00265v1
- Date: Fri, 30 Jan 2026 19:38:54 GMT
- Title: World-Shaper: A Unified Framework for 360° Panoramic Editing
- Authors: Dong Liang, Yuhao Liu, Jinyuan Jia, Youjun Zhao, Rynson W. H. Lau,
- Abstract summary: Existing perspective-based image editing methods fail to model the spatial structure of panoramas. We present World-Shaper, a unified geometry-aware framework that bridges panoramic generation and editing within a single editing-centric design. Our method achieves superior geometric consistency, editing fidelity, and text controllability compared to SOTA methods.
- Score: 57.174341220144605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Being able to edit panoramic images is crucial for creating realistic 360° visual experiences. However, existing perspective-based image editing methods fail to model the spatial structure of panoramas. Conventional cube-map decompositions attempt to overcome this problem but inevitably break global consistency due to their mismatch with spherical geometry. Motivated by this insight, we reformulate panoramic editing directly in the equirectangular projection (ERP) domain and present World-Shaper, a unified geometry-aware framework that bridges panoramic generation and editing within a single editing-centric design. To overcome the scarcity of paired data, we adopt a generate-then-edit paradigm, where controllable panoramic generation serves as an auxiliary stage to synthesize diverse paired examples for supervised editing learning. To address geometric distortion, we introduce a geometry-aware learning strategy that explicitly enforces position-aware shape supervision and implicitly internalizes panoramic priors through progressive training. Extensive experiments on our new benchmark, PEBench, demonstrate that our method achieves superior geometric consistency, editing fidelity, and text controllability compared to SOTA methods, enabling coherent and flexible 360° visual world creation with unified editing control. Code, model, and data will be released at our project page: https://world-shaper-project.github.io/
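The abstract's key design choice is operating directly in the equirectangular projection (ERP) domain rather than on cube-map faces. The core of any ERP pipeline is the mapping between pixel coordinates and points on the sphere; the sketch below shows one common convention (half-integer pixel centers, longitude in [-π, π), latitude in [-π/2, π/2]). This is a generic illustration of ERP geometry, not code from the World-Shaper release.

```python
import numpy as np

def erp_to_sphere(u, v, width, height):
    """Map ERP pixel coordinates (u, v) to spherical angles (lon, lat) in radians.

    Convention (an assumption, not taken from the paper): pixel centers sit at
    half-integer offsets, longitude spans [-pi, pi), latitude spans [-pi/2, pi/2]
    with v = 0 at the north pole row.
    """
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi
    return lon, lat

def sphere_to_unit_vec(lon, lat):
    """Convert spherical angles to a unit 3D direction vector."""
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    return np.stack([x, y, z], axis=-1)
```

The large latitude-dependent stretch near the poles implied by this mapping is exactly the geometric distortion that the paper's position-aware shape supervision targets.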
Related papers
- Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers [41.08668138583002]
GeoEdit is a framework that integrates geometric transformations for precise object edits. Effects-context Attention enhances the modeling of intricate lighting and shadow effects for improved realism. RS-Objects is a large-scale geometric editing dataset containing over 120,000 high-quality image pairs.
arXiv Detail & Related papers (2026-02-09T08:39:47Z) - AnchoredDream: Zero-Shot 360° Indoor Scene Generation from a Single View via Geometric Grounding [58.90269958632018]
Single-view indoor scene generation plays a crucial role in a range of real-world applications. Recent approaches have made progress by leveraging diffusion models and depth estimation networks. We propose AnchoredDream, a novel zero-shot pipeline that anchors 360° scene generation on high-fidelity geometry.
arXiv Detail & Related papers (2026-01-23T08:08:12Z) - SE360: Semantic Edit in 360$^\circ$ Panoramas via Hierarchical Data Construction [14.137976445056466]
SE360 is a novel framework for multi-condition guided object editing in 360$^\circ$ panoramas. At its core is a novel coarse-to-fine autonomous data generation pipeline without manual intervention. Our experiments demonstrate that our method outperforms existing methods in both visual quality and semantic accuracy.
arXiv Detail & Related papers (2025-12-23T00:24:46Z) - DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training [76.82789568988557]
DiT360 is a DiT-based framework that performs hybrid training on perspective and panoramic data for panoramic image generation. Our method achieves better boundary consistency and image fidelity across eleven quantitative metrics.
arXiv Detail & Related papers (2025-10-13T17:59:15Z) - Dragging with Geometry: From Pixels to Geometry-Guided Image Editing [42.176957681367185]
We propose a novel geometry-guided drag-based image editing method, GeoDrag. Built upon a unified displacement field that jointly encodes 3D geometry and 2D spatial priors, GeoDrag enables coherent, high-fidelity, and structure-consistent editing.
arXiv Detail & Related papers (2025-09-30T03:53:11Z) - Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control [52.87568958372421]
Follow-Your-Shape is a training-free and mask-free framework that supports precise and controllable editing of object shapes. We compute a Trajectory Divergence Map (TDM) by comparing token-wise velocity differences between the inversion and denoising paths. Our method achieves superior editability and visual fidelity, particularly in tasks requiring large-scale shape replacement.
arXiv Detail & Related papers (2025-08-11T16:10:00Z) - SphereDrag: Spherical Geometry-Aware Panoramic Image Editing [53.87789202723925]
We propose SphereDrag, a novel panoramic editing framework utilizing spherical geometry knowledge for accurate and controllable editing. Specifically, adaptive reprojection (AR) uses adaptive spherical rotation to deal with discontinuity, while great-circle trajectory adjustment (GCTA) tracks the movement trajectory more accurately. We also construct PanoBench, a panoramic editing benchmark including complex editing tasks that involve multiple objects and diverse styles, which provides a standardized evaluation framework.
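Great-circle trajectories like those SphereDrag adjusts for are the spherical analogue of straight-line drag paths. The standard way to trace such an arc between two points on the sphere is spherical linear interpolation (slerp); the sketch below is a generic implementation of that idea, not the paper's exact GCTA module.

```python
import numpy as np

def slerp(p, q, t):
    """Spherical linear interpolation between unit vectors p and q.

    Traces the great-circle arc from p (t = 0) to q (t = 1); a generic
    sketch of the great-circle idea, not SphereDrag's GCTA implementation.
    """
    p = p / np.linalg.norm(p)
    q = q / np.linalg.norm(q)
    dot = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(dot)  # angle between the two directions
    if theta < 1e-8:  # nearly identical directions: nothing to interpolate
        return p
    return (np.sin((1.0 - t) * theta) * p + np.sin(t * theta) * q) / np.sin(theta)
```

Interpolating in pixel space on the ERP image would instead bend the path away from the true shortest route on the sphere, which is the discrepancy such trajectory adjustment is meant to correct.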
arXiv Detail & Related papers (2025-06-13T15:13:09Z) - VASE: Object-Centric Appearance and Shape Manipulation of Real Videos [108.60416277357712]
In this work, we introduce a framework that is object-centric and is designed to control both the object's appearance and, notably, to execute precise and explicit structural modifications on the object.
We build our framework on a pre-trained image-conditioned diffusion model, integrate layers to handle the temporal dimension, and propose training strategies and architectural modifications to enable shape control.
We evaluate our method on the image-driven video editing task showing similar performance to the state-of-the-art, and showcasing novel shape-editing capabilities.
arXiv Detail & Related papers (2024-01-04T18:59:24Z) - SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field [37.8162035179377]
We present a novel semantic-driven NeRF editing approach, which enables users to edit a neural radiance field with a single image.
To achieve this goal, we propose a prior-guided editing field to encode fine-grained geometric and texture editing in 3D space.
Our method achieves photo-realistic 3D editing using only a single edited image, pushing the bound of semantic-driven editing in 3D real-world scenes.
arXiv Detail & Related papers (2023-03-23T13:58:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.