SE360: Semantic Edit in 360$^\circ$ Panoramas via Hierarchical Data Construction
- URL: http://arxiv.org/abs/2512.19943v1
- Date: Tue, 23 Dec 2025 00:24:46 GMT
- Title: SE360: Semantic Edit in 360$^\circ$ Panoramas via Hierarchical Data Construction
- Authors: Haoyi Zhong, Fang-Lue Zhang, Andrew Chalmers, Taehyun Rhee
- Abstract summary: SE360 is a novel framework for multi-condition guided object editing in 360$^\circ$ panoramas. At its core is a coarse-to-fine autonomous data generation pipeline that requires no manual intervention. Our experiments demonstrate that our method outperforms existing methods in both visual quality and semantic accuracy.
- Score: 14.137976445056466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While instruction-based image editing is emerging, extending it to 360$^\circ$ panoramas introduces additional challenges. Existing methods often produce implausible results in both equirectangular projections (ERP) and perspective views. To address these limitations, we propose SE360, a novel framework for multi-condition guided object editing in 360$^\circ$ panoramas. At its core is a coarse-to-fine autonomous data generation pipeline that requires no manual intervention. This pipeline leverages a Vision-Language Model (VLM) and adaptive projection adjustment for hierarchical analysis, ensuring the holistic segmentation of objects and their physical context. The resulting data pairs are both semantically meaningful and geometrically consistent, even when sourced from unlabeled panoramas. Furthermore, we introduce a cost-effective, two-stage data refinement strategy to improve data realism and mitigate model overfitting to erase artifacts. Based on the constructed dataset, we train a Transformer-based diffusion model to allow flexible object editing guided by text, mask, or reference image in 360$^\circ$ panoramas. Our experiments demonstrate that our method outperforms existing methods in both visual quality and semantic accuracy.
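For context on the equirectangular projection (ERP) that the abstract refers to: ERP stores the full sphere in a single image whose horizontal axis spans longitude and whose vertical axis spans latitude. A minimal sketch of the standard pixel-to-direction mapping is below; the function name and coordinate conventions are illustrative assumptions, not taken from the paper.

```python
import math

def erp_to_direction(u, v, width, height):
    """Map an equirectangular (ERP) pixel to a unit 3D view direction.

    Illustrative convention (not from the paper): longitude spans
    [-pi, pi] left to right, latitude spans [pi/2, -pi/2] top to
    bottom, and pixel centers sit at half-integer offsets.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    # Spherical-to-Cartesian conversion; +z faces the image center.
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)
```

Perspective crops (and hence the "perspective views" the abstract contrasts with ERP) are obtained by intersecting such direction rays with a virtual pinhole image plane.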
Related papers
- World-Shaper: A Unified Framework for 360° Panoramic Editing [57.174341220144605]
Existing perspective-based image editing methods fail to model the spatial structure of panoramas. We present World-Shaper, a unified geometry-aware framework that bridges panoramic generation and editing within a single editing-centric design. Our method achieves superior geometric consistency, editing fidelity, and text controllability compared to SOTA methods.
arXiv Detail & Related papers (2026-01-30T19:38:54Z) - ChartE$^{3}$: A Comprehensive Benchmark for End-to-End Chart Editing [64.65742943745866]
ChartE$^3$ is an End-to-End Chart Editing benchmark. It directly evaluates models without relying on intermediate natural language programs or code-level supervision. It contains over 1,200 high-quality samples constructed via a well-designed data pipeline with human curation.
arXiv Detail & Related papers (2026-01-29T13:29:27Z) - DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training [76.82789568988557]
DiT360 is a DiT-based framework that performs hybrid training on perspective and panoramic data for panoramic image generation. Our method achieves better boundary consistency and image fidelity across eleven quantitative metrics.
arXiv Detail & Related papers (2025-10-13T17:59:15Z) - Hallucinating 360°: Panoramic Street-View Generation via Local Scenes Diffusion and Probabilistic Prompting [20.14129939772052]
We propose Percep360, the first panoramic generation method for autonomous driving. Percep360 enables coherent generation of panoramic data with control signals based on the stitched panoramic data. We evaluate the effectiveness of the generated images from three perspectives.
arXiv Detail & Related papers (2025-07-09T16:01:41Z) - Leader360V: The Large-scale, Real-world 360 Video Dataset for Multi-task Learning in Diverse Environment [19.70383859926191]
Leader360V is the first large-scale, labeled real-world 360 video dataset for instance segmentation and tracking. The dataset enjoys high scene diversity, ranging from indoor and urban settings to natural and dynamic outdoor scenes. Experiments confirm that Leader360V significantly enhances model performance for 360 video segmentation and tracking.
arXiv Detail & Related papers (2025-06-17T07:37:08Z) - DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion [60.45000652592418]
We propose a novel text-driven panoramic generation framework, DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation.
We show that DiffPano can generate consistent, diverse panoramic images with given unseen text descriptions and camera poses.
arXiv Detail & Related papers (2024-10-31T17:57:02Z) - 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z) - BiFuse++: Self-supervised and Efficient Bi-projection Fusion for 360 Depth Estimation [59.11106101006008]
We propose BiFuse++ to explore the combination of bi-projection fusion and the self-training scenario.
We propose a new fusion module and Contrast-Aware Photometric Loss to improve the performance of BiFuse.
arXiv Detail & Related papers (2022-09-07T06:24:21Z) - DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization [66.25948693095604]
We propose a novel method for panoramic 3D scene understanding which recovers the 3D room layout and the shape, pose, position, and semantic category for each object from a single full-view panorama image.
Experiments demonstrate that our method outperforms existing methods on panoramic scene understanding in terms of both geometry accuracy and object arrangement.
arXiv Detail & Related papers (2021-08-24T13:55:29Z) - A Fixation-based 360° Benchmark Dataset for Salient Object Detection [21.314578493964333]
Fixation prediction (FP) in panoramic contents has been widely investigated along with the booming trend of virtual reality (VR) applications.
Salient object detection (SOD) has seldom been explored in 360° images due to the lack of datasets representative of real scenes.
arXiv Detail & Related papers (2020-01-22T11:16:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.