360-Degree Full-view Image Segmentation by Spherical Convolution compatible with Large-scale Planar Pre-trained Models
- URL: http://arxiv.org/abs/2507.09216v1
- Date: Sat, 12 Jul 2025 09:18:30 GMT
- Title: 360-Degree Full-view Image Segmentation by Spherical Convolution compatible with Large-scale Planar Pre-trained Models
- Authors: Jingguo Liu, Han Yu, Shigang Li, Jianfeng Li,
- Abstract summary: We introduce a novel spherical sampling method for panoramic images that enables the direct utilization of existing pre-trained models.<n>Our method employs spherical discrete sampling based on the weights of the pre-trained models, effectively mitigating distortions.<n>We apply the proposed sampling method to panoramic image segmentation, utilizing features obtained from the spherical model as masks for specific channel attentions.
- Score: 15.133436261774547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the current lack of large-scale datasets at the million-scale level, tasks involving panoramic images predominantly rely on existing two-dimensional pre-trained image benchmark models as backbone networks. However, these networks are not equipped to recognize the distortions and discontinuities inherent in panoramic images, which adversely affects their performance in such tasks. In this paper, we introduce a novel spherical sampling method for panoramic images that enables the direct utilization of existing pre-trained models developed for two-dimensional images. Our method employs spherical discrete sampling based on the weights of the pre-trained models, effectively mitigating distortions while achieving favorable initial training values. Additionally, we apply the proposed sampling method to panoramic image segmentation, utilizing features obtained from the spherical model as masks for specific channel attentions, which yields commendable results on commonly used indoor datasets, Stanford2D3D.
Related papers
- UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting [64.31900521467362]
No existing pre-training method is equally effective for both object- and scene-level point clouds.<n>We introduce UniPre3D, the first unified pre-training method that can be seamlessly applied to point clouds of any scale and 3D models of any architecture.
arXiv Detail & Related papers (2025-06-11T17:23:21Z) - Estimating Depth of Monocular Panoramic Image with Teacher-Student Model Fusing Equirectangular and Spherical Representations [3.8240176158734194]
We propose a method of estimating the depth of monocular panoramic image with a teacher-student model fusing equirectangular and spherical representations.
In experiments, the proposed method is tested on several well-known 360 monocular depth estimation benchmark datasets.
arXiv Detail & Related papers (2024-05-27T06:11:16Z) - Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z) - Enhancing Diffusion Models with 3D Perspective Geometry Constraints [10.21800236402905]
We introduce a novel geometric constraint in the training process of generative models to enforce perspective accuracy.
We show that outputs of models trained with this constraint both appear more realistic and improve performance of downstream models trained on generated images.
arXiv Detail & Related papers (2023-12-01T21:56:43Z) - Leveraging Large-Scale Pretrained Vision Foundation Models for
Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
arXiv Detail & Related papers (2023-11-03T15:41:15Z) - Interpretable 2D Vision Models for 3D Medical Images [47.75089895500738]
This study proposes a simple approach of adapting 2D networks with an intermediate feature representation for processing 3D images.
We show on all 3D MedMNIST datasets as benchmark and two real-world datasets consisting of several hundred high-resolution CT or MRI scans that our approach performs on par with existing methods.
arXiv Detail & Related papers (2023-07-13T08:27:09Z) - Flow-based GAN for 3D Point Cloud Generation from a Single Image [16.04710129379503]
We introduce a hybrid explicit-implicit generative modeling scheme, which inherits the flow-based explicit generative models for sampling point clouds with arbitrary resolutions.
We evaluate on the large-scale synthetic dataset ShapeNet, with the experimental results demonstrating the superior performance of the proposed method.
arXiv Detail & Related papers (2022-10-08T17:58:20Z) - DELAD: Deep Landweber-guided deconvolution with Hessian and sparse prior [0.22940141855172028]
We present a model for non-blind image deconvolution that incorporates the classic iterative method into a deep learning application.
We build our network based on the iterative Landweber deconvolution algorithm, which is integrated with trainable convolutional layers to enhance the recovered image structures and details.
arXiv Detail & Related papers (2022-09-30T11:15:03Z) - P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with
Point-to-Pixel Prompting [94.11915008006483]
We propose a novel Point-to-Pixel prompting for point cloud analysis.
Our method attains 89.3% accuracy on the hardest setting of ScanObjectNN.
Our framework also exhibits very competitive performance on ModelNet classification and ShapeNet Part Code.
arXiv Detail & Related papers (2022-08-04T17:59:03Z) - Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We adopt to further improve the shape quality by leveraging cross-view information with a graph convolution network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable function for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z) - An application of a pseudo-parabolic modeling to texture image
recognition [0.0]
We present a novel methodology for texture image recognition using a partial differential equation modeling.
We employ the pseudo-parabolic Buckley-Leverett equation to provide a dynamics to the digital image representation and collect local descriptors from those images evolving in time.
arXiv Detail & Related papers (2021-02-09T18:08:42Z) - Set Based Stochastic Subsampling [85.5331107565578]
We propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an textitarbitrary downstream task network.
We show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification.
arXiv Detail & Related papers (2020-06-25T07:36:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.