Complementary Bi-directional Feature Compression for Indoor 360° Semantic Segmentation with Self-distillation
- URL: http://arxiv.org/abs/2207.02437v1
- Date: Wed, 6 Jul 2022 05:05:54 GMT
- Title: Complementary Bi-directional Feature Compression for Indoor 360° Semantic Segmentation with Self-distillation
- Authors: Zishuo Zheng, Chunyu Lin, Lang Nie, Kang Liao, Zhijie Shen, Yao Zhao
- Abstract summary: We propose a novel 360° semantic segmentation solution from a complementary perspective.
Our approach outperforms the state-of-the-art solutions with at least 10% improvement on quantitative evaluations.
- Score: 37.82642960470551
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, horizontal representation-based panoramic semantic segmentation
approaches have outperformed projection-based solutions, because distortions can be
effectively removed by compressing the spherical data in the vertical direction.
However, these methods ignore the distortion distribution prior and suffer from
unbalanced receptive fields: the receptive fields are sufficient in the vertical
direction but insufficient in the horizontal direction. In contrast, a vertical
representation compressed in the other direction can offer an implicit distortion
prior and enlarge the horizontal receptive fields. In this paper, we combine the two
representations and propose a novel 360° semantic segmentation solution from a
complementary perspective. Our network comprises three modules: a feature extraction
module, a bi-directional compression module, and an ensemble decoding module. First,
we extract multi-scale features from a panorama. Then, the bi-directional compression
module compresses the features into two complementary low-dimensional representations,
which provide content perception and a distortion prior. Furthermore, to facilitate
the fusion of the bi-directional features, we design a self-distillation strategy in
the ensemble decoding module that enhances the interaction between the different
features and further improves performance. Experimental results show that our approach
outperforms state-of-the-art solutions by at least 10% on quantitative evaluations
while also achieving the best visual quality.
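Below is a minimal PyTorch sketch of the bi-directional compression idea described in the abstract: the same panoramic feature map is squeezed along the vertical axis into a horizontal representation and along the horizontal axis into a vertical representation. The module name, the use of average pooling, and the 1x1 projections are illustrative assumptions, not details taken from the paper.

```python
# Sketch of bi-directional feature compression (assumed design, not the authors' code).
import torch
import torch.nn as nn

class BiDirectionalCompression(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # 1x1 projections applied after squeezing each spatial axis (assumption).
        self.horizontal_proj = nn.Conv1d(channels, channels, kernel_size=1)
        self.vertical_proj = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, feat):
        # feat: (B, C, H, W) panoramic feature map from the extraction module.
        # Horizontal representation: squeeze the vertical axis (content perception).
        horizontal = self.horizontal_proj(feat.mean(dim=2))  # (B, C, W)
        # Vertical representation: squeeze the horizontal axis (implicit distortion
        # prior, larger horizontal receptive field after compression).
        vertical = self.vertical_proj(feat.mean(dim=3))      # (B, C, H)
        return horizontal, vertical

# Usage: compress one scale of backbone features into the two complementary 1D codes.
feats = torch.randn(2, 256, 64, 128)
h_repr, v_repr = BiDirectionalCompression(256)(feats)
print(h_repr.shape, v_repr.shape)  # torch.Size([2, 256, 128]) torch.Size([2, 256, 64])
```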
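The ensemble decoding module is described as using self-distillation to strengthen the interaction between the two branches. One common way to realize such a scheme, sketched here under the assumption of a temperature-scaled KL teacher-student loss (neither the loss nor the fusion rule is confirmed by the paper), is to let the fused ensemble prediction softly supervise each directional branch.

```python
# Hedged sketch of a self-distillation loss between branch and ensemble predictions.
import torch
import torch.nn.functional as F

def self_distillation_loss(branch_logits, ensemble_logits, temperature=2.0):
    """KL divergence from a branch prediction to the (detached) ensemble prediction.

    branch_logits, ensemble_logits: (B, num_classes, H, W) segmentation logits.
    The temperature and KL form are standard distillation choices, assumed here.
    """
    teacher = F.softmax(ensemble_logits.detach() / temperature, dim=1)
    student = F.log_softmax(branch_logits / temperature, dim=1)
    return F.kl_div(student, teacher, reduction="batchmean") * temperature ** 2

# Usage: the total loss would combine the ordinary segmentation loss on the ensemble
# output with distillation terms for the horizontal and vertical branches.
h_logits = torch.randn(2, 13, 64, 128)
v_logits = torch.randn(2, 13, 64, 128)
ens_logits = (h_logits + v_logits) / 2  # placeholder fusion for this sketch
loss = self_distillation_loss(h_logits, ens_logits) + self_distillation_loss(v_logits, ens_logits)
```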
Related papers
- Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model [11.959608742884408]
BiSIC is a symmetric stereo image compression architecture.
We propose a 3D convolution based backbone to capture local features and incorporate bidirectional attention blocks to exploit global features.
Our proposed BiSIC outperforms conventional image/video compression standards.
arXiv Detail & Related papers (2024-07-15T11:36:22Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression [63.56922682378755]
We focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding.
The proposed adaptive aggregation generates kernel offsets to capture valid information within a content-conditioned range and aid the transform.
Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.
arXiv Detail & Related papers (2023-08-17T01:34:51Z)
- Multi-Projection Fusion and Refinement Network for Salient Object Detection in 360° Omnidirectional Image [141.10227079090419]
We propose a Multi-Projection Fusion and Refinement Network (MPFR-Net) to detect salient objects in 360° omnidirectional images.
MPFR-Net uses the equirectangular projection image and four corresponding cube-unfolding images as inputs.
Experimental results on two omnidirectional datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-12-23T14:50:40Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- Pseudocylindrical Convolutions for Learned Omnidirectional Image Compression [42.15877732557837]
We make one of the first attempts to learn deep neural networks for omnidirectional image compression.
Under reasonable constraints on the parametric representation, the pseudocylindrical convolution can be efficiently implemented by standard convolution.
Experimental results show that our method consistently achieves better rate-distortion performance than competing methods.
arXiv Detail & Related papers (2021-12-25T12:18:32Z)
- UniFuse: Unidirectional Fusion for 360° Panorama Depth Estimation [11.680475784102308]
This paper introduces a new framework to fuse features from the two projections, unidirectionally feeding the cubemap features to the equirectangular features only at the decoding stage.
Experiments verify the effectiveness of our proposed fusion strategy and module, and our model achieves state-of-the-art performance on four popular datasets.
arXiv Detail & Related papers (2021-02-06T10:01:09Z)
- Invariant Deep Compressible Covariance Pooling for Aerial Scene Categorization [80.55951673479237]
We propose a novel invariant deep compressible covariance pooling (IDCCP) method to address nuisance variations in aerial scene categorization.
We conduct extensive experiments on the publicly released aerial scene image data sets and demonstrate the superiority of this method compared with state-of-the-art methods.
arXiv Detail & Related papers (2020-11-11T11:13:07Z)
- Improving Inference for Neural Image Compression [31.999462074510305]
State-of-the-art methods build on hierarchical variational autoencoders to predict a compressible latent representation of each data point.
We identify three approximation gaps which limit performance in the conventional approach.
We propose remedies for each of these three limitations based on ideas related to iterative inference.
arXiv Detail & Related papers (2020-06-07T19:26:37Z)