3D-LMVIC: Learning-based Multi-View Image Coding with 3D Gaussian Geometric Priors
- URL: http://arxiv.org/abs/2409.04013v2
- Date: Tue, 18 Mar 2025 08:54:41 GMT
- Title: 3D-LMVIC: Learning-based Multi-View Image Coding with 3D Gaussian Geometric Priors
- Authors: Yujun Huang, Bin Chen, Niu Lian, Baoyi An, Shu-Tao Xia,
- Abstract summary: We propose 3D-LMVIC, a novel learning-based multi-view image compression framework.<n>We show that 3D-LMVIC achieves superior performance compared to both traditional and learning-based methods.<n>It significantly improves disparity estimation accuracy over existing two-view approaches.
- Score: 42.27461884229915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing multi-view image compression methods often rely on 2D projection-based similarities between views to estimate disparities. While effective for small disparities, such as those in stereo images, these methods struggle with the more complex disparities encountered in wide-baseline multi-camera systems, commonly found in virtual reality and autonomous driving applications. To address this limitation, we propose 3D-LMVIC, a novel learning-based multi-view image compression framework that leverages 3D Gaussian Splatting to derive geometric priors for accurate disparity estimation. Furthermore, we introduce a depth map compression model to minimize geometric redundancy across views, along with a multi-view sequence ordering strategy based on a defined distance measure between views to enhance correlations between adjacent views. Experimental results demonstrate that 3D-LMVIC achieves superior performance compared to both traditional and learning-based methods. Additionally, it significantly improves disparity estimation accuracy over existing two-view approaches.
Related papers
- CDI3D: Cross-guided Dense-view Interpolation for 3D Reconstruction [25.468907201804093]
Large Reconstruction Models (LRMs) have shown great promise in leveraging multi-view images generated by 2D diffusion models to extract 3D content.
However, 2D diffusion models often struggle to produce dense images with strong multi-view consistency.
We present CDI3D, a feed-forward framework designed for efficient, high-quality image-to-3D generation with view.
arXiv Detail & Related papers (2025-03-11T03:08:43Z) - G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images [45.66479596827045]
We propose a Geometry-enhanced NeRF (G-NeRF), which seeks to enhance the geometry priors by a geometry-guided multi-view synthesis approach.
To tackle the absence of multi-view supervision for single-view images, we design the depth-aware training approach.
arXiv Detail & Related papers (2024-04-11T04:58:18Z) - MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z) - RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering
Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z) - LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D
Signals [9.201550006194994]
Learnable matchers often underperform when there exists only small regions of co-visibility between image pairs.
We propose LFM-3D, a Learnable Feature Matching framework that uses models based on graph neural networks.
We show that the resulting improved correspondences lead to much higher relative posing accuracy for in-the-wild image pairs.
arXiv Detail & Related papers (2023-03-22T17:46:27Z) - PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal
Distillation for 3D Shape Recognition [55.38462937452363]
We propose a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student.
By pair-wise aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhausting and complicated network modification.
arXiv Detail & Related papers (2022-07-07T07:23:20Z) - Multi-View Consistent Generative Adversarial Networks for 3D-aware Image
Synthesis [48.33860286920389]
3D-aware image synthesis aims to generate images of objects from multiple views by learning a 3D representation.
Existing approaches lack geometry constraints, hence usually fail to generate multi-view consistent images.
We propose Multi-View Consistent Generative Adrial Networks (MVCGAN) for high-quality 3D-aware image synthesis with geometry constraints.
arXiv Detail & Related papers (2022-04-13T11:23:09Z) - Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images [79.70127290464514]
We decompose the task into two stages, i.e. person localization and pose estimation.
And we propose three task-specific graph neural networks for effective message passing.
Our approach achieves state-of-the-art performance on CMU Panoptic and Shelf datasets.
arXiv Detail & Related papers (2021-09-13T11:44:07Z) - 3D Human Pose, Shape and Texture from Low-Resolution Images and Videos [107.36352212367179]
We propose RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme.
The proposed method is able to learn 3D body pose and shape across different resolutions with one single model.
We extend the RSC-Net to handle low-resolution videos and apply it to reconstruct textured 3D pedestrians from low-resolution input.
arXiv Detail & Related papers (2021-03-11T06:52:12Z) - Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled
Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.