Depth Estimation Algorithm Based on Transformer-Encoder and Feature
Fusion
- URL: http://arxiv.org/abs/2403.01370v1
- Date: Sun, 3 Mar 2024 02:10:00 GMT
- Title: Depth Estimation Algorithm Based on Transformer-Encoder and Feature
Fusion
- Authors: Linhan Xia, Junbang Liu, Tong Wu
- Abstract summary: This research adopts a transformer model, initially renowned for its success in natural language processing, to capture intricate spatial relationships in visual data for depth estimation tasks.
A significant innovation of the research is the integration of a composite loss function that combines Structural Similarity Index Measure (SSIM) with Mean Squared Error (MSE).
This research approach addresses the challenges of over-smoothing often seen in MSE-based losses and enhances the model's ability to predict depth maps that are not only accurate but also maintain structural coherence with the input images.
- Score: 3.490784807576072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This research presents a novel depth estimation algorithm based on a
Transformer-encoder architecture, tailored for the NYU and KITTI Depth Dataset.
This research adopts a transformer model, initially renowned for its success in
natural language processing, to capture intricate spatial relationships in
visual data for depth estimation tasks. A significant innovation of the
research is the integration of a composite loss function that combines
Structural Similarity Index Measure (SSIM) with Mean Squared Error (MSE). This
combined loss function is designed to ensure the structural integrity of the
predicted depth maps relative to the original images (via SSIM) while
minimizing pixel-wise estimation errors (via MSE). This research approach
addresses the challenges of over-smoothing often seen in MSE-based losses and
enhances the model's ability to predict depth maps that are not only accurate
but also maintain structural coherence with the input images. Through rigorous
training and evaluation using the NYU Depth Dataset, the model demonstrates
superior performance, marking a significant advancement in single-image depth
estimation, particularly in complex indoor and traffic environments.
Related papers
- Depth Estimation using Weighted-loss and Transfer Learning [2.428301619698667]
We propose a simplified and adaptable approach to improve depth estimation accuracy using transfer learning and an optimized loss function.
In this study, we propose a simplified and adaptable approach to improve depth estimation accuracy using transfer learning and an optimized loss function.
The results indicate significant improvements in accuracy and robustness, with EfficientNet being the most successful architecture.
arXiv Detail & Related papers (2024-04-11T12:25:54Z) - Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method's accuracy (named MG) is among the top on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z) - DeepRM: Deep Recurrent Matching for 6D Pose Refinement [77.34726150561087]
DeepRM is a novel recurrent network architecture for 6D pose refinement.
The architecture incorporates LSTM units to propagate information through each refinement step.
DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
arXiv Detail & Related papers (2022-05-28T16:18:08Z) - End-to-end Learning for Joint Depth and Image Reconstruction from
Diffracted Rotation [10.896567381206715]
We propose a novel end-to-end learning approach for depth from diffracted rotation.
Our approach requires a significantly less complex model and less training data, yet it is superior to existing methods in the task of monocular depth estimation.
arXiv Detail & Related papers (2022-04-14T16:14:37Z) - DepthFormer: Exploiting Long-Range Correlation and Local Information for
Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z) - Robust Depth Completion with Uncertainty-Driven Loss Functions [60.9237639890582]
We introduce uncertainty-driven loss functions to improve the robustness of depth completion and handle the uncertainty in depth completion.
Our method has been tested on KITTI Depth Completion Benchmark and achieved the state-of-the-art robustness performance in terms of MAE, IMAE, and IRMSE metrics.
arXiv Detail & Related papers (2021-12-15T05:22:34Z) - Towards Comprehensive Monocular Depth Estimation: Multiple Heads Are
Better Than One [32.01675089157679]
We propose to integrate the strengths of multiple weak depth predictor to build a comprehensive and accurate depth predictor.
Specifically, we construct multiple base (weak) depth predictors by utilizing different Transformer-based and convolutional neural network (CNN)-based architectures.
The resultant model, which we refer to as Transformer-assisted depth ensembles (TEDepth), achieves better results than previous state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-16T09:09:05Z) - Robust lEarned Shrinkage-Thresholding (REST): Robust unrolling for
sparse recover [87.28082715343896]
We consider deep neural networks for solving inverse problems that are robust to forward model mis-specifications.
We design a new robust deep neural network architecture by applying algorithm unfolding techniques to a robust version of the underlying recovery problem.
The proposed REST network is shown to outperform state-of-the-art model-based and data-driven algorithms in both compressive sensing and radar imaging problems.
arXiv Detail & Related papers (2021-10-20T06:15:45Z) - Improved Point Transformation Methods For Self-Supervised Depth
Prediction [4.103701929881022]
Given stereo or egomotion image pairs, a popular and successful method for unsupervised learning of monocular depth estimation is to measure the quality of image reconstructions resulting from the learned depth predictions.
This paper introduces a z-buffering algorithm that correctly and efficiently handles points occluded after transformation to a novel viewpoint.
Because our algorithm is implemented with operators typical of machine learning libraries, it can be incorporated into any existing unsupervised depth learning framework with automatic support for differentiation.
arXiv Detail & Related papers (2021-02-18T03:42:40Z) - Towards Better Generalization: Joint Depth-Pose Learning without PoseNet [36.414471128890284]
We tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning.
Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples.
We propose a novel system that explicitly disentangles scale from the network estimation.
arXiv Detail & Related papers (2020-04-03T00:28:09Z) - Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation [90.28365183660438]
This paper proposes an augmented parallel-pyramid net with attention partial module and differentiable auto-data augmentation.
We define a new pose search space where the sequences of data augmentations are formulated as a trainable and operational CNN component.
Notably, our method achieves the top-1 accuracy on the challenging COCO keypoint benchmark and the state-of-the-art results on the MPII datasets.
arXiv Detail & Related papers (2020-03-17T03:52:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.