Multi-task Learning for Monocular Depth and Defocus Estimations with
Real Images
- URL: http://arxiv.org/abs/2208.09848v1
- Date: Sun, 21 Aug 2022 08:59:56 GMT
- Title: Multi-task Learning for Monocular Depth and Defocus Estimations with
Real Images
- Authors: Renzhi He, Hualin Hong, Boya Fu, Fei Liu
- Abstract summary: Most existing methods treat depth estimation and defocus estimation as two separate tasks, ignoring the strong connection between them.
We propose a multi-task learning network consisting of an encoder with two decoders to estimate the depth and defocus map from a single focused image.
Our depth and defocus estimations achieve significantly better performance than other state-of-the-art algorithms.
- Score: 3.682618267671887
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth estimation and defocus estimation are two fundamental tasks
in computer vision. Most existing methods treat depth estimation and defocus
estimation as two separate tasks, ignoring the strong connection between them.
In this work, we propose a multi-task learning network consisting of an encoder
with two decoders to estimate the depth and defocus map from a single focused
image. Through the multi-task network, depth estimation helps defocus
estimation achieve better results in weak-texture regions, while defocus
estimation aids depth estimation through the strong physical connection
between the two maps. We set up a dataset (named the ALL-in-3D dataset),
which is the first all-real image dataset consisting of 100K sets of
all-in-focus images, focused images with focus depth, depth maps, and defocus
maps. The dataset enables the network to learn features and the solid physical
connection between depth and real defocus images. Experiments demonstrate that
the network learns more robust features from real focused images than from
synthetic ones. Benefiting from this multi-task structure, in which
different tasks facilitate each other, our depth and defocus estimations
achieve significantly better performance than other state-of-the-art algorithms.
The code and dataset will be publicly available at
https://github.com/cubhe/MDDNet.
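For a concrete picture of the one-encoder, two-decoder design described above, here is a minimal PyTorch-style sketch. It is an illustration only: the module names, layer sizes, and absence of skip connections are assumptions of this sketch, not the authors' MDDNet (see the repository above for the actual implementation). The thin-lens relation behind the depth-defocus coupling is sketched after the related-papers list.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Shared feature extractor used by both decoder heads (illustrative)."""
    def __init__(self, in_ch=3, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Upsampling head that regresses a single-channel map (depth or defocus)."""
    def __init__(self, feat=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(feat, feat // 2, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(feat // 2, feat // 4, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat // 4, 1, 3, padding=1),
        )
    def forward(self, f):
        return self.net(f)

class MultiTaskDepthDefocus(nn.Module):
    """One encoder, two decoders: a single focused image in, depth and defocus maps out."""
    def __init__(self):
        super().__init__()
        self.encoder = SharedEncoder()
        self.depth_head = Decoder()
        self.defocus_head = Decoder()
    def forward(self, image):
        features = self.encoder(image)  # shared representation feeds both heads
        return self.depth_head(features), self.defocus_head(features)

model = MultiTaskDepthDefocus()
depth, defocus = model(torch.randn(1, 3, 256, 256))  # each map has shape (1, 1, 256, 256)
```

In training, each head would receive its own supervision (for example, per-pixel losses against the ground-truth depth and defocus maps), so the shared encoder gets gradients from both tasks; this sharing is what lets each task regularize the other.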
Related papers
- Towards Real-World Focus Stacking with Deep Learning [97.34754533628322]
We introduce a new dataset consisting of 94 high-resolution bursts of raw images with focus bracketing.
This dataset is used to train the first deep learning algorithm for focus stacking capable of handling bursts of sufficient length for real-world applications.
arXiv Detail & Related papers (2023-11-29T17:49:33Z)
- Depth Estimation and Image Restoration by Deep Learning from Defocused Images [2.6599014990168834]
The Two-headed Depth Estimation and Deblurring Network (2HDED:NET) extends conventional Depth from Defocus (DFD) networks with a deblurring branch that shares the same encoder as the depth branch.
The proposed method has been successfully tested on two benchmarks, one for indoor and the other for outdoor scenes: NYU-v2 and Make3D.
arXiv Detail & Related papers (2023-02-21T15:28:42Z)
- Learning Depth from Focus in the Wild [16.27391171541217]
We present convolutional neural network-based depth estimation from single focal stacks.
Our method allows depth maps to be inferred in an end-to-end manner even with image alignment.
For the generalization of the proposed network, we develop a simulator to realistically reproduce the features of commercial cameras.
arXiv Detail & Related papers (2022-07-20T05:23:29Z)
- Learning to Deblur using Light Field Generated and Real Defocus Images [4.926805108788465]
Defocus deblurring is a challenging task due to the spatially varying nature of defocus blur.
We propose a novel deep defocus deblurring network that leverages the strengths and overcomes the shortcomings of light fields.
arXiv Detail & Related papers (2022-04-01T11:35:51Z)
- Bridging Unsupervised and Supervised Depth from Focus via All-in-Focus Supervision [10.547816678110417]
The proposed method can be trained either supervised, with ground-truth depth, or unsupervised, with all-in-focus (AiF) images as supervisory signals.
We show in various experiments that our method outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-24T17:09:13Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Auxiliary Networks (SANs), a new module enabling monodepth networks to perform both depth prediction and completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- Defocus Blur Detection via Depth Distillation [64.78779830554731]
We introduce depth information into defocus blur detection (DBD) for the first time.
Specifically, we learn defocus blur both from the ground truth and from depth distilled from a well-trained depth estimation network.
Our approach outperforms 11 other state-of-the-art methods on two popular datasets.
arXiv Detail & Related papers (2020-07-16T04:58:09Z)
- Real-MFF: A Large Realistic Multi-focus Image Dataset with Ground Truth [58.226535803985804]
We introduce a large and realistic multi-focus dataset called Real-MFF.
The dataset contains 710 pairs of source images with corresponding ground truth images.
We evaluate 10 typical multi-focus algorithms on this dataset for the purpose of illustration.
arXiv Detail & Related papers (2020-03-28T12:33:46Z)
- Learning Depth With Very Sparse Supervision [57.911425589947314]
This paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment.
We train a specialized global-local network architecture with what would be available to a robot interacting with the environment.
Experiments on several datasets show that, when ground truth is available even for just one of the image pixels, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-02T10:44:13Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely on depth-from-defocus cues instead of different views.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
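Several entries above (the DFD network, the defocus-cue training, the blur-detection work), as well as the main paper's "strong physical connection," rest on the same thin-lens geometry: an object at depth d, imaged by a lens of focal length f at f-number N focused at distance d_f, is blurred into a circle of confusion whose diameter grows with distance from the focus plane. Below is a hedged sketch of this standard relation, with illustrative (not paper-specific) camera parameters.

```python
def circle_of_confusion(d, d_f, f=0.05, N=2.0):
    """Thin-lens circle-of-confusion diameter in metres.

    d   : object depth (m)
    d_f : focus distance (m); must exceed the focal length
    f   : focal length (m); 50 mm here, an illustrative value
    N   : f-number; f/2 here, an illustrative value
    """
    # Standard thin-lens relation: c = |d - d_f| / d * f^2 / (N * (d_f - f))
    return abs(d - d_f) / d * f ** 2 / (N * (d_f - f))

# Blur vanishes on the focus plane and grows away from it:
print(circle_of_confusion(d=1.0, d_f=2.0))  # ~0.64 mm, in front of focus
print(circle_of_confusion(d=2.0, d_f=2.0))  # 0.0, exactly in focus
print(circle_of_confusion(d=8.0, d_f=2.0))  # ~0.48 mm, behind focus
```

Given the focus distance, a defocus map therefore constrains depth up to the front/back ambiguity around the focus plane, which is the physical coupling that the multi-task network and the ALL-in-3D dataset (which records focus depth per image) are built to exploit.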