Estimating Image Depth in the Comics Domain
- URL: http://arxiv.org/abs/2110.03575v1
- Date: Thu, 7 Oct 2021 15:54:27 GMT
- Title: Estimating Image Depth in the Comics Domain
- Authors: Deblina Bhattacharjee, Martin Everaert, Mathieu Salzmann, Sabine Süsstrunk
- Abstract summary: We use an off-the-shelf unsupervised image-to-image translation method to translate the comics images to natural ones.
We then use an attention-guided monocular depth estimator to predict their depth.
Our model learns to distinguish between text and images in the comics panels to reduce text-based artefacts in the depth estimates.
- Score: 59.275961069130304
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating the depth of comics images is challenging as such images a) are
monocular; b) lack ground-truth depth annotations; c) differ across artistic
styles; and d) are sparse and noisy. We thus use an off-the-shelf unsupervised
image-to-image translation method to translate the comics images to natural
ones and then use an attention-guided monocular depth estimator to
predict their depth. This lets us leverage the depth annotations of existing
natural images to train the depth estimator. Furthermore, our model learns to
distinguish between text and images in the comics panels to reduce text-based
artefacts in the depth estimates. Our method consistently outperforms the
existing state-of-the-art approaches across all metrics on both the DCM and
eBDtheque images. Finally, we introduce a dataset to evaluate depth prediction
on comics.
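
The abstract describes a two-stage pipeline: translate a comics panel into the natural-image domain with an off-the-shelf unsupervised image-to-image translation model, then run an attention-guided monocular depth estimator trained on annotated natural images, while distinguishing text from drawings to reduce text-based artefacts. The sketch below only illustrates that data flow; `ComicsToNaturalTranslator`, `TextMaskPredictor`, and `AttentionDepthEstimator` are hypothetical placeholder modules (not the authors' networks), and the final masking step is a crude stand-in for the learned text handling described in the paper.

```python
# Minimal sketch of the two-stage comics depth pipeline described above.
# All three modules are hypothetical placeholders, NOT the authors' models:
# a real system would plug in a pretrained unsupervised image-to-image
# translator (comics -> natural) and an attention-guided monocular depth
# estimator trained on natural images with ground-truth depth.
import torch
import torch.nn as nn


class ComicsToNaturalTranslator(nn.Module):
    """Placeholder for the unsupervised image-to-image translation network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, comics: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.net(comics))  # "natural-style" image


class TextMaskPredictor(nn.Module):
    """Placeholder that separates text regions from drawing regions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 1, kernel_size=3, padding=1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(image))  # 1 = text, 0 = drawing


class AttentionDepthEstimator(nn.Module):
    """Placeholder for the attention-guided monocular depth estimator."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 1, kernel_size=3, padding=1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.net(image))  # per-pixel depth map


def estimate_comics_depth(panel: torch.Tensor) -> torch.Tensor:
    """Translate a comics panel to the natural domain, predict depth,
    and down-weight text regions to suppress text-based artefacts."""
    translator = ComicsToNaturalTranslator()
    text_mask = TextMaskPredictor()
    depth_net = AttentionDepthEstimator()

    natural = translator(panel)               # stage 1: domain translation
    depth = depth_net(natural)                # stage 2: monocular depth
    mask = text_mask(panel)                   # text vs. drawing regions
    return depth * (1.0 - mask)               # crude text suppression


if __name__ == "__main__":
    panel = torch.rand(1, 3, 256, 256)        # dummy comics panel
    print(estimate_comics_depth(panel).shape)  # torch.Size([1, 1, 256, 256])
```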
Related papers
- Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion [51.69876947593144]
Existing methods for depth completion operate in tightly constrained settings.
Inspired by advances in monocular depth estimation, we reframe depth completion as image-conditional depth map generation.
Marigold-DC builds on a pretrained latent diffusion model for monocular depth estimation and injects the depth observations as test-time guidance.
arXiv Detail & Related papers (2024-12-18T00:06:41Z)
- ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth.
Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module.
Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
arXiv Detail & Related papers (2024-07-11T05:11:56Z)
- Dense Multitask Learning to Reconfigure Comics [63.367664789203936]
We develop a MultiTask Learning (MTL) model to achieve dense predictions for comics panels.
Our method can successfully identify the semantic units as well as the notion of 3D in comic panels.
arXiv Detail & Related papers (2023-07-16T15:10:34Z)
- Understanding Depth Map Progressively: Adaptive Distance Interval Separation for Monocular 3d Object Detection [38.96129204108353]
Several monocular 3D detection techniques rely on auxiliary depth maps from the depth estimation task.
We propose a framework named the Adaptive Distance Interval Separation Network (ADISN) that adopts a novel perspective on understanding depth maps.
arXiv Detail & Related papers (2023-06-19T13:32:53Z)
- Depth Completion with Twin Surface Extrapolation at Occlusion Boundaries [16.773787000535645]
We propose a multi-hypothesis depth representation that explicitly models both foreground and background depths.
Key to our method is the use of an asymmetric loss function that operates on a novel twin-surface representation.
We validate our method on three different datasets.
arXiv Detail & Related papers (2021-04-06T02:36:35Z)
- S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation [63.58891781246175]
Humans can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that spatial structure plays a fundamental role in understanding the depth of scenes.
We are the first to explore the learning of a depth-specific structural representation, which captures the essential feature for depth estimation and ignores irrelevant style information.
Our S2R-DepthNet generalizes well to unseen real-world data even though it is trained only on synthetic data.
arXiv Detail & Related papers (2021-04-02T03:55:41Z)
- Learning Depth via Leveraging Semantics: Self-supervised Monocular Depth Estimation with Both Implicit and Explicit Semantic Guidance [34.62415122883441]
We propose a Semantic-aware Spatial Feature Alignment scheme to align implicit semantic features with depth features for scene-aware depth estimation.
We also propose a semantic-guided ranking loss to explicitly constrain the estimated depth maps to be consistent with real scene contextual properties.
Our method produces high-quality depth maps that are consistently superior on both complex scenes and diverse semantic categories.
arXiv Detail & Related papers (2021-02-11T14:29:51Z)
- SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware Feature Extraction [27.750031877854717]
We propose SAFENet that is designed to leverage semantic information to overcome the limitations of the photometric loss.
Our key idea is to exploit semantic-aware depth features that integrate the semantic and geometric knowledge.
Experiments on the KITTI dataset demonstrate that our method competes with or even outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-06T17:22:25Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely on depth-from-focus cues instead of different views.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.