Unsupervised HDR Image and Video Tone Mapping via Contrastive Learning
- URL: http://arxiv.org/abs/2303.07327v2
- Date: Mon, 26 Jun 2023 13:56:52 GMT
- Title: Unsupervised HDR Image and Video Tone Mapping via Contrastive Learning
- Authors: Cong Cao, Huanjing Yue, Xin Liu, Jingyu Yang
- Abstract summary: We propose a unified framework (IVTMNet) for unsupervised image and video tone mapping.
For video tone mapping, we propose a temporal-feature-replaced (TFR) module to efficiently utilize the temporal correlation.
Experimental results demonstrate that our method outperforms state-of-the-art image and video tone mapping methods.
- Score: 19.346284003982035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Capturing high dynamic range (HDR) images (videos) is attractive because it
can reveal the details in both dark and bright regions. Since the mainstream
screens only support low dynamic range (LDR) content, a tone mapping algorithm
is required to compress the dynamic range of HDR images (videos). Although
image tone mapping has been widely explored, video tone mapping lags behind,
especially for deep-learning-based methods, due to the lack of HDR-LDR video
pairs. In this work, we propose a unified framework (IVTMNet) for unsupervised
image and video tone mapping. To improve unsupervised training, we propose a
domain- and instance-based contrastive learning loss. Instead of using a
universal feature extractor, such as VGG, to extract features for similarity
measurement, we propose a novel latent code, which is an aggregation of the
brightness and contrast of extracted features, to measure the similarity of
different pairs. In total, we construct two negative pairs and three positive
pairs to constrain the latent codes of tone-mapped results. For the network
structure, we propose a spatial-feature-enhanced (SFE) module to enable
information exchange and transformation of nonlocal regions. For video tone
mapping, we propose a temporal-feature-replaced (TFR) module to efficiently
utilize the temporal correlation and improve the temporal consistency of video
tone-mapped results. We construct a large-scale unpaired HDR-LDR video dataset
to facilitate the unsupervised training process for video tone mapping.
Experimental results demonstrate that our method outperforms state-of-the-art
image and video tone mapping methods. Our code and dataset are available at
https://github.com/cao-cong/UnCLTMO.
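As a concrete illustration of the similarity measure described in the abstract, here is a minimal sketch. It assumes the latent code simply concatenates per-channel feature means (brightness) and standard deviations (contrast); the function names and the InfoNCE-style loss over positive/negative pairs are illustrative stand-ins, not the paper's exact formulation:

```python
import numpy as np

def latent_code(feats: np.ndarray) -> np.ndarray:
    """Aggregate a (C, H, W) feature map into a latent code by
    concatenating per-channel brightness (mean) and contrast (std)."""
    return np.concatenate([feats.mean(axis=(1, 2)), feats.std(axis=(1, 2))])

def cosine(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    return float(a @ b) / (float(np.linalg.norm(a) * np.linalg.norm(b)) + eps)

def pairwise_contrastive_loss(anchor, positives, negatives, tau=0.07):
    """InfoNCE-style loss over latent codes: similarity to positive codes
    is pushed up, similarity to negative codes is pushed down."""
    pos = np.exp(np.array([cosine(anchor, p) for p in positives]) / tau)
    neg = np.exp(np.array([cosine(anchor, n) for n in negatives]) / tau)
    return float(-np.log(pos.sum() / (pos.sum() + neg.sum())))
```

With the paper's construction, `positives` would hold the latent codes of the three positive pairs and `negatives` those of the two negative pairs.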
Related papers
- Semantic Aware Diffusion Inverse Tone Mapping [5.65968650127342]
Inverse tone mapping attempts to boost captured Standard Dynamic Range (SDR) images back to High Dynamic Range (HDR).
We present a novel inverse tone mapping approach for mapping SDR images to HDR that generates lost details in clipped regions through a semantic-aware, diffusion-based inpainting approach.
arXiv Detail & Related papers (2024-05-24T11:44:22Z)
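The inpainting step above operates on clipped regions; a minimal sketch of how such a region mask might be computed (the 0.95 threshold is an assumption, not taken from the paper):

```python
import numpy as np

def clipped_region_mask(sdr: np.ndarray, thresh: float = 0.95) -> np.ndarray:
    """Boolean (H, W) mask of clipped highlights in an SDR image,
    i.e. pixels whose channels are all near saturation. These are the
    regions an inpainting model would be asked to fill in.
    sdr: float image in [0, 1] with shape (H, W, 3)."""
    return (sdr >= thresh).all(axis=-1)
```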
- Joint tone mapping and denoising of thermal infrared images via multi-scale Retinex and multi-task learning [6.469120003158514]
Tone mapping algorithms for thermal infrared images with 16 bpp are investigated.
An optimized multi-scale Retinex algorithm is approximated with a deep learning approach based on the popular U-Net architecture.
The remaining noise in the images after tone mapping is reduced implicitly by utilizing a self-supervised deep learning approach.
arXiv Detail & Related papers (2023-05-01T07:14:32Z)
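For reference, the classic multi-scale Retinex that the U-Net above approximates can be written in a few lines; the scale choices below are conventional defaults, and the paper's optimized variant will differ:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multi_scale_retinex(img: np.ndarray, sigmas=(15, 80, 250), eps=1e-6):
    """Classic multi-scale Retinex: average the single-scale outputs
    log(I) - log(G_sigma * I) over several Gaussian blur scales.
    img: positive-valued intensity image (e.g. 16 bpp thermal data)."""
    img = img.astype(np.float64) + eps
    out = np.zeros_like(img)
    for sigma in sigmas:
        out += np.log(img) - np.log(gaussian_filter(img, sigma=sigma) + eps)
    return out / len(sigmas)
```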
- Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering [84.37776381343662]
Mip-NeRF proposes a multiscale representation as a conical frustum to encode scale information.
We propose mip voxel grids (Mip-VoG), an explicit multiscale representation for real-time anti-aliasing rendering.
Our approach is the first to offer multiscale training and real-time anti-aliasing rendering simultaneously.
arXiv Detail & Related papers (2023-04-20T04:05:22Z)
- Transform your Smartphone into a DSLR Camera: Learning the ISP in the Wild [159.71025525493354]
We propose a trainable Image Signal Processing framework that produces DSLR quality images given RAW images captured by a smartphone.
To address the color misalignments between training image pairs, we employ a color-conditional ISP network and optimize a novel parametric color mapping between each input RAW and reference DSLR image.
arXiv Detail & Related papers (2022-03-20T20:13:59Z)
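The paper above optimizes a parametric color mapping between each RAW/DSLR pair; the simplest instance of such a mapping is a per-image 3x3 matrix fit by least squares, sketched below (the linear form is an assumption for illustration):

```python
import numpy as np

def fit_color_matrix(raw_rgb: np.ndarray, dslr_rgb: np.ndarray) -> np.ndarray:
    """Fit a 3x3 matrix M minimizing ||raw_rgb @ M - dslr_rgb||^2.
    raw_rgb, dslr_rgb: (N, 3) arrays of matched RGB samples."""
    M, *_ = np.linalg.lstsq(raw_rgb, dslr_rgb, rcond=None)
    return M  # map new pixels with raw_pixels @ M
```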
- PreViTS: Contrastive Pretraining with Video Tracking Supervision [53.73237606312024]
PreViTS is an unsupervised SSL framework for selecting clips containing the same object.
PreViTS spatially constrains the frame regions to learn from and trains the model to locate meaningful objects.
We train a momentum contrastive (MoCo) encoder on VGG-Sound and Kinetics-400 datasets with PreViTS.
arXiv Detail & Related papers (2021-12-01T19:49:57Z)
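Two ingredients of the momentum contrastive (MoCo) training mentioned above are an exponential-moving-average key encoder and an InfoNCE loss against a queue of negatives; a minimal sketch, with shapes and constants as illustrative choices:

```python
import numpy as np

def momentum_update(theta_q: np.ndarray, theta_k: np.ndarray, m: float = 0.999):
    """MoCo key-encoder update: theta_k <- m * theta_k + (1 - m) * theta_q."""
    return m * theta_k + (1.0 - m) * theta_q

def info_nce(q: np.ndarray, k_pos: np.ndarray, queue: np.ndarray, tau=0.07):
    """InfoNCE for one query q against its positive key and a (K, D)
    queue of negative keys; all vectors assumed L2-normalized."""
    logits = np.concatenate([[q @ k_pos], queue @ q]) / tau
    logits -= logits.max()  # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))
```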
- Invertible Tone Mapping with Selectable Styles [19.03179521805971]
In this paper, we propose an invertible tone mapping method that converts the multi-exposure HDR to a true LDR.
Our invertible LDR can mimic the appearance of a user-selected tone mapping style.
It can be shared over any existing social network platforms that may re-encode or format-convert the uploaded images.
arXiv Detail & Related papers (2021-10-09T07:32:36Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost message passing between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks; the visible and thermal filters are then used to perform dynamic convolution on their respective input feature maps.
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
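A minimal sketch of the dynamic convolution step described above, with random stand-ins for the filters that MFGNet's two generation networks would produce:

```python
import numpy as np
from scipy.ndimage import convolve

def dynamic_conv(feat: np.ndarray, filt: np.ndarray) -> np.ndarray:
    """Convolve a (H, W) feature map with a per-input generated (k, k)
    filter, i.e. the weights change from sample to sample."""
    return convolve(feat, filt, mode="nearest")

# Illustrative use: each modality gets its own generated filter,
# and the filtered maps are fused (here, by simple summation).
rgb_feat, thermal_feat = np.random.rand(32, 32), np.random.rand(32, 32)
rgb_filt, thermal_filt = np.random.rand(3, 3), np.random.rand(3, 3)
fused = dynamic_conv(rgb_feat, rgb_filt) + dynamic_conv(thermal_feat, thermal_filt)
```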
- End-to-end Multi-modal Video Temporal Grounding [105.36814858748285]
We propose a multi-modal framework to extract complementary information from videos.
We adopt RGB images for appearance, optical flow for motion, and depth maps for image structure.
We conduct experiments on the Charades-STA and ActivityNet Captions datasets, and show that the proposed method performs favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-12T17:58:10Z)
- Attention-guided Temporal Coherent Video Object Matting [78.82835351423383]
We propose a novel deep learning-based object matting method that can achieve temporally coherent matting results.
Its key component is an attention-based temporal aggregation module that maximizes image matting networks' strength.
We show how to effectively solve the trimap generation problem by fine-tuning a state-of-the-art video object segmentation network.
arXiv Detail & Related papers (2021-05-24T17:34:57Z)
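Generic softmax attention over time is the core operation of a temporal aggregation module like the one above; a minimal sketch (the paper's exact module is more elaborate):

```python
import numpy as np

def temporal_attention(query: np.ndarray, keys: np.ndarray, values: np.ndarray):
    """Aggregate per-frame features with attention weights derived from
    query/key similarity. query: (D,); keys, values: (T, D)."""
    scores = keys @ query / np.sqrt(query.size)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values  # (D,) temporally aggregated feature
```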
- HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset [30.249052175655606]
We introduce a coarse-to-fine deep learning framework for HDR video reconstruction.
Firstly, we perform coarse alignment and pixel blending in the image space to estimate the coarse HDR video.
Secondly, we conduct more sophisticated alignment and temporal fusion in the feature space of the coarse HDR video to produce better reconstruction.
arXiv Detail & Related papers (2021-03-27T16:40:05Z)
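The coarse image-space stage above performs alignment and pixel blending; the classical baseline such a stage refines is a weighted exposure merge, sketched here (the Debevec-style hat weighting is an assumption):

```python
import numpy as np

def merge_exposures(frames, exposure_times, eps=1e-8):
    """Coarse HDR merge: weighted average of aligned LDR frames in
    linear radiance space. frames: list of (H, W) float images in
    [0, 1]; exposure_times: matching shutter times in seconds."""
    num = np.zeros_like(frames[0])
    den = np.zeros_like(frames[0])
    for img, t in zip(frames, exposure_times):
        w = 1.0 - np.abs(2.0 * img - 1.0)  # hat weight: favor mid-tones
        num += w * img / t
        den += w
    return num / (den + eps)
```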
- Deep Reformulated Laplacian Tone Mapping [6.078183247169192]
Wide dynamic range (WDR) images contain more scene details and contrast when compared to common images.
The details of WDR images can diminish during the tone mapping process.
In this work, we address the problem by combining a novel reformulated Laplacian pyramid and deep learning.
arXiv Detail & Related papers (2021-01-31T01:18:20Z)
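For context, the standard Laplacian pyramid that the paper above reformulates can be built in a few lines; the nearest-neighbor upsampling and the sigma value below are illustrative choices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def laplacian_pyramid(img: np.ndarray, levels: int = 4):
    """Decompose an image into band-pass detail layers plus a coarse
    residual; upsampling and summing the levels reconstructs the input."""
    pyramid, cur = [], img.astype(np.float64)
    for _ in range(levels):
        down = gaussian_filter(cur, sigma=1.0)[::2, ::2]
        up = np.repeat(np.repeat(down, 2, axis=0), 2, axis=1)
        up = up[:cur.shape[0], :cur.shape[1]]   # crop for odd sizes
        pyramid.append(cur - up)                # detail lost by blur+downsample
        cur = down
    pyramid.append(cur)                         # low-frequency residual
    return pyramid
```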
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.