SufrinNet: Toward Sufficient Cross-View Interaction for Stereo Image
Enhancement in The Dark
- URL: http://arxiv.org/abs/2211.00859v2
- Date: Fri, 4 Nov 2022 09:02:20 GMT
- Title: SufrinNet: Toward Sufficient Cross-View Interaction for Stereo Image
Enhancement in The Dark
- Authors: Huan Zheng, Zhao Zhang, Jicong Fan, Richang Hong, Yi Yang, Shuicheng
Yan
- Abstract summary: Low-light stereo image enhancement (LLSIE) is a relatively new task to enhance the quality of visually unpleasant stereo images captured in dark conditions.
Current methods suffer from two shortcomings: 1) insufficient cross-view interaction; 2) a lack of long-range dependency modeling for intra-view learning.
We propose a novel LLSIE model, termed Sufficient Cross-View Interaction Network (SufrinNet).
- Score: 119.01585302856103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-light stereo image enhancement (LLSIE) is a relatively new task to
enhance the quality of visually unpleasant stereo images captured in dark
conditions. So far, very few studies on deep LLSIE have been explored due to
certain challenging issues: the task has not been well addressed, and
current methods suffer from two shortcomings: 1) insufficient cross-view
interaction; 2) a lack of long-range dependency modeling for intra-view learning. In this
paper, we therefore propose a novel LLSIE model, termed \underline{Suf}ficient
C\underline{r}oss-View \underline{In}teraction Network (SufrinNet). To be
specific, we present sufficient inter-view interaction module (SIIM) to enhance
the information exchange across views. SIIM not only discovers the cross-view
correlations at different scales, but also explores the cross-scale information
interaction. Besides, we present a spatial-channel information mining block
(SIMB) for intra-view feature extraction, whose benefits are twofold: it captures
long-range dependencies to build spatial long-range relationships, and it performs
expanded channel information refinement to enhance information flow along the
channel dimension. Extensive experiments on Flickr1024, KITTI 2012,
KITTI 2015 and Middlebury datasets show that our method obtains better
illumination adjustment and detail recovery, and achieves state-of-the-art
performance compared to related methods. Our code, datasets and models will be
publicly available.
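The abstract describes SIIM and SIMB only at a high level, so the PyTorch sketch below is a minimal, assumption-laden illustration of the two ideas rather than the authors' SufrinNet implementation: a row-wise cross-view attention stands in for the cross-view interaction of SIIM (the cross-scale part is omitted), and a block pairing spatial self-attention with an expanded channel gate stands in for SIMB. All class names, shapes and hyperparameters here are invented for illustration.

```python
import torch
import torch.nn as nn


class CrossViewAttention(nn.Module):
    """Illustrative stand-in for the cross-view part of SIIM: each row of the
    left-view feature map attends to the same row of the right view, since
    stereo correspondences lie along the width (epipolar) axis."""

    def __init__(self, channels: int):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)

    def forward(self, feat_left: torch.Tensor, feat_right: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat_left.shape
        # Reshape to (B*H, W, C) so attention runs independently per image row.
        q = self.q(feat_left).permute(0, 2, 3, 1).reshape(b * h, w, c)
        k = self.k(feat_right).permute(0, 2, 3, 1).reshape(b * h, w, c)
        v = self.v(feat_right).permute(0, 2, 3, 1).reshape(b * h, w, c)
        attn = torch.softmax(q @ k.transpose(1, 2) / c ** 0.5, dim=-1)
        cross = (attn @ v).reshape(b, h, w, c).permute(0, 3, 1, 2)
        return feat_left + cross  # residual fusion of the borrowed right-view cue


class SpatialChannelBlock(nn.Module):
    """Illustrative stand-in for SIMB: a spatial self-attention branch for
    long-range dependencies plus an expanded channel gate for channel-wise
    information refinement."""

    def __init__(self, channels: int, expansion: int = 2, heads: int = 4):
        super().__init__()
        self.spatial = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels * expansion, 1),  # expand channel statistics
            nn.GELU(),
            nn.Conv2d(channels * expansion, channels, 1),  # project back to a gate
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)              # (B, H*W, C)
        spatial, _ = self.spatial(tokens, tokens, tokens)  # long-range spatial mixing
        spatial = spatial.transpose(1, 2).reshape(b, c, h, w)
        return x + spatial * self.channel(x)               # channel-gated residual


# Toy usage: enhance left-view features with right-view cues, then refine intra-view.
left, right = torch.rand(1, 32, 48, 96), torch.rand(1, 32, 48, 96)
enhanced_left = SpatialChannelBlock(32)(CrossViewAttention(32)(left, right))
```

Per the abstract, the actual SIIM additionally performs cross-scale information interaction and both views are enhanced; neither detail is reproduced in this sketch.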
Related papers
- SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement [38.66838623890922]
Most low-light image enhancement methods only consider information from a single view.
We propose SDI-Net, a model aimed at sufficient dual-view interaction for low-light stereo image enhancement.
We design a module named Cross-View Sufficient Interaction Module (CSIM) that aims to fully exploit the correlations between the binocular views via the attention mechanism.
arXiv Detail & Related papers (2024-08-20T15:17:11Z)
- ML-CrAIST: Multi-scale Low-high Frequency Information-based Cross black Attention with Image Super-resolving Transformer [3.686808512438363]
This work proposes a transformer-based super-resolution architecture called ML-CrAIST.
We apply spatial and channel self-attention, which concurrently models pixel interactions in both the spatial and channel dimensions.
We devise a cross-attention block for super-resolution, which explores the correlations between low- and high-frequency information.
arXiv Detail & Related papers (2024-08-19T12:23:15Z)
- Learning Accurate and Enriched Features for Stereo Image Super-Resolution [0.0]
Stereo image super-resolution (stereoSR) aims to enhance the quality of super-resolution results by incorporating complementary information from an alternative view.
We propose a mixed-scale selective fusion network (MSSFNet) to preserve precise spatial details and incorporate abundant contextual information.
MSSFNet achieves significant improvements over state-of-the-art approaches on both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2024-06-23T03:34:17Z)
- ECAFormer: Low-light Image Enhancement using Cross Attention [11.554554006307836]
Low-light image enhancement (LLIE) is critical in computer vision.
We design a hierarchical mutual Enhancement via a Cross Attention transformer (ECAFormer).
We show that ECAFormer reaches competitive performance across multiple benchmarks, yielding nearly a 3% improvement in PSNR over the second-best method.
arXiv Detail & Related papers (2024-06-19T07:21:31Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing [54.168567276280505]
We propose a novel Mutual Information-driven Triple interaction Network (MITNet) for image dehazing.
The first stage, named amplitude-guided haze removal, aims to recover the amplitude spectrum of the hazy images for haze removal.
The second stage, named phase-guided structure refinement, is devoted to learning the transformation and refinement of the phase spectrum (a minimal illustration of this amplitude/phase split appears after this list).
arXiv Detail & Related papers (2023-08-14T08:23:58Z)
- COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval [59.15034487974549]
We propose a novel COllaborative Two-Stream vision-language pretraining model termed COTS for image-text retrieval.
Our COTS achieves the highest performance among all two-stream methods and comparable performance while being 10,800X faster at inference.
Importantly, our COTS is also applicable to text-to-video retrieval, yielding new state-of-the-art results on the widely-used MSR-VTT dataset.
arXiv Detail & Related papers (2022-04-15T12:34:47Z)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
The proposed CTL framework utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data [69.37597254841052]
We propose a novel cross-modal deep-learning framework called X-ModalNet.
X-ModalNet generalizes well, owing to label propagation on an updatable graph constructed from high-level features at the top of the network.
We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
arXiv Detail & Related papers (2020-06-24T15:29:41Z)
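As noted in the MITNet entry above, its two stages operate on the amplitude and phase spectra of the image. The sketch below is only a hedged illustration of that Fourier split and recombination in PyTorch, not MITNet's implementation; the per-stage enhancement networks are omitted and all shapes are arbitrary.

```python
import torch


def split_spectrum(img: torch.Tensor):
    """Decompose a (B, C, H, W) image into amplitude and phase spectra."""
    spec = torch.fft.fft2(img)
    return spec.abs(), spec.angle()


def merge_spectrum(amplitude: torch.Tensor, phase: torch.Tensor) -> torch.Tensor:
    """Recombine amplitude and phase spectra back into a spatial-domain image."""
    return torch.fft.ifft2(torch.polar(amplitude, phase)).real


hazy = torch.rand(1, 3, 64, 64)
amp, pha = split_spectrum(hazy)
# Stage 1 (amplitude-guided haze removal) would refine `amp` here;
# Stage 2 (phase-guided structure refinement) would refine `pha` here.
restored = merge_spectrum(amp, pha)
```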
This list is automatically generated from the titles and abstracts of the papers in this site.