SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement
- URL: http://arxiv.org/abs/2408.10934v1
- Date: Tue, 20 Aug 2024 15:17:11 GMT
- Title: SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement
- Authors: Linlin Hu, Ao Sun, Shijie Hao, Richang Hong, Meng Wang
- Abstract summary: Most low-light image enhancement methods only consider information from a single view.
We propose a model called Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement (SDI-Net).
We design a module named Cross-View Sufficient Interaction Module (CSIM) aiming to fully exploit the correlations between the binocular views via the attention mechanism.
- Score: 38.66838623890922
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Currently, most low-light image enhancement methods only consider information from a single view, neglecting the correlation between cross-view information. Therefore, the enhancement results produced by these methods are often unsatisfactory. In this context, there have been efforts to develop methods specifically for low-light stereo image enhancement. These methods take into account the cross-view disparities and enable interaction between the left and right views, leading to improved performance. However, these methods still do not fully exploit the interaction between left and right view information. To address this issue, we propose a model called Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement (SDI-Net). The backbone structure of SDI-Net is two encoder-decoder pairs, which are used to learn the mapping function from low-light images to normal-light images. Among the encoders and the decoders, we design a module named Cross-View Sufficient Interaction Module (CSIM), aiming to fully exploit the correlations between the binocular views via the attention mechanism. The quantitative and visual results on public datasets validate the superiority of our method over other related methods. Ablation studies also demonstrate the effectiveness of the key elements in our model.
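The abstract describes the CSIM only at the level of "attention between the binocular views". As a rough, non-authoritative illustration of attention-based cross-view interaction, here is a minimal PyTorch sketch of one direction of row-wise (epipolar) cross-attention between left and right feature maps; the module name, the row-wise restriction, and the residual fusion are assumptions rather than the authors' implementation.

```python
# Minimal sketch of cross-view attention between stereo feature maps.
# NOT the authors' CSIM: module/variable names and design are illustrative.
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Each position in one view attends to all positions in the same
    row of the other view (rectified stereo correspondences are horizontal)."""
    def __init__(self, channels: int):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.scale = channels ** -0.5

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (B, C, H, W) features of the two views
        b, c, h, w = feat_a.shape
        # Reshape to (B*H, W, C) so attention runs along each epipolar row.
        q = self.q(feat_a).permute(0, 2, 3, 1).reshape(b * h, w, c)
        k = self.k(feat_b).permute(0, 2, 3, 1).reshape(b * h, w, c)
        v = self.v(feat_b).permute(0, 2, 3, 1).reshape(b * h, w, c)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        out = (attn @ v).reshape(b, h, w, c).permute(0, 3, 1, 2)
        return feat_a + out  # residual fusion of cross-view information

# Usage: enhance left features with information attended from the right view.
left = torch.randn(2, 64, 32, 96)
right = torch.randn(2, 64, 32, 96)
fused_left = CrossViewAttention(64)(left, right)  # (2, 64, 32, 96)
```

Restricting attention to rows exploits the fact that, in rectified stereo pairs, corresponding pixels lie on the same horizontal line, keeping each attention map at W x W rather than HW x HW; a symmetric copy of the module would inject left-view information into the right view.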
Related papers
- RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement [19.751696790765635]
We make the first attempt to investigate multi-view low-light image enhancement.
We propose a deep multi-view enhancement framework based on the Recurrent Collaborative Network (RCNet).
Experimental results demonstrate that our RCNet significantly outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-09-06T15:49:49Z)
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged toward high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and propose a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z)
- Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models [81.71651422951074]
The Chain-of-Spot (CoS) method is a novel approach that enhances feature extraction by focusing on key regions of interest.
This technique allows LVLMs to access more detailed visual information without altering the original image resolution.
Our empirical findings demonstrate a significant improvement in LVLMs' ability to understand and reason about visual content.
arXiv Detail & Related papers (2024-03-19T17:59:52Z)
- Low-light Stereo Image Enhancement and De-noising in the Low-frequency Information Enhanced Image Space [5.1569866461097185]
Methods are proposed to perform enhancement and de-noising simultaneously.
A low-frequency information enhanced module (IEM) is proposed to suppress noise and produce a new image space.
A cross-channel and spatial context information mining module (CSM) is proposed to encode long-range spatial dependencies (see the sketch after this entry).
An encoder-decoder structure is constructed, incorporating cross-view and cross-scale feature interactions.
arXiv Detail & Related papers (2024-01-15T15:03:32Z)
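The entry above gives only a one-line description of the CSM. As a hedged illustration of what a block mixing cross-channel attention with long-range spatial self-attention can look like, here is a minimal PyTorch sketch; the class name, the squeeze-and-excitation channel gate, and the global self-attention are assumptions, not the paper's actual implementation.

```python
# Hedged sketch of cross-channel re-weighting plus long-range spatial
# self-attention, in the spirit of the CSM described above.
import torch
import torch.nn as nn

class ChannelSpatialMining(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Cross-channel gate (squeeze-and-excitation style).
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Long-range spatial dependencies via global self-attention.
        self.qkv = nn.Conv2d(channels, channels * 3, 1)
        self.scale = channels ** -0.5

    def forward(self, x):                    # x: (B, C, H, W)
        x = x * self.channel_gate(x)         # re-weight channels
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        q = q.flatten(2).transpose(1, 2)     # (B, HW, C)
        k = k.flatten(2)                     # (B, C, HW)
        v = v.flatten(2).transpose(1, 2)     # (B, HW, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1)  # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out                       # residual connection

y = ChannelSpatialMining(64)(torch.randn(1, 64, 32, 32))  # (1, 64, 32, 32)
```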
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Cross-View Hierarchy Network for Stereo Image Super-Resolution [14.574538513341277]
Stereo image super-resolution aims to improve the quality of high-resolution stereo image pairs by exploiting complementary information across views.
We propose a novel method, named Cross-View Hierarchy Network for Stereo Image Super-Resolution (CVHSSR).
CVHSSR achieves better stereo image super-resolution performance than other state-of-the-art methods while using fewer parameters.
arXiv Detail & Related papers (2023-04-13T03:11:30Z)
- SufrinNet: Toward Sufficient Cross-View Interaction for Stereo Image Enhancement in The Dark [119.01585302856103]
Low-light stereo image enhancement (LLSIE) is a relatively new task to enhance the quality of visually unpleasant stereo images captured in dark conditions.
Current methods suffer from two shortcomings: 1) insufficient cross-view interaction; and 2) a lack of long-range dependency modeling for intra-view learning.
We propose a novel LLSIE model, termed Sufficient Cross-View Interaction Network (SufrinNet).
arXiv Detail & Related papers (2022-11-02T04:01:30Z)
- COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval [59.15034487974549]
We propose a novel COllaborative Two-Stream vision-language pretraining model termed COTS for image-text retrieval.
Our COTS achieves the highest performance among all two-stream methods, and comparable performance to the latest single-stream methods while being 10,800X faster at inference (see the sketch after this entry).
Importantly, our COTS is also applicable to text-to-video retrieval, yielding new state-of-the-art results on the widely-used MSR-VTT dataset.
arXiv Detail & Related papers (2022-04-15T12:34:47Z)
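COTS keeps the image and text streams separate, so a contrastive objective is what aligns their embedding spaces. Below is a minimal sketch of the symmetric InfoNCE loss commonly used for such two-stream alignment; it is a generic formulation, not necessarily COTS's full training objective.

```python
# Generic symmetric contrastive (InfoNCE) loss for two-stream
# image-text alignment; not COTS's exact objective.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # img_emb, txt_emb: (B, D) embeddings from the two separate streams.
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # Matched pairs sit on the diagonal; contrast image->text and text->image.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Because the streams never cross-attend, gallery embeddings can be
# pre-computed, which is what makes two-stream retrieval fast at inference.
img_emb, txt_emb = torch.randn(8, 256), torch.randn(8, 256)
loss = contrastive_loss(img_emb, txt_emb)
```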