Related papers: Multi-View Learning with Context-Guided Receptance for Image Denoising

Multi-View Learning with Context-Guided Receptance for Image Denoising

URL: http://arxiv.org/abs/2505.02705v1
Date: Mon, 05 May 2025 14:57:43 GMT
Title: Multi-View Learning with Context-Guided Receptance for Image Denoising
Authors: Binghong Chen, Tingting Chai, Wei Jiang, Yuanrong Xu, Guanglu Zhou, Xiangqian Wu,
Abstract summary: Image denoising is essential in low-level vision applications such as photography and automated driving.<n>Existing methods struggle with distinguishing complex noise patterns in real-world scenes and consume significant computational resources.<n>In this work, a Context-guided Receptance Weighted Key-Value (M) model is proposed, combining enhanced multi-view feature integration with efficient sequence modeling.<n>The model is validated on multiple real-world image denoising datasets, outperforming the existing state-of-the-art methods quantitatively and reducing inference time up to 40%.
Score: 18.175992709188026
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Image denoising is essential in low-level vision applications such as photography and automated driving. Existing methods struggle with distinguishing complex noise patterns in real-world scenes and consume significant computational resources due to reliance on Transformer-based models. In this work, the Context-guided Receptance Weighted Key-Value (\M) model is proposed, combining enhanced multi-view feature integration with efficient sequence modeling. Our approach introduces the Context-guided Token Shift (CTS) paradigm, which effectively captures local spatial dependencies and enhance the model's ability to model real-world noise distributions. Additionally, the Frequency Mix (FMix) module extracting frequency-domain features is designed to isolate noise in high-frequency spectra, and is integrated with spatial representations through a multi-view learning process. To improve computational efficiency, the Bidirectional WKV (BiWKV) mechanism is adopted, enabling full pixel-sequence interaction with linear complexity while overcoming the causal selection constraints. The model is validated on multiple real-world image denoising datasets, outperforming the existing state-of-the-art methods quantitatively and reducing inference time up to 40\%. Qualitative results further demonstrate the ability of our model to restore fine details in various scenes.

Related papers

Learning Multi-scale Spatial-frequency Features for Image Denoising [58.883244886588336]
We propose a novel multi-scale adaptive dual-domain network (MADNet) for image denoising.<n>We use image pyramid inputs to restore noise-free results from low-resolution images.<n>In order to realize the interaction of high-frequency and low-frequency information, we design an adaptive spatial-frequency learning unit.
arXiv Detail & Related papers (2025-06-19T13:28:09Z)
CTLformer: A Hybrid Denoising Model Combining Convolutional Layers and Self-Attention for Enhanced CT Image Reconstruction [0.21847754147782888]
Low-dose CT (LDCT) images are often accompanied by significant noise, which negatively impacts image quality and subsequent diagnostic accuracy.<n>This paper introduces an innovative model, CTLformer, which combines convolutional structures with transformer architecture.<n>Two key innovations are proposed: a multi-scale attention mechanism and a dynamic attention control mechanism.
arXiv Detail & Related papers (2025-05-18T02:37:50Z)
Global Semantic-Guided Sub-image Feature Weight Allocation in High-Resolution Large Vision-Language Models [50.98559225639266]
Sub-images with higher semantic relevance to the entire image encapsulate richer visual information for preserving the model's visual understanding ability.<n>Global Semantic-guided Weight Allocator (GSWA) module allocates weights to sub-images based on their relative information density.<n>SleighVL, a lightweight yet high-performing model, outperforms models with comparable parameters and remains competitive with larger models.
arXiv Detail & Related papers (2025-01-24T06:42:06Z)
Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment [20.902935570581207]
We introduce a Multimodal Alignment and Reconstruction Network (MARNet) to enhance the model's resistance to visual noise. MARNet includes a cross-modal diffusion reconstruction module for smoothly and stably blending information across different domains. Experiments conducted on two benchmark datasets, Vireo-Food172 and Ingredient-101, demonstrate that MARNet effectively improves the quality of image information extracted by the model.
arXiv Detail & Related papers (2024-07-26T16:30:18Z)
Deep Learning CT Image Restoration using System Blur and Noise Models [2.2530496464901106]
We present a method that leverages both degraded image inputs and a characterization of the system blur and noise to combine modeling and deep learning approaches. Results demonstrate the efficacy of providing a deep learning model with auxiliary inputs, representing system blur and noise characteristics.
arXiv Detail & Related papers (2024-07-20T21:17:35Z)
SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder [13.453138169497903]
SeNM-VAE is a semi-supervised noise modeling method that leverages both paired and unpaired datasets to generate realistic degraded data. We employ our method to generate paired training samples for real-world image denoising and super-resolution tasks. Our approach excels in the quality of synthetic degraded images compared to other unpaired and paired noise modeling methods.
arXiv Detail & Related papers (2024-03-26T09:03:40Z)
Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising [54.110544509099526]
Hyperspectral image (HSI) denoising is critical for the effective analysis and interpretation of hyperspectral data. We propose a hybrid convolution and attention network (HCANet) to enhance HSI denoising. Experimental results on mainstream HSI datasets demonstrate the rationality and effectiveness of the proposed HCANet.
arXiv Detail & Related papers (2024-03-15T07:18:43Z)
Blue noise for diffusion models [50.99852321110366]
We introduce a novel and general class of diffusion models taking correlated noise within and across images into account. Our framework allows introducing correlation across images within a single mini-batch to improve gradient flow. We perform both qualitative and quantitative evaluations on a variety of datasets using our method.
arXiv Detail & Related papers (2024-02-07T14:59:25Z)
Low-light Stereo Image Enhancement and De-noising in the Low-frequency Information Enhanced Image Space [5.1569866461097185]
Methods are proposed to perform enhancement and de-noising simultaneously. Low-frequency information enhanced module (IEM) is proposed to suppress noise and produce a new image space. Cross-channel and spatial context information mining module (CSM) is proposed to encode long-range spatial dependencies. An encoder-decoder structure is constructed, incorporating cross-view and cross-scale feature interactions.
arXiv Detail & Related papers (2024-01-15T15:03:32Z)
DiffiT: Diffusion Vision Transformers for Image Generation [88.08529836125399]
Vision Transformer (ViT) has demonstrated strong modeling capabilities and scalability, especially for recognition tasks. We study the effectiveness of ViTs in diffusion-based generative learning and propose a new model denoted as Diffusion Vision Transformers (DiffiT) DiffiT is surprisingly effective in generating high-fidelity images with significantly better parameter efficiency.
arXiv Detail & Related papers (2023-12-04T18:57:01Z)
Realistic Noise Synthesis with Diffusion Models [44.404059914652194]
Deep denoising models require extensive real-world training data, which is challenging to acquire.<n>We propose a novel Realistic Noise Synthesis Diffusor (RNSD) method using diffusion models to address these challenges.
arXiv Detail & Related papers (2023-05-23T12:56:01Z)
Learning Spatial and Spatio-Temporal Pixel Aggregations for Image and Video Denoising [104.59305271099967]
We present a pixel aggregation network and learn the pixel sampling and averaging strategies for image denoising. We develop a pixel aggregation network for video denoising to sample pixels across the spatial-temporal space. Our method is able to solve the misalignment issues caused by large motion in dynamic scenes.
arXiv Detail & Related papers (2021-01-26T13:00:46Z)
Variational Denoising Network: Toward Blind Noise Modeling and Removal [59.36166491196973]
Blind image denoising is an important yet very challenging problem in computer vision. We propose a new variational inference method, which integrates both noise estimation and image denoising.
arXiv Detail & Related papers (2019-08-29T15:54:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.