Towards Optimal Aggregation of Varying Range Dependencies in Haze Removal
- URL: http://arxiv.org/abs/2408.12317v2
- Date: Mon, 10 Mar 2025 01:14:24 GMT
- Title: Towards Optimal Aggregation of Varying Range Dependencies in Haze Removal
- Authors: Xiaozhe Zhang, Fengying Xie, Haidong Ding, Linpeng Pan, Zhenwei Shi
- Abstract summary: Haze removal aims to restore a clear image from a hazy input. Existing methods have shown significant efficacy by capturing either short-range dependencies for local detail preservation or long-range dependencies for global context modeling. We propose DehazeMatic, which captures both short- and long-range dependencies through a dual-path design for improved restoration.
- Score: 17.29370328189668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Haze removal aims to restore a clear image from a hazy input. Existing methods have shown significant efficacy by capturing either short-range dependencies for local detail preservation or long-range dependencies for global context modeling. Given the complementary strengths of both approaches, an intuitive advancement is to explicitly integrate them into a unified framework. However, this potential remains underexplored in current research. In this paper, we propose DehazeMatic, which leverages the proposed Transformer-Mamba Dual Aggregation block to simultaneously and explicitly capture both short- and long-range dependencies through a dual-path design for improved restoration. To ensure that dependencies at varying ranges contribute optimally to performance, we conduct extensive experiments to identify key influencing factors and determine that an effective aggregation mechanism should be guided by the joint consideration of haze density and semantic information. Building on these insights, we introduce the CLIP-enhanced Dual-path Aggregator, which uses the rich semantic priors encapsulated in CLIP, together with a haze density map estimated via CLIP's strong generalization ability, to guide the aggregation process. Extensive experiments demonstrate that DehazeMatic outperforms state-of-the-art methods across various benchmarks.
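To make the dual-path idea concrete, below is a minimal PyTorch sketch of an aggregation block in the spirit of the abstract: a local path and a coarse global path are fused by per-pixel weights predicted from a haze-density map and a semantic prior (e.g. a CLIP embedding). The branch implementations, module names, and the gating design are illustrative assumptions, not the authors' architecture.

```python
# Illustrative sketch only: the two paths are cheap stand-ins, not the paper's
# Transformer and Mamba branches.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPathAggregation(nn.Module):
    def __init__(self, dim: int, sem_dim: int = 512):
        super().__init__()
        # Short-range path (local detail): depthwise + pointwise convolutions.
        self.short_path = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),
            nn.GELU(),
            nn.Conv2d(dim, dim, 1),
        )
        # Long-range path (global context): pooled context as a cheap stand-in.
        self.long_ctx = nn.Conv2d(dim, dim, 1)
        # Aggregator: semantic prior + haze density -> two per-pixel fusion weights.
        self.sem_proj = nn.Linear(sem_dim, dim)
        self.gate = nn.Conv2d(dim + 1, 2, 1)

    def forward(self, x, density, sem):
        # x: (B, C, H, W) features; density: (B, 1, H, W); sem: (B, sem_dim), e.g. CLIP.
        s = self.short_path(x)
        g = F.adaptive_avg_pool2d(x, 8)                       # coarse global context
        g = F.interpolate(self.long_ctx(g), size=x.shape[-2:],
                          mode="bilinear", align_corners=False)
        sem_map = self.sem_proj(sem)[:, :, None, None]        # broadcast over H, W
        w = torch.softmax(self.gate(torch.cat([x + sem_map, density], 1)), dim=1)
        return w[:, :1] * s + w[:, 1:] * g                    # weighted fusion


# Toy usage: random features, density map, and semantic embedding.
block = DualPathAggregation(dim=64)
y = block(torch.randn(2, 64, 32, 32), torch.rand(2, 1, 32, 32), torch.randn(2, 512))
print(y.shape)  # torch.Size([2, 64, 32, 32])
```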
Related papers
- Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond [52.486290612938895]
We propose a novel method that leverages the semantic knowledge from the Segment Anything Model (SAM) to improve the quality of fusion results and enable downstream task adaptability.
Specifically, we design a Semantic Persistent Attention (SPA) Module that efficiently maintains source information via the persistent repository while extracting high-level semantic priors from SAM.
Our method achieves a balance between high-quality visual results and downstream task adaptability while maintaining practical deployment efficiency.
arXiv Detail & Related papers (2025-03-03T06:16:31Z) - Attention with Dependency Parsing Augmentation for Fine-Grained Attribution [26.603281615221505]
We develop a fine-grained attribution mechanism that provides supporting evidence from retrieved documents for every answer span.
Existing attribution methods rely on model-internal similarity metrics between responses and documents, such as saliency scores and hidden state similarity.
We propose two techniques applicable to all model-internals-based methods. First, we aggregate token-wise evidence through set union operations, preserving the granularity of representations.
Second, we enhance the attributor by integrating dependency parsing to enrich the semantic completeness of target spans.
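A small Python sketch of these two ideas, assuming per-token evidence is already available as sets of supporting-document ids and that spaCy provides the dependency parse; both helper functions are hypothetical, not the paper's code.

```python
# Hypothetical sketch of the two techniques described above.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed dependency parser

def span_evidence(token_evidence, start, end):
    """Union per-token evidence sets (supporting-document ids) over a token span."""
    docs = set()
    for i in range(start, end):
        docs |= token_evidence[i]          # set union preserves per-token granularity
    return docs

def expand_span_with_dependencies(text, char_start, char_end):
    """Grow a character span to cover the dependency subtrees of its tokens."""
    doc = nlp(text)
    in_span = [t for t in doc if char_start <= t.idx < char_end]
    covered = {s.i for t in in_span for s in t.subtree}
    if not covered:
        return char_start, char_end
    lo, hi = min(covered), max(covered)
    return doc[lo].idx, doc[hi].idx + len(doc[hi].text)
```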
arXiv Detail & Related papers (2024-12-16T03:12:13Z) - Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images [16.0258685984844]
Continual learning (CL) moves beyond the one-way training paradigm and enables a model to adapt to new data, semantics, and tasks continuously.
We propose a unified continual learning model that leverages multi-task joint learning covering pixel-level classification, instance-level segmentation and image-level perception.
arXiv Detail & Related papers (2024-07-19T12:22:32Z) - Enhancing Fine-Grained Image Classifications via Cascaded Vision Language Models [0.0]
This paper introduces CascadeVLM, an innovative framework that overcomes the constraints of previous CLIP-based methods.
Experiments across various fine-grained image datasets demonstrate that CascadeVLM significantly outperforms existing models.
arXiv Detail & Related papers (2024-05-18T14:12:04Z) - Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection [58.228940066769596]
We introduce a Dual-Image Enhanced CLIP approach, leveraging a joint vision-language scoring system.
Our methods process pairs of images, utilizing each as a visual reference for the other, thereby enriching the inference process with visual context.
Our approach significantly exploits the potential of vision-language joint anomaly detection and demonstrates comparable performance with current SOTA methods across various datasets.
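A hedged sketch of the pair-wise scoring idea: a language term compares an image embedding against normal/anomalous text embeddings, while a vision term measures deviation from the paired reference image. The fusion weight and the use of plain cosine similarity are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of joint vision-language anomaly scoring with a paired reference.
# Feature extraction (e.g. with CLIP) is assumed to happen elsewhere.
import torch
import torch.nn.functional as F

def anomaly_score(img_feat, ref_feat, text_normal, text_anomalous, alpha=0.5):
    # All inputs are L2-normalised 1-D feature vectors; alpha is an illustrative weight.
    lang = torch.softmax(
        torch.stack([img_feat @ text_normal, img_feat @ text_anomalous]), dim=0
    )[1]                                                      # P("anomalous") from text
    vision = 1.0 - F.cosine_similarity(img_feat, ref_feat, dim=0)  # deviation from reference
    return alpha * lang + (1 - alpha) * vision

feat = lambda: F.normalize(torch.randn(512), dim=0)
print(anomaly_score(feat(), feat(), feat(), feat()).item())
```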
arXiv Detail & Related papers (2024-05-08T03:13:20Z) - Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities [16.69453837626083]
We propose a Correlation-decoupled Knowledge Distillation (CorrKD) framework for the Multimodal Sentiment Analysis (MSA) task under uncertain missing modalities.
We present a sample-level contrastive distillation mechanism that transfers comprehensive knowledge containing cross-sample correlations to reconstruct missing semantics.
We design a response-disentangled consistency distillation strategy to optimize the sentiment decision boundaries of the student network.
arXiv Detail & Related papers (2024-04-25T09:35:09Z) - Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence [51.54175067684008]
This paper introduces a Transformer-based integrative feature and cost aggregation network designed for dense matching tasks.
We first show that feature aggregation and cost aggregation exhibit distinct characteristics and reveal the potential for substantial benefits stemming from the judicious use of both aggregation processes.
Our framework is evaluated on standard benchmarks for semantic matching, and also applied to geometric matching, where we show that our approach achieves significant improvements compared to existing methods.
arXiv Detail & Related papers (2024-03-17T07:02:55Z) - PosSAM: Panoptic Open-vocabulary Segment Anything [58.72494640363136]
PosSAM is an open-vocabulary panoptic segmentation model that unifies the strengths of the Segment Anything Model (SAM) with the vision-native CLIP model in an end-to-end framework.
We introduce a Mask-Aware Selective Ensembling (MASE) algorithm that adaptively enhances the quality of generated masks and boosts the performance of open-vocabulary classification during inference for each image.
arXiv Detail & Related papers (2024-03-14T17:55:03Z) - Dual-Context Aggregation for Universal Image Matting [16.59886660634162]
We propose a simple and universal matting framework, named Dual-Context Aggregation Matting (DCAM).
Specifically, DCAM first adopts a semantic backbone network to extract low-level features and context features from the input image and guidance.
By performing both global contour segmentation and local boundary refinement, DCAM exhibits robustness to diverse types of guidance and objects.
arXiv Detail & Related papers (2024-02-28T06:56:24Z) - S$^2$ME: Spatial-Spectral Mutual Teaching and Ensemble Learning for Scribble-supervised Polyp Segmentation [21.208071679259604]
We develop a framework of Spatial-Spectral Dual-branch Mutual Teaching and Entropy-guided Pseudo Label Ensemble Learning.
We produce reliable mixed pseudo labels, which enhance the effectiveness of ensemble learning.
Our strategy efficiently mitigates the deleterious effects of uncertainty and noise present in pseudo labels.
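One plausible reading of the entropy-guided ensemble, sketched below in PyTorch: each branch's per-pixel prediction is weighted by the inverse of its entropy before the mixed pseudo label is taken. The exact weighting in the paper may differ, and the spatial/spectral branches themselves are assumed.

```python
# Minimal sketch of entropy-guided pseudo-label ensembling for two branches.
import torch

def entropy(prob, eps=1e-8):
    # prob: (B, C, H, W) softmax outputs; returns per-pixel entropy (B, 1, H, W).
    return -(prob * (prob + eps).log()).sum(dim=1, keepdim=True)

def ensemble_pseudo_labels(prob_a, prob_b):
    # More confident (lower-entropy) predictions contribute more to the mix.
    w_a = 1.0 / (1.0 + entropy(prob_a))
    w_b = 1.0 / (1.0 + entropy(prob_b))
    mixed = (w_a * prob_a + w_b * prob_b) / (w_a + w_b)
    return mixed.argmax(dim=1)             # (B, H, W) hard pseudo labels

pa = torch.softmax(torch.randn(2, 2, 64, 64), dim=1)
pb = torch.softmax(torch.randn(2, 2, 64, 64), dim=1)
print(ensemble_pseudo_labels(pa, pb).shape)  # torch.Size([2, 64, 64])
```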
arXiv Detail & Related papers (2023-06-01T08:47:58Z) - High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation [17.804090651425955]
Image-level weakly-supervised segmentation (WSSS) reduces the usually vast data annotation cost by relying on surrogate segmentation masks during training.
Our work is based on two techniques for improving CAMs: importance sampling, which is a substitute for global average pooling (GAP), and a feature similarity loss.
We reformulate both techniques based on binomial posteriors of multiple independent binary problems.
This has two benefits: their performance is improved and they become more general, resulting in an add-on method that can boost virtually any WSSS method.
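A hedged sketch of the reformulation as independent binary problems: class scores are pooled from the CAM with an importance-style weighting rather than global average pooling, and each class gets a sigmoid (binomial) posterior with a binary cross-entropy loss. The details are illustrative, not the paper's exact estimator.

```python
# Hedged sketch: importance-weighted pooling over CAMs plus per-class binary posteriors.
import torch
import torch.nn.functional as F

def importance_pooled_logits(cams):
    # cams: (B, C, H, W) class activation maps.
    b, c, h, w = cams.shape
    flat = cams.view(b, c, h * w)
    weights = torch.softmax(flat, dim=-1)      # spatial sampling distribution per class
    return (weights * flat).sum(dim=-1)        # expected activation, a smooth max (B, C)

def multilabel_loss(cams, labels):
    # labels: (B, C) multi-hot image-level labels; each class is an independent
    # binary (binomial) problem, hence BCE on sigmoid posteriors.
    logits = importance_pooled_logits(cams)
    return F.binary_cross_entropy_with_logits(logits, labels.float())

cams = torch.randn(2, 20, 32, 32)
labels = torch.randint(0, 2, (2, 20))
print(multilabel_loss(cams, labels).item())
```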
arXiv Detail & Related papers (2023-04-05T17:43:57Z) - Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning [53.68371566336254]
We argue that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment.
Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization.
arXiv Detail & Related papers (2023-03-10T14:38:49Z) - CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but further adaptation of CLIP on downstream tasks undesirably degrades out-of-distribution (OOD) performance.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z) - One-Shot Adaptation of GAN in Just One CLIP [51.188396199083336]
We present a novel single-shot GAN adaptation method through unified CLIP space manipulations.
Specifically, our model employs a two-step training strategy, the first step of which is a reference image search in the source generator using CLIP-guided latent optimization.
We show that our model generates diverse outputs with the target texture and outperforms the baseline models both qualitatively and quantitatively.
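A minimal sketch of the reference-image search step, assuming a CLIP-like embedding function and a generator; the stand-in models below exist only so the loop runs and are not meaningful.

```python
# Hedged sketch: optimise a latent code so the generated image matches the
# reference image in an embedding space (a stand-in for CLIP's image encoder).
import torch
import torch.nn.functional as F

def search_latent(generator, embed, ref_image, latent_dim=512, steps=200, lr=0.05):
    w = torch.zeros(1, latent_dim, requires_grad=True)
    target = embed(ref_image).detach()
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = 1.0 - F.cosine_similarity(embed(generator(w)), target, dim=-1).mean()
        loss.backward()
        opt.step()
    return w.detach()

# Toy stand-ins so the sketch runs end-to-end; they are not real models.
generator = torch.nn.Linear(512, 3 * 64 * 64)   # pretend generator synthesis network
embed = lambda img: img[:, :512]                # pretend CLIP image encoder
ref_image = torch.randn(1, 3 * 64 * 64)
w_star = search_latent(generator, embed, ref_image)
```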
arXiv Detail & Related papers (2022-03-17T13:03:06Z) - Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z) - Consistency Regularization for Deep Face Anti-Spoofing [69.70647782777051]
Face anti-spoofing (FAS) plays a crucial role in securing face recognition systems.
We conjecture that encouraging feature consistency across different views may be a promising way to boost FAS models.
We enhance both Embedding-level and Prediction-level Consistency Regularization (EPCR) in FAS.
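A minimal sketch of the two consistency terms, using common choices (normalized MSE for embeddings, symmetric KL for predictions) that may differ from the paper's exact formulation.

```python
# Sketch of embedding-level and prediction-level consistency between two views.
import torch
import torch.nn.functional as F

def consistency_losses(emb1, emb2, logit1, logit2):
    # Embedding-level: pull the two views' normalised features together.
    emb_loss = F.mse_loss(F.normalize(emb1, dim=-1), F.normalize(emb2, dim=-1))
    # Prediction-level: make the live/spoof predictions agree (symmetric KL).
    p1, p2 = F.log_softmax(logit1, dim=-1), F.log_softmax(logit2, dim=-1)
    pred_loss = 0.5 * (F.kl_div(p1, p2.exp(), reduction="batchmean")
                       + F.kl_div(p2, p1.exp(), reduction="batchmean"))
    return emb_loss + pred_loss

e1, e2 = torch.randn(4, 128), torch.randn(4, 128)
l1, l2 = torch.randn(4, 2), torch.randn(4, 2)
print(consistency_losses(e1, e2, l1, l2).item())
```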
arXiv Detail & Related papers (2021-11-24T08:03:48Z) - SSA: Semantic Structure Aware Inference for Weakly Pixel-Wise Dense
Predictions without Cost [36.27226683586425]
Semantic structure aware inference (SSA) is proposed to exploit the semantic structure information hidden in different stages of a CNN-based network to generate high-quality CAMs at inference time.
The proposed method introduces no extra parameters and requires no training, so it can be applied to a wide range of weakly-supervised pixel-wise dense prediction tasks.
arXiv Detail & Related papers (2021-11-05T11:07:21Z) - Inter-class Discrepancy Alignment for Face Recognition [55.578063356210144]
We propose a unified framework called Inter-class Discrepancy Alignment (IDA).
IDA-DAO is used to align similarity scores by considering the discrepancy between an image and its neighbors.
IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with a GAN.
arXiv Detail & Related papers (2021-03-02T08:20:08Z) - Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks [61.950353376870154]
Joint-event-extraction is a sequence-to-sequence labeling task with a tag set composed of tags of triggers and entities.
We propose a Cross-Supervised Mechanism (CSM) to alternately supervise the extraction of triggers or entities.
Our approach outperforms the state-of-the-art methods in both entity and trigger extraction.
arXiv Detail & Related papers (2020-10-13T11:51:17Z) - An unsupervised deep learning framework via integrated optimization of representation learning and GMM-based modeling [31.334196673143257]
This paper introduces a new principle of joint learning on both deep representations and GMM-based deep modeling.
In comparison with existing work in similar areas, our objective function has two learning targets, which are jointly optimized.
The compactness of clusters is significantly enhanced by reducing the intra-cluster distances, and the separability is improved by increasing the inter-cluster distances.
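A hedged sketch of the compactness/separability idea only: intra-cluster distances are penalized and inter-cluster center distances are pushed above a margin. The GMM-based modeling and the representation-learning part of the joint objective are omitted.

```python
# Illustrative clustering regulariser: compactness via intra-cluster distances,
# separability via a hinge on distances between cluster centers.
import torch

def cluster_regulariser(z, assignments, n_clusters, margin=1.0):
    # z: (N, D) embeddings; assignments: (N,) hard cluster ids (all clusters non-empty).
    centers = torch.stack([z[assignments == k].mean(dim=0) for k in range(n_clusters)])
    intra = ((z - centers[assignments]) ** 2).sum(dim=1).mean()        # compactness
    inter = torch.cdist(centers, centers)                              # (K, K) center distances
    off_diag = inter[~torch.eye(n_clusters, dtype=torch.bool)]
    separability = torch.clamp(margin - off_diag, min=0).mean()        # hinge on separation
    return intra + separability

z = torch.randn(100, 16)
a = torch.randint(0, 4, (100,))
print(cluster_regulariser(z, a, n_clusters=4).item())
```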
arXiv Detail & Related papers (2020-09-11T04:57:03Z) - Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
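For context, a sketch of the plain mean-teacher backbone such a framework builds on: the teacher is an exponential moving average of the student and unlabeled images contribute a consistency loss. The mask guidance and perturbation-sensitive sample mining are not shown.

```python
# Generic mean-teacher sketch: EMA teacher update plus a consistency loss.
import copy
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, momentum=0.99):
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(momentum).add_(s_p, alpha=1 - momentum)

student = torch.nn.Conv2d(3, 2, 3, padding=1)     # toy stand-in for the segmentation net
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

x_unlabeled = torch.randn(2, 3, 64, 64)
noise = 0.1 * torch.randn_like(x_unlabeled)       # perturbation of the student input
consistency = F.mse_loss(student(x_unlabeled + noise).softmax(dim=1),
                         teacher(x_unlabeled).softmax(dim=1))
consistency.backward()
ema_update(teacher, student)
```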
arXiv Detail & Related papers (2020-07-21T13:27:09Z) - Generalized Adversarially Learned Inference [42.40405470084505]
We develop methods of inference of latent variables in GANs by adversarially training an image generator along with an encoder to match two joint distributions of image and latent vector pairs.
We incorporate multiple layers of feedback on reconstructions, self-supervision, and other forms of supervision based on prior or learned knowledge about the desired solutions.
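A sketch of the adversarial joint-matching core (in the style of ALI/BiGAN): a discriminator scores (image, latent) pairs so that (x, E(x)) and (G(z), z) become indistinguishable. The additional reconstruction and self-supervision feedback described above is omitted.

```python
# Hedged sketch: discriminator on (image, latent) pairs; toy MLPs, not the paper's nets.
import torch
import torch.nn as nn

img_dim, z_dim = 784, 64
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, img_dim))   # generator
E = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU(), nn.Linear(256, z_dim))   # encoder
D = nn.Sequential(nn.Linear(img_dim + z_dim, 256), nn.ReLU(), nn.Linear(256, 1))

x = torch.rand(8, img_dim)                      # real images (flattened)
z = torch.randn(8, z_dim)                       # latent samples

real_pair = torch.cat([x, E(x)], dim=1)         # (x, E(x))
fake_pair = torch.cat([G(z), z], dim=1)         # (G(z), z)

bce = nn.BCEWithLogitsLoss()
# Discriminator tries to tell the two joint distributions apart...
d_loss = bce(D(real_pair.detach()), torch.ones(8, 1)) + \
         bce(D(fake_pair.detach()), torch.zeros(8, 1))
# ...while the generator and encoder are trained to fool it.
ge_loss = bce(D(fake_pair), torch.ones(8, 1)) + bce(D(real_pair), torch.zeros(8, 1))
```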
arXiv Detail & Related papers (2020-06-15T02:18:13Z)