Let Segment Anything Help Image Dehaze
- URL: http://arxiv.org/abs/2306.15870v1
- Date: Wed, 28 Jun 2023 02:02:19 GMT
- Title: Let Segment Anything Help Image Dehaze
- Authors: Zheyan Jin, Shiqi Chen, Yueting Chen, Zhihai Xu, Huajun Feng
- Abstract summary: We propose a framework to integrate large-model prior into low-level computer vision tasks.
We demonstrate the effectiveness and applicability of large models in guiding low-level visual tasks.
- Score: 12.163299570927302
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models and high-level vision models have achieved
impressive performance improvements with large datasets and model sizes.
However, low-level computer vision tasks, such as image dehazing and blur
removal, still rely on a small number of datasets and small models, which
generally leads to overfitting and local optima. We therefore propose a
framework that integrates large-model priors into low-level computer vision
tasks. Just as image segmentation is texture-related, so is the degradation
caused by haze. We introduce gray-scale coding, network channel expansion, and
a pre-dehaze structure to inject large-model prior knowledge into any
low-level dehazing network. Comparison experiments across different datasets
and algorithms demonstrate the effectiveness and applicability of large models
in guiding low-level vision tasks. Ablation experiments further isolate the
contributions of gray-scale coding, network channel expansion, and the
recurrent network structure. Without requiring additional data or training
resources, we show that integrating large-model prior knowledge improves
dehazing performance and shortens training time for low-level vision tasks.
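As a rough illustration of gray-scale coding and channel expansion, here is a
minimal sketch, assuming PyTorch, a stand-in backbone, and a 4-channel input
layout; it is not the authors' implementation, and the pre-dehaze (recurrent)
stage is omitted:

```python
# Hypothetical sketch of SAM-guided dehazing: gray-scale code the SAM masks,
# then widen the network input channels to carry the extra guidance map.
# Illustrates the paper's idea; it is not the authors' released code.
import torch
import torch.nn as nn

def grayscale_code(sam_masks: torch.Tensor) -> torch.Tensor:
    """Collapse N binary SAM masks (N, H, W) into one gray-scale map (1, H, W),
    giving each segment a distinct intensity; later masks overwrite overlaps."""
    n, h, w = sam_masks.shape
    coded = torch.zeros(1, h, w)
    for i, mask in enumerate(sam_masks):
        level = (i + 1) / n  # intensity in (0, 1], one level per segment
        coded = torch.where(mask.bool().unsqueeze(0),
                            torch.full_like(coded, level), coded)
    return coded

class ChannelExpandedDehazeNet(nn.Module):
    """Wrap any dehazing backbone so its input carries RGB plus the SAM code.
    The 4-to-3 fusion conv is an assumed way of widening the input channels."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.fuse = nn.Conv2d(4, 3, kernel_size=3, padding=1)
        self.backbone = backbone

    def forward(self, hazy_rgb: torch.Tensor, coded: torch.Tensor) -> torch.Tensor:
        # hazy_rgb: (B, 3, H, W); coded: (B, 1, H, W) from grayscale_code.
        x = torch.cat([hazy_rgb, coded], dim=1)  # (B, 4, H, W)
        return self.backbone(self.fuse(x))

# Toy usage with a stand-in backbone and random placeholder masks:
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))
net = ChannelExpandedDehazeNet(backbone)
hazy = torch.rand(1, 3, 64, 64)
masks = (torch.rand(5, 64, 64) > 0.5).float()  # placeholder for SAM output
dehazed = net(hazy, grayscale_code(masks).unsqueeze(0))  # (1, 3, 64, 64)
```

In this sketch the SAM masks are flattened into a single intensity map so that
an existing 3-channel backbone only needs its input widened by one channel; a
recurrent pre-dehaze variant would additionally feed the backbone's output
back in as the next pass's input.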
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses limited device resources by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning [33.89483627891117]
Recent advancements in language and vision assistants have showcased impressive capabilities but suffer from a lack of transparency.
Open-source models handle general image tasks effectively, but face challenges with the high computational demands of complex visually-situated text understanding.
This study aims to redefine the design of vision-language models by identifying key components and creating efficient models with constrained inference costs.
arXiv Detail & Related papers (2024-06-17T17:57:30Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- Bilevel Fast Scene Adaptation for Low-Light Image Enhancement [50.639332885989255]
Enhancing images captured in low-light scenes is a challenging but widely studied task in computer vision.
The main obstacle lies in modeling the distribution discrepancy across different scenes.
We introduce a bilevel paradigm to model the latent correspondence across scenes.
A bilevel learning framework is constructed to endow the encoder with scene-irrelevant generality across diverse scenes.
arXiv Detail & Related papers (2023-06-02T08:16:21Z)
- GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks? [51.22096780511165]
We present a new learning paradigm in which the knowledge extracted from large pre-trained models is used to help models such as CNNs and ViTs learn enhanced representations.
We feed detailed descriptions into a pre-trained encoder to extract text embeddings with rich semantic information that encodes the content of images.
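A minimal sketch of that idea, assuming a CLIP text encoder from Hugging Face
transformers as a stand-in (the summary does not name the encoder GPT4Image
actually uses):

```python
# Hypothetical sketch: turn an image description into a text embedding that
# could condition or supervise a CNN/ViT; GPT4Image's actual encoder may differ.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

tokens = tokenizer(["a dense fog covering a city street at dawn"],
                   padding=True, return_tensors="pt")
with torch.no_grad():
    text_emb = text_encoder(**tokens).pooler_output  # (1, 512) semantic embedding
# text_emb can then act as an auxiliary target for the vision model's features.
```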
arXiv Detail & Related papers (2023-06-01T14:02:45Z)
- Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model [97.9548609175831]
We resort to plain vision transformers with about 100 million parameters and make the first attempt to propose large vision models customized for remote sensing tasks.
Specifically, to handle the large image size and objects of various orientations in RS images, we propose a new rotated varied-size window attention.
Experiments on detection tasks demonstrate the superiority of our model over all state-of-the-art models, achieving 81.16% mAP on the DOTA-V1.0 dataset.
arXiv Detail & Related papers (2022-08-08T09:08:40Z)
- Hybrid BYOL-ViT: Efficient approach to deal with small Datasets [0.0]
In this paper, we investigate how self-supervision with strong and sufficient augmentation of unlabeled data can effectively train the first layers of a neural network.
We show that the low-level features derived from a self-supervised architecture can improve the robustness and the overall performance of this emergent architecture.
arXiv Detail & Related papers (2021-11-08T21:44:31Z)
- Image Augmentation for Multitask Few-Shot Learning: Agricultural Domain Use-Case [0.0]
This paper addresses small and imbalanced datasets, taking the plant phenomics domain as an example.
We introduce an image augmentation framework, which enables us to greatly enlarge the number of training samples.
We prove that our augmentation method increases model performance when only a few training samples are available.
arXiv Detail & Related papers (2021-02-24T14:08:34Z)
- Multi-task pre-training of deep neural networks for digital pathology [8.74883469030132]
We first assemble and transform many digital pathology datasets into a pool of 22 classification tasks and almost 900k images.
We show that our models used as feature extractors either improve significantly over ImageNet pre-trained models or provide comparable performance.
arXiv Detail & Related papers (2020-05-05T08:50:17Z)