Enhance Vision-Language Alignment with Noise
- URL: http://arxiv.org/abs/2412.10817v2
- Date: Tue, 17 Dec 2024 02:35:10 GMT
- Title: Enhance Vision-Language Alignment with Noise
- Authors: Sida Huang, Hongyuan Zhang, Xuelong Li
- Abstract summary: We investigate whether a frozen model can be fine-tuned with customized noise.
We propose the Positive-incentive Noise Injector (PiNI), which can fine-tune CLIP by injecting noise into both the visual and text encoders.
- Score: 59.2608298578913
- License:
- Abstract: With the advancement of pre-trained vision-language (VL) models, enhancing the alignment between visual and linguistic modalities in downstream tasks has emerged as a critical challenge. Different from existing fine-tuning methods that add extra modules to these two modalities, we investigate whether the frozen model can be fine-tuned by customized noise. Our approach is motivated by the scientific study of beneficial noise, namely Positive-incentive Noise (Pi-noise or $\pi$-noise), which quantitatively analyzes the impact of noise. It therefore implies a new scheme for learning a beneficial noise distribution that can be employed to fine-tune VL models. Focusing on few-shot classification tasks based on CLIP, we reformulate the inference process of CLIP and apply variational inference, demonstrating how to generate $\pi$-noise towards the visual and linguistic modalities. Then, we propose the Positive-incentive Noise Injector (PiNI), which can fine-tune CLIP by injecting noise into both the visual and text encoders. Since the proposed method can learn the distribution of beneficial noise, we can obtain more diverse embeddings of vision and language to better align these two modalities for specific downstream tasks with limited computational resources. We evaluate different noise incorporation approaches and network architectures of PiNI. The evaluation across 11 datasets demonstrates its effectiveness.
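The core mechanism, as described in the abstract, is to keep CLIP frozen and learn a noise distribution that is injected into the visual and text embeddings. The sketch below illustrates that idea in PyTorch with a simple per-sample Gaussian noise generator trained via the reparameterization trick; the module design, dimensions, and single-linear-layer generator are illustrative assumptions rather than the architecture proposed in the paper.

```python
# Minimal sketch: frozen CLIP features + learnable noise injectors.
# Only the injectors have trainable parameters; CLIP stays untouched.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NoiseInjector(nn.Module):
    """Predicts a per-sample Gaussian noise distribution and samples from it."""

    def __init__(self, dim: int):
        super().__init__()
        self.mu = nn.Linear(dim, dim)
        self.log_sigma = nn.Linear(dim, dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        mu = self.mu(feats)
        sigma = self.log_sigma(feats).exp()
        eps = torch.randn_like(sigma)          # reparameterization trick
        return feats + mu + sigma * eps        # noise-injected embedding


def clip_logits(image_feats, text_feats, img_noise, txt_noise, scale=100.0):
    """CLIP-style cosine-similarity logits computed on noise-injected features."""
    img = F.normalize(img_noise(image_feats), dim=-1)
    txt = F.normalize(txt_noise(text_feats), dim=-1)
    return scale * img @ txt.t()


if __name__ == "__main__":
    d, n_img, n_cls = 512, 4, 10
    img_noise, txt_noise = NoiseInjector(d), NoiseInjector(d)
    # Stand-ins for frozen CLIP encoder outputs.
    image_feats = torch.randn(n_img, d)
    text_feats = torch.randn(n_cls, d)
    logits = clip_logits(image_feats, text_feats, img_noise, txt_noise)
    print(logits.shape)  # torch.Size([4, 10])
```

In a few-shot setting, only the injector parameters would be updated with the classification loss, which is what keeps this style of fine-tuning lightweight.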
Related papers
- Data Augmentation of Contrastive Learning is Estimating Positive-incentive Noise [54.24688963649581]
We scientifically investigate the connection between contrastive learning and $\pi$-noise.
Inspired by the idea of Positive-incentive Noise (Pi-Noise or $\pi$-Noise), which aims to learn reliable noise that is beneficial to tasks, we develop a $\pi$-noise generator.
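To make the connection concrete, here is a minimal sketch of using a learned noise generator as the "augmentation" inside a standard InfoNCE contrastive loss. The generator design and loss details are assumptions for illustration, not the paper's $\pi$-noise estimator.

```python
# Sketch: a learned noise generator produces two noisy "views" of each
# embedding, and InfoNCE pulls matching views together.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PiNoiseGenerator(nn.Module):
    """Adds Gaussian noise whose per-dimension scale is predicted from the embedding."""

    def __init__(self, dim: int):
        super().__init__()
        self.log_sigma = nn.Linear(dim, dim)

    def forward(self, z):
        return z + self.log_sigma(z).exp() * torch.randn_like(z)


def info_nce(z1, z2, temperature=0.1):
    """Standard InfoNCE between two batches of paired views."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    torch.manual_seed(0)
    gen = PiNoiseGenerator(128)
    z = torch.randn(64, 128)                  # stand-in encoder embeddings
    loss = info_nce(gen(z), gen(z))           # two noisy views of the same batch
    loss.backward()
    print(float(loss))
```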
arXiv Detail & Related papers (2024-08-19T12:07:42Z) - Understanding the Effect of Noise in LLM Training Data with Algorithmic Chains of Thought [0.0]
We study how noise in chains of thought impacts task performance in a highly controlled setting.
We define two types of noise: static noise, a local form of noise which is applied after the CoT trace is computed, and dynamic noise, a global form of noise which propagates errors in the trace as it is computed.
We find that fine-tuned models are extremely robust to high levels of static noise but struggle significantly more with lower levels of dynamic noise.
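A toy illustration of the two noise types on an arithmetic chain of thought (a running sum) is sketched below; the example is constructed here for clarity and is not the paper's evaluation setup.

```python
# Toy example: static noise perturbs recorded steps after the fact;
# dynamic noise perturbs a step while computing, so the error propagates.
import random


def clean_trace(xs):
    """Ground-truth chain of thought: the running sums."""
    trace, total = [], 0
    for x in xs:
        total += x
        trace.append(total)
    return trace


def static_noise(trace, p=0.3):
    """Local noise applied after the trace is computed; later steps are untouched."""
    return [t + random.choice([-1, 1]) if random.random() < p else t for t in trace]


def dynamic_noise(xs, p=0.3):
    """Global noise injected during computation; errors carry into all later steps."""
    trace, total = [], 0
    for x in xs:
        total += x
        if random.random() < p:
            total += random.choice([-1, 1])
        trace.append(total)
    return trace


if __name__ == "__main__":
    random.seed(0)
    xs = [3, 1, 4, 1, 5]
    print("clean  :", clean_trace(xs))
    print("static :", static_noise(clean_trace(xs)))
    print("dynamic:", dynamic_noise(xs))
```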
arXiv Detail & Related papers (2024-02-06T13:59:56Z) - Large Language Models are Efficient Learners of Noise-Robust Speech Recognition [65.95847272465124]
Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR).
In this work, we extend the benchmark to noisy conditions and investigate if we can teach LLMs to perform denoising for GER.
Experiments on various recent LLMs demonstrate that our approach achieves a new breakthrough, with up to a 53.9% correction improvement in terms of word error rate.
arXiv Detail & Related papers (2024-01-19T01:29:27Z) - NoisyNN: Exploring the Impact of Information Entropy Change in Learning Systems [20.2575859510473]
We show that specific noise can boost the performance of various deep models under certain conditions.
We categorize the noise into two types, positive noise (PN) and harmful noise (HN), based on whether the noise can help reduce the task complexity.
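One crude way to operationalize that categorization, purely for illustration, is to compare the entropy of a model's predictive distribution with and without the injected noise; treating an entropy drop as positive noise is a simplification of the paper's entropy-based notion of task complexity, not its exact definition.

```python
# Sketch: label noise "positive" if it lowers predictive entropy, else "harmful".
import torch
import torch.nn.functional as F


def predictive_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Mean Shannon entropy of the softmax predictions (in nats)."""
    p = F.softmax(logits, dim=-1)
    return -(p * p.clamp_min(1e-12).log()).sum(-1).mean()


def categorize_noise(model, x, noise):
    """Compare entropy of predictions on clean vs. noise-perturbed inputs."""
    with torch.no_grad():
        h_clean = predictive_entropy(model(x))
        h_noisy = predictive_entropy(model(x + noise))
    return "positive" if h_noisy < h_clean else "harmful"


if __name__ == "__main__":
    torch.manual_seed(0)
    model = torch.nn.Linear(16, 5)          # stand-in classifier
    x = torch.randn(32, 16)
    noise = 0.1 * torch.randn_like(x)
    print(categorize_noise(model, x, noise))
```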
arXiv Detail & Related papers (2023-09-19T14:04:04Z) - NLIP: Noise-robust Language-Image Pre-training [95.13287735264937]
We propose a principled Noise-robust Language-Image Pre-training framework (NLIP) to stabilize pre-training via two schemes: noise-harmonization and noise-completion.
Our NLIP can alleviate the common noise effects during image-text pre-training in a more efficient way.
arXiv Detail & Related papers (2022-12-14T08:19:30Z) - Inference and Denoise: Causal Inference-based Neural Speech Enhancement [83.4641575757706]
This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention.
The proposed causal inference-based speech enhancement (CISE) separates clean and noisy frames in intervened noisy speech using a noise detector, and assigns the two sets of frames to two mask-based enhancement modules (EMs) to perform noise-conditional SE.
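The routing idea can be sketched as follows: a frame-level detector splits a spectrogram into "clean" and "noisy" frames, and each subset is enhanced by its own mask-based module. Shapes, module sizes, and the (untrained) detector are illustrative placeholders, not the paper's configuration.

```python
# Sketch: per-frame noise detection followed by two noise-conditional mask modules.
import torch
import torch.nn as nn


class MaskEM(nn.Module):
    """Mask-based enhancement module: predicts a multiplicative mask per frame."""

    def __init__(self, n_freq):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_freq, n_freq), nn.Sigmoid())

    def forward(self, frames):
        return frames * self.net(frames)


class CISELike(nn.Module):
    def __init__(self, n_freq=257):
        super().__init__()
        self.detector = nn.Linear(n_freq, 1)              # noise-presence logit
        self.em_clean, self.em_noisy = MaskEM(n_freq), MaskEM(n_freq)

    def forward(self, spec):                              # spec: (frames, n_freq)
        noisy = self.detector(spec).squeeze(-1) > 0       # per-frame decision
        out = torch.empty_like(spec)
        out[~noisy] = self.em_clean(spec[~noisy])         # clean-conditioned EM
        out[noisy] = self.em_noisy(spec[noisy])           # noise-conditioned EM
        return out


if __name__ == "__main__":
    model = CISELike()
    enhanced = model(torch.rand(100, 257))
    print(enhanced.shape)  # torch.Size([100, 257])
```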
arXiv Detail & Related papers (2022-11-02T15:03:50Z) - Robust Time Series Denoising with Learnable Wavelet Packet Transform [1.370633147306388]
In many applications, signal denoising is often the first pre-processing step before any subsequent analysis or learning task.
We propose a deep-learning denoising model inspired by signal processing: a learnable version of the wavelet packet transform.
We demonstrate how the proposed algorithm relates to the universality of signal processing methods and the learning capabilities of deep learning approaches.
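A minimal one-level version of this idea is sketched below: wavelet-style analysis and synthesis filters implemented as strided convolutions, initialized to the Haar filters but left learnable, plus a learnable soft-threshold on the detail band. The paper's learnable wavelet packet transform is deeper and more structured; this is only a simplified stand-in.

```python
# Sketch: learnable Haar-initialized analysis/synthesis filters with a
# learnable soft-threshold applied to the detail coefficients.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnableHaarDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        h = 2 ** -0.5
        # Low-pass and high-pass Haar filters, stored as learnable weights.
        self.analysis = nn.Parameter(torch.tensor([[[h, h]], [[h, -h]]]))
        self.threshold = nn.Parameter(torch.tensor(0.1))   # learnable soft-threshold

    def forward(self, x):                                   # x: (batch, 1, length)
        coeffs = F.conv1d(x, self.analysis, stride=2)       # (batch, 2, length // 2)
        approx, detail = coeffs[:, :1], coeffs[:, 1:]
        # Soft-threshold the detail band, where most of the noise energy lives.
        detail = torch.sign(detail) * F.relu(detail.abs() - self.threshold)
        coeffs = torch.cat([approx, detail], dim=1)
        # Transposed convolution with the same filters inverts the Haar transform
        # exactly at initialization and stays learnable thereafter.
        return F.conv_transpose1d(coeffs, self.analysis, stride=2)


if __name__ == "__main__":
    t = torch.linspace(0, 1, 256)
    clean = torch.sin(2 * torch.pi * 8 * t).reshape(1, 1, -1)
    noisy = clean + 0.2 * torch.randn_like(clean)
    model = LearnableHaarDenoiser()
    print(model(noisy).shape)  # torch.Size([1, 1, 256])
```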
arXiv Detail & Related papers (2022-06-13T13:05:58Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
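A minimal sketch of the regularizer described above: inject standard Gaussian noise into a hidden representation and penalize the divergence between the clean and noise-perturbed outputs. The layer choice, noise scale, penalty form, and weighting are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch: task loss plus a noise-stability penalty on the output distribution.
import torch
import torch.nn as nn
import torch.nn.functional as F


def lnsr_loss(encoder, head, x, labels, sigma=0.1, lam=1.0):
    """Cross-entropy task loss + stability penalty under hidden-state noise."""
    hidden = encoder(x)
    logits = head(hidden)
    noisy_logits = head(hidden + sigma * torch.randn_like(hidden))  # inject noise
    task = F.cross_entropy(logits, labels)
    stability = F.mse_loss(noisy_logits, logits)   # keep predictions stable
    return task + lam * stability


if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = nn.Sequential(nn.Linear(32, 64), nn.Tanh())  # stand-in for a PLM layer
    head = nn.Linear(64, 3)
    x, y = torch.randn(16, 32), torch.randint(0, 3, (16,))
    loss = lnsr_loss(encoder, head, x, y)
    loss.backward()
    print(float(loss))
```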
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - CDLNet: Noise-Adaptive Convolutional Dictionary Learning Network for Blind Denoising and Demosaicing [4.975707665155918]
Unrolled optimization networks present an interpretable alternative to constructing deep neural networks.
We propose an unrolled convolutional dictionary learning network (CDLNet) and demonstrate its competitive denoising and joint denoising and demosaicing (JDD) performance.
Specifically, we show that the proposed model outperforms state-of-the-art fully convolutional denoising and JDD models when scaled to a similar parameter count.
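The unrolled construction can be sketched as a few ISTA iterations with convolutional analysis/synthesis operators and learnable soft-thresholds, as below; filter sizes, iteration count, and parameter tying are illustrative choices rather than CDLNet's actual design.

```python
# Sketch: a small unrolled convolutional-dictionary denoiser (ISTA-style).
import torch
import torch.nn as nn
import torch.nn.functional as F


class UnrolledConvISTA(nn.Module):
    def __init__(self, channels=32, kernel=7, iters=5):
        super().__init__()
        pad = kernel // 2
        self.analysis = nn.Conv2d(1, channels, kernel, padding=pad, bias=False)
        self.synthesis = nn.Conv2d(channels, 1, kernel, padding=pad, bias=False)
        self.thresholds = nn.Parameter(torch.full((iters,), 0.05))  # per-iteration
        self.iters = iters

    @staticmethod
    def soft(z, t):
        """Soft-thresholding, the proximal operator of the L1 sparsity penalty."""
        return torch.sign(z) * F.relu(z.abs() - t)

    def forward(self, y):                           # y: noisy image (B, 1, H, W)
        z = self.soft(self.analysis(y), self.thresholds[0])
        for k in range(1, self.iters):
            residual = y - self.synthesis(z)        # data-fidelity residual
            z = self.soft(z + self.analysis(residual), self.thresholds[k])
        return self.synthesis(z)                    # reconstructed (denoised) image


if __name__ == "__main__":
    model = UnrolledConvISTA()
    out = model(torch.rand(1, 1, 64, 64))
    print(out.shape)  # torch.Size([1, 1, 64, 64])
```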
arXiv Detail & Related papers (2021-12-02T01:23:21Z)