Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features
- URL: http://arxiv.org/abs/2410.03558v3
- Date: Fri, 18 Oct 2024 06:19:45 GMT
- Title: Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features
- Authors: Benyuan Meng, Qianqian Xu, Zitai Wang, Xiaochun Cao, Qingming Huang
- Abstract summary: Diffusion models were originally designed for image generation.
Recent research shows that the internal signals within their backbones, known as activations, can also serve as dense features for various discriminative tasks.
- Score: 115.33889811527533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models were originally designed for image generation. Recent research shows that the internal signals within their backbones, known as activations, can also serve as dense features for various discriminative tasks such as semantic segmentation. Given numerous activations, selecting a small yet effective subset poses a fundamental problem. To this end, an early study in this field performed a large-scale quantitative comparison of the discriminative ability of the activations. However, we find that many potential activations have not been evaluated, such as the queries and keys used to compute attention scores. Moreover, recent advances in diffusion architectures bring many new activations, such as those within embedded ViT modules. Taken together, activation selection remains an unresolved yet overlooked problem. To tackle this issue, this paper takes a further step by evaluating a much broader range of activations. Given the significant increase in the number of activations, a full-scale quantitative comparison is no longer tractable. Instead, we seek to understand the properties of these activations, so that clearly inferior ones can be filtered out in advance via simple qualitative evaluation. After careful analysis, we discover three properties universal among diffusion models, enabling this study to go beyond specific models. On top of this, we present effective feature-selection solutions for several popular diffusion models. Finally, experiments across multiple discriminative tasks validate the superiority of our method over SOTA competitors. Our code is available at https://github.com/Darkbblue/generic-diffusion-feature.
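As a rough illustration of the setting the abstract describes, the sketch below harvests a few candidate activations (including the attention queries and keys the paper highlights as previously unevaluated) from a diffusion U-Net via PyTorch forward hooks. The checkpoint name and hook sites are illustrative assumptions; the paper's actual extraction and selection pipeline is in the linked repository.

```python
# Hedged sketch: collect intermediate diffusion activations with forward
# hooks. Checkpoint and module choices are illustrative, not the paper's.
import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)
unet.eval()

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        activations[name] = out.detach()
    return hook

# Hook a mix of sites: self-attention query/key projections and the mid block.
handles = [
    module.register_forward_hook(make_hook(name))
    for name, module in unet.named_modules()
    if name.endswith(("attn1.to_q", "attn1.to_k")) or name == "mid_block"
]

# A single denoising forward pass at one timestep yields dense features.
latents = torch.randn(1, 4, 64, 64)
text_emb = torch.randn(1, 77, 768)  # placeholder text conditioning
with torch.no_grad():
    unet(latents, torch.tensor([50]), encoder_hidden_states=text_emb)

for h in handles:
    h.remove()
print({k: tuple(v.shape) for k, v in activations.items()})
```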
Related papers
- Sparsing Law: Towards Large Language Models with Greater Activation Sparsity [62.09617609556697]
Activation sparsity refers to the presence of many weakly contributing elements in activation outputs that can be eliminated.
We propose PPL-$p\%$ sparsity, a precise and performance-aware activation sparsity metric.
We show that ReLU is more efficient as the activation function than SiLU and can leverage more training data to improve activation sparsity.
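A minimal sketch of the raw quantity under discussion: activation sparsity measured as the fraction of (near-)zero entries, contrasted between ReLU and SiLU. The PPL-$p\%$ metric itself, which ties a sparsity level to a bounded perplexity degradation, is not reproduced here.

```python
# Hedged sketch: activation sparsity as the fraction of entries whose
# magnitude falls below a small threshold eps.
import torch
import torch.nn.functional as F

def sparsity_ratio(x: torch.Tensor, eps: float = 1e-3) -> float:
    return (x.abs() < eps).float().mean().item()

hidden = torch.randn(32, 4096)  # a batch of pre-activation values
print("ReLU sparsity:", sparsity_ratio(F.relu(hidden)))  # ~0.5: exact zeros
print("SiLU sparsity:", sparsity_ratio(F.silu(hidden)))  # eps-dependent
# ReLU zeroes all negatives exactly; SiLU only pushes them toward zero,
# which is one intuition for why ReLU yields greater measurable sparsity.
```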
arXiv Detail & Related papers (2024-11-04T17:59:04Z)
- Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation [37.79819260918366]
Continual Test-Time Adaptation (CTTA) aims to adapt the pre-trained model to ever-evolving target domains.
We explore the integration of a Mixture-of-Activation-Sparsity-Experts (MoASE) as an adapter for the CTTA task.
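The summary does not spell out the decomposition, so the sketch below is only a guess at the idea: split an activation by magnitude and route each part through its own small expert. The threshold rule and expert design are assumptions, not the paper's exact modules.

```python
# Hedged sketch (assumed mechanism): magnitude-based decomposition of
# activations with one expert per part.
import torch
import torch.nn as nn

class ActivationSparsityExperts(nn.Module):
    def __init__(self, dim: int, tau: float = 0.5):
        super().__init__()
        self.tau = tau
        self.high_expert = nn.Linear(dim, dim)  # salient activations
        self.low_expert = nn.Linear(dim, dim)   # near-zero activations

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = (x.abs() > self.tau).float()
        return self.high_expert(x * mask) + self.low_expert(x * (1 - mask))

adapter = ActivationSparsityExperts(dim=768)
print(adapter(torch.randn(4, 768)).shape)  # torch.Size([4, 768])
```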
arXiv Detail & Related papers (2024-05-26T08:51:39Z)
- Massive Activations in Large Language Models [77.51561903918535]
We show the widespread existence of massive activations across various Large Language Models (LLMs).
Massive activations concentrate attention probabilities on their corresponding tokens and act as implicit bias terms in the self-attention output.
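A minimal sketch of how such outliers could be flagged, assuming "massive" means orders of magnitude above the typical activation scale (the 1000x factor here is illustrative, not the paper's definition):

```python
# Hedged sketch: flag activations whose magnitude dwarfs the median scale.
import torch

def find_massive(hidden: torch.Tensor, factor: float = 1000.0) -> torch.Tensor:
    scale = hidden.abs().median()            # typical activation magnitude
    return (hidden.abs() > factor * scale).nonzero()

h = torch.randn(4, 16, 4096)                 # (batch, tokens, dim)
h[0, 0, 123] = 5000.0                        # plant one outlier for the demo
print(find_massive(h))                       # tensor([[  0,   0, 123]])
```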
arXiv Detail & Related papers (2024-02-27T18:55:17Z)
- Enhancing Neural Subset Selection: Integrating Background Information into Set Representations [53.15923939406772]
We show that when the target value is conditioned on both the input set and subset, it is essential to incorporate an invariant sufficient statistic of the superset into the subset of interest.
This ensures that the output value remains invariant to permutations of the subset and its corresponding superset, enabling identification of the specific superset from which the subset originated.
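A hedged sketch of the stated principle, assuming a DeepSets-style sum-pooled encoding serves as the permutation-invariant statistic of the superset:

```python
# Hedged sketch: condition a subset representation on an order-invariant
# summary of its superset. Layer sizes are illustrative.
import torch
import torch.nn as nn

class SubsetScorer(nn.Module):
    def __init__(self, in_dim: int = 8, dim: int = 32):
        super().__init__()
        self.phi = nn.Linear(in_dim, dim)   # per-element encoder
        self.rho = nn.Linear(2 * dim, 1)    # scores subset given superset

    def forward(self, superset: torch.Tensor, subset: torch.Tensor):
        # Sum pooling makes both summaries invariant to element order.
        sup_stat = self.phi(superset).sum(dim=0)
        sub_stat = self.phi(subset).sum(dim=0)
        return self.rho(torch.cat([sub_stat, sup_stat]))

scorer = SubsetScorer()
superset = torch.randn(20, 8)
print(scorer(superset, superset[:5]).shape)  # torch.Size([1])
```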
arXiv Detail & Related papers (2024-02-05T16:09:35Z)
- Your Diffusion Model is Secretly a Zero-Shot Classifier [90.40799216880342]
We show that density estimates from large-scale text-to-image diffusion models can be leveraged to perform zero-shot classification.
Our generative approach to classification attains strong results on a variety of benchmarks.
Our results are a step toward using generative over discriminative models for downstream tasks.
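A minimal sketch of this classification rule, assuming a standard latent diffusion stack: score each candidate label by the model's noise-prediction error under that label's text conditioning and pick the lowest. Here `unet`, `scheduler`, and the per-class embeddings are stand-ins for a real text-to-image pipeline.

```python
# Hedged sketch: zero-shot classification by comparing denoising errors.
import torch

@torch.no_grad()
def diffusion_classify(latent, class_embs, unet, scheduler, n_trials=8):
    scores = []
    for emb in class_embs:                   # one text embedding per class
        err = 0.0
        for _ in range(n_trials):            # Monte Carlo over (t, noise)
            t = torch.randint(0, scheduler.config.num_train_timesteps, (1,))
            noise = torch.randn_like(latent)
            noised = scheduler.add_noise(latent, noise, t)
            pred = unet(noised, t, encoder_hidden_states=emb).sample
            err += torch.mean((pred - noise) ** 2).item()
        scores.append(err / n_trials)
    return int(torch.tensor(scores).argmin())  # lowest expected error wins
```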
arXiv Detail & Related papers (2023-03-28T17:59:56Z)
- VRA: Variational Rectified Activation for Out-of-distribution Detection [45.804178022641764]
Out-of-distribution (OOD) detection is critical to building reliable machine learning systems in the open world.
ReAct, a typical and effective technique for dealing with model overconfidence, truncates high activations to widen the gap between in-distribution and OOD inputs.
We propose a novel technique called "Variational Rectified Activation" (VRA), which simulates such suppression and amplification operations using piecewise functions.
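A hedged sketch of such a piecewise rectification, with illustrative thresholds: suppress very low activations, amplify the mid-range by a constant, and truncate very high ones as ReAct does.

```python
# Hedged sketch: piecewise suppression / amplification / truncation.
# Thresholds alpha, beta and shift gamma are illustrative, not tuned values.
import torch

def vra(z: torch.Tensor, alpha=0.1, beta=1.0, gamma=0.2) -> torch.Tensor:
    out = torch.zeros_like(z)                # z < alpha: suppressed to 0
    mid = (z >= alpha) & (z <= beta)
    out[mid] = z[mid] + gamma                # mid-range: amplified by gamma
    out[z > beta] = beta                     # z > beta: truncated (ReAct-like)
    return out

print(vra(torch.tensor([0.05, 0.5, 2.0])))   # tensor([0.0000, 0.7000, 1.0000])
```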
arXiv Detail & Related papers (2023-02-23T00:45:14Z)
- Stochastic Adaptive Activation Function [1.9199289015460212]
This study proposes a simple yet effective activation function that facilitates different thresholds and adaptive activations according to the positions of units and the contexts of inputs.
Experimental analysis demonstrates that our activation function can provide the benefits of more accurate prediction and earlier convergence in many deep learning applications.
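The summary leaves the mechanism open; one hedged reading is an activation with per-unit learnable thresholds that are perturbed stochastically during training, sketched below as a guess rather than the paper's formulation.

```python
# Hedged sketch (assumed mechanism): per-unit thresholds with training noise.
import torch
import torch.nn as nn

class StochasticThresholdReLU(nn.Module):
    def __init__(self, num_units: int, noise_std: float = 0.1):
        super().__init__()
        self.threshold = nn.Parameter(torch.zeros(num_units))  # per-unit
        self.noise_std = noise_std

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = self.threshold
        if self.training:  # stochastic thresholds only during training
            t = t + self.noise_std * torch.randn_like(t)
        return torch.relu(x - t)

act = StochasticThresholdReLU(16)
print(act(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```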
arXiv Detail & Related papers (2022-10-21T01:57:25Z)
- Less is More: Feature Selection for Adversarial Robustness with Compressive Counter-Adversarial Attacks [7.5320132424481505]
We propose a novel approach to identifying important features by employing counter-adversarial attacks.
We show that there exists a subset of features such that classification based on them bridges the gap between clean and robust accuracy.
We then select features by observing the consistency of the activation values at the penultimate layer.
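A minimal sketch of the selection rule as summarized, assuming consistency is measured as a small mean absolute gap between clean and adversarial penultimate activations:

```python
# Hedged sketch: keep the feature dimensions that drift least under attack.
import torch

def select_consistent_features(clean_feats, adv_feats, keep_ratio=0.5):
    """clean_feats, adv_feats: (N, D) penultimate-layer activations."""
    gap = (clean_feats - adv_feats).abs().mean(dim=0)   # per-dimension drift
    k = int(keep_ratio * gap.numel())
    return gap.argsort()[:k]                            # most stable dims

clean = torch.randn(128, 512)
adv = clean + 0.3 * torch.randn(128, 512)               # toy perturbed feats
print(select_consistent_features(clean, adv).shape)     # torch.Size([256])
```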
arXiv Detail & Related papers (2021-06-18T17:39:05Z)
- Adversarial Feature Hallucination Networks for Few-Shot Learning [84.31660118264514]
Adversarial Feature Hallucination Networks (AFHN) is based on a conditional Wasserstein Generative Adversarial Network (cWGAN).
Two novel regularizers are incorporated into AFHN to encourage discriminability and diversity of the synthesized features.
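A hedged sketch of the generator half of such a setup: hallucinate a synthetic feature from a class-conditioning feature plus noise, as in a cWGAN generator. The critic and AFHN's two regularizers are omitted, and layer sizes are illustrative.

```python
# Hedged sketch: conditional feature hallucination (generator only).
import torch
import torch.nn as nn

class FeatureHallucinator(nn.Module):
    def __init__(self, feat_dim: int = 512, noise_dim: int = 64):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(feat_dim + noise_dim, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, feat_dim),
        )

    def forward(self, cond_feat: torch.Tensor) -> torch.Tensor:
        z = torch.randn(cond_feat.size(0), self.noise_dim,
                        device=cond_feat.device)
        return self.net(torch.cat([cond_feat, z], dim=1))

gen = FeatureHallucinator()
protos = torch.randn(5, 512)    # few-shot class prototype features
print(gen(protos).shape)        # torch.Size([5, 512]) hallucinated features
```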
arXiv Detail & Related papers (2020-03-30T02:43:16Z)