Latent Space Energy-based Model for Fine-grained Open Set Recognition
- URL: http://arxiv.org/abs/2309.10711v2
- Date: Mon, 30 Oct 2023 02:37:42 GMT
- Title: Latent Space Energy-based Model for Fine-grained Open Set Recognition
- Authors: Wentao Bao, Qi Yu, Yu Kong
- Abstract summary: Fine-grained open-set recognition (FineOSR) aims to recognize images belonging to classes with subtle appearance differences while rejecting images of unknown classes.
As a type of generative model, energy-based models (EBMs) hold potential for hybrid modeling of generative and discriminative tasks.
In this paper, we explore the low-dimensional latent space with energy-based prior distribution for OSR in a fine-grained visual world.
- Score: 46.0388856095674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-grained open-set recognition (FineOSR) aims to recognize images
belonging to classes with subtle appearance differences while rejecting images
of unknown classes. A recent trend in OSR shows the benefit of generative
models for discriminative unknown detection. As a type of generative model,
energy-based models (EBMs) hold potential for hybrid modeling of generative
and discriminative tasks. However, most existing EBMs struggle with density
estimation in high-dimensional space, which is critical for recognizing images
from fine-grained classes. In this paper, we explore the low-dimensional latent
space with energy-based prior distribution for OSR in a fine-grained visual
world. Specifically, based on the latent space EBM, we propose an
attribute-aware information bottleneck (AIB), a residual attribute feature
aggregation (RAFA) module, and an uncertainty-based virtual outlier synthesis
(UVOS) module to improve the expressivity, granularity, and density of the
samples in fine-grained classes, respectively. Our method can flexibly take
advantage of recent vision transformers for powerful visual classification and
generation. The method is validated on both fine-grained and general visual
classification datasets, while preserving the capability of generating
high-resolution, photo-realistic fake images.
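Latent-space EBM priors of the kind described above are typically sampled with short-run Langevin dynamics: starting from noise, the latent code is repeatedly nudged down the energy gradient with injected Gaussian noise. The sketch below is purely illustrative and is not the paper's implementation; the toy energy function, its dimensions, and all weights are invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 4, 8                        # hypothetical latent and hidden dims
W = rng.normal(0, 0.5, (h, d))     # stand-in for learned correction-net weights
b = rng.normal(0, 0.1, h)
c = rng.normal(0, 0.5, h)

def energy(z):
    """Toy energy: Gaussian base 0.5||z||^2 plus a small learned correction."""
    return 0.5 * z @ z - c @ np.tanh(W @ z + b)

def grad_energy(z):
    """Analytic gradient of the toy energy above."""
    t = np.tanh(W @ z + b)
    return z - W.T @ (c * (1.0 - t * t))

def langevin_sample(steps=100, step_size=0.05):
    """Short-run Langevin dynamics: z <- z - (s/2) grad E(z) + sqrt(s) * noise."""
    z = rng.normal(size=d)
    for _ in range(steps):
        noise = rng.normal(size=d)
        z = z - 0.5 * step_size * grad_energy(z) + np.sqrt(step_size) * noise
    return z

z = langevin_sample()
print(z.shape)  # (4,)
```

In practice the energy network is trained jointly with the generator, and the same short-run chains serve both prior sampling and posterior inference; here only the sampling step is shown.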
Related papers
- MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection [64.29452783056253]
The rapid development of photo-realistic face generation methods has raised significant concerns in society and academia.
Although existing approaches mainly capture face forgery patterns using the image modality, other modalities like fine-grained noise and text are not fully explored.
We propose a novel multi-modal fine-grained CLIP (MFCLIP) model, which mines comprehensive and fine-grained forgery traces across image-noise modalities.
arXiv Detail & Related papers (2024-09-15T13:08:59Z)
- Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities [88.398085358514]
Contrastive Deepfake Embeddings (CoDE) is a novel embedding space specifically designed for deepfake detection.
CoDE is trained via contrastive learning by additionally enforcing global-local similarities.
arXiv Detail & Related papers (2024-07-29T18:00:10Z)
- Bi-LORA: A Vision-Language Approach for Synthetic Image Detection [14.448350657613364]
Deep image synthesis techniques, such as generative adversarial networks (GANs) and diffusion models (DMs), have ushered in an era of highly realistic image generation.
This paper takes inspiration from the powerful alignment between vision and language, coupled with the zero-shot nature of vision-language models (VLMs).
We introduce an innovative method called Bi-LORA that leverages VLMs, combined with low-rank adaptation (LORA) tuning techniques, to enhance the precision of synthetic image detection for unseen model-generated images.
arXiv Detail & Related papers (2024-04-02T13:54:22Z)
- GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning [50.7702397913573]
The rapid advancement of photorealistic generators has reached a critical juncture where the discrepancy between authentic and manipulated images is increasingly indistinguishable.
Although a number of face forgery datasets are publicly available, the forged faces are mostly generated using GAN-based synthesis technology.
We propose a large-scale, diverse, and fine-grained high-fidelity dataset, namely GenFace, to facilitate the advancement of deepfake detection.
arXiv Detail & Related papers (2024-02-03T03:13:50Z)
- DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network connected to Stable Diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
- Class-Prototype Conditional Diffusion Model with Gradient Projection for Continual Learning [20.175586324567025]
Mitigating catastrophic forgetting is a key hurdle in continual learning.
A major issue is the deterioration in the quality of generated data compared to the original.
We propose a generative replay (GR) based approach for continual learning that enhances image quality in generators.
arXiv Detail & Related papers (2023-12-10T17:39:42Z)
- Progressive Open Space Expansion for Open-Set Model Attribution [19.985618498466042]
We focus on a challenging task, namely Open-Set Model Attribution (OSMA), to simultaneously attribute images to known models and identify those from unknown ones.
Compared to existing open-set recognition (OSR) tasks, OSMA is more challenging as the distinction between images from known and unknown models may only lie in visually imperceptible traces.
We propose a Progressive Open Space Expansion (POSE) solution, which simulates open-set samples that maintain the same semantics as closed-set samples but are embedded with different imperceptible traces.
arXiv Detail & Related papers (2023-03-13T05:53:11Z)
- Generative Max-Mahalanobis Classifiers for Image Classification, Generation and More [6.89001867562902]
We show that our Generative Max-Mahalanobis Classifier (GMMC) can be trained discriminatively, generatively, or jointly for image classification and generation.
arXiv Detail & Related papers (2021-01-01T00:42:04Z)
- Attention Model Enhanced Network for Classification of Breast Cancer Image [54.83246945407568]
AMEN is formulated in a multi-branch fashion with a pixel-wise attention model and a classification submodule.
To focus more on subtle details, the sample image is enhanced by the pixel-wise attention map generated by the former branch.
Experiments conducted on three benchmark datasets demonstrate the superiority of the proposed method under various scenarios.
arXiv Detail & Related papers (2020-10-07T08:44:21Z)
- Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects [10.783993190686132]
We propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring contextual information.
We demonstrate how coherency and fidelity are preserved with our method through experiments on the Multi-MNIST and CLEVR datasets.
arXiv Detail & Related papers (2020-06-22T11:33:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.