Generalized Relevance Learning Grassmann Quantization
- URL: http://arxiv.org/abs/2403.09183v1
- Date: Thu, 14 Mar 2024 08:53:01 GMT
- Title: Generalized Relevance Learning Grassmann Quantization
- Authors: M. Mohammadi, M. Babai, M. H. F. Wilkinson
- Abstract summary: A popular way to model image sets is subspaces, which form a manifold called the Grassmann manifold.
We extend the application of Generalized Relevance Learning Vector Quantization to deal with the Grassmann manifold.
We apply it to several recognition tasks including handwritten digit recognition, face recognition, activity recognition, and object recognition.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to advancements in digital cameras, it is easy to gather multiple images (or videos) of an object under different conditions. Image-set classification has therefore attracted more attention, and various solutions have been proposed to model image sets. A popular way to model image sets is as subspaces, which form a manifold called the Grassmann manifold. In this contribution, we extend the application of Generalized Relevance Learning Vector Quantization to deal with the Grassmann manifold. The proposed model returns a set of prototype subspaces and a relevance vector. While the prototypes model typical behaviours within classes, the relevance factors specify the most discriminative principal vectors (or images) for the classification task. Both provide insights into the model's decisions by highlighting influential images and pixels for predictions. Moreover, because it learns prototypes, the complexity of the new method during inference is independent of dataset size, unlike previous works. We applied it to several recognition tasks, including handwritten digit recognition, face recognition, activity recognition, and object recognition. Experiments demonstrate that it outperforms previous works with lower complexity and can successfully model variation such as handwriting style or lighting conditions. Moreover, the presence of relevances makes the model robust to the choice of the subspaces' dimensionality.
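To make the idea in the abstract concrete, the following is a minimal sketch of nearest-prototype classification on subspaces with a relevance vector. It is not the authors' implementation: the distance form (one minus the relevance-weighted sum of the cosines of the principal angles), the normalization of the relevances to sum to one, and all function names are assumptions made for illustration. Training of prototypes and relevances is not shown.

```python
import numpy as np


def orthonormal_basis(images, d):
    """Model an image set as a d-dimensional subspace.

    `images` is a (D, n) array whose columns are vectorized images.
    Returns a (D, d) matrix with orthonormal columns (top-d left singular vectors).
    """
    U, _, _ = np.linalg.svd(images, full_matrices=False)
    return U[:, :d]


def canonical_correlations(P, W):
    """Cosines of the principal angles between span(P) and span(W), descending."""
    s = np.linalg.svd(P.T @ W, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


def weighted_distance(P, W, relevance):
    """Assumed dissimilarity: 1 - sum_k relevance[k] * cos(theta_k).

    `relevance` is assumed non-negative and normalized to sum to 1, so the
    result lies in [0, 1]; a large relevance[k] means the k-th principal
    direction matters more for the decision.
    """
    return 1.0 - float(relevance @ canonical_correlations(P, W))


def classify(P, prototypes, labels, relevance):
    """Return the label of the closest prototype subspace."""
    dists = [weighted_distance(P, W, relevance) for W in prototypes]
    return labels[int(np.argmin(dists))]


# Toy usage with hypothetical data: two prototype subspaces in R^6.
eye = np.eye(6)
prototypes = [eye[:, [0, 1]], eye[:, [2, 3]]]
labels = ["class_a", "class_b"]
relevance = np.array([0.5, 0.5])

# An image set whose columns lie close to the first prototype's span.
image_set = np.column_stack([eye[:, 0] + 0.1 * eye[:, 4], eye[:, 1], eye[:, 0] + eye[:, 1]])
P = orthonormal_basis(image_set, d=2)
print(classify(P, prototypes, labels, relevance))
```

In this sketch, inference compares a test subspace only against the prototypes, which is why its cost depends on the number of prototypes rather than on the size of the training set.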
Related papers
- Interpretable Image Classification with Adaptive Prototype-based Vision Transformers [37.62530032165594]
We present ProtoViT, a method for interpretable image classification combining deep learning and case-based reasoning.
Our model integrates Vision Transformer (ViT) backbones into prototype based models, while offering spatially deformed prototypes.
Our experiments show that our model can generally achieve higher performance than the existing prototype based models.
arXiv Detail & Related papers (2024-10-28T04:33:28Z)
- Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models [65.82564074712836]
We introduce DIFfusionHOI, a new HOI detector shedding light on text-to-image diffusion models.
We first devise an inversion-based strategy to learn the expression of relation patterns between humans and objects in embedding space.
These learned relation embeddings then serve as textual prompts to steer diffusion models to generate images that depict specific interactions.
arXiv Detail & Related papers (2024-10-26T12:00:33Z)
- FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers [55.2480439325792]
We propose FUSE, an approach to approximating an adapter layer that maps from one model's textual embedding space to another, even across different tokenizers.
We show the efficacy of our approach via multi-objective optimization over vision-language and causal language models for image captioning and sentiment-based image captioning.
arXiv Detail & Related papers (2024-08-09T02:16:37Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- When are Foundation Models Effective? Understanding the Suitability for Pixel-Level Classification Using Multispectral Imagery [23.464350453312584]
Foundation models, i.e. very large deep learning models, have demonstrated impressive performances in various language and vision tasks.
Are foundation models always a suitable choice for different remote sensing tasks, and when or when not?
This work aims to enhance the understanding of the status and suitability of foundation models for pixel-level classification.
arXiv Detail & Related papers (2024-04-17T23:30:48Z)
- Self-Supervised Open-Ended Classification with Small Visual Language Models [60.23212389067007]
We present Self-Context Adaptation (SeCAt), a self-supervised approach that unlocks few-shot abilities for open-ended classification with small visual language models.
By using models with approximately 1B parameters we outperform the few-shot abilities of much larger models, such as Frozen and FROMAGe.
arXiv Detail & Related papers (2023-09-30T21:41:21Z)
- Diffusion Models Beat GANs on Image Classification [37.70821298392606]
Diffusion models have risen to prominence as a state-of-the-art method for image generation, denoising, inpainting, super-resolution, manipulation, etc.
We present our findings that these embeddings are useful beyond the noise prediction task, as they contain discriminative information and can also be leveraged for classification.
We find that with careful feature selection and pooling, diffusion models outperform comparable generative-discriminative methods for classification tasks.
arXiv Detail & Related papers (2023-07-17T17:59:40Z)
- Unicom: Universal and Compact Representation Learning for Image Retrieval [65.96296089560421]
We cluster the large-scale LAION400M into one million pseudo classes based on the joint textual and visual features extracted by the CLIP model.
To alleviate such conflict, we randomly select partial inter-class prototypes to construct the margin-based softmax loss.
Our method significantly outperforms state-of-the-art unsupervised and supervised image retrieval approaches on multiple benchmarks.
arXiv Detail & Related papers (2023-04-12T14:25:52Z)
- A Generalist Framework for Panoptic Segmentation of Images and Videos [61.61453194912186]
We formulate panoptic segmentation as a discrete data generation problem, without relying on inductive bias of the task.
A diffusion model is proposed to model panoptic masks, with a simple architecture and generic loss function.
Our method is capable of modeling video (in a streaming setting) and thereby learns to track object instances automatically.
arXiv Detail & Related papers (2022-10-12T16:18:25Z)
- Entropy optimized semi-supervised decomposed vector-quantized variational autoencoder model based on transfer learning for multiclass text classification and generation [3.9318191265352196]
We propose a semisupervised discrete latent variable model for multi-class text classification and text generation.
The proposed model employs the concept of transfer learning for training a quantized transformer model.
Experimental results indicate that the proposed model has surpassed the state-of-the-art models remarkably.
arXiv Detail & Related papers (2021-11-10T07:07:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.