Reconciliation of Statistical and Spatial Sparsity For Robust Image and
Image-Set Classification
- URL: http://arxiv.org/abs/2106.00256v1
- Date: Tue, 1 Jun 2021 06:33:24 GMT
- Title: Reconciliation of Statistical and Spatial Sparsity For Robust Image and
Image-Set Classification
- Authors: Hao Cheng, Kim-Hui Yap, and Bihan Wen
- Abstract summary: We propose a novel Joint Statistical and Spatial Sparse representation, dubbed \textit{J3S}, to model image or image-set data for classification.
We propose to solve the joint sparse coding problem based on the J3S model by coupling the local and global image representations using joint sparsity.
Experiments show that the proposed J3S-based image classification scheme outperforms popular and state-of-the-art competing methods on the FMD, UIUC, ETH-80 and YTC databases.
- Score: 27.319334479994787
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent image classification algorithms, by learning deep features from
large-scale datasets, have achieved significantly better results compared to
classic feature-based approaches. However, various challenges remain in
practice, such as classifying noisy image or image-set queries and training
deep image classification models on limited-scale datasets. Instead of
applying generic deep features, model-based approaches can be more effective
and data-efficient for robust image and image-set classification tasks, as
various image priors are exploited to model inter- and intra-set data
variations while preventing over-fitting. In this work, we propose a novel
Joint Statistical and Spatial Sparse representation, dubbed \textit{J3S}, to
model image or image-set data for classification, by reconciling both their
local patch structures and their global Gaussian distributions mapped onto a
Riemannian manifold. To the best of our knowledge, no work to date has
utilized both global statistics and local patch structures jointly via joint
sparse representation. We propose to solve the joint sparse coding problem
based on the J3S model by coupling the local and global image representations
through joint sparsity. The learned J3S models are then used for robust image
and image-set classification. Experiments show that the proposed J3S-based
image classification scheme outperforms popular and state-of-the-art
competing methods on the FMD, UIUC, ETH-80 and YTC databases.
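For intuition, the coupling described in the abstract can be written schematically as a joint sparse coding problem. The formulation below is a minimal sketch rather than the exact J3S objective from the paper; the dictionaries $D_p$ and $D_g$, the weight $\lambda$, and the log-Euclidean embedding are illustrative assumptions:
$$
\min_{\alpha,\,\beta}\;\; \|Y - D_p \alpha\|_F^2 \;+\; \|g - D_g \beta\|_2^2 \;+\; \lambda\, \big\|[\alpha,\ \beta]\big\|_{2,1}
$$
Here $Y$ stacks the local patch features of a query, $g$ is its global Gaussian descriptor after embedding the manifold of Gaussians into a vector space (e.g., via a log-Euclidean map), and $D_p$, $D_g$ are dictionaries whose atoms are aligned so that the $i$-th atom of each is derived from the same training image. The row-wise $\ell_{2,1}$ norm on the stacked coefficients $[\alpha,\ \beta]$ then encourages the local and global codes to activate atoms from the same training images, so that a query could be assigned to the class with the smallest joint reconstruction residual.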
Related papers
- DataDream: Few-shot Guided Dataset Generation [90.09164461462365]
We propose a framework for synthesizing classification datasets that more faithfully represent the real data distribution.
DataDream fine-tunes LoRA weights for the image generation model on the few real images before generating the training data using the adapted model.
We then fine-tune LoRA weights for CLIP using the synthetic data to improve downstream image classification over previous approaches on a large variety of datasets.
arXiv Detail & Related papers (2024-07-15T17:10:31Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification [1.7265013728931]
This paper introduces a novel framework for zero-shot learning (ZSL) to recognize new categories that are unseen during training.
We propose three strategies to enhance the model's performance in handling ZSL.
arXiv Detail & Related papers (2024-05-03T15:02:41Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval [11.696941841000985]
Two-stage methods following the retrieve-and-rerank paradigm have achieved excellent performance, but their separate local and global modules are inefficient for real-world applications.
We propose a mechanism that attentively selects prominent local descriptors and infuses fine-grained semantic relations into the global representation.
Our method achieves state-of-the-art single-stage image retrieval performance on benchmarks such as Revisited Oxford and Revisited Paris.
arXiv Detail & Related papers (2023-08-08T03:06:10Z)
- Diffusion Models Beat GANs on Image Classification [37.70821298392606]
Diffusion models have risen to prominence as a state-of-the-art method for image generation, denoising, inpainting, super-resolution, manipulation, etc.
We present our findings that the intermediate embeddings of diffusion models are useful beyond the noise prediction task, as they contain discriminative information and can also be leveraged for classification.
We find that with careful feature selection and pooling, diffusion models outperform comparable generative-discriminative methods for classification tasks.
arXiv Detail & Related papers (2023-07-17T17:59:40Z)
- CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts model performance, with 10-34% relative improvement across various labeled training data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z)
- Multi-dataset Pretraining: A Unified Model for Semantic Segmentation [97.61605021985062]
We propose a unified framework, termed Multi-Dataset Pretraining, to take full advantage of the fragmented annotations of different datasets.
This is achieved by first pretraining the network via the proposed pixel-to-prototype contrastive loss over multiple datasets.
In order to better model the relationship among images and classes from different datasets, we extend the pixel level embeddings via cross dataset mixing.
arXiv Detail & Related papers (2021-06-08T06:13:11Z)
- Unifying Remote Sensing Image Retrieval and Classification with Robust Fine-tuning [3.6526118822907594]
We aim to unify remote sensing image retrieval and classification with a new large-scale training and testing dataset, SF300.
We show that our framework systematically achieves a boost of retrieval and classification performance on nine different datasets compared to an ImageNet pretrained baseline.
arXiv Detail & Related papers (2021-02-26T11:01:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.