Discriminative Dictionary Design for Action Classification in Still
Images and Videos
- URL: http://arxiv.org/abs/2005.10149v2
- Date: Sat, 6 Jun 2020 17:36:11 GMT
- Title: Discriminative Dictionary Design for Action Classification in Still
Images and Videos
- Authors: Abhinaba Roy, Biplab Banerjee, Amir Hussain, Soujanya Poria
- Abstract summary: We propose a novel discriminative method for identifying robust and category-specific local features.
The framework is validated on the action recognition datasets based on still images and videos.
- Score: 29.930239762446217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address the problem of action recognition from still images
and videos. Traditional local features such as SIFT and STIP invariably pose
two potential problems: 1) they are not evenly distributed across the different
entities of a given category, and 2) many such features are not exclusive to
the visual concept the entities represent. To generate a dictionary that
accounts for these issues, we propose a novel discriminative method for
identifying robust and category-specific local features that maximize class
separability. Specifically, we pose the selection of potent local descriptors
as a filter-based feature selection problem that ranks the local features per
category according to a novel measure of distinctiveness. The underlying visual
entities are then represented with the learned dictionary, and this stage is
followed by action classification using a random forest model with label
propagation refinement. The framework is validated on action recognition
datasets based on still images (Stanford-40) as well as videos (UCF-50) and
exhibits superior performance compared to representative methods from the
literature.
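As a rough illustration of the pipeline the abstract describes, the sketch below ranks local descriptors per category with a stand-in distinctiveness score, keeps the top-ranked descriptors of each class as dictionary atoms, encodes each entity as a histogram over atoms, and classifies with a random forest. The distinctiveness measure, the hard-assignment encoding, and all sizes are illustrative assumptions; the paper's exact measure and its label propagation refinement step are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def distinctiveness(desc, same_class, other_class):
    """Stand-in score: mean affinity to same-class descriptors over mean
    affinity to other-class descriptors (descriptors assumed L2-normalized).
    The paper defines its own measure; this is only an illustration."""
    intra = float(np.mean(same_class @ desc))
    inter = float(np.mean(other_class @ desc))
    return intra / (abs(inter) + 1e-8)

def build_dictionary(descs, labels, per_class=20):
    """Rank descriptors per category; keep the top ones as dictionary atoms."""
    atoms = []
    for c in np.unique(labels):
        same, other = descs[labels == c], descs[labels != c]
        scores = np.array([distinctiveness(d, same, other) for d in same])
        atoms.append(same[np.argsort(scores)[-per_class:]])
    return np.vstack(atoms)

def encode(descs, dictionary):
    """Hard-assign local descriptors to nearest atoms; normalized histogram."""
    assign = np.argmax(descs @ dictionary.T, axis=1)
    hist = np.bincount(assign, minlength=len(dictionary)).astype(float)
    return hist / hist.sum()

# Toy demo with random stand-ins for local descriptors from 3 action classes.
rng = np.random.default_rng(0)
descs = rng.normal(size=(300, 64))
descs /= np.linalg.norm(descs, axis=1, keepdims=True)
labels = rng.integers(0, 3, size=300)

D = build_dictionary(descs, labels)
X = np.stack([encode(rng.normal(size=(30, 64)), D) for _ in range(12)])
y = rng.integers(0, 3, size=12)
clf = RandomForestClassifier(n_estimators=50).fit(X, y)
print(clf.predict(X[:3]))
```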
Related papers
- Vocabulary-free Image Classification and Semantic Segmentation [71.78089106671581]
We introduce the Vocabulary-free Image Classification (VIC) task, which aims to assign a class from an unconstrained language-induced semantic space to an input image without needing a known vocabulary.
VIC is challenging due to the vastness of the semantic space, which contains millions of concepts, including fine-grained categories.
We propose Category Search from External Databases (CaSED), a training-free method that leverages a pre-trained vision-language model and an external database.
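As a loose illustration of the retrieval idea (not CaSED's actual interface), the sketch below assumes precomputed, L2-normalized image and caption embeddings from a shared vision-language space such as CLIP: the captions nearest to the image are retrieved from an external database, and candidate category words taken from them are scored against the image embedding. The random embeddings here are placeholders, so the output is meaningless; only the control flow is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# External caption database with stand-in embeddings (in practice these come
# from a pre-trained vision-language model's text encoder).
captions = ["a tabby cat on a sofa", "a red fox in snow", "a fire truck on a street"]
caption_emb = normalize(rng.normal(size=(len(captions), 512)))
image_emb = normalize(rng.normal(size=512))

# 1) Retrieve the captions most similar to the image embedding.
sims = caption_emb @ image_emb
top = np.argsort(sims)[::-1][:2]

# 2) Treat words from retrieved captions as candidate categories and score
#    each candidate against the image (again with stand-in text embeddings).
candidates = sorted({w for i in top for w in captions[i].split() if len(w) > 3})
cand_emb = normalize(rng.normal(size=(len(candidates), 512)))
print("predicted category:", candidates[int(np.argmax(cand_emb @ image_emb))])
```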
arXiv Detail & Related papers (2024-04-16T19:27:21Z)
- Evolving Interpretable Visual Classifiers with Large Language Models [34.4903887876357]
Multimodal pre-trained models, such as CLIP, are popular for zero-shot classification due to their open-vocabulary flexibility and high performance.
However, vision-language models, which compute similarity scores between images and class labels, are largely black boxes, with limited interpretability, a risk of bias, and an inability to discover new visual concepts that are not written down.
We present a novel method that discovers interpretable yet discriminative sets of attributes for visual recognition.
arXiv Detail & Related papers (2024-04-15T17:09:53Z)
- A Generative Approach for Wikipedia-Scale Visual Entity Recognition [56.55633052479446]
We address the task of mapping a given query image to one of the 6 million existing entities in Wikipedia.
We introduce a novel Generative Entity Recognition framework, which learns to auto-regressively decode a semantic and discriminative "code" identifying the target entity.
arXiv Detail & Related papers (2024-03-04T13:47:30Z)
- Natural Scene Image Annotation Using Local Semantic Concepts and Spatial Bag of Visual Words [0.0]
This paper introduces a framework for automatically annotating natural scene images with local semantic labels from a predefined vocabulary.
The framework is based on the hypothesis that, in natural scenes, intermediate semantic concepts are correlated with local keypoints.
Under this hypothesis, image regions can be efficiently represented by a bag-of-visual-words (BOW) model, and a machine learning approach such as an SVM can then label the regions with semantic annotations, as sketched below.
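A minimal sketch of that pipeline under assumed sizes (a 64-word vocabulary and random 128-d vectors standing in for SIFT-like keypoint descriptors); the paper's actual features, vocabulary size, and kernel may well differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Random stand-ins for local keypoint descriptors (e.g., 128-d SIFT vectors).
train_desc = rng.normal(size=(500, 128))
vocab = KMeans(n_clusters=64, n_init=10).fit(train_desc)  # visual vocabulary

def bow_histogram(region_desc):
    """Quantize a region's descriptors against the vocabulary; normalize."""
    words = vocab.predict(region_desc)
    hist = np.bincount(words, minlength=64).astype(float)
    return hist / hist.sum()

# Each image region becomes one BOW histogram; an SVM maps it to a concept label.
X = np.stack([bow_histogram(rng.normal(size=(40, 128))) for _ in range(30)])
y = rng.integers(0, 3, size=30)  # hypothetical local semantic concept labels
svm = SVC(kernel="rbf").fit(X, y)
print(svm.predict(X[:5]))
```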
arXiv Detail & Related papers (2022-10-17T12:57:51Z)
- Visual Classification via Description from Large Language Models [23.932495654407425]
Vision-language models (VLMs) have shown promising performance on a variety of recognition tasks.
We present an alternative framework for classification with VLMs, which we call classification by description.
arXiv Detail & Related papers (2022-10-13T17:03:46Z)
- Few-shot Open-set Recognition Using Background as Unknowns [58.04165813493666]
Few-shot open-set recognition aims to classify both seen and novel images given only limited training data of seen classes.
Our proposed method not only outperforms multiple baselines but also sets new results on three popular benchmarks.
arXiv Detail & Related papers (2022-07-19T04:19:29Z)
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module that generates channel- and spatial-wise attention matrices to guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
- On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors.
We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
arXiv Detail & Related papers (2022-02-17T22:40:19Z)
- Prototypical Region Proposal Networks for Few-Shot Localization and Classification [1.5100087942838936]
We develop a framework to unify segmentation and classification into an end-to-end classification model, PRoPnet.
We empirically demonstrate that our methods improve accuracy on image datasets with natural scenes containing multiple object classes.
arXiv Detail & Related papers (2021-04-08T04:03:30Z)
- Hierarchical Image Classification using Entailment Cone Embeddings [68.82490011036263]
We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier.
We empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance.
arXiv Detail & Related papers (2020-04-02T10:22:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.