Semi-Supervised Few-Shot Atomic Action Recognition
- URL: http://arxiv.org/abs/2011.08410v1
- Date: Tue, 17 Nov 2020 03:59:05 GMT
- Title: Semi-Supervised Few-Shot Atomic Action Recognition
- Authors: Xiaoyuan Ni, Sizhe Song, Yu-Wing Tai, Chi-Keung Tang
- Abstract summary: We propose a novel model for semi-supervised few-shot atomic action recognition.
Our model features unsupervised and contrastive video embedding, loose action alignment, multi-head feature comparison, and attention-based aggregation.
Experiments show that our model attains high accuracy on representative atomic action datasets, outperforming their respective state-of-the-art classification accuracy in the full-supervision setting.
- Score: 59.587738451616495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although excellent progress has been made, the performance of action
recognition still relies heavily on specific datasets, which are difficult to
extend to new action classes because labeling is labor-intensive. Moreover, the high
diversity in spatio-temporal appearance requires robust and representative
action feature aggregation and attention. To address these issues, we focus
on atomic actions and propose a novel model for semi-supervised few-shot atomic
action recognition. Our model features unsupervised and contrastive video
embedding, loose action alignment, multi-head feature comparison, and
attention-based aggregation. Together, these components enable action recognition with
only a few training examples by extracting more representative features
and allowing flexibility in spatial and temporal alignment and in variations of
the action. Experiments show that our model attains high accuracy on
representative atomic action datasets, outperforming their respective
state-of-the-art classification accuracy in the full-supervision setting.
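The abstract names the model's components without specifying them. The following is a minimal, stdlib-only Python sketch of how three of them could fit together in a nearest-prototype few-shot classifier. Every choice here is an assumption, not the authors' method: an InfoNCE-style loss stands in for the contrastive video embedding objective, the attention query is the mean frame feature rather than a learned vector, "multi-head comparison" is modeled as averaging cosine similarity over equal-size feature chunks, and loose action alignment is not modeled at all. The function names `info_nce`, `attention_pool`, `multi_head_similarity`, and `classify` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector], tau: float = 0.1) -> float:
    """InfoNCE-style contrastive loss (assumed stand-in): low when the anchor is
    closer to its positive view than to the negatives."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    return -math.log(softmax(logits)[0])


def attention_pool(frames: List[Vector]) -> Vector:
    """Aggregate per-frame features with scaled dot-product attention.

    Hypothetical choice: the query is the mean frame feature; a trained model
    would typically learn this query instead.
    """
    d = len(frames[0])
    query = [sum(f[i] for f in frames) / len(frames) for i in range(d)]
    weights = softmax([dot(query, f) / math.sqrt(d) for f in frames])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(d)]


def multi_head_similarity(a: Vector, b: Vector, heads: int = 2) -> float:
    """Split both vectors into `heads` equal chunks and average the per-chunk
    cosines (trailing elements are ignored if len(a) is not divisible by heads)."""
    size = len(a) // heads
    sims = [cosine(a[h * size:(h + 1) * size], b[h * size:(h + 1) * size]) for h in range(heads)]
    return sum(sims) / heads


def classify(query: List[Vector], support: Dict[str, List[List[Vector]]], heads: int = 2) -> str:
    """Nearest-prototype few-shot classification over attention-pooled clips.

    `support` maps each label to a few clips; each clip is a list of frame features.
    """
    q = attention_pool(query)
    best_label, best_score = "", -math.inf
    for label, clips in support.items():
        pooled = [attention_pool(c) for c in clips]
        proto = [sum(p[i] for p in pooled) / len(pooled) for i in range(len(q))]
        score = multi_head_similarity(q, proto, heads)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Toy one-shot support set: one two-frame clip per class (hand-written values).
    support = {
        "wave": [[[1.0, 0.0, 1.0, 0.0], [0.9, 0.1, 1.0, 0.0]]],
        "clap": [[[0.0, 1.0, 0.0, 1.0], [0.1, 0.9, 0.0, 1.0]]],
    }
    print(classify([[1.0, 0.1, 0.9, 0.0]], support))  # wave
```

In a real system the frame features would come from a trained video encoder, and the contrastive loss would be minimized over augmented views of unlabeled clips; the vectors above are toy values chosen only to make the data flow visible.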
 
      
Related papers
        - Stochastic Encodings for Active Feature Acquisition [100.47043816019888]
 Active Feature Acquisition is an instance-wise, sequential decision-making problem.
The aim is to dynamically select which feature to measure based on current observations, independently for each test instance.
Common approaches either use Reinforcement Learning, which suffers from training difficulties, or greedily maximize the conditional mutual information of the label and unobserved features, which makes them myopic.
We introduce a latent variable model, trained in a supervised manner. Acquisitions are made by reasoning about the features across many possible unobserved realizations in a latent space.
 arXiv  Detail & Related papers  (2025-08-03T23:48:46Z)
- Multi-level and Multi-modal Action Anticipation [12.921307214813357]
 Action anticipation, the task of predicting future actions from partially observed videos, is crucial for advancing intelligent systems.
We introduce Multi-level and Multi-modal Action Anticipation (m&m-Ant), a novel multi-modal action anticipation approach.
Experiments on widely used datasets, including Breakfast, 50 Salads, and DARai, demonstrate the effectiveness of our approach.
 arXiv  Detail & Related papers  (2025-06-03T02:39:33Z)
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
 Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
 arXiv  Detail & Related papers  (2024-06-02T06:53:01Z)
- Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes [23.284478293459856]
 Action-slot is a slot attention-based approach that learns visual action-centric representations.
Our key idea is to design action slots that are capable of paying attention to regions where atomic activities occur.
To address the limitation, we collect a synthetic dataset called TACO, which is four times larger than OATS.
 arXiv  Detail & Related papers  (2023-11-29T05:28:05Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
 We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
 arXiv  Detail & Related papers  (2023-09-22T06:55:41Z)
- Hierarchical Compositional Representations for Few-shot Action Recognition [51.288829293306335]
 We propose a novel hierarchical compositional representations (HCR) learning approach for few-shot action recognition.
We divide a complicated action into several sub-actions by carefully designed hierarchical clustering.
We also adopt the Earth Mover's Distance in the transportation problem to measure the similarity between video samples in terms of sub-action representations.
 arXiv  Detail & Related papers  (2022-08-19T16:16:59Z)
- Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions [13.665489987620724]
 We tackle the problem of weakly-supervised fine-grained temporal action detection in videos for the first time.
We propose to model actions as the combinations of reusable atomic actions which are automatically discovered from data.
Our approach constructs a visual representation hierarchy of four levels: clip level, atomic action level, fine action class level and coarse action class level, with supervision at each level.
 arXiv  Detail & Related papers  (2022-07-24T20:32:24Z)
- Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning [51.03781020616402]
 Fine-grained action recognition is attracting increasing attention due to the emerging demand for specific action understanding in real-world applications.
We propose a few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only a few samples given for each class.
Although progress has been made on coarse-grained actions, existing few-shot recognition methods encounter two issues in handling fine-grained actions.
 arXiv  Detail & Related papers  (2021-08-15T02:21:01Z)
- Learning to Represent Action Values as a Hypergraph on the Action Vertices [17.811355496708728]
 Action-value estimation is a critical component of reinforcement learning (RL) methods.
We conjecture that leveraging the structure of multi-dimensional action spaces is a key ingredient for learning good representations of action.
We show the effectiveness of our approach on a myriad of domains: illustrative prediction problems under minimal confounding effects, Atari 2600 games, and discretised physical control benchmarks.
 arXiv  Detail & Related papers  (2020-10-28T00:19:13Z)
- Inferring Temporal Compositions of Actions Using Probabilistic Automata [61.09176771931052]
 We propose to express temporal compositions of actions as semantic regular expressions and derive an inference framework using probabilistic automata.
Our approach is different from existing works that either predict long-range complex activities as unordered sets of atomic actions, or retrieve videos using natural language sentences.
 arXiv  Detail & Related papers  (2020-04-28T00:15:26Z)
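The HCR entry in the list above measures similarity between videos with the Earth Mover's Distance over sub-action representations. Below is a minimal sketch of the EMD itself, assuming each video has been reduced to the same number of equally weighted sub-action feature vectors and that the ground distance is Euclidean; both are assumptions, and HCR's actual weighting and solver are not described in this listing. In this special case an optimal transport plan can always be taken to be a permutation, so the EMD equals the minimum average matching cost. The function `emd_uniform` is hypothetical, and its brute-force search is only practical for a handful of sub-actions.

```python
import itertools
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def emd_uniform(xs: List[Sequence[float]], ys: List[Sequence[float]]) -> float:
    """EMD between two equal-size point sets, each point carrying weight 1/n.

    With uniform weights and equal sizes, some optimal transport plan is a
    permutation, so we search all one-to-one matchings (O(n!) time).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes two non-empty sets of equal size")
    n = len(xs)
    cost = [[euclidean(x, y) for y in ys] for x in xs]
    best = min(sum(cost[i][p[i]] for i in range(n)) for p in itertools.permutations(range(n)))
    return best / n


if __name__ == "__main__":
    # Two toy "videos", each summarized by two 2-D sub-action features.
    print(emd_uniform([[0.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]]))  # 1.0
```

A similarity score can then be derived from the distance, for example by negating it; for larger sets a linear-programming or Hungarian-algorithm solver would replace the brute-force search.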