SAFCAR: Structured Attention Fusion for Compositional Action Recognition
- URL: http://arxiv.org/abs/2012.02109v2
- Date: Thu, 17 Dec 2020 21:15:37 GMT
- Title: SAFCAR: Structured Attention Fusion for Compositional Action Recognition
- Authors: Tae Soo Kim, Gregory D. Hager
- Abstract summary: We develop and test a novel Structured Attention Fusion (SAF) self-attention mechanism to combine information from object detections.
We show that our approach recognizes novel verb-noun compositions more effectively than current state of the art systems.
We validate our approach on the challenging Something-Else tasks from the Something-Something-V2 dataset.
- Score: 47.43959215267547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a general framework for compositional action recognition -- i.e.
action recognition where the labels are composed out of simpler components such
as subjects, atomic-actions and objects. The main challenge in compositional
action recognition is that there is a combinatorially large set of possible
actions that can be composed using basic components. However, compositionality
also provides a structure that can be exploited. To do so, we develop and test
a novel Structured Attention Fusion (SAF) self-attention mechanism to combine
information from object detections, which capture the time-series structure of
an action, with visual cues that capture contextual information. We show that
our approach recognizes novel verb-noun compositions more effectively than
current state of the art systems, and it generalizes to unseen action
categories quite efficiently from only a few labeled examples. We validate our
approach on the challenging Something-Else tasks from the
Something-Something-V2 dataset. We further show that our framework is flexible
and can generalize to a new domain by showing competitive results on the
Charades-Fewshot dataset.
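The abstract describes fusing object-detection information with contextual visual cues through self-attention. As a rough illustration only, the sketch below applies generic scaled dot-product self-attention to a set of tokens made of per-object features plus one context feature. It is not the authors' SAF mechanism: the function name `attention_fuse`, the random projection matrices, the mean pooling step, and all dimensions are assumptions chosen for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_fuse(object_feats, context_feat, w_q, w_k, w_v):
    """Fuse per-object features with one visual context feature.

    object_feats: (num_objects, d) array, one row per detected object.
    context_feat: (d,) array, a global visual feature.
    w_q, w_k, w_v: (d, d_k) projection matrices (hypothetical, random here).
    Returns a fused (d_k,) vector and the (num_tokens, num_tokens) weights.
    """
    # Treat the context feature as one extra token next to the object tokens.
    tokens = np.vstack([object_feats, context_feat[None, :]])
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # Scaled dot-product attention: every token attends to every token.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    attended = weights @ v
    # Mean pooling over tokens is an assumption made for this sketch.
    fused = attended.mean(axis=0)
    return fused, weights


rng = np.random.default_rng(0)
d, d_k, num_objects = 8, 4, 3
object_feats = rng.normal(size=(num_objects, d))
context_feat = rng.normal(size=d)
w_q, w_k, w_v = (rng.normal(size=(d, d_k)) for _ in range(3))

fused, weights = attention_fuse(object_feats, context_feat, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Each row of `weights` sums to one, so every output token is a convex combination of the projected object and context tokens. A trained model would learn the projections rather than sample them.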
Related papers
- Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition [21.655278000690686]
We propose an end-to-end object-centric action recognition framework.
It simultaneously performs Detection And Interaction Reasoning in one stage.
We conduct experiments on two datasets, Something-Else and Ikea-Assembly.
arXiv Detail & Related papers (2024-04-18T05:06:12Z)
- REACT: Recognize Every Action Everywhere All At Once [8.10024991952397]
Group Activity Recognition (GAR) is a fundamental problem in computer vision, with diverse applications in sports analysis, surveillance, and social scene understanding.
We present REACT, an architecture inspired by the transformer encoder-decoder model.
Our method outperforms state-of-the-art GAR approaches in extensive experiments, demonstrating superior accuracy in recognizing and understanding group activities.
arXiv Detail & Related papers (2023-11-27T20:48:54Z)
- Learning Attention Propagation for Compositional Zero-Shot Learning [71.55375561183523]
We propose a novel method called Compositional Attention Propagated Embedding (CAPE).
CAPE learns to identify this structure and propagates knowledge between them to learn class embedding for all seen and unseen compositions.
We show that our method outperforms previous baselines to set a new state-of-the-art on three publicly available benchmarks.
arXiv Detail & Related papers (2022-10-20T19:44:11Z)
- Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
- Nested and Balanced Entity Recognition using Multi-Task Learning [0.0]
This paper introduces a partly-layered network architecture that deals with the complexity of overlapping and nested cases.
We train and evaluate this architecture to recognise two kinds of entities: Concepts (CR) and Named Entities (NER).
Our approach achieves state-of-the-art NER performances, while it outperforms previous CR approaches.
arXiv Detail & Related papers (2021-06-11T07:52:32Z)
- Home Action Genome: Cooperative Compositional Action Understanding [33.69990813932372]
Existing research on action recognition treats activities as monolithic events occurring in videos.
Cooperative Compositional Action Understanding (CCAU) is a cooperative learning framework for hierarchical action recognition.
We demonstrate the utility of co-learning compositions in few-shot action recognition by achieving 28.6% mAP with just a single sample.
arXiv Detail & Related papers (2021-05-11T17:42:47Z)
- Inferring Temporal Compositions of Actions Using Probabilistic Automata [61.09176771931052]
We propose to express temporal compositions of actions as semantic regular expressions and derive an inference framework using probabilistic automata.
Our approach is different from existing works that either predict long-range complex activities as unordered sets of atomic actions, or retrieve videos using natural language sentences.
arXiv Detail & Related papers (2020-04-28T00:15:26Z)
- Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton [108.01007935498104]
In this paper, we solve three low-level pixel-wise vision problems, including salient object segmentation, edge detection, and skeleton extraction.
We first show some similarities shared by these tasks and then demonstrate how they can be leveraged for developing a unified framework.
arXiv Detail & Related papers (2020-04-18T11:10:11Z)
- Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z)