Knowledge Guided Learning: Towards Open Domain Egocentric Action
Recognition with Zero Supervision
- URL: http://arxiv.org/abs/2009.07470v2
- Date: Sat, 12 Mar 2022 00:02:25 GMT
- Title: Knowledge Guided Learning: Towards Open Domain Egocentric Action
Recognition with Zero Supervision
- Authors: Sathyanarayanan N. Aakur, Sanjoy Kundu, Nikhil Gunti
- Abstract summary: We show that attention and commonsense knowledge can enable the self-supervised discovery of novel actions in egocentric videos.
Our approach can infer and learn novel classes for open-vocabulary classification in egocentric videos.
- Score: 5.28539620288341
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advances in deep learning have enabled models with a remarkable
ability to recognize and even localize actions in videos. However, they tend
to make errors when faced with scenes or examples beyond their initial
training environment, and hence fail to adapt to new domains without
significant retraining on large amounts of annotated data. In this paper, we
propose to overcome these limitations by moving to an open-world setting and
decoupling the ideas of recognition and reasoning.
Building upon the compositional representation offered by Grenander's Pattern
Theory formalism, we show that attention and commonsense knowledge can be used
to enable the self-supervised discovery of novel actions in egocentric videos
in an open-world setting, where data from the observed environment (the target
domain) is open, i.e., the vocabulary is partially known and training examples
(both labeled and unlabeled) are not available. We show that our approach can
infer and learn novel classes for open vocabulary classification in egocentric
videos and novel object detection with zero supervision. Extensive experiments
show its competitive performance on two publicly available egocentric action
recognition datasets (GTEA Gaze and GTEA Gaze+) under open-world conditions.
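As a rough illustration of the decoupling described in the abstract, the sketch below scores candidate verb-noun interpretations by combining attention-weighted object evidence with a commonsense compatibility prior and picking the minimum-energy pair. The object confidences, verb vocabulary, and compatibility values are hypothetical stand-ins (e.g., for assertions mined from a knowledge base such as ConceptNet); this is not the paper's pattern-theory implementation.

```python
# Illustrative sketch only: energy-based selection of a (verb, noun) action
# label from attention-grounded object evidence and commonsense priors.
# All numbers below are hypothetical stand-ins, not values from the paper.
from itertools import product

# Hypothetical attention-weighted object confidences for one video segment.
object_evidence = {"cup": 0.82, "kettle": 0.64, "spoon": 0.31}

# Hypothetical, partially known verb vocabulary (open-world setting).
verbs = ["pour", "stir", "take", "open"]

# Hypothetical verb-noun compatibility scores (stand-in for a knowledge base).
commonsense = {
    ("pour", "kettle"): 0.9, ("pour", "cup"): 0.7,
    ("stir", "spoon"): 0.8, ("stir", "cup"): 0.5,
    ("take", "cup"): 0.6, ("open", "kettle"): 0.4,
}

def energy(verb: str, noun: str) -> float:
    """Lower energy = more plausible interpretation of the segment."""
    support = object_evidence.get(noun, 0.0)     # visual grounding term
    prior = commonsense.get((verb, noun), 0.05)  # commonsense knowledge term
    return -(support * prior)

# The minimum-energy pair is taken as the discovered action label.
best_verb, best_noun = min(product(verbs, object_evidence), key=lambda vn: energy(*vn))
print(f"inferred action: {best_verb} {best_noun}")  # e.g. "pour kettle"
```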
Related papers
- ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition [6.253919624802853]
We propose a neuro-symbolic framework called ALGO - Action Learning with Grounded Object recognition.
First, we propose a neuro-symbolic prompting approach that uses object-centric vision-language models as a noisy oracle to ground objects in the video.
Second, driven by prior commonsense knowledge, we discover plausible activities through an energy-based symbolic pattern theory framework.
arXiv Detail & Related papers (2024-06-09T10:30:04Z)
- Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition [63.95111791861103]
Existing methods typically adapt pretrained image-text models to the video domain.
We argue that augmenting text embeddings with human prior knowledge is pivotal for open-vocabulary video action recognition.
Our method not only sets a new state of the art but also offers excellent interpretability.
arXiv Detail & Related papers (2023-12-04T02:31:38Z)
- Generalized Label-Efficient 3D Scene Parsing via Hierarchical Feature Aligned Pre-Training and Region-Aware Fine-tuning [55.517000360348725]
This work presents a framework for 3D scene understanding when labeled scenes are quite limited.
To extract knowledge for novel categories from the pre-trained vision-language models, we propose a hierarchical feature-aligned pre-training and knowledge distillation strategy.
Experiments with both indoor and outdoor scenes demonstrated the effectiveness of our approach in both data-efficient learning and open-world few-shot learning.
arXiv Detail & Related papers (2023-12-01T15:47:04Z)
- Free-Form Composition Networks for Egocentric Action Recognition [97.02439848145359]
We propose a free-form composition network (FFCN) that can simultaneously learn disentangled verb, preposition, and noun representations.
The proposed FFCN can directly generate new training data samples for rare classes, hence significantly improving action recognition performance.
arXiv Detail & Related papers (2023-07-13T02:22:09Z)
- Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning [6.253919624802853]
We propose a two-step, neuro-symbolic framework called ALGO to infer activities in egocentric videos with limited supervision.
First, we propose a neuro-symbolic prompting approach that uses object-centric vision-language models as a noisy oracle to ground objects in the video.
Second, driven by prior commonsense knowledge, we discover plausible activities through an energy-based symbolic pattern theory framework.
arXiv Detail & Related papers (2023-05-26T03:21:30Z)
- Vocabulary-informed Zero-shot and Open-set Learning [128.83517181045815]
We propose vocabulary-informed learning to address problems of supervised, zero-shot, generalized zero-shot and open set recognition.
Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms.
We illustrate that the resulting model shows improvements in supervised, zero-shot, generalized zero-shot, and large open set recognition, with a vocabulary of up to 310K classes on the Animals with Attributes and ImageNet datasets.
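As a hedged sketch of the general idea (not the authors' exact formulation), the snippet below penalizes a sample whose embedding is not closer to its own vocabulary atom than to every other atom by a margin; the atoms, weights, and margin are toy values.

```python
# Toy sketch of a weighted max-margin loss with vocabulary-atom distance
# constraints, in the spirit of vocabulary-informed learning. Embeddings,
# weights, and the margin are illustrative values, not the paper's setup.
import numpy as np

def vocab_margin_loss(x, y, atoms, margin=1.0, weights=None):
    """x: sample embedding; y: index of its true vocabulary atom;
    atoms: (K, d) matrix of vocabulary-atom embeddings (seen + open)."""
    dists = np.linalg.norm(atoms - x, axis=1)        # distance to every atom
    if weights is None:
        weights = np.ones(len(atoms))
    # Hinge penalty whenever the true atom is not closer by at least `margin`.
    hinges = np.maximum(0.0, margin + dists[y] - dists)
    hinges[y] = 0.0                                  # no penalty against itself
    return float(np.sum(weights * hinges))

# Toy usage: four vocabulary atoms in a 3-d semantic space.
atoms = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.], [.5, .5, 0.]])
x = np.array([0.9, 0.1, 0.0])                        # sample near atom 0
print(vocab_margin_loss(x, y=0, atoms=atoms))        # ~0.58: atom 3 still violates the margin
```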
arXiv Detail & Related papers (2023-01-03T08:19:22Z)
- Open Long-Tailed Recognition in a Dynamic World [82.91025831618545]
Real world data often exhibits a long-tailed and open-ended (with unseen classes) distribution.
A practical recognition system must balance between majority (head) and minority (tail) classes, generalize across the distribution, and recognize novelty in instances of unseen classes (open classes).
We define Open Long-Tailed Recognition++ as learning from such naturally distributed data and optimizing for the classification accuracy over a balanced test set.
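For intuition, the toy example below contrasts plain accuracy with accuracy over a balanced test set (the macro-average of per-class accuracies), which is the quantity this setting optimizes; the labels are fabricated purely for illustration.

```python
# Toy contrast between plain accuracy and balanced (macro-averaged) accuracy.
# Labels are fabricated purely for illustration.
import numpy as np

y_true = np.array([0] * 90 + [1] * 9 + [2] * 1)   # long-tailed ground truth
y_pred = np.zeros_like(y_true)                    # classifier that ignores the tail

per_class_acc = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
print("plain accuracy:   ", np.mean(y_pred == y_true))  # 0.90, deceptively high
print("balanced accuracy:", np.mean(per_class_acc))     # 0.33, exposes tail failure
```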
arXiv Detail & Related papers (2022-08-17T15:22:20Z)
- Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and show that the proposed agents can learn to distinguish objects just by observing the video stream.
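As a minimal sketch of the "attend to what is moving" idea (the paper's agent and attention-trajectory model are far richer than this), frame differencing can pick the location with the largest change; the frames below are toy arrays.

```python
# Minimal sketch of motion-driven attention via frame differencing.
# Toy 8x8 frames; not the paper's attention trajectory model.
import numpy as np

prev_frame = np.zeros((8, 8))
curr_frame = np.zeros((8, 8))
curr_frame[5, 6] = 1.0                              # something "moved" here

motion = np.abs(curr_frame - prev_frame)            # per-pixel change magnitude
attended = np.unravel_index(np.argmax(motion), motion.shape)
print("attended location:", attended)               # (5, 6)
```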
arXiv Detail & Related papers (2022-04-26T09:52:31Z)
- Opening Deep Neural Networks with Generative Models [2.0962464943252934]
We propose GeMOS: simple and plug-and-play open set recognition modules that can be attached to pretrained Deep Neural Networks for visual recognition.
The GeMOS framework pairs pre-trained Convolutional Neural Networks with generative models for open set recognition to extract open set scores for each sample.
We conduct a thorough evaluation of the proposed method in comparison with state-of-the-art open set algorithms, finding that GeMOS either outperforms or is statistically indistinguishable from more complex and costly models.
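The general recipe this entry describes (a generative model over features from a pre-trained network, with its likelihood used as an open-set score) can be sketched as follows; the Gaussian toy features stand in for real CNN activations, and the 5th-percentile threshold is an arbitrary choice, not the GeMOS procedure.

```python
# Rough sketch of generative open-set scoring: fit a generative model on
# features of known classes and reject low-likelihood samples as "unknown".
# Toy Gaussian features stand in for pre-trained CNN activations; this is an
# illustration of the recipe, not the GeMOS implementation.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
known_feats = rng.normal(0.0, 1.0, size=(500, 64))   # features of known classes
unknown_feat = rng.normal(6.0, 1.0, size=(64,))      # feature of an unseen class

gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0).fit(known_feats)
threshold = np.percentile(gmm.score_samples(known_feats), 5)  # arbitrary 5% cut

def is_unknown(feat: np.ndarray) -> bool:
    """Flag a sample as open-set when its log-likelihood falls below threshold."""
    return gmm.score_samples(feat.reshape(1, -1))[0] < threshold

print(is_unknown(known_feats[0]), is_unknown(unknown_feat))   # typically: False True
```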
arXiv Detail & Related papers (2021-05-20T20:02:29Z)
- Task-Adaptive Negative Class Envision for Few-Shot Open-Set Recognition [36.53830822788852]
We study the problem of few-shot open-set recognition (FSOR), which learns a recognition system robust to queries from new sources.
We propose a novel task-adaptive negative class envision method (TANE) to model the open world.
Our approach significantly improves the state-of-the-art performance on few-shot open-set recognition.
arXiv Detail & Related papers (2020-12-24T02:30:18Z)
- A Review of Open-World Learning and Steps Toward Open-World Learning Without Labels [11.380522815465984]
In open-world learning, an agent starts with a set of known classes, detects and manages things that it does not know, and learns them over time from a non-stationary stream of data.
This paper formalizes various open-world learning problems including open-world learning without labels.
arXiv Detail & Related papers (2020-11-25T17:41:03Z)