Egocentric zone-aware action recognition across environments
- URL: http://arxiv.org/abs/2409.14205v1
- Date: Sat, 21 Sep 2024 17:40:48 GMT
- Title: Egocentric zone-aware action recognition across environments
- Authors: Simone Alberto Peirone, Gabriele Goletto, Mirco Planamente, Andrea Bottino, Barbara Caputo, Giuseppe Averta
- Abstract summary: Activity-centric zones can serve as a prior that helps vision models recognize human activities.
The appearance of these zones is scene-specific, which limits the transferability of this prior to unfamiliar areas and domains.
We decouple the domain-specific appearance of activity-centric zones from their domain-agnostic representations, and show how the latter improves the cross-domain transferability of Egocentric Action Recognition (EAR) models.
- Score: 17.67702928208351
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human activities exhibit a strong correlation between actions and the places where they are performed, such as washing something at a sink. More specifically, in daily living environments we may identify particular locations, hereinafter named activity-centric zones, which may afford a set of homogeneous actions. Knowledge of these zones can serve as a prior that helps vision models recognize human activities. However, the appearance of these zones is scene-specific, limiting the transferability of this prior information to unfamiliar areas and domains. This problem is particularly relevant in egocentric vision, where the environment takes up most of the image, making it even more difficult to separate the action from the context. In this paper, we discuss the importance of decoupling the domain-specific appearance of activity-centric zones from their universal, domain-agnostic representations, and show how the latter can improve the cross-domain transferability of Egocentric Action Recognition (EAR) models. We validate our solution on the EPIC-Kitchens-100 and Argo1M datasets.
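To make the decoupling idea concrete, here is a minimal sketch (our own construction, not the authors' released code) of one common way to obtain domain-agnostic zone features: a zone encoder whose output is pushed to be indistinguishable across domains by a gradient-reversal domain classifier. All module names and dimensions are invented for illustration.

```python
# Minimal sketch (not the authors' code): learning domain-agnostic
# activity-centric zone features via a gradient-reversal domain classifier.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses gradients (scaled by lam) on backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

class ZoneAwareEAR(nn.Module):
    # Dimensions are placeholders; n_actions=97 mirrors EPIC-Kitchens-100 verbs.
    def __init__(self, feat_dim=2048, zone_dim=256, n_actions=97, n_domains=2):
        super().__init__()
        self.zone_encoder = nn.Sequential(nn.Linear(feat_dim, zone_dim), nn.ReLU())
        self.action_head = nn.Linear(feat_dim + zone_dim, n_actions)
        # The domain head receives reversed gradients, so training it pushes
        # the zone features to become indistinguishable across domains.
        self.domain_head = nn.Linear(zone_dim, n_domains)

    def forward(self, clip_feat, lam=1.0):
        zone = self.zone_encoder(clip_feat)                 # zone prior
        action = self.action_head(torch.cat([clip_feat, zone], dim=-1))
        domain = self.domain_head(GradReverse.apply(zone, lam))
        return action, domain

model = ZoneAwareEAR()
feats = torch.randn(8, 2048)                                # e.g. pooled clip features
action_logits, domain_logits = model(feats)
```

Under this sketch, the adversarial signal erases domain-specific appearance from the zone features while the action head still benefits from the zone prior.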
Related papers
- Grounding 3D Scene Affordance From Egocentric Interactions [52.5827242925951] (2024-09-29)
Grounding 3D scene affordance aims to locate interactive regions in 3D environments.
We introduce a novel task: grounding 3D scene affordance from egocentric interactions.
- Grounded Affordance from Exocentric View [79.64064711636975] (2022-08-28)
Affordance grounding aims to locate the regions of objects that afford "action possibilities".
Because affordances are diverse and individuals are unique, the same object can be engaged through very different interactions.
Humans, however, can transform these varied exocentric interactions into an invariant egocentric affordance.
- Audio-Adaptive Activity Recognition Across Video Domains [112.46638682143065] (2022-03-27)
We leverage activity sounds for domain adaptation as they have less variance across domains and can reliably indicate which activities are not happening.
We propose an audio-adaptive encoder and associated learning methods that discriminatively adjust the visual feature representation.
We also introduce the new task of actor shift, with a corresponding audio-visual dataset, to challenge our method with situations where the activity appearance changes dramatically.
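One plausible reading of an audio-adaptive encoder, sketched below under our own assumptions (FiLM-style conditioning; the module and its parameters are hypothetical, not the paper's API): audio features predict a per-channel scale and shift for the visual representation.

```python
# Hedged sketch: FiLM-style audio conditioning of visual features, one plausible
# reading of an "audio-adaptive encoder" (names and sizes are ours, not the paper's).
import torch
import torch.nn as nn

class AudioAdaptiveEncoder(nn.Module):
    def __init__(self, vis_dim=1024, aud_dim=128):
        super().__init__()
        # Audio predicts a per-channel scale and shift for the visual features,
        # exploiting the fact that sounds vary less across visual domains.
        self.to_scale = nn.Linear(aud_dim, vis_dim)
        self.to_shift = nn.Linear(aud_dim, vis_dim)

    def forward(self, vis_feat, aud_feat):
        gamma = torch.sigmoid(self.to_scale(aud_feat))  # can suppress unlikely activities
        beta = self.to_shift(aud_feat)
        return gamma * vis_feat + beta

enc = AudioAdaptiveEncoder()
adapted = enc(torch.randn(4, 1024), torch.randn(4, 128))  # adapted visual features
```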
- Point-Level Region Contrast for Object Detection Pre-Training [147.47349344401806] (2022-02-09)
We present point-level region contrast, a self-supervised pre-training approach for the task of object detection.
Our approach performs contrastive learning by directly sampling individual point pairs from different regions.
Compared to an aggregated representation per region, our approach is more robust to changes in input region quality.
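The core objective can be illustrated with a standard InfoNCE loss over point pairs; this sketch is our own simplification, which assumes corresponding points across two augmented views are already matched row-by-row.

```python
# Sketch under our own assumptions: an InfoNCE loss over individual point
# features sampled from matching regions in two augmented views.
import torch
import torch.nn.functional as F

def point_region_contrast(points_a, points_b, temperature=0.1):
    """points_a, points_b: (N, D) features of corresponding points.
    Row i of each view forms a positive pair; all other rows act as negatives."""
    a = F.normalize(points_a, dim=-1)
    b = F.normalize(points_b, dim=-1)
    logits = a @ b.t() / temperature           # (N, N) similarity matrix
    targets = torch.arange(a.size(0))          # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

loss = point_region_contrast(torch.randn(64, 256), torch.randn(64, 256))
```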
- Regional Attention Network (RAN) for Head Pose and Fine-grained Gesture Recognition [9.131161856493486] (2021-01-17)
We propose a novel end-to-end Regional Attention Network (RAN), a fully convolutional neural network (CNN).
Our regions consist of one or more consecutive cells and are adapted from the strategies used in computing the HOG (Histogram of Oriented Gradients) descriptor.
The proposed approach outperforms the state-of-the-art by a considerable margin in different metrics.
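As a hedged illustration of the region construction (our own toy code, not the released RAN), consecutive cells of a flattened feature grid can be grouped into overlapping regions, echoing HOG's block-of-cells strategy, and then weighted by a learned attention score.

```python
# Illustrative sketch (our construction): group consecutive grid cells into
# overlapping regions, then attend over the pooled region descriptors.
import torch
import torch.nn as nn

class RegionalAttention(nn.Module):
    def __init__(self, dim=256, cells_per_region=2):
        super().__init__()
        self.k = cells_per_region
        self.score = nn.Linear(dim, 1)

    def forward(self, cells):                  # cells: (B, N, D) flattened grid
        B, N, D = cells.shape
        # Overlapping regions of k consecutive cells, mean-pooled.
        regions = torch.stack([cells[:, i:i + self.k].mean(1)
                               for i in range(N - self.k + 1)], dim=1)
        attn = torch.softmax(self.score(regions), dim=1)   # (B, R, 1)
        return (attn * regions).sum(1)                     # attended descriptor

ra = RegionalAttention()
desc = ra(torch.randn(2, 49, 256))   # e.g. a 7x7 CNN feature map, flattened
```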
- Adversarial Graph Representation Adaptation for Cross-Domain Facial Expression Recognition [86.25926461936412] (2020-08-03)
We propose a novel Adversarial Graph Representation Adaptation (AGRA) framework that unifies graph representation propagation with adversarial learning for cross-domain holistic-local feature co-adaptation.
We conduct extensive and fair experiments on several popular benchmarks and show that the proposed AGRA framework achieves superior performance over previous state-of-the-art methods.
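The graph-propagation half of such a framework can be sketched as a single GCN-style update over holistic and local nodes; this is our own minimal rendering, with a learned dense adjacency standing in for the paper's intra- and inter-domain graphs.

```python
# Minimal sketch of graph representation propagation: one GCN-style update that
# mixes holistic and local (landmark) node features through a learned adjacency.
import torch
import torch.nn as nn

class GraphPropagation(nn.Module):
    def __init__(self, n_nodes=8, dim=128):
        super().__init__()
        # Learned adjacency, initialized near the identity.
        self.adj = nn.Parameter(torch.eye(n_nodes) + 0.01 * torch.randn(n_nodes, n_nodes))
        self.proj = nn.Linear(dim, dim)

    def forward(self, nodes):                         # nodes: (B, n_nodes, D)
        mixed = torch.softmax(self.adj, dim=-1) @ nodes  # propagate across nodes
        return torch.relu(self.proj(mixed))

gp = GraphPropagation()
out = gp(torch.randn(4, 8, 128))   # e.g. 1 holistic + 7 landmark nodes
```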
- EGO-TOPO: Environment Affordances from Egocentric Video [104.77363598496133] (2020-01-14)
We introduce a model for environment affordances that is learned directly from egocentric video.
Our approach decomposes a space into a topological map derived from first-person activity.
On EPIC-Kitchens and EGTEA+, we demonstrate our approach for learning scene affordances and anticipating future actions in long-form video.
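A toy version of the topological-map idea, assuming zones have already been assigned to clips (e.g., by clustering visual features; the function and data format below are our invention): nodes collect the actions observed in each zone, and edges count observed transitions between zones.

```python
# Toy sketch: turn a first-person visit sequence into a topological map whose
# nodes are zones (with their afforded actions) and whose edges are transitions.
from collections import defaultdict

def build_topo_map(zone_sequence):
    """zone_sequence: list of (zone_id, action) per clip, in temporal order."""
    edges = defaultdict(int)       # (zone, next_zone) -> transition count
    actions_at = defaultdict(set)  # zone -> set of actions afforded there
    for zone, action in zone_sequence:
        actions_at[zone].add(action)
    for (zone, _), (next_zone, _) in zip(zone_sequence, zone_sequence[1:]):
        if next_zone != zone:
            edges[(zone, next_zone)] += 1   # a transition implies adjacency
    return dict(edges), dict(actions_at)

seq = [("sink", "wash"), ("sink", "rinse"), ("stove", "fry"), ("counter", "chop")]
edges, affordances = build_topo_map(seq)
# affordances["sink"] == {"wash", "rinse"}; edges[("sink", "stove")] == 1
```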
This list is automatically generated from the titles and abstracts of the papers on this site.