Bringing Online Egocentric Action Recognition into the wild
- URL: http://arxiv.org/abs/2211.03004v1
- Date: Sun, 6 Nov 2022 01:41:02 GMT
- Title: Bringing Online Egocentric Action Recognition into the wild
- Authors: Gabriele Goletto, Mirco Planamente, Barbara Caputo and Giuseppe Averta
- Abstract summary: We set the boundaries that egocentric vision models should consider for realistic applications.
We present a new model-agnostic technique that enables the rapid repurposing of existing architectures.
- Score: 18.02166620265241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To enable safe and effective human-robot cooperation, it is crucial to
develop models for the identification of human activities. Egocentric vision
seems to be a viable solution to solve this problem, and therefore many works
provide deep learning solutions to infer human actions from first person
videos. However, although very promising, most of these do not consider the
major challenges that come with a realistic deployment, such as the
portability of the model, the need for real-time inference, and the robustness
to novel domains (i.e., new spaces, users, tasks). With this
paper, we set the boundaries that egocentric vision models should consider for
realistic applications, defining a novel setting of egocentric action
recognition in the wild, which encourages researchers to develop novel,
application-aware solutions. We also present a new model-agnostic technique
that enables the rapid repurposing of existing architectures in this new
context, demonstrating the feasibility of deploying a model on a tiny device
(Jetson Nano) and of performing the task directly on the edge with very low
energy consumption (2.4 W on average at 50 fps).
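The real-time constraint above (50 fps on a Jetson Nano) implies a per-frame latency budget of at most 20 ms. A minimal, illustrative sketch of how such a budget can be checked follows; the `classify` function is a placeholder standing in for an actual egocentric action-recognition model, not code from the paper:

```python
import time

def classify(frame):
    # Placeholder for an egocentric action-recognition model;
    # here it just does trivial per-frame arithmetic to simulate work.
    return sum(frame) % 10

def measure_fps(frames, model):
    """Time per-frame inference and return (avg_latency_seconds, fps)."""
    start = time.perf_counter()
    for frame in frames:
        model(frame)
    elapsed = time.perf_counter() - start
    avg_latency = elapsed / len(frames)
    return avg_latency, 1.0 / avg_latency

# 100 fake frames of 1000 "pixel" values each
frames = [list(range(1000)) for _ in range(100)]
latency, fps = measure_fps(frames, classify)
print(f"{latency * 1e3:.3f} ms/frame, {fps:.1f} fps")
```

In a real deployment the measured fps would be compared against the 50 fps target to decide whether the model fits the edge device's budget.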
Related papers
- Evaluating the Effectiveness of Video Anomaly Detection in the Wild: Online Learning and Inference for Real-world Deployment [2.1374208474242815]
Video Anomaly Detection (VAD) identifies unusual activities in video streams, a key technology with broad applications ranging from surveillance to healthcare.
Tackling VAD in real-life settings poses significant challenges due to the dynamic nature of human actions, environmental variations, and domain shifts.
Online learning is a potential strategy to mitigate this issue by allowing models to adapt to new information continuously.
arXiv Detail & Related papers (2024-04-29T14:47:32Z)
- Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households [30.33911147366425]
Smart Help aims to provide proactive yet adaptive support to human agents with diverse disabilities.
We introduce an innovative opponent modeling module that provides a nuanced understanding of the main agent's capabilities and goals.
Our findings illustrate the potential of AI-imbued assistive robots in improving the well-being of vulnerable groups.
arXiv Detail & Related papers (2024-04-13T13:03:59Z)
- Self-supervised novel 2D view synthesis of large-scale scenes with efficient multi-scale voxel carving [77.07589573960436]
We introduce an efficient multi-scale voxel carving method to generate novel views of real scenes.
Our final high-resolution output is efficiently self-trained on data automatically generated by the voxel carving module.
We demonstrate the effectiveness of our method on highly complex and large-scale scenes in real environments.
arXiv Detail & Related papers (2023-06-26T13:57:05Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained with huge multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z)
- Action Transformer: A Self-Attention Model for Short-Time Human Action Recognition [5.123810256000945]
Action Transformer (AcT) is a self-attentional architecture that consistently outperforms more elaborate networks that mix convolutional, recurrent, and attentive layers.
AcT exploits 2D pose representations over small temporal windows, providing a low latency solution for accurate and effective real-time performance.
arXiv Detail & Related papers (2021-07-01T16:53:16Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
- Model-Based Visual Planning with Self-Supervised Functional Distances [104.83979811803466]
We present a self-supervised method for model-based visual goal reaching.
Our approach learns entirely using offline, unlabeled data.
We find that this approach substantially outperforms both model-free and model-based prior methods.
arXiv Detail & Related papers (2020-12-30T23:59:09Z)
- Real-time Active Vision for a Humanoid Soccer Robot Using Deep Reinforcement Learning [0.8701566919381223]
We present an active vision method using a deep reinforcement learning approach for a humanoid soccer-playing robot.
The proposed method adaptively optimises the viewpoint of the robot to acquire the most useful landmarks for self-localisation.
arXiv Detail & Related papers (2020-11-27T17:29:48Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
- Active Reward Learning for Co-Robotic Vision Based Exploration in Bandwidth Limited Environments [40.47144302684855]
We present a novel POMDP problem formulation for a robot that must autonomously decide where to go to collect new and scientifically relevant images.
We derive constraints and design principles for the observation model, reward model, and communication strategy of such a robot.
We introduce a novel active reward learning strategy based on making queries to help the robot minimize path "regret" online.
arXiv Detail & Related papers (2020-03-10T21:57:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.