Rethinking Online Action Detection in Untrimmed Videos: A Novel Online Evaluation Protocol
- URL: http://arxiv.org/abs/2003.12041v1
- Date: Thu, 26 Mar 2020 17:13:55 GMT
- Title: Rethinking Online Action Detection in Untrimmed Videos: A Novel Online Evaluation Protocol
- Authors: Marcos Baptista Rios, Roberto J. López-Sastre, Fabian Caba Heilbron, Jan van Gemert, F. Javier Acevedo-Rodríguez, and S. Maldonado-Bascón
- Abstract summary: The Online Action Detection (OAD) problem needs to be revisited.
Unlike traditional offline action detection approaches, in the OAD setting we find very few works and no consensus on the evaluation protocols to be used.
In this work we propose to rethink the OAD scenario, clearly defining the problem itself and the main characteristics that a model must satisfy to be considered online.
- Score: 9.3576825415122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Online Action Detection (OAD) problem needs to be revisited. Unlike traditional offline action detection approaches, where the evaluation metrics are clear and well established, in the OAD setting we find very few works and no consensus on the evaluation protocols to be used. In this work we propose to rethink the OAD scenario, clearly defining the problem itself and the main characteristics that a model must satisfy to be considered online. We also introduce a novel metric: the Instantaneous Accuracy (IA). This new metric exhibits an online nature and solves most of the limitations of the previous metrics. We conduct a thorough experimental evaluation on 3 challenging datasets, where the performance of various baseline methods is compared to that of the state of the art. Our results confirm the problems of the previous evaluation protocols, and suggest that an IA-based protocol is better suited to the online scenario. The baseline models and a development kit with the novel evaluation protocol are publicly available: https://github.com/gramuah/ia.
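To make the metric concrete, below is a minimal Python sketch of an IA-style curve. It assumes the simplest reading of the metric, that IA(t) is the fraction of correctly labeled frames among all frames observed up to time t, so the score at any instant depends only on the past, never on unseen future frames. This is an illustrative sketch, not the authors' reference implementation; the development kit linked above provides the exact definition.

```python
import numpy as np

def instantaneous_accuracy(pred_labels, gt_labels):
    """Illustrative IA(t) curve: for each frame t, the fraction of the
    frames observed so far (0..t) whose predicted label matches the
    ground truth. See https://github.com/gramuah/ia for the official
    definition and implementation."""
    pred = np.asarray(pred_labels)
    gt = np.asarray(gt_labels)
    correct = (pred == gt).astype(float)
    # Cumulative accuracy over the portion of the video seen up to t:
    # only past and present frames are used, which is what makes the
    # metric reportable at any moment of the stream.
    return np.cumsum(correct) / np.arange(1, len(correct) + 1)

# Toy usage: a 6-frame clip with background (0) and one action class (1).
gt = [0, 0, 1, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0]
print(instantaneous_accuracy(pred, gt))
# -> approximately [1.0, 0.5, 0.667, 0.75, 0.6, 0.667]
```

The point of this design is that the curve is available at every instant of the video, rather than only after the full (possibly untrimmed and unbounded) sequence has been observed.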
Related papers
- Rethinking Affect Analysis: A Protocol for Ensuring Fairness and Consistency [24.737468736951374]
We propose a unified protocol for database partitioning that ensures fairness and comparability.
We provide detailed demographic annotations (in terms of race, gender and age), evaluation metrics, and a common framework for expression recognition.
We also rerun the methods with the new protocol and introduce new leaderboards to encourage future research in affect recognition with a fairer comparison.
arXiv Detail & Related papers (2024-08-04T23:21:46Z)
- Position: Quo Vadis, Unsupervised Time Series Anomaly Detection? [11.269007806012931]
The current state of machine learning scholarship in Timeseries Anomaly Detection (TAD) is plagued by the persistent use of flawed evaluation metrics.
Our paper presents a critical analysis of the status quo in TAD, revealing the misleading track of current research.
arXiv Detail & Related papers (2024-05-04T14:43:31Z)
- PREGO: online mistake detection in PRocedural EGOcentric videos [49.72812518471056]
We propose PREGO, the first online one-class classification model for mistake detection in egocentric videos.
PREGO pairs an online action recognition component, which models the current action, with a symbolic reasoning module that predicts the next actions; a mismatch between the recognized and the anticipated action signals a mistake (see the sketch after this list).
We evaluate PREGO on two procedural egocentric video datasets, Assembly101 and Epic-tent, which we adapt for online benchmarking of procedural mistake detection.
arXiv Detail & Related papers (2024-04-02T13:27:28Z)
- Cobra Effect in Reference-Free Image Captioning Metrics [58.438648377314436]
A proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged.
In this paper, we study if there are any deficiencies in reference-free metrics.
We employ GPT-4V as an evaluative tool to assess generated sentences, and the results reveal that our approach achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-02-18T12:36:23Z)
- A Study of Unsupervised Evaluation Metrics for Practical and Automatic Domain Adaptation [15.728090002818963]
Unsupervised domain adaptation (UDA) methods facilitate the transfer of models to target domains without labels.
In this paper, we aim to find an evaluation metric capable of assessing the quality of a transferred model without access to target validation labels.
arXiv Detail & Related papers (2023-08-01T05:01:05Z)
- On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z)
- Offline Evaluation of Reward-Optimizing Recommender Systems: The Case of Simulation [11.940733431087102]
In academic and industry-based research, online evaluation methods are seen as the gold standard for interactive applications like recommender systems.
Online evaluation methods are costly for a number of reasons, and a clear need remains for reliable offline evaluation procedures.
In academic work, limited access to online systems makes offline metrics the de facto approach to validating novel methods.
arXiv Detail & Related papers (2022-09-18T20:03:32Z)
- Towards Online Domain Adaptive Object Detection [79.89082006155135]
Existing object detection models assume both the training and test data are sampled from the same source domain.
We propose a novel unified adaptation framework that adapts and improves generalization on the target domain in online settings.
arXiv Detail & Related papers (2022-04-11T17:47:22Z)
- Do Offline Metrics Predict Online Performance in Recommender Systems? [79.48653445643865]
We investigate the extent to which offline metrics predict online performance by evaluating recommenders across six simulated environments.
We observe that offline metrics are correlated with online performance over a range of environments.
We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.
arXiv Detail & Related papers (2020-11-07T01:41:13Z)
- AliExpress Learning-To-Rank: Maximizing Online Model Performance without Going Online [60.887637616379926]
This paper proposes an evaluator-generator framework for learning-to-rank.
It consists of an evaluator that generalizes to evaluate recommendations involving the context, and a generator that maximizes the evaluator score by reinforcement learning.
Our method achieves a significant improvement in terms of Conversion Rate (CR) over the industrial-level fine-tuned model in online A/B tests.
arXiv Detail & Related papers (2020-03-25T10:27:44Z)
- The Instantaneous Accuracy: a Novel Metric for the Problem of Online Human Behaviour Recognition in Untrimmed Videos [9.3576825415122]
We introduce a novel online metric, the Instantaneous Accuracy (IA), that exhibits an online nature.
Our results confirm the problems of previous evaluation protocols, and suggest that an IA-based protocol is better suited to the online scenario for human behaviour understanding.
arXiv Detail & Related papers (2020-03-22T19:04:05Z)
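Referring back to the PREGO entry above, the following is a hypothetical sketch of its mismatch-based detection rule: one component recognizes the action at the current step, another anticipates the next action from the steps already seen, and a disagreement between the two is flagged as a mistake. All names, signatures, and the toy transition table below are assumptions for illustration, not PREGO's actual code.

```python
from typing import Callable, List, Sequence

def detect_mistakes(
    steps: Sequence[str],
    recognize: Callable[[Sequence[str], int], str],
    predict_next: Callable[[Sequence[str]], str],
) -> List[int]:
    """Flag step t as a mistake when the action recognized at t differs
    from the action anticipated from the history of steps before t.
    `recognize` and `predict_next` stand in for the recognition and
    symbolic-reasoning components; both are illustrative placeholders."""
    mistakes = []
    for t in range(1, len(steps)):
        expected = predict_next(steps[:t])   # what should come next
        observed = recognize(steps, t)       # what actually happened
        if observed != expected:
            mistakes.append(t)
    return mistakes

# Toy usage with trivial stand-ins for both components.
procedure = ["pick", "align", "screw", "align"]
recognized = lambda seq, t: seq[t]
expected_next = {"pick": "align", "align": "screw", "screw": "inspect"}
anticipate = lambda history: expected_next.get(history[-1], "unknown")
print(detect_mistakes(procedure, recognized, anticipate))  # -> [3]
```

Because only the history up to step t is consulted, the rule runs online; treating any divergence from the anticipated step as an error is what makes it one-class, requiring no examples of mistakes at training time.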