The Instantaneous Accuracy: a Novel Metric for the Problem of Online
Human Behaviour Recognition in Untrimmed Videos
- URL: http://arxiv.org/abs/2003.09970v2
- Date: Wed, 25 Mar 2020 10:06:37 GMT
- Title: The Instantaneous Accuracy: a Novel Metric for the Problem of Online
Human Behaviour Recognition in Untrimmed Videos
- Authors: Marcos Baptista Rios, Roberto J. López-Sastre, Fabian Caba Heilbron,
Jan van Gemert, Francisco Javier Acevedo-Rodríguez, and Saturnino
Maldonado-Bascón
- Abstract summary: We introduce a novel online metric, the Instantaneous Accuracy ($IA$), that exhibits an online nature.
Our results confirm the problems of previous evaluation protocols, and suggest that an IA-based protocol is better suited to the online scenario for human behaviour understanding.
- Score: 9.3576825415122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of Online Human Behaviour Recognition in untrimmed videos, aka
Online Action Detection (OAD), needs to be revisited. Unlike traditional
offline action detection approaches, where the evaluation metrics are clear and
well established, in the OAD setting we find few works and no consensus on the
evaluation protocols to be used. In this paper we introduce a novel online
metric, the Instantaneous Accuracy ($IA$), that exhibits an online
nature, solving most of the limitations of the previous (offline) metrics. We
conduct a thorough experimental evaluation on TVSeries dataset, comparing the
performance of various baseline methods to the state of the art. Our results
confirm the problems of previous evaluation protocols, and suggest that an
IA-based protocol is better suited to the online scenario for human behaviour
understanding. Code for the metric is available at https://github.com/gramuah/ia.
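To make the idea concrete, here is a minimal sketch of an online cumulative accuracy, assuming $IA(t)$ is simply the fraction of frames predicted correctly among the first t frames of the stream. This is an illustration of the general idea of an online, per-frame metric, not the paper's official definition, which may weight frames or the background class differently; see the linked repository for the authors' implementation.

```python
def instantaneous_accuracy(preds, labels):
    """Return IA(t) for every time step t: the fraction of frames
    predicted correctly among the first t frames seen so far.

    Unlike an offline metric, this can be reported at any point
    during the stream without access to future frames."""
    assert len(preds) == len(labels)
    correct = 0
    ia = []
    for t, (p, y) in enumerate(zip(preds, labels), start=1):
        correct += (p == y)
        ia.append(correct / t)  # accuracy over the prefix of length t
    return ia
```

For example, `instantaneous_accuracy([0, 1, 1, 0], [0, 1, 0, 0])` yields `[1.0, 1.0, 0.666..., 0.75]`: the metric reacts immediately to the mistake at the third frame, which is exactly the online behaviour an offline average would hide.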
Related papers
- Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
Commit message generation is a crucial task in software engineering that is challenging to evaluate correctly.
We use an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments.
Our results indicate that edit distance exhibits the highest correlation, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation.
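As a rough illustration of the edit-distance signal described above, the following sketch computes plain Levenshtein distance between a generated and a committed message. This is a stand-in for the paper's online "number of edits" metric, whose exact edit model is not specified here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b
    (classic dynamic-programming formulation, O(len(a) * len(b)))."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                   # delete ca
                           cur[j - 1] + 1,                # insert cb
                           prev[j - 1] + (ca != cb)))     # substitute
        prev = cur
    return prev[len(b)]
```

Under this proxy, a generator whose messages need fewer edits before commit (lower distance to the final message) would score better, matching the correlation the paper reports.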
arXiv Detail & Related papers (2024-10-15T20:32:07Z)
- Rethinking Affect Analysis: A Protocol for Ensuring Fairness and Consistency [24.737468736951374]
We propose a unified protocol for database partitioning that ensures fairness and comparability.
We provide detailed demographic annotations (in terms of race, gender and age), evaluation metrics, and a common framework for expression recognition.
We also rerun the methods with the new protocol and introduce new leaderboards to encourage future research in affect recognition with a fairer comparison.
arXiv Detail & Related papers (2024-08-04T23:21:46Z)
- PREGO: online mistake detection in PRocedural EGOcentric videos [49.72812518471056]
We propose PREGO, the first online one-class classification model for mistake detection in egocentric videos.
PREGO combines an online action recognition component, which models the current action, with a symbolic reasoning module, which predicts the next actions.
We evaluate PREGO on two procedural egocentric video datasets, Assembly101 and Epic-tent, which we adapt for online benchmarking of procedural mistake detection.
arXiv Detail & Related papers (2024-04-02T13:27:28Z)
- Cobra Effect in Reference-Free Image Captioning Metrics [58.438648377314436]
A proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged.
In this paper, we study whether reference-free metrics have deficiencies.
We employ GPT-4V as an evaluative tool to assess generated sentences, and the results reveal that our approach achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-02-18T12:36:23Z)
- Rapid Adaptation in Online Continual Learning: Are We Evaluating It Right? [135.71855998537347]
We revisit the common practice of evaluating adaptation of Online Continual Learning (OCL) algorithms through the metric of online accuracy.
We show that this metric is unreliable, as even vacuous blind classifiers can achieve unrealistically high online accuracy.
Existing OCL algorithms can also achieve high online accuracy, but perform poorly in retaining useful information.
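The failure mode described above is easy to reproduce: on a temporally correlated label stream, a "blind" predictor that simply repeats the last label it saw scores high online accuracy while learning nothing from the inputs. The sketch below is a hypothetical illustration of that effect, not code from the cited paper.

```python
def online_accuracy(predict, labels):
    """Run a label-only predictor over a stream of labels and return
    its cumulative online accuracy. The predictor never sees inputs,
    only the previous ground-truth label."""
    correct, prev = 0, None
    for y in labels:
        correct += (predict(prev) == y)
        prev = y
    return correct / len(labels)

# A stream with long runs of the same class (10 frames per class).
stream = [c for c in range(5) for _ in range(10)]
blind = lambda prev: prev  # predicts whatever label it saw last
acc = online_accuracy(blind, stream)
```

Here the blind predictor is wrong only at the first frame and at the four class transitions, so `acc` is 45/50 = 0.9 despite ignoring the input entirely, which is exactly why online accuracy alone is an unreliable measure of adaptation.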
arXiv Detail & Related papers (2023-05-16T08:29:33Z)
- Offline Evaluation of Reward-Optimizing Recommender Systems: The Case of Simulation [11.940733431087102]
In academic and industry-based research, online evaluation methods are seen as the gold standard for interactive applications such as recommender systems.
Online evaluation methods are costly for a number of reasons, and a clear need remains for reliable offline evaluation procedures.
In academic work, limited access to online systems makes offline metrics the de facto approach to validating novel methods.
arXiv Detail & Related papers (2022-09-18T20:03:32Z)
- Do Offline Metrics Predict Online Performance in Recommender Systems? [79.48653445643865]
We investigate the extent to which offline metrics predict online performance by evaluating recommenders across six simulated environments.
We observe that offline metrics are correlated with online performance over a range of environments.
We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.
arXiv Detail & Related papers (2020-11-07T01:41:13Z)
- Rethinking Online Action Detection in Untrimmed Videos: A Novel Online Evaluation Protocol [9.3576825415122]
The Online Action Detection (OAD) problem needs to be revisited.
Unlike traditional offline action detection approaches, in the OAD setting we find very few works and no consensus on the evaluation protocols to be used.
In this work we propose to rethink the OAD scenario, clearly defining the problem itself and the main characteristics that models considered online must satisfy.
arXiv Detail & Related papers (2020-03-26T17:13:55Z)
- AliExpress Learning-To-Rank: Maximizing Online Model Performance without Going Online [60.887637616379926]
This paper proposes an evaluator-generator framework for learning-to-rank.
It consists of an evaluator that generalizes to assess recommendations in context, and a generator that maximizes the evaluator's score via reinforcement learning.
Our method achieves a significant improvement in terms of Conversion Rate (CR) over the industrial-level fine-tuned model in online A/B tests.
arXiv Detail & Related papers (2020-03-25T10:27:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.