The Instantaneous Accuracy: a Novel Metric for the Problem of Online
Human Behaviour Recognition in Untrimmed Videos
- URL: http://arxiv.org/abs/2003.09970v2
- Date: Wed, 25 Mar 2020 10:06:37 GMT
- Title: The Instantaneous Accuracy: a Novel Metric for the Problem of Online
Human Behaviour Recognition in Untrimmed Videos
- Authors: Marcos Baptista Rios, Roberto J. López-Sastre, Fabian Caba Heilbron,
  Jan van Gemert, Francisco Javier Acevedo-Rodríguez, and Saturnino
  Maldonado-Bascón
- Abstract summary: We introduce a novel online metric, the Instantaneous Accuracy ($IA$), that exhibits an \emph{online} nature.
Our results confirm the problems of previous evaluation protocols, and suggest that an IA-based protocol is better suited to the online scenario for human behaviour understanding.
- Score: 9.3576825415122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of Online Human Behaviour Recognition in untrimmed videos, aka
Online Action Detection (OAD), needs to be revisited. Unlike traditional
offline action detection approaches, where the evaluation metrics are clear and
well established, in the OAD setting we find few works and no consensus on the
evaluation protocols to be used. In this paper we introduce a novel online
metric, the Instantaneous Accuracy ($IA$), that exhibits an \emph{online}
nature, solving most of the limitations of the previous (offline) metrics. We
conduct a thorough experimental evaluation on TVSeries dataset, comparing the
performance of various baseline methods to the state of the art. Our results
confirm the problems of previous evaluation protocols, and suggest that an
IA-based protocol is more adequate to the online scenario for human behaviour
understanding. Code for the metric is available at https://github.com/gramuah/ia
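The abstract does not reproduce the formula for $IA$; for intuition, a minimal sketch of one plausible reading is shown below, where the instantaneous accuracy at time $t$ is the frame-level accuracy computed over all frames observed up to $t$. This is an illustrative assumption, not the paper's exact definition (see the linked repository for the official implementation).

```python
def instantaneous_accuracy(predictions, ground_truth):
    """Return a curve IA(t): frame-level accuracy over frames 1..t.

    This is an illustrative, assumed reading of an "online" accuracy;
    the authoritative definition is in the paper and its code release.
    """
    assert len(predictions) == len(ground_truth)
    ia_curve = []
    correct = 0
    for t, (pred, gt) in enumerate(zip(predictions, ground_truth), start=1):
        correct += int(pred == gt)    # online update: one frame at a time
        ia_curve.append(correct / t)  # accuracy over everything seen so far
    return ia_curve

# Example: per-frame labels, background (0) vs. action (1)
preds = [0, 0, 1, 1, 0, 1]
gts   = [0, 1, 1, 1, 0, 0]
curve = instantaneous_accuracy(preds, gts)
print(curve)
```

The key property such a metric captures is that it can be evaluated at any instant during the video, rather than only after the full (untrimmed) sequence has been processed.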
Related papers
- PREGO: online mistake detection in PRocedural EGOcentric videos [49.72812518471056]
We propose PREGO, the first online one-class classification model for mistake detection in egocentric videos.
PREGO is based on an online action recognition component to model the current action, and a symbolic reasoning module to predict the next actions.
We evaluate PREGO on two procedural egocentric video datasets, Assembly101 and Epic-tent, which we adapt for online benchmarking of procedural mistake detection.
arXiv Detail & Related papers (2024-04-02T13:27:28Z) - Cobra Effect in Reference-Free Image Captioning Metrics [58.438648377314436]
A proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged.
In this paper, we study if there are any deficiencies in reference-free metrics.
We employ GPT-4V as an evaluative tool to assess generated sentences, and the results reveal that our approach achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-02-18T12:36:23Z) - A Study of Unsupervised Evaluation Metrics for Practical and Automatic
Domain Adaptation [15.728090002818963]
Unsupervised domain adaptation (UDA) methods facilitate the transfer of models to target domains without labels.
In this paper, we aim to find an evaluation metric capable of assessing the quality of a transferred model without access to target validation labels.
arXiv Detail & Related papers (2023-08-01T05:01:05Z) - Beyond AUROC & co. for evaluating out-of-distribution detection
performance [50.88341818412508]
Given their relevance for safe(r) AI, it is important to examine whether the basis for comparing OOD detection methods is consistent with practical needs.
We propose a new metric - Area Under the Threshold Curve (AUTC), which explicitly penalizes poor separation between ID and OOD samples.
arXiv Detail & Related papers (2023-06-26T12:51:32Z) - Rapid Adaptation in Online Continual Learning: Are We Evaluating It
Right? [135.71855998537347]
We revisit the common practice of evaluating adaptation of Online Continual Learning (OCL) algorithms through the metric of online accuracy.
We show that this metric is unreliable, as even vacuous blind classifiers can achieve unrealistically high online accuracy.
Existing OCL algorithms can also achieve high online accuracy, but perform poorly in retaining useful information.
arXiv Detail & Related papers (2023-05-16T08:29:33Z) - Offline Evaluation of Reward-Optimizing Recommender Systems: The Case of
Simulation [11.940733431087102]
In academic and industry-based research, online evaluation methods are seen as the gold standard for interactive applications like recommendation systems.
Online evaluation methods are costly for a number of reasons, and a clear need remains for reliable offline evaluation procedures.
In academic work, limited access to online systems makes offline metrics the de facto approach to validating novel methods.
arXiv Detail & Related papers (2022-09-18T20:03:32Z) - Do Offline Metrics Predict Online Performance in Recommender Systems? [79.48653445643865]
We investigate the extent to which offline metrics predict online performance by evaluating recommenders across six simulated environments.
We observe that offline metrics are correlated with online performance over a range of environments.
We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.
arXiv Detail & Related papers (2020-11-07T01:41:13Z) - Rethinking Online Action Detection in Untrimmed Videos: A Novel Online
Evaluation Protocol [9.3576825415122]
The Online Action Detection (OAD) problem needs to be revisited.
Unlike traditional offline action detection approaches, in the OAD setting we find very few works and no consensus on the evaluation protocols to be used.
In this work we propose to rethink the OAD scenario, clearly defining the problem itself and the main characteristics that the models which are considered online must comply with.
arXiv Detail & Related papers (2020-03-26T17:13:55Z) - AliExpress Learning-To-Rank: Maximizing Online Model Performance without
Going Online [60.887637616379926]
This paper proposes an evaluator-generator framework for learning-to-rank.
It consists of an evaluator that generalizes to evaluate recommendations involving the context, and a generator that maximizes the evaluator score by reinforcement learning.
Our method achieves a significant improvement in terms of Conversion Rate (CR) over the industrial-level fine-tuned model in online A/B tests.
arXiv Detail & Related papers (2020-03-25T10:27:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.