Deep Generative Multi-Agent Imitation Model as a Computational Benchmark
for Evaluating Human Performance in Complex Interactive Tasks: A Case Study
in Football
- URL: http://arxiv.org/abs/2303.13323v1
- Date: Thu, 23 Mar 2023 15:01:52 GMT
- Authors: Chaoyi Gu, Varuna De Silva
- Abstract summary: We train a generative model based on a Conditional Variational Recurrent Neural Network (VRNN) on a large player and ball tracking dataset.
The trained Conditional VRNN Model is used as a benchmark to evaluate team performance.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evaluating human performance is a common need across many
applications, such as engineering and sports. When evaluating human
performance in complex and interactive tasks, the most common approach is
to use a metric that has been proven effective for that context, or to use
subjective measurement techniques. However, this can be an error-prone and
unreliable process, since static metrics cannot capture all the complex contexts
associated with such tasks and biases exist in subjective measurement. The
objective of our research is to create data-driven AI agents as computational
benchmarks to evaluate human performance in solving difficult tasks involving
multiple humans and contextual factors. We demonstrate this within the context
of football performance analysis. We train a generative model based on a
Conditional Variational Recurrent Neural Network (VRNN) on a large player
and ball tracking dataset. The trained model is used to imitate the
interactions between two teams and predict the performance of each team. The
trained Conditional VRNN model is then used as a benchmark to evaluate team
performance. Experimental results on a Premier League football dataset
demonstrate the usefulness of our method relative to an existing state-of-the-art
static metric used in football analytics.
Related papers
- Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models [115.501751261878]
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice.
We investigate whether we can go beyond human data on tasks where we have access to scalar feedback.
We find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data.
arXiv Detail & Related papers (2023-12-11T18:17:43Z)
- An Information-Theoretic Approach for Estimating Scenario Generalization in Crowd Motion Prediction [27.10815774845461]
We propose a novel scoring method, which characterizes generalization of models trained on source crowd scenarios and applied to target crowd scenarios.
The Interaction component aims to characterize the difficulty of scenario domains, while the diversity of a scenario domain is captured in the Diversity score.
Our experimental results validate the efficacy of the proposed method on several simulated and real-world (source,target) generalization tasks.
arXiv Detail & Related papers (2022-11-02T01:39:30Z)
- JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions [75.42526766746515]
We propose a new commonsense reasoning dataset based on human Interactive Fiction (IF) gameplay walkthroughs.
Our dataset focuses on the assessment of functional commonsense knowledge rules rather than factual knowledge.
Experiments show that the introduced dataset is challenging for previous machine reading models as well as for new large language models.
arXiv Detail & Related papers (2022-10-18T19:20:53Z)
- Benchmarking Learning Efficiency in Deep Reservoir Computing [23.753943709362794]
We introduce a benchmark of increasingly difficult tasks together with a data efficiency metric to measure how quickly machine learning models learn from training data.
We compare the learning speed of some established sequential supervised models, such as RNNs, LSTMs, or Transformers, with lesser-known alternative models based on reservoir computing.
arXiv Detail & Related papers (2022-09-29T08:16:52Z)
- Active Learning of Ordinal Embeddings: A User Study on Football Data [4.856635699699126]
Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function.
This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset.
arXiv Detail & Related papers (2022-07-26T07:55:23Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data [66.11139091362078]
We provide the first model selection results on large pretrained Transformers from Huggingface using generalization metrics.
Despite their niche status, we find that metrics derived from the heavy-tail (HT) perspective are particularly useful in NLP tasks.
arXiv Detail & Related papers (2022-02-06T20:07:35Z)
- How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers [86.36020260204302]
We propose a new benchmarking protocol to evaluate both end-to-end efficiency and data-addition training efficiency.
A human study is conducted to show that our evaluation protocol matches human tuning behavior better than random search.
We then apply the proposed benchmarking framework to 7 optimizers and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining.
arXiv Detail & Related papers (2020-10-19T21:46:39Z)
- Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop a large scale interaction-centric benchmark TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.