Evaluation metrics for behaviour modeling
- URL: http://arxiv.org/abs/2007.12298v1
- Date: Thu, 23 Jul 2020 23:47:24 GMT
- Title: Evaluation metrics for behaviour modeling
- Authors: Daniel Jiwoong Im, Iljung Kwak, Kristin Branson
- Abstract summary: We propose and investigate metrics for evaluating and comparing generative models of behavior learned using imitation learning.
These criteria look at longer temporal relationships in behavior, are relevant if behavior has some properties that are inherently unpredictable, and highlight biases in the overall distribution of behaviors produced by the model.
We show that the proposed metrics correspond with biologists' intuitions about behavior, and allow us to evaluate models, understand their biases, and propose new research directions.
- Score: 2.616915680939834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A primary difficulty with unsupervised discovery of structure in large data
sets is a lack of quantitative evaluation criteria. In this work, we propose
and investigate several metrics for evaluating and comparing generative models
of behavior learned using imitation learning. Compared to the commonly-used
model log-likelihood, these criteria look at longer temporal relationships in
behavior, are relevant if behavior has some properties that are inherently
unpredictable, and highlight biases in the overall distribution of behaviors
produced by the model. Pointwise metrics compare real to model-predicted
trajectories given true past information. Distribution metrics compare
statistics of the model simulating behavior in open loop, and are inspired by
how experimental biologists evaluate the effects of manipulations on animal
behavior. We show that the proposed metrics correspond with biologists'
intuitions about behavior, and allow us to evaluate models, understand their
biases, and propose new research directions.
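For concreteness, here is a minimal sketch of the two metric families in Python (not the paper's implementation; the trajectory layout, the choice of L2 error, the summary statistic, and the Jensen-Shannon histogram comparison are all assumptions):

```python
import numpy as np

def pointwise_error(real_traj, predict_next):
    """Pointwise metric: predict each next state from the *true* past and
    compare it to the real next state (L2 error here)."""
    errors = []
    for t in range(1, len(real_traj)):
        pred = predict_next(real_traj[:t])   # model conditions on true history
        errors.append(np.linalg.norm(real_traj[t] - pred))
    return float(np.mean(errors))

def distribution_distance(real_trajs, sim_trajs, statistic, bins=50):
    """Distribution metric: pool a behaviour statistic (e.g. speed) over real
    vs. open-loop simulated trajectories and compare the two histograms with
    the Jensen-Shannon divergence."""
    real = np.concatenate([statistic(tr) for tr in real_trajs])
    sim = np.concatenate([statistic(tr) for tr in sim_trajs])
    lo, hi = min(real.min(), sim.min()), max(real.max(), sim.max())
    p, _ = np.histogram(real, bins=bins, range=(lo, hi))
    q, _ = np.histogram(sim, bins=bins, range=(lo, hi))
    p = (p + 1e-9) / (p + 1e-9).sum()       # smooth and normalize
    q = (q + 1e-9) / (q + 1e-9).sum()
    m = 0.5 * (p + q)
    return float(0.5 * np.sum(p * np.log(p / m)) + 0.5 * np.sum(q * np.log(q / m)))
```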
Related papers
- Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
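For concreteness, a minimal sketch of the difference-in-differences idea (variable names and data layout are illustrative; the paper's estimator is more involved):

```python
import numpy as np

def did_memorisation(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences: the change in per-instance scores (e.g.
    log-likelihoods) for models trained on the instance, minus the change
    for models that never saw it.  Under the parallel-trends assumption,
    this difference is the causal effect of training on that instance."""
    treated_change = np.mean(treated_after) - np.mean(treated_before)
    control_change = np.mean(control_after) - np.mean(control_before)
    return treated_change - control_change
```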
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
- Faithful Model Evaluation for Model-Based Metrics [22.753929098534403]
We establish the mathematical foundation of significance testing for model-based metrics.
We show that considering metric model errors to calculate sample variances for model-based metrics changes the conclusions in certain experiments.
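As a rough illustration of significance testing for a model-based metric, here is a generic paired bootstrap over evaluation items (this is not the paper's method, which derives analytic variance terms that also account for metric-model error):

```python
import numpy as np

def paired_bootstrap_pvalue(scores_a, scores_b, n_boot=10_000, seed=0):
    """Paired bootstrap: resample evaluation items and ask how often the
    sign of the mean metric difference between two systems flips."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a) - np.asarray(scores_b)
    observed = diffs.mean()
    n = len(diffs)
    flips = 0
    for _ in range(n_boot):
        sample = diffs[rng.integers(0, n, size=n)]  # resample with replacement
        if np.sign(sample.mean()) != np.sign(observed):
            flips += 1
    return flips / n_boot
```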
arXiv Detail & Related papers (2023-12-19T19:41:33Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
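A toy sketch of the reweighting idea: give each example a weight proportional to p(f)p(y)/p(f,y) so that a single lexical feature f becomes independent of the label y under the weighted distribution (the paper instead optimizes weights over thousands of correlations jointly):

```python
import numpy as np
from collections import Counter

def decorrelating_weights(features, labels):
    """Importance weights that make feature f and label y independent
    under the reweighted empirical distribution: w = p(f)p(y)/p(f,y)."""
    n = len(labels)
    p_f = Counter(features)
    p_y = Counter(labels)
    p_fy = Counter(zip(features, labels))
    weights = np.array([
        (p_f[f] / n) * (p_y[y] / n) / (p_fy[(f, y)] / n)
        for f, y in zip(features, labels)
    ])
    return weights / weights.mean()  # normalize to mean 1
```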
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- Incorporating Heterogeneous User Behaviors and Social Influences for Predictive Analysis [32.31161268928372]
We aim to incorporate heterogeneous user behaviors and social influences for behavior predictions.
This paper proposes a variant of Long Short-Term Memory (LSTM) that can consider context information while modeling a behavior sequence.
A residual learning-based decoder is designed to automatically construct multiple high-order cross features based on social behavior representation.
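A hedged PyTorch sketch of such an architecture, with context concatenated to each step's input and a residual decoder on top (layer sizes and the cross-feature construction are assumptions, not the paper's exact design):

```python
import torch
import torch.nn as nn

class ContextLSTM(nn.Module):
    """Sketch: an LSTM that conditions on context at every step by
    concatenating a context vector to each behaviour embedding, followed
    by a residual-learning decoder."""
    def __init__(self, behav_dim, ctx_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(behav_dim + ctx_dim, hidden_dim, batch_first=True)
        self.residual = nn.Linear(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, behav_dim)

    def forward(self, behaviours, context):
        # behaviours: (batch, time, behav_dim); context: (batch, ctx_dim)
        ctx = context.unsqueeze(1).expand(-1, behaviours.size(1), -1)
        h, _ = self.lstm(torch.cat([behaviours, ctx], dim=-1))
        h = h + torch.relu(self.residual(h))   # residual decoder
        return self.out(h)                     # next-behaviour prediction
```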
arXiv Detail & Related papers (2022-07-24T17:05:37Z)
- A comprehensive comparative evaluation and analysis of Distributional Semantic Models [61.41800660636555]
We perform a comprehensive evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT.
The results show that the alleged superiority of predict-based models is more apparent than real, and certainly not ubiquitous.
We borrow from cognitive neuroscience the methodology of Representational Similarity Analysis (RSA) to inspect the semantic spaces generated by distributional models.
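RSA itself is straightforward to sketch: build each model's matrix of pairwise word distances over a shared vocabulary and correlate the two matrices (a standard formulation, not necessarily the paper's exact configuration):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_similarity(space_a, space_b):
    """Representational Similarity Analysis: each space is an
    (n_words, dim) matrix over the same vocabulary.  Compute each
    model's pairwise cosine distances, then Spearman-correlate the two
    distance structures; high correlation means similar geometry."""
    rdm_a = pdist(space_a, metric="cosine")   # condensed distance matrix
    rdm_b = pdist(space_b, metric="cosine")
    rho, _ = spearmanr(rdm_a, rdm_b)
    return rho
```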
arXiv Detail & Related papers (2021-05-20T15:18:06Z)
- Comparing merging behaviors observed in naturalistic data with behaviors generated by a machine learned model [4.879725885276143]
We study highway driving as an example scenario, and introduce metrics to quantitatively demonstrate the presence of two familiar behavioral phenomena.
Applying the exact same metrics to the output of a state-of-the-art machine-learned model, we show that the model is capable of reproducing the former phenomenon, but not the latter.
arXiv Detail & Related papers (2021-04-21T12:31:29Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
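The objective can be illustrated with the standard score-function (REINFORCE) surrogate for maximizing expected reward (the paper's exact estimator and baseline may differ):

```python
import torch

def reinforce_loss(log_probs, rewards, baseline=None):
    """Surrogate loss for maximizing E[R]: minimize -E[(R - b) * log p(x)].
    `log_probs` are log probabilities of sampled structures under the
    conditional generator; `rewards` are task rewards (e.g. property match)."""
    if baseline is None:
        baseline = rewards.mean()            # variance-reducing baseline
    advantage = (rewards - baseline).detach()
    return -(advantage * log_probs).mean()
```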
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
- Evaluating the Disentanglement of Deep Generative Models through Manifold Topology [66.06153115971732]
We present a method for quantifying disentanglement that only uses the generative model.
We empirically evaluate several state-of-the-art models across multiple datasets.
arXiv Detail & Related papers (2020-06-05T20:54:11Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
- Symbolic Regression Driven by Training Data and Prior Knowledge [0.0]
In symbolic regression, the search for analytic models is driven purely by the prediction error observed on the training data samples.
We propose a multi-objective symbolic regression approach that is driven by both the training data and the prior knowledge of the properties the desired model should manifest.
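A minimal sketch of such a two-objective fitness, with prediction error and prior-knowledge violation kept as separate objectives for a Pareto-based search (the constraint encoding here is illustrative):

```python
def fitness(model, train_xy, constraints):
    """Return (data_error, violation) for a candidate symbolic model.
    `model` is a callable expression; each constraint is a callable that
    returns a non-negative violation score (e.g. monotonicity checked
    on probe points)."""
    data_error = sum((model(x) - y) ** 2 for x, y in train_xy) / len(train_xy)
    violation = sum(c(model) for c in constraints)
    return data_error, violation
```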
arXiv Detail & Related papers (2020-04-24T19:15:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.