Social-Implicit: Rethinking Trajectory Prediction Evaluation and The
Effectiveness of Implicit Maximum Likelihood Estimation
- URL: http://arxiv.org/abs/2203.03057v1
- Date: Sun, 6 Mar 2022 21:28:40 GMT
- Title: Social-Implicit: Rethinking Trajectory Prediction Evaluation and The
Effectiveness of Implicit Maximum Likelihood Estimation
- Authors: Abduallah Mohamed, Deyao Zhu, Warren Vu, Mohamed Elhoseiny, Christian
Claudel
- Abstract summary: Average Mahalanobis Distance (AMD) is a metric that quantifies how close the whole set of generated samples is to the ground truth.
Average Maximum Eigenvalue (AMV) is a metric that quantifies the overall spread of the predictions.
We introduce the usage of Implicit Maximum Likelihood Estimation (IMLE) as a replacement for traditional generative models to train our model, Social-Implicit.
- Score: 21.643073517681973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Best-of-N (BoN) Average Displacement Error (ADE) / Final Displacement
Error (FDE) is the most widely used metric for evaluating trajectory prediction
models. Yet BoN does not quantify the whole set of generated samples, resulting
in an incomplete view of the model's prediction quality and performance. We
propose a new metric, Average Mahalanobis Distance (AMD), to tackle this issue.
AMD quantifies how close the whole set of generated samples is to the ground
truth. We also introduce the Average Maximum Eigenvalue (AMV) metric, which
quantifies the overall spread of the predictions. We validate our metrics
empirically by showing that ADE/FDE is not sensitive to distribution shifts,
giving a biased sense of accuracy, unlike the AMD/AMV metrics. We introduce the
usage of Implicit Maximum Likelihood Estimation (IMLE) as a replacement for
traditional generative models to train our model, Social-Implicit. The IMLE
training mechanism aligns with the AMD/AMV objective of predicting trajectories
that are close to the ground truth with a tight spread. Social-Implicit is a
memory-efficient deep model with only 5.8K parameters that runs in real time at
about 580 Hz and achieves competitive results. An interactive demo of the
problem can be seen at
\url{https://www.abduallahmohamed.com/social-implicit-amdamv-adefde-demo}. Code
is available at \url{https://github.com/abduallahmohamed/Social-Implicit}.
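The AMD/AMV idea described in the abstract can be sketched numerically. The following is an illustrative simplification, not the paper's implementation: it fits a single Gaussian to the sample cloud at each timestep (the paper's exact density estimate may differ), and the function name and array shapes are hypothetical.

```python
import numpy as np

def amd_amv(samples, gt):
    """Illustrative single-Gaussian simplification of AMD/AMV.

    samples: (N, T, 2) array of N generated trajectories over T timesteps.
    gt:      (T, 2) ground-truth trajectory.
    Returns (amd, amv) averaged over timesteps.
    """
    N, T, _ = samples.shape
    mahalanobis, max_eigs = [], []
    for t in range(T):
        pts = samples[:, t, :]                        # all samples at timestep t
        mu = pts.mean(axis=0)
        cov = np.cov(pts, rowvar=False) + 1e-8 * np.eye(2)  # regularized covariance
        diff = gt[t] - mu
        # Mahalanobis distance of the ground truth to the sample cloud (AMD term)
        mahalanobis.append(np.sqrt(diff @ np.linalg.inv(cov) @ diff))
        # largest covariance eigenvalue measures the spread of predictions (AMV term)
        max_eigs.append(np.linalg.eigvalsh(cov).max())
    return float(np.mean(mahalanobis)), float(np.mean(max_eigs))
```

Because the Mahalanobis distance is normalized by the predicted covariance, AMD rewards samples that concentrate around the ground truth, while AMV separately penalizes an overly wide spread, matching the "close with a tight spread" objective the abstract attributes to IMLE training.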
Related papers
- Joint Metrics Matter: A Better Standard for Trajectory Forecasting [67.1375677218281]
Multi-modal trajectory forecasting methods are commonly evaluated using single-agent metrics (marginal metrics).
Only focusing on marginal metrics can lead to unnatural predictions, such as colliding trajectories or diverging trajectories for people who are clearly walking together as a group.
We present the first comprehensive evaluation of state-of-the-art trajectory forecasting methods with respect to multi-agent metrics (joint metrics): JADE, JFDE, and collision rate.
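The marginal-vs-joint distinction above can be made concrete. This is a minimal sketch with hypothetical function names and array shapes (N samples, A agents, T timesteps); the paper's exact JADE definition may differ in detail.

```python
import numpy as np

def marginal_ade(preds, gt):
    """Marginal best-of-N ADE: each agent may pick a different sample,
    so the reported 'scene' can mix mutually inconsistent futures.

    preds: (N, A, T, 2) N sampled scenes, A agents, T timesteps.
    gt:    (A, T, 2) ground truth.
    """
    # per-sample, per-agent displacement error: mean over time of the L2 distance
    err = np.linalg.norm(preds - gt[None], axis=-1).mean(axis=-1)  # (N, A)
    return err.min(axis=0).mean()   # best sample per agent, then average agents

def joint_ade(preds, gt):
    """Joint ADE (JADE): a single sample index must explain all agents at once."""
    err = np.linalg.norm(preds - gt[None], axis=-1).mean(axis=-1)  # (N, A)
    return err.mean(axis=1).min()   # average agents first, then best scene
```

Since the minimum of averages is never below the average of minimums, joint ADE is always at least the marginal ADE, which is why marginal metrics can flatter models that produce incoherent group behavior.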
arXiv Detail & Related papers (2023-05-10T16:27:55Z) - Maintaining Stability and Plasticity for Predictive Churn Reduction [8.971668467496055]
We propose a solution called Accumulated Model Combination (AMC)
AMC is a general technique and we propose several instances of it, each having their own advantages depending on the model and data properties.
arXiv Detail & Related papers (2023-05-06T20:56:20Z) - nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales [65.01417261415833]
We present an approach to predict the pre-training loss based on our observations that Maximal Update Parametrization (muP) enables accurate fitting of scaling laws.
With around 14% of the one-time pre-training cost, we can accurately forecast the loss for models up to 52B.
Our goal with nanoLM is to empower researchers with limited resources to reach meaningful conclusions on large models.
arXiv Detail & Related papers (2023-04-14T00:45:01Z) - Exploring validation metrics for offline model-based optimisation with
diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground truth oracle can be trained and used in its place during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z) - What Can Secondary Predictions Tell Us? An Exploration on
Question-Answering with SQuAD-v2.0 [0.0]
We define the Golden Rank (GR) of an example as the rank of its most confident prediction that exactly matches a ground truth.
For the 16 transformer models we analyzed, the majority of exactly matched golden answers in secondary prediction space hover very close to the top rank.
We derive a new aggregate statistic over entire test sets, named the Golden Rank Interpolated Median (GRIM) that quantifies the proximity of failed predictions to the top choice made by the model.
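The Golden Rank and GRIM statistics summarized above can be sketched as follows. This is one plausible reading with hypothetical signatures, not the paper's code; in particular, the paper's "interpolated median" may be computed differently from numpy's default.

```python
import numpy as np

def golden_rank(scores, exact_match):
    """1-based rank of the most confident prediction that exactly matches
    a ground truth answer; None if no candidate matches.

    scores:      confidence per candidate answer, higher = more confident.
    exact_match: boolean per candidate, True if it matches a ground truth.
    """
    order = np.argsort(scores)[::-1]   # candidates from most to least confident
    for rank, idx in enumerate(order, start=1):
        if exact_match[idx]:
            return rank
    return None

def grim(golden_ranks):
    """Aggregate the Golden Ranks of a test set into a single statistic
    (an illustrative reading of GRIM as a median over the GR distribution)."""
    ranks = sorted(r for r in golden_ranks if r is not None)
    return float(np.median(ranks)) if ranks else float("nan")
```

A GR of 1 means the top prediction already matches; a failed example with GR 2 or 3 indicates the correct answer was "hovering close to the top rank", which is the phenomenon the entry describes.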
arXiv Detail & Related papers (2022-06-29T01:17:47Z) - On the Optimization Landscape of Maximum Mean Discrepancy [26.661542645011046]
Generative models have been successfully used for generating realistic signals.
Because the likelihood function is typically intractable in most of these models, the common practice is to use "implicit" models that avoid likelihood calculation.
In particular, it is not understood when these models can minimize their discrepancy objectives globally.
arXiv Detail & Related papers (2021-10-26T07:32:37Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non linear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
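The weighted least squares estimator behind the Gauss-Markov statement above has a compact closed form. A minimal numpy sketch, assuming diagonal weights equal to the inverse noise variances (function name and setup are illustrative):

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: b = (X^T W X)^{-1} X^T W y, with W = diag(w).
    Under the Gauss-Markov assumptions, with w the inverse noise variances,
    this is the linear minimum variance unbiased estimator."""
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

Averaging the estimate over many independent noise draws recovers the true coefficients, which is the unbiasedness property that the bias-constrained deep estimators in this entry aim to extend to nonlinear settings.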
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Newer is not always better: Rethinking transferability metrics, their
peculiarities, stability and performance [5.650647159993238]
Fine-tuning of large pre-trained image and language models on small customized datasets has become increasingly popular.
We show that the statistical problems with covariance estimation drive the poor performance of H-score.
We propose a correction and recommend measuring correlation performance against relative accuracy in such settings.
arXiv Detail & Related papers (2021-10-13T17:24:12Z) - Mismatched No More: Joint Model-Policy Optimization for Model-Based RL [172.37829823752364]
We propose a single objective for jointly training the model and the policy, such that updates to either component increases a lower bound on expected return.
Our objective is a global lower bound on expected return, and this bound becomes tight under certain assumptions.
The resulting algorithm (MnM) is conceptually similar to a GAN.
arXiv Detail & Related papers (2021-10-06T13:43:27Z) - A Novel Regression Loss for Non-Parametric Uncertainty Optimization [7.766663822644739]
Quantification of uncertainty is one of the most promising approaches to establish safe machine learning.
One of the most commonly used approaches so far is Monte Carlo dropout, which is computationally cheap and easy to apply in practice.
We propose a new objective, referred to as the second-moment loss, to address this issue.
arXiv Detail & Related papers (2021-01-07T19:12:06Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.