A Performance-Explainability Framework to Benchmark Machine Learning
Methods: Application to Multivariate Time Series Classifiers
- URL: http://arxiv.org/abs/2005.14501v6
- Date: Fri, 19 Nov 2021 15:31:06 GMT
- Authors: Kevin Fauvel, Véronique Masson, Élisa Fromont
- Abstract summary: We propose a new performance-explainability analytical framework to assess and benchmark machine learning methods.
The framework details a set of characteristics that systematize the performance-explainability assessment of existing machine learning methods.
- Score: 1.0015478733418846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Our research aims to propose a new performance-explainability analytical
framework to assess and benchmark machine learning methods. The framework
details a set of characteristics that systematize the
performance-explainability assessment of existing machine learning methods. In
order to illustrate the use of the framework, we apply it to benchmark the
current state-of-the-art multivariate time series classifiers.
Related papers
- Improving the Evaluation and Actionability of Explanation Methods for Multivariate Time Series Classification [4.588028371034407]
We focus on analyzing InterpretTime, a recent evaluation methodology for attribution methods applied to MTSC.
We showcase some significant weaknesses of the original methodology and propose ideas to improve its accuracy and efficiency.
We find that perturbation-based methods such as SHAP and Feature Ablation work well across a set of datasets.
arXiv Detail & Related papers (2024-06-18T11:18:46Z)
- Supervised Time Series Classification for Anomaly Detection in Subsea Engineering [0.0]
We investigate the use of supervised machine learning classification algorithms on simulated data based on a physical system with two states: Intact and Broken.
We provide a comprehensive discussion of the preprocessing of temporal data, using measures of statistical dispersion and dimension reduction techniques.
We conclude with a comparison of the various methods based on different performance metrics, showing the advantage of using machine learning techniques as a tool in decision making.
arXiv Detail & Related papers (2024-03-12T18:25:10Z)
- Shapelet-based Model-agnostic Counterfactual Local Explanations for Time Series Classification [5.866975269666861]
We propose a model-agnostic instance-based post-hoc explainability method for time series classification.
The proposed algorithm, namely Time-CF, leverages shapelets and TimeGAN to provide counterfactual explanations for arbitrary time series classifiers.
arXiv Detail & Related papers (2024-02-02T11:57:53Z)
- Matched Machine Learning: A Generalized Framework for Treatment Effect Inference With Learned Metrics [87.05961347040237]
We introduce Matched Machine Learning, a framework that combines the flexibility of machine learning black boxes with the interpretability of matching.
Our framework uses machine learning to learn an optimal metric for matching units and estimating outcomes.
We show empirically that instances of Matched Machine Learning perform on par with black-box machine learning methods and better than existing matching methods for similar problems.
arXiv Detail & Related papers (2023-04-03T19:32:30Z)
- Accounting for multiplicity in machine learning benchmark performance [0.0]
Using the highest-ranked performance as an estimate for state-of-the-art (SOTA) performance is a biased estimator, giving overly optimistic results.
In this article, we provide a probability distribution for the case of multiple classifiers so that known analysis methods can be engaged and a better SOTA estimate can be provided.
arXiv Detail & Related papers (2023-03-10T10:32:18Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate the effectiveness with better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- An Empirical Investigation of Representation Learning for Imitation [76.48784376425911]
Recent work in vision, reinforcement learning, and NLP has shown that auxiliary representation learning objectives can reduce the need for large amounts of expensive, task-specific data.
We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation.
arXiv Detail & Related papers (2022-05-16T11:23:42Z)
- Self-Attention Neural Bag-of-Features [103.70855797025689]
We build on the recently introduced 2D-Attention and reformulate the attention learning methodology.
We propose a joint feature-temporal attention mechanism that learns a joint 2D attention mask highlighting relevant information.
arXiv Detail & Related papers (2022-01-26T17:54:14Z)
- Temporal Dependencies in Feature Importance for Time Series Predictions [4.082348823209183]
We propose WinIT, a framework for evaluating feature importance in time series prediction settings.
We demonstrate how the solution improves the appropriate attribution of features within time steps.
WinIT achieves 2.47x better performance than FIT and other feature importance methods on the real-world clinical MIMIC-mortality task.
arXiv Detail & Related papers (2021-07-29T20:31:03Z)
- The Benchmark Lottery [114.43978017484893]
"A benchmark lottery" describes the overall fragility of the machine learning benchmarking process.
We show that the relative performance of algorithms may be altered significantly simply by choosing different benchmark tasks.
arXiv Detail & Related papers (2021-07-14T21:08:30Z)
- A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques.
We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones.
arXiv Detail & Related papers (2020-09-25T12:01:53Z)