Evaluating the Performance of Reinforcement Learning Algorithms
- URL: http://arxiv.org/abs/2006.16958v2
- Date: Thu, 13 Aug 2020 16:05:28 GMT
- Title: Evaluating the Performance of Reinforcement Learning Algorithms
- Authors: Scott M. Jordan, Yash Chandak, Daniel Cohen, Mengxue Zhang, Philip S. Thomas
- Abstract summary: Performance evaluations are critical for quantifying algorithmic advances in reinforcement learning.
Recent analyses have shown that reported performance results are often inconsistent and difficult to replicate.
We propose a new comprehensive evaluation methodology for reinforcement learning algorithms that produces reliable measurements of performance both on a single environment and when aggregated across environments.
- Score: 30.075897642052126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Performance evaluations are critical for quantifying algorithmic advances in
reinforcement learning. Recent reproducibility analyses have shown that
reported performance results are often inconsistent and difficult to replicate.
In this work, we argue that the inconsistency of performance stems from the use
of flawed evaluation metrics. Taking a step towards ensuring that reported
results are consistent, we propose a new comprehensive evaluation methodology
for reinforcement learning algorithms that produces reliable measurements of
performance both on a single environment and when aggregated across
environments. We demonstrate this method by evaluating a broad class of
reinforcement learning algorithms on standard benchmark tasks.
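The abstract does not spell out the methodology, but the key ingredient it points to is reporting an aggregate performance measure with uncertainty, rather than a single best score per environment. A minimal illustrative sketch of that idea, assuming a simple min-max normalization per environment and a percentile bootstrap over independent runs (both assumptions for illustration, not the paper's actual procedure):

```python
import numpy as np

def normalized_scores(returns, env_min, env_max):
    # Map raw returns for one environment onto [0, 1] using assumed
    # per-environment reference bounds (e.g., worst/best observed returns).
    return (np.asarray(returns, dtype=float) - env_min) / (env_max - env_min)

def aggregate_with_ci(per_env_scores, n_boot=10_000, alpha=0.05, seed=0):
    # Mean normalized score across environments, with a percentile
    # bootstrap confidence interval over independent runs.
    rng = np.random.default_rng(seed)
    scores = np.column_stack(per_env_scores)  # rows: runs, cols: environments
    point = scores.mean()
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, scores.shape[0], size=scores.shape[0])
        boot[b] = scores[idx].mean()
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return point, (lo, hi)

# Example: 5 independent runs of one algorithm on two hypothetical environments.
env_a = normalized_scores([210, 195, 240, 230, 205], env_min=0, env_max=500)
env_b = normalized_scores([-95, -110, -90, -105, -120], env_min=-200, env_max=0)
mean_score, (lo, hi) = aggregate_with_ci([env_a, env_b])
print(f"aggregate score {mean_score:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Reporting the interval alongside the point estimate makes comparisons across different numbers of runs more honest and exposes when an apparent improvement is within noise.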
Related papers
- Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition [70.60872754129832]
The first NeurIPS competition on unlearning sought to stimulate the development of novel algorithms.
Nearly 1,200 teams from across the world participated.
We analyze top solutions and delve into discussions on benchmarking unlearning.
arXiv Detail & Related papers (2024-06-13T12:58:00Z) - Towards Reliable Empirical Machine Unlearning Evaluation: A Cryptographic Game Perspective [5.724350004671127]
Machine unlearning updates machine learning models to remove information from specific training samples, complying with data protection regulations.
Despite the recent development of numerous unlearning algorithms, reliable evaluation of these algorithms remains an open research question.
This work presents a novel and reliable approach to empirically evaluating unlearning algorithms, paving the way for the development of more effective unlearning techniques.
arXiv Detail & Related papers (2024-04-17T17:20:27Z) - A Model-Based Approach for Improving Reinforcement Learning Efficiency Leveraging Expert Observations [9.240917262195046]
We propose an algorithm that automatically adjusts the weights of each component in the augmented loss function.
Experiments on a variety of continuous control tasks demonstrate that the proposed algorithm outperforms various benchmarks.
arXiv Detail & Related papers (2024-02-29T03:53:02Z) - From Variability to Stability: Advancing RecSys Benchmarking Practices [3.3331198926331784]
This paper introduces a novel benchmarking methodology to facilitate a fair and robust comparison of RecSys algorithms.
By utilizing a diverse set of 30 open datasets, including two introduced in this work, we critically examine the influence of dataset characteristics on algorithm performance.
arXiv Detail & Related papers (2024-02-15T07:35:52Z) - Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning [74.67655210734338]
In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption.
We develop a representation-based perspective that leads to a coherent framework and tractable algorithmic approach for practical reinforcement learning from partial observations.
We empirically demonstrate the proposed algorithm can surpass state-of-the-art performance with partial observations across various benchmarks.
arXiv Detail & Related papers (2023-11-20T23:56:58Z) - Demystifying Unsupervised Semantic Correspondence Estimation [13.060538447838303]
We explore semantic correspondence estimation through the lens of unsupervised learning.
We thoroughly evaluate several recently proposed unsupervised methods across multiple challenging datasets.
We introduce a new unsupervised correspondence approach which utilizes the strength of pre-trained features while encouraging better matches during training.
arXiv Detail & Related papers (2022-07-11T17:59:51Z) - Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective [67.45111837188685]
Class incremental learning (CIL) algorithms aim to continually learn new object classes from incrementally arriving data.
We experimentally analyze neural network models trained by CIL algorithms using various evaluation protocols in representation learning.
arXiv Detail & Related papers (2022-06-16T11:44:11Z) - Improving Music Performance Assessment with Contrastive Learning [78.8942067357231]
This study investigates contrastive learning as a potential method to improve existing MPA systems.
We introduce a weighted contrastive loss suitable for regression tasks applied to a convolutional neural network.
Our results show that contrastive-based methods are able to match and exceed SoTA performance for MPA regression tasks.
arXiv Detail & Related papers (2021-08-03T19:24:25Z) - Performance Evaluation of Adversarial Attacks: Discrepancies and Solutions [51.8695223602729]
Adversarial attack methods have been developed to challenge the robustness of machine learning models.
We propose a Piece-wise Sampling Curving (PSC) toolkit to effectively address the discrepancy.
The PSC toolkit offers options for balancing computational cost and evaluation effectiveness.
arXiv Detail & Related papers (2021-04-22T14:36:51Z) - Similarity of Classification Tasks [46.78404360210806]
We propose a generative approach to analyse task similarity to optimise and better understand the performance of meta-learning.
We show that the proposed method can provide an insightful evaluation for meta-learning algorithms on two few-shot classification benchmarks.
arXiv Detail & Related papers (2021-01-27T04:37:34Z) - Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value from a randomly selected network; a minimal sketch of this idea appears after the list.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
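The cross Q-learning summary above only states that a set of parallel models is maintained and that Q-value estimates come from a randomly selected network. Below is a minimal tabular sketch of that idea, with the ensemble size, epsilon-greedy policy, and double-Q-style update all filled in by assumption rather than taken from the paper:

```python
import numpy as np

class CrossQEnsemble:
    # Toy tabular sketch: keep several parallel Q-tables and build the
    # bootstrap target from a randomly selected table, decoupling action
    # selection from value estimation to dampen overestimation bias.
    # Hyperparameters and the exact update rule are assumptions, not the
    # paper's algorithm.
    def __init__(self, n_states, n_actions, n_models=4, lr=0.1, gamma=0.99, seed=0):
        self.rng = np.random.default_rng(seed)
        self.q = np.zeros((n_models, n_states, n_actions))
        self.lr, self.gamma = lr, gamma

    def act(self, state, eps=0.1):
        # Epsilon-greedy on the ensemble-average Q-values.
        if self.rng.random() < eps:
            return int(self.rng.integers(self.q.shape[2]))
        return int(self.q[:, state].mean(axis=0).argmax())

    def update(self, s, a, r, s_next, done):
        # Pick one table to update (i) and a different, randomly selected
        # table (j) to evaluate the greedy next action, double-Q style.
        i, j = self.rng.choice(self.q.shape[0], size=2, replace=False)
        target = r
        if not done:
            a_star = int(self.q[i, s_next].argmax())
            target += self.gamma * self.q[j, s_next, a_star]
        self.q[i, s, a] += self.lr * (target - self.q[i, s, a])
```

Randomly decoupling the table that selects the greedy next action from the table that evaluates it is the same bias-reduction mechanism as in double Q-learning, generalized here to more than two estimators.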
This list is automatically generated from the titles and abstracts of the papers on this site.