Towards QD-suite: developing a set of benchmarks for Quality-Diversity
algorithms
- URL: http://arxiv.org/abs/2205.03207v1
- Date: Fri, 6 May 2022 13:33:50 GMT
- Title: Towards QD-suite: developing a set of benchmarks for Quality-Diversity
algorithms
- Authors: Achkan Salehi and Stephane Doncieux
- Abstract summary: Existing benchmarks are not standardized, and there is currently no MNIST equivalent for Quality-Diversity (QD).
We argue that the identification of challenges faced by QD methods and the development of targeted, challenging, scalable benchmarks is an important step.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the field of Quality-Diversity (QD) has grown into a distinct branch of
stochastic optimization, a few problems, in particular locomotion and
navigation tasks, have become de facto standards. Are such benchmarks
sufficient? Are they representative of the key challenges faced by QD
algorithms? Do they provide the ability to focus on one particular challenge by
properly disentangling it from others? Do they have much predictive power in
terms of scalability and generalization? Existing benchmarks are not
standardized, and there is currently no MNIST equivalent for QD. Inspired by
recent works on Reinforcement Learning benchmarks, we argue that the
identification of challenges faced by QD methods and the development of
targeted, challenging, scalable but affordable benchmarks is an important step.
As an initial effort, we identify three problems that are challenging in sparse
reward settings, and propose associated benchmarks: (1) Behavior metric bias,
which can result from the use of metrics that do not match the structure of the
behavior space; (2) Behavioral Plateaus, with varying characteristics, such
that escaping them would require adaptive QD algorithms; and (3) Evolvability
Traps, where small variations in genotype result in large behavioral changes.
The environments that we propose satisfy the properties listed above.
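To make the first challenge concrete: behavior metric bias arises when the distance metric does not match the geometry of the behavior space. The following toy sketch (a hypothetical illustration, not taken from the paper) compares a naive Euclidean metric with a circular one on an angular behavior descriptor, where the mismatch inverts which behaviors count as "close":

```python
def euclidean_dist(a, b):
    # Naive metric: treats angles as points on the real line.
    return abs(a - b)

def circular_dist(a, b):
    # Metric matched to the behavior space: shortest arc on the circle.
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

# Two behaviors that are nearly identical on the circle...
b1, b2 = 350.0, 10.0
print(euclidean_dist(b1, b2))  # 340.0 -- mismatched metric calls them distant
print(circular_dist(b1, b2))   # 20.0  -- matched metric calls them close
```

A QD algorithm using the Euclidean metric here would treat these two behaviors as occupying far-apart niches, biasing archive coverage toward an artifact of the metric rather than the true behavior space.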
Related papers
- Understanding, Predicting and Better Resolving Q-Value Divergence in
Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving properties of the Q-network during training.
For the first time, our theory can reliably decide whether the training will diverge at an early stage.
arXiv Detail & Related papers (2023-10-06T17:57:44Z)
- Quality Diversity under Sparse Reward and Sparse Interaction: Application to Grasping in Robotics [0.0]
Quality-Diversity (QD) methods are algorithms that aim to generate a set of diverse and high-performing solutions to a given problem.
The present work studies how QD can address grasping in robotics.
Experiments have been conducted on 15 different methods on 10 grasping domains, corresponding to 2 different robot-gripper setups and 5 standard objects.
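For background on the QD methods described above, MAP-Elites is the canonical QD algorithm: it maintains an archive of the best ("elite") solution found in each behavior niche. A minimal toy sketch follows (the task, descriptor, and all names are hypothetical stand-ins, not the benchmarked grasping methods):

```python
import random

def evaluate(genome):
    # Toy task (hypothetical): fitness rewards genes near zero;
    # the 1-D behavior descriptor is the mean gene, mapped to [0, 1].
    fitness = -sum(x * x for x in genome)
    behavior = (sum(genome) / len(genome) + 1.0) / 2.0
    return fitness, min(max(behavior, 0.0), 1.0)

def map_elites(n_cells=10, n_iters=2000, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell index -> (fitness, genome)
    for _ in range(n_iters):
        if archive and rng.random() < 0.9:
            # Select a random elite and mutate it.
            _, parent = archive[rng.choice(list(archive))]
            child = [x + rng.gauss(0.0, 0.1) for x in parent]
        else:
            # Otherwise sample a fresh random genome.
            child = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        fit, beh = evaluate(child)
        cell = min(int(beh * n_cells), n_cells - 1)
        # Keep the child only if its niche is empty or it beats the incumbent.
        if cell not in archive or fit > archive[cell][0]:
            archive[cell] = (fit, child)
    return archive

archive = map_elites()
print(len(archive))  # number of behavior niches filled
```

Under sparse rewards, as the entry above notes, most evaluations return no fitness signal, which is what makes filling such an archive difficult in practice.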
arXiv Detail & Related papers (2023-08-10T10:19:48Z)
- On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z)
- Benchmark tasks for Quality-Diversity applied to Uncertain domains [1.5469452301122175]
We introduce a set of 8 easy-to-implement and lightweight tasks, split into 3 main categories.
We identify the key uncertainty properties to easily define UQD benchmark tasks.
All our tasks build on the Redundant Arm: a common QD environment that is lightweight and easily replicable.
arXiv Detail & Related papers (2023-04-24T21:23:26Z)
- Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models [80.23791222509644]
Inconsistent AI models are considered brittle and untrustworthy by human users.
We find that state-of-the-art vision-language models suffer from a surprisingly high degree of inconsistent behavior across tasks.
We propose a rank correlation-based auxiliary training objective, computed over large automatically created cross-task contrast sets.
arXiv Detail & Related papers (2023-03-28T16:57:12Z)
- Benchmarking Quality-Diversity Algorithms on Neuroevolution for Reinforcement Learning [3.6350564275444173]
We present a Quality-Diversity benchmark suite for Deep Neuroevolution in Reinforcement Learning domains for robot control.
The benchmark uses standard Quality-Diversity metrics, including coverage, QD-score, maximum fitness, and an archive profile metric.
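The standard metrics named above can be computed directly from an archive that maps behavior niches to the fitness of their elites; a minimal sketch (the archive values are hypothetical, and QD-score is assumed to sum non-negative fitnesses, as is conventional):

```python
def qd_metrics(archive, n_cells):
    # archive: dict mapping cell index -> fitness of the elite in that cell.
    fitnesses = list(archive.values())
    coverage = len(archive) / n_cells  # fraction of niches filled
    qd_score = sum(fitnesses)          # total quality across the archive
    max_fitness = max(fitnesses)       # best single solution found
    return coverage, qd_score, max_fitness

# Hypothetical 10-cell archive with three niches filled.
archive = {0: 0.5, 3: 0.9, 7: 0.2}
print(qd_metrics(archive, n_cells=10))  # coverage=0.3, max_fitness=0.9
```

An archive profile metric, by contrast, looks at the distribution of fitness across cells rather than these three scalar summaries.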
arXiv Detail & Related papers (2022-11-04T00:14:42Z)
- A Survey of Parameters Associated with the Quality of Benchmarks in NLP [24.6240575061124]
Recent studies have shown that models triumph over several popular benchmarks just by overfitting on spurious biases, without truly learning the desired task.
A potential solution to these issues -- a metric quantifying quality -- remains underexplored.
We take the first step towards a metric by identifying certain language properties that can represent various possible interactions leading to biases in a benchmark.
arXiv Detail & Related papers (2022-10-14T06:44:14Z)
- Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
arXiv Detail & Related papers (2022-04-07T14:07:51Z)
- Modified Double DQN: addressing stability [0.2867517731896504]
The Double-DQN (DDQN) algorithm was originally proposed to address the overestimation issue in the original DQN algorithm.
Three modifications to the DDQN algorithm are proposed with the aim of maintaining performance in terms of both stability and overestimation.
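For context, the Double-DQN target that the original algorithm uses to curb overestimation selects the greedy action with the online network but evaluates it with the target network; a minimal sketch (the toy tables and values are hypothetical, not from the paper):

```python
def ddqn_target(reward, next_state, q_online, q_target, actions,
                gamma=0.99, done=False):
    # Double-DQN decouples action selection (online net) from
    # action evaluation (target net), reducing overestimation bias.
    if done:
        return reward
    best_action = max(actions, key=lambda a: q_online(next_state, a))
    return reward + gamma * q_target(next_state, best_action)

# Toy lookup tables standing in for the two networks.
q_online_tbl = {("s1", "left"): 1.0, ("s1", "right"): 2.0}
q_target_tbl = {("s1", "left"): 0.5, ("s1", "right"): 1.5}
y = ddqn_target(
    reward=1.0,
    next_state="s1",
    q_online=lambda s, a: q_online_tbl[(s, a)],
    q_target=lambda s, a: q_target_tbl[(s, a)],
    actions=["left", "right"],
)
print(y)  # 1.0 + 0.99 * 1.5, approximately 2.485
```

Note that the online net picks "right" (its own maximizer), but the value used in the target is the target net's estimate for that action, not the online net's.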
arXiv Detail & Related papers (2021-08-09T15:27:22Z)
- Contrast and Classify: Training Robust VQA Models [60.80627814762071]
We propose a novel training paradigm (ConClaT) that optimizes both cross-entropy and contrastive losses.
We find that optimizing both losses -- either alternately or jointly -- is key to effective training.
arXiv Detail & Related papers (2020-10-13T00:23:59Z)
- MetaIQA: Deep Meta-learning for No-Reference Image Quality Assessment [73.55944459902041]
This paper presents a no-reference IQA metric based on deep meta-learning.
We first collect a number of NR-IQA tasks for different distortions.
Then meta-learning is adopted to learn the prior knowledge shared by diversified distortions.
Extensive experiments demonstrate that the proposed metric outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-04-11T23:36:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.