muPRL: A Mutation Testing Pipeline for Deep Reinforcement Learning based on Real Faults
- URL: http://arxiv.org/abs/2408.15150v1
- Date: Tue, 27 Aug 2024 15:45:13 GMT
- Title: muPRL: A Mutation Testing Pipeline for Deep Reinforcement Learning based on Real Faults
- Authors: Deepak-George Thomas, Matteo Biagiola, Nargiz Humbatova, Mohammad Wardat, Gunel Jahangirova, Hridesh Rajan, Paolo Tonella
- Abstract summary: We first describe a taxonomy of real RL faults obtained by repository mining.
Then, we present the mutation operators derived from such real faults and implemented in the tool muPRL.
We discuss the experimental results, showing that muPRL is effective at discriminating strong from weak test generators.
- Score: 19.32186653723838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning (RL) is increasingly adopted to train agents that can deal with complex sequential tasks, such as driving an autonomous vehicle or controlling a humanoid robot. Correspondingly, novel approaches are needed to ensure that RL agents have been tested adequately before going to production. Among them, mutation testing is quite promising, especially under the assumption that the injected faults (mutations) mimic the real ones. In this paper, we first describe a taxonomy of real RL faults obtained by repository mining. Then, we present the mutation operators derived from such real faults and implemented in the tool muPRL. Finally, we discuss the experimental results, showing that muPRL is effective at discriminating strong from weak test generators, hence providing useful feedback to developers about the adequacy of the generated test scenarios.
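To make the approach concrete, below is a minimal, hedged sketch of what a fault-based mutation operator, a kill criterion, and a mutation-score computation could look like. The `TrainConfig` dataclass, the operator, and the threshold-based kill check are hypothetical illustrations under assumed names, not muPRL's actual API or its killing definition.

```python
from dataclasses import dataclass, replace
from statistics import mean

# Hypothetical training configuration; muPRL's real operators are derived
# from a taxonomy of faults mined from actual RL repositories.
@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float = 3e-4
    discount: float = 0.99
    exploration_eps: float = 0.1

def mutate_no_exploration(cfg: TrainConfig) -> TrainConfig:
    """Example mutation operator: disable exploration, mimicking a real
    fault class where developers mis-set the exploration schedule."""
    return replace(cfg, exploration_eps=0.0)

def is_killed(orig_rewards: list[float], mut_rewards: list[float],
              margin: float = 0.2) -> bool:
    """Toy kill criterion: the mutant counts as killed when its average
    reward on the generated test scenarios drops noticeably below the
    original agent's (real definitions use statistical tests over runs)."""
    return mean(mut_rewards) < (1.0 - margin) * mean(orig_rewards)

def mutation_score(kill_flags: list[bool]) -> float:
    """Fraction of mutants a test generator kills; a stronger generator
    should score higher, which is the discrimination the paper reports."""
    return sum(kill_flags) / len(kill_flags)
```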
Related papers
- Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models.
This paper investigates the robustness of existing CLTR models in complex and diverse situations.
We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z)
- Supervised Pretraining Can Learn In-Context Reinforcement Learning [96.62869749926415]
In this paper, we study the in-context learning capabilities of transformers in decision-making problems.
We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action.
We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
arXiv Detail & Related papers (2023-06-26T17:58:50Z)
- Testing of Deep Reinforcement Learning Agents with Surrogate Models [10.243488468625786]
Deep Reinforcement Learning (DRL) has received a lot of attention from the research community in recent years.
In this paper, we propose a search-based approach to test such agents.
arXiv Detail & Related papers (2023-05-22T06:21:39Z)
- Mutation Testing of Deep Reinforcement Learning Based on Real Faults [11.584571002297217]
This paper builds on the existing approach of Mutation Testing (MT) to extend it to Reinforcement Learning (RL) systems.
We show that the design choice of the mutation killing definition can affect whether or not a mutation is killed as well as the generated test cases.
arXiv Detail & Related papers (2023-01-13T16:45:56Z)
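As a hedged illustration of the point above about killing definitions, the sketch below contrasts two plausible definitions; both functions and their thresholds are hypothetical, simplified stand-ins for the statistical procedures such papers typically use.

```python
from math import sqrt
from statistics import mean, stdev

def killed_distributional(orig: list[float], mut: list[float],
                          z_threshold: float = 1.96) -> bool:
    """One plausible 'distribution-level' killing definition: killed when
    the mutant's mean episode reward is significantly below the original's
    (two-sample z-style test; a simplification, not the paper's exact test)."""
    se = sqrt(stdev(orig) ** 2 / len(orig) + stdev(mut) ** 2 / len(mut))
    return (mean(orig) - mean(mut)) / se > z_threshold

def killed_pointwise(orig_reward: float, mut_reward: float,
                     tolerance: float = 0.1) -> bool:
    """A stricter 'single-run' definition: killed when one mutant run falls
    below the original run by more than a tolerance. The two definitions
    can disagree on the same data, which is the design choice studied."""
    return mut_reward < (1.0 - tolerance) * orig_reward
```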
- On Transforming Reinforcement Learning by Transformer: The Development Trajectory [97.79247023389445]
Transformer, originally devised for natural language processing, has also achieved significant success in computer vision.
We group existing developments in two categories: architecture enhancement and trajectory optimization.
We examine the main applications of TRL in robotic manipulation, text-based games, navigation and autonomous driving.
arXiv Detail & Related papers (2022-12-29T03:15:59Z)
- A Search-Based Testing Approach for Deep Reinforcement Learning Agents [1.1580916951856255]
We propose a Search-based Testing Approach of Reinforcement Learning Agents (STARLA) to test the policy of a DRL agent.
We use machine learning models and a dedicated genetic algorithm to narrow the search towards faulty episodes.
arXiv Detail & Related papers (2022-06-15T20:51:33Z)
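As a rough illustration of the search described in the STARLA entry above (not its actual implementation), the sketch below runs a toy genetic algorithm whose fitness is a placeholder surrogate `predict_fault`; every name and the bit-string episode encoding are hypothetical.

```python
import random

# Toy setup: an "episode" is a fixed-length binary action sequence, and
# predict_fault is a placeholder for a learned surrogate that scores how
# likely an episode is to expose a fault (hypothetical, not STARLA's model).
def predict_fault(actions: list[int]) -> float:
    return sum(actions) / len(actions)  # placeholder fitness surrogate

def crossover(a: list[int], b: list[int]) -> list[int]:
    cut = random.randrange(1, len(a))  # single-point crossover
    return a[:cut] + b[cut:]

def mutate(a: list[int], p: float = 0.05) -> list[int]:
    return [1 - x if random.random() < p else x for x in a]  # bit flips

def search_faulty_episode(pop_size: int = 50, length: int = 20,
                          generations: int = 30) -> list[int]:
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=predict_fault, reverse=True)   # rank by surrogate
        parents = pop[: pop_size // 2]              # truncation selection
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=predict_fault)

print(search_faulty_episode())  # best candidate episode found
```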
- GalilAI: Out-of-Task Distribution Detection using Causal Active Experimentation for Safe Transfer RL [11.058960131490903]
Out-of-distribution (OOD) detection is a well-studied topic in supervised learning.
We propose a novel task: that of Out-of-Task Distribution (OOTD) detection.
We name our method GalilAI, in honor of Galileo Galilei, as it discovers, among other causal processes, that gravitational acceleration is independent of the mass of a body.
arXiv Detail & Related papers (2021-10-29T01:45:56Z)
- DeepMetis: Augmenting a Deep Learning Test Set to Increase its Mutation Score [4.444652484439581]
The DeepMetis tool is effective at augmenting the given test set, increasing its capability to detect mutants by 63% on average.
A leave-one-out experiment shows that the augmented test set is capable of exposing unseen mutants.
arXiv Detail & Related papers (2021-09-15T18:20:50Z)
- What is Going on Inside Recurrent Meta Reinforcement Learning Agents? [63.58053355357644]
Recurrent meta reinforcement learning (meta-RL) agents employ a recurrent neural network (RNN) for the purpose of "learning a learning algorithm".
We shed light on the internal working mechanisms of these agents by reformulating the meta-RL problem using the Partially Observable Markov Decision Process (POMDP) framework.
arXiv Detail & Related papers (2021-04-29T20:34:39Z)
- Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
- Empirically Verifying Hypotheses Using Reinforcement Learning [58.09414653169534]
This paper formulates hypothesis verification as an RL problem.
We aim to build an agent that, given a hypothesis about the dynamics of the world, can take actions to generate observations which can help predict whether the hypothesis is true or false.
arXiv Detail & Related papers (2020-06-29T01:01:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.