Towards Futuristic Autonomous Experimentation--A Surprise-Reacting
Sequential Experiment Policy
- URL: http://arxiv.org/abs/2112.00600v1
- Date: Wed, 1 Dec 2021 16:14:49 GMT
- Title: Towards Futuristic Autonomous Experimentation--A Surprise-Reacting
Sequential Experiment Policy
- Authors: Imtiaz Ahmed and Satish Bukkapatnam and Bhaskar Botcha and Yu Ding
- Abstract summary: An autonomous experimentation platform in manufacturing is supposedly capable of conducting a sequential search for suitable manufacturing conditions for advanced materials.
We argue that such capability is much needed for futuristic autonomous experimentation platforms.
- Score: 3.326548149772318
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: An autonomous experimentation platform in manufacturing is supposedly capable
of conducting a sequential search for suitable manufacturing conditions
for advanced materials by itself or even for discovering new materials with
minimal human intervention. The core of the intelligent control of such
platforms is the policy directing sequential experiments, namely, to decide
where to conduct the next experiment based on what has been done thus far. Such
a policy inevitably trades off exploitation versus exploration, and current
practice follows the Bayesian optimization framework using the expected
improvement criterion or its variants. We discuss whether it is beneficial to
trade off exploitation versus exploration by measuring the element and degree
of surprise associated with the immediate past observation. We devise a
surprise-reacting policy using two existing surprise metrics, known as the
Shannon surprise and Bayesian surprise. Our analysis shows that the
surprise-reacting policy appears to be better suited for quickly characterizing
the overall landscape of a response surface or a design space under resource
constraints. We argue that such capability is much needed for futuristic
autonomous experimentation platforms. We do not claim that we have a fully
autonomous experimentation platform, but believe that our current effort sheds
new light or provides a different perspective as researchers race to
elevate the autonomy of various primitive autonomous experimentation systems.
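The two surprise metrics named in the abstract can be illustrated for a simple conjugate-Gaussian belief update: Shannon surprise as the negative log-likelihood of an observation under the current predictive distribution, and Bayesian surprise as the KL divergence from the prior to the posterior. This is a minimal sketch under those common definitions; the function names, the prior and noise values, and the Gaussian model are illustrative assumptions, not the paper's implementation.

```python
import math

def shannon_surprise(x, mu, var):
    # Shannon surprise: negative log-density of x under the predictive N(mu, var)
    return 0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def kl_gaussian(mu_q, var_q, mu_p, var_p):
    # KL( N(mu_q, var_q) || N(mu_p, var_p) ); with q = posterior and p = prior
    # this is one common definition of Bayesian surprise
    return 0.5 * (math.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def posterior_update(mu0, var0, x, noise_var):
    # Conjugate Gaussian update of a belief N(mu0, var0) about the mean,
    # after one observation x with known noise variance
    var1 = 1.0 / (1.0 / var0 + 1.0 / noise_var)
    mu1 = var1 * (mu0 / var0 + x / noise_var)
    return mu1, var1

# Illustrative numbers: prior belief N(0, 1), observation x = 2.5, noise var 0.5
mu0, var0, noise_var = 0.0, 1.0, 0.5
x = 2.5

# Predictive distribution of x before observing it: N(mu0, var0 + noise_var)
s_shannon = shannon_surprise(x, mu0, var0 + noise_var)

mu1, var1 = posterior_update(mu0, var0, x, noise_var)
s_bayes = kl_gaussian(mu1, var1, mu0, var0)  # Bayesian surprise of this update
```

A surprise-reacting policy in the spirit of the abstract would compare such a score against a threshold after each experiment to decide whether to keep exploiting the current region or switch to exploring elsewhere; how the paper maps surprise to that decision is not specified here.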
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, (hyper)policies are learned to convergence only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Counterfactual Prediction Under Selective Confounding [3.6860485638625673]
This research addresses the challenge of conducting causal inference between a binary treatment and its resulting outcome when not all confounders are known.
We relax the requirement of knowing all confounders under desired treatment, which we refer to as Selective Confounding.
We provide both theoretical error bounds and empirical evidence of the effectiveness of our proposed scheme using synthetic and real-world child placement data.
arXiv Detail & Related papers (2023-10-21T16:54:59Z) - Maximum State Entropy Exploration using Predecessor and Successor
Representations [17.732962106114478]
Animals have a developed ability to explore that aids them in important tasks such as locating food.
We propose $\eta\psi$-Learning, a method to learn efficient exploratory policies by conditioning on past episodic experience.
arXiv Detail & Related papers (2023-06-26T16:08:26Z) - Experimentation Platforms Meet Reinforcement Learning: Bayesian
Sequential Decision-Making for Continuous Monitoring [13.62951379287041]
In this paper, we introduce a novel framework that we developed in Amazon to maximize customer experience and control opportunity cost.
We formulate the problem as a Bayesian optimal sequential decision making problem that has a unified utility function.
We show the effectiveness of this novel approach compared with existing methods via a large-scale meta-analysis on experiments in Amazon.
arXiv Detail & Related papers (2023-04-02T00:59:10Z) - Active Exploration via Experiment Design in Markov Chains [86.41407938210193]
A key challenge in science and engineering is to design experiments to learn about some unknown quantity of interest.
We propose an algorithm that efficiently selects policies whose measurement allocation converges to the optimal one.
In addition to our theoretical analysis, we showcase our framework on applications in ecological surveillance and pharmacology.
arXiv Detail & Related papers (2022-06-29T00:04:40Z) - What Should I Know? Using Meta-gradient Descent for Predictive Feature
Discovery in a Single Stream of Experience [63.75363908696257]
Computational reinforcement learning seeks to construct an agent's perception of the world through predictions of future sensations.
An open challenge in this line of work is determining from the infinitely many predictions that the agent could possibly make which predictions might best support decision-making.
We introduce a meta-gradient descent process by which an agent learns 1) what predictions to make, 2) the estimates for its chosen predictions, and 3) how to use those estimates to generate policies that maximize future reward.
arXiv Detail & Related papers (2022-06-13T21:31:06Z) - Sayer: Using Implicit Feedback to Optimize System Policies [63.992191765269396]
We develop a methodology that leverages implicit feedback to evaluate and train new system policies.
Sayer builds on two ideas from reinforcement learning to leverage data collected by an existing policy.
We show that Sayer can evaluate arbitrary policies accurately, and train new policies that outperform the production policies.
arXiv Detail & Related papers (2021-10-28T04:16:56Z) - Crowd Sensing and Living Lab Outdoor Experimentation Made Easy [2.5234156040689237]
This article introduces Smart Agora, a novel open-source software platform for rigorous systematic outdoor experimentation.
Without writing a single line of code, highly complex experimental scenarios are visually designed and automatically deployed to smart phones.
arXiv Detail & Related papers (2021-07-08T21:49:32Z) - Rethinking Exploration for Sample-Efficient Policy Learning [20.573107021603356]
We examine why directed exploration methods have not been more influential in the sample-efficient control problem.
Three issues have limited the applicability of BBE: bias with finite samples, slow adaptation to decaying bonuses, and lack of optimism on unseen transitions.
We propose modifications to the bonus-based exploration recipe to address each of these limitations.
The resulting algorithm, which we call UFO, produces policies that are Unbiased with finite samples, Fast-adapting as the exploration bonus changes, and Optimistic with respect to new transitions.
arXiv Detail & Related papers (2021-01-23T08:51:04Z) - Learning "What-if" Explanations for Sequential Decision-Making [92.8311073739295]
Building interpretable parameterizations of real-world decision-making on the basis of demonstrated behavior is essential.
We propose learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to "what if" outcomes.
We highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior.
arXiv Detail & Related papers (2020-07-02T14:24:17Z) - Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement
Learning Framework [68.96770035057716]
A/B testing is a business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying out A/B testing in online experiments.
arXiv Detail & Related papers (2020-02-05T10:25:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.