Dynamic Knowledge Injection for AIXI Agents
- URL: http://arxiv.org/abs/2312.16184v1
- Date: Mon, 18 Dec 2023 13:34:17 GMT
- Title: Dynamic Knowledge Injection for AIXI Agents
- Authors: Samuel Yang-Zhao, Kee Siong Ng, and Marcus Hutter
- Abstract summary: We introduce a new agent called DynamicHedgeAIXI that maintains an exact Bayesian mixture over dynamically changing sets of models.
Experimental results on epidemic control on contact networks validate the agent's practical utility.
- Score: 17.4429135205363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior approximations of AIXI, a Bayesian optimality notion for general
reinforcement learning, can only approximate AIXI's Bayesian environment model
using an a-priori defined set of models. This is a fundamental source of
epistemic uncertainty for the agent in settings where the existence of
systematic bias in the predefined model class cannot be resolved by simply
collecting more data from the environment. We address this issue in the context
of Human-AI teaming by considering a setup where additional knowledge for the
agent in the form of new candidate models arrives from a human operator in an
online fashion. We introduce a new agent called DynamicHedgeAIXI that maintains
an exact Bayesian mixture over dynamically changing sets of models via a
time-adaptive prior constructed from a variant of the Hedge algorithm. The
DynamicHedgeAIXI agent is the richest direct approximation of AIXI known to
date and comes with good performance guarantees. Experimental results on
epidemic control on contact networks validate the agent's practical utility.
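As an illustration of the idea in the abstract (a Bayesian mixture whose model set can grow while the agent runs), here is a minimal Python sketch. It is not the DynamicHedgeAIXI algorithm: the class name `GrowingMixture`, the fixed `share` of prior mass given to a newly injected model, and the Bernoulli toy models are all assumptions made for this example, whereas the paper constructs a time-adaptive prior from a variant of Hedge. The sketch relies only on the standard fact that an exponential-weights (Hedge) update with log loss and learning rate 1 coincides with Bayes' rule.

```python
import math


class GrowingMixture:
    """Toy Bayesian mixture over a model set that can grow online.

    Hypothetical illustration only; not the agent described in the paper.
    """

    def __init__(self):
        self.models = []  # each model maps an observation to its probability
        self.log_w = []   # normalized log weights, one per model

    def add_model(self, model, share=0.1):
        """Inject a new model, giving it `share` of the total weight (assumed rule)."""
        if not self.models:
            self.log_w = [0.0]  # the first model receives all the mass
        else:
            if not 0.0 < share < 1.0:
                raise ValueError("share must lie strictly between 0 and 1")
            # Rescale existing weights by (1 - share) so the total stays 1.
            self.log_w = [lw + math.log1p(-share) for lw in self.log_w]
            self.log_w.append(math.log(share))
        self.models.append(model)

    def weights(self):
        return [math.exp(lw) for lw in self.log_w]

    def predict(self, obs):
        """Mixture probability of `obs` under the current weights."""
        return sum(w * m(obs) for w, m in zip(self.weights(), self.models))

    def update(self, obs):
        """Bayes update: a Hedge step with log loss and learning rate 1."""
        self.log_w = [lw + math.log(m(obs)) for lw, m in zip(self.log_w, self.models)]
        top = max(self.log_w)
        norm = top + math.log(sum(math.exp(lw - top) for lw in self.log_w))
        self.log_w = [lw - norm for lw in self.log_w]


def bernoulli(p):
    """Toy model: predicts observation 1 with probability p, 0 otherwise."""
    return lambda obs: p if obs == 1 else 1.0 - p


mix = GrowingMixture()
mix.add_model(bernoulli(0.2))
mix.add_model(bernoulli(0.4), share=0.5)
for obs in [1] * 20:
    mix.update(obs)

# A better model arrives later, as if supplied by a human operator.
mix.add_model(bernoulli(0.9), share=0.1)
for obs in [1] * 20:
    mix.update(obs)

weights = mix.weights()
```

In this toy run both initial models are biased away from the data, so no amount of extra observations fixes the mixture until the better model is injected; after that, the posterior mass moves to the new model.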
Related papers
- Two-Timescale Model Caching and Resource Allocation for Edge-Enabled AI-Generated Content Services [55.0337199834612]
Generative AI (GenAI) has emerged as a transformative technology, enabling customized and personalized AI-generated content (AIGC) services.
These services require executing GenAI models with billions of parameters, posing significant obstacles for the resource-limited wireless edge.
We introduce the formulation of joint model caching and resource allocation for AIGC services to balance a trade-off between AIGC quality and latency metrics.
arXiv Detail & Related papers (2024-11-03T07:01:13Z)
- Explainable AI for Enhancing Efficiency of DL-based Channel Estimation [1.0136215038345013]
Support of artificial intelligence based decision-making is a key element in future 6G networks.
In such applications, using AI as black-box models is risky and challenging.
We propose a novel XAI-CHEST framework that is oriented toward channel estimation in wireless communications.
arXiv Detail & Related papers (2024-07-09T16:24:21Z)
- Predicting AI Agent Behavior through Approximation of the Perron-Frobenius Operator [4.076790923976287]
We treat AI agents as nonlinear dynamical systems and adopt a probabilistic perspective to predict their statistical behavior.
We formulate the approximation of the Perron-Frobenius (PF) operator as an entropy minimization problem.
Our data-driven methodology simultaneously approximates the PF operator to predict the evolution of the agents and estimates their terminal probability density.
arXiv Detail & Related papers (2024-06-04T19:06:49Z)
- Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF [82.7679132059169]
Reinforcement learning from human feedback has emerged as a central tool for language model alignment.
We propose a new algorithm for online exploration in RLHF, Exploratory Preference Optimization (XPO).
XPO enjoys the strongest known provable guarantees and promising empirical performance.
arXiv Detail & Related papers (2024-05-31T17:39:06Z)
- Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning [2.9158689853305693]
We consider a model-based reinforcement learning algorithm that infers the system dynamics from the available data and performs policy optimization on imaginary model rollouts.
This approach is vulnerable to exploiting model errors which can lead to catastrophic failures on the real system.
We show that better performance can be obtained with a single well-calibrated autoregressive model on the D4RL benchmark.
arXiv Detail & Related papers (2024-02-05T10:18:15Z)
- STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning [82.03481509373037]
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce the Stochastic Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
STORM achieves a mean human performance of 126.7% on the Atari 100k benchmark, setting a new record among state-of-the-art methods.
arXiv Detail & Related papers (2023-10-14T16:42:02Z)
- Training dynamic models using early exits for automatic speech recognition on resource-constrained devices [15.879328412777008]
Early-exit architectures enable the development of dynamic models capable of adapting their size and architecture to varying levels of computational resources and ASR performance demands.
We show that early-exit models trained from scratch not only preserve performance when using fewer encoder layers but also exhibit enhanced task accuracy compared to single-exit or pre-trained models.
Results provide insights into the training dynamics of early-exit architectures for ASR models.
arXiv Detail & Related papers (2023-09-18T07:45:16Z)
- Differential Assessment of Black-Box AI Agents [29.98710357871698]
We propose a novel approach to differentially assess black-box AI agents that have drifted from their previously known models.
We leverage sparse observations of the drifted agent's current behavior and knowledge of its initial model to generate an active querying policy.
Empirical evaluation shows that our approach is much more efficient than re-learning the agent model from scratch.
arXiv Detail & Related papers (2022-03-24T17:48:58Z)
- Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis [96.53859361560505]
Aspect-based Sentiment Analysis (ABSA) aims to determine the sentiment polarity towards an aspect.
There always exists severe domain shift between the pretraining and downstream ABSA datasets.
We introduce a unified alignment pretraining framework into the vanilla pretrain-finetune pipeline.
arXiv Detail & Related papers (2021-10-26T04:03:45Z)
- Identification of Probability weighted ARX models with arbitrary domains [75.91002178647165]
PieceWise Affine models guarantee universal approximation, local linearity and equivalence to other classes of hybrid systems.
In this work, we focus on the identification of PieceWise Auto Regressive with eXogenous input models with arbitrary regions (NPWARX).
The architecture is conceived following the Mixture of Expert concept, developed within the machine learning field.
arXiv Detail & Related papers (2020-09-29T12:50:33Z)
- Model-based Reinforcement Learning for Decentralized Multiagent Rendezvous [66.6895109554163]
Underlying the human ability to align goals with other agents is their ability to predict the intentions of others and actively update their own plans.
We propose hierarchical predictive planning (HPP), a model-based reinforcement learning method for decentralized multiagent rendezvous.
arXiv Detail & Related papers (2020-03-15T19:49:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.