The Partially Observable History Process
- URL: http://arxiv.org/abs/2111.08102v1
- Date: Mon, 15 Nov 2021 22:00:14 GMT
- Title: The Partially Observable History Process
- Authors: Dustin Morrill, Amy R. Greenwald, Michael Bowling
- Abstract summary: We introduce the partially observable history process (POHP) formalism for reinforcement learning.
POHP centers around actions and observations of a single agent and abstracts away the presence of other players.
Our formalism provides a streamlined interface for designing algorithms that defy categorization as exclusively single or multi-agent.
- Score: 17.08883385550155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the partially observable history process (POHP) formalism for
reinforcement learning. POHP centers around the actions and observations of a
single agent and abstracts away the presence of other players without reducing
them to stochastic processes. Our formalism provides a streamlined interface
for designing algorithms that defy categorization as exclusively single or
multi-agent, and for developing theory that applies across these domains. We
show how the POHP formalism unifies traditional models including the Markov
decision process, the Markov game, the extensive-form game, and their partially
observable extensions, without introducing burdensome technical machinery or
violating the philosophical underpinnings of reinforcement learning. We
illustrate the utility of our formalism by concisely exploring observable
sequential rationality, re-deriving the extensive-form regret minimization
(EFR) algorithm, and examining EFR's theoretical properties in greater
generality.
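The abstract's core idea — an agent defined purely by its own action-observation history, with chance and other players abstracted into an opaque "rest of the world" — can be illustrated with a minimal sketch. This is not the paper's API; the names (`World`, `run_episode`) and the toy dynamics are illustrative assumptions only.

```python
import random

class World:
    """Stands in for everything outside the agent: environment dynamics,
    chance, and other players, folded into one opaque observation source."""

    def observe(self, history):
        # Hypothetical dynamics: the observation may depend on the whole
        # history, with no Markov state assumed.
        return len(history) % 3

def uniform_policy(history, actions):
    # The agent conditions only on its own action-observation history.
    return random.choice(actions)

def run_episode(world, policy, actions, horizon):
    history = []  # alternating observations and actions: the agent's only record
    for _ in range(horizon):
        obs = world.observe(tuple(history))
        history.append(("obs", obs))
        act = policy(tuple(history), actions)
        history.append(("act", act))
    return history

random.seed(0)
trace = run_episode(World(), uniform_policy, actions=["a", "b"], horizon=3)
print(len(trace))  # 6 entries: 3 observations interleaved with 3 actions
```

Because nothing in the loop presumes a state space or an explicit opponent model, the same interface covers single-agent MDP-style settings and multi-agent game-style settings alike, which is the unification the abstract claims.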
Related papers
- Bridging State and History Representations: Understanding Self-Predictive RL [24.772140132462468]
Representations are at the core of all deep reinforcement learning (RL) methods for Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs).
We show that many of these seemingly distinct methods and frameworks for state and history abstractions are, in fact, based on a common idea of self-predictive abstraction.
We provide theoretical insights into the widely adopted objectives and optimization, such as the stop-gradient technique, in learning self-predictive representations.
arXiv Detail & Related papers (2024-01-17T00:47:43Z) - ODE-based Recurrent Model-free Reinforcement Learning for POMDPs [15.030970899252601]
We present a novel ODE-based recurrent model combined with a model-free reinforcement learning framework to solve POMDPs.
We experimentally demonstrate the efficacy of our methods across various PO continuous control and meta-RL tasks.
Our experiments illustrate that our method is robust against irregular observations, owing to the ability of ODEs to model irregularly-sampled time series.
arXiv Detail & Related papers (2023-09-25T12:13:56Z) - Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z) - PAC-Chernoff Bounds: Understanding Generalization in the Interpolation Regime [6.645111950779666]
This paper introduces a distribution-dependent PAC-Chernoff bound that exhibits perfect tightness for interpolators.
We present a unified theoretical framework revealing why certain interpolators show an exceptional generalization, while others falter.
arXiv Detail & Related papers (2023-06-19T14:07:10Z) - Understanding Masked Autoencoders via Hierarchical Latent Variable Models [109.35382136147349]
Masked autoencoders (MAEs) have recently achieved prominent success in a variety of vision tasks.
Despite the emergence of intriguing empirical observations on MAE, a theoretically principled understanding is still lacking.
arXiv Detail & Related papers (2023-06-08T03:00:10Z) - Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z) - A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning [55.048010996144036]
We show that, under a certain noise assumption, the linear spectral feature of the corresponding Markov transition operator can be obtained in closed form for free.
We propose Spectral Dynamics Embedding (SPEDE), which breaks the trade-off and completes optimistic exploration for representation learning by exploiting the structure of the noise.
arXiv Detail & Related papers (2021-11-22T19:24:57Z) - This looks more like that: Enhancing Self-Explaining Models by Prototypical Relevance Propagation [17.485732906337507]
We present a case study of the self-explaining network, ProtoPNet, in the presence of a spectrum of artifacts.
We introduce a novel method for generating more precise model-aware explanations.
In order to obtain a clean dataset, we propose to use multi-view clustering strategies for segregating the artifact images.
arXiv Detail & Related papers (2021-08-27T09:55:53Z) - Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit
Partial Observability [92.95794652625496]
Generalization is a central challenge for the deployment of reinforcement learning systems.
We show that generalization to unseen test conditions from a limited number of training conditions induces implicit partial observability.
We recast the problem of generalization in RL as solving the induced partially observed Markov decision process.
arXiv Detail & Related papers (2021-07-13T17:59:25Z) - On Contrastive Representations of Stochastic Processes [53.21653429290478]
Learning representations of stochastic processes is an emerging problem in machine learning.
We show that our methods are effective for learning representations of periodic functions, 3D objects and dynamical processes.
arXiv Detail & Related papers (2021-06-18T11:00:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.