Learning Algorithms for Intelligent Agents and Mechanisms
- URL: http://arxiv.org/abs/2210.02654v1
- Date: Thu, 6 Oct 2022 03:12:43 GMT
- Title: Learning Algorithms for Intelligent Agents and Mechanisms
- Authors: Jad Rahme
- Abstract summary: In this thesis, we research learning algorithms for optimal decision making in two different contexts, Reinforcement Learning in Part I and Auction Design in Part II.
In Chapter 2, inspired by statistical physics, we develop a novel approach to Reinforcement Learning (RL) that not only learns optimal policies with enhanced desirable properties but also sheds new light on maximum entropy RL.
In Chapter 3, we tackle the generalization problem in RL using a Bayesian perspective. We show that imperfect knowledge of the environment's dynamics effectively turns a fully-observed Markov Decision Process (MDP) into a Partially Observed MDP (POMDP).
- Score: 4.251500966181852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this thesis, we research learning algorithms for optimal decision making
in two different contexts, Reinforcement Learning in Part I and Auction Design
in Part II.
Reinforcement learning (RL) is an area of machine learning that is concerned
with how an agent should act in an environment in order to maximize its
cumulative reward over time. In Chapter 2, inspired by statistical physics, we
develop a novel approach to Reinforcement Learning (RL) that not only learns
optimal policies with enhanced desirable properties but also sheds new light on
maximum entropy RL. In Chapter 3, we tackle the generalization problem in RL
using a Bayesian perspective. We show that imperfect knowledge of the
environment's dynamics effectively turns a fully-observed Markov Decision Process
(MDP) into a Partially Observed MDP (POMDP) that we call the Epistemic POMDP.
Informed by this observation, we develop a new policy learning algorithm LEEP
which has improved generalization properties.
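To make the maximum entropy RL connection in Chapter 2 concrete, here is a minimal illustrative sketch (not the thesis's algorithm): under entropy regularization, the policy derived from action values is a softmax at temperature alpha, and the state value becomes a log-sum-exp, which mirrors the free-energy expressions familiar from statistical physics. The temperature and Q-values below are assumptions for illustration.

```python
import numpy as np

def soft_policy(q_values: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Entropy-regularized ('soft') policy: pi(a) proportional to exp(Q(a) / alpha).

    As alpha -> 0 this recovers the greedy policy; larger alpha yields a
    higher-entropy, more exploratory policy (the maximum-entropy trade-off).
    """
    logits = q_values / alpha
    logits -= logits.max()            # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def soft_value(q_values: np.ndarray, alpha: float = 1.0) -> float:
    """Soft value V = alpha * log sum_a exp(Q(a) / alpha), computed stably."""
    m = q_values.max()
    return m + alpha * np.log(np.sum(np.exp((q_values - m) / alpha)))

# Example: three actions with illustrative Q-values
q = np.array([1.0, 2.0, 0.5])
print(soft_policy(q, alpha=0.5), soft_value(q, alpha=0.5))
```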
Designing an incentive compatible, individually rational auction that
maximizes revenue is a challenging and intractable problem. Recently, deep
learning based approaches have been proposed to learn optimal auctions from
data. While successful, this approach suffers from a few limitations, including
sample inefficiency, lack of generalization to new auctions, and training
difficulties. In Chapter 4, we construct a symmetry preserving neural network
architecture, EquivariantNet, suitable for anonymous auctions. EquivariantNet
is not only more sample efficient but is also able to learn auction rules that
generalize well to other settings. In Chapter 5, we propose a novel formulation
of the auction learning problem as a two player game. The resulting learning
algorithm, ALGNet, is easier to train, more reliable, and better suited for
non-stationary settings.
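To illustrate the symmetry that EquivariantNet exploits: an anonymous auction rule should not depend on how bidders or items are labeled, so relabeling the rows or columns of the bid matrix should only relabel the outputs. The sketch below shows a generic permutation-equivariant layer over a bidders-by-items matrix; the layer form, weights, and shapes are illustrative assumptions, not the thesis's exact architecture.

```python
import numpy as np

def equivariant_layer(bids: np.ndarray, w=(1.0, 0.5, 0.5, 0.25), b=0.0) -> np.ndarray:
    """Permutation-equivariant layer on an (n_bidders x m_items) bid matrix.

    Each output entry mixes the entry itself with its row mean, column mean,
    and global mean, so permuting bidders (rows) or items (columns) permutes
    the output the same way -- the anonymity property an auction rule needs.
    """
    row_mean = bids.mean(axis=1, keepdims=True)   # per-bidder aggregate
    col_mean = bids.mean(axis=0, keepdims=True)   # per-item aggregate
    all_mean = bids.mean()                        # global aggregate
    return w[0] * bids + w[1] * row_mean + w[2] * col_mean + w[3] * all_mean + b

bids = np.random.rand(3, 2)              # 3 bidders, 2 items
perm = np.random.permutation(3)          # relabel the bidders
out, out_perm = equivariant_layer(bids), equivariant_layer(bids[perm])
assert np.allclose(out[perm], out_perm)  # equivariance check
```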
Related papers
- Can Learned Optimization Make Reinforcement Learning Less Difficult? [70.5036361852812]
We consider whether learned optimization can help overcome reinforcement learning difficulties.
Our method, Learned Optimization for Plasticity, Exploration and Non-stationarity (OPEN), meta-learns an update rule whose input features and output structure are informed by previously proposed solutions to these difficulties.
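As a rough picture of what a meta-learned update rule can look like, the sketch below maps per-parameter gradient features through a small learned function to produce the update. The feature set, weights, and parameterization are assumptions for illustration, not OPEN's actual design.

```python
import numpy as np

def learned_update(grad, momentum, theta_meta, lr=1e-3):
    """Toy learned-optimizer step: per-parameter features -> parameter update.

    Features (gradient, momentum, and their product) are combined by
    meta-learned weights theta_meta; a real method would meta-train
    theta_meta across many tasks rather than fix it by hand.
    """
    feats = np.stack([grad, momentum, grad * momentum], axis=-1)
    return -lr * feats @ theta_meta          # update direction per parameter

params = np.zeros(4)
grad = np.array([0.2, -0.1, 0.05, 0.3])
momentum = 0.9 * grad                         # placeholder running average
theta_meta = np.array([1.0, 0.5, 0.0])        # would be meta-learned in practice
params = params + learned_update(grad, momentum, theta_meta)
print(params)
```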
arXiv Detail & Related papers (2024-07-09T17:55:23Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
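One way to picture "LLM guidance as a regularization factor" is to bias action selection toward an LLM-suggested action distribution while the value estimates are still unreliable. The sketch below is a generic illustration of that idea, not LINVIT's actual update rule; the prior, blending weight, and log-prior form are assumptions.

```python
import numpy as np

def regularized_action_scores(q_values, llm_prior, lam=0.5):
    """Blend value estimates with an LLM-suggested action prior.

    Acting greedily on Q + lam * log(prior) behaves like a KL-style
    regularizer that keeps the chosen actions close to the LLM's suggestion,
    which can reduce the data needed before the values become informative
    (illustrative only, not the paper's rule).
    """
    return q_values + lam * np.log(llm_prior + 1e-8)

q = np.array([0.1, 0.4, 0.2])
llm_prior = np.array([0.7, 0.2, 0.1])   # hypothetical LLM preference over actions
action = int(np.argmax(regularized_action_scores(q, llm_prior)))
print(action)
```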
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- Efficient Reinforcement Learning via Decoupling Exploration and Utilization [6.305976803910899]
Reinforcement Learning (RL) has achieved remarkable success across multiple fields and applications, including gaming, robotics, and autonomous vehicles.
In this work, we aim to train agents efficiently by decoupling exploration and utilization, so that the agent can escape the conundrum of suboptimal solutions.
The above idea is implemented in the proposed OPARL (Optimistic and Pessimistic Actor Reinforcement Learning) algorithm.
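A minimal sketch of the optimistic/pessimistic split: exploration uses an optimistic value estimate and exploitation a pessimistic one, here built from a small critic ensemble. This is a generic illustration of decoupling exploration from utilization, not OPARL's exact algorithm; the ensemble aggregation is an assumption.

```python
import numpy as np

def select_action(q_ensemble: np.ndarray, explore: bool) -> int:
    """Decoupled action selection from an ensemble of Q-estimates.

    q_ensemble has shape (n_critics, n_actions). Exploration acts on an
    optimistic (max over critics) estimate; exploitation acts on a
    pessimistic (min over critics) estimate.
    """
    q = q_ensemble.max(axis=0) if explore else q_ensemble.min(axis=0)
    return int(np.argmax(q))

q_ensemble = np.array([[1.0, 0.2, 0.5],
                       [0.1, 0.9, 0.4]])
print(select_action(q_ensemble, explore=True),   # optimistic choice
      select_action(q_ensemble, explore=False))  # pessimistic choice
```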
arXiv Detail & Related papers (2023-12-26T09:03:23Z)
- An Invitation to Deep Reinforcement Learning [24.807012576054504]
Training a deep neural network to maximize a target objective has become the standard recipe for successful machine learning.
Common objectives like intersection over union (IoU), bilingual evaluation understudy (BLEU) score or rewards cannot be optimized with supervised learning.
Reinforcement learning (RL) has emerged as a promising alternative for optimizing deep neural networks to maximize non-differentiable objectives.
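The reason RL can optimize such metrics is that a policy-gradient estimator only needs to evaluate the objective, never differentiate it. A minimal REINFORCE-style sketch follows; the score value is a stand-in for any non-differentiable metric such as BLEU or IoU, and the policy and step size are illustrative assumptions.

```python
import numpy as np

def reinforce_gradient(logits, action, score):
    """Score-function (REINFORCE) gradient for a categorical policy.

    Returns score * d/dlogits log softmax(logits)[action]; 'score' can be any
    non-differentiable metric (BLEU, IoU, a reward) since only its value is used.
    """
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0            # gradient of log softmax at the taken action
    return score * grad_log_pi

logits = np.zeros(3)
action = np.random.choice(3, p=np.exp(logits) / np.exp(logits).sum())
score = 0.42                              # e.g. a BLEU score returned by an evaluator
logits += 0.1 * reinforce_gradient(logits, action, score)  # gradient-ascent step
```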
arXiv Detail & Related papers (2023-12-13T18:57:23Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
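Informally, a multi-step inverse-kinematics objective trains an encoder so that the action taken at time t can be predicted from the encoded current observation and an encoded observation k steps later. The toy sketch below only shows the shape of such a loss; the encoder, linear predictor, and dimensions are stand-ins, not MusIK itself.

```python
import numpy as np

def multistep_inverse_loss(phi, W, x_t, x_tk, action):
    """Illustrative multi-step inverse-kinematics loss.

    Encode the current observation x_t and a later observation x_tk with phi,
    then score which action was taken at time t; training phi to make this
    prediction easy yields a compact, control-relevant representation.
    """
    feats = np.concatenate([phi(x_t), phi(x_tk)])
    logits = W @ feats
    m = logits.max()
    log_probs = logits - m - np.log(np.sum(np.exp(logits - m)))
    return -log_probs[action]             # cross-entropy for the taken action

phi = lambda x: x[:2]                     # stand-in encoder: keep two coordinates
W = np.random.randn(4, 4) * 0.1           # 4 actions, 4-dim joint feature
print(multistep_inverse_loss(phi, W, np.random.rand(5), np.random.rand(5), action=1))
```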
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Inapplicable Actions Learning for Knowledge Transfer in Reinforcement Learning [3.194414753332705]
We show that learning inapplicable actions greatly improves the sample efficiency of RL algorithms.
Thanks to the transferability of the knowledge acquired, it can be reused in other tasks and domains to make the learning process more efficient.
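Using knowledge of inapplicable actions amounts to masking them out before action selection, which shrinks the effective search space and is what drives the sample-efficiency gain. A minimal masking sketch follows; the mask here is hand-specified, whereas in the paper it would be learned and transferred across tasks.

```python
import numpy as np

def masked_greedy_action(q_values: np.ndarray, applicable_mask: np.ndarray) -> int:
    """Pick the best action among those believed applicable.

    applicable_mask[a] is True if action a is applicable in the current state;
    inapplicable actions are excluded, so exploration never wastes samples on them.
    """
    q = np.where(applicable_mask, q_values, -np.inf)
    return int(np.argmax(q))

q_values = np.array([0.3, 0.9, 0.1, 0.5])
applicable = np.array([True, False, True, True])   # action 1 judged inapplicable
print(masked_greedy_action(q_values, applicable))  # -> 3
```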
arXiv Detail & Related papers (2022-11-28T17:45:39Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL).
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
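A rough sketch of the combination: a contrastive loss shapes state-action features, and a UCB-style bonus built from those features drives optimistic exploration. The elliptical bonus below is a standard linear-UCB ingredient used here as an assumption about the flavor of such algorithms, not the paper's exact construction; the feature vector stands in for a contrastively learned embedding.

```python
import numpy as np

def ucb_bonus(feature: np.ndarray, cov_inv: np.ndarray, beta: float = 1.0) -> float:
    """Elliptical UCB exploration bonus: beta * sqrt(phi^T Sigma^{-1} phi).

    'feature' is a (contrastively learned) state-action representation;
    directions of feature space that have been visited rarely get a large
    bonus, which is what makes optimism-based exploration efficient.
    """
    return beta * float(np.sqrt(feature @ cov_inv @ feature))

d = 3
cov = np.eye(d)                  # running feature covariance (plus regularizer)
phi = np.array([0.2, 0.7, 0.1])  # embedding of the current (state, action)
print(ucb_bonus(phi, np.linalg.inv(cov)))
```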
arXiv Detail & Related papers (2022-07-29T17:29:08Z)
- REIN-2: Giving Birth to Prepared Reinforcement Learning Agents Using Reinforcement Learning Agents [0.0]
In this paper, we introduce a meta-learning scheme that shifts the objective of learning to solve a task into the objective of learning to learn to solve a task (or a set of tasks).
Our model, named REIN-2, is a meta-learning scheme formulated within the RL framework, the goal of which is to develop a meta-RL agent that learns how to produce other RL agents.
Compared to traditional state-of-the-art Deep RL algorithms, experimental results show remarkable performance of our model in popular OpenAI Gym environments.
arXiv Detail & Related papers (2021-10-11T10:13:49Z)
- Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
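"Solving the adaptation analytically" can be pictured as kernel ridge regression under the meta-model's kernel: instead of inner-loop gradient steps, the adapted predictor on a new task is a closed-form kernel solution over the support set. In the sketch below the RBF kernel is a stand-in for the NTK, and the regularizer and data are illustrative; the point is only the closed-form adaptation replacing MAML's inner loop.

```python
import numpy as np

def kernel_adapt(K_support, y_support, K_query_support, reg=1e-3):
    """Closed-form task adaptation via kernel ridge regression.

    Predictions on query points are K_qs (K_ss + reg * I)^{-1} y_s, where the
    kernel would be the meta-model's NTK; no iterative inner loop is needed.
    """
    n = K_support.shape[0]
    alpha = np.linalg.solve(K_support + reg * np.eye(n), y_support)
    return K_query_support @ alpha

rbf = lambda A, B: np.exp(-np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1))
X_s, y_s = np.random.rand(5, 2), np.random.rand(5)    # support set of one task
X_q = np.random.rand(3, 2)                            # query points
print(kernel_adapt(rbf(X_s, X_s), y_s, rbf(X_q, X_s)))
```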
arXiv Detail & Related papers (2021-02-07T20:53:23Z)
- Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples [31.31937554018045]
Deep reinforcement learning (RL) methods require intensive data from the exploration of the environment to achieve satisfactory performance.
We propose a framework that enables an RL agent to reason over its exploration process and distill high-level knowledge for effectively guiding its future explorations.
Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm.
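A finite reward automaton is essentially a Mealy machine over high-level events: a finite set of automaton states, transitions triggered by events the agent detects, and a reward emitted on each transition. The minimal data-structure sketch below uses hypothetical events and rewards; in the paper the automaton is inferred with the L* algorithm rather than written by hand.

```python
# Minimal finite reward automaton: states, event-driven transitions, rewards.
class RewardAutomaton:
    def __init__(self, transitions, initial_state):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        """Advance on a high-level event; unknown events keep the state and give reward 0."""
        next_state, reward = self.transitions.get((self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward

# Hypothetical task: pick up a key, then open the door for the final reward.
ra = RewardAutomaton({("start", "got_key"): ("has_key", 0.1),
                      ("has_key", "opened_door"): ("done", 1.0)},
                     initial_state="start")
print(ra.step("got_key"), ra.step("opened_door"))   # -> 0.1 1.0
```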
arXiv Detail & Related papers (2020-06-28T21:13:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.