Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization
- URL: http://arxiv.org/abs/2511.15055v1
- Date: Wed, 19 Nov 2025 02:59:47 GMT
- Title: Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization
- Authors: Jian-Ting Guo, Yu-Cheng Chen, Ping-Chun Hsieh, Kuo-Hao Ho, Po-Wei Huang, Ti-Rong Wu, I-Chen Wu
- Abstract summary: We introduce Macro Action Quantization (MAQ), a human-like reinforcement learning framework that distills human demonstrations into macro actions. Experiments on the D4RL Adroit benchmarks show that MAQ significantly improves human-likeness, increasing trajectory similarity scores and achieving the highest human-likeness rankings among all RL agents. Our results also demonstrate that MAQ can be easily integrated into various off-the-shelf RL algorithms, opening a promising direction for learning human-like RL agents.
- Score: 20.732922711530527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-like agents have long been one of the goals in pursuing artificial intelligence. Although reinforcement learning (RL) has achieved superhuman performance in many domains, relatively little attention has been paid to designing human-like RL agents. As a result, many reward-driven RL agents often exhibit unnatural behaviors compared to humans, raising concerns for both interpretability and trustworthiness. To achieve human-like behavior in RL, this paper first formulates human-likeness as trajectory optimization, where the objective is to find an action sequence that closely aligns with human behavior while also maximizing rewards, and adapts classic receding-horizon control to human-like learning as a tractable and efficient implementation. To achieve this, we introduce Macro Action Quantization (MAQ), a human-like RL framework that distills human demonstrations into macro actions via a Vector-Quantized VAE. Experiments on the D4RL Adroit benchmarks show that MAQ significantly improves human-likeness, increasing trajectory similarity scores and achieving the highest human-likeness rankings among all RL agents in the human evaluation study. Our results also demonstrate that MAQ can be easily integrated into various off-the-shelf RL algorithms, opening a promising direction for learning human-like RL agents. Our code is available at https://rlg.iis.sinica.edu.tw/papers/MAQ.
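The core mechanism named in the abstract, distilling demonstration action sequences into a discrete codebook of macro actions with a VQ-VAE, can be sketched in a few lines. The following is a minimal, illustrative PyTorch sketch; the class name, segment length, and hyperparameters are assumptions for exposition, not the authors' released implementation:

```python
# Illustrative sketch (not the paper's code): quantize fixed-length
# segments of demonstration actions into a discrete macro-action codebook.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MacroActionQuantizer(nn.Module):
    def __init__(self, act_dim, seg_len, latent_dim=32, num_codes=64):
        super().__init__()
        flat = act_dim * seg_len  # a macro action = seg_len primitive actions
        self.encoder = nn.Sequential(nn.Linear(flat, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, flat))
        self.codebook = nn.Embedding(num_codes, latent_dim)

    def forward(self, segs):                      # segs: (B, seg_len, act_dim)
        z = self.encoder(segs.flatten(1))         # continuous latent
        dists = torch.cdist(z, self.codebook.weight)
        idx = dists.argmin(dim=1)                 # nearest codebook entry
        q = self.codebook(idx)
        q_st = z + (q - z).detach()               # straight-through estimator
        recon = self.decoder(q_st).view_as(segs)
        loss = (F.mse_loss(recon, segs)                  # reconstruction
                + F.mse_loss(q, z.detach())              # codebook update
                + 0.25 * F.mse_loss(z, q.detach()))      # commitment
        return recon, idx, loss
```

Once trained, each codebook index names a short, human-derived action sequence; a policy that selects among these indices, with the decoder expanding each back into primitive actions, is presumably what makes the receding-horizon trajectory optimization tractable.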
Related papers
- ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning [59.64325421657381]
Humanoid whole-body loco-manipulation promises transformative capabilities for daily service and warehouse tasks.
We introduce ResMimic, a two-stage residual learning framework for precise and expressive humanoid control from human motion data.
Results show substantial gains in task success, training efficiency, and robustness over strong baselines.
arXiv Detail & Related papers (2025-10-06T17:47:02Z) - Ego-Foresight: Self-supervised Learning of Agent-Aware Representations for Improved RL [26.169030913260084]
We present Ego-Foresight, a self-supervised method for disentangling agent and environment based on motion and prediction.
Our main finding is that self-supervised agent-awareness, obtained by visuomotor prediction of the agent, improves the sample-efficiency and performance of the underlying RL algorithm.
arXiv Detail & Related papers (2024-05-27T13:32:43Z) - Enhancing Human Experience in Human-Agent Collaboration: A Human-Centered Modeling Approach Based on Positive Human Gain [18.968232976619912]
We propose a "human-centered" modeling scheme for collaborative AI agents.
Agents should learn to enhance the extent to which humans achieve their goals while maintaining the agents' original abilities.
We evaluate the RLHG agent in the popular Multi-player Online Battle Arena (MOBA) game, Honor of Kings.
arXiv Detail & Related papers (2024-01-28T05:05:57Z) - REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world.
Recent methods aim to mitigate misalignment by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z) - Int-HRL: Towards Intention-based Hierarchical Reinforcement Learning [23.062590084580542]
Int-HRL is a hierarchical RL method whose intention-based sub-goals are inferred from human eye gaze.
Our evaluations show that replacing hand-crafted sub-goals with automatically extracted intentions yields an HRL agent that is significantly more sample-efficient than previous methods.
arXiv Detail & Related papers (2023-06-20T12:12:16Z) - Learning to Influence Human Behavior with Offline Reinforcement Learning [70.7884839812069]
We focus on influence in settings where there is a need to capture human suboptimality.
Experimenting online with humans is potentially unsafe, and creating a high-fidelity simulator of the environment is often impractical.
We show that offline reinforcement learning can learn to effectively influence suboptimal humans by extending and combining elements of observed human-human behavior.
arXiv Detail & Related papers (2023-03-03T23:41:55Z) - Humans are not Boltzmann Distributions: Challenges and Opportunities for Modelling Human Feedback and Interaction in Reinforcement Learning [13.64577704565643]
We argue that these models are too simplistic and that RL researchers need to develop more realistic human models to design and evaluate their algorithms.
This paper calls for research from different disciplines to address key questions about how humans provide feedback to AIs and how we can build more robust human-in-the-loop RL systems.
arXiv Detail & Related papers (2022-06-27T13:58:51Z) - Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning [73.92475751508452]
Bimanual Dexterous Hands Benchmark (Bi-DexHands) is a simulator that involves two dexterous hands with tens of bimanual manipulation tasks and thousands of target objects.
Tasks in Bi-DexHands are designed to match different levels of human motor skills according to cognitive science literature.
arXiv Detail & Related papers (2022-06-17T11:09:06Z) - Learning to Run with Potential-Based Reward Shaping and Demonstrations from Video Data [70.540936204654]
"Learning to run" competition was to train a two-legged model of a humanoid body to run in a simulated race course with maximum speed.
All submissions took a tabula rasa approach to reinforcement learning (RL) and were able to produce relatively fast, but not optimal running behaviour.
We demonstrate how data from videos of human running can be used to shape the reward of the humanoid learning agent.
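Potential-based shaping itself has a standard form that provably preserves optimal policies (Ng et al., 1999). A minimal sketch, with the video-derived potential left as a hypothetical placeholder rather than the paper's actual pose-matching term:

```python
# Sketch of potential-based reward shaping; phi is an assumed stand-in
# for a potential derived from human running video, not the paper's method.
GAMMA = 0.99

def shaped_reward(reward, state, next_state, phi):
    """r'(s, a, s') = r + gamma * phi(s') - phi(s).

    Shaping of this form leaves the optimal policy unchanged, so a
    human-video prior can only guide learning, not redefine the task.
    """
    return reward + GAMMA * phi(next_state) - phi(state)

# Hypothetical potential: negative distance to the nearest reference pose
# extracted from video (human_poses is an assumed, precomputed array).
# phi = lambda s: -min(float(np.linalg.norm(s - p)) for p in human_poses)
```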
arXiv Detail & Related papers (2020-12-16T09:46:58Z) - Weak Human Preference Supervision For Deep Reinforcement Learning [48.03929962249475]
Reward learning from human preferences can be used to solve complex reinforcement learning (RL) tasks without access to a reward function.
We propose a weak human preference supervision framework, for which we develop a human preference scaling model.
Our human-demonstration estimator requires human feedback for less than 0.01% of the agent's interactions with the environment.
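For orientation, the objective this line of work builds on is the standard Bradley-Terry preference loss over trajectory segments (Christiano et al., 2017). The sketch below shows that generic objective, not the paper's specific scaling model; the observation dimension and network are assumptions:

```python
# Generic preference-based reward learning (Bradley-Terry), for orientation.
import torch
import torch.nn as nn

reward_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))

def preference_loss(seg_a, seg_b, prefer_a):
    """seg_*: (T, obs_dim) tensors; prefer_a: 1.0 if the human preferred A."""
    r_a = reward_net(seg_a).sum()   # summed predicted reward over segment A
    r_b = reward_net(seg_b).sum()
    log_p = torch.log_softmax(torch.stack([r_a, r_b]), dim=0)
    # cross-entropy between the human label and P(A preferred over B)
    return -(prefer_a * log_p[0] + (1.0 - prefer_a) * log_p[1])
```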
arXiv Detail & Related papers (2020-07-25T10:37:15Z) - RL agents Implicitly Learning Human Preferences [1.52292571922932]
We show that RL agents implicitly learn the preferences of humans in their environment.
Training a classifier to predict whether a simulated human's preferences are fulfilled, based on the activations of an RL agent's neural network, achieves 0.93 AUC.
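The probing setup is straightforward to reproduce in outline. A minimal sketch with synthetic stand-in data; real activations would be recorded from the agent's network, which is not shown:

```python
# Linear probe on agent activations, scored by AUC. The random data here
# is a stand-in (it yields ~0.5); the paper reports 0.93 on real activations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 256))      # stand-in for hidden activations
labels = rng.integers(0, 2, size=1000)   # 1 = simulated human's preference met

X_tr, X_te, y_tr, y_te = train_test_split(acts, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe AUC:", roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1]))
```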
arXiv Detail & Related papers (2020-02-14T17:42:50Z)