Related papers: Learning to Ball: Composing Policies for Long-Horizon Basketball Moves

URL: http://arxiv.org/abs/2509.22442v1
Date: Fri, 26 Sep 2025 15:02:05 GMT
Title: Learning to Ball: Composing Policies for Long-Horizon Basketball Moves
Authors: Pei Xu, Zhen Wu, Ruocheng Wang, Vishnu Sarukkai, Kayvon Fatahalian, Ioannis Karamouzas, Victor Zordan, C. Karen Liu
Abstract summary: Long-horizon tasks consist of subtasks with well-defined goals, separated by transitional subtasks with unclear goals. Existing methods like the mixture of experts and skill chaining struggle with tasks where individual policies do not share significant commonly explored states. We introduce a novel policy integration framework to enable the composition of drastically different motor skills in long-horizon tasks.
Score: 25.21981598232154
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Learning a control policy for a multi-phase, long-horizon task, such as basketball maneuvers, remains challenging for reinforcement learning approaches due to the need for seamless policy composition and transitions between skills. A long-horizon task typically consists of distinct subtasks with well-defined goals, separated by transitional subtasks with unclear goals but critical to the success of the entire task. Existing methods like the mixture of experts and skill chaining struggle with tasks where individual policies do not share significant commonly explored states or lack well-defined initial and terminal states between different phases. In this paper, we introduce a novel policy integration framework to enable the composition of drastically different motor skills in multi-phase long-horizon tasks with ill-defined intermediate states. Based on that, we further introduce a high-level soft router to enable seamless and robust transitions between the subtasks. We evaluate our framework on a set of fundamental basketball skills and challenging transitions. Policies trained by our approach can effectively control the simulated character to interact with the ball and accomplish the long-horizon task specified by real-time user commands, without relying on ball trajectory references.

Related papers

Learning from 10 Demos: Generalisable and Sample-Efficient Policy Learning with Oriented Affordance Frames [10.738838923944876]
Existing methods require a substantial number of demonstrations to cover possible task variations. We introduce oriented affordance frames, a structured representation for state and action spaces. We show how this abstraction allows for compositional generalisation of independently trained sub-policies. We validate our method across three real-world tasks, each requiring multi-step, multi-object interactions.
arXiv Detail & Related papers (2024-10-15T23:57:35Z)
Single-Shot Learning of Stable Dynamical Systems for Long-Horizon Manipulation Tasks [48.54757719504994]
This paper focuses on improving task success rates while reducing the amount of training data needed. Our approach introduces a novel method that segments long-horizon demonstrations into discrete steps defined by waypoints and subgoals. We validate our approach through both simulation and real-world experiments, demonstrating effective transfer from simulation to physical robotic platforms.
arXiv Detail & Related papers (2024-10-01T19:49:56Z)
GO-DICE: Goal-Conditioned Option-Aware Offline Imitation Learning via Stationary Distribution Correction Estimation [1.4703485217797363]
GO-DICE is an offline IL technique for goal-conditioned long-horizon sequential tasks. Inspired by the expansive DICE family of techniques, policy learning at both levels takes place within the space of stationary distributions. Experimental results substantiate that GO-DICE outperforms recent baselines.
arXiv Detail & Related papers (2023-12-17T19:47:49Z)
Sequential Dexterity: Chaining Dexterous Policies for Long-Horizon Manipulation [28.37417344133933]
We present Sequential Dexterity, a general system that chains multiple dexterous policies for achieving long-horizon task goals. The core of the system is a transition feasibility function that progressively finetunes the sub-policies for enhancing chaining success rate. Our system demonstrates generalization capability to novel object shapes and is able to zero-shot transfer to a real-world robot equipped with a dexterous hand.
arXiv Detail & Related papers (2023-09-02T16:55:48Z)
Robust and Versatile Bipedal Jumping Control through Reinforcement Learning [141.56016556936865]
This work aims to push the limits of agility for bipedal robots by enabling a torque-controlled bipedal robot to perform robust and versatile dynamic jumps in the real world. We present a reinforcement learning framework for training a robot to accomplish a large variety of jumping tasks, such as jumping to different locations and directions. We develop a new policy structure that encodes the robot's long-term input/output (I/O) history while also providing direct access to a short-term I/O history.
arXiv Detail & Related papers (2023-02-19T01:06:09Z)
Latent Plans for Task-Agnostic Offline Reinforcement Learning [32.938030244921755]
We propose a novel hierarchical approach to learn task-agnostic long-horizon policies from high-dimensional camera observations. We show that our formulation enables producing previously unseen combinations of skills to reach temporally extended goals by "stitching" together latent skills. We even learn one multi-task visuomotor policy for 25 distinct manipulation tasks in the real world which outperforms both imitation learning and offline reinforcement learning techniques.
arXiv Detail & Related papers (2022-09-19T12:27:15Z)
Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments. To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command. We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
Autonomous Open-Ended Learning of Tasks with Non-Stationary Interdependencies [64.0476282000118]
Intrinsic motivations have proven to generate a task-agnostic signal to properly allocate the training time amongst goals. While the majority of works in the field of intrinsically motivated open-ended learning focus on scenarios where goals are independent of each other, only a few have studied the autonomous acquisition of interdependent tasks. In particular, we first deepen the analysis of a previous system, showing the importance of incorporating information about the relationships between tasks at a higher level of the architecture. Then we introduce H-GRAIL, a new system that extends the previous one by adding a new learning layer to store the autonomously acquired sequences
arXiv Detail & Related papers (2022-05-16T10:43:01Z)
Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks. Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z)
Modular Adaptive Policy Selection for Multi-Task Imitation Learning through Task Division [60.232542918414985]
Multi-task learning often suffers from negative transfer, sharing information that should be task-specific. This is done by using proto-policies as modules to divide the tasks into simple sub-behaviours that can be shared. We also demonstrate its ability to autonomously divide the tasks into both shared and task-specific sub-behaviours.
arXiv Detail & Related papers (2022-03-28T15:53:17Z)
Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks. Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic. We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
arXiv Detail & Related papers (2021-07-01T15:30:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.