STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization
- URL: http://arxiv.org/abs/2506.03863v2
- Date: Wed, 11 Jun 2025 13:50:28 GMT
- Title: STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization
- Authors: Hao Li, Qi Lv, Rui Shao, Xiang Deng, Yinchuan Li, Jianye Hao, Liqiang Nie,
- Abstract summary: We present Skill Training with Augmented Rotation (STAR), a framework that advances both skill learning and composition to complete complex behaviors.
- Score: 87.77475595961154
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transforming complex actions into discrete skill abstractions has demonstrated strong potential for robotic manipulation. Existing approaches mainly leverage latent variable models, e.g., VQ-VAE, to learn skill abstractions through learned vectors (codebooks), but they suffer from codebook collapse and struggle to model the causal relationships between learned skills. To address these limitations, we present \textbf{S}kill \textbf{T}raining with \textbf{A}ugmented \textbf{R}otation (\textbf{STAR}), a framework that advances both skill learning and composition to complete complex behaviors. Specifically, to prevent codebook collapse, we devise rotation-augmented residual skill quantization (RaRSQ), which encodes the relative angles between encoder outputs into the gradient flow via a rotation-based gradient mechanism: points assigned to the same skill code are pushed apart or pulled closer together depending on the gradient directions. Further, to capture the causal relationships between skills, we present the causal skill transformer (CST), which explicitly models dependencies between skill representations through an autoregressive mechanism for coherent action generation. Extensive experiments demonstrate the superiority of STAR on both the LIBERO benchmark and real-world tasks, with around a 12\% improvement over the baselines.
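To make the quantization idea concrete, below is a minimal PyTorch sketch of a residual vector quantizer whose forward output equals the selected codeword while gradients reach the encoder through a detached rotation that carries each output onto its code, in the spirit of the rotation-based gradient mechanism described above. This is an illustration only, not the authors' RaRSQ implementation: the class name `RotationResidualVQ`, the number of quantizers, the codebook size, and the commitment-loss weighting are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def rotate_to(e, q, eps=1e-6):
    """Map encoder outputs e (B, D) onto their codewords q (B, D).

    The rotation (and rescaling) carrying e onto q is detached, so the forward
    value equals q while gradients flow to e through a fixed rotation instead
    of a straight-through copy; the gradient direction therefore depends on
    the angle between e and q.
    """
    e_hat = F.normalize(e.detach(), dim=-1, eps=eps)
    q_hat = F.normalize(q.detach(), dim=-1, eps=eps)
    r = F.normalize(e_hat + q_hat, dim=-1, eps=eps)
    scale = q.detach().norm(dim=-1, keepdim=True) / \
            e.detach().norm(dim=-1, keepdim=True).clamp_min(eps)
    # Householder-composed rotation R = I - 2 r r^T + 2 q_hat e_hat^T applied to e.
    rot = e - 2.0 * (e * r).sum(-1, keepdim=True) * r \
            + 2.0 * (e * e_hat).sum(-1, keepdim=True) * q_hat
    return scale * rot


class RotationResidualVQ(nn.Module):
    """Hypothetical residual VQ using the rotation-based gradient above."""

    def __init__(self, num_quantizers=4, codebook_size=64, dim=256):
        super().__init__()
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(num_quantizers)
        )

    def forward(self, x):                                # x: (B, D) encoder features
        residual, quantized, indices, loss = x, 0.0, [], 0.0
        for cb in self.codebooks:
            dist = torch.cdist(residual, cb.weight)      # (B, K) distances to codes
            idx = dist.argmin(dim=-1)                    # nearest code per sample
            code = cb(idx)
            quantized = quantized + rotate_to(residual, code)
            # Codebook + commitment terms (weighting is an assumption).
            loss = loss + F.mse_loss(code, residual.detach()) \
                        + 0.25 * F.mse_loss(residual, code.detach())
            residual = residual - code.detach()
            indices.append(idx)
        return quantized, torch.stack(indices, dim=-1), loss
```

Because the rotation depends on the angle between an encoder output and its codeword, two points assigned to the same code can receive differently oriented gradients, which is the property the abstract credits with preventing codebook collapse.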
Related papers
- SimStep: Chain-of-Abstractions for Incremental Specification and Debugging of AI-Generated Interactive Simulations [16.00479720281197]
Chain-of-Abstractions (CoA) is a way to recover programming's core affordances.
CoA decomposes the synthesis process into a sequence of cognitively meaningful, task-aligned representations.
SimStep is an authoring environment for teachers that scaffolds simulation creation through four intermediate abstractions.
arXiv Detail & Related papers (2025-07-13T14:54:17Z)
- Factorized Visual Tokenization and Generation [37.56136469262736]
We introduce Factorized Quantization (FQ), a novel approach that revitalizes VQ-based tokenizers by decomposing a large codebook into multiple independent sub-codebooks.
This factorization reduces the lookup complexity of large codebooks, enabling more efficient and scalable visual tokenization.
Experiments show that the proposed FQGAN model substantially improves the reconstruction quality of visual tokenizers, achieving state-of-the-art performance.
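As a rough illustration of the sub-codebook idea (not FQGAN's actual tokenizer), the sketch below splits a feature vector into chunks and quantizes each chunk against its own small codebook, so that M codebooks of size K emulate an effective codebook of size K^M. The class name `FactorizedQuantizer` and all hyperparameters are assumptions.

```python
import torch
import torch.nn as nn


class FactorizedQuantizer(nn.Module):
    """Illustrative sketch: quantize each chunk of a feature with its own
    small sub-codebook instead of one large codebook."""

    def __init__(self, dim=256, num_sub=4, codebook_size=256):
        super().__init__()
        assert dim % num_sub == 0
        self.num_sub = num_sub
        self.sub_dim = dim // num_sub
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, self.sub_dim) for _ in range(num_sub)
        )

    def forward(self, x):                        # x: (B, dim)
        chunks = x.chunk(self.num_sub, dim=-1)   # num_sub chunks of (B, sub_dim)
        quantized, indices = [], []
        for chunk, cb in zip(chunks, self.codebooks):
            idx = torch.cdist(chunk, cb.weight).argmin(dim=-1)
            code = cb(idx)
            # Straight-through estimator: forward uses the code, backward
            # copies gradients to the continuous chunk.
            quantized.append(chunk + (code - chunk).detach())
            indices.append(idx)
        return torch.cat(quantized, dim=-1), torch.stack(indices, dim=-1)
```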
arXiv Detail & Related papers (2024-11-25T18:59:53Z)
- Skill Transfer and Discovery for Sim-to-Real Learning: A Representation-Based Viewpoint [13.28437541072843]
We study sim-to-real skill transfer and discovery in the context of robotics control using representation learning.
We propose a skill discovery algorithm that learns, from real-world data, new skills arising from the sim-to-real gap.
Our skill discovery approach helps narrow the sim-to-real gap and improve the real-world controller performance by up to 30.2%.
arXiv Detail & Related papers (2024-04-07T19:22:51Z)
- SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution [75.2573501625811]
Diffusion models have demonstrated strong potential for robotic trajectory planning.
Generating coherent trajectories from high-level instructions remains challenging.
We propose SkillDiffuser, an end-to-end hierarchical planning framework.
arXiv Detail & Related papers (2023-12-18T18:16:52Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
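The paper learns its discretization adaptively from the dataset; as a simplified stand-in, the sketch below bins dataset actions with plain k-means and exposes an index function, after which a discrete-action offline RL method (IQL, CQL, BRAC) can be trained over bin indices and executed by emitting the chosen centroid. Function and parameter names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def discretize_actions(dataset_actions, num_bins=64, seed=0):
    """Fit centroids on the dataset's continuous actions and return
    (centroids, index_fn).  dataset_actions: (N, action_dim) array."""
    km = KMeans(n_clusters=num_bins, n_init=10, random_state=seed)
    km.fit(dataset_actions)
    centroids = km.cluster_centers_              # (num_bins, action_dim)

    def index_fn(actions):
        # Map continuous actions (M, action_dim) to nearest-centroid bin indices.
        d = np.linalg.norm(actions[:, None, :] - centroids[None, :, :], axis=-1)
        return d.argmin(axis=-1)

    return centroids, index_fn
```

This fixed k-means binning does not capture the paper's adaptive scheme, which learns the discretization jointly with the data, but it shows where the discrete bins plug into an otherwise standard offline RL pipeline.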
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
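An InfoNCE-style rendering of that objective might look like the following, with a behaviorally equivalent clone as the positive and a buggy "deviant" variant as an explicit hard negative; CONCORD's actual loss and data construction differ in detail, and the function name and temperature here are assumptions.

```python
import torch
import torch.nn.functional as F


def clone_aware_contrastive_loss(anchor, clone, deviant, temperature=0.07):
    """InfoNCE-style loss over code embeddings (all inputs are (B, D)):
    the clone is the positive, the deviant is a hard negative, and the
    other in-batch clones serve as ordinary negatives."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(clone, dim=-1)
    n = F.normalize(deviant, dim=-1)

    pos = (a * p).sum(-1, keepdim=True)        # (B, 1) similarity to own clone
    hard = (a * n).sum(-1, keepdim=True)       # (B, 1) similarity to deviant
    in_batch = a @ p.t()                       # (B, B) similarities to all clones
    mask = torch.eye(a.size(0), dtype=torch.bool, device=a.device)
    negatives = torch.cat([hard, in_batch[~mask].view(a.size(0), -1)], dim=-1)

    logits = torch.cat([pos, negatives], dim=-1) / temperature
    labels = torch.zeros(a.size(0), dtype=torch.long, device=a.device)  # positive at index 0
    return F.cross_entropy(logits, labels)
```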
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- Deep Imitation Learning for Bimanual Robotic Manipulation [70.56142804957187]
We present a deep imitation learning framework for robotic bimanual manipulation.
A core challenge is to generalize the manipulation skills to objects in different locations.
We propose to (i) decompose the multi-modal dynamics into elemental movement primitives, (ii) parameterize each primitive using a recurrent graph neural network to capture interactions, and (iii) integrate a high-level planner that composes primitives sequentially and a low-level controller to combine primitive dynamics and inverse kinematics control.
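A highly simplified skeleton of that decomposition, assuming PyTorch and toy dimensions, is sketched below: a recurrent module stands in for each primitive (the paper parameterizes primitives with recurrent graph neural networks over object interactions) and a small planner picks which primitive to run next; inverse-kinematics control is omitted. All names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class PrimitivePolicy(nn.Module):
    """Toy stand-in for one elemental movement primitive."""

    def __init__(self, obs_dim=32, act_dim=14, hidden=128):  # 14 = two 7-DoF arms
        super().__init__()
        self.rnn = nn.GRUCell(obs_dim, hidden)
        self.head = nn.Linear(hidden, act_dim)
        self.h = None

    def reset(self, batch=1, device="cpu"):
        self.h = torch.zeros(batch, self.rnn.hidden_size, device=device)

    def forward(self, obs):                    # obs: (B, obs_dim); call reset() first
        self.h = self.rnn(obs, self.h)
        return self.head(self.h)               # joint targets for both arms


class HighLevelPlanner(nn.Module):
    """Scores which primitive to execute next from the current observation."""

    def __init__(self, obs_dim=32, num_primitives=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, num_primitives))

    def forward(self, obs):
        return self.net(obs).argmax(dim=-1)    # index of the primitive to run
```

A rollout would alternate planner selection and primitive stepping, with a low-level controller turning the predicted joint targets into motor commands.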
arXiv Detail & Related papers (2020-10-11T01:40:03Z)
- Learning and Sequencing of Object-Centric Manipulation Skills for Industrial Tasks [16.308562047398542]
We propose a rapid robot skill-sequencing algorithm, where the skills are encoded by object-centric hidden semi-Markov models.
The learned skill models can encode multimodal (temporal and spatial) trajectory distributions.
We demonstrate this approach on a 7 DoF robot arm for industrial assembly tasks.
arXiv Detail & Related papers (2020-08-24T14:20:05Z)
- Unsupervised Controllable Generation with Self-Training [90.04287577605723]
Controllable generation with GANs remains a challenging research problem.
We propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training.
Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder.
arXiv Detail & Related papers (2020-07-17T21:50:35Z)