GarmentPile++: Affordance-Driven Cluttered Garments Retrieval with Vision-Language Reasoning
- URL: http://arxiv.org/abs/2603.04158v1
- Date: Wed, 04 Mar 2026 15:13:40 GMT
- Title: GarmentPile++: Affordance-Driven Cluttered Garments Retrieval with Vision-Language Reasoning
- Authors: Mingleyang Li, Yuran Wang, Yue Chen, Tianxing Chen, Jiaqi Liang, Zishun Shen, Haoran Lu, Ruihai Wu, Hao Dong
- Abstract summary: Garment manipulation has attracted increasing attention due to its critical role in home-assistant robotics. We propose a novel garment retrieval pipeline that can not only follow language instructions to execute safe and clean retrieval but also guarantee exactly one garment is retrieved per attempt. Our pipeline seamlessly integrates vision-language reasoning with visual affordance perception, fully leveraging the high-level reasoning and planning capabilities of VLMs.
- Score: 27.756766557197746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Garment manipulation has attracted increasing attention due to its critical role in home-assistant robotics. However, the majority of existing garment manipulation works assume an initial state consisting of only one garment, while piled garments are far more common in real-world settings. To bridge this gap, we propose a novel garment retrieval pipeline that can not only follow language instructions to execute safe and clean retrieval but also guarantee exactly one garment is retrieved per attempt, establishing a robust foundation for the execution of downstream tasks (e.g., folding, hanging, wearing). Our pipeline seamlessly integrates vision-language reasoning with visual affordance perception, fully leveraging the high-level reasoning and planning capabilities of VLMs alongside the generalization power of visual affordance for low-level actions. To enhance the VLM's comprehensive awareness of each garment's state within a garment pile, we employ a visual segmentation model (SAM2) to segment the garment pile, providing the VLM with sufficient visual cues for reasoning. A mask fine-tuning mechanism is further integrated to address scenarios where the initial segmentation results are suboptimal. In addition, a dual-arm cooperation framework is deployed to handle cases involving large or long garments, as well as excessive garment sagging caused by incorrect grasp-point selection, both of which are difficult for a single arm to manage. The effectiveness of our pipeline is consistently demonstrated across diverse tasks and varying scenarios in both real-world and simulation environments. Project page: https://garmentpile2.github.io/.
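The segment-then-reason-then-ground loop described in the abstract can be summarized in a minimal, hypothetical sketch. All names below (segment, reason, predict) are illustrative placeholders, not the authors' released API; the mask fine-tuning and dual-arm stages are compressed into comments.

```python
import numpy as np

def retrieve_one_garment(rgb, depth, instruction, sam2, vlm, affordance_net):
    """One retrieval attempt: pick a garment per the instruction, return a grasp pixel."""
    masks = sam2.segment(rgb)                    # per-garment masks (SAM2); the paper
                                                 # additionally fine-tunes poor masks
    target_id, plan = vlm.reason(rgb, masks, instruction)   # high-level VLM reasoning
    scores = affordance_net.predict(rgb, depth)  # dense point-level affordance map
    scores = np.where(masks[target_id], scores, -np.inf)    # restrict to chosen garment
    grasp_px = np.unravel_index(np.argmax(scores), scores.shape)
    return grasp_px, plan                        # plan may invoke dual-arm cooperation
```

Under this reading, the exactly-one-garment guarantee hinges on the masking step: scoring only points on the chosen garment keeps the grasp from snagging its neighbors.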
Related papers
- NGL-Prompter: Training-Free Sewing Pattern Estimation from a Single Image [4.620470560214746]
Estimating sewing patterns from images is a practical approach for creating high-quality 3D garments. We propose NGL (Natural Garment Language), a novel intermediate language that restructures GarmentCode into a representation more understandable to language models. We evaluate our method on Dress4D, CloSe, and a newly collected dataset of approximately 5,000 in-the-wild fashion images.
arXiv Detail & Related papers (2026-02-24T09:01:11Z) - CLASP: General-Purpose Clothes Manipulation with Semantic Keypoints [26.793986224605977]
This paper presents CLothes mAnipulation with Semantic keyPoints (CLASP), which aims at general-purpose clothes manipulation. The core idea of CLASP is semantic keypoints, e.g., "left sleeve" and "right shoulder", a sparse spatial-semantic representation. CLASP uses semantic keypoints as an intermediate representation to connect high-level task planning and low-level action execution.
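As a rough illustration of that intermediate representation (part names and coordinates are invented here, not taken from the paper), a planner can emit actions over named garment parts while detected keypoints ground each name in 3D:

```python
from typing import Dict, List, Tuple

# High-level plan phrased over semantic part names (illustrative only).
plan: List[Tuple[str, str]] = [
    ("grasp", "left_sleeve"),
    ("grasp", "right_shoulder"),
    ("fold_to", "hem"),
]

def execute(plan: List[Tuple[str, str]],
            keypoints: Dict[str, Tuple[float, float, float]]) -> None:
    for action, part in plan:
        xyz = keypoints[part]          # keypoint grounds the symbolic part in 3D
        print(f"{action} at {xyz}")    # stand-in for low-level arm control

execute(plan, {"left_sleeve": (0.10, 0.42, 0.02),
               "right_shoulder": (0.48, 0.40, 0.03),
               "hem": (0.30, 0.08, 0.02)})
```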
arXiv Detail & Related papers (2025-07-26T15:43:25Z) - Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On [89.9123806553489]
Diffusion models have shown success in the virtual try-on (VTON) task. It remains challenging to preserve the shape and every detail of the given garment due to the intrinsic stochasticity of the diffusion process. We propose to explicitly capitalize on visual correspondence as a prior to tame the diffusion process.
arXiv Detail & Related papers (2025-05-22T17:52:13Z) - DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy [88.65584817043676]
Garment manipulation is a critical challenge due to the diversity in garment categories, geometries, and deformations. We propose DexGarmentLab, the first environment specifically designed for dexterous (especially bimanual) garment manipulation. It features large-scale, high-quality 3D assets for 15 task scenarios, and refines simulation techniques tailored for garment modeling to reduce the sim-to-real gap.
arXiv Detail & Related papers (2025-05-16T09:26:59Z) - GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation [14.604134812602044]
Unlike single-garment manipulation, cluttered scenarios require managing complex garment entanglements and interactions. We learn point-level affordance, a dense representation modeling the complex space and multi-modal manipulation candidates (see the toy sketch below). We introduce an adaptation module, guided by learned affordance, to reorganize highly entangled garments into states plausible for manipulation.
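A point-level affordance of this kind reduces to per-point scores over the pile's point cloud; the following toy sketch uses random data, and the shapes and adaptation threshold are assumptions, not the paper's code:

```python
import numpy as np

points = np.random.rand(2048, 3)     # garment-pile point cloud (N, 3), toy data
affordance = np.random.rand(2048)    # per-point scores from a learned network
best_point = points[affordance.argmax()]   # highest-affordance manipulation candidate
needs_adaptation = affordance.max() < 0.5  # low scores everywhere: reorganize the
                                           # entangled pile before retrieving
```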
arXiv Detail & Related papers (2025-03-12T10:39:12Z) - IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 targets a virtual dressing task: generating freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
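A minimal PyTorch sketch of such a hybrid attention block, assuming the frozen self-attention comes from the pretrained denoising UNet and only the garment cross-attention trains; module and tensor names are illustrative, not IMAGDressing-v1's code:

```python
import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        # Self-attention weights would come from the pretrained denoising
        # UNet and stay frozen during training.
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        for p in self.self_attn.parameters():
            p.requires_grad = False
        # Cross-attention over garment features is the trainable part.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor, garment: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) latent tokens; garment: (B, M, dim) garment-UNet tokens.
        h, _ = self.self_attn(x, x, x)
        x = x + h
        h, _ = self.cross_attn(x, garment, garment)  # latents query garment features
        return x + h
```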
arXiv Detail & Related papers (2024-07-17T16:26:30Z) - Magic Clothing: Controllable Garment-Driven Image Synthesis [7.46772222515689]
We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for an unexplored garment-driven image synthesis task.
Aiming at generating customized characters wearing the target garments with diverse text prompts, the image controllability is the most critical issue.
We introduce a garment extractor to capture the detailed garment features, and employ self-attention fusion to incorporate them into the pretrained LDMs.
arXiv Detail & Related papers (2024-04-15T07:15:39Z) - GarmentTracking: Category-Level Garment Pose Tracking [47.219348193140775]
We present a complete package to address the category-level garment pose tracking task: (1) VR-Garment, a recording system with which users can manipulate virtual garment models in simulation through a VR interface; (2) VR-Folding, a large-scale dataset with complex garment pose configurations in manipulations like flattening and folding; (3) GarmentTracking, an end-to-end online tracking framework that predicts complete garment pose in both canonical space and task space given a point cloud sequence.
arXiv Detail & Related papers (2023-03-24T10:59:17Z) - UIGR: Unified Interactive Garment Retrieval [105.56179829647142]
Interactive garment retrieval (IGR) aims to retrieve a target garment image based on a reference garment image.
Two IGR tasks have been studied extensively: text-guided garment retrieval (TGR) and visually compatible garment retrieval (VCR).
We propose a Unified Interactive Garment Retrieval (UIGR) framework to unify TGR and VCR.
arXiv Detail & Related papers (2022-04-06T21:54:14Z) - Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN [66.3650689395967]
We propose a texture-preserving end-to-end network, the PAtch-routed SpaTially-Adaptive GAN (PASTA-GAN), that facilitates real-world unpaired virtual try-on.
To disentangle the style and spatial information of each garment, PASTA-GAN introduces an innovative patch-routed disentanglement module.
arXiv Detail & Related papers (2021-11-20T08:36:12Z)