How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference
- URL: http://arxiv.org/abs/2603.03280v1
- Date: Tue, 03 Mar 2026 18:59:32 GMT
- Title: How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference
- Authors: Toru Lin, Shuying Deng, Zhao-Heng Yin, Pieter Abbeel, Jitendra Malik
- Abstract summary: We present a learning framework for essential manipulation tasks, using peeling with a knife as an example. Our system achieves over 90% average success rates on challenging produce including cucumbers, apples, and potatoes. Remarkably, policies trained on a single produce category exhibit strong zero-shot generalization to unseen in-category instances.
- Score: 73.16380468921543
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many essential manipulation tasks - such as food preparation, surgery, and craftsmanship - remain intractable for autonomous robots. These tasks are characterized not only by contact-rich, force-sensitive dynamics, but also by their "implicit" success criteria: unlike pick-and-place, task quality in these domains is continuous and subjective (e.g. how well a potato is peeled), making quantitative evaluation and reward engineering difficult. We present a learning framework for such tasks, using peeling with a knife as a representative example. Our approach follows a two-stage pipeline: first, we learn a robust initial policy via force-aware data collection and imitation learning, enabling generalization across object variations; second, we refine the policy through preference-based finetuning using a learned reward model that combines quantitative task metrics with qualitative human feedback, aligning policy behavior with human notions of task quality. Using only 50-200 peeling trajectories, our system achieves over 90% average success rates on challenging produce including cucumbers, apples, and potatoes, with performance improving by up to 40% through preference-based finetuning. Remarkably, policies trained on a single produce category exhibit strong zero-shot generalization to unseen in-category instances and to out-of-distribution produce from different categories while maintaining over 90% success rates.
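The second stage in the abstract relies on a reward model learned from human preferences. The abstract does not say what form that model takes, so the following is only a minimal sketch of one common formulation (a Bradley-Terry pairwise preference loss over a linear reward), not the authors' implementation. The feature names (`peel_coverage`, `flesh_waste`), the toy preference pairs, and all function names are hypothetical, chosen to illustrate how quantitative task metrics could be weighted to agree with human rankings.

```python
import math

def reward(weights, features):
    """Linear reward: weighted sum of per-trajectory features."""
    return sum(w * f for w, f in zip(weights, features))

def train_reward_model(pairs, num_features, lr=0.5, epochs=200):
    """Fit weights so that human-preferred trajectories score higher.

    pairs: list of (preferred_features, rejected_features) tuples.
    Uses gradient ascent on the Bradley-Terry log-likelihood.
    """
    weights = [0.0] * num_features
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            # Probability that the preferred trajectory wins under Bradley-Terry.
            p = 1.0 / (1.0 + math.exp(-margin))
            # d/dw log(p) = (1 - p) * (f_preferred - f_rejected)
            for i in range(num_features):
                weights[i] += lr * (1.0 - p) * (preferred[i] - rejected[i])
    return weights

# Hypothetical features per trajectory: [peel_coverage, flesh_waste].
# Each pair is (trajectory the annotator preferred, trajectory they rejected).
pairs = [
    ([0.9, 0.1], [0.5, 0.4]),
    ([0.8, 0.2], [0.6, 0.5]),
    ([0.7, 0.1], [0.7, 0.6]),
]
weights = train_reward_model(pairs, num_features=2)
```

In this toy setup the learned weights reward coverage and penalize waste, so the resulting scalar reward could in principle be used as the finetuning signal for a policy; how the paper actually combines metrics and feedback is not specified in the abstract.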
Related papers
- OmniSapiens: A Foundation Model for Social Behavior Processing via Heterogeneity-Aware Relative Policy Optimization [50.11607985532808]
We introduce Heterogeneity-Aware Relative Policy Optimization (HARPO), an RL method that balances learning across heterogeneous tasks and samples. Using HARPO, we develop and release Omnisapiens-7B 2.0, a foundation model for social behavior processing. Relative to existing behavioral foundation models, Omnisapiens-7B 2.0 achieves the strongest performance across behavioral tasks.
arXiv Detail & Related papers (2026-02-11T08:35:59Z) - Robust Finetuning of Vision-Language-Action Robot Policies via Parameter Merging [53.41119829581115]
Generalist robot policies, trained on large and diverse datasets, have demonstrated the ability to generalize. They still fall short on new tasks not covered in the training data. We develop a method that preserves the generalization capabilities of the generalist policy during finetuning.
arXiv Detail & Related papers (2025-12-09T08:02:11Z) - Using Temperature Sampling to Effectively Train Robot Learning Policies on Imbalanced Datasets [3.342232437547785]
Many datasets of robotic tasks are substantially imbalanced in terms of the physical robotic actions they represent. We propose a simple sampling strategy for policy training that mitigates this imbalance. Our results show substantial improvements on low-resource tasks compared to prior state-of-the-art methods.
arXiv Detail & Related papers (2025-10-22T08:48:55Z) - FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning [74.25049012472502]
FLaRe is a large-scale Reinforcement Learning framework that integrates robust pre-trained representations, large-scale training, and gradient stabilization techniques.
Our method aligns pre-trained policies towards task completion, achieving state-of-the-art (SoTA) performance on previously demonstrated and on entirely novel tasks and embodiments.
arXiv Detail & Related papers (2024-09-25T03:15:17Z) - Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies [25.760946763103483]
We propose Imagination Policy, a novel multi-task key-frame policy network for solving high-precision pick and place tasks. Instead of learning actions directly, Imagination Policy generates point clouds to imagine desired states which are then translated to actions using rigid action estimation.
arXiv Detail & Related papers (2024-06-17T17:00:41Z) - Active Task Randomization: Learning Robust Skills via Unsupervised Generation of Diverse and Feasible Tasks [37.73239471412444]
We introduce Active Task Randomization (ATR), an approach that learns robust skills through the unsupervised generation of training tasks.
ATR selects suitable tasks, which consist of an initial environment state and manipulation goal, for learning robust skills by balancing the diversity and feasibility of the tasks.
We demonstrate that the learned skills can be composed by a task planner to solve unseen sequential manipulation problems based on visual inputs.
arXiv Detail & Related papers (2022-11-11T11:24:55Z) - muNet: Evolving Pretrained Deep Neural Networks into Scalable Auto-tuning Multitask Systems [4.675744559395732]
Most uses of machine learning today involve training a model from scratch for a particular task, or starting with a model pretrained on a related task and then fine-tuning on a downstream task.
We propose a method that uses the layers of a pretrained deep neural network as building blocks to construct an ML system that can jointly solve an arbitrary number of tasks.
The resulting system can leverage cross-task knowledge transfer while being immune to common drawbacks of multitask approaches such as catastrophic forgetting, gradient interference, and negative transfer.
arXiv Detail & Related papers (2022-05-22T21:54:33Z) - Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay [8.259694128526112]
We propose diversity-based trajectory and goal selection with HER (DTGSH).
We show that our method can learn more quickly and reach higher performance than other state-of-the-art approaches on all tasks.
arXiv Detail & Related papers (2021-08-17T21:34:24Z) - Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z) - Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.