SceneScore: Learning a Cost Function for Object Arrangement
- URL: http://arxiv.org/abs/2311.08530v1
- Date: Tue, 14 Nov 2023 20:55:40 GMT
- Title: SceneScore: Learning a Cost Function for Object Arrangement
- Authors: Ivan Kapelyukh, Edward Johns
- Abstract summary: "SceneScore" learns a cost function for arrangements, such that desirable, human-like arrangements have a low cost.
We learn the distribution of training arrangements offline using an energy-based model, solely from example images.
Experiments demonstrate that the learned cost function can be used to predict poses for missing objects, generalise to novel objects using semantic features, and can be composed with other cost functions to satisfy constraints at inference time.
- Score: 15.215659641228655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Arranging objects correctly is a key capability for robots which unlocks a
wide range of useful tasks. A prerequisite for creating successful arrangements
is the ability to evaluate the desirability of a given arrangement. Our method
"SceneScore" learns a cost function for arrangements, such that desirable,
human-like arrangements have a low cost. We learn the distribution of training
arrangements offline using an energy-based model, solely from example images
without requiring environment interaction or human supervision. Our model is
represented by a graph neural network which learns object-object relations,
using graphs constructed from images. Experiments demonstrate that the learned
cost function can be used to predict poses for missing objects, generalise to
novel objects using semantic features, and can be composed with other cost
functions to satisfy constraints at inference time.
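To make the arrangement-cost idea concrete, here is a minimal hypothetical sketch, not the authors' released code: a small message-passing network scores pairwise object-object relations on a fully connected graph and pools them into a scalar energy, a missing object's features are optimised by gradient descent on that energy, and a hand-designed constraint cost is composed with the learned one at inference time. The class name, feature dimensions, and the quadratic penalty are illustrative assumptions; training the energy-based model from example images is omitted.

```python
# Minimal sketch of the SceneScore idea (illustrative, not the paper's code).
# Each object is a node with a feature vector; a message-passing network maps
# the fully connected object graph to a scalar energy, low = desirable.
import torch
import torch.nn as nn

class ObjectGraphEnergy(nn.Module):
    def __init__(self, feat_dim=16, hidden=64):
        super().__init__()
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.out = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, feats):                      # feats: (N, feat_dim)
        n = feats.shape[0]
        src = feats.unsqueeze(1).expand(n, n, -1)  # pairwise object-object
        dst = feats.unsqueeze(0).expand(n, n, -1)  # relations on a fully
        edges = torch.cat([src, dst], dim=-1)      # connected graph
        messages = self.edge_mlp(edges).mean(dim=(0, 1))
        return self.out(messages).squeeze()        # scalar energy / cost

energy = ObjectGraphEnergy()

# Predict features (e.g. a pose) for a "missing" object by minimising the cost.
fixed = torch.randn(4, 16)                 # four objects already placed
pose = torch.zeros(1, 16, requires_grad=True)
opt = torch.optim.Adam([pose], lr=1e-2)
for _ in range(100):
    cost = energy(torch.cat([fixed, pose], dim=0))
    # Composition at inference time: add an extra hand-designed constraint
    # cost (here, an assumed quadratic penalty on the free features).
    cost = cost + 0.1 * pose.pow(2).sum()
    opt.zero_grad(); cost.backward(); opt.step()
```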
Related papers
- ShapeGrasp: Zero-Shot Task-Oriented Grasping with Large Language Models through Geometric Decomposition [8.654140442734354]
Task-oriented grasping of unfamiliar objects is a necessary skill for robots in dynamic in-home environments.
We present a novel zero-shot task-oriented grasping method leveraging a geometric decomposition of the target object into simple convex shapes.
Our approach employs minimal essential information - the object's name and the intended task - to facilitate zero-shot task-oriented grasping.
arXiv Detail & Related papers (2024-03-26T19:26:53Z)
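A hypothetical sketch of how such a pipeline could be wired together; `decompose` and `ask_llm` are stand-ins for the paper's convex decomposition and LLM prompting, which are not reproduced here.

```python
# Illustrative ShapeGrasp-style pipeline (assumed structure, not the paper's code).
from dataclasses import dataclass

@dataclass
class Part:
    name: str          # coarse geometric label, e.g. "cylinder"
    centroid: tuple    # 3D position of the part

def decompose(object_mesh) -> list[Part]:
    """Stand-in for convex decomposition of the target object."""
    return [Part("cylinder", (0.0, 0.0, 0.1)), Part("box", (0.0, 0.05, 0.0))]

def ask_llm(prompt: str) -> int:
    """Stand-in for an LLM call that reasons about parts; returns a part index."""
    return 0

def choose_grasp_part(object_name: str, task: str, mesh) -> Part:
    # Only the object's name and the intended task are supplied, as in the paper.
    parts = decompose(mesh)
    desc = ", ".join(f"{i}: {p.name}" for i, p in enumerate(parts))
    prompt = (f"Object: {object_name}. Task: {task}. "
              f"Parts: {desc}. Which part index should the robot grasp?")
    return parts[ask_llm(prompt)]

part = choose_grasp_part("mug", "pour water", mesh=None)
```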
- One-Shot Open Affordance Learning with Foundation Models [54.15857111929812]
We introduce One-shot Open Affordance Learning (OOAL), where a model is trained with just one example per base object category.
We propose a vision-language framework with simple and effective designs that boost the alignment between visual features and affordance text embeddings.
Experiments on two affordance segmentation benchmarks show that the proposed method outperforms state-of-the-art models with less than 1% of the full training data.
arXiv Detail & Related papers (2023-11-29T16:23:06Z)
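A loose illustrative sketch of this kind of vision-language alignment, under assumed shapes rather than the paper's model: dense visual features are scored against affordance text embeddings by cosine similarity to obtain per-pixel affordance maps.

```python
# Cosine-similarity alignment between dense visual features and affordance
# text embeddings (illustrative tensors, not the paper's architecture).
import torch
import torch.nn.functional as F

D, H, W, A = 64, 32, 32, 5                           # feat dim, spatial size, #affordances
visual = F.normalize(torch.randn(D, H, W), dim=0)    # dense image features
text = F.normalize(torch.randn(A, D), dim=1)         # affordance text embeddings

scores = torch.einsum("ad,dhw->ahw", text, visual)   # per-pixel similarity maps
seg = scores.argmax(dim=0)                           # (H, W) affordance labels
```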
- Multi-Object Graph Affordance Network: Goal-Oriented Planning through Learned Compound Object Affordances [1.9336815376402723]
The Multi-Object Graph Affordance Network models complex compound object affordances by learning the outcomes of robot actions that facilitate interactions between an object and a compound.
We show that our system successfully modeled the affordances of compound objects that include concave and convex objects, in both simulated and real-world environments.
arXiv Detail & Related papers (2023-09-19T08:40:46Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
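As a rough illustration of the stated mechanism, under assumed toy shapes rather than the paper's code: an encoder is trained with a time-contrastive triplet objective so that temporally nearby frames embed close together, and the reward is the negative distance to a goal image in that embedding space.

```python
# Time-contrastive embedding and distance-to-goal reward (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))  # toy encoder

def time_contrastive_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: positive is a temporally nearby frame, negative a distant one."""
    za, zp, zn = encoder(anchor), encoder(positive), encoder(negative)
    return F.relu((za - zp).norm(dim=-1) - (za - zn).norm(dim=-1) + margin).mean()

def reward(obs, goal):
    """Reward = negative distance to the goal image in embedding space."""
    with torch.no_grad():
        return -(encoder(obs) - encoder(goal)).norm(dim=-1)

frames = torch.randn(3, 1, 3, 64, 64)   # anchor / positive / negative frames
loss = time_contrastive_loss(frames[0], frames[1], frames[2])
r = reward(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
```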
- PartAfford: Part-level Affordance Discovery from 3D Objects [113.91774531972855]
We present a new task of part-level affordance discovery (PartAfford).
Given only the affordance labels per object, the machine is tasked to (i) decompose 3D shapes into parts and (ii) discover how each part corresponds to a certain affordance category.
We propose a novel learning framework for PartAfford, which discovers part-level representations by leveraging only the affordance set supervision and geometric primitive regularization.
arXiv Detail & Related papers (2022-02-28T02:58:36Z)
- One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the action possibilities that objects in an image offer.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z)
- Self-Supervision by Prediction for Object Discovery in Videos [62.87145010885044]
In this paper, we use the prediction task as self-supervision and build a novel object-centric model for image sequence representation.
Our framework can be trained without the help of any manual annotation or pretrained network.
Initial experiments confirm that the proposed pipeline is a promising step towards object-centric video prediction.
arXiv Detail & Related papers (2021-03-09T19:14:33Z)
- Model-Based Visual Planning with Self-Supervised Functional Distances [104.83979811803466]
We present a self-supervised method for model-based visual goal reaching.
Our approach learns entirely using offline, unlabeled data.
We find that this approach substantially outperforms both model-free and model-based prior methods.
arXiv Detail & Related papers (2020-12-30T23:59:09Z)
- Relational Learning for Skill Preconditions [15.427056235112152]
We focus on learning precondition models for manipulation skills in unconstrained environments.
Our work is motivated by the intuition that many complex manipulation tasks, with multiple objects, can be simplified by focusing on less complex pairwise object relations.
We show that our approach leads to significant improvements in predicting preconditions for all 3 tasks, across objects of different shapes and sizes.
arXiv Detail & Related papers (2020-12-03T04:13:49Z)
- Tell me what this is: Few-Shot Incremental Object Learning by a Robot [22.387008072671005]
This paper presents a system for incrementally training a robot to recognize different object categories.
The paper uses a recently developed state-of-the-art method for few-shot incremental learning of objects.
arXiv Detail & Related papers (2020-07-15T04:42:14Z)
- Learning Object Placements For Relational Instructions by Hallucinating Scene Representations [26.897316325189205]
We present a convolutional neural network for estimating pixelwise object placement probabilities for a set of spatial relations from a single input image.
Our method does not require ground truth data for the pixelwise relational probabilities or 3D models of the objects.
Results obtained using real-world data and human-robot experiments demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2020-01-23T12:58:50Z)
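A minimal sketch of the stated idea with an assumed toy architecture, not the paper's network: a fully convolutional net maps an image to one per-pixel placement probability map per spatial relation, and a placement pixel is read off with an argmax.

```python
# Per-pixel placement probabilities, one channel per spatial relation
# (e.g. "left of", "on top of"); architecture and sizes are assumptions.
import torch
import torch.nn as nn

class PlacementNet(nn.Module):
    def __init__(self, num_relations=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_relations, 1))   # one logit map per relation

    def forward(self, image):                  # image: (B, 3, H, W)
        return torch.sigmoid(self.net(image))  # (B, R, H, W) placement probs

probs = PlacementNet()(torch.randn(1, 3, 128, 128))
best = probs[0, 0].flatten().argmax()          # best pixel for relation 0
y, x = divmod(best.item(), 128)                # row / column of that pixel
```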