Related papers: Visionary: Vision architecture discovery for robot learning

Visionary: Vision architecture discovery for robot learning

URL: http://arxiv.org/abs/2103.14633v1
Date: Fri, 26 Mar 2021 17:51:43 GMT
Title: Visionary: Vision architecture discovery for robot learning
Authors: Iretiayo Akinola, Anelia Angelova, Yao Lu, Yevgen Chebotar, Dmitry Kalashnikov, Jacob Varley, Julian Ibarz, Michael S. Ryoo
Abstract summary: We propose a vision-based architecture search algorithm for robot manipulation learning, which discovers interactions between low dimension action inputs and high dimensional visual inputs. Our approach automatically designs architectures while training on the task - discovering novel ways of combining and attending image feature representations with actions as well as features from previous layers.
Score: 58.67846907923373
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose a vision-based architecture search algorithm for robot manipulation learning, which discovers interactions between low dimension action inputs and high dimensional visual inputs. Our approach automatically designs architectures while training on the task - discovering novel ways of combining and attending image feature representations with actions as well as features from previous layers. The obtained new architectures demonstrate better task success rates, in some cases with a large margin, compared to a recent high performing baseline. Our real robot experiments also confirm that it improves grasping performance by 6%. This is the first approach to demonstrate a successful neural architecture search and attention connectivity search for a real-robot task.

Related papers

Rodrigues Network for Learning Robot Actions [76.69283501115855]
We propose the Neural Rodrigues Operator to inject kinematics-aware inductive bias into neural computation.<n>We design the Rodrigues Network (RodriNet), a novel neural architecture specialized for processing actions.<n>Our results suggest that integrating structured kinematic priors into the network architecture improves action learning in various domains.
arXiv Detail & Related papers (2025-06-03T08:34:06Z)
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation [90.81956345363355]
RoBridge is a hierarchical intelligent architecture for general robotic manipulation.<n>It consists of a high-level cognitive planner (HCP) based on a large-scale pre-trained vision-language model (VLM)<n>It unleashes the procedural skill of reinforcement learning, effectively bridging the gap between cognition and execution.
arXiv Detail & Related papers (2025-05-03T06:17:18Z)
Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning algorithm for a robot to acquire manipulation skills. We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval. GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
Enhancing Robot Learning through Learned Human-Attention Feature Maps [6.724036710994883]
We think that embedding auxiliary information about focus point into robot learning would enhance efficiency and robustness of the learning process. In this paper, we propose a novel approach to model and emulate the human attention with an approximate prediction model. We test our approach on two learning tasks - object detection and imitation learning.
arXiv Detail & Related papers (2023-08-29T14:23:44Z)
Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods [14.780597545674157]
We investigate the effects of visual pre-training strategies on robot manipulation tasks from three fundamental perspectives. We propose a visual pre-training scheme for robot manipulation termed Vi-PRoM, which combines self-supervised learning and supervised learning.
arXiv Detail & Related papers (2023-08-07T14:24:52Z)
Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge [80.88063189896718]
High architectural and computational complexity can result in poor suitability for deployment on embedded devices. Fast GraspNeXt is a fast self-attention neural network architecture tailored for embedded multi-task learning in computer vision tasks for robotic grasping.
arXiv Detail & Related papers (2023-04-21T18:07:14Z)
Physics-Guided Hierarchical Reward Mechanism for Learning-Based Robotic Grasping [10.424363966870775]
We develop a Physics-Guided Deep Reinforcement Learning with a Hierarchical Reward Mechanism to improve learning efficiency and generalizability for learning-based autonomous grasping. Our method is validated in robotic grasping tasks with a 3-finger MICO robot arm.
arXiv Detail & Related papers (2022-05-26T18:01:56Z)
Graph Neural Networks for Relational Inductive Bias in Vision-based Deep Reinforcement Learning of Robot Control [0.0]
This work introduces a neural network architecture that combines relational inductive bias and visual feedback to learn an efficient position control policy. We derive a graph representation that models the robot's internal state with a low-dimensional description of the visual scene generated by an image encoding network. We show the ability of the model to improve sample efficiency for a 6-DoF robot arm in a visually realistic 3D environment.
arXiv Detail & Related papers (2022-03-11T15:11:54Z)
Cognitive architecture aided by working-memory for self-supervised multi-modal humans recognition [54.749127627191655]
The ability to recognize human partners is an important social skill to build personalized and long-term human-robot interactions. Deep learning networks have achieved state-of-the-art results and demonstrated to be suitable tools to address such a task. One solution is to make robots learn from their first-hand sensory data with self-supervision.
arXiv Detail & Related papers (2021-03-16T13:50:24Z)
NAS-DIP: Learning Deep Image Prior with Neural Architecture Search [65.79109790446257]
Recent work has shown that the structure of deep convolutional neural networks can be used as a structured image prior. We propose to search for neural architectures that capture stronger image priors. We search for an improved network by leveraging an existing neural architecture search algorithm.
arXiv Detail & Related papers (2020-08-26T17:59:36Z)
SAPIEN: A SimulAted Part-based Interactive ENvironment [77.4739790629284]
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set for articulated objects. We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition as well as demonstrate robotic interaction tasks.
arXiv Detail & Related papers (2020-03-19T00:11:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.