RUMI: Rummaging Using Mutual Information
- URL: http://arxiv.org/abs/2408.10450v1
- Date: Mon, 19 Aug 2024 23:16:18 GMT
- Title: RUMI: Rummaging Using Mutual Information
- Authors: Sheng Zhong, Nima Fazeli, Dmitry Berenson
- Abstract summary: Rummaging Using Mutual Information (RUMI) is a method for online generation of robot action sequences.
We develop an information gain cost function and a reachability cost function to keep the object within the robot's reach.
RUMI demonstrates superior performance in both simulated and real tasks compared to baseline methods.
- Score: 9.88370289799239
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract:   This paper presents Rummaging Using Mutual Information (RUMI), a method for online generation of robot action sequences to gather information about the pose of a known movable object in visually-occluded environments. Focusing on contact-rich rummaging, our approach leverages mutual information between the object pose distribution and robot trajectory for action planning. From an observed partial point cloud, RUMI deduces the compatible object pose distribution and approximates the mutual information of it with workspace occupancy in real time. Based on this, we develop an information gain cost function and a reachability cost function to keep the object within the robot's reach. These are integrated into a model predictive control (MPC) framework with a stochastic dynamics model, updating the pose distribution in a closed loop. Key contributions include a new belief framework for object pose estimation, an efficient information gain computation strategy, and a robust MPC-based control scheme. RUMI demonstrates superior performance in both simulated and real tasks compared to baseline methods. 
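
The abstract does not spell out how the mutual information between the pose distribution and workspace occupancy is approximated. As a minimal sketch only, assume the pose belief is a weighted set of discrete pose hypotheses and that each hypothesis determines exactly whether a workspace cell is occupied. Under that assumption the conditional entropy H(O | pose) is zero, so the per-cell mutual information reduces to the binary entropy of the occupancy probability. All names below (`binary_entropy`, `occupancy_mutual_information`, `trajectory_information_gain`, the toy cells) are hypothetical and are not taken from the paper.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def occupancy_mutual_information(weights, occupied):
    """I(pose; occupancy) in bits for a single workspace cell.

    weights[i]  : probability of pose hypothesis i (assumed to sum to 1)
    occupied[i] : True if the object at pose hypothesis i covers the cell

    Assumption: occupancy is deterministic given the pose, so
    H(O | pose) = 0 and the mutual information equals H(O).
    """
    p_occ = sum(w for w, o in zip(weights, occupied) if o)
    return binary_entropy(p_occ)


def trajectory_information_gain(weights, occupancy_per_cell, visited_cells):
    """Heuristic score: sum of per-cell information over visited cells.

    Assumption: cells are treated as independent, which generally
    overcounts the information a trajectory actually yields.
    """
    return sum(
        occupancy_mutual_information(weights, occupancy_per_cell[c])
        for c in visited_cells
    )


# Toy belief: four equally likely pose hypotheses, three workspace cells.
weights = [0.25, 0.25, 0.25, 0.25]
occupancy_per_cell = {
    "a": [True, True, False, False],   # half the hypotheses cover this cell
    "b": [True, True, True, True],     # every hypothesis covers this cell
    "c": [True, False, False, False],  # one hypothesis covers this cell
}

for cell in ("a", "b", "c"):
    mi = occupancy_mutual_information(weights, occupancy_per_cell[cell])
    print(cell, round(mi, 4))
```

In this toy belief, probing cell "a" is the most informative because the hypotheses split evenly on it, while cell "b" tells the robot nothing because every hypothesis agrees it is occupied. A planner in the spirit of the abstract could use the negative of such a score as an information gain cost, alongside a separate reachability cost.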
 
      
Related papers
- Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control [72.00655365269]
 We present RoboMaster, a novel framework that models inter-object dynamics through a collaborative trajectory formulation.
Unlike prior methods that decompose objects, our core is to decompose the interaction process into three sub-stages: pre-interaction, interaction, and post-interaction.
Our method outperforms existing approaches, establishing new state-of-the-art performance in trajectory-controlled video generation for robotic manipulation.
 arXiv  Detail & Related papers  (2025-06-02T17:57:06Z)
- Can foundation models actively gather information in interactive environments to test hypotheses? [56.651636971591536]
 We introduce a framework in which a model must determine the factors influencing a hidden reward function.
We investigate whether approaches such as self-correction and increased inference time improve information gathering efficiency.
 arXiv  Detail & Related papers  (2024-12-09T12:27:21Z)
- R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models [50.19174067263255]
 We introduce prior preference learning techniques and self-revision schedules to help the agent excel in sparse-reward, continuous action, goal-based robotic control POMDP environments.
We show that our agents offer improved performance over state-of-the-art models in terms of cumulative rewards, relative stability, and success rate.
 arXiv  Detail & Related papers  (2024-09-21T18:32:44Z)
- Representing Positional Information in Generative World Models for Object Manipulation [12.263162194821787]
 We introduce a general approach that empowers world model-based agents to solve object-positioning tasks.
In particular, LCP employs object-centric latent representations that explicitly capture object positional information for goal specification.
Our methods are rigorously evaluated across several manipulation environments, showing favorable performance compared to current model-based control approaches.
 arXiv  Detail & Related papers  (2024-09-18T14:19:50Z)
- Information-driven Affordance Discovery for Efficient Robotic Manipulation [14.863105174430087]
 We argue that well-directed interactions with the environment can mitigate this problem.
We provide a theoretical justification of our approach and we empirically validate the approach both in simulation and real-world tasks.
Our method, which we dub IDA, enables the efficient discovery of visual affordances for several action primitives.
 arXiv  Detail & Related papers  (2024-05-06T21:25:51Z)
- H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions [62.510951695174604]
 "Hypothesize, Simulate, Act, Update, and Repeat" (H-SAUR) is a probabilistic generative framework that generates hypotheses about how objects articulate given input observations.
We show that the proposed model significantly outperforms the current state-of-the-art articulated object manipulation framework.
We further improve the test-time efficiency of H-SAUR by integrating a learned prior from learning-based vision models.
 arXiv  Detail & Related papers  (2022-10-22T18:39:33Z)
- Nonprehensile Riemannian Motion Predictive Control [57.295751294224765]
 We introduce a novel Real-to-Sim reward analysis technique to reliably imagine and predict the outcome of taking possible actions for a real robotic platform.
We produce a closed-loop controller to reactively push objects in a continuous action space.
We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.
 arXiv  Detail & Related papers  (2021-11-15T18:50:04Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
 TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
 arXiv  Detail & Related papers  (2021-04-08T20:01:00Z)
- Object-Driven Active Mapping for More Accurate Object Pose Estimation and Robotic Grasping [5.385583891213281]
 The framework is built on an object SLAM system integrated with a simultaneous multi-object pose estimation process.
By combining the mapping module and the exploration strategy, an accurate object map that is compatible with robotic grasping can be generated.
 arXiv  Detail & Related papers  (2020-12-03T09:36:55Z)
- POMP: Pomcp-based Online Motion Planning for active visual search in indoor environments [89.43830036483901]
 We focus on the problem of learning an optimal policy for Active Visual Search (AVS) of objects in known indoor environments with an online setup.
Our POMP method takes as input the current pose of an agent and an RGB-D frame.
We validate our method on the publicly available AVD benchmark, achieving an average success rate of 0.76 with an average path length of 17.1.
 arXiv  Detail & Related papers  (2020-09-17T08:23:50Z)
- Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
 We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our methods is validated on complex quadruped robot dynamics and can be generally applied to most robotic platforms.
 arXiv  Detail & Related papers  (2020-07-28T07:34:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     