Leveraging Post Hoc Context for Faster Learning in Bandit Settings with
Applications in Robot-Assisted Feeding
- URL: http://arxiv.org/abs/2011.02604v2
- Date: Thu, 25 Mar 2021 22:04:50 GMT
- Title: Leveraging Post Hoc Context for Faster Learning in Bandit Settings with
Applications in Robot-Assisted Feeding
- Authors: Ethan K. Gordon, Sumegh Roychowdhury, Tapomayukh Bhattacharjee, Kevin
Jamieson, Siddhartha S. Srinivasa
- Abstract summary: Autonomous robot-assisted feeding requires the ability to acquire a wide variety of food items.
Previous work showed that the problem can be represented as a linear bandit with visual context.
We propose a modified linear contextual bandit framework augmented with post hoc context.
- Score: 23.368884607763093
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous robot-assisted feeding requires the ability to acquire a wide
variety of food items. However, it is impossible for such a system to be
trained on all types of food in existence. Therefore, a key challenge is
choosing a manipulation strategy for a previously unseen food item. Previous
work showed that the problem can be represented as a linear bandit with visual
context. However, food has a wide variety of multi-modal properties relevant to
manipulation that can be hard to distinguish visually. Our key insight is that
we can leverage the haptic context we collect during and after manipulation
(i.e., "post hoc") to learn some of these properties and more quickly adapt our
visual model to previously unseen food. In general, we propose a modified
linear contextual bandit framework augmented with post hoc context observed
after action selection to empirically increase learning speed and reduce
cumulative regret. Experiments on synthetic data demonstrate that this effect
is more pronounced when the dimensionality of the context is large relative to
the post hoc context or when the post hoc context model is particularly easy to
learn. Finally, we apply this framework to the bite acquisition problem and
demonstrate the acquisition of 8 previously unseen types of food with 21% fewer
failures across 64 attempts.
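The core idea lends itself to a compact sketch. Below is a minimal, hedged illustration of one way post hoc context could be folded into a LinUCB-style linear contextual bandit: the learner regresses the low-dimensional post hoc (e.g., haptic) context from the high-dimensional visual context, and models reward per arm as a linear function of that post hoc context. The class name `PostHocLinUCB`, the ridge-regression updates, and the exploration bonus are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

class PostHocLinUCB:
    """Sketch of a LinUCB-style bandit augmented with post hoc context.

    Assumptions (not the paper's exact updates): a shared ridge regression maps
    the visual context x to the post hoc context phi, and each arm keeps a
    ridge regression from phi to reward. Because phi is low-dimensional, the
    per-arm reward models can be learned quickly.
    """

    def __init__(self, dim_ctx, dim_posthoc, n_arms, alpha=1.0, lam=1.0):
        self.alpha = alpha  # exploration bonus scale
        # Shared statistics for the visual-context -> post-hoc-context map.
        self.A_ctx = lam * np.eye(dim_ctx)
        self.B_ctx = np.zeros((dim_ctx, dim_posthoc))
        # Per-arm statistics for the post-hoc-context -> reward map.
        self.A_ph = [lam * np.eye(dim_posthoc) for _ in range(n_arms)]
        self.b_ph = [np.zeros(dim_posthoc) for _ in range(n_arms)]
        self.n_arms = n_arms

    def select(self, x):
        """Choose the arm with the highest optimistic reward estimate for visual context x."""
        # Predict the post hoc context from the visual context.
        phi_hat = np.linalg.solve(self.A_ctx, self.B_ctx).T @ x
        scores = []
        for a in range(self.n_arms):
            w = np.linalg.solve(self.A_ph[a], self.b_ph[a])
            bonus = self.alpha * np.sqrt(phi_hat @ np.linalg.solve(self.A_ph[a], phi_hat))
            scores.append(w @ phi_hat + bonus)
        return int(np.argmax(scores))

    def update(self, x, arm, phi, reward):
        """After acting, incorporate the observed post hoc context phi (e.g., haptics) and reward."""
        self.A_ctx += np.outer(x, x)          # visual -> post hoc map (shared across arms)
        self.B_ctx += np.outer(x, phi)
        self.A_ph[arm] += np.outer(phi, phi)  # post hoc -> reward map (chosen arm only)
        self.b_ph[arm] += reward * phi
```

In a bite-acquisition loop, x would be a visual featurization of the food item, the arms would be candidate manipulation strategies, and phi would be haptic features logged during or after the acquisition attempt.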
Related papers
- ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions [66.20773952864802]
We develop a dataset consisting of 8.5k images and 59.3k inferences about actions grounded in those images.
We propose ActionCOMET, a framework to discern knowledge present in language models specific to the provided visual input.
arXiv Detail & Related papers (2024-10-17T15:22:57Z)
- IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition [16.32678094159896]
We introduce IMRL (Integrated Multi-Dimensional Representation Learning), which integrates visual, physical, temporal, and geometric representations to enhance robustness and generalizability of IL for food acquisition.
Our approach captures food types and physical properties, models temporal dynamics of acquisition actions, and introduces geometric information to determine optimal scooping points.
IMRL enables IL to adaptively adjust scooping strategies based on context, improving the robot's capability to handle diverse food acquisition scenarios.
arXiv Detail & Related papers (2024-09-18T16:09:06Z)
- Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configurations and Food Types [17.835835270751176]
We introduce a novel visual imitation network with a spatial attention module for robotic-assisted feeding (RAF).
We propose a framework that integrates visual perception with imitation learning to enable the robot to handle diverse scenarios during scooping.
Our approach, named AVIL (adaptive visual imitation learning), exhibits adaptability and robustness across different bowl configurations.
arXiv Detail & Related papers (2024-03-19T16:40:57Z)
- FoodLMM: A Versatile Food Assistant using Large Multi-modal Model [96.76271649854542]
Large Multi-modal Models (LMMs) have made impressive progress in many vision-language tasks.
This paper proposes FoodLMM, a versatile food assistant based on LMMs with various capabilities.
We introduce a series of novel task-specific tokens and heads, enabling the model to predict food nutritional values and multiple segmentation masks.
arXiv Detail & Related papers (2023-12-22T11:56:22Z)
- Food Image Classification and Segmentation with Attention-based Multiple Instance Learning [51.279800092581844]
The paper presents a weakly supervised methodology for training food image classification and semantic segmentation models.
The proposed methodology is based on a multiple instance learning approach in combination with an attention-based mechanism.
We conduct experiments on two meta-classes within the FoodSeg103 data set to verify the feasibility of the proposed approach.
arXiv Detail & Related papers (2023-08-22T13:59:47Z)
- Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared: one based on convolutional neural networks and the other on Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
- Learning Visuo-Haptic Skewering Strategies for Robot-Assisted Feeding [13.381485293778654]
We leverage visual and haptic observations during interaction with an item to plan skewering motions.
We learn a generalizable, multimodal representation for a food item from raw sensory inputs.
We propose a zero-shot framework to sense visuo-haptic properties of a previously unseen item and reactively skewer it.
arXiv Detail & Related papers (2022-11-26T20:01:03Z)
- Cluttered Food Grasping with Adaptive Fingers and Synthetic-Data Trained Object Detection [8.218146534971156]
The food packaging industry handles an immense variety of food products with wide-ranging shapes and sizes.
A popular approach to bin-picking is to first identify each piece of food in the tray by using an instance segmentation method.
We propose a method that trains purely on synthetic data and successfully transfers to the real world using sim2real methods.
arXiv Detail & Related papers (2022-03-10T06:44:09Z)
- Video Understanding as Machine Translation [53.59298393079866]
We tackle a wide variety of downstream video understanding tasks by means of a single unified framework.
We report performance gains over the state of the art on several downstream tasks, including video classification (EPIC-Kitchens), question answering (TVQA), and captioning (TVC, YouCook2, and MSR-VTT).
arXiv Detail & Related papers (2020-06-12T14:07:04Z)
- MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model [28.649961369386148]
We present Modality-Consistent Embedding Network (MCEN) that learns modality-invariant representations by projecting images and texts to the same embedding space.
Our method learns the cross-modal alignments during training but computes embeddings of different modalities independently at inference time for the sake of efficiency.
arXiv Detail & Related papers (2020-04-02T16:00:10Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
arXiv Detail & Related papers (2020-03-09T07:41:17Z)