Cluttered Food Grasping with Adaptive Fingers and Synthetic-Data Trained Object Detection
- URL: http://arxiv.org/abs/2203.05187v1
- Date: Thu, 10 Mar 2022 06:44:09 GMT
- Title: Cluttered Food Grasping with Adaptive Fingers and Synthetic-Data Trained Object Detection
- Authors: Avinash Ummadisingu, Kuniyuki Takahashi, Naoki Fukaya
- Abstract summary: The food packaging industry handles an immense variety of food products with wide-ranging shapes and sizes.
A popular approach to bin-picking is to first identify each piece of food in the tray using an instance segmentation method.
We propose a method that trains purely on synthetic data and successfully transfers to the real world using sim2real methods.
- Score: 8.218146534971156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The food packaging industry handles an immense variety of food products with wide-ranging shapes and sizes, even within one kind of food. Menus are also diverse and change frequently, making automation of pick-and-place difficult. A popular approach to bin-picking is to first identify each piece of food in the tray using an instance segmentation method. However, human annotations for training these methods are unreliable and error-prone: foods are packed close together with unclear boundaries, and their visual similarity makes separating pieces difficult. To address this problem, we propose a method that trains instance segmentation models purely on synthetic data and transfers successfully to the real world via sim2real techniques, by creating datasets of filled food trays from high-quality 3D models of real pieces of food. Another concern is that foods are easily damaged during grasping. We address this by introducing two additional methods: a novel adaptive finger mechanism that passively retracts when a collision occurs, and a method to filter out grasps that are likely to damage neighbouring pieces of food. We demonstrate the effectiveness of the proposed method on several kinds of real foods.
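The grasp filter lends itself to a short illustration. Below is a minimal sketch, assuming the segmentation model outputs per-instance boolean masks, grasp candidates are parameterized by a pixel centre, approach angle, and gripper width, and each fingertip contact region is approximated by a disc; the function names, gripper geometry, and zero-overlap threshold are illustrative, not the paper's exact criterion.

```python
import numpy as np

def fingertip_discs(center, angle, width, radius=6, shape=(480, 640)):
    """Approximate the two fingertip contact regions as discs (a deliberate
    simplification of the real gripper footprint)."""
    footprint = np.zeros(shape, dtype=bool)
    ys, xs = np.ogrid[:shape[0], :shape[1]]
    d = np.array([np.cos(angle), np.sin(angle)])
    for sign in (-1.0, 1.0):  # one disc per finger
        cx = center[0] + sign * (width / 2.0) * d[0]
        cy = center[1] + sign * (width / 2.0) * d[1]
        footprint |= (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    return footprint

def filter_grasps(grasps, instance_masks, max_overlap_px=0):
    """Keep only grasps whose fingertip regions avoid neighbouring instances.

    grasps: list of dicts with keys 'target' (instance index), 'center'
            (x, y in pixels), 'angle' (rad), and 'width' (px).
    instance_masks: (N, H, W) boolean array from the segmentation model.
    """
    safe = []
    for g in grasps:
        footprint = fingertip_discs(g["center"], g["angle"], g["width"],
                                    shape=instance_masks.shape[1:])
        neighbours = np.delete(instance_masks, g["target"], axis=0)
        overlap = (footprint & neighbours.any(axis=0)).sum()
        if overlap <= max_overlap_px:
            safe.append(g)
    return safe
```

The filter complements rather than replaces the adaptive fingers: any risky grasp it misses is still handled mechanically by the passive retraction described in the abstract.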
Related papers
- RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models [96.43285670458803]
Uni-Food is a unified food dataset that comprises over 100,000 images with various food labels.
Uni-Food is designed to provide a more holistic approach to food data analysis.
We introduce a novel Linear Rectified Mixture of Diverse Experts (RoDE) approach to address the inherent challenges of food-related multitasking.
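The abstract does not spell out the RoDE architecture; the sketch below shows one plausible reading, assuming "linear rectification" means a ReLU applied to linear gating scores so that low-scoring experts receive exactly zero weight. The class name, dimensions, and expert design are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RectifiedMoE(nn.Module):
    """One plausible reading of a linear rectified mixture of diverse experts:
    linear gating scores are rectified with ReLU, so experts with negative
    scores are switched off entirely (sparse, task-adaptive routing)."""

    def __init__(self, dim=512, num_experts=4, hidden=1024):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(dim, num_experts)  # linear gating scores

    def forward(self, x):                        # x: (batch, dim)
        w = torch.relu(self.gate(x))              # rectify: zero negative scores
        w = w / w.sum(dim=-1, keepdim=True).clamp_min(1e-8)  # normalize survivors
        out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, dim)
        return (w.unsqueeze(-1) * out).sum(dim=1)

x = torch.randn(2, 512)
print(RectifiedMoE()(x).shape)  # torch.Size([2, 512])
```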
arXiv Detail & Related papers (2024-07-17T16:49:34Z)
- Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configurations and Food Types [17.835835270751176]
We introduce a novel visual imitation network with a spatial attention module for robotic-assisted feeding (RAF).
We propose a framework that integrates visual perception with imitation learning to enable the robot to handle diverse scenarios during scooping.
Our approach, named AVIL (adaptive visual imitation learning), exhibits adaptability and robustness across different bowl configurations.
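AVIL's exact attention design is not given in the summary; the following is a minimal sketch of a generic spatial attention module in the CBAM style, which reweights feature-map locations so a visuomotor policy can focus on, e.g., the bowl rim and food regions. The kernel size and pooling scheme are assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Generic spatial attention (CBAM-style): pool feature maps across
    channels, predict a per-pixel weight map, and reweight the features."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, feats):                       # feats: (B, C, H, W)
        avg = feats.mean(dim=1, keepdim=True)       # channel-average map
        mx, _ = feats.max(dim=1, keepdim=True)      # channel-max map
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return feats * attn                         # reweighted features

feats = torch.randn(2, 64, 32, 32)
print(SpatialAttention()(feats).shape)  # torch.Size([2, 64, 32, 32])
```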
arXiv Detail & Related papers (2024-03-19T16:40:57Z)
- NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches [59.38343165508926]
Accurate dietary intake estimation is critical for informing policies and programs to support healthy eating.
Recent work has focused on using computer vision and machine learning to automatically estimate dietary intake from food images.
We introduce NutritionVerse-Synth, the first large-scale dataset of 84,984 synthetic 2D food images with associated dietary information.
We also collect a real image dataset, NutritionVerse-Real, containing 889 images of 251 dishes to evaluate realism.
arXiv Detail & Related papers (2023-09-14T13:29:41Z)
- Food Image Classification and Segmentation with Attention-based Multiple Instance Learning [51.279800092581844]
The paper presents a weakly supervised methodology for training food image classification and semantic segmentation models.
The proposed methodology is based on a multiple instance learning approach in combination with an attention-based mechanism.
We conduct experiments on two meta-classes within the FoodSeg103 data set to verify the feasibility of the proposed approach.
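The summary matches the well-known attention-based MIL pooling formulation (Ilse et al.); below is a minimal sketch under that assumption, treating image patches as instances. The dimensions and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class AttentionMILPool(nn.Module):
    """Attention-based MIL pooling: score each instance (e.g. image patch),
    softmax the scores into weights, and return a weighted average as the
    bag (whole-image) embedding. The weights double as a coarse localisation
    signal for weakly supervised segmentation."""

    def __init__(self, dim=256, hidden=128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, 1))

    def forward(self, instances):                         # (B, patches, dim)
        a = torch.softmax(self.score(instances), dim=1)   # (B, P, 1)
        bag = (a * instances).sum(dim=1)                  # (B, dim)
        return bag, a.squeeze(-1)                         # embedding + weights

patches = torch.randn(2, 49, 256)   # e.g. a 7x7 grid of patch features
bag, weights = AttentionMILPool()(patches)
print(bag.shape, weights.shape)     # torch.Size([2, 256]) torch.Size([2, 49])
```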
arXiv Detail & Related papers (2023-08-22T13:59:47Z)
- Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared, one based on convolutional neural networks and the other on Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
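Mean intersection over union, the metric behind the 49.4 figure, is standard; a short reference implementation:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Standard mean IoU for semantic segmentation: per-class
    IoU = |pred ∩ target| / |pred ∪ target|, averaged over the
    classes that actually appear in either map."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = (p | t).sum()
        if union == 0:
            continue                      # class absent from both maps
        ious.append((p & t).sum() / union)
    return float(np.mean(ious))

pred = np.random.randint(0, 5, size=(64, 64))
target = np.random.randint(0, 5, size=(64, 64))
print(mean_iou(pred, target, num_classes=5))
```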
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
- Cook-Gen: Robust Generative Modeling of Cooking Actions from Recipes [6.666528076345153]
Food computation models have become increasingly popular in assisting people in maintaining healthy eating habits.
In this study, we explore the use of generative AI methods to extend current food computation models to include cooking actions.
We propose novel aggregation-based generative AI methods, Cook-Gen, that reliably generate cooking actions from recipes.
arXiv Detail & Related papers (2023-06-01T18:49:47Z)
- A Large-Scale Benchmark for Food Image Segmentation [62.28029856051079]
We build a new food image dataset FoodSeg103 (and its extension FoodSeg154) containing 9,490 images.
We annotate these images with 154 ingredient classes and each image has an average of 6 ingredient labels and pixel-wise masks.
We propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge.
arXiv Detail & Related papers (2021-05-12T03:00:07Z)
- Leveraging Post Hoc Context for Faster Learning in Bandit Settings with Applications in Robot-Assisted Feeding [23.368884607763093]
Autonomous robot-assisted feeding requires the ability to acquire a wide variety of food items.
Previous work showed that the problem can be represented as a linear bandit with visual context.
We propose a modified linear contextual bandit framework augmented with post hoc context.
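A minimal sketch of a linear contextual bandit with post hoc context, assuming a LinUCB-style learner and one plausible variance-reduction scheme: the post hoc context (e.g. haptic data available only after a bite is attempted) is regressed onto the reward, and its prediction serves as a denoised learning target for the visual model. The paper's exact estimator may differ; all names and dimensions are illustrative.

```python
import numpy as np

class PostHocLinUCB:
    """LinUCB-style linear contextual bandit augmented with post hoc context.
    Action selection uses only the visual context x; the post hoc context z
    is folded in at update time. A sketch, not the paper's estimator."""

    def __init__(self, n_actions, dx, dz, alpha=1.0, lam=1.0):
        self.alpha = alpha
        self.A = [lam * np.eye(dx) for _ in range(n_actions)]  # per-action Gram
        self.b = [np.zeros(dx) for _ in range(n_actions)]
        self.Az = lam * np.eye(dz)                             # post hoc Gram
        self.bz = np.zeros(dz)

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            Ainv = np.linalg.inv(A)
            theta = Ainv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ Ainv @ x))
        return int(np.argmax(scores))              # UCB action

    def update(self, action, x, z, reward):
        # Fit the post hoc model on the raw reward ...
        self.Az += np.outer(z, z)
        self.bz += reward * z
        target = (np.linalg.inv(self.Az) @ self.bz) @ z   # denoised reward
        # ... then train the visual model on the denoised target.
        self.A[action] += np.outer(x, x)
        self.b[action] += target * x

bandit = PostHocLinUCB(n_actions=3, dx=8, dz=4)
x = np.random.randn(8)
a = bandit.select(x)
bandit.update(a, x, z=np.random.randn(4), reward=1.0)
```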
arXiv Detail & Related papers (2020-11-05T01:28:25Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
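A minimal sketch of a SCAN-style objective, assuming a bidirectional triplet loss over hardest in-batch negatives for the joint embedding, plus a symmetric KL term aligning the semantic class probabilities predicted by the two branches. The encoders, margin, and loss weight are placeholders, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def scan_style_loss(img_emb, rec_emb, img_logits, rec_logits,
                    margin=0.3, consistency_weight=1.0):
    """Joint-embedding objective in the spirit of SCAN: a bidirectional
    triplet loss pulls matching image/recipe pairs together, and a
    semantic-consistency term aligns the class distributions the two
    branches predict for the same dish."""
    img = F.normalize(img_emb, dim=-1)
    rec = F.normalize(rec_emb, dim=-1)
    sim = img @ rec.t()                        # (B, B) cosine similarities
    pos = sim.diag()                           # matching pairs
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    neg_i2r = sim.masked_fill(mask, -1).max(dim=1).values  # hardest negatives
    neg_r2i = sim.masked_fill(mask, -1).max(dim=0).values
    triplet = (F.relu(margin + neg_i2r - pos) +
               F.relu(margin + neg_r2i - pos)).mean()
    # Align semantic probabilities via a symmetric KL divergence.
    p_img = F.log_softmax(img_logits, dim=-1)
    p_rec = F.log_softmax(rec_logits, dim=-1)
    consistency = 0.5 * (F.kl_div(p_img, p_rec.exp(), reduction="batchmean") +
                         F.kl_div(p_rec, p_img.exp(), reduction="batchmean"))
    return triplet + consistency_weight * consistency

B, d, C = 4, 128, 10
loss = scan_style_loss(torch.randn(B, d), torch.randn(B, d),
                       torch.randn(B, C), torch.randn(B, C))
print(loss.item())
```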
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.