Related papers: Cooking Object's State Identification Without Using Pretrained Model

URL: http://arxiv.org/abs/2103.02305v1
Date: Wed, 3 Mar 2021 10:33:27 GMT
Title: Cooking Object's State Identification Without Using Pretrained Model
Authors: Md Sadman Sakib
Abstract summary: In this paper, we propose a CNN and train it from scratch. The model is trained and tested on the dataset from the cooking state recognition challenge. Our model achieves 65.8% accuracy on the unseen test dataset.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently, robotic cooking has become a very promising field. To execute a recipe, a robot has to recognize different objects and their states. In contrast to object recognition, state identification has not been explored much. It is nevertheless very important, because different recipes might require different states of an object. Moreover, robotic grasping depends on the state. Pretrained models usually perform very well on this type of task. Our challenge was to handle this problem without using any pretrained model. In this paper, we propose a CNN and train it from scratch. The model is trained and tested on the dataset from the cooking state recognition challenge. We also evaluate the performance of our network from various perspectives. Our model achieves 65.8% accuracy on the unseen test dataset.

Related papers

Realtime Person Identification via Gait Analysis [1.3260363717086592]
We propose a small 4-layer CNN model that is well suited to edge AI deployment and realtime gait recognition. Our model achieves 96.7% accuracy and consumes only 5KB RAM, with an inference time of 70 ms and 125mW power.
arXiv Detail & Related papers (2024-04-02T18:15:06Z)
Continuous Object State Recognition for Cooking Robots Using Pre-Trained Vision-Language Models and Black-box Optimization [18.41474014665171]
We propose a method to recognize the continuous state changes of food for cooking robots through spoken language. We show that by adjusting the weighting of each text prompt, more accurate and robust continuous state recognition can be achieved.
arXiv Detail & Related papers (2024-03-13T04:45:40Z)
Rethinking Cooking State Recognition with Vision Transformers [0.0]
The self-attention mechanism of the Vision Transformer (ViT) architecture is proposed for the Cooking State Recognition task. The proposed approach encapsulates the globally salient features from images, while also exploiting the weights learned from a larger dataset. Our framework has an accuracy of 94.3%, which significantly outperforms the state-of-the-art.
arXiv Detail & Related papers (2022-12-16T17:06:28Z)
Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing [82.67716657524251]
We present a counterfactual framework that allows us to study the robustness of neural networks with respect to naturalistic variations. Our method allows for a fair comparison of the robustness of recently released, state-of-the-art Convolutional Neural Networks and Vision Transformers.
arXiv Detail & Related papers (2022-11-29T18:59:23Z)
Could Giant Pretrained Image Models Extract Universal Representations? [94.97056702288317]
We present a study of frozen pretrained models when applied to diverse and representative computer vision tasks. Our work answers the questions of what pretraining task fits best with this frozen setting, how to make the frozen setting more flexible to various downstream tasks, and the effect of larger model sizes.
arXiv Detail & Related papers (2022-11-03T17:57:10Z)
Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin-picking [98.5984733963713]
We propose an iterative self-training framework for sim-to-real 6D object pose estimation to facilitate cost-effective robotic grasping. We establish a photo-realistic simulator to synthesize abundant virtual data, and use this to train an initial pose estimation network. This network then takes the role of a teacher model, which generates pose predictions for unlabeled real data.
arXiv Detail & Related papers (2022-04-14T15:54:01Z)
Where is my hand? Deep hand segmentation for visual self-recognition in humanoid robots [129.46920552019247]
We propose the use of a Convolution Neural Network (CNN) to segment the robot hand from an image in an egocentric view. We fine-tuned the Mask-RCNN network for the specific task of segmenting the hand of the humanoid robot Vizzy.
arXiv Detail & Related papers (2021-02-09T10:34:32Z)
Application of Facial Recognition using Convolutional Neural Networks for Entry Access Control [0.0]
The paper focuses on solving the supervised classification problem of taking images of people as input and classifying the person in the image as one of the authors or not. Two approaches are proposed: (1) building and training a neural network called WoodNet from scratch and (2) leveraging transfer learning by utilizing a network pre-trained on the ImageNet database. The results are two models classifying the individuals in the dataset with high accuracy, achieving over 99% accuracy on held-out test data.
arXiv Detail & Related papers (2020-11-23T07:55:24Z)
Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident. In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
arXiv Detail & Related papers (2020-11-16T06:51:55Z)
A robot that counts like a child: a developmental model of counting and pointing [69.26619423111092]
A novel neuro-robotics model capable of counting real items is introduced. The model allows us to investigate the interaction between embodiment and numerical cognition. The trained model is able to count a set of items and at the same time point to them.
arXiv Detail & Related papers (2020-08-05T21:06:27Z)
Action Recognition and State Change Prediction in a Recipe Understanding Task Using a Lightweight Neural Network Model [8.49031088470346]
In this paper, we propose a simplified neural network model that separates action recognition and state change prediction. This allows the learning of the two tasks to influence each other indirectly.
arXiv Detail & Related papers (2020-01-23T17:04:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.