Object Detection for Understanding Assembly Instruction Using
Context-aware Data Augmentation and Cascade Mask R-CNN
- URL: http://arxiv.org/abs/2101.02509v2
- Date: Fri, 8 Jan 2021 02:38:51 GMT
- Title: Object Detection for Understanding Assembly Instruction Using
Context-aware Data Augmentation and Cascade Mask R-CNN
- Authors: Joosoon Lee, Seongju Lee, Seunghyeok Back, Sungho Shin, Kyoobin Lee
- Abstract summary: We developed a context-aware data augmentation scheme for speech bubble segmentation.
Also, we showed that deep learning can be useful to understand assembly instruction by detecting the essential objects in the instruction.
- Score: 4.3310896118860445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding assembly instruction has the potential to enhance the robot's
task planning ability and enables advanced robotic applications. To recognize
the key components from the 2D assembly instruction image, we mainly focus on
segmenting the speech bubble area, which contains much of the information about
the instructions. For this, we applied Cascade Mask R-CNN and developed a
context-aware data augmentation scheme for speech bubble segmentation, which
randomly combines image cuts by considering the context of assembly
instructions. We showed that the proposed augmentation scheme achieves better
segmentation performance than the existing augmentation algorithm by
increasing the diversity of trainable data while considering the distribution
of component locations. We also showed that deep learning can be useful for
understanding assembly instructions by detecting the essential objects in them,
such as tools and parts.
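The augmentation described above combines image cuts while respecting where components typically appear on real instruction pages, rather than pasting at uniform-random positions. A minimal sketch of that idea follows; the function name, sampling strategy, and toy data are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def context_aware_paste(base, cut, location_samples):
    """Paste `cut` onto a copy of `base` at a position drawn from the
    empirical distribution of observed component locations.

    base: HxWx3 uint8 image (e.g. an assembly-instruction page).
    cut: hxwx3 uint8 image cut (e.g. a speech bubble crop).
    location_samples: (N, 2) array of (y, x) top-left positions where
        such components appeared in real instruction pages.
    """
    H, W = base.shape[:2]
    h, w = cut.shape[:2]
    # Sample a plausible location from the empirical distribution, so
    # pasted cuts follow the layout statistics of real pages.
    y, x = location_samples[rng.integers(len(location_samples))]
    y = int(np.clip(y, 0, H - h))
    x = int(np.clip(x, 0, W - w))
    out = base.copy()
    out[y:y + h, x:x + w] = cut
    return out, (y, x)

# Toy example: a blank 100x100 page, a 10x10 white "bubble", and
# observed locations clustered near the top of the page.
page = np.zeros((100, 100, 3), dtype=np.uint8)
bubble = np.full((10, 10, 3), 255, dtype=np.uint8)
locs = np.array([[5, 20], [8, 60], [3, 40]])
aug, pos = context_aware_paste(page, bubble, locs)
```

In practice the empirical location distribution would be estimated from annotated instruction pages, and the paste would typically blend edges and update segmentation labels alongside the image.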
Related papers
- RefMask3D: Language-Guided Transformer for 3D Referring Segmentation [32.11635464720755]
RefMask3D aims to explore comprehensive multi-modal feature interaction and understanding.
RefMask3D outperforms the previous state-of-the-art method by a large margin of 3.16% mIoU on the challenging ScanRefer dataset.
arXiv Detail & Related papers (2024-07-25T17:58:03Z)
- SegPoint: Segment Any Point Cloud via Large Language Model [62.69797122055389]
We propose a model, called SegPoint, to produce point-wise segmentation masks across a diverse range of tasks.
SegPoint is the first model to address varied segmentation tasks within a single framework.
arXiv Detail & Related papers (2024-07-18T17:58:03Z)
- VISA: Reasoning Video Object Segmentation via Large Language Models [64.33167989521357]
We introduce a new task, Reasoning Video Object Segmentation (ReasonVOS).
This task aims to generate a sequence of segmentation masks in response to implicit text queries that require complex reasoning abilities.
We introduce VISA (Video-based large language Instructed Assistant) to tackle ReasonVOS.
arXiv Detail & Related papers (2024-07-16T02:29:29Z)
- Learning Semantic Segmentation with Query Points Supervision on Aerial Images [62.36946925639107]
We present a weakly supervised learning algorithm to train semantic segmentation algorithms.
Our proposed approach performs accurate semantic segmentation and improves efficiency by significantly reducing the cost and time required for manual annotation.
arXiv Detail & Related papers (2023-09-11T14:32:04Z)
- LISA: Reasoning Segmentation via Large Language Model [68.24075852136761]
We propose a new segmentation task: reasoning segmentation.
The task is designed to output a segmentation mask given a complex and implicit query text.
We present LISA (large Language Instructed Assistant), which inherits the language generation capabilities of multimodal Large Language Models.
arXiv Detail & Related papers (2023-08-01T17:50:17Z)
- Position-Aware Contrastive Alignment for Referring Image Segmentation [65.16214741785633]
We present a position-aware contrastive alignment network (PCAN) to enhance the alignment of multi-modal features.
Our PCAN consists of two modules: 1) a Position Aware Module (PAM), which provides position information for all objects related to natural language descriptions, and 2) a Contrastive Language Understanding Module (CLUM), which enhances multi-modal alignment.
arXiv Detail & Related papers (2022-12-27T09:13:19Z)
- Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
- Depth-aware Object Segmentation and Grasp Detection for Robotic Picking Tasks [13.337131101813934]
We present a novel deep neural network architecture for joint class-agnostic object segmentation and grasp detection for robotic picking tasks.
We introduce depth-aware Coordinate Convolution (CoordConv), a method to increase accuracy for point-proposal-based object instance segmentation.
We evaluate the accuracy of grasp detection and instance segmentation on challenging robotic picking datasets, namely Sil'eane and OCID_grasp.
arXiv Detail & Related papers (2021-11-22T11:06:33Z)
- PalmTree: Learning an Assembly Language Model for Instruction Embedding [8.74990895782223]
We propose to pre-train an assembly language model called PalmTree for generating general-purpose instruction embeddings.
PalmTree has the best performance on intrinsic metrics and outperforms the other instruction embedding schemes on all downstream tasks.
arXiv Detail & Related papers (2021-01-21T22:30:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.