Fugu-MT Paper Translation (Abstract): Object Detection for Understanding Assembly Instruction Using Context-aware Data Augmentation and Cascade Mask R-CNN

arxiv url: http://arxiv.org/abs/2101.02509v2
Date: Fri, 8 Jan 2021 02:38:51 GMT
Status: Translation complete
Last updated in system: 2021-04-10 13:30:17.842613
Title: Object Detection for Understanding Assembly Instruction Using Context-aware Data Augmentation and Cascade Mask R-CNN
Authors: Joosoon Lee, Seongju Lee, Seunghyeok Back, Sungho Shin, Kyoobin Lee
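Abstract summary: The authors developed a context-aware data augmentation scheme for speech bubble segmentation. They also showed that deep learning can help in understanding assembly instructions by detecting the essential objects in the instructions.
Reference score (attention score computed independently by this site): 4.3310896118860445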
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Understanding assembly instructions has the potential to enhance a robot's task planning ability and to enable advanced robotic applications. To recognize the key components in a 2D assembly instruction image, we mainly focus on segmenting the speech bubble area, which contains much of the information about the instructions. For this, we applied Cascade Mask R-CNN and developed a context-aware data augmentation scheme for speech bubble segmentation, which randomly combines image cuts while considering the context of assembly instructions. We showed that the proposed augmentation scheme achieves better segmentation performance than the existing augmentation algorithm by increasing the diversity of trainable data while considering the distribution of component locations. We also showed that deep learning can be useful for understanding assembly instructions by detecting the essential objects in them, such as tools and parts.
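The abstract describes the augmentation only at a high level: image cuts are combined at random while respecting where components tend to appear. The following is a minimal sketch of that general idea, not the authors' implementation. The `Cut` record, the `observed_centers` table, the jitter value, and the function names are all hypothetical, and pixel data is omitted so that only the layout logic is shown. Each cut is assumed to be smaller than the page.

```python
import random
from dataclasses import dataclass


@dataclass
class Cut:
    """A cropped region from an instruction page (hypothetical record, no pixels)."""
    category: str  # e.g. "speech_bubble", "part", "tool"
    width: int
    height: int


def sample_box(cut, observed_centers, page_w, page_h, rng, jitter=0.05):
    """Place one cut near a location where its category was observed before."""
    # Normalized (0..1) center drawn from the empirical locations of this category.
    cx, cy = rng.choice(observed_centers[cut.category])
    cx += rng.uniform(-jitter, jitter)
    cy += rng.uniform(-jitter, jitter)
    # Convert to a pixel top-left corner and clamp so the box stays on the page.
    x = max(0, min(int(cx * page_w - cut.width / 2), page_w - cut.width))
    y = max(0, min(int(cy * page_h - cut.height / 2), page_h - cut.height))
    return (x, y, x + cut.width, y + cut.height)


def compose_page(cuts, observed_centers, page_w=800, page_h=600, n_cuts=3, seed=None):
    """Build annotations for one synthetic page from randomly chosen cuts."""
    rng = random.Random(seed)
    chosen = rng.sample(cuts, k=min(n_cuts, len(cuts)))
    return [
        {"category": c.category,
         "bbox": sample_box(c, observed_centers, page_w, page_h, rng)}
        for c in chosen
    ]
```

Sampling positions from observed locations, rather than uniformly over the page, is one simple way to keep synthetic layouts close to the distribution of component locations that the abstract mentions. A real pipeline would also paste the pixels and produce segmentation masks.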

Related papers

Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning [125.79428219851289]
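Inst-IT is a solution that enhances the instance understanding of LMMs through explicit visual prompt instruction tuning. It consists of a benchmark for diagnosing multimodal instance-level understanding, a large-scale instruction-tuning dataset, and a continuous instruction-tuning training paradigm.
Paper reference translation (metadata) (2024-12-04T18:58:10Z)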
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization [49.992614129625274]
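ForgeryGPT is a novel framework that advances the Image Forgery Detection and Localization task. It captures high-order correlations of forged images from diverse linguistic feature spaces, and it enables explainable generation and interactive dialogue through a newly customized Large Language Model (LLM) architecture.
Paper reference translation (metadata) (2024-10-14T07:56:51Z)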
SegPoint: Segment Any Point Cloud via Large Language Model [62.69797122055389]
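The authors propose a model called SegPoint that produces point-wise segmentation masks across a diverse range of tasks. SegPoint is the first model to address a variety of segmentation tasks within a single framework.
Paper reference translation (metadata) (2024-07-18T17:58:03Z)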
VISA: Reasoning Video Object Segmentation via Large Language Models [64.33167989521357]
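The authors introduce a new task, Reasoning Video Object Segmentation (ReasonVOS), which aims to generate a sequence of segmentation masks in response to implicit text queries that require complex reasoning abilities. To tackle ReasonVOS, they introduce VISA (Video-based large language Instructed Assistant).
Paper reference translation (metadata) (2024-07-16T02:29:29Z)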
Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding [26.768147543628096]
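This paper proposes a novel framework that emphasizes object and context understanding, inspired by human cognitive processes. The proposed method achieves significant performance improvements on three benchmark datasets.
Paper reference translation (metadata) (2024-04-12T16:38:48Z)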
LISA: Reasoning Segmentation via Large Language Model [68.24075852136761]
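The authors propose a new segmentation task, reasoning segmentation, which is designed to output a segmentation mask given a complex and implicit query text. The proposed LISA (Large Language Instructed Assistant) inherits the language generation capabilities of multimodal large language models.
Paper reference translation (metadata) (2023-08-01T17:50:17Z)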
Position-Aware Contrastive Alignment for Referring Image Segmentation [65.16214741785633]
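The authors propose a Position-aware Contrastive Alignment Network (PCAN) to enhance the alignment of multimodal features. It consists of two modules: 1) a Position Aware Module (PAM), which provides position information for all objects related to the natural language description, and 2) a Contrastive Language Understanding Module (CLUM), which enhances multimodal alignment.
Paper reference translation (metadata) (2022-12-27T09:13:19Z)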
Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
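The authors tackle the problem of learning visual representations from unlabeled scene-centric data. They propose SlotCon, which performs contrastive learning from data-driven semantic slots for joint semantic grouping and representation learning.
Paper reference translation (metadata) (2022-05-30T17:50:59Z)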
Depth-aware Object Segmentation and Grasp Detection for Robotic Picking Tasks [13.337131101813934]
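This paper presents a novel deep neural network architecture for joint class-agnostic object segmentation and grasp detection in robotic picking tasks. It introduces depth-aware Coordinate Convolution (CoordConv), a method for increasing the accuracy of point-proposal-based object instance segmentation. Grasp detection and instance segmentation accuracy are evaluated on two challenging robotic picking datasets, Siléane and OCID_grasp.
Paper reference translation (metadata) (2021-11-22T11:06:33Z)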
PalmTree: Learning an Assembly Language Model for Instruction Embedding [8.74990895782223]
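The authors propose pre-training PalmTree, an assembly language model for generating general-purpose instruction embeddings. PalmTree achieves the best performance on intrinsic metrics and outperforms other instruction embedding schemes on downstream tasks.
Paper reference translation (metadata) (2021-01-21T22:30:01Z)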

The related papers list is generated automatically from the titles and abstracts of papers on this site.
