Fugu-MT paper translation (summary): KE-RCNN: Unifying Knowledge based Reasoning into Part-level Attribute Parsing

Paper summary: KE-RCNN: Unifying Knowledge based Reasoning into Part-level Attribute Parsing

arxiv url: http://arxiv.org/abs/2206.10146v1
Date: Tue, 21 Jun 2022 07:05:14 GMT
Status: Translation complete
Last updated in system: 2022-06-22 16:53:19.505984
Title: KE-RCNN: Unifying Knowledge based Reasoning into Part-level Attribute Parsing
Title (reference translation): KE-RCNN: Unifying knowledge-based reasoning into part-level attribute parsing
Authors: Xuanhan Wang, Jingkuan Song, Xiaojia Chen, Lechao Cheng, Lianli Gao, Heng Tao Shen
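Abstract summary: Part-level parsing is a fundamental but difficult task that requires region-level visual understanding to provide explainable details of body parts. Most existing approaches address this problem by adding a regional convolutional neural network (RCNN) with an attribute prediction head to a two-stage detector. We propose a Knowledge Embedded RCNN (KE-RCNN) that identifies attributes by leveraging rich knowledge, including implicit knowledge.
Reference score (in-house attention score): 115.55331747000844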
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Part-level attribute parsing is a fundamental but challenging task, which requires region-level visual understanding to provide explainable details of body parts. Most existing approaches address this problem by adding a regional convolutional neural network (RCNN) with an attribute prediction head to a two-stage detector, in which attributes of body parts are identified from local-wise part boxes. However, local-wise part boxes with limited visual clues (i.e., part appearance only) lead to unsatisfying parsing results, since attributes of body parts are highly dependent on comprehensive relations among them. In this article, we propose a Knowledge Embedded RCNN (KE-RCNN) to identify attributes by leveraging rich knowledge, including implicit knowledge (e.g., the attribute "above-the-hip" for a shirt requires visual/geometry relations of shirt-hip) and explicit knowledge (e.g., a "shorts" part cannot have the attribute "hoodie" or "lining"). Specifically, the KE-RCNN consists of two novel components, i.e., Implicit Knowledge based Encoder (IK-En) and Explicit Knowledge based Decoder (EK-De). The former is designed to enhance part-level representation by encoding part-part relational contexts into part boxes, and the latter is proposed to decode attributes with the guidance of prior knowledge about part-attribute relations. In this way, the KE-RCNN is plug-and-play and can be integrated into any two-stage detector, e.g., Attribute-RCNN, Cascade-RCNN, HRNet based RCNN and SwinTransformer based RCNN. Extensive experiments conducted on two challenging benchmarks, i.e., Fashionpedia and Kinetics-TPS, demonstrate the effectiveness and generalizability of the KE-RCNN. In particular, it outperforms all existing methods, with improvements of around 3% AP on Fashionpedia and around 4% Acc on Kinetics-TPS.
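The abstract only states that IK-En encodes part-part relational contexts into part boxes. As a minimal sketch, assuming this can be approximated by attention over all part features in an image, with logits that mix appearance similarity and box-center distance, the idea could look like the code below. The functions `relation_weights` and `encode_part_context`, the `geo_scale` parameter and the distance-based geometric term are all hypothetical; this is not the authors' IK-En, whose exact design the abstract does not give.

```python
import math
from typing import List, Tuple

# (x1, y1, x2, y2) in pixels
Box = Tuple[float, float, float, float]


def box_center(box: Box) -> Tuple[float, float]:
    """Center point of an axis-aligned box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def relation_weights(feats: List[List[float]], boxes: List[Box], i: int,
                     geo_scale: float = 0.01) -> List[float]:
    """Hypothetical attention weights of part i over all parts (itself included).

    Logit = scaled appearance dot product minus geo_scale * center distance,
    so visually similar and spatially close parts receive more weight.
    """
    d = len(feats[i])
    ci = box_center(boxes[i])
    logits = []
    for fj, bj in zip(feats, boxes):
        appearance = sum(a * b for a, b in zip(feats[i], fj)) / math.sqrt(d)
        cj = box_center(bj)
        distance = math.hypot(ci[0] - cj[0], ci[1] - cj[1])
        logits.append(appearance - geo_scale * distance)
    return softmax(logits)


def encode_part_context(feats: List[List[float]], boxes: List[Box],
                        geo_scale: float = 0.01) -> List[List[float]]:
    """Add an attention-weighted relational context to every part feature."""
    encoded = []
    for i, fi in enumerate(feats):
        w = relation_weights(feats, boxes, i, geo_scale)
        context = [sum(w[j] * feats[j][k] for j in range(len(feats)))
                   for k in range(len(fi))]
        # Residual connection: keep the part's own appearance, add its context.
        encoded.append([f + c for f, c in zip(fi, context)])
    return encoded


# Toy usage (hypothetical features): a shirt box and a hip box just below it.
shirt_hip = encode_part_context(
    feats=[[1.0, 0.0], [0.0, 1.0]],
    boxes=[(0.0, 0.0, 10.0, 10.0), (0.0, 10.0, 10.0, 20.0)],
)
```

In the toy usage, the shirt feature picks up a nonzero second component from the hip box, which is the kind of shirt-hip context the "above-the-hip" example in the abstract calls for.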
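Abstract (reference translation): Part-level attribute parsing is a fundamental but difficult task that requires region-level visual understanding to provide explainable details of body parts. Most existing approaches address this problem by adding a regional convolutional neural network (RCNN) with an attribute prediction head to a two-stage detector. However, because the attributes of body parts depend heavily on their comprehensive relations, part boxes with only local visual clues (i.e., appearance only) yield unsatisfactory parsing results. In this paper, we propose a knowledge-embedded RCNN that identifies attributes by leveraging rich knowledge, including implicit knowledge (e.g., the attribute "above-the-hip" for a shirt requires the visual/geometric relation between shirt and hip) and explicit knowledge (e.g., a "shorts" part cannot have the attribute "hoodie" or "lining"). Specifically, KE-RCNN consists of two new components: IK-En (Implicit Knowledge Based Encoder) and EK-De (Explicit Knowledge Based Decoder). The former is designed to strengthen part-level representations by encoding part-relation contexts into part boxes, and the latter decodes attributes using the guidance of prior knowledge about part-attribute relations. In this way, KE-RCNN is plug-and-play and can be integrated into two-stage detectors such as Attribute-RCNN, Cascade-RCNN, HRNet-based RCNN and SwinTransformer-based RCNN. Extensive experiments on two challenging benchmarks, Fashionpedia and Kinetics-TPS, demonstrate the effectiveness and generalizability of KE-RCNN. In particular, it achieves larger improvements than all existing methods, reaching about 3% AP on Fashionpedia and about 4% Acc on Kinetics-TPS.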
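The explicit part-attribute knowledge used by EK-De can be illustrated, under a strong simplifying assumption, as a compatibility table that zeroes out attributes a part category cannot carry. In this sketch, `PART_ATTRIBUTE_PRIOR`, `gated_probabilities`, `decode_attributes`, the 0.5 threshold and the attribute "symmetrical" are hypothetical; only the constraint that a "shorts" part cannot be "hoodie" or "lining" comes from the abstract, and the actual EK-De decoding is not described there.

```python
import math
from typing import Dict, List, Set

# Toy attribute vocabulary. "above-the-hip", "hoodie" and "lining" appear in the
# abstract; "symmetrical" is a hypothetical filler attribute.
ATTRIBUTES: List[str] = ["above-the-hip", "hoodie", "lining", "symmetrical"]

# Hypothetical explicit prior: attributes each part category may carry.
# Only "shorts cannot be hoodie or lining" is from the abstract; the rest is illustrative.
PART_ATTRIBUTE_PRIOR: Dict[str, Set[str]] = {
    "shirt": {"above-the-hip", "hoodie", "lining", "symmetrical"},
    "shorts": {"symmetrical"},
}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_probabilities(part: str, logits: List[float]) -> Dict[str, float]:
    """Multi-label attribute probabilities, zeroed where the prior forbids them.

    Parts missing from the prior fall back to allowing every attribute.
    """
    allowed = PART_ATTRIBUTE_PRIOR.get(part, set(ATTRIBUTES))
    return {
        name: sigmoid(logit) if name in allowed else 0.0
        for name, logit in zip(ATTRIBUTES, logits)
    }


def decode_attributes(part: str, logits: List[float],
                      threshold: float = 0.5) -> List[str]:
    """Attributes whose gated probability reaches the threshold, in vocabulary order."""
    probs = gated_probabilities(part, logits)
    return [name for name, p in probs.items() if p >= threshold]


# The same confident logits decode differently depending on the part category.
print(decode_attributes("shirt", [2.0, 2.0, 2.0, 2.0]))
# → ['above-the-hip', 'hoodie', 'lining', 'symmetrical']
print(decode_attributes("shorts", [2.0, 2.0, 2.0, 2.0]))
# → ['symmetrical']
```

Hard 0/1 gating is the simplest way to apply such a prior; a soft, learned compatibility score is an equally plausible reading of "guidance of prior knowledge", and the abstract does not say which one EK-De uses.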

Related papers