Fugu-MT Paper Translation (Abstract): Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

Paper overview: Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

arxiv url: http://arxiv.org/abs/2303.13095v1
Date: Thu, 23 Mar 2023 08:21:16 GMT
Status: Translation complete
Last updated in system: 2023-03-24 15:13:18.825084
Title: Modeling Entities as Semantic Points for Visual Information Extraction in the Wild
Authors: Zhibo Yang, Rujiao Long, Pengfei Wang, Sibo Song, Humen Zhong, Wenqing Cheng, Xiang Bai, Cong Yao
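Abstract summary: We propose a method for accurately and robustly extracting key information from document images. We explicitly model entities as semantic points, i.e., the center points of entities are enriched with semantic information describing the attributes and relationships of different entities. Compared with previous state-of-the-art models, the proposed method significantly improves performance on entity labeling and linking.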
Reference score (attention score computed independently by this site): 55.91783742370978
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, Visual Information Extraction (VIE) has become increasingly important in both academia and industry, due to its wide range of real-world applications. Previously, numerous works have been proposed to tackle this problem. However, the benchmarks used to assess these methods are relatively plain, i.e., scenarios with real-world complexity are not fully represented in these benchmarks. As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken from real applications, and difficulties such as blur, partial occlusion, and printing shift are quite common. All these factors may lead to failures in information extraction. Therefore, as the second contribution, we explore an alternative approach to precisely and robustly extract key information from document images under such tough conditions. Specifically, in contrast to previous methods, which usually either incorporate visual information into a multi-modal architecture or train text spotting and information extraction in an end-to-end fashion, we explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities, which could largely benefit entity labeling and linking. Extensive experiments on standard benchmarks in this field as well as the proposed dataset demonstrate that the proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models. The dataset is available at https://www.modelscope.cn/datasets/damo/SIBR/summary.

List of related papers