Paper overview: Tag, Copy or Predict: A Unified Weakly-Supervised Learning Framework for
Visual Information Extraction using Sequences
- arxiv url: http://arxiv.org/abs/2106.10681v1
- Date: Sun, 20 Jun 2021 11:56:46 GMT
- Status: Processing complete
- Last updated in system: 2021-06-22 16:01:05.655223
- Title: Tag, Copy or Predict: A Unified Weakly-Supervised Learning Framework for
Visual Information Extraction using Sequences
- Title (reference translation): Tag, Copy, Predict: A Unified Weakly-Supervised Learning Framework for Visual Information Extraction Using Sequences
- Authors: Jiapeng Wang, Tianwei Wang, Guozhi Tang, Lianwen Jin, Weihong Ma, Kai
Ding, Yichao Huang
- Abstract summary: We propose a weakly-supervised learning framework called TCPN (Tag, Copy or Predict Network).
- Reference score (independently computed attention score): 27.75850798545413
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual information extraction (VIE) has attracted increasing attention in
recent years. The existing methods usually first organized optical character
recognition (OCR) results into plain texts and then utilized token-level entity
annotations as supervision to train a sequence tagging model. However, it
expends great annotation costs and may be exposed to label confusion, and the
OCR errors will also significantly affect the final performance. In this paper,
we propose a unified weakly-supervised learning framework called TCPN (Tag,
Copy or Predict Network), which introduces 1) an efficient encoder to
simultaneously model the semantic and layout information in 2D OCR results; 2)
a weakly-supervised training strategy that utilizes only key information
sequences as supervision; and 3) a flexible and switchable decoder which
contains two inference modes: one (Copy or Predict Mode) is to output key
information sequences of different categories by copying a token from the input
or predicting one in each time step, and the other (Tag Mode) is to directly
tag the input sequence in a single forward pass. Our method shows new
state-of-the-art performance on several public benchmarks, which fully proves
its effectiveness.
- Abstract (reference translation): Visual information extraction (VIE) has attracted attention in recent years.
In this paper, we propose a unified weakly-supervised learning framework called TCPN (Tag, Copy or Predict Network), which introduces 1) an efficient encoder to simultaneously model the semantic and layout information in 2D OCR results; 2) a weakly-supervised training strategy that utilizes only key information sequences as supervision; and 3) a flexible and switchable decoder which contains two inference modes: one (Copy or Predict Mode) is to output key information sequences of different categories by copying a token from the input or predicting one in each time step, and the other (Tag Mode) is to directly tag the input sequence in a single forward pass.
- Enhancing Hyperspectral Image Prediction with Contrastive Learning in Low-Label Regime [0.810304644344495]
Paper / reference translation (metadata) (2024-10-10T10:20:16Z)
- Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition [51.66383337087724]
The Handwritten Mathematical Expression Recognition (HMER) task is an important branch of the OCR field.
This paper proposes the MF-SLT and a bidirectional asynchronous training (BAT) structure.
Paper / reference translation (metadata) (2023-12-31T09:24:21Z)
- One-bit Supervision for Image Classification: Problem, Solution, and Beyond [114.95815360508395]
Paper / reference translation (metadata) (2023-11-26T07:39:00Z)
- Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching [74.75284453828017]
The Open-Vocabulary Keypoint Detection (OVKD) task is designed to use text prompts to identify arbitrary kinds of keypoints.
We develop a novel framework named Open-Vocabulary Keypoint Detection with Semantic-feature Matching (KDSM).
Paper / reference translation (metadata) (2023-10-08T07:42:41Z)
- Controllable Data Augmentation for Few-Shot Text Mining with Chain-of-Thought Attribute Manipulation [35.33340453046864]
Chain-of-Thought Attribute Manipulation (CoTAM) is a novel approach that generates new data from existing examples.
Paper / reference translation (metadata) (2023-07-14T00:10:03Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Paper / reference translation (metadata) (2023-05-31T16:47:20Z)
- Deepfake Detection via Joint Unsupervised Reconstruction and Supervised Classification [25.84902508816679]
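Paper / reference translation (metadata) (2022-11-24T05:44:26Z)
- RetroMAE v2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models [3.4523793651427113]
This paper proposes DupMAE, a duplex masked auto-encoder that aims to improve the semantic representation capacity of the contextualized embeddings of both [] and ordinary tokens.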
Paper / reference translation (metadata) (2022-11-16T08:57:55Z)
- MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition [160.49403075559158]
This paper proposes the Masked Pseudo-Labeling autoEncoder (MAPLE) framework.
In particular, we design a novel and efficient Decoupled spatial-temporal TransFormer (DestFormer) as the backbone of MAPLE.
Paper / reference translation (metadata) (2022-09-01T12:32:40Z)
- SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning [114.58986229852489]
We derive a particular form of contrastive learning, named SeCo.
Paper / reference translation (metadata) (2020-08-03T15:51:35Z)
- ReADS: A Rectified Attentional Double Supervised Network for Scene Text Recognition [22.367624178280682]
We elaborately design a Rectified Attentional Double Supervised Network (ReADS) for general scene text recognition.
Paper / reference translation (metadata) (2020-04-05T02:05:35Z)