Fugu-MT Paper Translation (Abstract): Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition

Paper overview: Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition

arxiv url: http://arxiv.org/abs/2401.00435v1
Date: Sun, 31 Dec 2023 09:24:21 GMT
Status: Translation complete
Last updated in system: 2024-01-03 17:17:19.548336
Title: Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition
Title (reference translation): Bidirectional Tree-Structured Decoder for Handwritten Mathematical Expression Recognition
Authors: Hanbo Cheng, Chenyu Liu, Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Jun Du
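Abstract summary: The Handwritten Mathematical Expression Recognition (HMER) task is an important branch of the OCR field. Recent studies have shown that incorporating bidirectional context information significantly improves the performance of HMER models. This paper proposes the MF-SLT and the Bidirectional Asynchronous Training (BAT) structure.
Reference score (independently computed attention score): 51.66383337087724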
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR. Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models. However, existing methods fail to effectively utilize bidirectional context information during the inference stage. Furthermore, current bidirectional training methods are primarily designed for string decoders and cannot adequately generalize to tree decoders, which offer superior generalization capabilities and structural analysis capacity. In order to overcome these limitations, we propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure. Our method extends the bidirectional training strategy to the tree decoder, allowing for more effective training by leveraging bidirectional information. Additionally, we analyze the impact of the visual and linguistic perception of the HMER model separately and introduce the Shared Language Modeling (SLM) mechanism. Through the SLM, we enhance the model's robustness and generalization when dealing with visual ambiguity, particularly in scenarios with abundant training data. Our approach has been validated through extensive experiments, demonstrating its ability to achieve new state-of-the-art results on the CROHME 2014, 2016, and 2019 datasets, as well as the HME100K dataset. The code used in our experiments will be publicly available.
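Abstract (reference translation): The Handwritten Mathematical Expression Recognition (HMER) task is an important branch of the OCR field. Recent studies have shown that incorporating bidirectional context information significantly improves the performance of HMER models. However, existing methods cannot make effective use of bidirectional context information at the inference stage. Furthermore, current bidirectional training methods are designed mainly for string decoders and do not generalize adequately to tree decoders. To overcome these limitations, we propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and the Bidirectional Asynchronous Training (BAT) structure. The method extends the bidirectional training strategy to the tree decoder, enabling more effective training by leveraging bidirectional information. In addition, we analyze the effects of the visual and linguistic perception of the HMER model separately and introduce the Shared Language Modeling (SLM) mechanism. Through SLM, we strengthen the model's robustness and generalization when handling visual ambiguity, particularly in scenarios with abundant training data. The approach has been validated through extensive experiments and achieves new state-of-the-art results on the CROHME 2014, 2016, and 2019 datasets as well as the HME100K dataset. The code used in the experiments will be made public.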
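The abstract notes that existing bidirectional training methods target string decoders. As a rough illustration of that string-decoder setting only (not of MF-SLT or BAT, whose details are not given on this page), the sketch below builds a left-to-right and a right-to-left target from one token sequence. The function name `bidirectional_targets`, the `<sos>`/`<eos>` markers, and their placement are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical boundary markers; the actual vocabulary of any given
# HMER model may differ.
SOS, EOS = "<sos>", "<eos>"


def bidirectional_targets(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Build L2R and R2L decoding targets from one LaTeX token sequence.

    Both targets cover the same symbols; the R2L target simply lists
    them in reverse order, so a string decoder trained on both sees
    context from each direction.
    """
    l2r = [SOS, *tokens, EOS]
    r2l = [SOS, *reversed(tokens), EOS]
    return l2r, r2l


# Example: the expression x^{2}+1 as a token list.
l2r, r2l = bidirectional_targets(["x", "^", "{", "2", "}", "+", "1"])
print(l2r)
print(r2l)
```

Reversing a flat token list is straightforward for strings, but a tree target has parent-child and relation structure that a plain reversal does not preserve, which is the gap the abstract says MF-SLT is meant to address.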

Related papers

Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation [67.31811007549489]
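We propose a Rewriting-driven AugMentation (RAM) paradigm for Vision-Language Navigation (VLN). Applying the rewriting mechanism makes it possible to create new observation-instruction pairs in a simulator-free and labor-saving manner, which promotes generalization. Experiments in both discrete environments (R2R, REVERIE, R4R) and a continuous environment (R2R-CE) show the method's superior performance and strong generalization ability.
Paper reference translation (metadata) (2025-03-23T13:18:17Z)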
MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities [37.173351698875685]
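This paper proposes MAGNET, a method for adapting decoder-only large language models (LLMs) to generate robust representations and infill missing text spans. The results show that LLMs adapted with MAGNET surpass strong text encoders on token-level and sentence-level representation learning tasks.
Paper reference translation (metadata) (2025-01-15T08:24:03Z)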
Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
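We introduce Contextualization Distillation, a plug-and-play strategy compatible with both discriminative and generative KGC frameworks. The method begins by instructing a large language model to transform compact, structural triplets into context-rich segments. Comprehensive evaluation across diverse datasets and KGC techniques highlights the effectiveness and adaptability of the approach.
Paper reference translation (metadata) (2024-01-28T08:56:49Z)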
Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution [49.762034744605955]
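We propose a multi-modal information bottleneck (M2IB) approach to improve the interpretability of vision-language models. We show how to apply M2IB to attribution analysis of vision-language pretrained models.
Paper reference translation (metadata) (2023-12-28T18:02:22Z)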
Unifying Structure and Language Semantic for Efficient Contrastive Knowledge Graph Completion with Structured Entity Anchors [0.3913403111891026]
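The goal of knowledge graph completion (KGC) is to predict missing links in a KG using already known, trained facts. This paper proposes a method that effectively unifies structural information and language semantics without losing the power of inductive reasoning.
Paper reference translation (metadata) (2023-11-07T11:17:55Z)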
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
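This paper proposes the RefSAM model, which explores the potential of SAM for referring video object segmentation. The proposed method adapts the original SAM model to improve cross-modality learning by using Cross-RValModal. We adopt a parameter-efficient tuning strategy to effectively align and fuse language and vision features.
Paper reference translation (metadata) (2023-07-03T13:21:58Z)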
Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
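Conceptually, LOCCO can be viewed as a form of self-learning in which the semantic parser being trained is used to generate annotations for unlabeled text. As an added bonus, the annotations generated by LOCCO can be trivially repurposed to train a neural text generation model.
Paper reference translation (metadata) (2023-05-31T16:47:20Z)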
USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval [115.28586222748478]
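Image-Text Retrieval (ITR) aims to retrieve, from the other modality, target instances that are semantically relevant to a given query. Existing approaches typically suffer from two major limitations.
Paper reference translation (metadata) (2023-01-17T12:42:58Z)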
Handwritten Mathematical Expression Recognition via Attention Aggregation based Bi-directional Mutual Learning [13.696706205837234]
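This paper proposes an Attention aggregation based Bi-directional Mutual learning Network (ABM). In the inference phase, because the model has already learned knowledge from two inverse directions, only the L2R branch is used for inference. The proposed method achieves 56.85% on CROHME 2014, 52.92% on CROHME 2016, and 53.96% on CROHME 2019.
Paper reference translation (metadata) (2021-12-07T09:53:40Z)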
Incorporating Linguistic Knowledge for Abstractive Multi-document Summarization [20.572283625521784]
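We develop a neural network based abstractive multi-document summarization (MDS) model. Dependency information is processed into a linguistic-guided attention mechanism. With the help of linguistic signals, sentence-level relations can be captured correctly.
Paper reference translation (metadata) (2021-09-23T08:13:35Z)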

The related papers list is generated automatically from the titles and abstracts of papers on this site.