Fugu-MT Paper Translation (Summary): Multi-Head Attention based interaction-aware architecture for Bangla Handwritten Character Recognition: Introducing a Primary Dataset

Paper summary: Multi-Head Attention based interaction-aware architecture for Bangla Handwritten Character Recognition: Introducing a Primary Dataset

arxiv url: http://arxiv.org/abs/2604.09717v1
Date: Wed, 08 Apr 2026 13:18:08 GMT
Status: translation complete
Last updated in system: 2026-04-14 20:13:15.625992
Title: Multi-Head Attention based interaction-aware architecture for Bangla Handwritten Character Recognition: Introducing a Primary Dataset
Title (reference translation): A Multi-Head Attention Based Interaction-Aware Architecture for Bangla Handwritten Character Recognition: Introducing a Primary Dataset
Authors: Mirza Raquib, Asif Pervez Polok, Kedar Nath Biswas, Farida Siddiqi Prity, Saydul Akbar Murad, Nick Rahimi
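Abstract summary: We constructed a new balanced dataset of handwritten Bangla characters, comprising basic characters, composite (Juktobarno) characters, and numerals. The proposed model achieves 98.84% accuracy on the constructed dataset and 96.49% on the external CHBCR benchmark.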
Reference score (independently computed attention score): 2.0524609401792397
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Character recognition is the fundamental part of an optical character recognition (OCR) system. Higher-order tasks such as word recognition, sentence transcription, document digitization, and language processing all depend on accurate character recognition. Nonetheless, recognizing handwritten Bangla characters is not an easy task, because they are written in different styles with inconsistent stroke patterns and many characters closely resemble one another. The available datasets usually have limited intra-class variation and an imbalanced class distribution. To overcome these problems, we have constructed a new balanced dataset of handwritten Bangla characters. It consists of 78 classes with approximately 650 samples per class, and it contains basic characters, composite (Juktobarno) characters, and numerals. The samples were collected from a diverse group of writers spanning a wide age range and varied socioeconomic groups. Contributors include elementary and high school students, university students, and professionals, and both right- and left-handed writers are represented. We further propose an interaction-aware hybrid deep learning architecture that integrates EfficientNetB3, Vision Transformer, and Conformer modules in parallel. A multi-head cross-attention fusion mechanism enables effective feature interaction across these components. The proposed model achieves 98.84% accuracy on the constructed dataset and 96.49% on the external CHBCR benchmark, demonstrating strong generalization capability. Grad-CAM visualizations further provide interpretability by highlighting discriminative regions. The dataset and source code of this research are publicly available at: https://huggingface.co/MIRZARAQUIB/Bangla_Handwritten_Character_Recognition.
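Abstract (reference translation): Character recognition is the basic part of an optical character recognition (OCR) system. Word recognition, sentence transcription, document digitization, and language processing are among the higher-order activities that can be carried out accurately through character recognition. Even so, recognizing handwritten Bangla characters is not an easy task, because they are written in different styles with inconsistent stroke patterns and a high degree of visual similarity between characters. Available datasets are usually limited in intra-class variation and uneven in class distribution. To overcome these problems, we constructed a new balanced dataset of Bangla characters. It consists of 78 classes, and each class has approximately 650 samples. It includes basic characters, composite (Juktobarno) characters, and numerals. The samples come from a diverse group covering a wide range of ages and socioeconomic groups. Elementary and high school students, university students, and professionals contributed. The samples also include right- and left-handed writers. We further propose an interaction-aware hybrid deep learning architecture that integrates EfficientNetB3, Vision Transformer, and Conformer modules in parallel. A multi-head cross-attention fusion mechanism enables effective feature interaction among these components. The proposed model achieves 98.84% accuracy on the constructed dataset and 96.49% on the external CHBCR benchmark, showing strong generalization capability. Grad-CAM visualizations further improve interpretability by highlighting discriminative regions. The dataset and source code of this research are available at https://huggingface.co/MIRZARAQUIB/Bangla_Handwritten_Character_Recognition.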
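
Illustrative sketch (not the authors' code): the abstract states only that three parallel branches are fused by multi-head cross-attention and does not give the wiring. The pure-Python example below shows one plausible reading, in which each branch's tokens act as queries over the tokens of the other two branches, and the pooled results are concatenated. Everything here is an assumption made for illustration: the function names (`cross_attention`, `fuse`), the token counts, the feature size of 8, the two heads, and the random projection matrices that stand in for learned weights. The random token lists stand in for EfficientNetB3, Vision Transformer, and Conformer features.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_matrix(rows, cols, rng):
    """Random projection; a stand-in for weights a real model would learn."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v, num_heads):
    """Multi-head cross-attention: queries from one branch, keys/values from another."""
    q = matmul(query_tokens, w_q)
    k = matmul(context_tokens, w_k)
    v = matmul(context_tokens, w_v)
    head_dim = len(q[0]) // num_heads
    output = [[] for _ in q]
    weights_per_head = []
    for head in range(num_heads):
        lo, hi = head * head_dim, (head + 1) * head_dim
        head_weights = []
        for i, q_row in enumerate(q):
            # Scaled dot product between this query and every context key.
            scores = [
                sum(a * b for a, b in zip(q_row[lo:hi], k_row[lo:hi])) / math.sqrt(head_dim)
                for k_row in k
            ]
            weights = softmax(scores)
            head_weights.append(weights)
            # Weighted sum of the value slices; heads are concatenated per token.
            output[i].extend(
                sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(lo, hi)
            )
        weights_per_head.append(head_weights)
    return output, weights_per_head


def fuse(branches, num_heads, rng):
    """Let each branch attend to the other branches, mean-pool, and concatenate."""
    dim = len(branches[0][0])
    fused = []
    for i, query_tokens in enumerate(branches):
        context = [tok for j, branch in enumerate(branches) if j != i for tok in branch]
        w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))
        attended, _ = cross_attention(query_tokens, context, w_q, w_k, w_v, num_heads)
        fused.extend(sum(col) / len(attended) for col in zip(*attended))
    return fused


rng = random.Random(0)
dim, num_heads = 8, 2  # hypothetical sizes, chosen only to keep the demo small

# Random placeholders for the token features of the three parallel branches.
cnn_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(4)]
vit_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
conformer_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]

fused = fuse([cnn_tokens, vit_tokens, conformer_tokens], num_heads, rng)
print(len(fused))  # 24: three branches, each pooled to a vector of length dim
```

In a trained model the fused vector would feed a classifier over the 78 classes; the sketch stops at the fusion step because the abstract does not describe the classification head.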

Related papers