Fugu-MT Paper Translation (Abstract): AttnGen: Attention-Guided Saliency Learning for Interpretable Genomic Sequence Classification

Paper overview: AttnGen: Attention-Guided Saliency Learning for Interpretable Genomic Sequence Classification

arxiv url: http://arxiv.org/abs/2605.14073v1
Date: Wed, 13 May 2026 19:49:40 GMT
Status: Translation complete
System update date: 2026-05-15 21:45:34.487373
Title: AttnGen: Attention-Guided Saliency Learning for Interpretable Genomic Sequence Classification
Title (reference translation): AttnGen: Attention-Guided Saliency Learning for Interpretable Genomic Sequence Classification
Authors: Rayhaneh Shabani Nia, Ali Karkehabadi
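Abstract summary: We introduce AttnGen, an attention-guided training framework that embeds interpretability directly into the optimization process. AttnGen computes nucleotide-level importance scores using an attention mechanism and progressively suppresses low-contribution positions during training. With moderate masking, AttnGen achieves a validation accuracy of 96.73%, outperforming a conventional CNN baseline that reaches 95.83%.
Reference score (independently computed attention measure): 0.0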
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep neural networks have achieved strong performance in genomic sequence classification; however, relating their predictions to biologically meaningful sequence patterns remains challenging. In this work, we present AttnGen, an attention-guided training framework that embeds interpretability directly into the optimization process. AttnGen computes nucleotide-level importance scores using an attention mechanism and progressively suppresses low-contribution positions during training. This encourages the model to focus its predictions on a compact set of informative regions while reducing reliance on noisy sequence elements. We evaluate AttnGen on the standardized demo_human_or_worm benchmark, a binary classification task over 200-nucleotide sequences. With moderate masking, AttnGen achieves a validation accuracy of 96.73%, outperforming a conventional CNN baseline with 95.83% accuracy, while also exhibiting faster convergence and improved training stability. To assess whether the learned importance scores reflect functionally relevant signal, we conduct perturbation-based analysis by removing high-saliency nucleotides. This causes accuracy to drop from 96.9% to near chance level on a 3,000-sequence evaluation set, indicating that the model relies on a relatively small subset of informative positions. Our analysis shows that masking 10–20% of positions provides the most favorable trade-off between predictive performance and interpretability. These results suggest that attention-guided masking not only improves classification performance but also reshapes how models distribute importance across sequence positions. Although this study focuses on short genomic sequences, the proposed approach may extend to more complex interpretable sequence modeling settings.
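Abstract (reference translation): Deep neural networks achieve strong performance in genomic sequence classification, but relating their predictions to biologically meaningful sequence patterns remains difficult. This paper introduces AttnGen, an attention-guided training framework that embeds interpretability directly into the optimization process. AttnGen uses an attention mechanism to compute nucleotide-level importance scores and progressively suppresses low-contribution positions during training. This lets the model concentrate its predictions on a compact set of informative regions while reducing its dependence on noisy sequence elements. We evaluate AttnGen on the demo_human_or_worm benchmark, a binary classification task over 200-nucleotide sequences. With moderate masking, AttnGen achieves a validation accuracy of 96.73%, exceeding a conventional CNN baseline that reaches 95.83%. To assess whether the learned importance scores reflect functionally relevant signal, we perform a perturbation analysis that removes high-saliency nucleotides. Accuracy then falls from 96.9% to near chance level on a 3,000-sequence evaluation set, showing that the model depends on a relatively small subset of informative positions. The analysis finds that masking 10–20% of positions gives the best trade-off between predictive performance and interpretability. These results suggest that attention-guided masking not only improves classification performance but also reshapes how the model distributes importance across sequence positions. Although this study focuses on short genomic sequences, the proposed approach may extend to more complex interpretable sequence modeling settings.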
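The abstract describes two masking operations: suppressing low-importance positions during training, and removing high-saliency positions for the perturbation analysis. The NumPy sketch below illustrates both under stated assumptions; it is not the authors' implementation. The function names, the linear ramp in `mask_fraction`, the 0.15 target (chosen from inside the reported 10–20% range), the use of zeroing to stand in for "removal", and the random scores standing in for learned attention are all hypothetical.

```python
import numpy as np


def attention_scores(logits):
    """Softmax over the position axis: one importance weight per nucleotide."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)


def mask_fraction(epoch, total_epochs, target=0.15):
    """Hypothetical schedule: ramp the masked fraction linearly from 0 to `target`."""
    if total_epochs <= 1:
        return target
    return target * min(1.0, epoch / (total_epochs - 1))


def position_mask(scores, frac, drop="low"):
    """Return a 0/1 mask that zeroes a fraction `frac` of positions per sequence.

    drop="low"  zeroes the least important positions (training-time suppression).
    drop="high" zeroes the most important positions (perturbation analysis).
    """
    n = scores.shape[-1]
    k = int(round(frac * n))
    mask = np.ones_like(scores, dtype=float)
    if k == 0:
        return mask
    order = np.argsort(scores, axis=-1)  # ascending importance
    idx = order[..., :k] if drop == "low" else order[..., n - k:]
    np.put_along_axis(mask, idx, 0.0, axis=-1)
    return mask


# Toy batch: 4 random sequences of 200 nucleotides, one-hot encoded over A/C/G/T.
rng = np.random.default_rng(0)
batch, length = 4, 200
one_hot = np.eye(4)[rng.integers(0, 4, size=(batch, length))]  # shape (4, 200, 4)
scores = attention_scores(rng.normal(size=(batch, length)))    # stand-in for learned attention

# Final-epoch training mask: suppress the lowest-scoring 15% of positions.
train_mask = position_mask(scores, mask_fraction(9, 10), drop="low")
masked_input = one_hot * train_mask[..., None]

# Perturbation: zero out the highest-scoring 10% of positions instead.
perturb_mask = position_mask(scores, 0.10, drop="high")
perturbed_input = one_hot * perturb_mask[..., None]
```

For the 200-nucleotide sequences in the benchmark, the reported 10–20% range corresponds to zeroing 20 to 40 positions per sequence. In a real training loop the scores would come from the model's attention layer, and the comparison of accuracy on `one_hot` versus `perturbed_input` would play the role of the perturbation analysis.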
