Fugu-MT paper translation (abstract): Validating Vision Transformers for Otoscopy: Performance and Data-Leakage Effects

Paper overview: Validating Vision Transformers for Otoscopy: Performance and Data-Leakage Effects

arxiv url: http://arxiv.org/abs/2511.04872v1
Date: Thu, 06 Nov 2025 23:20:37 GMT
Status: translation complete
Last updated in system: 2025-11-10 21:00:44.617444
Title: Validating Vision Transformers for Otoscopy: Performance and Data-Leakage Effects
Title (reference translation): Validation of Vision Transformers for Otoscopy: Performance and Data-Leakage Effects
Authors: James Ndubuisi, Fernando Auat, Marta Vallejo
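Abstract summary: This study evaluates the efficacy of vision transformer models, specifically Swin transformers, in enhancing the diagnostic accuracy of ear diseases. The research used a real-world dataset from the Department of Otolaryngology at the Clinical Hospital of the Universidad de Chile.
Reference score (independently computed attention metric): 42.465094107111646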
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This study evaluates the efficacy of vision transformer models, specifically Swin transformers, in enhancing the diagnostic accuracy of ear diseases compared to traditional convolutional neural networks. With a reported 27% misdiagnosis rate among specialist otolaryngologists, improving diagnostic accuracy is crucial. The research utilised a real-world dataset from the Department of Otolaryngology at the Clinical Hospital of the Universidad de Chile, comprising otoscopic videos of ear examinations depicting various middle and external ear conditions. Frames were selected based on the Laplacian and Shannon entropy thresholds, with blank frames removed. Initially, Swin v1 and Swin v2 transformer models achieved accuracies of 100% and 99.1%, respectively, marginally outperforming the ResNet model (99.5%). These results surpassed metrics reported in related studies. However, the evaluation uncovered a critical data leakage issue in the preprocessing step, affecting both this study and related research using the same raw dataset. After mitigating the data leakage, model performance decreased significantly. Corrected accuracies were 83% for both Swin v1 and Swin v2, and 82% for the ResNet model. This finding highlights the importance of rigorous data handling in machine learning studies, especially in medical applications. The findings indicate that while vision transformers show promise, it is essential to find an optimal balance between the benefits of advanced model architectures and those derived from effective data preprocessing. This balance is key to developing a reliable machine learning model for diagnosing ear diseases.
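Abstract (reference translation): This study evaluates the effectiveness of vision transformer models, specifically Swin transformers, in improving the diagnostic accuracy of ear diseases compared with conventional convolutional neural networks. The misdiagnosis rate among specialist otolaryngologists is reported to be 27%, so improving diagnostic accuracy is important. The study used a real-world dataset from the Department of Otolaryngology at the Clinical Hospital of the Universidad de Chile, consisting of otoscopic videos of ear examinations showing various middle and external ear conditions. Frames were selected using Laplacian and Shannon entropy thresholds, and blank frames were removed. Initially, the Swin v1 and Swin v2 transformer models achieved accuracies of 100% and 99.1%, respectively, outperforming the ResNet model (99.5%). These results exceeded the metrics reported in related studies. However, the evaluation revealed a critical data leakage problem in the preprocessing step that affects both this study and related studies using the same raw dataset. After the leakage was mitigated, model performance dropped substantially: corrected accuracies were 83% for both Swin v1 and Swin v2 and 82% for the ResNet model. This finding underlines the importance of rigorous data handling in machine learning research, particularly in medical applications. The results suggest that, while vision transformers show promise, it is essential to find an optimal balance between the benefits of advanced model architectures and those of effective data preprocessing. This balance is key to developing a reliable machine learning model for diagnosing ear diseases.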
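
The abstract does not say how the leak in the preprocessing step arose. One common source of leakage with video-derived frames is splitting at the frame level, so that near-identical frames from the same video end up in both training and test sets. The sketch below assumes that scenario purely for illustration; it is not the authors' pipeline, and `split_by_video`, the `video_id` labels, and the file paths are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_video(frames, test_fraction=0.2, seed=0):
    """Split (video_id, frame_path) pairs so that no video spans both sets.

    Hypothetical helper: the split unit is the source video, not the frame.
    """
    # Group frame paths by the video they were extracted from.
    by_video = defaultdict(list)
    for video_id, frame_path in frames:
        by_video[video_id].append(frame_path)

    # Shuffle video IDs (not frames) with a fixed seed for reproducibility.
    video_ids = sorted(by_video)
    random.Random(seed).shuffle(video_ids)

    # Reserve a fraction of whole videos for testing (at least one).
    n_test = max(1, round(len(video_ids) * test_fraction))
    test_ids = set(video_ids[:n_test])

    train = [(v, f) for v in video_ids if v not in test_ids for f in by_video[v]]
    test = [(v, f) for v in video_ids if v in test_ids for f in by_video[v]]
    return train, test


# Made-up example data: 10 videos with 5 extracted frames each.
frames = [
    (f"video_{v:02d}", f"video_{v:02d}/frame_{i:03d}.png")
    for v in range(10)
    for i in range(5)
]
train, test = split_by_video(frames)

# No video contributes frames to both sides of the split.
assert {v for v, _ in train}.isdisjoint({v for v, _ in test})
```

Splitting by video (or by patient, where that information exists) keeps correlated frames on one side of the split, which is the kind of correction that tends to lower reported accuracy relative to a frame-level split.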

Related papers