Fugu-MT Paper Translation (Abstract): Real Image Denoising with Knowledge Distillation for High-Performance Mobile NPUs

Paper summary: Real Image Denoising with Knowledge Distillation for High-Performance Mobile NPUs

arxiv url: http://arxiv.org/abs/2605.03680v1
Date: Tue, 05 May 2026 12:19:30 GMT
Status: Translation complete
Last updated in system: 2026-05-06 19:35:43.927228
Title: Real Image Denoising with Knowledge Distillation for High-Performance Mobile NPUs
Title (reference translation): Real Image Denoising with Knowledge Distillation for High-Performance Mobile NPUs
Authors: Faraz Kayani, Sarmad Kayani, Asad Ahmed, Radu Timofte, Dmitry Ignatov
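Abstract summary: We propose an NPU-aware hardware-algorithm co-design approach for real-world image denoising on mobile NPUs. The approach uses a high-capacity teacher to supervise a lightweight student network.
Reference score (independently computed attention score): 41.66955282583396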
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While deep-learning-based image restoration has achieved unprecedented fidelity, deployment on mobile Neural Processing Units (NPUs) remains bottlenecked by operator incompatibility and memory-access overhead. We propose an NPU-aware hardware-algorithm co-design approach for real-world image denoising on mobile NPUs. Our approach employs a high-capacity teacher to supervise a lightweight student network specifically designed to leverage the tiled-memory architectures of modern mobile SoCs. By prioritizing NPU-native primitives -- standard 3x3 convolutions, ReLU activations, and nearest-neighbor upsampling -- and employing a progressive context expansion strategy (up to 1024x1024 crops), the model achieves 37.66 dB PSNR / 0.9278 SSIM on the validation benchmark and 37.58 dB PSNR / 0.9098 SSIM on the held-out test benchmark at full resolution (2432x3200) in the Mobile AI 2026 challenge. Following the official challenge rules, the inference runtime is measured under a standardized Full HD (1088x1920) protocol, where it runs in 34.0 ms on the MediaTek Dimensity 9500 and 46.1 ms on the Qualcomm Snapdragon 8 Elite NPU. We further reveal an "Inference Inversion" effect, where strict adherence to NPU-compatible operations enables dedicated NPU execution up to 3.88x faster than the integrated mobile GPU. The 1.96M-parameter student recovers 99.8% of the teacher's restoration quality via high-alpha knowledge distillation (alpha = 0.9), achieving a 21.2x parameter reduction while closing the PSNR gap from 1.63 dB to only 0.05 dB. These results establish hardware-aware distillation as an effective strategy for unifying high-fidelity denoising with practical deployment across diverse mobile NPU architectures. The proposed lightweight student model (LiteDenoiseNet) and its training statistics are provided in the NN Dataset, available at https://github.com/ABrain-One/NN-Dataset.
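
The abstract gives only the distillation weight (alpha = 0.9), not the loss itself. As a minimal sketch, assuming a common formulation in which alpha weights an L1 term against the teacher output and 1 - alpha weights an L1 term against the clean ground truth, the blend could look like the following. The function names and the choice of L1 are assumptions for illustration, not the authors' implementation.

```python
def l1(a, b):
    """Mean absolute error between two equal-length sequences of pixel values."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def distillation_loss(student, teacher, target, alpha=0.9):
    """Blend a teacher-matching term with a ground-truth term.

    ASSUMPTION: the paper's exact loss is not stated in the abstract.
    This uses the common form alpha * L1(student, teacher)
    + (1 - alpha) * L1(student, target).
    """
    return alpha * l1(student, teacher) + (1.0 - alpha) * l1(student, target)


# Toy flattened "images" with values in [0, 1] (hypothetical data).
student_out = [0.5, 0.5]
teacher_out = [0.4, 0.6]
clean_target = [0.0, 1.0]

loss = distillation_loss(student_out, teacher_out, clean_target, alpha=0.9)
```

With alpha = 0.9 the teacher term dominates, so in this form the student is pushed mostly toward the teacher's output and only weakly toward the ground truth.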
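Abstract (reference translation): Deep-learning-based image restoration has achieved unprecedented fidelity, but deployment on mobile Neural Processing Units (NPUs) is bottlenecked by operator incompatibility and memory-access overhead. We propose an NPU-aware hardware-algorithm co-design approach for real-world image denoising on mobile NPUs. The approach uses a high-capacity teacher to supervise a lightweight student network designed to exploit the tiled-memory architectures of modern mobile SoCs. By prioritizing NPU-native primitives (standard 3x3 convolutions, ReLU activations, and nearest-neighbor upsampling) and adopting a progressive context expansion strategy (crops of up to 1024x1024), the model achieves 37.66 dB PSNR / 0.9278 SSIM on the validation benchmark and 37.58 dB PSNR / 0.9098 SSIM on the held-out test benchmark at full resolution (2432x3200) in the Mobile AI 2026 challenge. Following the official challenge rules, inference runtime is measured under a standardized Full HD (1088x1920) protocol, where the model runs in 34.0 ms on the MediaTek Dimensity 9500 and 46.1 ms on the Qualcomm Snapdragon 8 Elite NPU. We further reveal an "Inference Inversion" effect, in which strict adherence to NPU-compatible operations enables dedicated NPU execution up to 3.88x faster than the integrated mobile GPU. The 1.96M-parameter student recovers 99.8% of the teacher's restoration quality through high-alpha knowledge distillation (alpha = 0.9), achieving a 21.2x parameter reduction while closing the PSNR gap from 1.63 dB to 0.05 dB. These results establish hardware-aware distillation as an effective strategy for combining high-fidelity denoising with practical deployment across diverse mobile NPU architectures. The proposed lightweight student model (LiteDenoiseNet) and its training statistics are provided in the NN Dataset, available at https://github.com/ABrain-One/NN-Dataset.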
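
The figures quoted in the abstract imply a few derived quantities. The short calculation below works them out from the stated numbers only. The teacher parameter count is an estimate inferred from the 21.2x reduction, and the frames-per-second values assume back-to-back inference with no other overhead; neither is reported in the abstract.

```python
# Numbers taken directly from the abstract.
student_params_m = 1.96          # student size, millions of parameters
param_reduction = 21.2           # teacher-to-student parameter ratio
gap_before_db = 1.63             # student-teacher PSNR gap before distillation
gap_after_db = 0.05              # gap after distillation
latency_ms = {"MediaTek Dimensity 9500": 34.0,
              "Qualcomm Snapdragon 8 Elite": 46.1}
full_hd = (1088, 1920)           # runtime measurement resolution
full_res = (2432, 3200)          # quality evaluation resolution

# Derived values (estimates, not reported figures).
teacher_params_m = student_params_m * param_reduction
gap_closed_fraction = (gap_before_db - gap_after_db) / gap_before_db
fps = {name: 1000.0 / ms for name, ms in latency_ms.items()}
pixel_ratio = (full_res[0] * full_res[1]) / (full_hd[0] * full_hd[1])

print(f"Implied teacher size: ~{teacher_params_m:.1f}M parameters")
print(f"Share of PSNR gap closed: {gap_closed_fraction:.1%}")
for name, value in fps.items():
    print(f"{name}: ~{value:.1f} frames/s at Full HD")
print(f"Full-resolution frame has {pixel_ratio:.2f}x the pixels of Full HD")
```

Note that the reported latencies apply to the Full HD protocol; the full-resolution evaluation images contain several times more pixels, so the runtime figures should not be read as full-resolution timings.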
