Fugu-MT Paper Translation (Abstract): Faster Segment Anything: Towards Lightweight SAM for Mobile Applications

Paper overview: Faster Segment Anything: Towards Lightweight SAM for Mobile Applications

arxiv url: http://arxiv.org/abs/2306.14289v1
Date: Sun, 25 Jun 2023 16:37:25 GMT
Status: Translation complete
Last updated in system: 2023-06-27 15:44:22.634968
Title: Faster Segment Anything: Towards Lightweight SAM for Mobile Applications
Title (reference translation): Faster Segmentation: Towards a Lightweight SAM for Mobile Applications
Authors: Chaoning Zhang, Dongshen Han, Yu Qiao, Jung Uk Kim, Sung-Ho Bae, Seungkyu Lee, Choong Seon Hong
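Abstract summary: The Segment Anything Model (SAM) is a prompt-guided vision foundation model for cutting out an object of interest from its background. This work aims to make SAM mobile-friendly by replacing the heavyweight image encoder with a lightweight one. Knowledge is distilled from the ViT-H image encoder of the original SAM into a lightweight image encoder that is automatically compatible with the original SAM's mask decoder.
Reference score (independently computed attention score): 47.177751899636164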
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Segment anything model (SAM) is a prompt-guided vision foundation model for cutting out the object of interest from its background. Since Meta research team released the SA project, SAM has attracted significant attention due to its impressive zero-shot transfer performance and high versatility of being compatible with other models for advanced vision applications like image editing with fine-grained control. Many of such use cases need to be run on resource-constraint edge devices, like mobile Apps. In this work, we aim to make SAM mobile-friendly by replacing the heavyweight image encoder with a lightweight one. A naive way to train such a new SAM as in the original SAM paper leads to unsatisfactory performance, especially when limited training sources are available. We find that this is mainly caused by the coupled optimization of the image encoder and mask decoder, motivated by which we propose decoupled distillation. Concretely, we distill the knowledge from the image encoder ViT-H in the original SAM to a lightweight image encoder, which can be automatically compatible with the mask decoder in the original SAM. The training can be completed on a single GPU within less than one day, and the resulting lightweight SAM is termed MobileSAM which is more than 60 times smaller yet performs on par with the original SAM. For inference speed, MobileSAM runs around 10ms per image: 8ms on the image encoder and 2ms on the mask decoder. With superior performance and a higher versatility, our MobileSAM is 7 times smaller and 4 times faster than the concurrent FastSAM, making it more suitable for mobile applications. The code for MobileSAM project is provided at https://github.com/ChaoningZhang/MobileSAM
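Abstract (reference translation): The Segment Anything Model (SAM) is a prompt-guided vision foundation model for cutting out an object of interest from its background. Since the Meta research team released the SA project, SAM has attracted significant attention for its impressive zero-shot transfer performance and for its versatility: it is compatible with other models in advanced vision applications such as image editing with fine-grained control. Many of these use cases need to run on resource-constrained edge devices, such as mobile apps. This work aims to make SAM mobile-friendly by replacing the heavyweight image encoder with a lightweight one. Naively training such a new SAM as in the original SAM paper gives unsatisfactory performance, especially when training resources are limited. The main cause is the coupled optimization of the image encoder and the mask decoder, which motivates the proposed decoupled distillation. Concretely, knowledge is distilled from the ViT-H image encoder of the original SAM into a lightweight image encoder that is automatically compatible with the original SAM's mask decoder. Training completes on a single GPU in less than one day, and the resulting lightweight SAM, called MobileSAM, is more than 60 times smaller yet performs on par with the original SAM. For inference, MobileSAM runs at around 10 ms per image: 8 ms for the image encoder and 2 ms for the mask decoder. With superior performance and higher versatility, MobileSAM is 7 times smaller and 4 times faster than the concurrent FastSAM, making it better suited to mobile applications. The code for the MobileSAM project is available at https://github.com/ChaoningZhang/MobileSAM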
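
The decoupled distillation idea in the abstract can be illustrated with a toy sketch: a frozen teacher encoder produces embeddings, and only the student encoder is trained to reproduce them, so the original mask decoder never enters the optimization. Everything below is an assumption for illustration and is not the MobileSAM code: the encoders are 2x2 linear maps rather than vision transformers, the function names (`teacher_encoder`, `student_encoder`, `distill`, `embedding_mse`) are hypothetical, and the mean-squared-error objective is assumed because the abstract does not name a loss.

```python
import random


def teacher_encoder(x):
    """Frozen stand-in for the heavy teacher encoder (ViT-H in the paper).

    Assumption: a fixed 2x2 linear map, chosen only to keep the toy tiny.
    """
    return [2.0 * x[0] - 1.0 * x[1], 0.5 * x[0] + 3.0 * x[1]]


def student_encoder(w, x):
    """Hypothetical lightweight student: a trainable 2x2 linear map."""
    return [w[i][0] * x[0] + w[i][1] * x[1] for i in range(2)]


def distill(steps=2000, lr=0.05, seed=0):
    """Train only the student to match the teacher's embeddings.

    No mask decoder appears in this loop: that is the 'decoupled' part.
    The squared-error objective is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    w = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        target = teacher_encoder(x)   # teacher stays frozen
        pred = student_encoder(w, x)
        for i in range(2):
            err = pred[i] - target[i]
            for j in range(2):
                # Gradient of err**2 with respect to w[i][j] is 2 * err * x[j].
                w[i][j] -= lr * 2.0 * err * x[j]
    return w


def embedding_mse(w, samples):
    """Mean squared gap between student and teacher embeddings."""
    total = 0.0
    for x in samples:
        t = teacher_encoder(x)
        s = student_encoder(w, x)
        total += sum((a - b) ** 2 for a, b in zip(s, t))
    return total / (2 * len(samples))


if __name__ == "__main__":
    weights = distill()
    probe = [[0.3, -0.7], [1.0, 1.0], [-0.5, 0.2]]
    print("student weights:", weights)
    print("embedding MSE on probe inputs:", embedding_mse(weights, probe))
```

Because the student is fitted to the teacher's embedding space, any downstream component that consumed the teacher's embeddings can in principle consume the student's without retraining, which is the compatibility property the abstract describes for the original mask decoder.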


Related papers