Fugu-MT Paper Translation (Abstract): AuroraEdge-V-2B: A Faster And Stronger Edge Visual Large Language Model

Paper overview: AuroraEdge-V-2B: A Faster And Stronger Edge Visual Large Language Model

arxiv url: http://arxiv.org/abs/2601.16615v1
Date: Fri, 23 Jan 2026 10:14:54 GMT
Status: Translation complete
Last updated in system: 2026-01-26 14:27:27.627874
Title: AuroraEdge-V-2B: A Faster And Stronger Edge Visual Large Language Model
Title (reference translation): AuroraEdge-V-2B: A faster and more powerful edge visual large language model
Authors: Xiang Chen
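Abstract summary: This paper introduces AuroraEdge-V-2B, a compact, robust, and fast visual large language model for edge deployment. It offers better real-time performance and significantly reduces the number of visual tokens in the decoding process. It scores higher on 9 benchmarks than models with the same number of parameters.
Reference score (attention metric computed independently by the site): 8.049753893207559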
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently, owing to advances in multimodal technology, people have been attempting to use visual large language models (VLLMs) in industrial production. Many deep learning models (DLMs) deployed in production environments are gradually being replaced by VLLMs. Compared with DLMs, VLLMs have some advantages in industrial applications: (1) Their strong generalization ability enables them to perform well across a wide range of tasks. (2) They are flexible and can quickly handle unfamiliar samples through in-context learning. However, VLLMs also have obvious drawbacks: (1) VLLMs do not perform as well as custom-developed DLMs in specific domains. (2) The number of parameters in VLLMs is generally quite large, and their deployment requires substantial computational resources. (3) VLLMs generally run much slower than DLMs, making real-time response challenging to achieve. To better utilize VLLMs in industrial applications, we introduce AuroraEdge-V-2B in this work, a compact, robust, and high-speed VLLM designed for edge deployment. To make the model run faster, we also propose a compression-fusion method to improve inference efficiency. AuroraEdge-V-2B has the following notable features: (1) Easy to deploy and fast: It has only 2B parameters and is highly suitable for edge deployment, offering better real-time performance. (2) Fewer visual tokens and lower cost: It significantly reduces the number of visual tokens in the decoding process, thereby halving the floating-point operations during inference and making it cheaper to use. (3) Strong performance: It scores higher on 9 benchmarks than models with the same number of parameters (e.g., Qwen2-VL-2B, Qwen2.5-VL-3B, InternVL-2.5-2B).
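Abstract (reference translation): In recent years, advances in multimodal technology have led to attempts to use visual large language models (VLLMs) in industrial production. Many deep learning models (DLMs) deployed in production environments are gradually being replaced by VLLMs. Compared with DLMs, VLLMs have several advantages in industrial applications: (1) Their strong generalization ability lets them perform well across a wide range of tasks. (2) They are flexible and can quickly handle unfamiliar samples through in-context learning. However, VLLMs also have clear drawbacks: (1) VLLMs do not perform as well as custom-developed DLMs in specific domains. (2) The number of parameters in VLLMs is generally very large, and deploying them requires substantial computational resources. (3) VLLMs run slower than DLMs, which makes real-time response difficult. To make better use of VLLMs in industrial applications, we introduce AuroraEdge-V-2B, a compact, robust, and fast VLLM designed for edge deployment. To make the model run faster, we also propose a compression-fusion method that improves inference efficiency. AuroraEdge-V-2B has the following notable features: (1) Easy and fast deployment: With only 2B parameters, it is well suited to edge deployment and offers better real-time performance. (2) Fewer visual tokens and lower cost: It significantly reduces the number of visual tokens in the decoding process, halving the floating-point operations during inference and lowering the cost of use. (3) Strong performance: It scores higher on 9 benchmarks than models with the same number of parameters (e.g., Qwen2-VL-2B, Qwen2.5-VL-3B, InternVL-2.5-2B).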
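
Note: The abstract links fewer visual tokens to roughly half the floating-point operations at inference. The sketch below is a back-of-the-envelope illustration of that relationship, not the paper's compression-fusion method. It assumes the common rule of thumb that a decoder forward pass costs about 2 FLOPs per parameter per token and ignores the attention term that grows with sequence length. The token counts and the function name `forward_flops` are hypothetical values chosen for illustration; only the 2B parameter count comes from the abstract.

```python
def forward_flops(n_params: float, n_visual_tokens: int, n_text_tokens: int) -> float:
    """Rough forward-pass cost: about 2 FLOPs per parameter per input token.

    Assumption: ignores attention's sequence-length-dependent term,
    so this is only a first-order estimate.
    """
    return 2.0 * n_params * (n_visual_tokens + n_text_tokens)


N_PARAMS = 2e9            # 2B parameters, as stated in the abstract
TEXT_TOKENS = 64          # hypothetical prompt length
BASELINE_VISUAL = 1024    # hypothetical visual token count before compression
COMPRESSED_VISUAL = 480   # hypothetical visual token count after compression

baseline = forward_flops(N_PARAMS, BASELINE_VISUAL, TEXT_TOKENS)
compressed = forward_flops(N_PARAMS, COMPRESSED_VISUAL, TEXT_TOKENS)
ratio = compressed / baseline

print(f"baseline:   {baseline:.3e} FLOPs")
print(f"compressed: {compressed:.3e} FLOPs")
print(f"ratio:      {ratio:.2f}")
```

Under these assumed numbers the visual tokens dominate the input, so cutting them from 1024 to 480 brings the total from 1088 to 544 tokens and the estimated cost to half. With a longer text prompt, the same visual reduction would save proportionally less.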

Related papers