Fugu-MT Paper Translation (Abstract): A ROS 2 Wrapper for Florence-2: Multi-Mode Local Vision-Language Inference for Robotic Systems

Paper summary: A ROS 2 Wrapper for Florence-2: Multi-Mode Local Vision-Language Inference for Robotic Systems

arxiv url: http://arxiv.org/abs/2604.01179v1
Date: Wed, 01 Apr 2026 17:29:59 GMT
Status: Translation complete
System update date: 2026-04-02 16:44:32.121433
Title: A ROS 2 Wrapper for Florence-2: Multi-Mode Local Vision-Language Inference for Robotic Systems
Title (reference translation): A ROS 2 Wrapper for Florence-2: Multi-Mode Local Vision-Language Inference for Robotic Systems
Authors: J. E. Domínguez-Vidal
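Abstract summary: This paper describes a ROS 2 wrapper for Florence-2 that exposes the model through three complementary interaction modes. The wrapper is designed for local execution and supports both native installation and Docker container deployment.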
Reference score (independently computed attention level): 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Foundation vision-language models are becoming increasingly relevant to robotics because they can provide richer semantic perception than narrow task-specific pipelines. However, their practical adoption in robot software stacks still depends on reproducible middleware integrations rather than on model quality alone. Florence-2 is especially attractive in this regard because it unifies captioning, optical character recognition, open-vocabulary detection, grounding and related vision-language tasks within a comparatively manageable model size. This article presents a ROS 2 wrapper for Florence-2 that exposes the model through three complementary interaction modes: continuous topic-driven processing, synchronous service calls and asynchronous actions. The wrapper is designed for local execution and supports both native installation and Docker container deployment. It also combines generic JSON outputs with standard ROS 2 message bindings for detection-oriented tasks. A functional validation is reported together with a throughput study on several GPUs, showing that local deployment is feasible with consumer grade hardware. The repository is publicly available here: https://github.com/JEDominguezVidal/florence2_ros2_wrapper
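Abstract (reference translation): Foundation vision-language models are becoming increasingly relevant to robotics because they can provide richer semantic perception than narrow task-specific pipelines. However, their practical adoption in robot software stacks depends on reproducible middleware integrations rather than on model quality alone. Florence-2 is especially attractive in this regard because it unifies captioning, optical character recognition, open-vocabulary detection, grounding and related vision-language tasks within a comparatively manageable model size. This article presents a ROS 2 wrapper for Florence-2 that exposes the model through three complementary interaction modes: continuous topic-driven processing, synchronous service calls and asynchronous actions. The wrapper is designed for local execution and supports both native installation and Docker container deployment. It also combines generic JSON outputs with standard ROS 2 message bindings for detection-oriented tasks. A functional validation is reported together with a throughput study on several GPUs, showing that local deployment is feasible on consumer-grade hardware. Repository: https://github.com/JEDominguezVidal/florence2_ros2_wrapper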
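
The abstract mentions combining generic JSON outputs with standard ROS 2 message bindings for detection-oriented tasks. The sketch below illustrates one way such a conversion could look, in plain Python without a ROS 2 dependency. It is not taken from the wrapper's repository. Two assumptions are made: that the JSON result stores boxes as `[x1, y1, x2, y2]` corners next to a `labels` list under a task key such as `<OD>` (the layout Florence-2's object detection post-processing produces), and that the target message uses a center-plus-size box. `Detection2D` and `detections_from_json` are hypothetical names introduced only for illustration.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Detection2D:
    """Hypothetical stand-in for a ROS 2 detection message (not the wrapper's actual type)."""
    label: str
    center_x: float
    center_y: float
    size_x: float
    size_y: float


def detections_from_json(payload: str, task: str = "<OD>") -> List[Detection2D]:
    """Convert a JSON result with corner-format boxes into center/size detections.

    Assumes the layout {task: {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]}}.
    Missing keys yield an empty list instead of raising.
    """
    result = json.loads(payload).get(task, {})
    detections = []
    for (x1, y1, x2, y2), label in zip(result.get("bboxes", []), result.get("labels", [])):
        detections.append(Detection2D(
            label=label,
            center_x=(x1 + x2) / 2.0,
            center_y=(y1 + y2) / 2.0,
            size_x=x2 - x1,
            size_y=y2 - y1,
        ))
    return detections


# Example payload in the assumed layout: one box labeled "cup".
example = json.dumps({"<OD>": {"bboxes": [[10.0, 20.0, 110.0, 220.0]], "labels": ["cup"]}})
for det in detections_from_json(example):
    print(det)
```

Keeping the JSON path generic while converting only detection-style results into typed messages would let captioning or OCR outputs pass through unchanged, which matches the split the abstract describes.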

Related papers