Fugu-MT Paper Translation (Abstract): Live Interactive Training for Video Segmentation

Paper abstract: Live Interactive Training for Video Segmentation

arxiv url: http://arxiv.org/abs/2603.26929v1
Date: Fri, 27 Mar 2026 19:10:23 GMT
Status: Translation complete
Last updated in system: 2026-03-31 23:18:44.695466
Title: Live Interactive Training for Video Segmentation
Title (reference translation): Live Interactive Training for Video Segmentation
Authors: Xinyu Yang, Haozheng Yu, Yihong Sun, Bharath Hariharan, Jennifer J. Sun
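Abstract summary: This paper introduces Live Interactive Training (LIT), a novel framework for prompt-based visual systems. Its primary instantiation, LIT-LoRA, implements the framework by continually updating a lightweight LoRA module on-the-fly. The LIT-LoRA implementation achieves an average 18-34% reduction in total corrections on challenging video segmentation benchmarks.
Reference score (independently computed attention score): 41.00426438707627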
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Interactive video segmentation often requires many user interventions for robust performance in challenging scenarios (e.g., occlusions, object separations, camouflage, etc.). Yet, even state-of-the-art models like SAM2 use corrections only for immediate fixes without learning from this feedback, leading to inefficient, repetitive user effort. To address this, we introduce Live Interactive Training (LIT), a novel framework for prompt-based visual systems where models also learn online from human corrections at inference time. Our primary instantiation, LIT-LoRA, implements this by continually updating a lightweight LoRA module on-the-fly. When a user provides a correction, this module is rapidly trained on that feedback, allowing the vision system to improve performance on subsequent frames of the same video. Leveraging the core principles of LIT, our LIT-LoRA implementation achieves an average 18-34% reduction in total corrections on challenging video segmentation benchmarks, with a negligible training overhead of ~0.5s per correction. We further demonstrate its generality by successfully adapting it to other segmentation models and extending it to CLIP-based fine-grained image classification. Our work highlights the promise of live adaptation to transform interactive tools and significantly reduce redundant human effort in complex visual tasks. Project: https://youngxinyu1802.github.io/projects/LIT/.
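Abstract (reference translation): Interactive video segmentation often needs many user interventions to perform robustly in challenging scenarios (e.g., occlusions, object separations, and camouflage). Yet even state-of-the-art models such as SAM2 use corrections only for immediate fixes, without learning from this feedback. To address this, we introduce Live Interactive Training (LIT), a novel framework for prompt-based visual systems in which models learn online from human corrections at inference time. Our primary instantiation, LIT-LoRA, implements this by continually updating a lightweight LoRA module on-the-fly. When a user provides a correction, this module is rapidly trained on that feedback. Leveraging the core principles of LIT, our LIT-LoRA implementation reduces total corrections on challenging video segmentation benchmarks by 18-34% on average, with a negligible training overhead of about 0.5 s per correction. We further demonstrate its generality by adapting it to other segmentation models and extending it to CLIP-based fine-grained image classification. Our work highlights the promise of live adaptation to transform interactive tools and significantly reduce redundant human effort in complex visual tasks. Project: https://youngxinyu1802.github.io/projects/LIT/.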
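
The abstract describes training a lightweight LoRA module on each user correction while the base model stays fixed. The sketch below illustrates only that general idea on a toy problem and is not the authors' implementation: the class LoRALinear, the scalar linear scorer, the squared-error loss, the learning rate, the step count, and all numeric values are assumptions made for illustration. The down-projection is given a fixed starting value here so the run is reproducible; LoRA layers usually start it from random values.

```python
"""Toy sketch: adapt a frozen linear scorer with a LoRA-style low-rank update."""


class LoRALinear:
    """Scalar-output layer y = w.x + b.(A x); only A and b are trained."""

    def __init__(self, frozen_w, lora_a):
        self.w = list(frozen_w)                 # frozen base weights, never updated
        self.a = [list(row) for row in lora_a]  # trainable down-projection (r x d)
        self.b = [0.0] * len(lora_a)            # trainable up-projection, zero-initialised

    def _down(self, x):
        # Project the input into the rank-r space: u = A x.
        return [sum(a_kj * x_j for a_kj, x_j in zip(row, x)) for row in self.a]

    def predict(self, x):
        base = sum(w_j * x_j for w_j, x_j in zip(self.w, x))
        u = self._down(x)
        return base + sum(b_k * u_k for b_k, u_k in zip(self.b, u))

    def train_on_correction(self, x, target, lr=0.5, steps=10):
        """Run a few gradient steps on 0.5 * (predict(x) - target) ** 2."""
        for _ in range(steps):
            u = self._down(x)
            err = self.predict(x) - target
            # Both gradients are computed from the current parameters
            # before either parameter set is updated.
            grad_b = [err * u_k for u_k in u]
            grad_a = [[err * b_k * x_j for x_j in x] for b_k in self.b]
            self.b = [b_k - lr * g for b_k, g in zip(self.b, grad_b)]
            self.a = [[a_kj - lr * g for a_kj, g in zip(row, g_row)]
                      for row, g_row in zip(self.a, grad_a)]
        return self.predict(x) - target


# Hypothetical values standing in for a frame feature and a user correction.
layer = LoRALinear(frozen_w=[1.0, -0.5], lora_a=[[0.5, 0.5]])
frame_feature = [0.6, 0.8]
user_label = 1.0

error_before = layer.predict(frame_feature) - user_label
error_after = layer.train_on_correction(frame_feature, user_label)
print("error before:", error_before)
print("error after: ", error_after)
```

Because the up-projection starts at zero, the adapted layer initially reproduces the frozen model exactly, and only the small low-rank factors change when a correction arrives. That is the property that keeps a per-correction update cheap in this kind of scheme.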
