Fugu-MT Paper Translation (Abstract): Match4Annotate: Propagating Sparse Video Annotations via Implicit Neural Feature Matching

Paper abstract: Match4Annotate: Propagating Sparse Video Annotations via Implicit Neural Feature Matching

arxiv url: http://arxiv.org/abs/2603.06471v1
Date: Fri, 06 Mar 2026 16:56:46 GMT
Status: Translation complete
System update date: 2026-03-09 13:17:46.281739
Title: Match4Annotate: Propagating Sparse Video Annotations via Implicit Neural Feature Matching
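Title (reference translation): Match4Annotate: Propagating Sparse Video Annotations via Implicit Neural Feature Matching
Authors: Zhuorui Zhang, Roger Pallarès-López, Praneeth Namburi, Brian W. Anthony
Abstract summary: Match4Annotate is a framework for propagating point and mask annotations both within a video and across videos. The method fits a SIREN-based implicit representation to DINOv3 features at test time, producing a continuous, high-resolution spatiotemporal feature field. It is evaluated on three clinical ultrasound datasets.
Reference score (independently computed attention score): 0.5459797813771498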
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Acquiring per-frame video annotations remains a primary bottleneck for deploying computer vision in specialized domains such as medical imaging, where expert labeling is slow and costly. Label propagation offers a natural solution, yet existing approaches face fundamental limitations. Video trackers and segmentation models can propagate labels within a single sequence but require per-video initialization and cannot generalize across videos. Classic correspondence pipelines operate on detector-chosen keypoints and struggle in low-texture scenes, while dense feature matching and one-shot segmentation methods enable cross-video propagation but lack spatiotemporal smoothness and unified support for both point and mask annotations. We present Match4Annotate, a lightweight framework for both intra-video and inter-video propagation of point and mask annotations. Our method fits a SIREN-based implicit neural representation to DINOv3 features at test time, producing a continuous, high-resolution spatiotemporal feature field, and learns a smooth implicit deformation field between frame pairs to guide correspondence matching. We evaluate on three challenging clinical ultrasound datasets. Match4Annotate achieves state-of-the-art inter-video propagation, outperforming feature matching and one-shot segmentation baselines, while remaining competitive with specialized trackers for intra-video propagation. Our results show that lightweight, test-time-optimized feature matching pipelines have the potential to offer an efficient and accessible solution for scalable annotation workflows.
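Abstract (reference translation): Obtaining per-frame video annotations remains a major bottleneck for deploying computer vision in specialized domains such as medical imaging, where expert labeling is slow and costly. Label propagation is a natural solution, but existing approaches face fundamental limitations. Video trackers and segmentation models can propagate labels within a single sequence, but they require per-video initialization and cannot generalize across videos. Classic correspondence pipelines operate on detector-chosen keypoints and struggle in low-texture scenes, while dense feature matching and one-shot segmentation methods enable cross-video propagation but lack spatiotemporal smoothness and unified support for both point and mask annotations. Match4Annotate is a lightweight framework for propagating point and mask annotations both within a video and across videos. The method fits a SIREN-based implicit neural representation to DINOv3 features at test time, producing a continuous, high-resolution spatiotemporal feature field, and learns a smooth implicit deformation field between frame pairs to guide correspondence matching. It is evaluated on three challenging clinical ultrasound datasets. Match4Annotate achieves state-of-the-art inter-video propagation, outperforming feature matching and one-shot segmentation baselines, while remaining competitive with specialized trackers for intra-video propagation. The results suggest that lightweight, test-time-optimized feature matching pipelines have the potential to offer an efficient and accessible solution for scalable annotation workflows.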
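
The abstract describes two ingredients that a small example can make concrete: a sine-activated (SIREN-style) layer that maps coordinates to features, and propagation of a point annotation by matching features between frames. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function names `siren_layer`, `cosine`, and `propagate_point` are hypothetical, the features are hand-made toy vectors rather than DINOv3 features, the matching is plain cosine-similarity nearest neighbour, and the test-time fitting and the learned deformation field mentioned in the abstract are omitted entirely.

```python
import math

def siren_layer(x, weights, bias, w0=30.0):
    # One sine-activated layer, sin(w0 * (W x + b)). The default w0 = 30 is the
    # value suggested in the SIREN paper, not a value stated in this abstract.
    return [
        math.sin(w0 * (sum(w * xi for w, xi in zip(row, x)) + b))
        for row, b in zip(weights, bias)
    ]

def cosine(a, b):
    # Cosine similarity between two feature vectors (small epsilon avoids 0/0).
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb + 1e-12)

def propagate_point(src_feats, tgt_feats, src_xy):
    # Return the (x, y) cell of the target grid whose feature is most similar
    # to the source feature at src_xy. Grids are indexed as feats[y][x].
    sx, sy = src_xy
    query = src_feats[sy][sx]
    best_score, best_xy = -2.0, None
    for y, row in enumerate(tgt_feats):
        for x, feat in enumerate(row):
            score = cosine(query, feat)
            if score > best_score:
                best_score, best_xy = score, (x, y)
    return best_xy

# Toy data (hypothetical): hand-made 3-D features on a 4x4 grid.
# The target frame shows the same content shifted one pixel to the right.
src = [[[x + 1.0, y + 1.0, 1.0] for x in range(4)] for y in range(4)]
tgt = [[[float(x), y + 1.0, 1.0] for x in range(4)] for y in range(4)]

print(propagate_point(src, tgt, (1, 2)))  # (2, 2)
```

In a pipeline of the kind the abstract outlines, the feature grids would presumably be sampled from the fitted implicit field instead of being hand-made, and the raw nearest-neighbour match would be regularized by the deformation field; both steps are beyond this sketch.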

Related papers