Fugu-MT paper translation (abstract): COAL: Counterfactual and Observation-Enhanced Alignment Learning for Discriminative Referring Multi-Object Tracking

Paper overview: COAL: Counterfactual and Observation-Enhanced Alignment Learning for Discriminative Referring Multi-Object Tracking

arxiv url: http://arxiv.org/abs/2605.14795v1
Date: Thu, 14 May 2026 13:06:00 GMT
Status: Translation complete
System update date: 2026-05-15 21:45:34.833501
Title: COAL: Counterfactual and Observation-Enhanced Alignment Learning for Discriminative Referring Multi-Object Tracking
Authors: Shukun Jia, Shiyu Hu, Yipei Wang, Ximeng Cheng, Yichao Cao, Xiaobo Lu
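Abstract summary: COAL (Counterfactual and Observation-enhanced Alignment Learning) is a framework that advances RMOT beyond isolated structural optimization through knowledge regularization. It introduces Explicit Semantic Injection (ESI) via a VLM to densify the observation space and enhance instance discriminability. It also proposes Counterfactual Learning (CFL), which augments supervision by enforcing strict attribute verification for robust compositional recognition.
Reference score (independently computed attention score): 38.34677413728821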
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Referring Multi-Object Tracking (RMOT) faces a fundamental structural contradiction between the high-discriminability demand and the sparse semantic supervision. This mismatch is particularly acute in highly homogeneous scenarios that require fine-grained discrimination over complex compositional semantics. However, under sparse supervision, models overfit to salient yet insufficient cues, thereby encouraging shortcut learning and semantic collapse. To resolve this, we propose COAL (Counterfactual and Observation-enhanced Alignment Learning), a framework that advances RMOT beyond isolated structural optimization through knowledge regularization. First, we introduce Explicit Semantic Injection (ESI) via a VLM to densify the observation space and enhance instance discriminability. Second, leveraging LLM reasoning, we propose Counterfactual Learning (CFL) to augment supervision, enforcing strict attribute verification for robust compositional recognition. These strategies are unified within a Hierarchical Multi-Stream Integration (HMSI) architecture, which distills external knowledge into domain-specific discriminative representations. Experiments on Refer-KITTI and Refer-KITTI-V2 benchmarks validate COAL's efficacy. Notably, it surpasses the state-of-the-art by 7.28% HOTA on the highly challenging Refer-KITTI-V2. These results demonstrate the effectiveness of knowledge regularization for resolving the sparsity-discriminability paradox in RMOT.

Related papers