Fugu-MT Paper Translation (Abstract): From Vicious to Virtuous Cycles: Synergistic Representation Learning for Unsupervised Video Object-Centric Learning

Paper summary: From Vicious to Virtuous Cycles: Synergistic Representation Learning for Unsupervised Video Object-Centric Learning

arxiv url: http://arxiv.org/abs/2602.03390v1
Date: Tue, 03 Feb 2026 11:11:58 GMT
Status: Translation complete
Last updated in system: 2026-02-04 18:37:15.408735
Title: From Vicious to Virtuous Cycles: Synergistic Representation Learning for Unsupervised Video Object-Centric Learning
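Title (reference translation): From Vicious to Virtuous Cycles: Synergistic Representation Learning for Unsupervised Video Object-Centric Learning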
Authors: Hyun Seok Seong, WonJun Moon, Jae-Pil Heo
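Abstract summary: We introduce a virtuous cycle in which the encoder and decoder mutually refine one another. By bridging the representational gap between the encoder and decoder, SRL achieves state-of-the-art results on video object-centric learning benchmarks.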
Reference score (independently computed attention score): 45.1920794546889
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Unsupervised object-centric learning models, particularly slot-based architectures, have shown great promise in decomposing complex scenes. However, their reliance on reconstruction-based training creates a fundamental conflict between the sharp, high-frequency attention maps of the encoder and the spatially consistent but blurry reconstruction maps of the decoder. We identify that this discrepancy gives rise to a vicious cycle: the noisy feature map from the encoder forces the decoder to average over possibilities and produce even blurrier outputs, while the gradient computed from blurry reconstruction maps lacks high-frequency details necessary to supervise encoder features. To break this cycle, we introduce Synergistic Representation Learning (SRL) that establishes a virtuous cycle where the encoder and decoder mutually refine one another. SRL leverages the encoder's sharpness to deblur the semantic boundary within the decoder output, while exploiting the decoder's spatial consistency to denoise the encoder's features. This mutual refinement process is stabilized by a warm-up phase with a slot regularization objective that initially allocates distinct entities per slot. By bridging the representational gap between the encoder and decoder, SRL achieves state-of-the-art results on video object-centric learning benchmarks. Code is available at https://github.com/hynnsk/SRL.
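Abstract (reference translation): Unsupervised object-centric learning models, especially slot-based architectures, have shown great promise for decomposing complex scenes. However, their reliance on reconstruction-based training creates a fundamental conflict between the encoder's sharp, high-frequency attention maps and the decoder's spatially consistent but blurry reconstruction maps. This discrepancy gives rise to a vicious cycle: the noisy feature map from the encoder forces the decoder to average over possibilities and produce even blurrier outputs, while the gradient computed from the blurry reconstruction maps lacks the high-frequency detail needed to supervise the encoder's features. To break this cycle, we introduce Synergistic Representation Learning (SRL), which establishes a virtuous cycle in which the encoder and decoder refine one another. SRL uses the encoder's sharpness to deblur semantic boundaries in the decoder output, and uses the decoder's spatial consistency to denoise the encoder's features. This mutual refinement is stabilized by a warm-up phase with a slot regularization objective that initially assigns a distinct entity to each slot. By bridging the representational gap between the encoder and decoder, SRL achieves state-of-the-art results on video object-centric learning benchmarks. Code is available at https://github.com/hynnsk/SRL.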


List of related papers