Fugu-MT Paper Translation (Abstract): Trace Anything: Representing Any Video in 4D via Trajectory Fields

Paper abstract: Trace Anything: Representing Any Video in 4D via Trajectory Fields

arxiv url: http://arxiv.org/abs/2510.13802v1
Date: Wed, 15 Oct 2025 17:59:04 GMT
Status: Translation complete
Last updated in system: 2025-10-16 20:13:28.804652
Title: Trace Anything: Representing Any Video in 4D via Trajectory Fields
Title (reference translation): "Trace Anything": Representing Video in 4D with Trajectory Fields
Authors: Xinhang Liu, Yuxi Xiao, Donny Y. Chen, Jiashi Feng, Yu-Wing Tai, Chi-Keung Tang, Bingyi Kang
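Abstract summary: A Trajectory Field is a dense mapping that assigns a continuous 3D trajectory function of time to each pixel in every frame. We introduce Trace Anything, a neural network that predicts the entire trajectory field in a single feed-forward pass. We trained the Trace Anything model on large-scale 4D data, including data from our new platform.
Reference score (independently computed attention score): 98.85848134960172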
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Effective spatio-temporal representation is fundamental to modeling, understanding, and predicting dynamics in videos. The atomic unit of a video, the pixel, traces a continuous 3D trajectory over time, serving as the primitive element of dynamics. Based on this principle, we propose representing any video as a Trajectory Field: a dense mapping that assigns a continuous 3D trajectory function of time to each pixel in every frame. With this representation, we introduce Trace Anything, a neural network that predicts the entire trajectory field in a single feed-forward pass. Specifically, for each pixel in each frame, our model predicts a set of control points that parameterizes a trajectory (i.e., a B-spline), yielding its 3D position at arbitrary query time instants. We trained the Trace Anything model on large-scale 4D data, including data from our new platform, and our experiments demonstrate that: (i) Trace Anything achieves state-of-the-art performance on our new benchmark for trajectory field estimation and performs competitively on established point-tracking benchmarks; (ii) it offers significant efficiency gains thanks to its one-pass paradigm, without requiring iterative optimization or auxiliary estimators; and (iii) it exhibits emergent abilities, including goal-conditioned manipulation, motion forecasting, and spatio-temporal fusion. Project page: https://trace-anything.github.io/.
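Abstract (reference translation): Effective spatio-temporal representation is essential for modeling, understanding, and predicting dynamics in videos. The pixel, the atomic unit of a video, traces a continuous 3D trajectory over time and serves as the primitive element of dynamics. Based on this principle, we propose representing any video as a Trajectory Field: a dense mapping that assigns a continuous 3D trajectory function of time to each pixel in every frame. With this representation, we introduce Trace Anything, a neural network that predicts the entire trajectory field in a single feed-forward pass. Specifically, for each pixel in each frame, the model predicts a set of control points that parameterizes a trajectory (a B-spline), yielding its 3D position at arbitrary query times. We trained the Trace Anything model on large-scale 4D data, including data from our new platform. Our experiments show that: (i) Trace Anything achieves state-of-the-art performance on our new benchmark for trajectory field estimation and performs competitively on established point-tracking benchmarks; (ii) it offers significant efficiency gains thanks to its one-pass paradigm, without requiring iterative optimization or auxiliary estimators; and (iii) it exhibits emergent abilities, including goal-conditioned manipulation, motion forecasting, and spatio-temporal fusion. Project page: https://trace-anything.github.io/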
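The abstract states that each pixel's trajectory is parameterized by a set of control points (a B-spline) that can be queried at arbitrary times. As a rough illustration only, the sketch below evaluates one such 3D trajectory with de Boor's algorithm. The abstract does not specify the spline degree, the knot vector, or the time normalization, so the cubic degree, the clamped uniform knots on [0, 1], and the names `clamped_uniform_knots` and `eval_trajectory` are all assumptions made for this example, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def clamped_uniform_knots(num_ctrl: int, degree: int) -> List[float]:
    """Clamped uniform knot vector on [0, 1] (an assumed choice, not from the paper)."""
    if num_ctrl <= degree:
        raise ValueError("need more control points than the spline degree")
    num_spans = num_ctrl - degree
    interior = [i / num_spans for i in range(1, num_spans)]
    return [0.0] * (degree + 1) + interior + [1.0] * (degree + 1)


def eval_trajectory(ctrl: Sequence[Point3], t: float, degree: int = 3) -> Point3:
    """Evaluate a B-spline trajectory at normalized time t with de Boor's algorithm."""
    n = len(ctrl)
    knots = clamped_uniform_knots(n, degree)
    t = min(max(t, 0.0), 1.0)  # clamp the query time to the assumed [0, 1] range
    # Find the knot span k with knots[k] <= t < knots[k + 1]; t == 1 uses the last span.
    k = degree
    while k < n - 1 and t >= knots[k + 1]:
        k += 1
    # Local copy of the degree + 1 control points that influence this span.
    d = [list(ctrl[j + k - degree]) for j in range(degree + 1)]
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            i = j + k - degree
            alpha = (t - knots[i]) / (knots[i + degree - r + 1] - knots[i])
            d[j] = [(1.0 - alpha) * a + alpha * b for a, b in zip(d[j - 1], d[j])]
    return (d[degree][0], d[degree][1], d[degree][2])


if __name__ == "__main__":
    # Hypothetical control points for a single pixel's trajectory.
    control_points = [
        (0.0, 0.0, 1.0),
        (0.2, 0.1, 1.0),
        (0.5, 0.3, 1.2),
        (0.7, 0.2, 1.5),
        (1.0, 0.0, 1.6),
    ]
    for query_time in (0.0, 0.25, 0.5, 1.0):
        print(query_time, eval_trajectory(control_points, query_time))
```

With clamped knots the curve starts at the first control point and ends at the last one, which makes the endpoints easy to sanity-check. A dense trajectory field would hold one such control-point set per pixel per frame, and any query time then reduces to this evaluation.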


Related papers