Fugu-MT Paper Translation (Summary): Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning

Paper summary: Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning

arxiv url: http://arxiv.org/abs/2512.20934v1
Date: Wed, 24 Dec 2025 04:30:21 GMT
Status: Translation complete
System update date: 2025-12-25 19:43:21.676355
Title: Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning
Title (reference translation): Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning
Authors: Shengguang Wu, Xiaohan Wang, Yuhui Zhang, Hao Zhu, Serena Yeung-Levy
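Abstract summary: Transductive Visual Programming (TVP) is a novel framework that builds new tools from its own experience rather than from speculation. On Omni3D-Bench, TVP achieves state-of-the-art performance, outperforming GPT-4o by 22% and the previous best visual programming system by 11%. The work establishes experience-driven transductive tool creation as a powerful paradigm for building self-evolving visual programming agents.
Reference score (independently computed attention score): 63.071280297939005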
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Spatial reasoning in 3D scenes requires precise geometric calculations that challenge vision-language models. Visual programming addresses this by decomposing problems into steps calling specialized tools, yet existing methods rely on either fixed toolsets or speculative tool induction before solving problems, resulting in suboptimal programs and poor utilization of induced tools. We present Transductive Visual Programming (TVP), a novel framework that builds new tools from its own experience rather than speculation. TVP first solves problems using basic tools while accumulating experiential solutions into an Example Library, then abstracts recurring patterns from these programs into reusable higher-level tools for an evolving Tool Library. This allows TVP to tackle new problems with increasingly powerful tools learned from experience. On Omni3D-Bench, TVP achieves state-of-the-art performance, outperforming GPT-4o by 22% and the previous best visual programming system by 11%. Our transductively learned tools are used 5x more frequently as core program dependency than inductively created ones, demonstrating more effective tool discovery and reuse. The evolved tools also show strong generalization to unseen spatial tasks, achieving superior performance on benchmarks from SpatialScore-Hard collection without any testset-specific modification. Our work establishes experience-driven transductive tool creation as a powerful paradigm for building self-evolving visual programming agents that effectively tackle challenging spatial reasoning tasks. We release our code at https://transductive-visualprogram.github.io/.
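Abstract (reference translation): Spatial reasoning in 3D scenes demands precise geometric calculations that are difficult for vision-language models. Visual programming addresses this by breaking a problem into steps that call specialized tools, but existing methods depend either on a fixed toolset or on speculative tool induction performed before any problem is solved, which leads to suboptimal programs and poor utilization of the induced tools. Transductive Visual Programming (TVP) is a novel framework that builds new tools from its own experience rather than from speculation. TVP first solves problems with basic tools while accumulating the resulting solutions in an Example Library, then abstracts recurring patterns from those programs into reusable higher-level tools for an evolving Tool Library. This lets TVP tackle new problems with increasingly powerful tools learned from experience. On Omni3D-Bench, TVP achieves state-of-the-art performance, outperforming GPT-4o by 22% and the previous best visual programming system by 11%. The transductively learned tools are used 5x more frequently as core program dependencies than inductively created ones, demonstrating more effective tool discovery and reuse. The evolved tools also generalize well to unseen spatial tasks, achieving superior performance on benchmarks from the SpatialScore-Hard collection without any test-set-specific modification. The work establishes experience-driven transductive tool creation as a powerful paradigm for building self-evolving visual programming agents that effectively tackle challenging spatial reasoning tasks. The code is released at https://transductive-visualprogram.github.io/.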
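The two-stage loop described in the abstract (store solved programs in an Example Library, then promote recurring patterns into a Tool Library) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how recurring patterns are detected, so the sketch substitutes a simple frequency count over adjacent step pairs. All names here (`Libraries`, `record_solution`, `abstract_tools`, and the step names) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Libraries:
    # Example Library: (question, program steps) pairs from solved problems.
    examples: list = field(default_factory=list)
    # Tool Library: tool name -> the step sequence it packages.
    tools: dict = field(default_factory=dict)


def record_solution(libs, question, program_steps):
    """Store a solved program so later abstraction can draw on it."""
    libs.examples.append((question, tuple(program_steps)))


def abstract_tools(libs, min_count=2, length=2):
    """Promote step sequences that recur across stored programs into tools.

    Assumption: "recurring pattern" is approximated as a contiguous run of
    `length` steps that appears in at least `min_count` stored programs.
    """
    counts = Counter()
    for _, steps in libs.examples:
        windows = {steps[i:i + length] for i in range(len(steps) - length + 1)}
        counts.update(windows)  # count each pattern at most once per program
    for pattern, n in counts.items():
        if n >= min_count:
            libs.tools["tool_" + "_then_".join(pattern)] = pattern
    return libs.tools


libs = Libraries()
record_solution(libs, "q1", ["detect", "depth", "compare"])
record_solution(libs, "q2", ["detect", "depth", "ratio"])
tools = abstract_tools(libs)
```

In this toy run only the `detect` then `depth` pair occurs in both stored programs, so it is the only sequence promoted to a tool; later programs could call it as a single step.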

Related papers