Fugu-MT paper translation (abstract): Bridging Vision, Language, and Mathematics: Pictographic Character Reconstruction with Bézier Curves

Paper overview: Bridging Vision, Language, and Mathematics: Pictographic Character Reconstruction with Bézier Curves

arxiv url: http://arxiv.org/abs/2511.00076v1
Date: Wed, 29 Oct 2025 15:26:34 GMT
Status: Translation complete
System update date: 2025-11-05 16:37:26.565693
Title: Bridging Vision, Language, and Mathematics: Pictographic Character Reconstruction with Bézier Curves
Authors: Zihao Wan, Pau Tong Lin Xu, Fuwen Luo, Ziyue Wang, Peng Li, Yang Liu
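Abstract summary: Pictographic characters, which combine visual form with symbolic structure, provide an ideal test case for a VLM's ability to interpret geometric structure. The visual recognition challenge is formulated in the mathematical domain, where each character is represented by an executable program of geometric primitives. This is framed as a program synthesis task, training a VLM to decompile raster images into programs composed of Bézier curves.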
Reference score (independently computed attention metric): 10.069779545496266
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While Vision-language Models (VLMs) have demonstrated strong semantic capabilities, their ability to interpret the underlying geometric structure of visual information is less explored. Pictographic characters, which combine visual form with symbolic structure, provide an ideal test case for this capability. We formulate this visual recognition challenge in the mathematical domain, where each character is represented by an executable program of geometric primitives. This is framed as a program synthesis task, training a VLM to decompile raster images into programs composed of Bézier curves. Our model, acting as a "visual decompiler", demonstrates performance superior to strong zero-shot baselines, including GPT-4o. The most significant finding is that when trained solely on modern Chinese characters, the model is able to reconstruct ancient Oracle Bone Script in a zero-shot context. This generalization provides strong evidence that the model acquires an abstract and transferable geometric grammar, moving beyond pixel-level pattern recognition to a more structured form of visual understanding.

Related papers