Fugu-MT Paper Translation (Abstract): SADGE: Structure and Appearance Domain Gap Estimation of Synthetic and Real Data

Paper summary: SADGE: Structure and Appearance Domain Gap Estimation of Synthetic and Real Data

arxiv url: http://arxiv.org/abs/2605.22467v1
Date: Thu, 21 May 2026 13:27:15 GMT
Status: Translation complete
Last updated in system: 2026-05-22 20:14:18.572641
Title: SADGE: Structure and Appearance Domain Gap Estimation of Synthetic and Real Data
Authors: Patryk Bartkowiak, Bartosz Kotrys, Dominik Michels, Soren Pirk, Wojtek Palubicki
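Abstract summary: We propose SADGE, a quantitative similarity metric that predicts the performance of synthetic image datasets for common computer vision tasks. Appearance and geometric similarity computed between synthetic and real images correlate with downstream performance in object detection, semantic segmentation, and pose estimation. We compute SADGE scores across all benchmark families for each combination of geometry-based methods and appearance-based approaches.
Reference score (attention score computed independently by this site): 2.2798328260958063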
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose SADGE, a quantitative similarity metric that predicts the performance of synthetic image datasets for common computer vision tasks without downstream model training. Estimating whether a synthetic dataset will lead to a model that performs well on real-world data remains a bottleneck in model development. Existing evaluation metrics (e.g., PSNR, FID, CLIP) primarily measure semantic alignment between real and synthetic images (Appearance Similarity Score). Less commonly, structural similarity between images is considered to assess the domain gap (Geometric Similarity Score). However, to the best of our knowledge, there exist no studies that evaluate which similarity metric is the best downstream predictor for a given synthetic dataset. In this paper, we show over a wide variety of synthetic datasets and downstream tasks that neither appearance nor geometry alone can reliably predict downstream performance; rather, it is their non-linear interplay that dictates synthetic data utility. Specifically, we measure how commonly used Appearance and Geometric Similarity metrics computed between synthetic and real images correlate with downstream performance in object detection, semantic segmentation, and pose estimation. Across five public synthetic-to-real benchmark families and 15 dataset-level variants (79k image pairs), SADGE achieves the strongest association with downstream transfer performance under both linear and rank-based criteria, reaching Pearson r=0.88 and Spearman rho=0.77. We compute SADGE scores across all benchmark families for each combination of geometry-based methods and appearance-based approaches. The best configuration is obtained by fusing DINOv3 appearance similarity with MASt3R geometric consistency through a constrained bilinear interaction, outperforming both the strongest geometry-only baseline and the strongest appearance-only baseline.
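The abstract evaluates a fused appearance/geometry score by its Pearson and Spearman correlation with downstream performance. The sketch below illustrates that evaluation loop in plain Python. It is not the authors' implementation: the abstract does not give the form of the constrained bilinear interaction, so `bilinear_fusion`, its weights, and the non-negativity constraint are assumptions made for illustration, and all numbers are toy values, not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def bilinear_fusion(appearance, geometry, w_a, w_g, w_ag):
    """Hypothetical fusion: w_a*a + w_g*g + w_ag*a*g.

    The non-negative-weight constraint is an assumption, not the paper's.
    """
    if min(w_a, w_g, w_ag) < 0:
        raise ValueError("weights must be non-negative (assumed constraint)")
    return w_a * appearance + w_g * geometry + w_ag * appearance * geometry


# Toy per-dataset scores (illustrative only, not taken from the paper).
appearance_sim = [0.62, 0.71, 0.55, 0.80, 0.68]
geometry_sim = [0.40, 0.65, 0.35, 0.75, 0.50]
downstream = [0.31, 0.52, 0.25, 0.66, 0.41]  # e.g. a task metric per dataset

fused = [
    bilinear_fusion(a, g, w_a=0.2, w_g=0.2, w_ag=0.6)
    for a, g in zip(appearance_sim, geometry_sim)
]
print("Pearson r:", pearson(fused, downstream))
print("Spearman rho:", spearman(fused, downstream))
```

The product term `w_ag * a * g` is what makes the fusion non-linear: a dataset scores highly only when appearance and geometry are both close to the real data, which matches the abstract's claim that neither factor alone is a reliable predictor.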


Related papers