Fugu-MT Paper Translation (Abstract): scShapeBench: Discovering geometry from high dimensional scRNAseq data

Paper overview: scShapeBench: Discovering geometry from high dimensional scRNAseq data

arxiv url: http://arxiv.org/abs/2605.12662v1
Date: Tue, 12 May 2026 19:10:38 GMT
Status: Translation complete
Last updated in system: 2026-05-14 23:30:27.632165
Title: scShapeBench: Discovering geometry from high dimensional scRNAseq data
Title (reference translation): scShapeBench: Discovering geometry from high-dimensional scRNAseq data
Authors: Andrew J Steindl, João Felipe Rocha, Brian Tshilengi Di Bassinga, Zachary Warren, Matthew Scicluna, César Miguel Valdez Córdova, Shabarni Gupta, Leire Torices, Daniel Neumann, Timothy J. Mann, Ihuan Gunawan, Dhananjay Bhaskar, John G Lock, Christine L Chaffer, Guy Wolf, Smita Krishnaswamy
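Abstract summary: We introduce scShapeBench, a benchmark dataset for shape detection in single-cell datasets. Synthetic datasets are sampled from ground-truth skeleton graphs with controlled variance. Real single-cell datasets are curated from diverse sources and annotated by experts.
Reference score (attention score computed independently by this site): 8.845081957844453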
License: http://creativecommons.org/licenses/by/4.0/
Abstract: High-dimensional point cloud data arise across many scientific domains, especially single-cell biology. The shapes or topologies of these datasets determine the types of information that can be extracted. For example, clustered data supports cell-type identification, trajectory structures support transition analysis, and archetypal structures capture continua of cellular behaviors. Existing analysis pipelines often assume a specific shape. The standard Seurat pipeline combines UMAP visualization with Louvain clustering and therefore assumes clustered data, while tools such as Monocle and SPADE assume tree-like structures, and flow-based models such as MIOFlow and Conditional Flow Matching target trajectories. Choosing which pipeline to apply is therefore often left to bioinformaticians who visually inspect datasets before selecting an analysis strategy. With the rise of agentic AI scientists, automating shape detection is increasingly important for selecting downstream analysis pipelines. To address this problem, we introduce scShapeBench, a benchmark dataset for shape detection containing both synthetic and expert-annotated single-cell datasets. Synthetic datasets are sampled from ground-truth skeleton graphs with controlled variance. Real single-cell datasets are curated from diverse sources and annotated by experts into four categories: clusters, single trajectory, multi-branching, and archetypal. We additionally introduce scReebTower, a baseline method that uses diffusion geometry to extract Reeb graphs and connect visualization with pipeline selection. We provide topology-aware evaluation metrics and compare scReebTower against PAGA and Mapper on synthetic and real data. Our results indicate that scReebTower outperforms existing baselines. Overall, our contributions span benchmarks, evaluation metrics, and a baseline for automated shape detection in single-cell data.
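Abstract (reference translation): High-dimensional point cloud data arise in many scientific domains, particularly single-cell biology. The shape or topology of such a dataset determines the type of information that can be extracted from it. For example, clustered data supports cell-type identification, trajectory structures support transition analysis, and archetypal structures capture continua of cellular behavior. Existing analysis pipelines often assume a specific shape. The standard Seurat pipeline combines UMAP visualization with Louvain clustering and therefore assumes clustered data, tools such as Monocle and SPADE assume tree-like structures, and flow-based models such as MIOFlow and Conditional Flow Matching target trajectories. The choice of pipeline is therefore often left to bioinformaticians, who visually inspect a dataset before selecting an analysis strategy. With the rise of agentic AI scientists, automating shape detection is becoming increasingly important for selecting downstream analysis pipelines. To address this problem, we introduce scShapeBench, a benchmark dataset for shape detection that contains both synthetic and expert-annotated single-cell datasets. Synthetic datasets are sampled from ground-truth skeleton graphs with controlled variance. Real single-cell datasets are curated from diverse sources and annotated by experts into four categories: clusters, single trajectory, multi-branching, and archetypal. We also introduce scReebTower, a baseline method that uses diffusion geometry to extract Reeb graphs and connects visualization with pipeline selection. We provide topology-aware evaluation metrics and compare scReebTower against PAGA and Mapper on synthetic and real data. Our results indicate that scReebTower outperforms the existing baselines. Overall, our contributions span benchmarks, evaluation metrics, and a baseline for automated shape detection in single-cell data.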
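
The abstract states that synthetic datasets are sampled from ground-truth skeleton graphs with controlled variance, but it does not give the procedure. The following is a minimal Python sketch of one way such sampling could work, not the authors' generator. It assumes points are drawn uniformly along edges (edges chosen in proportion to their length) and perturbed with isotropic Gaussian noise of standard deviation `sigma`. The function `sample_from_skeleton` and the Y-shaped example graph are hypothetical names introduced only for illustration.

```python
import math
import random


def sample_from_skeleton(nodes, edges, n_points, sigma, seed=0):
    """Sample noisy points along the edges of a skeleton graph.

    nodes:    dict mapping node id -> coordinate tuple (all the same dimension)
    edges:    list of (u, v) node-id pairs
    sigma:    standard deviation of the isotropic Gaussian noise
              (an assumed reading of "controlled variance")
    Returns a list of (point, edge_index) pairs.
    """
    rng = random.Random(seed)
    # Edge lengths are used as sampling weights so density is uniform along the skeleton.
    lengths = [math.dist(nodes[u], nodes[v]) for u, v in edges]
    samples = []
    for _ in range(n_points):
        idx = rng.choices(range(len(edges)), weights=lengths, k=1)[0]
        u, v = edges[idx]
        t = rng.random()  # position along the chosen edge, in [0, 1)
        point = tuple(
            (1 - t) * a + t * b + rng.gauss(0.0, sigma)
            for a, b in zip(nodes[u], nodes[v])
        )
        samples.append((point, idx))
    return samples


# Hypothetical Y-shaped skeleton: one trunk that forks into two branches.
nodes = {
    "root": (0.0, 0.0, 0.0),
    "fork": (1.0, 0.0, 0.0),
    "tip_a": (2.0, 1.0, 0.0),
    "tip_b": (2.0, -1.0, 0.0),
}
edges = [("root", "fork"), ("fork", "tip_a"), ("fork", "tip_b")]
cloud = sample_from_skeleton(nodes, edges, n_points=500, sigma=0.05)
```

Returning the edge index with each point keeps the ground-truth skeleton assignment available, which is the kind of information a topology-aware evaluation would need. Raising `sigma` blurs the branches together and makes the shape harder to recover.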

Related papers