Fugu-MT Paper Translation (Abstract): PointLLM-R: Enhancing 3D Point Cloud Reasoning via Chain-of-Thought

Paper overview: PointLLM-R: Enhancing 3D Point Cloud Reasoning via Chain-of-Thought

arxiv url: http://arxiv.org/abs/2605.22013v1
Date: Thu, 21 May 2026 05:19:51 GMT
Status: Translation complete
Last updated in system: 2026-05-22 16:35:42.105575
Title: PointLLM-R: Enhancing 3D Point Cloud Reasoning via Chain-of-Thought
Authors: Chaoqi Chen, Qile Xu, Wenjun Zhou, Hui Huang
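Abstract summary: Chain-of-Thought (CoT) reasoning has shown strong effectiveness in LLMs and image-based MLLMs. This work proposes a data-centric framework for constructing large-scale CoT supervision tailored to 3D point cloud understanding. Fine-tuning PointLLM on PoCoTI yields PointLLM-R, a reasoning-capable 3D multimodal language model.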
Reference score (independently computed attention measure): 17.13654442098613
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Understanding 3D point clouds through language remains a fundamental challenge in computer graphics and visual computing, due to the irregular structure of point cloud data and the lack of explicit reasoning in existing 3D multimodal models. While Chain-of-Thought (CoT) reasoning has shown strong effectiveness in LLMs and image-based MLLMs, its extension to 3D understanding remains largely underexplored. In this paper, we propose a data-centric framework for constructing large-scale CoT supervision tailored to 3D point cloud understanding. Our framework consists of a two-stage pipeline that first refines point-text instruction data via vision-language-model-based quality evaluation and reference-guided refinement, and then synthesizes high-quality reasoning paths through Human-in-the-Loop Prompt Optimization (HiLPO). Using this approach, we build PoCoTI, a CoT-enhanced point-text instruction-following dataset containing 55K samples with explicit reasoning paths. Fine-tuning PointLLM on PoCoTI yields PointLLM-R, a reasoning-capable 3D multimodal language model. Extensive experiments on generative 3D classification and captioning demonstrate that PointLLM-R achieves state-of-the-art performance and generalizes robustly to real-world scanned point clouds and multi-turn dialogue scenarios.
