Fugu-MT Paper Translation (Abstract): CodeMMR: Bridging Natural Language, Code, and Image for Unified Retrieval

Paper overview: CodeMMR: Bridging Natural Language, Code, and Image for Unified Retrieval

arxiv url: http://arxiv.org/abs/2604.15663v1
Date: Fri, 17 Apr 2026 03:35:35 GMT
Status: Translation complete
Last updated in system: 2026-04-20 22:00:19.722245
Title: CodeMMR: Bridging Natural Language, Code, and Image for Unified Retrieval
Authors: Jiahui Geng, Qing Li, Fengyu Cai, Fakhri Karray
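Abstract summary: Code search, framed as information retrieval (IR), underpins modern software engineering and increasingly powers retrieval-augmented generation (RAG). Existing code IR models are largely text-centric and often overlook the visual and structural aspects inherent in programming artifacts such as web interfaces, data visualizations, SVGs, and schematic diagrams. We introduce MMCoIR, the first benchmark for evaluating multimodal code IR across five visual domains, eight programming languages, and eleven libraries, and show how challenging the task is through extensive evaluation. We then propose CodeMMR, a unified retrieval model that jointly embeds natural language, code, and images.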
Reference score (independently computed attention score): 16.651846645091315
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Code search, framed as information retrieval (IR), underpins modern software engineering and increasingly powers retrieval-augmented generation (RAG), improving code discovery, reuse, and the reliability of LLM-based coding. Yet existing code IR models remain largely text-centric and often overlook the visual and structural aspects inherent in programming artifacts such as web interfaces, data visualizations, SVGs, schematic diagrams, and UML. To bridge this gap, we introduce MMCoIR, the first comprehensive benchmark for evaluating multimodal code IR across five visual domains, eight programming languages, and eleven libraries, and we show how challenging the task is through extensive evaluation. We then propose CodeMMR, a unified retrieval model that jointly embeds natural language, code, and images into a shared semantic space through instruction-based multimodal alignment. CodeMMR achieves strong generalization across modalities and languages, outperforming competitive baselines (e.g., UniIR, GME, VLM2Vec) by an average of 10 points on nDCG@10. Moreover, integrating CodeMMR into RAG enhances code generation fidelity and visual grounding on unseen code generation tasks, underscoring the potential of multimodal retrieval as a core enabler for next-generation intelligent programming systems. Datasets are available at HuggingFace.
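The abstract reports its headline result in nDCG@10. As a reminder of what that metric measures, here is a minimal sketch of the standard definition with linear gain. It is not the paper's evaluation code: the function names are hypothetical, and it assumes the input lists the relevance grade of every candidate in the order the retriever ranked them, so that the ideal ordering can be derived from the same list.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k ranked relevance grades."""
    # Rank 1 is discounted by log2(2) = 1, rank 2 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking."""
    # Assumption: ranked_relevances covers all candidates, so sorting it
    # in descending order yields the best achievable ranking.
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0.0:
        return 0.0  # no relevant item exists for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# One relevant item, retrieved at rank 2 instead of rank 1.
score = ndcg_at_k([0, 1, 0, 0])
```

With binary relevance, the linear-gain form used here coincides with the exponential-gain variant (2^rel - 1). If the reported scores are on a 0 to 100 scale, which the abstract does not state, a 10-point gain would correspond to 0.10 in the value this function returns.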
