Fugu-MT Paper Translation (Abstract): SODIUM: From Open Web Data to Queryable Databases

Paper overview: SODIUM: From Open Web Data to Queryable Databases

arxiv url: http://arxiv.org/abs/2603.18447v1
Date: Thu, 19 Mar 2026 03:17:56 GMT
Status: Translation complete
Last updated in system: 2026-03-20 17:19:05.936145
Title: SODIUM: From Open Web Data to Queryable Databases
Authors: Chuxuan Hu, Philip Li, Maxwell Yang, Daniel Kang
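Abstract summary: We formalize the SODIUM task, conceptualizing open domains such as the web as latent databases. Existing systems struggle with SODIUM tasks, with the strongest baseline achieving only 46.5% accuracy. We develop SODIUM-Agent, a multi-agent system composed of a web explorer and a cache manager. SODIUM-Agent achieves 91.1% accuracy on SODIUM-Bench, outperforming the strongest baseline by approximately 2 times and the weakest by up to 73 times.
Reference score (attention score computed independently by this site): 7.809458664810863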
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: During research, domain experts often ask analytical questions whose answers require integrating data from a wide range of web sources. Thus, they must spend substantial effort searching, extracting, and organizing raw data before analysis can begin. We formalize this process as the SODIUM task, where we conceptualize open domains such as the web as latent databases that must be systematically instantiated to support downstream querying. Solving SODIUM requires (1) conducting in-depth and specialized exploration of the open web, which is further strengthened by (2) exploiting structural correlations for systematic information extraction and (3) integrating collected information into coherent, queryable database instances. To quantify the challenges in automating SODIUM, we construct SODIUM-Bench, a benchmark of 105 tasks derived from published academic papers across 6 domains, where systems are tasked with exploring the open web to collect and aggregate data from diverse sources into structured tables. Existing systems struggle with SODIUM tasks: we evaluate 6 advanced AI agents on SODIUM-Bench, with the strongest baseline achieving only 46.5% accuracy. To bridge this gap, we develop SODIUM-Agent, a multi-agent system composed of a web explorer and a cache manager. Powered by our proposed ATP-BFS algorithm and optimized through principled management of cached sources and navigation paths, SODIUM-Agent conducts deep and comprehensive web exploration and performs structurally coherent information extraction. SODIUM-Agent achieves 91.1% accuracy on SODIUM-Bench, outperforming the strongest baseline by approximately 2 times and the weakest by up to 73 times.

Related papers