Fugu-MT Paper Translation (Abstract): Automated Extraction of Pharmacokinetic Parameters from Structured XML Scientific Articles: Enhancing Data Accessibility at Scale

Paper abstract: Automated Extraction of Pharmacokinetic Parameters from Structured XML Scientific Articles: Enhancing Data Accessibility at Scale

arxiv url: http://arxiv.org/abs/2604.21063v1
Date: Wed, 22 Apr 2026 20:09:50 GMT
Status: Translation complete
Last updated in system: 2026-04-24 14:40:06.16762
Title: Automated Extraction of Pharmacokinetic Parameters from Structured XML Scientific Articles: Enhancing Data Accessibility at Scale
Authors: Remya Ampadi Ramachandran, Lisa A. Tell, Sidharth Rai, Nuwan Millagaha Gedara, Hossein Sholehrasa, Jim E. Riviere, Majid Jaberi-Douraki
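Abstract summary: In pharmacology, there is no centralized, comprehensive, and up-to-date repository of PK data. This is a major challenge for R&D, because collecting all the required quantitative PK parameters can be a time-consuming and difficult task. This makes tables one of the crucial components and information elements of scientific or regulatory documents.
Reference score (independently calculated attention score): 0.0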
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the field of pharmacology, there is a notable absence of centralized, comprehensive, and up-to-date repositories of pharmacokinetic (PK) data. This poses a significant challenge for R&D, because collecting all the required quantitative PK parameters from diverse scientific publications is time-consuming and difficult. This quantitative PK information is predominantly organized in tabular format and is mostly available as XML, HTML, or PDF files within various online repositories and scientific publications, including supplementary materials. Tables are therefore among the crucial components and information elements of scientific and regulatory documents, where they are commonly used to present quantitative information. Extracting data from tables manually is typically labor-intensive, while automated machine learning models may struggle to accurately detect and extract the relevant data because of the complex nature and diverse layouts of tabular data. The difficulty of information extraction and reading order detection depends largely on the structural complexity of the tables. Efforts to understand tables should prioritize capturing the content of table cells in a manner that aligns with how a human reader naturally comprehends the information. FARAD (the Food Animal Residue Avoidance Databank) has been manually extracting tabular data and other information from literature and regulatory agencies for over 40 years. However, there is now an urgent need to automate this process because of the large volume of publications released daily. Maintaining accuracy in this task has become increasingly difficult, as manual extraction is tedious and prone to errors, especially given the staffing shortages we are currently facing. This necessitates the development of AI algorithms for table detection and extraction that can precisely handle cells organized according to the table structure, as indicated by column and/or row header information.
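The abstract's closing requirement, handling cells according to the table structure given by column and row headers, can be illustrated with a minimal sketch. This is not the authors' method; it is a plain rule-based example using Python's standard `xml.etree.ElementTree`. It assumes an HTML-style table model (`table`, `thead`, `tbody`, `tr`, `th`, `td`) of the kind used in JATS XML articles, with no XML namespaces, a single header row, and the first cell of each body row acting as the row header. The sample XML, the function names, and the numeric values are hypothetical placeholders, not data from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names follow the HTML-style table model found in
# JATS XML; the values are placeholders, not data from the paper.
SAMPLE_XML = """
<table-wrap id="t1">
  <table>
    <thead>
      <tr><th>Parameter</th><th>Unit</th><th>Value</th></tr>
    </thead>
    <tbody>
      <tr><td>Cmax</td><td>ug/mL</td><td>1.0</td></tr>
      <tr><td>Tmax</td><td>h</td><td>2.0</td></tr>
    </tbody>
  </table>
</table-wrap>
"""


def cell_text(cell):
    """Return the text of a cell, including nested inline markup, whitespace-normalized."""
    return " ".join("".join(cell.itertext()).split())


def extract_records(xml_string):
    """Pair each body cell with its row header (first cell) and column header."""
    root = ET.fromstring(xml_string)
    records = []
    for table in root.iter("table"):
        header_row = table.find("./thead/tr")
        if header_row is None:
            continue  # this sketch skips tables without an explicit header row
        columns = [cell_text(c) for c in header_row]
        for row in table.findall("./tbody/tr"):
            cells = [cell_text(c) for c in row]
            if not cells:
                continue
            row_header = cells[0]
            # Align remaining cells with the column headers by position.
            for column, value in zip(columns[1:], cells[1:]):
                records.append({"row": row_header, "column": column, "value": value})
    return records


records = extract_records(SAMPLE_XML)
for record in records:
    print(record)
```

Positional alignment only works for simple grids. Tables with spanning cells (`colspan`, `rowspan`) or multi-level headers, which the abstract identifies as the hard cases, would need the spans expanded into a full grid before cells can be matched to headers.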
