Fugu-MT Paper Translation (Abstract): An LLM Agent-Based Complex Semantic Table Annotation Approach

Paper overview: An LLM Agent-Based Complex Semantic Table Annotation Approach

arxiv url: http://arxiv.org/abs/2508.12868v1
Date: Mon, 18 Aug 2025 12:09:20 GMT
Status: Translation complete
Last updated in system: 2025-08-19 14:49:11.28001
Title: An LLM Agent-Based Complex Semantic Table Annotation Approach
Title (reference translation): A Complex Semantic Table Annotation Method Using LLM Agents
Authors: Yilin Geng, Shujing Wang, Chuan Wang, Keqing He, Yanfei Lv, Ying Wang, Zaiwen Feng, Xiaoying Bai
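Abstract summary: This paper proposes an LLM-based agent approach for Column Type Annotation (CTA) and Cell Entity Annotation (CEA). Based on the ReAct framework, the authors design and implement five external tools with tailored prompts. By using Levenshtein distance to reduce redundant annotations, they achieve a 70% reduction in time cost and a 60% reduction in LLM token usage.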
Reference score (independently computed attention metric): 13.427066390210538
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The Semantic Table Annotation (STA) task, which includes Column Type Annotation (CTA) and Cell Entity Annotation (CEA), maps table contents to ontology entities and plays important roles in various semantic applications. However, complex tables often pose challenges such as semantic loss of column names or cell values, strict ontological hierarchy requirements, homonyms, spelling errors, and abbreviations, which hinder annotation accuracy. To address these issues, this paper proposes an LLM-based agent approach for CTA and CEA. We design and implement five external tools with tailored prompts based on the ReAct framework, enabling the STA agent to dynamically select suitable annotation strategies depending on table characteristics. Experiments are conducted on the Tough Tables and BiodivTab datasets from the SemTab challenge, which contain the aforementioned challenges. Our method outperforms existing approaches across various metrics. Furthermore, by leveraging Levenshtein distance to reduce redundant annotations, we achieve a 70% reduction in time costs and a 60% reduction in LLM token usage, providing an efficient and cost-effective solution for STA.
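Abstract (reference translation): The Semantic Table Annotation (STA) task, which comprises Column Type Annotation (CTA) and Cell Entity Annotation (CEA), maps table contents to ontology entities and plays an important role in a range of semantic applications. Complex tables, however, raise difficulties such as semantic loss in column names or cell values, strict ontological hierarchy requirements, homonyms, spelling errors, and abbreviations, all of which lower annotation accuracy. To address these problems, the paper proposes an LLM-based agent approach for CTA and CEA. Building on the ReAct framework, the authors design and implement five external tools with tailored prompts, so that the STA agent can dynamically choose an appropriate annotation strategy according to the characteristics of the table. Experiments are run on the Tough Tables and BiodivTab datasets from the SemTab challenge, which exhibit the difficulties listed above. The proposed method outperforms existing approaches on a variety of metrics. In addition, by using Levenshtein distance to cut redundant annotations, it reduces time cost by 70% and LLM token usage by 60%, offering an efficient and cost-effective solution for STA.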
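
The abstract does not spell out how Levenshtein distance is used to reduce redundant annotations. One plausible reading is that cell values within a small edit distance of an already-annotated value reuse that annotation instead of triggering another LLM call. The Python sketch below illustrates that idea only; `annotate_with_reuse`, the `annotate` callback, the lowercase normalization, and the `max_distance=1` threshold are all hypothetical choices made for illustration, not the authors' implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def annotate_with_reuse(cells, annotate, max_distance=1):
    """Annotate cells, reusing a cached result for near-duplicate strings.

    `annotate` stands in for an expensive step such as an LLM call (hypothetical).
    Returns the annotations and the number of times `annotate` was invoked.
    """
    cache = {}  # normalized cell text -> annotation
    results = []
    calls = 0
    for cell in cells:
        key = cell.strip().lower()
        # Look for any cached value within the allowed edit distance.
        match = next((k for k in cache if levenshtein(key, k) <= max_distance), None)
        if match is None:
            cache[key] = annotate(cell)
            calls += 1
            match = key
        results.append(cache[match])
    return results, calls


if __name__ == "__main__":
    cells = ["Berlin", "berlin", "Berlim", "Paris"]
    annotations, calls = annotate_with_reuse(cells, lambda c: "ent:" + c.strip().lower())
    print(annotations, calls)
```

A fixed threshold of this kind is risky for short strings, where a single edit can turn one entity into another, so a practical system would likely scale the threshold with string length or confirm the reuse by other means.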
