STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking
- URL: http://arxiv.org/abs/2507.03674v1
- Date: Fri, 04 Jul 2025 15:51:07 GMT
- Title: STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking
- Authors: Tek Raj Chhetri, Yibei Chen, Puja Trivedi, Dorota Jarecka, Saif Haobsh, Patrick Ray, Lydia Ng, Satrajit S. Ghosh,
- Abstract summary: StructSense is a modular, task-agnostic, open-source framework for structured information extraction built on Large Language Models.<n>It is guided by domain-specific symbolic knowledge enabling it encoded complex domain content effectively.<n>We demonstrate that StructSense can overcome both the limitations of domain sensitivity and the lack of cross-task generalizability.
- Score: 2.355572228890207
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to extract structured information from unstructured sources-such as free-text documents and scientific literature-is critical for accelerating scientific discovery and knowledge synthesis. Large Language Models (LLMs) have demonstrated remarkable capabilities in various natural language processing tasks, including structured information extraction. However, their effectiveness often diminishes in specialized, domain-specific contexts that require nuanced understanding and expert-level domain knowledge. In addition, existing LLM-based approaches frequently exhibit poor transferability across tasks and domains, limiting their scalability and adaptability. To address these challenges, we introduce StructSense, a modular, task-agnostic, open-source framework for structured information extraction built on LLMs. StructSense is guided by domain-specific symbolic knowledge encoded in ontologies, enabling it to navigate complex domain content more effectively. It further incorporates agentic capabilities through self-evaluative judges that form a feedback loop for iterative refinement, and includes human-in-the-loop mechanisms to ensure quality and validation. We demonstrate that StructSense can overcome both the limitations of domain sensitivity and the lack of cross-task generalizability, as shown through its application to diverse neuroscience information extraction tasks.
Related papers
- Leveraging Large Language Models for Tacit Knowledge Discovery in Organizational Contexts [0.4499833362998487]
We propose an agent-based framework to iteratively reconstruct dataset descriptions through interactions with employees.<n>Our results show that the agent achieves 94.9% full-knowledge recall, with self-critical feedback scores strongly correlating with external literature critic scores.<n>These findings highlight the agent's ability to navigate organizational complexity and capture fragmented knowledge that would otherwise remain inaccessible.
arXiv Detail & Related papers (2025-07-04T21:09:32Z) - KnowCoder-V2: Deep Knowledge Analysis [64.63893361811968]
We propose a textbfKnowledgeable textbfDeep textbfResearch (textbfKDR) framework that empowers deep research with deep knowledge analysis capability.<n>It introduces an independent knowledge organization phase to preprocess large-scale, domain-relevant data into systematic knowledge offline.<n>It then extends deep research with an additional kind of reasoning steps that perform complex knowledge computation in an online manner.
arXiv Detail & Related papers (2025-06-07T18:01:25Z) - Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs [11.724887822269528]
Large language models (LLMs) have achieved unprecedented performance by leveraging vast pretraining corpora.<n>Their performance remains suboptimal in knowledge-intensive domains such as medicine and scientific research.<n>We propose a novel Structural Entropy-guided Knowledge Navigator (SENATOR) framework that addresses the intrinsic knowledge deficiencies of LLMs.
arXiv Detail & Related papers (2025-05-12T02:21:36Z) - StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization [94.31508613367296]
Retrieval-augmented generation (RAG) is a key means to effectively enhance large language models (LLMs)
We propose StructRAG, which can identify the optimal structure type for the task at hand, reconstruct original documents into this structured format, and infer answers based on the resulting structure.
Experiments show that StructRAG achieves state-of-the-art performance, particularly excelling in challenging scenarios.
arXiv Detail & Related papers (2024-10-11T13:52:44Z) - GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation [108.2008975785364]
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning method that merges parametric and non-parametric memories to improve accurate reasoning with minimal external input.<n>GIVE guides the LLM agent to select the most pertinent expert data (observe), engage in query-specific divergent thinking (reflect), and then synthesize this information to produce the final output (speak)
arXiv Detail & Related papers (2024-10-11T03:05:06Z) - Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization [7.522493227357079]
Large Language Models (LLMs) are pre-trained on large-scale corpora.
LLMs suffer from hallucinations, knowledge cut-offs, and lack of knowledge attributions.
We introduce SMART-SLIC, a highly domain-specific LLM framework.
arXiv Detail & Related papers (2024-10-03T17:40:55Z) - Neurosymbolic AI approach to Attribution in Large Language Models [5.3454230926797734]
Neurosymbolic AI (NesyAI) combines the strengths of neural networks with structured symbolic reasoning.
This paper explores how NesyAI frameworks can enhance existing attribution models, offering more reliable, interpretable, and adaptable systems.
arXiv Detail & Related papers (2024-09-30T02:20:36Z) - Pruning neural network models for gene regulatory dynamics using data and domain knowledge [24.670514977455202]
We propose DASH, a framework that guides network pruning by using domain-specific structural information in model fitting.
We show that DASH, using knowledge about gene interaction partners within the putative regulatory network, outperforms general pruning methods by a large margin.
arXiv Detail & Related papers (2024-03-05T23:02:55Z) - DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain
Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z) - Towards Verifiable Generation: A Benchmark for Knowledge-aware Language Model Attribution [48.86322922826514]
This paper defines a new task of Knowledge-aware Language Model Attribution (KaLMA)
First, we extend attribution source from unstructured texts to Knowledge Graph (KG), whose rich structures benefit both the attribution performance and working scenarios.
Second, we propose a new Conscious Incompetence" setting considering the incomplete knowledge repository.
Third, we propose a comprehensive automatic evaluation metric encompassing text quality, citation quality, and text citation alignment.
arXiv Detail & Related papers (2023-10-09T11:45:59Z) - Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation [72.19108372163868]
We propose a novel Structural and Statistical Texture Knowledge Distillation (SSTKD) framework for semantic segmentation.<n>For structural texture knowledge, we introduce a Contourlet Decomposition Module (CDM) that decomposes low-level features.<n>For statistical texture knowledge, we propose a Denoised Texture Intensity Equalization Module (DTIEM) to adaptively extract and enhance statistical texture knowledge.
arXiv Detail & Related papers (2023-05-06T06:01:11Z) - A Dependency Syntactic Knowledge Augmented Interactive Architecture for
End-to-End Aspect-based Sentiment Analysis [73.74885246830611]
We propose a novel dependency syntactic knowledge augmented interactive architecture with multi-task learning for end-to-end ABSA.
This model is capable of fully exploiting the syntactic knowledge (dependency relations and types) by leveraging a well-designed Dependency Relation Embedded Graph Convolutional Network (DreGcn)
Extensive experimental results on three benchmark datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-04T14:59:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.