Related papers: A Survey on Open Information Extraction from Rule-based Model to Large Language Model


URL: http://arxiv.org/abs/2208.08690v7
Date: Wed, 23 Oct 2024 19:36:50 GMT
Title: A Survey on Open Information Extraction from Rule-based Model to Large Language Model
Authors: Pai Liu, Wenyang Gao, Wenjie Dong, Lin Ai, Ziwei Gong, Songfang Huang, Zongsheng Li, Ehsan Hoque, Julia Hirschberg, Yue Zhang
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Open Information Extraction (OpenIE) represents a crucial NLP task aimed at deriving structured information from unstructured text, unrestricted by relation type or domain. This survey paper provides an overview of OpenIE technologies spanning from 2007 to 2024, emphasizing a chronological perspective absent in prior surveys. It examines how the task settings of OpenIE have evolved to keep pace with recent technological advances. The paper groups OpenIE approaches into rule-based models, neural models, and pre-trained large language models, discussing each within a chronological framework. Additionally, it highlights prevalent datasets and evaluation metrics currently in use. Building on this extensive review, the paper outlines potential future directions in terms of datasets, information sources, output formats, methodologies, and evaluation metrics.

Related papers

Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding
Document understanding is critical for applications ranging from financial analysis to scientific discovery. Current approaches, whether OCR-based pipelines feeding Large Language Models (LLMs) or native Multimodal LLMs (MLLMs), face key limitations. Retrieval-Augmented Generation (RAG) helps ground models in external data, but the multimodal nature of documents, which combine text, tables, charts, and layout, demands a more advanced paradigm: Multimodal RAG.
(arXiv, 2025-10-17T02:33:16Z)

Challenges in Expanding Portuguese Resources: A View from Open Information Extraction
We present a high-quality manually annotated corpus for Open Information Extraction in the Portuguese language. We discuss the challenges encountered in the annotation process, propose a set of structural and contextual annotation rules, and validate our corpus.
(arXiv, 2025-01-21T03:08:37Z)

Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models
Open Domain Question Answering (ODQA) within natural language processing involves building systems that answer factual questions using large-scale knowledge corpora. High-quality datasets are used to train models on realistic scenarios. Standardized metrics facilitate comparisons between different ODQA systems.
(arXiv, 2024-06-19T05:43:02Z)

From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making. Large foundation models, such as large language models, have revolutionized various natural language processing tasks. This survey paper serves as a comprehensive resource for researchers and practitioners in the fields of natural language processing, computer vision, and data analysis.
(arXiv, 2024-03-18T17:57:09Z)

Large Language Models for Generative Information Extraction: A Survey
Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation. We present an extensive overview by categorizing these works in terms of various IE subtasks and techniques. We empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs.
(arXiv, 2023-12-29T14:25:22Z)

Deep learning for table detection and structure recognition: A survey
The goal of this survey is to provide a thorough understanding of the major developments in the field of table detection. We analyze both classic and new applications in the field. The datasets and source code of existing models are organized to give the reader a guide through this vast literature.
(arXiv, 2022-11-15T19:42:27Z)

A Survey on Neural Open Information Extraction: Current Status and Future Directions
Open Information Extraction (OpenIE) facilitates domain-independent discovery of relational facts from large corpora. We provide an overview of state-of-the-art neural OpenIE models, their key design decisions, and their strengths and weaknesses.
(arXiv, 2022-05-24T02:24:55Z)

Document AI: Benchmarks, Models and Applications
Document AI refers to the techniques for automatically reading, understanding, and analyzing business documents. In recent years, the popularity of deep learning technology has greatly advanced the development of Document AI. This paper briefly reviews some of the representative models, tasks, and benchmark datasets.
(arXiv, 2021-11-16T16:43:07Z)

Deep Learning Schema-based Event Extraction: Literature Review and Current Trends
Event extraction technology based on deep learning has become a research hotspot. This paper fills a gap in the literature by reviewing state-of-the-art approaches, focusing on deep learning-based models.
(arXiv, 2021-07-05T16:32:45Z)

Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering
We review the latest research trends in OpenQA, with particular attention to systems that incorporate neural machine reading comprehension (MRC) techniques. We introduce the modern OpenQA architecture known as "Retriever-Reader" and analyze the various systems that follow this architecture. We then discuss key challenges in developing OpenQA systems and offer an analysis of commonly used benchmarks.
(arXiv, 2021-01-04T04:47:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.