CEntRE: A paragraph-level Chinese dataset for Relation Extraction among Enterprises
- URL: http://arxiv.org/abs/2210.10581v1
- Date: Wed, 19 Oct 2022 14:22:10 GMT
- Title: CEntRE: A paragraph-level Chinese dataset for Relation Extraction among Enterprises
- Authors: Peipei Liu, Hong Li, Zhiyu Wang, Yimo Ren, Jie Liu, Fei Lyu, Hongsong Zhu, Limin Sun
- Abstract summary: Enterprise relation extraction aims to detect pairs of enterprise entities and identify the business relations between them from unstructured or semi-structured text data.
We introduce CEntRE, a new dataset constructed from publicly available business news data with careful human annotation and intelligent data processing.
- Score: 11.596083874633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enterprise relation extraction aims to detect pairs of enterprise entities
and identify the business relations between them from unstructured or
semi-structured text data, and it is crucial for several real-world
applications such as risk analysis, rating research and supply chain security.
However, previous work mainly focuses on getting attribute information about
enterprises like personnel and corporate business, and pays little attention to
enterprise relation extraction. To encourage further progress in this area,
we introduce CEntRE, a new dataset constructed from publicly available
business news data with careful human annotation and intelligent data
processing. Extensive experiments on CEntRE with six excellent models
demonstrate the challenges of our proposed dataset.
Related papers
- SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [49.54155332262579]
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.
Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
arXiv Detail & Related papers (2024-10-28T15:56:49Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- EnterpriseEM: Fine-tuned Embeddings for Enterprise Semantic Search [1.2097014193871654]
We propose a methodology for contextualizing pre-trained embedding models to enterprise environments.
By adapting the embeddings to better suit the retrieval tasks prevalent in enterprises, we aim to enhance the performance of AI-driven information retrieval solutions.
arXiv Detail & Related papers (2024-05-18T14:06:53Z)
- Capture the Flag: Uncovering Data Insights with Large Language Models [90.47038584812925]
This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data.
We propose a new evaluation methodology based on a "capture the flag" principle, measuring the ability of such models to recognize meaningful and pertinent information (flags) in a dataset.
arXiv Detail & Related papers (2023-12-21T14:20:06Z)
- STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances.
We design fine-grained step-by-step instructions to obtain the initial data instances.
Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z)
- Towards Avoiding the Data Mess: Industry Insights from Data Mesh Implementations [1.5029560229270191]
Data mesh is a socio-technical, decentralized, distributed concept for enterprise data management.
We conduct 15 semi-structured interviews with industry experts.
Our findings synthesize insights from industry experts and provide researchers and professionals with preliminary guidelines for the successful adoption of data mesh.
arXiv Detail & Related papers (2023-02-03T13:09:57Z)
- A Survey of Data Marketplaces and Their Business Models [0.0]
"Data" is becoming an indispensable production factor, just like land, infrastructure, labor or capital.
Tasks ranging from automating certain functions to facilitating decision-making in data-driven organizations increasingly benefit from acquiring data inputs from third parties.
New entities and novel business models have appeared with the aim of matching such data requirements with the right providers.
arXiv Detail & Related papers (2022-01-11T12:27:37Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
- Big Data Generated by Connected and Automated Vehicles for Safety Monitoring, Assessment and Improvement, Final Report (Year 3) [0.654475763573891]
This report focuses on safety aspects of connected and automated vehicles (CAVs).
The goal is to systematically synthesize studies related to Big Data for safety monitoring and improvement.
arXiv Detail & Related papers (2021-01-09T20:00:26Z)
- Improving Company Valuations with Automated Knowledge Discovery, Extraction and Fusion [0.15293427903448023]
This paper illustrates how automated knowledge discovery, extraction and data fusion can be used to obtain additional indicators.
We apply deep web knowledge acquisition methods to identify and harvest data on clinical trials that is hidden behind proprietary search interfaces.
arXiv Detail & Related papers (2020-10-19T06:33:12Z)
- Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks [61.950353376870154]
Joint-event-extraction is a sequence-to-sequence labeling task with a tag set composed of tags of triggers and entities.
We propose a Cross-Supervised Mechanism (CSM) to alternately supervise the extraction of triggers or entities.
Our approach outperforms the state-of-the-art methods in both entity and trigger extraction.
arXiv Detail & Related papers (2020-10-13T11:51:17Z)