REFinD: Relation Extraction Financial Dataset
- URL: http://arxiv.org/abs/2305.18322v1
- Date: Mon, 22 May 2023 22:40:11 GMT
- Title: REFinD: Relation Extraction Financial Dataset
- Authors: Simerjot Kaur, Charese Smiley, Akshat Gupta, Joy Sain, Dongsheng Wang,
Suchetha Siddagangappa, Toyin Aguda, Sameena Shah
- Abstract summary: We propose REFinD, the first large-scale annotated dataset of relations, with $\sim$29K instances and 22 relations amongst 8 types of entity pairs, generated entirely over financial documents.
We observed that various state-of-the-art deep learning models struggle with numeric inference, relational and directional ambiguity.
- Score: 7.207699035400335
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A number of datasets for Relation Extraction (RE) have been created to aid
downstream tasks such as information retrieval, semantic search, question
answering and textual entailment. However, these datasets fail to capture
financial-domain specific challenges since most of these datasets are compiled
using general knowledge sources such as Wikipedia, web-based text and news
articles, hindering real-life progress and adoption within the financial world.
To address this limitation, we propose REFinD, the first large-scale annotated
dataset of relations, with $\sim$29K instances and 22 relations amongst 8 types
of entity pairs, generated entirely over financial documents. We also provide
an empirical evaluation with various state-of-the-art models as benchmarks for
the RE task and highlight the challenges posed by our dataset. We observed that
various state-of-the-art deep learning models struggle with numeric inference,
relational and directional ambiguity.
Related papers
- SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [49.54155332262579]
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.
Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
arXiv Detail & Related papers (2024-10-28T15:56:49Z)
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
- FinLLMs: A Framework for Financial Reasoning Dataset Generation with Large Language Models [12.367548338910744]
FinLLMs is a method for generating financial question-answering data based on common financial formulas using Large Language Models.
Our experiments demonstrate that synthetic data generated by FinLLMs effectively enhances the performance of several large-scale numerical reasoning models in the financial domain.
arXiv Detail & Related papers (2024-01-19T15:09:39Z)
- FinDiff: Diffusion Models for Financial Tabular Data Generation [5.824064631226058]
FinDiff is a diffusion model designed to generate real-world financial data for a variety of regulatory downstream tasks.
It is evaluated against state-of-the-art baseline models using three real-world financial datasets.
arXiv Detail & Related papers (2023-09-04T09:30:15Z)
- iMETRE: Incorporating Markers of Entity Types for Relation Extraction [0.0]
Sentence-level relation extraction aims to identify the relationship between 2 entities given a contextual sentence.
In this paper, we approach the task of relationship extraction in the financial dataset REFinD.
arXiv Detail & Related papers (2023-06-30T20:54:41Z)
- MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos [106.06278332186106]
Multimodal summarization with multimodal output (MSMO) has emerged as a promising research direction.
Numerous limitations exist within existing public MSMO datasets.
We have meticulously curated the MMSum dataset.
arXiv Detail & Related papers (2023-06-07T07:43:11Z)
- FinRED: A Dataset for Relation Extraction in Financial Domain [23.700539609170015]
FinRED is a relation extraction dataset curated from financial news and earnings call transcripts containing relations from the finance domain.
We see a significant drop in their performance on FinRED compared to the general relation extraction datasets.
arXiv Detail & Related papers (2023-06-06T14:52:47Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents.
We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts.
The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z)
- Learning from Context or Names? An Empirical Study on Neural Relation Extraction [112.06614505580501]
We study the effect of two main information sources in text: textual context and entity mentions (names).
We propose an entity-masked contrastive pre-training framework for relation extraction (RE).
Our framework can improve the effectiveness and robustness of neural models in different RE scenarios.
arXiv Detail & Related papers (2020-10-05T11:21:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.