FinRED: A Dataset for Relation Extraction in Financial Domain
- URL: http://arxiv.org/abs/2306.03736v1
- Date: Tue, 6 Jun 2023 14:52:47 GMT
- Title: FinRED: A Dataset for Relation Extraction in Financial Domain
- Authors: Soumya Sharma, Tapas Nayak, Arusarka Bose, Ajay Kumar Meena, Koustuv
Dasgupta, Niloy Ganguly, Pawan Goyal
- Abstract summary: FinRED is a relation extraction dataset curated from financial news and earnings call transcripts containing relations from the finance domain.
We see a significant drop in their performance on FinRED compared to the general relation extraction datasets.
- Score: 23.700539609170015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Relation extraction models trained on a source domain cannot be applied to a different target domain due to the mismatch between relation sets. In the current literature, there is no extensive open-source relation extraction dataset specific to the finance domain. In this paper, we release FinRED, a relation extraction dataset curated from financial news and earnings call transcripts containing relations from the finance domain. FinRED has been created by mapping Wikidata triplets to the text using the distant supervision method. We manually annotate the test data to ensure proper evaluation. We also experiment with various state-of-the-art relation extraction models on this dataset to create the benchmark. We see a significant drop in their performance on FinRED compared to the general relation extraction datasets, which indicates that better models are needed for financial relation extraction.
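The distant-supervision step described in the abstract can be illustrated with a small sketch: a sentence is labeled with a relation whenever both entities of a known Wikidata-style triplet appear in it. This is a minimal, hypothetical illustration (entity matching by string containment), not the authors' actual pipeline, which also involves manual annotation of the test split.

```python
# Minimal distant-supervision sketch: label a sentence with a relation when both
# entities of a (head, relation, tail) triplet occur in that sentence.
# Illustration only; the real FinRED construction is more involved.
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (head entity, relation, tail entity)

def distant_supervision(sentences: List[str],
                        triplets: List[Triplet]) -> List[Dict]:
    examples = []
    for sent in sentences:
        lowered = sent.lower()
        for head, relation, tail in triplets:
            if head.lower() in lowered and tail.lower() in lowered:
                examples.append({"sentence": sent,
                                 "head": head,
                                 "tail": tail,
                                 "relation": relation})
    return examples

if __name__ == "__main__":
    triplets = [("Apple", "chief_executive_officer", "Tim Cook")]
    sentences = ["Tim Cook, CEO of Apple, discussed the quarterly results."]
    print(distant_supervision(sentences, triplets))
```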
Related papers
- Enhancing Language Models for Financial Relation Extraction with Named Entities and Part-of-Speech [5.104305392215512]
The FinRE task involves identifying the entities and their relation, given a piece of financial statement/text.
We propose a strategy that improves the performance of pre-trained language models by augmenting them with Named Entity Recognition (NER) and Part-Of-Speech (POS) information.
Experiments on a financial relations dataset show promising results and highlight the benefits of incorporating NER and POS in existing models.
arXiv Detail & Related papers (2024-05-02T14:33:05Z)
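As a rough illustration of the augmentation strategy summarized in the entry above, the sketch below tags an input sentence with spaCy NER and POS labels and inlines them into the text that would be fed to a relation classifier. The tagging scheme and model name are assumptions for illustration, not the paper's exact setup.

```python
# Hypothetical sketch: enrich a relation classifier's input with NER and POS tags.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def augment_with_ner_pos(sentence: str) -> str:
    """Return the sentence with inline [POS|ENTITY] markers for each token."""
    doc = nlp(sentence)
    pieces = []
    for token in doc:
        ent = token.ent_type_ or "O"  # "O" for tokens outside any entity
        pieces.append(f"{token.text} [{token.pos_}|{ent}]")
    return " ".join(pieces)

print(augment_with_ner_pos("Tim Cook is the CEO of Apple."))
# e.g. "Tim [PROPN|PERSON] Cook [PROPN|PERSON] is [AUX|O] ... Apple [PROPN|ORG] ..."
```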
- Information Extraction: An application to the domain of hyper-local financial data on developing countries [0.0]
We develop and evaluate two Natural Language Processing (NLP) based techniques to address this issue.
First, we curate a custom dataset specific to the domain of financial text data on developing countries.
We then explore a text-to-text approach with the transformer-based T5 model with the goal of undertaking simultaneous NER and relation extraction.
arXiv Detail & Related papers (2024-03-14T03:49:36Z)
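To make the text-to-text formulation in the entry above concrete, here is a minimal sketch that casts joint NER and relation extraction as sequence generation with a T5 model from Hugging Face transformers. The prompt format, target scheme, and checkpoint name are assumptions for illustration; without fine-tuning, the generated text is not meaningful and only shows the interface.

```python
# Hypothetical text-to-text sketch: a T5 model generates entities and a relation
# as a flat string. Requires: pip install transformers sentencepiece torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "t5-small"  # placeholder; a real system would be fine-tuned on domain data
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

def extract(text: str) -> str:
    # Assumed prompt/target format: a fine-tuned model would be trained to emit
    # something like "head: Apple | tail: Tim Cook | relation: ceo_of".
    prompt = f"extract entities and relation: {text}"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=32)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(extract("Tim Cook, chief executive of Apple, spoke on the earnings call."))
```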
- FinTree: Financial Dataset Pretrain Transformer Encoder for Relation Extraction [0.0]
We pretrain FinTree on a financial dataset, adapting the model to financial tasks.
FinTree stands out with its novel structure that predicts a masked token instead of the conventional [CLS] token.
Our experiments demonstrate that FinTree achieves superior performance on REFinD, a large-scale financial relation extraction dataset.
arXiv Detail & Related papers (2023-07-26T01:48:52Z)
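The masked-token prediction idea in the entry above can be sketched with a generic masked language model: the relation is predicted by filling a [MASK] slot in a templated prompt rather than classifying a pooled [CLS] representation. The template and checkpoint below are placeholders, not FinTree itself.

```python
# Hypothetical prompt-style relation prediction with a masked LM.
# Requires: pip install transformers torch
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "bert-base-uncased"  # placeholder; a finance-pretrained encoder is assumed in practice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

def predict_relation_token(sentence: str, head: str, tail: str) -> str:
    # Assumed template: the model fills in the masked relation word.
    prompt = f"{sentence} The relation between {head} and {tail} is {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Position of the [MASK] token in the input sequence.
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    top_id = logits[0, mask_pos].argmax().item()
    return tokenizer.decode([top_id]).strip()

print(predict_relation_token("Tim Cook leads Apple.", "Tim Cook", "Apple"))
```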
- GPT-FinRE: In-context Learning for Financial Relation Extraction using Large Language Models [1.9559144041082446]
This paper describes our solution for relation extraction on one such dataset, REFinD.
In this paper, we employed OpenAI models under the framework of in-context learning (ICL).
We were able to achieve 3rd rank overall. Our best F1-score is 0.718.
arXiv Detail & Related papers (2023-06-30T10:12:30Z)
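In-context learning for relation extraction, as in the entry above, boils down to prompting a large language model with a few labeled demonstrations followed by the query instance. The sketch below only builds such a prompt; the demonstrations, label set, and the eventual call to an LLM API are all placeholders.

```python
# Hypothetical in-context-learning prompt builder for relation extraction.
# The resulting string would be sent to the chat/completions API of your choice.
from typing import List, Tuple

RELATIONS = ["ceo_of", "subsidiary_of", "no_relation"]  # assumed label set

def build_icl_prompt(demos: List[Tuple[str, str, str, str]],
                     sentence: str, head: str, tail: str) -> str:
    lines = ["Classify the relation between the two entities. "
             f"Choose one of: {', '.join(RELATIONS)}.", ""]
    for d_sent, d_head, d_tail, d_rel in demos:
        lines.append(f"Sentence: {d_sent}")
        lines.append(f"Entities: {d_head} ; {d_tail}")
        lines.append(f"Relation: {d_rel}")
        lines.append("")
    lines.append(f"Sentence: {sentence}")
    lines.append(f"Entities: {head} ; {tail}")
    lines.append("Relation:")
    return "\n".join(lines)

demos = [("Tim Cook runs Apple.", "Tim Cook", "Apple", "ceo_of")]
print(build_icl_prompt(demos, "YouTube is owned by Google.", "YouTube", "Google"))
```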
- REFinD: Relation Extraction Financial Dataset [7.207699035400335]
We propose REFinD, the first large-scale annotated dataset of relations, with ~29K instances and 22 relations amongst 8 types of entity pairs, generated entirely over financial documents.
We observed that various state-of-the-art deep learning models struggle with numeric inference, relational ambiguity, and directional ambiguity.
arXiv Detail & Related papers (2023-05-22T22:40:11Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
- D-REX: Dialogue Relation Extraction with Explanations [65.3862263565638]
This work focuses on extracting explanations that indicate that a relation exists while using only partially labeled data.
We propose our model-agnostic framework, D-REX, a policy-guided semi-supervised algorithm that explains and ranks relations.
We find that about 90% of the time, human annotators prefer D-REX's explanations over a strong BERT-based joint relation extraction and explanation model.
arXiv Detail & Related papers (2021-09-10T22:30:48Z)
- FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents.
We propose a new large-scale dataset, FinQA, with question-answering pairs over financial reports, written by financial experts.
The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z)
- Learning Relation Prototype from Unlabeled Texts for Long-tail Relation Extraction [84.64435075778988]
We propose a general approach to learn relation prototypes from unlabeled texts.
We learn relation prototypes as an implicit factor between entities.
We conduct experiments on two publicly available datasets: New York Times and Google Distant Supervision.
arXiv Detail & Related papers (2020-11-27T06:21:12Z)
- Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation [102.67010690592011]
Unsupervised domain adaptation (UDA) aims to leverage the knowledge learned from a labeled source dataset to solve similar tasks in a new unlabeled domain.
Prior UDA methods typically require access to the source data when learning to adapt the model.
This work tackles a practical setting where only a trained source model is available and studies how such a model can be effectively utilized without source data to solve UDA problems.
arXiv Detail & Related papers (2020-02-20T03:13:58Z)
- Gaussian process imputation of multiple financial series [71.08576457371433]
Multiple time series such as financial indicators, stock prices and exchange rates are strongly coupled due to their dependence on the latent state of the market.
We focus on learning the relationships among financial time series by modelling them through a multi-output Gaussian process.
arXiv Detail & Related papers (2020-02-11T19:18:18Z)
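As a simplified, single-output version of the idea in the entry above, the sketch below fits a Gaussian process to one financial series with missing observations and fills the gaps with the posterior mean. The paper's actual model is a multi-output GP that shares information across coupled series; the kernel choice and synthetic data here are assumptions for illustration.

```python
# Simplified GP imputation sketch (single series, scikit-learn); the paper's
# method is a multi-output GP that couples several financial series.
# Requires: pip install scikit-learn numpy
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
t = np.arange(100, dtype=float)                       # trading days
price = 100 + np.cumsum(rng.normal(0, 1, size=100))   # synthetic price path

# Pretend roughly 30% of the observations are missing.
missing = rng.random(100) < 0.3
t_obs, y_obs = t[~missing], price[~missing]

kernel = 1.0 * RBF(length_scale=10.0) + WhiteKernel(noise_level=0.5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(t_obs.reshape(-1, 1), y_obs)

# Posterior mean and uncertainty at the missing time stamps.
mean, std = gp.predict(t[missing].reshape(-1, 1), return_std=True)
print(np.c_[t[missing][:5], mean[:5], std[:5]])
```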