Term Expansion and FinBERT fine-tuning for Hypernym and Synonym Ranking
of Financial Terms
- URL: http://arxiv.org/abs/2107.13764v1
- Date: Thu, 29 Jul 2021 06:17:44 GMT
- Title: Term Expansion and FinBERT fine-tuning for Hypernym and Synonym Ranking
of Financial Terms
- Authors: Ankush Chopra and Sohom Ghosh
- Abstract summary: We present systems that attempt to solve the hypernym and synonym matching problem.
We designed these systems to participate in FinSim-3, a shared task of the FinNLP workshop at IJCAI-2021.
Our best performing model (Accuracy: 0.917, Rank: 1.156) was developed by fine-tuning SentenceBERT [Reimers et al., 2019] over an extended labelled set created using the hierarchy of labels present in FIBO.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hypernym and synonym matching is one of the mainstream Natural Language
Processing (NLP) tasks. In this paper, we present systems that attempt to solve
this problem. We designed these systems to participate in FinSim-3, a
shared task of the FinNLP workshop at IJCAI-2021. The shared task is focused on
solving this problem for the financial domain. We experimented with various
transformer-based pre-trained embeddings by fine-tuning these for either
classification or phrase similarity tasks. We also augmented the provided
dataset with abbreviations derived from the prospectuses provided by the organizers
and definitions of the financial terms from DBpedia [Auer et al., 2007],
Investopedia, and the Financial Industry Business Ontology (FIBO). Our best
performing system uses both FinBERT [Araci, 2019] and data augmentation from
the aforementioned sources. We observed that term expansion using data
augmentation in conjunction with semantic similarity is beneficial for this
task and could be useful for other tasks that deal with short phrases. Our
best performing model (Accuracy: 0.917, Rank: 1.156) was developed by
fine-tuning SentenceBERT [Reimers et al., 2019] (with FinBERT at the backend)
over an extended labelled set created using the hierarchy of labels present in
FIBO.
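
The ranking-by-similarity idea and the two reported metrics (accuracy and mean rank) can be illustrated with a minimal sketch. It assumes toy bag-of-words vectors as a stand-in for the FinBERT-backed SentenceBERT embeddings described in the abstract. The labels, definitions, and function names below are hypothetical and are not taken from the paper or the FinSim-3 data.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for a fine-tuned sentence encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_labels(term, labels, definitions=None):
    """Sort candidate hypernym labels from most to least similar to the term.

    If a definition is available, the term is expanded with it before embedding.
    """
    expanded = term
    if definitions and term in definitions:
        expanded = f"{term} {definitions[term]}"
    query = embed(expanded)
    return sorted(labels, key=lambda lab: cosine(query, embed(lab)), reverse=True)

def evaluate(ranked_lists, gold):
    """Accuracy: share of terms whose top-ranked label is the gold label.
    Mean rank: average 1-based position of the gold label in each ranking."""
    ranks = [ranked.index(g) + 1 for ranked, g in zip(ranked_lists, gold)]
    accuracy = sum(r == 1 for r in ranks) / len(ranks)
    mean_rank = sum(ranks) / len(ranks)
    return accuracy, mean_rank

# Hypothetical labels, terms, and definitions, for illustration only.
LABELS = ["bond", "equity index", "fund"]
DEFINITIONS = {
    "ETF": "exchange traded fund tracking an index",
    "convertible note": "a debt security similar to a bond",
}
TERMS = ["ETF", "convertible note"]
GOLD = ["fund", "bond"]

rankings = [rank_labels(t, LABELS, DEFINITIONS) for t in TERMS]
accuracy, mean_rank = evaluate(rankings, GOLD)
print(f"accuracy={accuracy:.3f} mean_rank={mean_rank:.3f}")
```

In this toy setting the short term "ETF" shares no tokens with any label, so only the expanded form carries a usable similarity signal. A real system would replace `embed` with a learned encoder.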
Related papers
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Model (LLM) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework NativE to achieve MMKGC in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
- Learning Semantic Text Similarity to rank Hypernyms of Financial Terms [0.23940819037450983]
We propose a system capable of extracting and ranking hypernyms for a given financial term.
The system has been trained with financial text corpora obtained from various sources like DBpedia.
A novel approach has been used to augment the training set with negative samples.
arXiv Detail & Related papers (2023-03-20T16:53:36Z)
- FinBERT-MRC: financial named entity recognition using BERT under the machine reading comprehension paradigm [8.17576814961648]
We formulate the FinNER task as a machine reading comprehension (MRC) problem and propose a new model termed FinBERT-MRC.
This formulation introduces significant prior information by utilizing well-designed queries, and extracts the start and end indices of target entities.
We conduct experiments on a publicly available Chinese financial dataset, ChFinAnn, and a real-world dataset, AdminPunish.
arXiv Detail & Related papers (2022-05-31T00:44:57Z)
- DICoE@FinSim-3: Financial Hypernym Detection using Augmented Terms and Distance-based Features [2.6599014990168834]
We present the submission of team DICoE for FinSim-3, the 3rd Shared Task on Learning Semantic Similarities for the Financial Domain.
The task provides a set of terms in the financial domain and requires classifying each of them into the most relevant hypernym from a financial ontology.
Our best-performing submission ranked 4th on the task's leaderboard.
arXiv Detail & Related papers (2021-09-30T08:01:48Z)
- FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents.
We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts.
The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z)
- Yseop at FinSim-3 Shared Task 2021: Specializing Financial Domain Learning with Phrase Representations [0.0]
We present our approaches for the FinSim-3 Shared Task 2021: Learning Semantic Similarities for the Financial Domain.
The aim of this task is to correctly classify a list of given terms from the financial domain into the most relevant hypernym.
Our system ranks 2nd overall on both metrics, scoring 0.917 on Average Accuracy and 1.141 on Mean Rank.
arXiv Detail & Related papers (2021-08-21T10:53:12Z)
- TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance [71.76018597965378]
We build a new large-scale Question Answering dataset containing both Tabular And Textual data, named TAT-QA.
We propose a novel QA model termed TAGOP, which is capable of reasoning over both tables and text.
arXiv Detail & Related papers (2021-05-17T06:12:06Z)
- Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies [88.0813215220342]
Some modalities can more easily contribute to the classification results than others.
We develop a method based on the log-Sobolev inequality, which bounds the functional entropy with the functional-Fisher-information.
On the two challenging multi-modal datasets VQA-CPv2 and SocialIQ, we obtain state-of-the-art results while more uniformly exploiting the modalities.
arXiv Detail & Related papers (2020-10-21T07:40:33Z)
- SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery [66.24624547470175]
SynSetExpan is a novel framework that enables two tasks to mutually enhance each other.
We create the first large-scale Synonym-Enhanced Set Expansion dataset via crowdsourcing.
Experiments on the SE2 dataset and previous benchmarks demonstrate the effectiveness of SynSetExpan for both entity set expansion and synonym discovery tasks.
arXiv Detail & Related papers (2020-09-29T07:32:17Z)
- IITK at the FinSim Task: Hypernym Detection in Financial Domain via Context-Free and Contextualized Word Embeddings [2.515934533974176]
The FinSim 2020 task is to classify financial terms into the most relevant hypernym (or top-level) concept in an external ontology.
We leverage both context-dependent and context-independent word embeddings in our analysis.
Our system ranks 1st on both metrics, i.e. mean rank and accuracy.
arXiv Detail & Related papers (2020-07-22T04:56:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.