Related papers: Term Expansion and FinBERT fine-tuning for Hypernym and Synonym Ranking of Financial Terms

URL: http://arxiv.org/abs/2107.13764v1
Date: Thu, 29 Jul 2021 06:17:44 GMT
Title: Term Expansion and FinBERT fine-tuning for Hypernym and Synonym Ranking of Financial Terms
Authors: Ankush Chopra and Sohom Ghosh
Abstract summary: We present systems that attempt to solve the hypernym and synonym matching problem. We designed these systems to participate in FinSim-3, a shared task of the FinNLP workshop at IJCAI-2021. Our best-performing model (Accuracy: 0.917, Mean Rank: 1.156) was developed by fine-tuning SentenceBERT [Reimers et al., 2019] over an extended labelled set created using the hierarchy of labels present in FIBO.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hypernym and synonym matching is one of the mainstream Natural Language Processing (NLP) tasks. In this paper, we present systems that attempt to solve this problem. We designed these systems to participate in FinSim-3, a shared task of the FinNLP workshop at IJCAI-2021. The shared task focuses on solving this problem for the financial domain. We experimented with various transformer-based pre-trained embeddings, fine-tuning them for either classification or phrase similarity tasks. We also augmented the provided dataset with abbreviations derived from the prospectuses provided by the organizers and with definitions of the financial terms from DBpedia [Auer et al., 2007], Investopedia, and the Financial Industry Business Ontology (FIBO). Our best-performing system uses both FinBERT [Araci, 2019] and data augmentation from the aforementioned sources. We observed that term expansion through data augmentation, combined with semantic similarity, is beneficial for this task and could be useful for other tasks that deal with short phrases. Our best-performing model (Accuracy: 0.917, Mean Rank: 1.156) was developed by fine-tuning SentenceBERT [Reimers et al., 2019] (with FinBERT as the backend) over an extended labelled set created using the hierarchy of labels present in FIBO.
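The abstract combines term expansion (abbreviations and definitions) with semantic similarity to rank candidate hypernyms. The following is a minimal Python sketch of that idea, not the authors' implementation. The ABBREVIATIONS and DEFINITIONS tables, the candidate labels, the crude plural-stripping normalizer, and the bag-of-words cosine similarity are all hypothetical stand-ins. The paper instead obtains embeddings from a fine-tuned, FinBERT-backed SentenceBERT model.

```python
import math
import re
from collections import Counter

# Hypothetical abbreviation table (the paper mined abbreviations from the
# organizers' prospectuses; these two entries are illustrative only).
ABBREVIATIONS = {
    "etf": "exchange traded fund",
    "cds": "credit default swap",
}

# Hypothetical definitions standing in for DBpedia/Investopedia/FIBO text.
DEFINITIONS = {
    "exchange traded fund": "an investment fund whose shares trade on stock exchanges",
}


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_term(term: str) -> str:
    """Expand known abbreviations, then append a definition if one exists."""
    expanded = " ".join(ABBREVIATIONS.get(t, t) for t in tokenize(term))
    definition = DEFINITIONS.get(expanded, "")
    return f"{expanded} {definition}".strip()


def embed(text: str) -> Counter:
    """Bag-of-words vector with crude plural stripping.

    A stand-in for a sentence encoder such as FinBERT-backed SentenceBERT.
    """
    return Counter(
        t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokenize(text)
    )


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_hypernyms(term: str, labels: list[str]) -> list[str]:
    """Sort candidate hypernym labels by similarity to the expanded term."""
    query = embed(expand_term(term))
    return sorted(labels, key=lambda label: cosine(query, embed(label)), reverse=True)


labels = ["Bonds", "Funds", "Stocks", "Swap"]  # hypothetical candidate labels
print(rank_hypernyms("ETF", labels))  # ['Funds', 'Stocks', 'Bonds', 'Swap']
print(rank_hypernyms("CDS", labels))  # ['Swap', 'Bonds', 'Funds', 'Stocks']
```

Without expansion, the bare token "etf" shares no words with any label, so every candidate would score zero. The expanded text ("exchange traded fund ...") is what lets the similarity step separate the labels. This gives an intuition for why the authors found term expansion helpful for short phrases.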

Related papers

Fin-Ally: Pioneering the Development of an Advanced, Commonsense-Embedded Conversational AI for Money Matters
Fin-Solution 2.O is an advanced solution that introduces the multi-turn financial conversational dataset, Fin-Vault. It incorporates a unified model, Fin-Ally, which integrates commonsense reasoning, politeness, and human-like conversational dynamics. The novel Fin-Vault dataset, consisting of 1,417 annotated multi-turn dialogues, enables Fin-Ally to extend beyond basic account management to provide personalized budgeting, real-time expense tracking, and automated financial planning.
Date: 2025-09-29T06:44:47Z
Structural Entropy Guided Probabilistic Coding
We propose a novel structural entropy-guided probabilistic coding model, named SEPC. We incorporate the relationship between latent variables into the optimization by proposing a structural entropy regularization loss. Experimental results across 12 natural language understanding tasks, including both classification and regression tasks, demonstrate the superior performance of SEPC.
Date: 2024-12-12T00:37:53Z
Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs
We introduce a novel approach to anomaly detection in financial data using Large Language Model (LLM) embeddings. Our experiments demonstrate that LLMs contribute valuable information to anomaly detection, as our models outperform the baselines.
Date: 2024-06-05T20:19:09Z
NativE: Multi-modal Knowledge Graph Completion in the Wild
We propose a comprehensive framework, NativE, to achieve multi-modal knowledge graph completion (MMKGC) in the wild. NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion of any modality. We construct a new benchmark called WildKGC with five datasets to evaluate our method.
Date: 2024-03-28T03:04:00Z
Learning Semantic Text Similarity to rank Hypernyms of Financial Terms
We propose a system capable of extracting and ranking hypernyms for a given financial term. The system has been trained with financial text corpora obtained from various sources like DBpedia. A novel approach has been used to augment the training set with negative samples.
Date: 2023-03-20T16:53:36Z
FinBERT-MRC: financial named entity recognition using BERT under the machine reading comprehension paradigm
We formulate the FinNER task as a machine reading comprehension (MRC) problem and propose a new model termed FinBERT-MRC. This formulation introduces significant prior information by utilizing well-designed queries, and extracts the start and end indices of target entities. We conduct experiments on a publicly available Chinese financial dataset, ChFinAnn, and a real-world dataset, AdminPunish.
Date: 2022-05-31T00:44:57Z
DICoE@FinSim-3: Financial Hypernym Detection using Augmented Terms and Distance-based Features
We present the submission of team DICoE for FinSim-3, the 3rd Shared Task on Learning Semantic Similarities for the Financial Domain. The task provides a set of terms in the financial domain and requires classifying them into the most relevant hypernym from a financial ontology. Our best-performing submission ranked 4th on the task's leaderboard.
Date: 2021-09-30T08:01:48Z
FinQA: A Dataset of Numerical Reasoning over Financial Data
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents. We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts. The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
Date: 2021-09-01T00:08:14Z
Yseop at FinSim-3 Shared Task 2021: Specializing Financial Domain Learning with Phrase Representations
We present our approaches for the FinSim-3 Shared Task 2021: Learning Semantic Similarities for the Financial Domain. The aim of this task is to correctly classify a list of given terms from the financial domain into the most relevant hypernym. Our system ranks 2nd overall on both metrics, scoring 0.917 on Average Accuracy and 1.141 on Mean Rank.
Date: 2021-08-21T10:53:12Z
TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance
We build a new large-scale Question Answering dataset containing both Tabular And Textual data, named TAT-QA. We propose a novel QA model termed TAGOP, which is capable of reasoning over both tables and text.
Date: 2021-05-17T06:12:06Z
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies
Some modalities can more easily contribute to the classification results than others. We develop a method based on the log-Sobolev inequality, which bounds the functional entropy with the functional Fisher information. On the two challenging multi-modal datasets VQA-CPv2 and SocialIQ, we obtain state-of-the-art results while more uniformly exploiting the modalities.
Date: 2020-10-21T07:40:33Z
SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery
SynSetExpan is a novel framework that enables its two tasks, entity set expansion and synonym discovery, to mutually enhance each other. We create the first large-scale Synonym-Enhanced Set Expansion (SE2) dataset via crowdsourcing. Experiments on the SE2 dataset and previous benchmarks demonstrate the effectiveness of SynSetExpan for both entity set expansion and synonym discovery tasks.
Date: 2020-09-29T07:32:17Z
IITK at the FinSim Task: Hypernym Detection in Financial Domain via Context-Free and Contextualized Word Embeddings
The FinSim 2020 task is to classify financial terms into the most relevant hypernym (or top-level) concept in an external ontology. We leverage both context-dependent and context-independent word embeddings in our analysis. Our system ranks 1st on both metrics, i.e., mean rank and accuracy.
Date: 2020-07-22T04:56:23Z
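The main abstract and the Yseop and IITK entries report FinSim results as accuracy and mean rank. The Python below is a rough sketch of how such scores can be computed from ranked label predictions. It assumes that each prediction is a list of labels ordered best-first and truncated to the top three, and that a gold label absent from that list counts as rank 4. Both conventions are assumptions, so check the official FinSim scorer before comparing numbers. Under this definition, a mean rank of 1.0 means the gold label was always ranked first, so lower is better.

```python
def accuracy(gold: list[str], predictions: list[list[str]]) -> float:
    """Share of terms whose top-ranked predicted label equals the gold label."""
    hits = sum(1 for g, p in zip(gold, predictions) if p and p[0] == g)
    return hits / len(gold)


def mean_rank(gold: list[str], predictions: list[list[str]], top_k: int = 3) -> float:
    """Average 1-based rank of the gold label among the top_k predictions.

    Assumption: a gold label missing from the top_k list counts as rank top_k + 1.
    """
    total = 0
    for g, p in zip(gold, predictions):
        top = p[:top_k]
        total += (top.index(g) + 1) if g in top else (top_k + 1)
    return total / len(gold)


# Hypothetical gold labels and ranked predictions for three terms.
gold = ["Funds", "Swap", "Bonds"]
predictions = [
    ["Funds", "Stocks", "Bonds"],  # gold ranked 1st
    ["Bonds", "Swap", "Funds"],    # gold ranked 2nd
    ["Stocks", "Funds", "Swap"],   # gold missing -> rank 4
]
print(accuracy(gold, predictions))   # 0.3333333333333333
print(mean_rank(gold, predictions))  # 2.3333333333333335
```

The two metrics reward different things. Accuracy only checks the first label, while mean rank also gives partial credit when the gold hypernym appears in second or third place.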

This list is automatically generated from the titles and abstracts of the papers on this site.