SEntFiN 1.0: Entity-Aware Sentiment Analysis for Financial News
- URL: http://arxiv.org/abs/2305.12257v1
- Date: Sat, 20 May 2023 18:20:39 GMT
- Title: SEntFiN 1.0: Entity-Aware Sentiment Analysis for Financial News
- Authors: Ankur Sinha, Satishwar Kedas, Rishu Kumar, Pekka Malo
- Abstract summary: We make publicly available SEntFiN 1.0, a human-annotated dataset of 10,753 news headlines with entity-sentiment annotations.
We propose a framework that enables the extraction of entity-relevant sentiments using a feature-based approach rather than an expression-based approach.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-grained financial sentiment analysis on news headlines is a challenging
task requiring human-annotated datasets to achieve high performance. Limited
studies have tried to address the sentiment extraction task in a setting where
multiple entities are present in a news headline. In an effort to further
research in this area, we make publicly available SEntFiN 1.0, a
human-annotated dataset of 10,753 news headlines with entity-sentiment
annotations, of which 2,847 headlines contain multiple entities, often with
conflicting sentiments. We augment our dataset with a database of over 1,000
financial entities and their various representations in news media amounting to
over 5,000 phrases. We propose a framework that enables the extraction of
entity-relevant sentiments using a feature-based approach rather than an
expression-based approach. For sentiment extraction, we evaluate 12 different
learning schemes drawing on lexicon-based and pre-trained sentence
representations, together with five classification approaches. Our experiments indicate
that lexicon-based n-gram ensembles perform on par with or better than pre-trained word
embedding schemes such as GloVe. Overall, RoBERTa and finBERT (domain-specific
BERT) achieve the highest average accuracy of 94.29% and F1-score of 93.27%.
Further, using over 210,000 entity-sentiment predictions, we validate the
economic effect of sentiments on aggregate market movements over a long
duration.
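The feature-based, entity-aware setup described in the abstract can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the paper's exact pipeline: the headline/entity/label data, the `TARGET` masking scheme, and the TF-IDF plus logistic-regression combination are illustrative stand-ins for the lexicon-based n-gram schemes and classifiers the paper compares.

```python
# Hypothetical sketch of a feature-based, entity-aware sentiment pipeline:
# each (headline, entity) pair becomes one training row, with the target
# entity mention replaced by a placeholder so the classifier learns
# entity-relative cues rather than memorizing entity names.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def mask_entity(headline, entity, token="TARGET"):
    """Replace the target entity mention with a placeholder token."""
    return headline.replace(entity, token)

# Toy (headline, entity, sentiment) triples; one headline can yield
# several rows, one per entity, possibly with conflicting labels.
rows = [
    ("Acme soars as Beta Corp tumbles on weak earnings", "Acme", "positive"),
    ("Acme soars as Beta Corp tumbles on weak earnings", "Beta Corp", "negative"),
    ("Beta Corp rallies after upbeat guidance", "Beta Corp", "positive"),
    ("Acme slides on regulatory probe", "Acme", "negative"),
]
X = [mask_entity(h, e) for h, e, _ in rows]
y = [label for _, _, label in rows]

# Word-level n-gram features feed a linear classifier, in the spirit of
# the lexicon/n-gram baselines compared in the paper.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(X, y)

pred = clf.predict([mask_entity("Acme rallies after upbeat guidance", "Acme")])[0]
print(pred)
```

Because the same headline contributes one masked row per entity, the classifier can assign opposite sentiments to two entities in a single headline, which is the multi-entity setting the dataset targets.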
Related papers
- Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs [70.15262704746378]
We propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback.
Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (10% Rouge-L) in terms of producing coherent summaries.
arXiv Detail & Related papers (2024-07-05T20:25:04Z)
- Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains [51.02035914828596]
We study the task of seed-guided fine-grained entity typing in science and engineering domains.
We propose SEType which first enriches the weak supervision by finding more entities for each seen type from an unlabeled corpus.
It then matches the enriched entities to unlabeled text to get pseudo-labeled samples and trains a textual entailment model that can make inferences for both seen and unseen types.
arXiv Detail & Related papers (2024-01-23T22:36:03Z)
- Optimal Strategies to Perform Multilingual Analysis of Social Content for a Novel Dataset in the Tourism Domain [5.848712585343905]
We evaluate few-shot, pattern-exploiting and fine-tuning machine learning techniques on large multilingual language models.
We aim to ascertain the quantity of annotated examples required to achieve good performance in 3 common NLP tasks.
This work paves the way for applying NLP to new domain-specific applications.
arXiv Detail & Related papers (2023-11-20T13:08:21Z)
- FinEntity: Entity-level Sentiment Classification for Financial Texts [15.467477195487763]
In the financial domain, conducting entity-level sentiment analysis is crucial for accurately assessing the sentiment directed toward a specific financial entity.
We introduce an entity-level sentiment classification dataset, called FinEntity, that annotates financial entity spans and their sentiment in financial news.
arXiv Detail & Related papers (2023-10-19T01:38:40Z)
- USB: A Unified Summarization Benchmark Across Tasks and Domains [68.82726887802856]
We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports 8 interrelated tasks.
We compare various methods on this benchmark and discover that on multiple tasks, moderately-sized fine-tuned models consistently outperform much larger few-shot prompted language models.
arXiv Detail & Related papers (2023-05-23T17:39:54Z)
- You can't pick your neighbors, or can you? When and how to rely on retrieval in the $k$NN-LM [65.74934004876914]
Retrieval-enhanced language models (LMs) condition their predictions on text retrieved from large external datastores.
One such approach, the $k$NN-LM, interpolates any existing LM's predictions with the output of a $k$-nearest neighbors model.
We empirically measure the effectiveness of our approach on two English language modeling datasets.
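The interpolation at the heart of the $k$NN-LM summarized above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the datastore vectors, the uniform stand-in LM distribution, and the value of the mixing weight `lam` are all illustrative.

```python
# Minimal sketch of kNN-LM interpolation: the base LM's next-token
# distribution is mixed with a distribution built from the k nearest
# neighbors in a (context vector -> next token) datastore.
import numpy as np

def knn_distribution(query, keys, values, vocab_size, k=2, temp=1.0):
    """Turn the k nearest datastore entries into a next-token distribution."""
    dists = np.linalg.norm(keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    # Closer neighbors get more probability mass (softmax over -distance).
    weights = np.exp(-dists[nearest] / temp)
    weights /= weights.sum()
    p = np.zeros(vocab_size)
    for w, v in zip(weights, values[nearest]):
        p[v] += w
    return p

vocab_size = 5
rng = np.random.default_rng(0)
keys = rng.normal(size=(10, 4))           # context representations
values = rng.integers(0, vocab_size, 10)  # token that followed each context
query = keys[3] + 0.01                    # a context close to datastore entry 3

p_lm = np.full(vocab_size, 1.0 / vocab_size)  # stand-in base LM distribution
p_knn = knn_distribution(query, keys, values, vocab_size)

lam = 0.25  # interpolation weight, a tunable hyperparameter
p = lam * p_knn + (1 - lam) * p_lm
print(p.argmax())
```

The "when and how to rely on retrieval" question in the title amounts to deciding, per prediction, how much weight `lam` should place on the retrieved distribution versus the base LM.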
arXiv Detail & Related papers (2022-10-28T02:57:40Z)
- Entity Disambiguation with Entity Definitions [50.01142092276296]
Local models have recently attained astounding performance in Entity Disambiguation (ED).
Previous works limited their studies to using, as the textual representation of each candidate, only its Wikipedia title.
In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it.
We report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns.
arXiv Detail & Related papers (2022-10-11T17:46:28Z)
- Context-NER: Contextual Phrase Generation at Scale [4.7947627446578025]
We introduce CONTEXT-NER, a task that aims to generate relevant context for entities in a sentence.
We present the EDGAR10-Q dataset, containing 1M sentences, 2.8M entities, and an average of 35 tokens per sentence.
We find that T5-large, when pre-finetuned on EDGAR10-Q, achieves SOTA results on downstream finance tasks such as Headline, FPB, and FiQA SA, outperforming the vanilla version by 10.81 points.
arXiv Detail & Related papers (2021-09-16T16:10:05Z)
- T-BERT -- Model for Sentiment Analysis of Micro-blogs Integrating Topic Model and BERT [0.0]
The effectiveness of BERT (Bidirectional Encoder Representations from Transformers) in sentiment classification tasks on a raw live dataset is demonstrated.
A novel T-BERT framework is proposed to show the enhanced performance obtainable by combining latent topics with contextual BERT embeddings.
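The fusion idea summarized above can be sketched schematically. This is a rough illustration, not the paper's T-BERT architecture: the tiny scikit-learn LDA stands in for the latent topic model, and the random vectors stand in for real BERT sentence embeddings.

```python
# Rough sketch of topic/contextual feature fusion: a document's
# latent-topic distribution is concatenated with its sentence embedding,
# and the combined vector would feed a downstream sentiment classifier.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "great phone battery lasts long",
    "battery died fast terrible phone",
    "match was thrilling great game",
    "boring game terrible match",
]

# Topic features: LDA over bag-of-words counts (each row sums to 1).
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_feats = lda.fit_transform(counts)       # shape (4, 2)

# Random stand-ins for contextual BERT sentence embeddings.
rng = np.random.default_rng(0)
bert_feats = rng.normal(size=(len(docs), 8))  # shape (4, 8)

# Fusion: concatenate topic and contextual features per document.
fused = np.concatenate([topic_feats, bert_feats], axis=1)
print(fused.shape)  # (4, 10)
```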
arXiv Detail & Related papers (2021-06-02T12:01:47Z)
- Author's Sentiment Prediction [13.459029439420872]
PerSenT is a dataset of crowd-sourced annotations of the sentiment expressed by the authors towards the main entities in news articles.
The dataset also includes paragraph-level sentiment annotations to provide more fine-grained supervision for the task.
We release this dataset with 5.3k documents and 38k paragraphs covering 3.2k unique entities as a challenge in entity sentiment analysis.
arXiv Detail & Related papers (2020-11-12T00:03:26Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.