Multi-Task Learning for Features Extraction in Financial Annual Reports
- URL: http://arxiv.org/abs/2404.05281v1
- Date: Mon, 8 Apr 2024 08:13:40 GMT
- Title: Multi-Task Learning for Features Extraction in Financial Annual Reports
- Authors: Syrielle Montariol, Matej Martinc, Andraž Pelicon, Senja Pollak, Boshko Koloski, Igor Lončarski, Aljoša Valentinčič,
- Abstract summary: We use various multi-task learning methods for financial text classification.
We focus on financial sentiment, objectivity, forward-looking sentence prediction and ESG-content detection.
We extract textual features from annual reports of FTSE350 companies and investigate the link between ESG quantitative scores and these features.
- Score: 6.442186704634672
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For assessing various performance indicators of companies, the focus is shifting from strictly financial (quantitative) publicly disclosed information to qualitative (textual) information. This textual data can provide valuable weak signals, for example through stylistic features, which can complement the quantitative data on financial performance or on Environmental, Social and Governance (ESG) criteria. In this work, we use various multi-task learning methods for financial text classification with the focus on financial sentiment, objectivity, forward-looking sentence prediction and ESG-content detection. We propose different methods to combine the information extracted from training jointly on different tasks; our best-performing method highlights the positive effect of explicitly adding auxiliary task predictions as features for the final target task during the multi-task training. Next, we use these classifiers to extract textual features from annual reports of FTSE350 companies and investigate the link between ESG quantitative scores and these features.
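The best-performing method described in the abstract treats the auxiliary tasks' predictions as explicit input features for the target task during joint training. The paper's actual architecture is not reproduced on this page, so the following is only a minimal PyTorch sketch of that idea; the encoder, head shapes, softmax features, and loss weighting are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class MultiTaskWithAuxFeatures(nn.Module):
    """Sketch: a shared encoder feeds one head per task; the target-task head
    additionally receives the auxiliary heads' predictions as extra features."""

    def __init__(self, encoder, hidden_dim, aux_num_classes, target_num_classes):
        super().__init__()
        self.encoder = encoder  # any sentence encoder returning (batch, hidden_dim)
        # one head per auxiliary task (e.g. sentiment, objectivity, forward-looking)
        self.aux_heads = nn.ModuleList(
            nn.Linear(hidden_dim, n) for n in aux_num_classes
        )
        # target head (e.g. ESG-content detection) sees the shared representation
        # concatenated with the auxiliary tasks' probability outputs
        self.target_head = nn.Linear(hidden_dim + sum(aux_num_classes), target_num_classes)

    def forward(self, inputs):
        h = self.encoder(inputs)                                  # (batch, hidden_dim)
        aux_logits = [head(h) for head in self.aux_heads]
        aux_probs = [torch.softmax(z, dim=-1) for z in aux_logits]
        # auxiliary predictions become explicit features for the target task
        target_logits = self.target_head(torch.cat([h] + aux_probs, dim=-1))
        return aux_logits, target_logits

def multitask_loss(aux_logits, target_logits, aux_labels, target_labels, aux_weight=0.5):
    """Joint objective: target loss plus down-weighted auxiliary losses
    (the weighting scheme is an assumption, not taken from the paper)."""
    ce = nn.CrossEntropyLoss()
    loss = ce(target_logits, target_labels)
    for logits, labels in zip(aux_logits, aux_labels):
        loss = loss + aux_weight * ce(logits, labels)
    return loss
```

The point of the sketch is the design choice the abstract highlights: the target classifier conditions on the auxiliary tasks' soft predictions rather than only sharing an encoder with them.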
Related papers
- Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework [81.29965270493238]
We develop a specialized dataset aimed at enhancing the evaluation and fine-tuning of large language models (LLMs) for wireless communication applications.
The dataset includes a diverse set of multi-hop questions, including true/false and multiple-choice types, spanning varying difficulty levels from easy to hard.
We introduce a Pointwise V-Information (PVI) based fine-tuning method, providing a detailed theoretical analysis and justification for its use in quantifying the information content of training data.
arXiv Detail & Related papers (2025-01-16T16:19:53Z)
- Active Prompt Learning with Vision-Language Model Priors [9.173468790066956]
We introduce a class-guided clustering that leverages the pre-trained image and text encoders of vision-language models.
We propose a budget-saving selective querying based on adaptive class-wise thresholds.
arXiv Detail & Related papers (2024-11-23T02:34:33Z)
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
We propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- Contrastive Learning of Asset Embeddings from Financial Time Series [8.595725772518332]
We propose a novel contrastive learning framework to generate asset embeddings from financial time series data.
Our approach leverages the similarity of asset returns over many subwindows to generate informative positive and negative samples.
Experiments on real-world datasets demonstrate the effectiveness of the learned asset embeddings on benchmark industry classification and portfolio optimization tasks.
arXiv Detail & Related papers (2024-07-26T10:26:44Z)
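The contrastive asset-embedding entry above only states that positives and negatives come from the similarity of asset returns over many subwindows. As a hedged illustration (not the authors' code: the similarity measure, window length, and loss below are assumed), one way to implement that sampling idea is:

```python
import torch
import torch.nn.functional as F

def sample_contrastive_pairs(returns, window=20, n_windows=100):
    """returns: (n_assets, n_days) tensor of daily returns.
    In each random subwindow, treat the most correlated asset pair as a
    positive and the least correlated pair as a negative (an assumed rule)."""
    n_assets, n_days = returns.shape
    eye = torch.eye(n_assets, dtype=torch.bool)
    positives, negatives = [], []
    for _ in range(n_windows):
        start = torch.randint(0, n_days - window, (1,)).item()
        corr = torch.corrcoef(returns[:, start:start + window])
        positives.append(divmod(corr.masked_fill(eye, -float("inf")).argmax().item(), n_assets))
        negatives.append(divmod(corr.masked_fill(eye, float("inf")).argmin().item(), n_assets))
    return positives, negatives

def pairwise_contrastive_loss(embeddings, positives, negatives, temperature=0.1):
    """Score each sampled pair by cosine similarity and push positives above negatives."""
    z = F.normalize(embeddings, dim=-1)
    pos = torch.stack([z[i] @ z[j] for i, j in positives])
    neg = torch.stack([z[i] @ z[j] for i, j in negatives])
    logits = torch.stack([pos, neg], dim=1) / temperature
    labels = torch.zeros(len(positives), dtype=torch.long)  # the positive is class 0
    return F.cross_entropy(logits, labels)

# usage sketch: embeddings could be torch.nn.Embedding(n_assets, 64).weight,
# optimized so that assets with similar return behaviour end up close together.
```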
- Enhancing Text Classification through LLM-Driven Active Learning and Human Annotation [2.0411082897313984]
This study introduces a novel methodology that integrates human annotators and Large Language Models.
The proposed framework integrates human annotation with the output of LLMs, depending on the model uncertainty levels.
The empirical results show a substantial decrease in the costs associated with data annotation while either maintaining or improving model accuracy.
arXiv Detail & Related papers (2024-06-17T21:45:48Z)
- A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges [60.546677053091685]
Large language models (LLMs) have unlocked novel opportunities for machine learning applications in the financial domain.
We explore the application of LLMs on various financial tasks, focusing on their potential to transform traditional practices and drive innovation.
The survey categorizes the existing literature into key application areas, including linguistic tasks, sentiment analysis, financial time series, financial reasoning, agent-based modeling, and other applications.
arXiv Detail & Related papers (2024-06-15T16:11:35Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful even when the classification tasks have few or non-overlapping annotations.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets [9.714447724811842]
This paper introduces a distinctive approach anchored in the Instruction Tuning paradigm for open-source large language models.
We capitalize on the interoperability of open-source models, ensuring a seamless and transparent integration.
The paper presents a benchmarking scheme designed for end-to-end training and testing, employing a cost-effective progression.
arXiv Detail & Related papers (2023-10-07T12:52:58Z)
- Stock Embeddings: Learning Distributed Representations for Financial Assets [11.67728795230542]
We propose a neural model for training stock embeddings, which harnesses the dynamics of historical returns data.
We describe our approach in detail and discuss a number of ways that it can be used in the financial domain.
arXiv Detail & Related papers (2022-02-14T15:39:06Z)
- Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve Micro F1-score by 7% over current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.