SusGen-GPT: A Data-Centric LLM for Financial NLP and Sustainability Report Generation
- URL: http://arxiv.org/abs/2412.10906v1
- Date: Sat, 14 Dec 2024 17:30:33 GMT
- Title: SusGen-GPT: A Data-Centric LLM for Financial NLP and Sustainability Report Generation
- Authors: Qilong Wu, Xiaoneng Xiang, Hejia Huang, Xuan Wang, Yeo Wei Jie, Ranjan Satapathy, Ricardo Shirota Filho, Bharadwaj Veeravalli
- Abstract summary: SusGen-30K is a category-balanced dataset comprising seven financial NLP tasks and ESG report generation.
We developed SusGen-GPT, a suite of models achieving state-of-the-art performance across six adapted and two off-the-shelf tasks.
Based on this, we propose the SusGen system, integrated with Retrieval-Augmented Generation (RAG) to assist in sustainability report generation.
- Score: 8.400304053291938
- Abstract: The rapid growth of the financial sector and the rising focus on Environmental, Social, and Governance (ESG) considerations highlight the need for advanced NLP tools. However, open-source LLMs proficient in both finance and ESG domains remain scarce. To address this gap, we introduce SusGen-30K, a category-balanced dataset comprising seven financial NLP tasks and ESG report generation, and propose TCFD-Bench, a benchmark for evaluating sustainability report generation. Leveraging this dataset, we developed SusGen-GPT, a suite of models achieving state-of-the-art performance across six adapted and two off-the-shelf tasks, trailing GPT-4 by only 2% despite using 7-8B parameters compared to GPT-4's 1,700B. Based on this, we propose the SusGen system, integrated with Retrieval-Augmented Generation (RAG), to assist in sustainability report generation. This work demonstrates the efficiency of our approach, advancing research in finance and ESG.
Related papers
- GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation [84.41557981816077]
We introduce GFM-RAG, a novel graph foundation model (GFM) for retrieval augmented generation.
GFM-RAG is powered by an innovative graph neural network that reasons over graph structure to capture complex query-knowledge relationships.
It achieves state-of-the-art performance while maintaining efficiency and alignment with neural scaling laws.
arXiv Detail & Related papers (2025-02-03T07:04:29Z)
- Trustworthiness in Retrieval-Augmented Generation Systems: A Survey [59.26328612791924]
Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs).
We propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy.
arXiv Detail & Related papers (2024-09-16T09:06:44Z)
- Leveraging Natural Language and Item Response Theory Models for ESG Scoring [0.0]
The study utilizes a comprehensive dataset of news articles in Portuguese related to Petrobras, a major oil company in Brazil.
The data is filtered and classified for ESG-related sentiments using advanced NLP methods.
The Rasch model is then applied to evaluate the psychometric properties of these ESG measures.
arXiv Detail & Related papers (2024-07-29T19:02:51Z)
- FinBen: A Holistic Financial Benchmark for Large Language Models [75.09474986283394]
FinBen is the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks.
FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading.
arXiv Detail & Related papers (2024-02-20T02:16:16Z)
- ESGReveal: An LLM-based approach for extracting structured data from ESG reports [5.467389155759699]
ESGReveal is an innovative method proposed for efficiently extracting and analyzing Environmental, Social, and Governance (ESG) data from corporate reports.
This approach utilizes Large Language Models (LLM) enhanced with Retrieval Augmented Generation (RAG) techniques.
Its efficacy was appraised using ESG reports from 166 companies across various sectors listed on the Hong Kong Stock Exchange in 2022.
arXiv Detail & Related papers (2023-12-25T06:44:32Z)
- Harnessing the Web and Knowledge Graphs for Automated Impact Investing Scoring [2.4107880640624706]
We describe a data-driven system that seeks to automate the process of creating a Sustainable Development Goals (SDG) framework.
We propose a novel method for collecting and filtering a dataset of texts from different web sources and a knowledge graph relevant to a set of companies.
Our results indicate that our best performing model can accurately predict SDG scores with a micro average F1 score of 0.89.
arXiv Detail & Related papers (2023-08-04T15:14:16Z)
- PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLM) based on fine-tuning LLaMA with instruction data.
We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks.
We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)
- Enabling and Analyzing How to Efficiently Extract Information from Hybrid Long Documents with LLMs [48.87627426640621]
This research focuses on harnessing the potential of Large Language Models to comprehend critical information from financial reports.
We propose an Automated Financial Information Extraction framework that enhances LLMs' ability to comprehend and extract information from financial reports.
Our framework is effectively validated on GPT-3.5 and GPT-4, yielding average accuracy increases of 53.94% and 33.77%, respectively.
arXiv Detail & Related papers (2023-05-24T10:35:58Z)
- ESGBERT: Language Model to Help with Classification Tasks Related to Companies Environmental, Social, and Governance Practices [0.0]
Non-financial factors such as environmental, social, and governance (ESG) are attracting attention from investors.
We see a need for sophisticated NLP techniques for classification tasks for ESG text.
We explore doing this by fine-tuning BERT's pre-trained weights using ESG-specific text and then further fine-tuning the model for a classification task.
arXiv Detail & Related papers (2022-03-31T04:22:44Z)
- SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning [63.192289553021816]
Progress toward the United Nations Sustainable Development Goals has been hindered by a lack of data on key environmental and socioeconomic indicators.
Recent advances in machine learning have made it possible to utilize abundant, frequently-updated, and globally available data, such as from satellites or social media.
In this paper, we introduce SustainBench, a collection of 15 benchmark tasks across 7 SDGs.
arXiv Detail & Related papers (2021-11-08T18:59:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.