ESGReveal: An LLM-based approach for extracting structured data from ESG
reports
- URL: http://arxiv.org/abs/2312.17264v1
- Date: Mon, 25 Dec 2023 06:44:32 GMT
- Title: ESGReveal: An LLM-based approach for extracting structured data from ESG
reports
- Authors: Yi Zou, Mengying Shi, Zhongjie Chen, Zhu Deng, ZongXiong Lei, Zihan
Zeng, Shiming Yang, HongXiang Tong, Lei Xiao, Wenwen Zhou
- Abstract summary: ESGReveal is an innovative method proposed for efficiently extracting and analyzing Environmental, Social, and Governance (ESG) data from corporate reports.
This approach utilizes Large Language Models (LLM) enhanced with Retrieval Augmented Generation (RAG) techniques.
Its efficacy was appraised using ESG reports from 166 companies across various sectors listed on the Hong Kong Stock Exchange in 2022.
- Score: 5.467389155759699
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: ESGReveal is an innovative method proposed for efficiently extracting and
analyzing Environmental, Social, and Governance (ESG) data from corporate
reports, catering to the critical need for reliable ESG information retrieval.
This approach utilizes Large Language Models (LLM) enhanced with Retrieval
Augmented Generation (RAG) techniques. The ESGReveal system includes an ESG
metadata module for targeted queries, a preprocessing module for assembling
databases, and an LLM agent for data extraction. Its efficacy was appraised
using ESG reports from 166 companies across various sectors listed on the Hong
Kong Stock Exchange in 2022, ensuring comprehensive industry and market
capitalization representation. Used with GPT-4, ESGReveal yielded significant
insights into ESG reporting, reaching an accuracy of 76.9% in data extraction
and 83.7% in disclosure analysis, an improvement over baseline models. This
highlights the framework's capacity to refine ESG data
analysis precision. Moreover, it revealed a demand for reinforced ESG
disclosures, with environmental and social data disclosures standing at 69.5%
and 57.2%, respectively, suggesting a pursuit for more corporate transparency.
While current iterations of ESGReveal do not process pictorial information, a
functionality intended for future enhancement, the study calls for continued
research to further develop and compare the analytical capabilities of various
LLMs. In summary, ESGReveal is a stride forward in ESG data processing,
offering stakeholders a sophisticated tool to better evaluate and advance
corporate sustainability efforts. Its evolution is promising in promoting
transparency in corporate reporting and aligning with broader sustainable
development aims.
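The abstract names three components: an ESG metadata module for targeted queries, a preprocessing module that assembles a database, and an LLM agent that extracts data from retrieved context. The following is a minimal sketch of how such a pipeline could be wired together, not the authors' implementation. Every name in it (`ESG_METADATA`, `build_index`, `retrieve`, `stub_llm_extract`, `extract_report`) and the sample report text are hypothetical. Bag-of-words cosine similarity stands in for whatever retriever the system uses, and a regular expression stands in for the LLM call so the example runs offline.

```python
import math
import re
from collections import Counter

# Hypothetical ESG metadata: each indicator carries a targeted query.
ESG_METADATA = [
    {"indicator": "ghg_scope1_emissions",
     "query": "total scope 1 greenhouse gas emissions tonnes"},
    {"indicator": "employee_turnover_rate",
     "query": "employee turnover rate percent"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(report_text):
    """Preprocessing step: one chunk per non-empty line, with token counts."""
    chunks = [line.strip() for line in report_text.split("\n") if line.strip()]
    return [(chunk, Counter(tokenize(chunk))) for chunk in chunks]


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(index, query, k=1):
    """Retrieval step: return the k chunks most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def stub_llm_extract(context):
    """Stand-in for the LLM agent: take the number that follows was/were."""
    match = re.search(r"(?:was|were)\s+(\d[\d,]*(?:\.\d+)?)", context)
    return match.group(1) if match else None


def extract_report(report_text):
    """Run every metadata query against one report and collect the results."""
    index = build_index(report_text)
    results = {}
    for item in ESG_METADATA:
        context = " ".join(retrieve(index, item["query"]))
        value = stub_llm_extract(context)
        results[item["indicator"]] = {
            "value": value,
            "disclosed": value is not None,
            "evidence": context,
        }
    return results


SAMPLE_REPORT = """Our board oversees sustainability strategy.
Total Scope 1 greenhouse gas emissions were 12,345 tonnes of CO2 equivalent.
The employee turnover rate was 8.5 percent during the year."""

if __name__ == "__main__":
    for indicator, record in extract_report(SAMPLE_REPORT).items():
        print(indicator, record["value"], record["disclosed"])
```

In this sketch the `disclosed` flag loosely mirrors the disclosure analysis mentioned in the abstract, and the `value` field mirrors data extraction. In a real system the stub would be replaced by a prompt to an LLM that receives the retrieved context.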
Related papers
- Trustworthiness in Retrieval-Augmented Generation Systems: A Survey [59.26328612791924]
Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs).
We propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy.
arXiv Detail & Related papers (2024-09-16T09:06:44Z)
- DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? [58.330879414174476]
We introduce DSBench, a benchmark designed to evaluate data science agents with realistic tasks.
This benchmark includes 466 data analysis tasks and 74 data modeling tasks, sourced from Eloquence and Kaggle competitions.
Our evaluation of state-of-the-art LLMs, LVLMs, and agents shows that they struggle with most tasks, with the best agent solving only 34.12% of data analysis tasks and achieving a 34.74% Relative Performance Gap (RPG).
arXiv Detail & Related papers (2024-09-12T02:08:00Z)
- Leveraging Natural Language and Item Response Theory Models for ESG Scoring [0.0]
The study utilizes a comprehensive dataset of news articles in Portuguese related to Petrobras, a major oil company in Brazil.
The data is filtered and classified for ESG-related sentiments using advanced NLP methods.
The Rasch model is then applied to evaluate the psychometric properties of these ESG measures.
arXiv Detail & Related papers (2024-07-29T19:02:51Z)
- InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation [79.09622602860703]
We introduce InsightBench, a benchmark dataset with three key features.
It consists of 100 datasets representing diverse business use cases such as finance and incident management.
Unlike existing benchmarks focusing on answering single queries, InsightBench evaluates agents based on their ability to perform end-to-end data analytics.
arXiv Detail & Related papers (2024-07-08T22:06:09Z)
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
- DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Human annotators judged our DACO-RL algorithm to produce more helpful answers than the SFT model in 57.72% of cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z)
- InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling [66.3072381478251]
Reward hacking, also termed reward overoptimization, remains a critical challenge.
We propose a framework for reward modeling, namely InfoRM, by introducing a variational information bottleneck objective.
We show that InfoRM's overoptimization detection mechanism is not only effective but also robust across a broad range of datasets.
arXiv Detail & Related papers (2024-02-14T17:49:07Z)
- Glitter or Gold? Deriving Structured Insights from Sustainability Reports via Large Language Models [16.231171704561714]
This study uses Information Extraction (IE) methods to extract structured insights related to ESG aspects from companies' sustainability reports.
We then leverage graph-based representations to conduct statistical analyses concerning the extracted insights.
arXiv Detail & Related papers (2023-10-09T11:34:41Z)
- Creating a Systematic ESG (Environmental Social Governance) Scoring System Using Social Network Analysis and Machine Learning for More Sustainable Company Practices [0.0]
This project aims to create a data-driven ESG evaluation system that can provide better guidance and more systemized scores by incorporating social sentiment.
Python web scrapers were developed to collect data from Wikipedia, Twitter, LinkedIn, and Google News for the S&P 500 companies.
Machine-learning algorithms were trained and calibrated to S&P Global ESG Ratings to test their predictive capabilities.
arXiv Detail & Related papers (2023-09-07T20:03:45Z)
- Leveraging BERT Language Models for Multi-Lingual ESG Issue Identification [0.30254881201174333]
Investors have increasingly recognized the significance of ESG criteria in their investment choices.
The Multi-Lingual ESG Issue Identification (ML-ESG) task encompasses the classification of news documents into 35 distinct ESG issue labels.
In this study, we explored multiple strategies harnessing BERT language models to achieve accurate classification of news documents across these labels.
arXiv Detail & Related papers (2023-09-05T12:48:21Z)
- Predicting Companies' ESG Ratings from News Articles Using Multivariate Timeseries Analysis [17.332692582748408]
We build a model to predict ESG ratings from news articles using the combination of multivariate timeseries construction and deep learning techniques.
A news dataset for about 3,000 US companies together with their ratings is also created and released for training.
Our approach provides accurate results outperforming the state-of-the-art, and can be used in practice to support a manual determination or analysis of ESG ratings.
arXiv Detail & Related papers (2022-11-13T11:23:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.