Nano-ESG: Extracting Corporate Sustainability Information from News Articles
- URL: http://arxiv.org/abs/2412.15093v1
- Date: Thu, 19 Dec 2024 17:43:27 GMT
- Title: Nano-ESG: Extracting Corporate Sustainability Information from News Articles
- Authors: Fabian Billert, Stefan Conrad,
- Abstract summary: We present a novel dataset of more than 840,000 news articles which were gathered for major German companies between January 2023 and September 2024.
By applying a mixture of Natural Language Processing techniques, we first identify relevant articles, before summarizing them and extracting their sustainability-related sentiment and aspect.
- Score: 0.0
- License:
- Abstract: Determining the sustainability impact of companies is a highly complex subject which has garnered more and more attention over the past few years. Today, investors largely rely on sustainability-ratings from established rating-providers in order to analyze how responsibly a company acts. However, those ratings have recently been criticized for being hard to understand and nearly impossible to reproduce. An independent way to find out about the sustainability practices of companies lies in the rich landscape of news article data. In this paper, we explore a different approach to identify key opportunities and challenges of companies in the sustainability domain. We present a novel dataset of more than 840,000 news articles which were gathered for major German companies between January 2023 and September 2024. By applying a mixture of Natural Language Processing techniques, we first identify relevant articles, before summarizing them and extracting their sustainability-related sentiment and aspect using Large Language Models (LLMs). Furthermore, we conduct an evaluation of the obtained data and determine that the LLM-produced answers are accurate. We release both datasets at https://github.com/Bailefan/Nano-ESG.
Related papers
- Glitter or Gold? Deriving Structured Insights from Sustainability
Reports via Large Language Models [16.231171704561714]
This study uses Information Extraction (IE) methods to extract structured insights related to ESG aspects from companies' sustainability reports.
We then leverage graph-based representations to conduct statistical analyses concerning the extracted insights.
arXiv Detail & Related papers (2023-10-09T11:34:41Z) - CSPRD: A Financial Policy Retrieval Dataset for Chinese Stock Market [61.59326951366202]
We propose a new task, policy retrieval, by introducing the Chinese Stock Policy Retrieval dataset (CSPRD)
CSPRD provides 700+ passages labeled by experienced experts with relevant articles from 10k+ entries in our collected Chinese policy corpus.
Our best performing baseline achieves 56.1% MRR@10, 28.5% NDCG@10, 37.5% Recall@10 and 80.6% Precision@10 on dev set.
arXiv Detail & Related papers (2023-09-08T15:40:54Z) - CHATREPORT: Democratizing Sustainability Disclosure Analysis through
LLM-based Tools [10.653984116770234]
ChatReport is a novel LLM-based system to automate the analysis of corporate sustainability reports.
We make our methodology, annotated datasets, and generated analyses of 1015 reports publicly available.
arXiv Detail & Related papers (2023-07-28T18:58:16Z) - L-Eval: Instituting Standardized Evaluation for Long Context Language
Models [91.05820785008527]
We propose L-Eval to institute a more standardized evaluation for long context language models (LCLMs)
We build a new evaluation suite containing 20 sub-tasks, 508 long documents, and over 2,000 human-labeled query-response pairs.
Results show that popular n-gram matching metrics generally can not correlate well with human judgment.
arXiv Detail & Related papers (2023-07-20T17:59:41Z) - Paradigm Shift in Sustainability Disclosure Analysis: Empowering
Stakeholders with CHATREPORT, a Language Model-Based Tool [10.653984116770234]
This paper introduces a novel approach to enhance Large Language Models (LLMs) with expert knowledge to automate the analysis of corporate sustainability reports.
We christen our tool CHATREPORT, and apply it in a first use case to assess corporate climate risk disclosures.
arXiv Detail & Related papers (2023-06-27T14:46:47Z) - Sentiment Analysis in the Era of Large Language Models: A Reality Check [69.97942065617664]
This paper investigates the capabilities of large language models (LLMs) in performing various sentiment analysis tasks.
We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets.
arXiv Detail & Related papers (2023-05-24T10:45:25Z) - sustain.AI: a Recommender System to analyze Sustainability Reports [0.2479153065703935]
sustainAI is an intelligent, context-aware recommender system that assists auditors and financial investors.
We evaluate our model on two novel German sustainability reporting data sets and consistently achieve a significantly higher recommendation performance.
arXiv Detail & Related papers (2023-05-15T15:16:19Z) - MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised
Learning [90.17500229142755]
The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia.
This paper introduces the motivation behind this challenge, describe the benchmark dataset, and provide some statistics about participants.
We believe this high-quality dataset can become a new benchmark in multimodal emotion recognition, especially for the Chinese research community.
arXiv Detail & Related papers (2023-04-18T13:23:42Z) - Predicting Companies' ESG Ratings from News Articles Using Multivariate
Timeseries Analysis [17.332692582748408]
We build a model to predict ESG ratings from news articles using the combination of multivariate timeseries construction and deep learning techniques.
A news dataset for about 3,000 US companies together with their ratings is also created and released for training.
Our approach provides accurate results outperforming the state-of-the-art, and can be used in practice to support a manual determination or analysis of ESG ratings.
arXiv Detail & Related papers (2022-11-13T11:23:02Z) - GreenDB -- A Dataset and Benchmark for Extraction of Sustainability
Information of Consumer Goods [58.31888171187044]
We present GreenDB, a database that collects products from European online shops on a weekly basis.
As proxy for the products' sustainability, it relies on sustainability labels, which are evaluated by experts.
We present initial results demonstrating that ML models trained with our data can reliably predict the sustainability label of products.
arXiv Detail & Related papers (2022-07-21T19:59:42Z) - Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity)
arXiv Detail & Related papers (2022-02-03T17:25:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.