ESG-FTSE: A corpus of news articles with ESG relevance labels and use cases
- URL: http://arxiv.org/abs/2405.20218v1
- Date: Thu, 30 May 2024 16:19:02 GMT
- Title: ESG-FTSE: A corpus of news articles with ESG relevance labels and use cases
- Authors: Mariya Pavlova, Bernard Casey, Miaosen Wang
- Abstract summary: We present ESG-FTSE, the first corpus comprised of news articles with Environmental, Social and Governance (ESG) relevance annotations.
The push of ESG investing into the mainstream has led to the rise of ESG scores to evaluate an investment's credentials as socially responsible.
Quantitative techniques can be applied to improve ESG scores and, thus, responsible investing.
- Score: 1.3937696730884712
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present ESG-FTSE, the first corpus comprised of news articles with Environmental, Social and Governance (ESG) relevance annotations. In recent years, investors and regulators have pushed ESG investing to the mainstream due to the urgency of climate change. This has led to the rise of ESG scores to evaluate an investment's credentials as socially responsible. While demand for ESG scores is high, their quality varies wildly. Quantitative techniques can be applied to improve ESG scores and, thus, responsible investing. To contribute to resource building for ESG and financial text mining, we pioneer the ESG-FTSE corpus. We further present the first-of-its-kind ESG annotation schema. It has three levels: a binary classification (relevant versus irrelevant news articles), ESG classification (ESG-related news articles), and target company. Both supervised and unsupervised learning experiments for ESG relevance detection were conducted to demonstrate that the corpus can be used in different settings to derive accurate ESG predictions. Keywords: corpus annotation, ESG labels, annotation schema, news article, natural language processing
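The three-level annotation schema described in the abstract could be represented roughly as follows. This is only a minimal sketch, not the corpus's actual file format: the names `ESGCategory`, `ArticleAnnotation`, and `relevance_labels` are hypothetical, and both the Environmental/Social/Governance split at level two and the rule that levels two and three apply only to relevant articles are assumptions read from the abstract's wording.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ESGCategory(Enum):
    # Assumed level-2 label set; the abstract only says "ESG classification".
    ENVIRONMENTAL = "E"
    SOCIAL = "S"
    GOVERNANCE = "G"


@dataclass(frozen=True)
class ArticleAnnotation:
    """Hypothetical record holding the three annotation levels for one article."""
    article_id: str
    relevant: bool                                # level 1: relevant vs. irrelevant
    esg_category: Optional[ESGCategory] = None    # level 2: ESG classification
    target_company: Optional[str] = None          # level 3: target company

    def __post_init__(self) -> None:
        # Assumption: irrelevant articles carry no level-2 or level-3 labels.
        if not self.relevant and (
            self.esg_category is not None or self.target_company is not None
        ):
            raise ValueError("irrelevant articles cannot have ESG or company labels")


def relevance_labels(annotations: List[ArticleAnnotation]) -> List[int]:
    """Collapse annotations to binary targets for a relevance classifier."""
    return [1 if a.relevant else 0 for a in annotations]
```

A record such as `ArticleAnnotation("a1", True, ESGCategory.ENVIRONMENTAL, "ExampleCo")` would then contribute the label `1` to a binary relevance-detection task, while an irrelevant article contributes `0`.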
Related papers
- Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation [53.285436927963865]
This paper presents the first systematic evaluation of RAG systems integrated with fair rankings.
We focus specifically on measuring the fair exposure of each relevant item across the rankings utilized by RAG systems.
Our findings indicate that RAG systems with fair rankings can maintain a high level of generation quality and, in many cases, even outperform traditional RAG systems.
arXiv Detail & Related papers (2024-09-17T23:10:04Z)
- Leveraging Natural Language and Item Response Theory Models for ESG Scoring [0.0]
The study utilizes a comprehensive dataset of news articles in Portuguese related to Petrobras, a major oil company in Brazil.
The data is filtered and classified for ESG-related sentiments using advanced NLP methods.
The Rasch model is then applied to evaluate the psychometric properties of these ESG measures.
arXiv Detail & Related papers (2024-07-29T19:02:51Z)
- Leveraging BERT Language Models for Multi-Lingual ESG Issue Identification [0.30254881201174333]
Investors have increasingly recognized the significance of ESG criteria in their investment choices.
The Multi-Lingual ESG Issue Identification (ML-ESG) task encompasses the classification of news documents into 35 distinct ESG issue labels.
In this study, we explored multiple strategies harnessing BERT language models to achieve accurate classification of news documents across these labels.
arXiv Detail & Related papers (2023-09-05T12:48:21Z)
- EaSyGuide : ESG Issue Identification Framework leveraging Abilities of Generative Large Language Models [5.388543737855513]
This paper presents our participation in the FinNLP-2023 shared task on multi-lingual environmental, social, and corporate governance issue identification (ML-ESG).
The task's objective is to classify news articles based on the 35 ESG key issues defined by the MSCI ESG rating guidelines.
Our approach focuses on the English and French subtasks, employing the CerebrasGPT, OPT, and Pythia models, along with the zero-shot and GPT3Mix Augmentation techniques.
arXiv Detail & Related papers (2023-06-11T12:25:02Z)
- GPT-NER: Named Entity Recognition via Large Language Models [58.609582116612934]
GPT-NER transforms the sequence labeling task to a generation task that can be easily adapted by Language Models.
We find that GPT-NER exhibits a greater ability in the low-resource and few-shot setups, when the amount of training data is extremely scarce.
This demonstrates the capabilities of GPT-NER in real-world NER applications where the number of labeled examples is limited.
arXiv Detail & Related papers (2023-04-20T16:17:26Z)
- Predicting Companies' ESG Ratings from News Articles Using Multivariate Timeseries Analysis [17.332692582748408]
We build a model to predict ESG ratings from news articles using the combination of multivariate timeseries construction and deep learning techniques.
A news dataset for about 3,000 US companies together with their ratings is also created and released for training.
Our approach provides accurate results outperforming the state-of-the-art, and can be used in practice to support a manual determination or analysis of ESG ratings.
arXiv Detail & Related papers (2022-11-13T11:23:02Z)
- ESGBERT: Language Model to Help with Classification Tasks Related to Companies Environmental, Social, and Governance Practices [0.0]
Non-financial factors such as environmental, social, and governance (ESG) are attracting attention from investors.
We see a need for sophisticated NLP techniques for classification tasks for ESG text.
We explore doing this by fine-tuning BERT's pre-trained weights using ESG-specific text and then further fine-tuning the model for a classification task.
arXiv Detail & Related papers (2022-03-31T04:22:44Z)
- OAG-BERT: Pre-train Heterogeneous Entity-augmented Academic Language Model [45.419270950610624]
OAG-BERT integrates massive heterogeneous entities including paper, author, concept, venue, and affiliation.
We develop novel pre-training strategies including heterogeneous entity type embedding, entity-aware 2D positional encoding, and span-aware entity masking.
OAG-BERT has been deployed to multiple real-world applications, such as reviewer recommendations for NSFC (National Natural Science Foundation of China) and paper tagging in the AMiner system.
arXiv Detail & Related papers (2021-03-03T14:00:57Z)
- The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics [66.96150429230035]
We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics.
Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models.
arXiv Detail & Related papers (2021-02-02T18:42:05Z)
- Generalized Few-shot Semantic Segmentation [68.69434831359669]
We introduce a new benchmark called Generalized Few-Shot Semantic Segmentation (GFS-Seg) to analyze the ability to segment novel categories and base categories simultaneously.
It is the first study showing that previous representative state-of-the-art methods fall short in GFS-Seg.
We propose Context-Aware Prototype Learning (CAPL), which significantly improves performance by 1) leveraging the co-occurrence prior knowledge from support samples, and 2) dynamically enriching contextual information to the classifier, conditioned on the content of each query image.
arXiv Detail & Related papers (2020-10-11T10:13:21Z)
- End-to-end Named Entity Recognition from English Speech [51.22888702264816]
We introduce the first publicly available NER-annotated dataset for English speech and present an E2E approach, which jointly optimizes the ASR and NER tagger components.
We also discuss how NER from speech can be used to handle out-of-vocabulary (OOV) words in an ASR system.
arXiv Detail & Related papers (2020-05-22T13:39:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.