Leveraging BERT Language Models for Multi-Lingual ESG Issue
Identification
- URL: http://arxiv.org/abs/2309.02189v1
- Date: Tue, 5 Sep 2023 12:48:21 GMT
- Title: Leveraging BERT Language Models for Multi-Lingual ESG Issue
Identification
- Authors: Elvys Linhares Pontes, Mohamed Benjannet, Lam Kim Ming
- Abstract summary: Investors have increasingly recognized the significance of ESG criteria in their investment choices.
The Multi-Lingual ESG Issue Identification (ML-ESG) task encompasses the classification of news documents into 35 distinct ESG issue labels.
In this study, we explored multiple strategies harnessing BERT language models to achieve accurate classification of news documents across these labels.
- Score: 0.30254881201174333
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Environmental, Social, and Governance (ESG) has been used as a metric to
measure the negative impacts and enhance positive outcomes of companies in
areas such as the environment, society, and governance. Recently, investors
have increasingly recognized the significance of ESG criteria in their
investment choices, leading businesses to integrate ESG principles into their
operations and strategies. The Multi-Lingual ESG Issue Identification (ML-ESG)
shared task encompasses the classification of news documents into 35 distinct
ESG issue labels. In this study, we explored multiple strategies harnessing
BERT language models to achieve accurate classification of news documents
across these labels. Our analysis revealed that the RoBERTa classifier emerged
as one of the most successful approaches, securing the second-place position
for the English test dataset, and sharing the fifth-place position for the
French test dataset. Furthermore, our SVM-based binary model tailored for the
Chinese language exhibited exceptional performance, earning the second-place
rank on the test dataset.
Related papers
- Evaluating the performance of state-of-the-art esg domain-specific pre-trained large language models in text classification against existing models and traditional machine learning techniques [0.0]
This research investigates the classification of Environmental, Social, and Governance (ESG) information within textual disclosures.
The aim is to develop and evaluate binary classification models capable of accurately identifying and categorizing E, S and G-related content respectively.
The motivation for this research stems from the growing importance of ESG considerations in investment decisions and corporate accountability.
arXiv Detail & Related papers (2024-09-30T20:08:32Z)
- ESG-FTSE: A corpus of news articles with ESG relevance labels and use cases [1.3937696730884712]
We present ESG-FTSE, the first corpus comprised of news articles with Environmental, Social and Governance (ESG) relevance annotations.
This has led to the rise of ESG scores to evaluate an investment's credentials as socially responsible.
Quantitative techniques can be applied to improve ESG scores and, thus, responsible investing.
arXiv Detail & Related papers (2024-05-30T16:19:02Z)
- Enhancing ESG Impact Type Identification through Early Fusion and Multilingual Models [4.97890110201934]
We present a comprehensive system leveraging ensemble learning techniques, capitalizing on early and late fusion approaches.
Our approach employs four distinct models: mBERT, FlauBERT-base, ALBERT-base-v2, and a Multi-Layer Perceptron (MLP) incorporating Latent Semantic Analysis (LSA) and Term Frequency-Inverse Document Frequency (TF-IDF) features.
Through extensive experimentation, we find that our early fusion ensemble approach, featuring the integration of LSA, TF-IDF, mBERT, FlauBERT-base, and ALBERT-base-v2, delivers the
arXiv Detail & Related papers (2024-02-16T15:54:24Z)
- EaSyGuide : ESG Issue Identification Framework leveraging Abilities of Generative Large Language Models [5.388543737855513]
This paper presents our participation in the FinNLP-2023 shared task on multi-lingual environmental, social, and corporate governance issue identification (ML-ESG).
The task's objective is to classify news articles based on the 35 ESG key issues defined by the MSCI ESG rating guidelines.
Our approach focuses on the English and French subtasks, employing the CerebrasGPT, OPT, and Pythia models, along with the zero-shot and GPT3Mix Augmentation techniques.
arXiv Detail & Related papers (2023-06-11T12:25:02Z)
- IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource and real-world challenge of code-mixed (Spanish-Catalan) clinical notes de-identification in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
- ESGBERT: Language Model to Help with Classification Tasks Related to Companies Environmental, Social, and Governance Practices [0.0]
Non-financial factors such as environmental, social, and governance (ESG) are attracting attention from investors.
We see a need for sophisticated NLP techniques for classification tasks for ESG text.
We explore doing this by fine-tuning BERT's pre-trained weights using ESG-specific text and then further fine-tuning the model for a classification task.
arXiv Detail & Related papers (2022-03-31T04:22:44Z)
- A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model [100.67378875773495]
We propose a generic and language-independent strategy for multilingual Grammatical Error Correction.
Our approach creates diverse parallel GEC data without any language-specific operations.
It achieves state-of-the-art results on the NLPCC 2018 Task 2 dataset (Chinese) and obtains competitive performance on Falko-Merlin (German) and RULEC-GEC (Russian).
arXiv Detail & Related papers (2022-01-26T02:10:32Z)
- CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark [144.05723617401674]
General-purpose language intelligence evaluation has been a longstanding goal for natural language processing.
We argue that for general-purpose language intelligence evaluation, the benchmark itself needs to be comprehensive and systematic.
We propose CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with the following features.
arXiv Detail & Related papers (2021-12-27T11:08:58Z)
- Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification [59.698811329287174]
We leverage GPT-2 for generating artificial training instances in order to improve classification performance.
Our results show that fine-tuning GPT-2 on a handful of labeled instances leads to consistent classification improvements.
arXiv Detail & Related papers (2021-11-17T12:10:03Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.