Harnessing the Web and Knowledge Graphs for Automated Impact Investing Scoring
- URL: http://arxiv.org/abs/2308.02622v1
- Date: Fri, 4 Aug 2023 15:14:16 GMT
- Title: Harnessing the Web and Knowledge Graphs for Automated Impact Investing Scoring
- Authors: Qingzhi Hu, Daniel Daza, Laurens Swinkels, Kristina Ūsaitė, Robbert-Jan 't Hoen, Paul Groth
- Abstract summary: We describe a data-driven system that seeks to automate the process of creating a Sustainable Development Goals (SDG) framework.
We propose a novel method for collecting and filtering a dataset of texts from different web sources and a knowledge graph relevant to a set of companies.
Our results indicate that our best performing model can accurately predict SDG scores with a micro average F1 score of 0.89.
- Score: 2.4107880640624706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Sustainable Development Goals (SDGs) were introduced by the United
Nations in order to encourage policies and activities that help guarantee human
prosperity and sustainability. SDG frameworks produced in the finance industry
are designed to provide scores that indicate how well a company aligns with
each of the 17 SDGs. This scoring enables a consistent assessment of
investments that have the potential to build an inclusive and sustainable
economy. As a result of the high quality and reliability required by such
frameworks, the process of creating and maintaining them is time-consuming and
requires extensive domain expertise. In this work, we describe a data-driven
system that seeks to automate the process of creating an SDG framework. First,
we propose a novel method for collecting and filtering a dataset of texts from
different web sources and a knowledge graph relevant to a set of companies. We
then implement and deploy classifiers trained with this data for predicting
scores of alignment with SDGs for a given company. Our results indicate that
our best performing model can accurately predict SDG scores with a micro
average F1 score of 0.89, demonstrating the effectiveness of the proposed
solution. We further describe how integration of the models into human
analysts' workflows can be facilitated by providing explanations in the form of data
relevant to a predicted score. We find that our proposed solution enables
access to a large amount of information that analysts would normally not be
able to process, resulting in an accurate prediction of SDG scores at a
fraction of the cost.
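For concreteness, the task the abstract describes can be framed as multi-label text classification over the 17 SDGs, evaluated with micro-averaged F1. The sketch below is an illustrative assumption (toy data, TF-IDF features, logistic regression), not the authors' implementation:

```python
# Minimal sketch (not the authors' system): map company-related web text to
# binary SDG-alignment labels and evaluate with micro-averaged F1.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import make_pipeline

# Toy corpus: one snippet of web text per company (illustrative only).
texts = [
    "solar farms and wind turbines for affordable clean energy",
    "microloans that reduce poverty in rural communities",
    "coastal cleanup programs protecting marine ecosystems",
    "quarterly earnings call transcript with no ESG content",
]
# Binary labels: rows = companies, columns = SDGs. Only 3 of the 17 goals
# are shown for brevity (SDG 1 "No Poverty", SDG 7 "Affordable and Clean
# Energy", SDG 14 "Life Below Water").
y = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
])

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    MultiOutputClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(texts, y)
pred = clf.predict(texts)

# Micro-averaging pools true/false positives across all SDG labels before
# computing F1, which is the metric the abstract reports (0.89).
print("micro F1:", f1_score(y, pred, average="micro"))
```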
Related papers
- A Scalable Data-Driven Framework for Systematic Analysis of SEC 10-K Filings Using Large Language Models [0.0]
We propose a novel data-driven approach to analyze and rate the performance of companies based on their SEC 10-K filings.
The proposed scheme is then implemented on an interactive GUI as a no-code solution for running the data pipeline and creating the visualizations.
The application showcases the rating results and provides year-on-year comparisons of company performance.
arXiv Detail & Related papers (2024-09-26T06:57:22Z)
- How Much Data are Enough? Investigating Dataset Requirements for Patch-Based Brain MRI Segmentation Tasks [74.21484375019334]
Training deep neural networks reliably requires access to large-scale datasets.
To mitigate both the time and financial costs associated with model development, a clear understanding of the amount of data required to train a satisfactory model is crucial.
This paper proposes a strategic framework for estimating the amount of annotated data required to train patch-based segmentation networks.
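One common way to frame such an estimate, sketched below under the assumption of a power-law learning curve (the paper's actual framework may differ), is to fit error(n) = a·n^(−b) + c to a few pilot runs and extrapolate to a target error:

```python
# Generic learning-curve extrapolation sketch (an assumption, not
# necessarily this paper's method): fit a power law to pilot runs and
# solve for the dataset size that reaches a target validation error.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a * np.power(n, -b) + c

# Pilot results: (training set size, validation error) -- illustrative numbers.
sizes = np.array([50, 100, 200, 400, 800], dtype=float)
errors = np.array([0.42, 0.32, 0.25, 0.20, 0.17])

(a, b, c), _ = curve_fit(power_law, sizes, errors, p0=(1.0, 0.5, 0.1))

target = 0.15  # desired validation error
if target > c:
    n_needed = (a / (target - c)) ** (1.0 / b)
    print(f"estimated annotated samples needed: {n_needed:,.0f}")
else:
    print("target is below the fitted asymptote; more data alone will not reach it")
```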
arXiv Detail & Related papers (2024-04-04T13:55:06Z)
- FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets [9.714447724811842]
This paper introduces a distinctive approach anchored in the Instruction Tuning paradigm for open-source large language models.
We capitalize on the interoperability of open-source models, ensuring a seamless and transparent integration.
The paper presents a benchmarking scheme designed for end-to-end training and testing, employing a cost-effective progression.
arXiv Detail & Related papers (2023-10-07T12:52:58Z)
- Confidence Ranking for CTR Prediction [11.071444869776725]
We propose a novel framework, named Confidence Ranking, which designs the optimization objective as a ranking function.
Our experiments show that the introduction of confidence ranking loss can outperform all baselines on the CTR prediction tasks of public and industrial datasets.
This framework has been deployed in the advertisement system of JD.com to serve the main traffic in the fine-rank stage.
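As a rough illustration of a ranking objective for CTR, here is a generic pairwise logistic ranking loss; the paper's Confidence Ranking objective is defined against a baseline model, which this sketch simplifies away:

```python
# Generic pairwise ranking-loss sketch (illustrative, not the paper's exact
# loss): clicked impressions should be scored above non-clicked ones.
import torch

def pairwise_ranking_loss(scores_pos, scores_neg):
    """Logistic loss over all (positive, negative) pairs:
    mean of log(1 + exp(neg - pos))."""
    diff = scores_pos.unsqueeze(1) - scores_neg.unsqueeze(0)  # all pairs
    return torch.nn.functional.softplus(-diff).mean()

# Toy scores from a CTR model for clicked (pos) and unclicked (neg) ads.
scores_pos = torch.tensor([2.1, 1.3, 0.7], requires_grad=True)
scores_neg = torch.tensor([0.9, -0.2, 0.1, -1.5])

loss = pairwise_ranking_loss(scores_pos, scores_neg)
loss.backward()  # gradients push positive scores up relative to negatives
print(float(loss))
```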
arXiv Detail & Related papers (2023-06-28T07:31:00Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds the threshold.
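A compact sketch of the ATC recipe as summarized above; the max-confidence score and the roughly calibrated toy data are assumptions (the paper also considers other confidence functions, such as negative entropy):

```python
# ATC sketch: choose a threshold t on labeled source-validation confidences
# so that the fraction above t matches source accuracy, then predict target
# accuracy as the fraction of unlabeled target confidences above t.
import numpy as np

def atc_predict(src_conf, src_correct, tgt_conf):
    """src_conf: confidences on labeled source validation data.
    src_correct: 0/1 correctness of those predictions.
    tgt_conf: confidences on unlabeled target data."""
    src_acc = src_correct.mean()
    # t is the (1 - accuracy)-quantile, so P(src_conf > t) == src_acc.
    t = np.quantile(src_conf, 1.0 - src_acc)
    return (tgt_conf > t).mean()

rng = np.random.default_rng(0)
src_conf = rng.beta(5, 2, size=1000)                    # toy confidences
src_correct = (rng.random(1000) < src_conf).astype(float)  # ~calibrated
tgt_conf = rng.beta(4, 3, size=1000)                    # shifted target
print("predicted target accuracy:", atc_predict(src_conf, src_correct, tgt_conf))
```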
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees [63.62448343531963]
We propose a combination of the existing paradigms, intelligently sampling which responses should be scored by humans.
We observe significant gains in accuracy (19.80% increase on average) and quadratic weighted kappa (QWK) (25.60% on average) with a relatively small human budget.
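As a hedged illustration of the metric and the basic idea (the paper's sampling strategy is more sophisticated than the confidence heuristic below): spend the human budget on the responses the model is least sure about, then measure quadratic weighted kappa (QWK):

```python
# Toy hybrid human/machine scoring sketch: route the least-confident
# responses to human raters and compare QWK before and after.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(1)
gold = rng.integers(0, 5, size=500)                            # gold scores 0-4
machine = np.clip(gold + rng.integers(-1, 2, size=500), 0, 4)  # noisy auto scores
confidence = rng.random(500)                                   # toy confidences

print("QWK (machine only):", cohen_kappa_score(gold, machine, weights="quadratic"))

budget = 100                                      # responses humans can score
worst = np.argsort(confidence)[:budget]           # least confident first
hybrid = machine.copy()
hybrid[worst] = gold[worst]                       # humans score these exactly
print("QWK (hybrid):", cohen_kappa_score(gold, hybrid, weights="quadratic"))
```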
arXiv Detail & Related papers (2021-11-17T05:00:51Z)
- SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning [63.192289553021816]
Progress toward the United Nations Sustainable Development Goals has been hindered by a lack of data on key environmental and socioeconomic indicators.
Recent advances in machine learning have made it possible to utilize abundant, frequently-updated, and globally available data, such as from satellites or social media.
In this paper, we introduce SustainBench, a collection of 15 benchmark tasks across 7 SDGs.
arXiv Detail & Related papers (2021-11-08T18:59:04Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
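A sketch of the linearization step described above, as we understand it: tag labels are inserted into the token stream before their words (O tags are dropped), so a language model trained on these strings can generate new labeled sentences. Helper names are ours:

```python
# Linearize tagged sentences for LM training, and recover (tokens, tags)
# from generated strings. Illustrative helpers, not the DAGA codebase.
def linearize(tokens, tags):
    out = []
    for tok, tag in zip(tokens, tags):
        if tag != "O":          # only non-O tags are inserted
            out.append(tag)
        out.append(tok)
    return " ".join(out)

def delinearize(text, tagset):
    tokens, tags, pending = [], [], "O"
    for piece in text.split():
        if piece in tagset:
            pending = piece     # tag applies to the next token
        else:
            tokens.append(piece)
            tags.append(pending)
            pending = "O"
    return tokens, tags

line = linearize(["John", "lives", "in", "Berlin"], ["B-PER", "O", "O", "B-LOC"])
print(line)  # "B-PER John lives in B-LOC Berlin"
print(delinearize(line, {"B-PER", "I-PER", "B-LOC", "I-LOC"}))
```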
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
- Bandit Data-Driven Optimization [62.01362535014316]
There are four major pain points that a machine learning pipeline must overcome in order to be useful in practice.
We introduce bandit data-driven optimization, the first iterative prediction-prescription framework to address these pain points.
We propose PROOF, a novel algorithm for this framework, and formally prove that it achieves no regret.
arXiv Detail & Related papers (2020-08-26T17:50:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.