Harnessing the Web and Knowledge Graphs for Automated Impact Investing Scoring
- URL: http://arxiv.org/abs/2308.02622v1
- Date: Fri, 4 Aug 2023 15:14:16 GMT
- Title: Harnessing the Web and Knowledge Graphs for Automated Impact Investing Scoring
- Authors: Qingzhi Hu, Daniel Daza, Laurens Swinkels, Kristina Ūsaitė, Robbert-Jan 't Hoen, Paul Groth
- Abstract summary: We describe a data-driven system that seeks to automate the process of creating a Sustainable Development Goals (SDG) framework.
We propose a novel method for collecting and filtering a dataset of texts from different web sources and a knowledge graph relevant to a set of companies.
Our results indicate that our best performing model can accurately predict SDG scores with a micro average F1 score of 0.89.
- Score: 2.4107880640624706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Sustainable Development Goals (SDGs) were introduced by the United
Nations in order to encourage policies and activities that help guarantee human
prosperity and sustainability. SDG frameworks produced in the finance industry
are designed to provide scores that indicate how well a company aligns with
each of the 17 SDGs. This scoring enables a consistent assessment of
investments that have the potential to build an inclusive and sustainable
economy. As a result of the high quality and reliability required by such
frameworks, the process of creating and maintaining them is time-consuming and
requires extensive domain expertise. In this work, we describe a data-driven
system that seeks to automate the process of creating an SDG framework. First,
we propose a novel method for collecting and filtering a dataset of texts from
different web sources and a knowledge graph relevant to a set of companies. We
then implement and deploy classifiers trained with this data for predicting
scores of alignment with SDGs for a given company. Our results indicate that
our best performing model can accurately predict SDG scores with a micro
average F1 score of 0.89, demonstrating the effectiveness of the proposed
solution. We further describe how integration of the models into human
analysts' workflows can be facilitated by providing explanations in the form of data
relevant to a predicted score. We find that our proposed solution enables
access to a large amount of information that analysts would normally not be
able to process, resulting in an accurate prediction of SDG scores at a
fraction of the cost.
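For concreteness, the task the abstract describes can be framed as multi-label text classification over the 17 SDGs, evaluated with micro-averaged F1. The sketch below is an illustrative assumption (toy data, TF-IDF features, logistic regression), not the authors' implementation:

```python
# Minimal sketch (not the authors' system): map company-related web text to
# binary SDG-alignment labels and evaluate with micro-averaged F1.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import make_pipeline

# Toy corpus: one snippet of web text per company (illustrative only).
texts = [
    "solar farms and wind turbines for affordable clean energy",
    "microloans that reduce poverty in rural communities",
    "coastal cleanup programs protecting marine ecosystems",
    "quarterly earnings call transcript with no ESG content",
]
# Binary labels: rows = companies, columns = SDGs. Only 3 of the 17 goals
# are shown for brevity (SDG 1 "No Poverty", SDG 7 "Affordable and Clean
# Energy", SDG 14 "Life Below Water").
y = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
])

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    MultiOutputClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(texts, y)
pred = clf.predict(texts)

# Micro-averaging pools true/false positives across all SDG labels before
# computing F1, which is the metric the abstract reports (0.89).
print("micro F1:", f1_score(y, pred, average="micro"))
```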
Related papers
- A Scalable Data-Driven Framework for Systematic Analysis of SEC 10-K Filings Using Large Language Models [0.0]
We propose a novel data-driven approach to analyze and rate the performance of companies based on their SEC 10-K filings.
The proposed scheme is then implemented on an interactive GUI as a no-code solution for running the data pipeline and creating the visualizations.
The application showcases the rating results and provides year-on-year comparisons of company performance.
arXiv Detail & Related papers (2024-09-26T06:57:22Z)
- How Much Data are Enough? Investigating Dataset Requirements for Patch-Based Brain MRI Segmentation Tasks [74.21484375019334]
Training deep neural networks reliably requires access to large-scale datasets.
To mitigate both the time and financial costs associated with model development, a clear understanding of the amount of data required to train a satisfactory model is crucial.
This paper proposes a strategic framework for estimating the amount of annotated data required to train patch-based segmentation networks.
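One common way to frame such an estimate, sketched below under the assumption of a power-law learning curve (the paper's actual framework may differ), is to fit error(n) = a·n^(−b) + c to a few pilot runs and extrapolate to a target error:

```python
# Generic learning-curve extrapolation sketch (an assumption, not
# necessarily this paper's method): fit a power law to pilot runs and
# solve for the dataset size that reaches a target validation error.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a * np.power(n, -b) + c

# Pilot results: (training set size, validation error) -- illustrative numbers.
sizes = np.array([50, 100, 200, 400, 800], dtype=float)
errors = np.array([0.42, 0.32, 0.25, 0.20, 0.17])

(a, b, c), _ = curve_fit(power_law, sizes, errors, p0=(1.0, 0.5, 0.1))

target = 0.15  # desired validation error
if target > c:
    n_needed = (a / (target - c)) ** (1.0 / b)
    print(f"estimated annotated samples needed: {n_needed:,.0f}")
else:
    print("target is below the fitted asymptote; more data alone will not reach it")
```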
arXiv Detail & Related papers (2024-04-04T13:55:06Z)
- FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets [9.714447724811842]
This paper introduces a distinctive approach anchored in the Instruction Tuning paradigm for open-source large language models.
We capitalize on the interoperability of open-source models, ensuring a seamless and transparent integration.
The paper presents a benchmarking scheme designed for end-to-end training and testing, employing a cost-effective progression.
arXiv Detail & Related papers (2023-10-07T12:52:58Z)
- Confidence Ranking for CTR Prediction [11.071444869776725]
We propose a novel framework, named Confidence Ranking, which designs the optimization objective as a ranking function.
Our experiments show that the introduction of confidence ranking loss can outperform all baselines on the CTR prediction tasks of public and industrial datasets.
This framework has been deployed in the advertisement system of JD.com to serve the main traffic in the fine-rank stage.
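As a rough illustration of a ranking objective for CTR, here is a generic pairwise logistic ranking loss; the paper's Confidence Ranking objective is defined against a baseline model, which this sketch simplifies away:

```python
# Generic pairwise ranking-loss sketch (illustrative, not the paper's exact
# loss): clicked impressions should be scored above non-clicked ones.
import torch

def pairwise_ranking_loss(scores_pos, scores_neg):
    """Logistic loss over all (positive, negative) pairs:
    mean of log(1 + exp(neg - pos))."""
    diff = scores_pos.unsqueeze(1) - scores_neg.unsqueeze(0)  # all pairs
    return torch.nn.functional.softplus(-diff).mean()

# Toy scores from a CTR model for clicked (pos) and unclicked (neg) ads.
scores_pos = torch.tensor([2.1, 1.3, 0.7], requires_grad=True)
scores_neg = torch.tensor([0.9, -0.2, 0.1, -1.5])

loss = pairwise_ranking_loss(scores_pos, scores_neg)
loss.backward()  # gradients push positive scores up relative to negatives
print(float(loss))
```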
arXiv Detail & Related papers (2023-06-28T07:31:00Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds the threshold.
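A compact sketch of the ATC recipe as summarized above; the max-confidence score and the roughly calibrated toy data are assumptions (the paper also considers other confidence functions, such as negative entropy):

```python
# ATC sketch: choose a threshold t on labeled source-validation confidences
# so that the fraction above t matches source accuracy, then predict target
# accuracy as the fraction of unlabeled target confidences above t.
import numpy as np

def atc_predict(src_conf, src_correct, tgt_conf):
    """src_conf: confidences on labeled source validation data.
    src_correct: 0/1 correctness of those predictions.
    tgt_conf: confidences on unlabeled target data."""
    src_acc = src_correct.mean()
    # t is the (1 - accuracy)-quantile, so P(src_conf > t) == src_acc.
    t = np.quantile(src_conf, 1.0 - src_acc)
    return (tgt_conf > t).mean()

rng = np.random.default_rng(0)
src_conf = rng.beta(5, 2, size=1000)                    # toy confidences
src_correct = (rng.random(1000) < src_conf).astype(float)  # ~calibrated
tgt_conf = rng.beta(4, 3, size=1000)                    # shifted target
print("predicted target accuracy:", atc_predict(src_conf, src_correct, tgt_conf))
```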
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees [63.62448343531963]
We propose a combination of the existing paradigms, intelligently sampling which responses should be scored by humans.
We observe significant gains in accuracy (19.80% increase on average) and quadratic weighted kappa (QWK) (25.60% on average) with a relatively small human budget.
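As a hedged illustration of the metric and the basic idea (the paper's sampling strategy is more sophisticated than the confidence heuristic below): spend the human budget on the responses the model is least sure about, then measure quadratic weighted kappa (QWK):

```python
# Toy hybrid human/machine scoring sketch: route the least-confident
# responses to human raters and compare QWK before and after.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(1)
gold = rng.integers(0, 5, size=500)                            # gold scores 0-4
machine = np.clip(gold + rng.integers(-1, 2, size=500), 0, 4)  # noisy auto scores
confidence = rng.random(500)                                   # toy confidences

print("QWK (machine only):", cohen_kappa_score(gold, machine, weights="quadratic"))

budget = 100                                      # responses humans can score
worst = np.argsort(confidence)[:budget]           # least confident first
hybrid = machine.copy()
hybrid[worst] = gold[worst]                       # humans score these exactly
print("QWK (hybrid):", cohen_kappa_score(gold, hybrid, weights="quadratic"))
```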
arXiv Detail & Related papers (2021-11-17T05:00:51Z)
- SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning [63.192289553021816]
Progress toward the United Nations Sustainable Development Goals has been hindered by a lack of data on key environmental and socioeconomic indicators.
Recent advances in machine learning have made it possible to utilize abundant, frequently-updated, and globally available data, such as from satellites or social media.
In this paper, we introduce SustainBench, a collection of 15 benchmark tasks across 7 SDGs.
arXiv Detail & Related papers (2021-11-08T18:59:04Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
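A sketch of the linearization step described above, as we understand it: tag labels are inserted into the token stream before their words (O tags are dropped), so a language model trained on these strings can generate new labeled sentences. Helper names are ours:

```python
# Linearize tagged sentences for LM training, and recover (tokens, tags)
# from generated strings. Illustrative helpers, not the DAGA codebase.
def linearize(tokens, tags):
    out = []
    for tok, tag in zip(tokens, tags):
        if tag != "O":          # only non-O tags are inserted
            out.append(tag)
        out.append(tok)
    return " ".join(out)

def delinearize(text, tagset):
    tokens, tags, pending = [], [], "O"
    for piece in text.split():
        if piece in tagset:
            pending = piece     # tag applies to the next token
        else:
            tokens.append(piece)
            tags.append(pending)
            pending = "O"
    return tokens, tags

line = linearize(["John", "lives", "in", "Berlin"], ["B-PER", "O", "O", "B-LOC"])
print(line)  # "B-PER John lives in B-LOC Berlin"
print(delinearize(line, {"B-PER", "I-PER", "B-LOC", "I-LOC"}))
```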
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
- Bandit Data-Driven Optimization [62.01362535014316]
There are four major pain points that a machine learning pipeline must overcome in order to be useful in practice.
We introduce bandit data-driven optimization, the first iterative prediction-prescription framework to address these pain points.
We propose PROOF, a novel algorithm for this framework, and formally prove that it achieves no regret.
arXiv Detail & Related papers (2020-08-26T17:50:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.