sustain.AI: a Recommender System to analyze Sustainability Reports
- URL: http://arxiv.org/abs/2305.08711v3
- Date: Fri, 26 May 2023 07:49:33 GMT
- Title: sustain.AI: a Recommender System to analyze Sustainability Reports
- Authors: Lars Hillebrand, Maren Pielka, David Leonhard, Tobias Deußer, Tim
Dilmaghani, Bernd Kliem, Rüdiger Loitz, Milad Morad, Christian Temath,
Thiago Bell, Robin Stenzel, Rafet Sifa
- Abstract summary: sustain.AI is an intelligent, context-aware recommender system that assists auditors and financial investors.
We evaluate our model on two novel German sustainability reporting data sets and consistently achieve significantly higher recommendation performance than multiple strong baselines.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present sustain.AI, an intelligent, context-aware recommender system that
assists auditors, financial investors, and the general public in efficiently
analyzing companies' sustainability reports. The tool leverages an end-to-end
trainable architecture that couples a BERT-based encoding module with a
multi-label classification head to match relevant text passages from
sustainability reports to the respective regulations of the Global Reporting
Initiative (GRI) standards. We evaluate our model on two novel German
sustainability reporting data sets and consistently achieve significantly
higher recommendation performance compared to multiple strong baselines.
Furthermore, sustain.AI is publicly available to everyone at
https://sustain.ki.nrw/.
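The architecture described in the abstract (a BERT encoder feeding a multi-label classification head over GRI categories) can be pictured as the minimal sketch below. It is an illustration only: the checkpoint name bert-base-german-cased and the label count NUM_GRI_CLASSES = 89 are assumptions, not details taken from the paper.

import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

NUM_GRI_CLASSES = 89  # assumption for illustration; the abstract does not state the label count

class GriRecommender(nn.Module):
    # BERT-based encoding module coupled with a multi-label classification head.
    def __init__(self, checkpoint: str = "bert-base-german-cased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        self.head = nn.Linear(self.encoder.config.hidden_size, NUM_GRI_CLASSES)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] representation of the passage
        return self.head(cls)              # one logit per GRI category

tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")
model = GriRecommender()
batch = tokenizer(["Wir haben unsere CO2-Emissionen um 30 % reduziert."],
                  return_tensors="pt", padding=True, truncation=True)
scores = torch.sigmoid(model(batch["input_ids"], batch["attention_mask"]))
# Training would minimize nn.BCEWithLogitsLoss() against multi-hot GRI labels;
# at inference, the top-scoring categories serve as passage-to-GRI recommendations.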
Related papers
- AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability.
The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z)
- Scenario-Wise Rec: A Multi-Scenario Recommendation Benchmark
We introduce our benchmark, Scenario-Wise Rec, which comprises 6 public datasets and 12 benchmark models, along with a training and evaluation pipeline.
We aim for this benchmark to offer researchers valuable insights from prior work, enabling the development of novel models.
arXiv Detail & Related papers (2024-12-23T08:15:34Z)
- Nano-ESG: Extracting Corporate Sustainability Information from News Articles
We present a novel dataset of more than 840,000 news articles which were gathered for major German companies between January 2023 and September 2024.
By applying a mixture of Natural Language Processing techniques, we first identify relevant articles, before summarizing them and extracting their sustainability-related sentiment and aspects.
arXiv Detail & Related papers (2024-12-19T17:43:27Z)
- OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain
We introduce an omnidirectional and automatic RAG benchmark, OmniEval, in the financial domain.
Our benchmark is characterized by its multi-dimensional evaluation framework.
Our experiments demonstrate the comprehensiveness of OmniEval, which includes extensive test datasets.
arXiv Detail & Related papers (2024-12-17T15:38:42Z)
- InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
We introduce InsightBench, a benchmark dataset with three key features.
It consists of 100 datasets representing diverse business use cases such as finance and incident management.
Unlike existing benchmarks focusing on answering single queries, InsightBench evaluates agents based on their ability to perform end-to-end data analytics.
arXiv Detail & Related papers (2024-07-08T22:06:09Z)
- PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models
ProxyQA is an innovative framework dedicated to assessing long-form text generation.
It comprises in-depth human-curated meta-questions spanning various domains, each accompanied by specific proxy-questions with pre-annotated answers.
It assesses the generated content's quality through the evaluator's accuracy in addressing the proxy-questions.
arXiv Detail & Related papers (2024-01-26T18:12:25Z)
- Glitter or Gold? Deriving Structured Insights from Sustainability Reports via Large Language Models
This study uses Information Extraction (IE) methods to extract structured insights related to ESG aspects from companies' sustainability reports.
We then leverage graph-based representations to conduct statistical analyses concerning the extracted insights.
arXiv Detail & Related papers (2023-10-09T11:34:41Z)
- FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets
This paper introduces a distinctive approach anchored in the Instruction Tuning paradigm for open-source large language models.
We capitalize on the interoperability of open-source models, ensuring a seamless and transparent integration.
The paper presents a benchmarking scheme designed for end-to-end training and testing, employing a cost-effective progression.
arXiv Detail & Related papers (2023-10-07T12:52:58Z)
- Evaluation of Faithfulness Using the Longest Supported Subsequence
We introduce a novel approach to evaluate the faithfulness of machine-generated text by computing the longest noncontiguous subsequence of the claim that is supported by the context.
Using a new human-annotated dataset, we finetune a model to generate the Longest Supported Subsequence (LSS).
Our proposed metric demonstrates an 18% improvement over the prevailing state-of-the-art faithfulness metric on our dataset.
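As a purely illustrative reading of the metric: if support could be judged independently per claim token (the paper instead finetunes a model to generate the LSS), the longest noncontiguous supported subsequence is simply the supported tokens in their original order, and the score reduces to the sketch below. The support predicate here is a naive, hypothetical stand-in.

from typing import Callable, List

def lss_score(claim_tokens: List[str], context: str,
              is_supported: Callable[[str, str], bool]) -> float:
    # Under per-token support judgments, the longest noncontiguous supported
    # subsequence is exactly the supported tokens in their original order.
    lss = [tok for tok in claim_tokens if is_supported(tok, context)]
    return len(lss) / max(len(claim_tokens), 1)

# Toy usage with naive lexical overlap standing in for the support judgment:
context = "The company cut emissions by 30 percent in 2022."
claim = "emissions fell 30 percent in 2021".split()
print(lss_score(claim, context, lambda tok, ctx: tok in ctx))  # 4/6 ≈ 0.67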
arXiv Detail & Related papers (2023-08-23T14:18:44Z)
- CHATREPORT: Democratizing Sustainability Disclosure Analysis through LLM-based Tools
ChatReport is a novel LLM-based system to automate the analysis of corporate sustainability reports.
We make our methodology, annotated datasets, and generated analyses of 1015 reports publicly available.
arXiv Detail & Related papers (2023-07-28T18:58:16Z)
- Paradigm Shift in Sustainability Disclosure Analysis: Empowering Stakeholders with CHATREPORT, a Language Model-Based Tool
This paper introduces a novel approach to enhance Large Language Models (LLMs) with expert knowledge to automate the analysis of corporate sustainability reports.
We christen our tool CHATREPORT, and apply it in a first use case to assess corporate climate risk disclosures.
arXiv Detail & Related papers (2023-06-27T14:46:47Z)
- On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training
Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind social media texts or reviews.
We propose to enhance the ABSA robustness by systematically rethinking the bottlenecks from all possible angles, including model, data, and training.
arXiv Detail & Related papers (2023-04-19T11:07:43Z)
- GreenDB -- A Dataset and Benchmark for Extraction of Sustainability Information of Consumer Goods
We present GreenDB, a database that collects products from European online shops on a weekly basis.
As a proxy for the products' sustainability, it relies on sustainability labels, which are evaluated by experts.
We present initial results demonstrating that ML models trained with our data can reliably predict the sustainability label of products.
arXiv Detail & Related papers (2022-07-21T19:59:42Z)
- CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark
General-purpose language intelligence evaluation has been a longstanding goal for natural language processing.
We argue that for general-purpose language intelligence evaluation, the benchmark itself needs to be comprehensive and systematic.
We propose CUGE, a Chinese Language Understanding and Generation Evaluation benchmark designed to be comprehensive and systematic.
arXiv Detail & Related papers (2021-12-27T11:08:58Z)
- Multisource AI Scorecard Table for System Evaluation
The paper describes a Multisource AI Scorecard Table (MAST) that provides the developer and user of an artificial intelligence (AI)/machine learning (ML) system with a standard checklist.
The paper explores how the analytic tradecraft standards outlined in Intelligence Community Directive (ICD) 203 can provide a framework for assessing the performance of an AI system.
arXiv Detail & Related papers (2021-02-08T03:37:40Z)