Climate AI for Corporate Decarbonization Metrics Extraction
- URL: http://arxiv.org/abs/2411.03402v1
- Date: Tue, 05 Nov 2024 18:37:51 GMT
- Title: Climate AI for Corporate Decarbonization Metrics Extraction
- Authors: Aditya Dave, Mengchen Zhu, Dapeng Hu, Sachin Tiwari,
- Abstract summary: We introduce the Climate Artificial Intelligence for Corporate Decarbonization Metrics Extraction (CAI) model and pipeline.
We demonstrate that the process improves data collection efficiency and accuracy by automating data curation, validation, and metric scoring from public corporate disclosures.
- Score: 7.522638089716454
- Abstract: Corporate Greenhouse Gas (GHG) emission targets are important metrics in sustainable investing [12, 16]. To provide a comprehensive view of company emission objectives, we propose an approach to source these metrics from company public disclosures. Without automation, curating these metrics manually is a labor-intensive process that requires combing through lengthy corporate sustainability disclosures that often do not follow a standard format. Furthermore, the resulting dataset needs to be validated thoroughly by Subject Matter Experts (SMEs), further lengthening the time-to-market. We introduce the Climate Artificial Intelligence for Corporate Decarbonization Metrics Extraction (CAI) model and pipeline, a novel approach utilizing Large Language Models (LLMs) to extract and validate linked metrics from corporate disclosures. We demonstrate that the process improves data collection efficiency and accuracy by automating data curation, validation, and metric scoring from public corporate disclosures. We further show that our results are agnostic to the choice of LLMs. This framework can be applied broadly to information extraction from textual data.
Related papers
- A Scalable Data-Driven Framework for Systematic Analysis of SEC 10-K Filings Using Large Language Models [0.0]
We propose a novel data-driven approach to analyze and rate the performance of companies based on their SEC 10-K filings.
The proposed scheme is then implemented on an interactive GUI as a no-code solution for running the data pipeline and creating the visualizations.
The application showcases the rating results and provides year-on-year comparisons of company performance.
arXiv Detail & Related papers (2024-09-26T06:57:22Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- Automating Customer Needs Analysis: A Comparative Study of Large Language Models in the Travel Industry [2.4244694855867275]
Large Language Models (LLMs) have emerged as powerful tools for extracting valuable insights from vast amounts of textual data.
In this study, we conduct a comparative analysis of LLMs for the extraction of travel customer needs from TripAdvisor posts.
Our findings highlight the efficacy of open-source LLMs, particularly Mistral 7B, in achieving performance comparable to larger closed models.
arXiv Detail & Related papers (2024-04-27T18:28:10Z)
- Data Acquisition: A New Frontier in Data-centric AI [65.90972015426274]
We first present an investigation of current data marketplaces, revealing a lack of platforms offering detailed information about datasets.
We then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers.
Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in Machine Learning.
arXiv Detail & Related papers (2023-11-22T22:15:17Z)
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal tool to identify discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
- Harnessing the Web and Knowledge Graphs for Automated Impact Investing Scoring [2.4107880640624706]
We describe a data-driven system that seeks to automate the process of creating a Sustainable Development Goals (SDG) framework.
We propose a novel method for collecting and filtering a dataset of texts from different web sources and a knowledge graph relevant to a set of companies.
Our results indicate that our best performing model can accurately predict SDG scores with a micro average F1 score of 0.89.
arXiv Detail & Related papers (2023-08-04T15:14:16Z)
- Benchmarking Automated Machine Learning Methods for Price Forecasting Applications [58.720142291102135]
We show the possibility of substituting manually created ML pipelines with automated machine learning (AutoML) solutions.
Based on the CRISP-DM process, we split the manual ML pipeline into a machine learning part and a non-machine learning part.
In a case study for the industrial use case of price forecasting, we show that domain knowledge combined with AutoML can weaken the dependence on ML experts.
arXiv Detail & Related papers (2023-04-28T10:27:38Z)
- METAM: Goal-Oriented Data Discovery [9.73435089036831]
METAM is a goal-oriented framework that queries the downstream task with a candidate dataset, forming a feedback loop that automatically steers the discovery and augmentation process.
We show METAM's theoretical guarantees and demonstrate those empirically on a broad set of tasks.
arXiv Detail & Related papers (2023-04-18T15:42:25Z)
- Greenhouse gases emissions: estimating corporate non-reported emissions using interpretable machine learning [0.0]
As of 2022, greenhouse gas (GHG) emissions reporting and auditing are not yet compulsory for all companies.
We propose a machine learning-based model to estimate the scope 1 and scope 2 GHG emissions of companies that do not yet report them.
arXiv Detail & Related papers (2022-12-21T08:36:02Z)
- Privacy Adhering Machine Un-learning in NLP [66.17039929803933]
Real-world industry applications use Machine Learning to build models on user data.
Data-removal mandates require effort in terms of both data and model retraining.
Continuous removal of data and model retraining steps do not scale.
We propose Machine Unlearning to tackle this challenge.
arXiv Detail & Related papers (2022-12-19T16:06:45Z)
- Exploring validation metrics for offline model-based optimisation with diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground oracle can be trained and used in place of it during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z)