Group Reasoning Emission Estimation Networks
- URL: http://arxiv.org/abs/2502.06874v1
- Date: Sat, 08 Feb 2025 09:02:43 GMT
- Title: Group Reasoning Emission Estimation Networks
- Authors: Yanming Guo, Xiao Qian, Kevin Credit, Jin Ma
- Abstract summary: We introduce an AI-driven carbon accounting framework that standardizes enterprise-level emission estimation.
We use a novel reasoning approach with large language models (LLMs).
Experiments on 1,114 NAICS categories yield state-of-the-art performance.
- Score: 11.479035866165926
- Abstract: Accurate greenhouse gas (GHG) emission reporting is critical for governments, businesses, and investors. However, adoption remains limited, particularly among small and medium enterprises, due to high implementation costs, fragmented emission factor databases, and a lack of robust sector classification methods. To address these challenges, we introduce Group Reasoning Emission Estimation Networks (GREEN), an AI-driven carbon accounting framework that standardizes enterprise-level emission estimation, constructs a large-scale benchmark dataset, and leverages a novel reasoning approach with large language models (LLMs). Specifically, we compile textual descriptions for 20,850 companies with validated North American Industry Classification System (NAICS) labels and align these with an economic model of carbon intensity factors. By reframing sector classification as an information retrieval task, we fine-tune Sentence-BERT models using a contrastive learning loss. To overcome the limitations of single-stage models in handling thousands of hierarchical categories, we propose a Group Reasoning method that ensembles LLM classifiers based on the natural NAICS ontology, decomposing the task into multiple sub-classification steps. We theoretically prove that this approach reduces classification uncertainty and computational complexity. Experiments on 1,114 NAICS categories yield state-of-the-art performance (83.68% Top-1, 91.47% Top-10 accuracy), and case studies on 20 companies report a mean absolute percentage error (MAPE) of 45.88%. The project is available at: https://huggingface.co/datasets/Yvnminc/ExioNAICS.
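The hierarchical decomposition behind Group Reasoning can be illustrated with a toy sketch: instead of one flat decision over all 1,114 codes, the classifier works level by level along the NAICS ontology, so each step chooses among far fewer options. The tree, the keyword scorer, and the example codes below are illustrative assumptions, not GREEN's actual components; a real system would score each candidate with an LLM or a fine-tuned Sentence-BERT model.

```python
# Hypothetical sketch of the Group Reasoning idea: classify along the
# NAICS hierarchy (2-digit sector, then 3-digit subsector, then leaf)
# instead of making one flat decision over every code.

NAICS_TREE = {
    "31": {                               # Manufacturing (2-digit sector)
        "311": ["311111", "311119"],      # Food manufacturing
        "312": ["312111", "312112"],      # Beverage manufacturing
    },
    "52": {                               # Finance and insurance
        "522": ["522110", "522120"],      # Depository credit intermediation
    },
}

def score(description: str, code: str) -> float:
    """Toy relevance score: token overlap with a hypothetical keyword
    table. A real system would use an LLM or Sentence-BERT similarity."""
    KEYWORDS = {"31": "factory food beverage", "311": "food",
                "312": "beverage", "52": "bank finance loan",
                "522": "bank deposit"}
    words = set(description.lower().split())
    return len(words & set(KEYWORDS.get(code, code).split()))

def group_reason(description: str) -> str:
    # Step 1: choose the 2-digit sector.
    sector = max(NAICS_TREE, key=lambda c: score(description, c))
    # Step 2: choose the 3-digit subsector within that sector.
    sub = max(NAICS_TREE[sector], key=lambda c: score(description, c))
    # Step 3: a real scorer would continue down to the 6-digit code;
    # here we simply return the first leaf of the chosen subsector.
    return NAICS_TREE[sector][sub][0]

print(group_reason("a food factory producing packaged snacks"))  # a 311xxx code
```

Each step narrows the candidate set, which is the intuition behind the paper's claim that decomposition reduces both classification uncertainty and computational complexity.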
Related papers
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z) - Explainable automatic industrial carbon footprint estimation from bank transaction classification using natural language processing [6.354358255072839]
The proposed solution estimates the CO2 emissions associated with bank transactions.
It is based on an evaluation of the influence of the input terms extracted from the descriptions of transactions using locally interpretable models.
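Term-level influence of the kind described above can be approximated with a simple leave-one-term-out probe: remove each term and measure how much the model's output drops. The keyword scorer below is a stand-in assumption for the paper's locally interpretable models.

```python
# Leave-one-term-out sketch of "influence of input terms" on a toy
# transaction CO2 scorer. The keyword table is illustrative only.

EMISSION_TERMS = {"fuel": 2.0, "flight": 3.0, "grocery": 0.5}

def co2_score(description: str) -> float:
    """Toy stand-in for a transaction-classification model."""
    return sum(EMISSION_TERMS.get(w, 0.0) for w in description.lower().split())

def term_influence(description: str) -> dict:
    """Influence of each term = score drop when that term is removed."""
    base = co2_score(description)
    words = description.lower().split()
    return {w: base - co2_score(" ".join(x for x in words if x != w))
            for w in set(words)}

print(term_influence("airline flight fuel surcharge"))
```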
arXiv Detail & Related papers (2024-05-23T12:43:06Z) - Language Model Cascades: Token-level uncertainty and beyond [65.38515344964647]
Recent advances in language models (LMs) have led to significant improvements in quality on complex NLP tasks.
Cascading offers a simple strategy to achieve more favorable cost-quality tradeoffs.
We show that incorporating token-level uncertainty through learned post-hoc deferral rules can significantly outperform simple aggregation strategies.
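A minimal two-model cascade with a token-level deferral rule can be sketched as follows; the mean-entropy statistic and the threshold are toy assumptions, not the paper's learned post-hoc deferral rule.

```python
import math

# Two-model cascade sketch: answer with the cheap model unless its
# token-level uncertainty exceeds a threshold, then defer to the
# expensive model. Threshold and models are illustrative assumptions.

def token_uncertainty(token_probs):
    """Mean per-token surprisal (nats) of the small model's predictions."""
    return sum(-math.log(p) for p in token_probs) / len(token_probs)

def cascade(token_probs, small_answer, large_model, threshold=0.5):
    """Defer to the large model only when the small model is uncertain."""
    if token_uncertainty(token_probs) > threshold:
        return large_model()      # expensive fallback
    return small_answer           # cheap path

# Confident small model: low surprisal, no deferral.
print(cascade([0.9, 0.95, 0.99], "cheap", lambda: "expensive"))  # cheap
# Uncertain small model: high surprisal, defer.
print(cascade([0.4, 0.3, 0.5], "cheap", lambda: "expensive"))    # expensive
```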
arXiv Detail & Related papers (2024-04-15T21:02:48Z) - Investigating the Limitation of CLIP Models: The Worst-Performing Categories [53.360239882501325]
Contrastive Language-Image Pre-training (CLIP) provides a foundation model by integrating natural language into visual concepts.
It is usually expected that satisfactory overall accuracy can be achieved across numerous domains through well-designed textual prompts.
However, we found that their performance in the worst categories is significantly inferior to the overall performance.
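The gap between overall and worst-category accuracy is easy to demonstrate: aggregate accuracy can look strong while one class fails badly. The toy predictions below are illustrative.

```python
from collections import defaultdict

# Overall accuracy vs worst-class accuracy on toy (gold, predicted) pairs:
# 100 examples, with the rare "fox" class almost always misclassified.
pairs = ([("cat", "cat")] * 45 + [("dog", "dog")] * 45
         + [("fox", "dog")] * 9 + [("fox", "fox")])

per_class = defaultdict(lambda: [0, 0])  # class -> [correct, total]
for gold, pred in pairs:
    per_class[gold][0] += (gold == pred)
    per_class[gold][1] += 1

overall = sum(g == p for g, p in pairs) / len(pairs)
worst = min(c / t for c, t in per_class.values())
print(overall, worst)  # 0.91 overall, but only 0.1 on the worst class
```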
arXiv Detail & Related papers (2023-10-05T05:37:33Z) - Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation [2.024620791810963]
This study benchmarks the performance of Prompt Tuning and baselines for multi-label text classification.
It is applied to classifying companies into an investment firm's proprietary industry taxonomy.
We confirm that the model's performance is consistent across both well-known and less-known companies.
arXiv Detail & Related papers (2023-09-21T13:45:32Z) - Supply chain emission estimation using large language models [15.605998085195314]
We propose a first-of-a-kind framework that uses domain-adapted NLP foundation models to estimate Scope 3 emissions.
We compare the performance of the proposed framework with state-of-the-art text classification baselines such as TF-IDF, word2vec, and zero-shot learning.
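A TF-IDF baseline of the kind compared against can be sketched with a tiny nearest-neighbor classifier; the corpus, labels, and queries below are illustrative assumptions.

```python
import math
from collections import Counter

# Tiny TF-IDF + cosine-similarity sector classifier: label a query with
# the sector of its most similar reference description.
DOCS = {
    "steel manufacturing plant": "manufacturing",
    "retail bank branch loans": "finance",
    "organic food production": "manufacturing",
}

def build_idf(texts):
    n = len(texts)
    df = Counter(w for t in texts for w in set(t.split()))
    return {w: math.log(n / d) + 1.0 for w, d in df.items()}

def tfidf(text, idf):
    tf = Counter(text.split())
    return {w: c * idf.get(w, 0.0) for w, c in tf.items()}

def cosine(a, b):
    num = sum(a[w] * b.get(w, 0.0) for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def classify(query):
    idf = build_idf(list(DOCS))
    qv = tfidf(query, idf)
    best = max(DOCS, key=lambda d: cosine(qv, tfidf(d, idf)))
    return DOCS[best]

print(classify("food production factory"))
```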
arXiv Detail & Related papers (2023-08-03T13:06:37Z) - Text Classification via Large Language Models [63.1874290788797]
We introduce Clue And Reasoning Prompting (CARP) to address complex linguistic phenomena involved in text classification.
Remarkably, CARP yields new SOTA performances on 4 out of 5 widely-used text-classification benchmarks.
More importantly, we find that CARP delivers impressive abilities on low-resource and domain-adaptation setups.
arXiv Detail & Related papers (2023-05-15T06:24:45Z) - Open World Classification with Adaptive Negative Samples [89.2422451410507]
Open world classification is a task in natural language processing with key practical relevance and impact.
We propose an approach based on adaptive negative samples (ANS), designed to generate effective synthetic open-category samples during the training stage.
ANS achieves significant improvements over state-of-the-art methods.
arXiv Detail & Related papers (2023-03-09T21:12:46Z) - A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification [78.04177357888284]
We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC).
We report an efficient joint framework for low-complexity multi-device ASC, called Acoustic Lottery.
arXiv Detail & Related papers (2021-07-03T16:25:24Z) - Large Scale Legal Text Classification Using Transformer Models [0.0]
We study the performance of transformer-based models in combination with strategies such as generative pretraining, gradual unfreezing and discriminative learning rates.
We quantify the impact of individual steps, such as language model fine-tuning or gradual unfreezing, in an ablation study.
arXiv Detail & Related papers (2020-10-24T11:03:01Z) - Needle in a Haystack: Label-Efficient Evaluation under Extreme Class Imbalance [20.491690754953943]
This paper develops a framework for online evaluation based on adaptive importance sampling.
Experiments demonstrate an average MSE superior to state-of-the-art on fixed label budgets.
arXiv Detail & Related papers (2020-06-12T06:17:26Z)
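The importance-sampling evaluation idea above can be sketched in a few lines: oversample the rare class when spending the label budget, then reweight so the accuracy estimate stays unbiased. The population, toy classifier, and proposal weights below are illustrative assumptions, not the paper's adaptive scheme.

```python
import random

# Label-efficient evaluation under extreme class imbalance: sample items
# to label from a proposal q that favors the rare class, and correct the
# bias with importance weights p_i / q_i (uniform target p_i = 1/n).

random.seed(0)

# 1,000 items: 1% rare positives; the classifier misses half of them.
labels = ["pos"] * 10 + ["neg"] * 990
preds = ["pos"] * 5 + ["neg"] * 5 + ["neg"] * 990

# Proposal q: each positive is 50x as likely to be sampled as a negative.
raw = [50.0 if y == "pos" else 1.0 for y in labels]
total = sum(raw)
q = [w / total for w in raw]

def estimate_accuracy(budget=1000):
    """Importance-sampling estimate of accuracy from `budget` labels."""
    n = len(labels)
    idxs = random.choices(range(n), weights=q, k=budget)
    return sum((labels[i] == preds[i]) * (1.0 / n) / q[i] for i in idxs) / budget

true_acc = sum(y == p for y, p in zip(labels, preds)) / len(labels)
print(true_acc, estimate_accuracy())  # estimate should be close to true_acc
```

Uniform sampling with a small budget would rarely see a positive at all; the reweighted proposal concentrates labels where the classifier's errors live while keeping the estimator unbiased.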
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information (including all summaries) and is not responsible for any consequences arising from its use.