Eco-Amazon: Enriching E-commerce Datasets with Product Carbon Footprint for Sustainable Recommendations
- URL: http://arxiv.org/abs/2602.15508v1
- Date: Tue, 17 Feb 2026 11:30:11 GMT
- Title: Eco-Amazon: Enriching E-commerce Datasets with Product Carbon Footprint for Sustainable Recommendations
- Authors: Giuseppe Spillo, Allegra De Filippo, Cataldo Musto, Michela Milano, Giovanni Semeraro
- Abstract summary: This paper introduces Eco-Amazon, a novel resource designed to bridge the gap between item-level environmental impact data and standard benchmarks. Our contribution is three-fold: (i) the release of the Eco-Amazon datasets, enriching item metadata with PCF signals; (ii) the LLM-based PCF estimation script, which allows researchers to enrich any product catalogue and reproduce our results; (iii) a use case demonstrating how PCF estimates can be exploited to promote more sustainable products.
- Score: 7.062728225568673
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of responsible and sustainable AI, information retrieval and recommender systems must expand their scope beyond traditional accuracy metrics to incorporate environmental sustainability. However, this research line is severely limited by the lack of item-level environmental impact data in standard benchmarks. This paper introduces Eco-Amazon, a novel resource designed to bridge this gap. Our resource consists of an enriched version of three widely used Amazon datasets (i.e., Home, Clothing, and Electronics) augmented with Product Carbon Footprint (PCF) metadata. CO2e emission scores were generated using a zero-shot framework that leverages Large Language Models (LLMs) to estimate item-level PCF based on product attributes. Our contribution is three-fold: (i) the release of the Eco-Amazon datasets, enriching item metadata with PCF signals; (ii) the LLM-based PCF estimation script, which allows researchers to enrich any product catalogue and reproduce our results; (iii) a use case demonstrating how PCF estimates can be exploited to promote more sustainable products. By providing these environmental signals, Eco-Amazon enables the community to develop, benchmark, and evaluate the next generation of sustainable retrieval and recommendation models. Our resource is available at https://doi.org/10.5281/zenodo.18549130, while our source code is available at: http://github.com/giuspillo/EcoAmazon/.
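The use case in contribution (iii), promoting more sustainable products with PCF estimates, can be illustrated with a minimal re-ranking sketch. This is not the authors' method: the linear trade-off, the `alpha` weight, the min-max normalisation, and the names `Item`, `min_max`, and `rerank` are all assumptions made for illustration, and the relevance scores stand in for the output of any base recommender.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    item_id: str
    relevance: float     # score from a base recommender (hypothetical values)
    pcf_kg_co2e: float   # estimated Product Carbon Footprint in kg CO2e


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; return all zeros if they are identical."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rerank(items: List[Item], alpha: float = 0.3) -> List[Item]:
    """Re-rank by (1 - alpha) * relevance - alpha * PCF, both normalised.

    alpha = 0 keeps the original relevance order; alpha = 1 ranks purely
    by lowest estimated footprint. This weighting is an assumed design,
    not taken from the paper.
    """
    rel = min_max([it.relevance for it in items])
    pcf = min_max([it.pcf_kg_co2e for it in items])
    scored = [
        ((1 - alpha) * r - alpha * p, it)
        for r, p, it in zip(rel, pcf, items)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [it for _, it in scored]


if __name__ == "__main__":
    candidates = [
        Item("A", relevance=0.90, pcf_kg_co2e=50.0),
        Item("B", relevance=0.85, pcf_kg_co2e=5.0),
        Item("C", relevance=0.20, pcf_kg_co2e=1.0),
    ]
    print([it.item_id for it in rerank(candidates, alpha=0.3)])
```

With a moderate `alpha`, an item that is nearly as relevant but has a much lower estimated footprint moves ahead of the top-relevance item, while clearly irrelevant items stay at the bottom. Since the PCF values are LLM estimates, a weighting of this kind would need to be tuned and validated against accuracy metrics.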
Related papers
- Towards Autonomous Sustainability Assessment via Multimodal AI Agents [46.77807327332175]
We introduce multimodal AI agents to calculate cradle-to-gate carbon emissions of electronic devices. The approach reduces weeks or months of expert time to under one minute and closes data availability gaps. It yields carbon footprint estimates within 19% of expert LCAs with zero proprietary data.
arXiv Detail & Related papers (2025-07-22T20:49:25Z) - How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference [0.0]
This paper introduces a novel infrastructure-aware benchmarking framework for quantifying the environmental footprint of AI inference across 30 state-of-the-art models as deployed in commercial data centers. Our results show that o3 and DeepSeek-R1 emerge as the most energy-intensive models, consuming over 33 Wh per long prompt, more than 70 times the consumption of GPT-4.1 nano, and that Claude-3.7 Sonnet ranks highest in eco-efficiency. These findings illustrate a growing paradox: Although AI is becoming cheaper and faster, its global adoption drives disproportionate resource consumption.
arXiv Detail & Related papers (2025-05-14T17:47:00Z) - The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility? [54.18519360412294]
Large Language Models (LLMs) must balance between rejecting harmful requests for safety and accommodating legitimate ones for utility. This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance. We analyze experimental results obtained from testing DeepSeek-R1 on our benchmark and reveal the critical ethical concerns raised by this highly acclaimed model.
arXiv Detail & Related papers (2025-01-20T06:35:01Z) - Environmental large language model Evaluation (ELLE) dataset: A Benchmark for Evaluating Generative AI applications in Eco-environment Domain [6.246205449407889]
Generative AI holds significant potential for ecological and environmental applications. The Environmental Large Language model Evaluation (ELLE) dataset is the first benchmark designed to assess large language models. ELLE dataset includes 1,130 question answer pairs across 16 environmental topics, categorized by domain, difficulty, and type.
arXiv Detail & Related papers (2025-01-10T12:48:29Z) - Exploring the Escalation of Source Bias in User, Data, and Recommender System Feedback Loop [65.23044868332693]
We explore how AI-generated content (AIGC) affects the performance and dynamics of recommender systems. In the short term, bias toward AIGC encourages LLM-based content creation, increasing AIGC content, and causing unfair traffic distribution. We propose a debiasing method based on L1-loss optimization to maintain long-term content ecosystem balance.
arXiv Detail & Related papers (2024-05-28T09:34:50Z) - EcoVerse: An Annotated Twitter Dataset for Eco-Relevance Classification, Environmental Impact Analysis, and Stance Detection [0.0]
EcoVerse is an annotated English Twitter dataset of 3,023 tweets spanning a wide spectrum of environmental topics.
We propose a three-level annotation scheme designed for Eco-Relevance Classification and Stance Detection, and introduce an original approach for Environmental Impact Analysis.
arXiv Detail & Related papers (2024-04-08T01:21:11Z) - FREE: The Foundational Semantic Recognition for Modeling Environmental Ecosystems [56.0640340392818]
We introduce a framework, FREE, that enables the use of varying features and available information to train a universal model. The core idea is to map available environmental data into a text space and then convert the traditional predictive modeling task in environmental science to a semantic recognition problem. Our evaluation on two societally important real-world applications, stream water temperature prediction and crop yield prediction, demonstrates the superiority of FREE over multiple baselines.
arXiv Detail & Related papers (2023-11-17T00:53:09Z) - A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact [62.997667081978825]
This study seeks to address the demands of high-performance machine learning models with environmental sustainability.
Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance.
However, superior outcomes were obtained with optimised configurations, albeit with a commensurate increase in resource consumption.
arXiv Detail & Related papers (2023-07-01T15:18:00Z) - GreenDB -- A Dataset and Benchmark for Extraction of Sustainability Information of Consumer Goods [58.31888171187044]
We present GreenDB, a database that collects products from European online shops on a weekly basis.
As proxy for the products' sustainability, it relies on sustainability labels, which are evaluated by experts.
We present initial results demonstrating that ML models trained with our data can reliably predict the sustainability label of products.
arXiv Detail & Related papers (2022-07-21T19:59:42Z) - GreenDB: Toward a Product-by-Product Sustainability Database [2.9971739294416717]
Modern retail platforms rely heavily on Machine Learning (ML) for their search and recommender systems.
No open and publicly available database integrates sustainability information on a product-by-product basis.
We present our proof of concept implementation of a scraping system that creates the GreenDB dataset.
arXiv Detail & Related papers (2022-05-05T20:24:16Z)