EcoVerse: An Annotated Twitter Dataset for Eco-Relevance Classification, Environmental Impact Analysis, and Stance Detection
- URL: http://arxiv.org/abs/2404.05133v1
- Date: Mon, 8 Apr 2024 01:21:11 GMT
- Authors: Francesca Grasso, Stefano Locci, Giovanni Siragusa, Luigi Di Caro
- Abstract summary: EcoVerse is an annotated English Twitter dataset of 3,023 tweets spanning a wide spectrum of environmental topics.
We propose a three-level annotation scheme designed for Eco-Relevance Classification and Stance Detection that also introduces an original approach to Environmental Impact Analysis.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The anthropogenic ecological crisis constitutes a significant challenge that the whole academic community must urgently face, including the Natural Language Processing (NLP) community. While recent years have seen a growing body of work on climate-centric discourse, crucial environmental and ecological topics beyond climate change remain largely unaddressed, despite their importance. Mainstream NLP tasks such as sentiment analysis dominate the field, whereas the analysis of the environmental impacts of specific events and practices remains unexplored in the literature. To address this gap, this paper presents EcoVerse, an annotated English Twitter dataset of 3,023 tweets spanning a wide spectrum of environmental topics. We propose a three-level annotation scheme designed for Eco-Relevance Classification and Stance Detection that also introduces an original approach to Environmental Impact Analysis. We detail the data collection, filtering, and labeling process that led to the creation of the dataset. High inter-annotator agreement indicates that the annotation scheme produces consistent, high-quality annotations. We also present classification experiments using BERT-based models, including ClimateBERT. The results are encouraging, while also suggesting room for a model specifically tailored to environmental texts. The dataset is made freely available to stimulate further research.
Related papers
- Multi-environment Topic Models
We introduce the Multi-environment Topic Model (MTM), an unsupervised probabilistic model that separates global and environment-specific terms.
We show that the MTM produces interpretable global topics with distinct environment-specific words.
It also enables the discovery of accurate causal effects.
(2024-10-31)
- Combining Observational Data and Language for Species Range Estimation
We propose a novel approach combining millions of citizen science species observations with textual descriptions from Wikipedia.
Our framework maps locations, species, and text descriptions into a common space, enabling zero-shot range estimation from textual descriptions.
Our approach also acts as a strong prior when combined with observational data, resulting in more accurate range estimation with less data.
(2024-10-14)
- VegeDiff: Latent Diffusion Model for Geospatial Vegetation Forecasting
We propose VegeDiff for the geospatial vegetation forecasting task.
VegeDiff is the first to employ a diffusion model to probabilistically capture the uncertainties in vegetation change processes.
By capturing the uncertainties in vegetation changes and modeling the complex influence of relevant variables, VegeDiff outperforms existing deterministic methods.
(2024-07-17)
- Towards A Comprehensive Assessment of AI's Environmental Impact
The recent surge of interest in machine learning has sparked a trend towards large-scale adoption of AI/ML.
There is a need for a framework that monitors the environmental impact and degradation caused by AI/ML throughout its lifecycle.
This study proposes a methodology to track environmental variables relating to the multifaceted impact of AI around datacenters using openly available energy data and globally acquired satellite observations.
(2024-05-22)
- FREE: The Foundational Semantic Recognition for Modeling Environmental Ecosystems
FREE maps available environmental data into a text space and then converts the traditional predictive modeling task in environmental science into a semantic recognition problem.
When used for long-term prediction, FREE has the flexibility to incorporate newly collected observations to enhance future prediction.
The efficacy of FREE is evaluated on two societally important real-world applications: predicting stream water temperature in the Delaware River Basin and predicting annual corn yield in Illinois and Iowa.
(2023-11-17)
- SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data
We present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird.
We also provide a dataset in Kenya representing low-data regimes.
We benchmark a set of baselines on our dataset, including SOTA models for remote sensing tasks.
(2023-11-02)
- A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact
This study seeks to balance the demands of high-performance machine learning models with environmental sustainability.
Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance.
However, superior outcomes were obtained with optimised configurations, albeit with a commensurate increase in resource consumption.
(2023-07-01)
- Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets
We propose two techniques for producing high-quality naturalistic synthetic occluded faces.
We empirically show the effectiveness and robustness of both methods, even for unseen occlusions.
We present two high-resolution real-world occluded face datasets with fine-grained annotations, RealOcc and RealOcc-Wild.
(2022-05-12)
- Unraveling the hidden environmental impacts of AI solutions for environment
In the past ten years, artificial intelligence has made such dramatic progress that it is now seen as a tool of choice for solving environmental issues.
The deep learning community began to realize that training models with more and more parameters requires large amounts of energy and, as a consequence, produces greenhouse gas (GHG) emissions.
This article proposes to study the possible negative impacts of "AI for green".
(2021-10-22)
- Analyzing Sustainability Reports Using Natural Language Processing
In recent years, companies have increasingly been aiming to both mitigate their environmental impact and adapt to the changing climate context.
These efforts are documented in increasingly exhaustive reports, which cover many types of climate risks and exposures under the umbrella of Environmental, Social, and Governance (ESG).
We present this tool and the methodology that we used to develop it in the present article.
(2020-11-03)