Using novel data and ensemble models to improve automated labeling of
Sustainable Development Goals
- URL: http://arxiv.org/abs/2301.11353v1
- Date: Wed, 25 Jan 2023 07:44:46 GMT
- Title: Using novel data and ensemble models to improve automated labeling of
Sustainable Development Goals
- Authors: Dirk U. Wulff, Dominik S. Meier, Rui Mata
- Abstract summary: A number of labeling systems based on text have been proposed to help monitor work on the United Nations (UN) Sustainable Development Goals.
We show that systems differ considerably in their sensitivity (i.e., true-positive rate) and specificity (i.e., true-negative rate).
We then show that an ensemble model that pools labeling systems alleviates some of these limitations, exceeding the labeling performance of all currently available systems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: A number of labeling systems based on text have been proposed to help monitor
work on the United Nations (UN) Sustainable Development Goals (SDGs). Here, we
present a systematic comparison of systems using a variety of text sources and
show that systems differ considerably in their sensitivity (i.e., true-positive
rate) and specificity (i.e., true-negative rate), have systematic biases (e.g.,
are more sensitive to specific SDGs relative to others), and are susceptible to
the type and amount of text analyzed. We then show that an ensemble model that
pools labeling systems alleviates some of these limitations, exceeding the
labeling performance of all currently available systems. We conclude that
researchers and policymakers should care about the choice of labeling system
and that ensemble methods should be favored when drawing conclusions about the
absolute and relative prevalence of work on the SDGs based on automated
methods.
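To make the two evaluation quantities and the pooling idea concrete, the following is a minimal sketch for a single SDG treated as a binary label. It assumes the standard definitions (sensitivity as the true-positive rate, specificity as the true-negative rate) and uses a simple majority vote to pool systems. The majority vote is an illustrative assumption, not the ensemble model described in the paper, and the function names, system names, and toy labels are all hypothetical.

```python
from typing import Dict, List, Tuple


def sensitivity_specificity(y_true: List[bool], y_pred: List[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one binary label.

    sensitivity = TP / (TP + FN), the true-positive rate
    specificity = TN / (TN + FP), the true-negative rate
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def majority_vote(system_labels: Dict[str, List[bool]]) -> List[bool]:
    """Pool several labeling systems: a document gets the label when
    more than half of the systems assign it (assumed pooling rule)."""
    per_document = zip(*system_labels.values())
    return [2 * sum(votes) > len(votes) for votes in per_document]


# Toy data (hypothetical): does each of six documents relate to one given SDG?
truth = [True, True, True, False, False, False]
systems = {
    "system_a": [True, True, False, False, False, True],
    "system_b": [True, False, True, False, True, False],
    "system_c": [False, True, True, True, False, False],
}

pooled = majority_vote(systems)
for name, labels in systems.items():
    print(name, sensitivity_specificity(truth, labels))
print("ensemble", sensitivity_specificity(truth, pooled))
```

In this toy case each system makes errors on different documents, so the pooled vote recovers labels that no individual system gets fully right. Real systems often make correlated errors, in which case a simple vote helps less.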
Related papers
- VERA: Validation and Evaluation of Retrieval-Augmented Systems [5.709401805125129]
VERA is a framework designed to enhance the transparency and reliability of outputs from large language models (LLMs).
We show how VERA can strengthen decision-making processes and trust in AI applications.
arXiv Detail & Related papers (2024-08-16T21:59:59Z)
- RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [69.4501863547618]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios.
With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance.
Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z)
- Interactive System-wise Anomaly Detection [66.3766756452743]
Anomaly detection plays a fundamental role in various applications.
It is challenging for existing methods to handle scenarios where the instances are systems whose characteristics are not readily observed as data.
We develop an end-to-end approach which includes an encoder-decoder module that learns system embeddings.
arXiv Detail & Related papers (2023-04-21T02:20:24Z)
- Stable Bias: Analyzing Societal Representations in Diffusion Models [72.27121528451528]
We propose a new method for exploring the social biases in Text-to-Image (TTI) systems.
Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts.
We leverage this method to analyze images generated by 3 popular TTI systems and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents.
arXiv Detail & Related papers (2023-03-20T19:32:49Z)
- Label-Efficient Interactive Time-Series Anomaly Detection [17.799924009674694]
We propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system.
To achieve this goal, the system integrates weak supervision and active learning collaboratively.
We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions.
arXiv Detail & Related papers (2022-12-30T10:16:15Z)
- Quality-Based Conditional Processing in Multi-Biometrics: Application to Sensor Interoperability [63.05238390013457]
We describe and evaluate the ATVS-UAM fusion approach submitted to the quality-based evaluation of the 2007 BioSecure Multimodal Evaluation Campaign.
Our approach is based on linear logistic regression, in which fused scores tend to be log-likelihood-ratios.
Results show that the proposed approach outperforms all the rule-based fusion schemes.
arXiv Detail & Related papers (2022-11-24T12:11:22Z)
- What are the best systems? New perspectives on NLP Benchmarking [10.27421161397197]
We propose a new procedure to rank systems based on their performance across different tasks.
Motivated by social choice theory, the final system ordering is obtained by aggregating the rankings induced by each task.
We show that our method yields different conclusions on state-of-the-art systems than the mean-aggregation procedure.
arXiv Detail & Related papers (2022-02-08T11:44:20Z)
- Anomaly Detection Based on Selection and Weighting in Latent Space [73.01328671569759]
We propose a novel selection-and-weighting-based anomaly detection framework called SWAD.
Experiments on both benchmark and real-world datasets have shown the effectiveness and superiority of SWAD.
arXiv Detail & Related papers (2021-03-08T10:56:38Z)
- Semi-Supervised Learning with GANs for Device-Free Fingerprinting Indoor Localization [6.939464860621602]
Device-free wireless indoor localization is a key enabling technology for the Internet of Things (IoT).
This paper proposes a semi-supervised, generative adversarial network (GAN)-based device-free fingerprinting indoor localization system.
arXiv Detail & Related papers (2020-08-17T06:32:13Z)
- Foreseeing the Benefits of Incidental Supervision [83.08441990812636]
This paper studies whether we can, in a single framework, quantify the benefits of various types of incidental signals for a given target task without going through experiments.
We propose a unified PAC-Bayesian motivated informativeness measure, PABI, that characterizes the uncertainty reduction provided by incidental supervision signals.
arXiv Detail & Related papers (2020-06-09T20:59:42Z)
- A Human Evaluation of AMR-to-English Generation Systems [13.10463139842285]
We present the results of a new human evaluation which collects fluency and adequacy scores, as well as categorization of error types.
We discuss the relative quality of these systems and how our results compare to those of automatic metrics.
arXiv Detail & Related papers (2020-04-14T21:41:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.