Related papers: Toward Effective Automated Content Analysis via Crowdsourcing

Toward Effective Automated Content Analysis via Crowdsourcing

URL: http://arxiv.org/abs/2101.04615v1
Date: Tue, 12 Jan 2021 17:14:18 GMT
Title: Toward Effective Automated Content Analysis via Crowdsourcing
Authors: Jiele Wu, Chau-Wai Wong, Xinyan Zhao, Xianpeng Liu
Abstract summary: We propose a quality-aware semantic data annotation system for online workers. With timely feedback on workers' performance quantified by quality scores, better informed online workers can maintain the quality of their labeling. Our results suggest that researchers can collect high-quality answers of subjective semantic features at a large scale.
Score: 6.89765603922453
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Many computer scientists use the aggregated answers of online workers to represent ground truth. Prior work has shown that aggregation methods such as majority voting are effective for measuring relatively objective features. For subjective features such as semantic connotation, online workers, known for optimizing their hourly earnings, tend to deteriorate in the quality of their responses as they work longer. In this paper, we aim to address this issue by proposing a quality-aware semantic data annotation system. We observe that with timely feedback on workers' performance quantified by quality scores, better informed online workers can maintain the quality of their labeling throughout an extended period of time. We validate the effectiveness of the proposed annotation system through i) evaluating performance based on an expert-labeled dataset, and ii) demonstrating machine learning tasks that can lead to consistent learning behavior with 70%-80% accuracy. Our results suggest that with our system, researchers can collect high-quality answers of subjective semantic features at a large scale.

Related papers

Active Learning Methods for Efficient Data Utilization and Model Performance Enhancement [5.4044723481768235]
This paper gives a detailed overview of Active Learning (AL), which is a strategy in machine learning that helps models achieve better performance using fewer labeled examples. It introduces the basic concepts of AL and discusses how it is used in various fields such as computer vision, natural language processing, transfer learning, and real-world applications.
arXiv Detail & Related papers (2025-04-21T20:42:13Z)
On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows [71.92083784393418]
Agentic AI (systems that autonomously plan and act) are becoming widespread, yet their task success rate on complex tasks remains low.<n>Inference-time alignment relies on three components: sampling, evaluation, and feedback.<n>We introduce Iterative Agent Decoding (IAD), a procedure that repeatedly inserts feedback extracted from different forms of critiques.
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
CritiQ: Mining Data Quality Criteria from Human Preferences [70.35346554179036]
We introduce CritiQ, a novel data selection method that automatically mines criteria from human preferences for data quality. CritiQ Flow employs a manager agent to evolve quality criteria and worker agents to make pairwise judgments. We demonstrate the effectiveness of our method in the code, math, and logic domains.
arXiv Detail & Related papers (2025-02-26T16:33:41Z)
Likelihood as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that likelihoods serve as an effective gauge for language model performance. We propose two methods that use question likelihood as a gauge for selecting and constructing prompts that lead to better performance.
arXiv Detail & Related papers (2024-11-12T13:14:09Z)
Iterative Feature Boosting for Explainable Speech Emotion Recognition [17.568724398229232]
We present a new supervised SER method based on an efficient feature engineering approach. We pay particular attention to the explainability of results to evaluate feature relevance and refine feature sets. The proposed method outperforms human-level performance (HLP) and state-of-the-art machine learning methods in emotion recognition on the TESS dataset.
arXiv Detail & Related papers (2024-05-30T15:44:27Z)
Data Quality in Crowdsourcing and Spamming Behavior Detection [2.6481162211614118]
We introduce a systematic method for evaluating data quality and detecting spamming threats via variance decomposition. A spammer index is proposed to assess entire data consistency and two metrics are developed to measure crowd worker's credibility.
arXiv Detail & Related papers (2024-04-04T02:21:38Z)
Is Reference Necessary in the Evaluation of NLG Systems? When and Where? [58.52957222172377]
We show that reference-free metrics exhibit a higher correlation with human judgment and greater sensitivity to deficiencies in language quality. Our study can provide insight into the appropriate application of automatic metrics and the impact of metric choice on evaluation performance.
arXiv Detail & Related papers (2024-03-21T10:31:11Z)
rWISDM: Repaired WISDM, a Public Dataset for Human Activity Recognition [0.0]
Human Activity Recognition (HAR) has become a spotlight in recent scientific research because of its applications in various domains such as healthcare, athletic competitions, smart cities, and smart home. This paper presents the methods by which other researchers may identify and correct similar problems in public datasets.
arXiv Detail & Related papers (2023-05-17T13:55:50Z)
Active Learning of Ordinal Embeddings: A User Study on Football Data [4.856635699699126]
Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function. This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset.
arXiv Detail & Related papers (2022-07-26T07:55:23Z)
An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models. We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named as, Compactness Score (CSUFS) to select desired features. Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
Low-Regret Active learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training. At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z)
Improving Few-Shot Learning with Auxiliary Self-Supervised Pretext Tasks [0.0]
Recent work on few-shot learning shows that quality of learned representations plays an important role in few-shot classification performance. On the other hand, the goal of self-supervised learning is to recover useful semantic information of the data without the use of class labels. We exploit the complementarity of both paradigms via a multi-task framework where we leverage recent self-supervised methods as auxiliary tasks.
arXiv Detail & Related papers (2021-01-24T23:21:43Z)
Mining Implicit Entity Preference from User-Item Interaction Data for Knowledge Graph Completion via Adversarial Learning [82.46332224556257]
We propose a novel adversarial learning approach by leveraging user interaction data for the Knowledge Graph Completion task. Our generator is isolated from user interaction data, and serves to improve the performance of the discriminator. To discover implicit entity preference of users, we design an elaborate collaborative learning algorithms based on graph neural networks.
arXiv Detail & Related papers (2020-03-28T05:47:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.