Toward Effective Automated Content Analysis via Crowdsourcing
- URL: http://arxiv.org/abs/2101.04615v1
- Date: Tue, 12 Jan 2021 17:14:18 GMT
- Title: Toward Effective Automated Content Analysis via Crowdsourcing
- Authors: Jiele Wu, Chau-Wai Wong, Xinyan Zhao, Xianpeng Liu
- Abstract summary: We propose a quality-aware semantic data annotation system for online workers.
With timely feedback on workers' performance quantified by quality scores, better informed online workers can maintain the quality of their labeling.
Our results suggest that researchers can collect high-quality answers of subjective semantic features at a large scale.
- Score: 6.89765603922453
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Many computer scientists use the aggregated answers of online workers to
represent ground truth. Prior work has shown that aggregation methods such as
majority voting are effective for measuring relatively objective features. For
subjective features such as semantic connotation, online workers, known for
optimizing their hourly earnings, tend to deteriorate in the quality of their
responses as they work longer. In this paper, we aim to address this issue by
proposing a quality-aware semantic data annotation system. We observe that with
timely feedback on workers' performance quantified by quality scores, better
informed online workers can maintain the quality of their labeling throughout
an extended period of time. We validate the effectiveness of the proposed
annotation system through i) evaluating performance based on an expert-labeled
dataset, and ii) demonstrating machine learning tasks that can lead to
consistent learning behavior with 70%-80% accuracy. Our results suggest that
with our system, researchers can collect high-quality answers of subjective
semantic features at a large scale.
Related papers
- Likelihood as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that likelihoods serve as an effective gauge for language model performance.
We propose two methods that use question likelihood as a gauge for selecting and constructing prompts that lead to better performance.
arXiv Detail & Related papers (2024-11-12T13:14:09Z)
- Iterative Feature Boosting for Explainable Speech Emotion Recognition [17.568724398229232]
We present a new supervised SER method based on an efficient feature engineering approach.
We pay particular attention to the explainability of results to evaluate feature relevance and refine feature sets.
The proposed method outperforms human-level performance (HLP) and state-of-the-art machine learning methods in emotion recognition on the TESS dataset.
arXiv Detail & Related papers (2024-05-30T15:44:27Z)
- Data Quality in Crowdsourcing and Spamming Behavior Detection [2.6481162211614118]
We introduce a systematic method for evaluating data quality and detecting spamming threats via variance decomposition.
A spammer index is proposed to assess overall data consistency, and two metrics are developed to measure crowd workers' credibility.
arXiv Detail & Related papers (2024-04-04T02:21:38Z)
- Is Reference Necessary in the Evaluation of NLG Systems? When and Where? [58.52957222172377]
We show that reference-free metrics exhibit a higher correlation with human judgment and greater sensitivity to deficiencies in language quality.
Our study can provide insight into the appropriate application of automatic metrics and the impact of metric choice on evaluation performance.
arXiv Detail & Related papers (2024-03-21T10:31:11Z)
- rWISDM: Repaired WISDM, a Public Dataset for Human Activity Recognition [0.0]
Human Activity Recognition (HAR) has become a focus of recent scientific research because of its applications in domains such as healthcare, athletic competitions, smart cities, and smart homes.
This paper presents the methods by which other researchers may identify and correct similar problems in public datasets.
arXiv Detail & Related papers (2023-05-17T13:55:50Z)
- Active Learning of Ordinal Embeddings: A User Study on Football Data [4.856635699699126]
Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function.
This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset.
arXiv Detail & Related papers (2022-07-26T07:55:23Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Low-Regret Active learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training.
At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z)
- Improving Few-Shot Learning with Auxiliary Self-Supervised Pretext Tasks [0.0]
Recent work on few-shot learning shows that quality of learned representations plays an important role in few-shot classification performance.
On the other hand, the goal of self-supervised learning is to recover useful semantic information of the data without the use of class labels.
We exploit the complementarity of both paradigms via a multi-task framework where we leverage recent self-supervised methods as auxiliary tasks.
arXiv Detail & Related papers (2021-01-24T23:21:43Z)
- Mining Implicit Entity Preference from User-Item Interaction Data for Knowledge Graph Completion via Adversarial Learning [82.46332224556257]
We propose a novel adversarial learning approach by leveraging user interaction data for the Knowledge Graph Completion task.
Our generator is isolated from user interaction data, and serves to improve the performance of the discriminator.
To discover the implicit entity preferences of users, we design an elaborate collaborative learning algorithm based on graph neural networks.
arXiv Detail & Related papers (2020-03-28T05:47:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.