A Pitfall of Learning from User-generated Data: In-depth Analysis of
Subjective Class Problem
- URL: http://arxiv.org/abs/2003.10621v1
- Date: Tue, 24 Mar 2020 02:25:52 GMT
- Authors: Kei Nemoto and Shweta Jain
- Abstract summary: We propose two types of classes in user-defined labels: subjective classes and objective classes.
Subjective classes are subject to bias and manipulation by the user; we define this as the subjective class issue and provide a framework for detecting subjective labels in a dataset without querying an oracle.
- Score: 1.218340575383456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Research in the supervised learning algorithms field implicitly assumes that
training data is labeled by domain experts or at least semi-professional
labelers accessible through crowdsourcing services like Amazon Mechanical Turk.
With the advent of the Internet, data has become abundant and a large number of
machine learning based systems started being trained with user-generated data,
using categorical data as true labels. However, little work has been done in
the area of supervised learning with user-defined labels where users are not
necessarily experts and might be motivated to provide incorrect labels in order
to improve their own utility from the system. In this article, we propose two
types of classes in user-defined labels, subjective and objective, and show
that objective classes are as reliable as if they were provided by domain
experts, whereas subjective classes are subject to bias and manipulation by
the user. We define this as the subjective class issue and provide a framework
for detecting subjective labels in a dataset without querying an oracle. Using
this framework, data mining practitioners can detect a subjective class at an
early stage of their projects and avoid wasting their time and resources on
tackling the subjective class problem with traditional machine learning
techniques.
Related papers
- Towards Open-Domain Topic Classification [69.21234350688098]
We introduce an open-domain topic classification system that accepts user-defined taxonomy in real time.
Users will be able to classify a text snippet with respect to any candidate labels they want, and get instant response from our web interface.
arXiv Detail & Related papers (2023-06-29T20:25:28Z)
- AutoWS: Automated Weak Supervision Framework for Text Classification [1.748907524043535]
We propose a novel framework for increasing the efficiency of the weak supervision process while decreasing the dependency on domain experts.
Our method requires a small set of labeled examples per label class and automatically creates a set of labeling functions to assign noisy labels to numerous unlabeled data.
arXiv Detail & Related papers (2023-02-07T07:12:05Z)
- On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors.
We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
arXiv Detail & Related papers (2022-02-17T22:40:19Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Classification of Consumer Belief Statements From Social Media [0.0]
We study how complex expert annotations can be leveraged successfully for classification.
We find that automated class abstraction approaches perform remarkably well against the domain expert baseline on text classification tasks.
arXiv Detail & Related papers (2021-06-29T15:25:33Z)
- Streaming Self-Training via Domain-Agnostic Unlabeled Images [62.57647373581592]
We present streaming self-training (SST) that aims to democratize the process of learning visual recognition models.
Key to SST are two crucial observations: (1) domain-agnostic unlabeled images enable us to learn better models with a few labeled examples without any additional knowledge or supervision; and (2) learning is a continuous process and can be done by constructing a schedule of learning updates.
arXiv Detail & Related papers (2021-04-07T17:58:39Z)
- End-to-End Learning from Noisy Crowd to Supervised Machine Learning Models [6.278267504352446]
We advocate using hybrid intelligence, i.e., combining deep models and human experts, to design an end-to-end learning framework from noisy crowd-sourced data.
We show how label aggregation can benefit from estimating the annotators' confusion matrix to improve the learning process.
We demonstrate the effectiveness of our strategies on several image datasets, using SVM and deep neural networks.
arXiv Detail & Related papers (2020-11-13T09:48:30Z)
- Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z)
- Deep Active Learning with Crowdsourcing Data for Privacy Policy Classification [6.5443502434659955]
Active learning and crowdsourcing techniques are used to develop an automated classification tool named Calpric.
Calpric is able to perform annotations equivalent to those done by skilled human annotators with high accuracy while minimizing the labeling cost.
Our model is able to achieve the same F1 score using only 62% of the original labeling effort.
arXiv Detail & Related papers (2020-08-07T02:13:31Z)
- Automatically Discovering and Learning New Visual Categories with Ranking Statistics [145.89790963544314]
We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes.
We learn a general-purpose clustering model and use the latter to identify the new classes in the unlabelled data.
We evaluate our approach on standard classification benchmarks and outperform current methods for novel category discovery by a significant margin.
arXiv Detail & Related papers (2020-02-13T18:53:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.