WOT-Class: Weakly Supervised Open-world Text Classification
- URL: http://arxiv.org/abs/2305.12401v2
- Date: Wed, 22 Nov 2023 23:56:43 GMT
- Title: WOT-Class: Weakly Supervised Open-world Text Classification
- Authors: Tianle Wang, Zihan Wang, Weitang Liu and Jingbo Shang
- Abstract summary: We work on a novel problem of weakly supervised open-world text classification.
We propose a novel framework WOT-Class that lifts strong assumptions.
Experiments on 7 popular text classification datasets demonstrate that WOT-Class outperforms strong baselines.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art weakly supervised text classification methods, while
significantly reducing the required human supervision, still require the
supervision to cover all the classes of interest. This is difficult to meet in
practice when humans explore new, large corpora without a complete picture. In
this paper, we work on a novel yet important problem of weakly supervised
open-world text classification, where supervision is only needed for a few
examples from a few known classes and the machine should handle both known and
unknown classes at test time. General open-world classification has been
studied mostly using image classification; however, existing methods typically
assume the availability of sufficient known-class supervision and strong
unknown-class prior knowledge (e.g., the number and/or data distribution). We
propose a novel framework WOT-Class that lifts those strong assumptions.
Specifically, it follows an iterative process of (a) clustering text to new
classes, (b) mining and ranking indicative words for each class, and (c)
merging redundant classes by using the overlapped indicative words as a bridge.
Extensive experiments on 7 popular text classification datasets demonstrate
that WOT-Class consistently outperforms strong baselines by a large margin,
attaining 23.33% greater average absolute macro-F1 over existing approaches
across all datasets. Such competent accuracy illuminates the practical
potential of further reducing human effort for text classification.
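The iterative process of steps (a)-(c) can be illustrated with a toy, stdlib-only sketch. This is not the authors' implementation: step (a)'s clustering is assumed to have already produced candidate clusters, the frequency-ratio score standing in for step (b)'s indicative-word ranking is a simplification, and the fixed top-k overlap threshold used in step (c) is likewise illustrative.

```python
from collections import Counter

def indicative_words(clusters, k=3):
    """Step (b): rank words by how much more frequent they are inside a
    cluster than in the whole corpus (a crude indicativeness score)."""
    corpus = Counter(w for docs in clusters.values() for d in docs for w in d)
    total = sum(corpus.values())
    ranked = {}
    for cid, docs in clusters.items():
        local = Counter(w for d in docs for w in d)
        n = sum(local.values())
        score = {w: (c / n) / (corpus[w] / total) for w, c in local.items()}
        ranked[cid] = [w for w, _ in sorted(score.items(), key=lambda x: -x[1])][:k]
    return ranked

def merge_redundant(clusters, k=3, overlap=2):
    """Step (c): merge two clusters whenever their top-k indicative word
    lists share at least `overlap` words (the words act as a bridge)."""
    changed = True
    while changed:
        changed = False
        ranked = indicative_words(clusters, k)
        ids = sorted(clusters)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                if len(set(ranked[a]) & set(ranked[b])) >= overlap:
                    clusters[a] = clusters[a] + clusters[b]
                    del clusters[b]
                    changed = True
                    break
            if changed:
                break
    return clusters

# Three toy clusters from step (a): two sports-like, one finance-like.
clusters = {
    0: [["goal", "team", "match"], ["team", "goal", "match"]],
    1: [["team", "goal", "win"], ["goal", "team", "win"]],
    2: [["stock", "price", "trade"], ["price", "stock", "trade"]],
}
merged = merge_redundant(clusters, k=3, overlap=2)
# clusters 0 and 1 share indicative words ("goal", "team") and get merged
```

In the full framework these three steps repeat until the set of classes stabilizes; here a single pass suffices on the toy data.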
Related papers
- Lidar Panoptic Segmentation in an Open World [50.094491113541046]
Lidar Panoptic Segmentation (LPS) is crucial for the safe deployment of autonomous vehicles.
LPS aims to recognize and segment lidar points w.r.t. a pre-defined vocabulary of semantic classes.
We propose a class-agnostic point clustering and over-segment the input cloud in a hierarchical fashion, followed by binary point segment classification.
arXiv Detail & Related papers (2024-09-22T00:10:20Z) - XAI-CLASS: Explanation-Enhanced Text Classification with Extremely Weak
Supervision [6.406111099707549]
XAI-CLASS is a novel explanation-enhanced weakly-supervised text classification method.
It incorporates word saliency prediction as an auxiliary task.
XAI-CLASS outperforms other weakly-supervised text classification methods significantly.
arXiv Detail & Related papers (2023-10-31T23:24:22Z) - MEGClass: Extremely Weakly Supervised Text Classification via
Mutually-Enhancing Text Granularities [33.567613041147844]
MEGClass is an extremely weakly-supervised text classification method.
It exploits Mutually-Enhancing Text Granularities.
It can select the most informative class-indicative documents.
arXiv Detail & Related papers (2023-04-04T17:26:11Z) - LIME: Weakly-Supervised Text Classification Without Seeds [1.2691047660244335]
In weakly-supervised text classification, only label names act as sources of supervision.
We present LIME, a framework for weakly-supervised text classification.
We find that combining weakly-supervised classification and textual entailment mitigates shortcomings of both.
arXiv Detail & Related papers (2022-10-13T04:28:28Z) - Open Long-Tailed Recognition in a Dynamic World [82.91025831618545]
Real world data often exhibits a long-tailed and open-ended (with unseen classes) distribution.
A practical recognition system must balance between majority (head) and minority (tail) classes, generalize across the distribution, and acknowledge novelty upon instances of unseen classes (open classes).
We define Open Long-Tailed Recognition++ as learning from such naturally distributed data and optimizing for the classification accuracy over a balanced test set.
arXiv Detail & Related papers (2022-08-17T15:22:20Z) - How does the degree of novelty impacts semi-supervised representation
learning for novel class retrieval? [0.5672132510411463]
Supervised representation learning with deep networks tends to overfit the training classes.
We propose an original evaluation methodology that varies the degree of novelty of novel classes.
We find that a vanilla supervised representation falls short on the retrieval of novel classes even more so when the semantics gap is higher.
arXiv Detail & Related papers (2022-08-17T10:49:10Z) - Open-World Semi-Supervised Learning [66.90703597468377]
We introduce a new open-world semi-supervised learning setting in which the model must recognize previously seen classes and discover novel, unseen classes.
We propose ORCA, an approach that learns to simultaneously classify and cluster the data.
We demonstrate that ORCA accurately discovers novel classes and assigns samples to previously seen classes on benchmark image classification datasets.
arXiv Detail & Related papers (2021-02-06T07:11:07Z) - Binary Classification from Multiple Unlabeled Datasets via Surrogate Set
Classification [94.55805516167369]
We propose a new approach for binary classification from $m$ unlabeled (U) sets for $m \ge 2$.
Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC).
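The SSC idea can be loosely mimicked with toy 1-D data (this is not the paper's method): a classifier trained to predict which U-set a sample came from carries information about the binary label whenever the sets have different class proportions. The nearest-set-mean rule and the data below are invented for illustration.

```python
# Toy sketch of surrogate set classification (SSC): no binary labels are
# available, only the index of the unlabeled set each sample came from.
# A nearest-set-mean rule stands in for the learned m-class surrogate model.
def surrogate_set_classifier(u_sets):
    means = [sum(s) / len(s) for s in u_sets]
    def predict(x):
        return min(range(len(means)), key=lambda i: abs(x - means[i]))
    return predict

# Two U-sets with different (unknown) positive proportions: set 0 is mostly
# positive-like values, set 1 mostly negative-like.
u_sets = [[0.9, 1.1, 0.8, -0.7], [-1.0, -0.9, 0.6, -1.2]]
clf = surrogate_set_classifier(u_sets)
```

Because the sets differ in class priors, the predicted set index correlates with the hidden binary label, which is what makes the surrogate task useful.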
arXiv Detail & Related papers (2021-02-01T07:36:38Z) - No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained
Classification Problems [20.253644336965042]
In real-world classification tasks, each class often comprises multiple finer-grained "subclasses"
As the subclass labels are frequently unavailable, models trained using only the coarser-grained class labels often exhibit highly variable performance across different subclasses.
We propose GEORGE, a method to both measure and mitigate hidden stratification even when subclass labels are unknown.
arXiv Detail & Related papers (2020-11-25T18:50:32Z) - Learning and Evaluating Representations for Deep One-class
Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
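A minimal sketch of the second stage of that two-stage recipe, assuming stage one has already produced fixed-length representations: the distance-to-mean scorer with a quantile threshold below is a simple stand-in for the one-class classifiers (e.g., kernel- or density-based) used in practice.

```python
# Stage 2: build a one-class classifier on top of (already learned)
# representations. A point is scored by its Euclidean distance to the mean
# of the one-class training set and thresholded at a training-distance
# quantile; anything farther than the cutoff is flagged as an outlier.
def fit_one_class(reps, quantile=0.95):
    dim = len(reps[0])
    mean = [sum(r[i] for r in reps) / len(reps) for i in range(dim)]

    def dist(r):
        return sum((a - b) ** 2 for a, b in zip(r, mean)) ** 0.5

    cutoff = sorted(dist(r) for r in reps)[min(int(quantile * len(reps)), len(reps) - 1)]
    return lambda r: dist(r) <= cutoff  # True -> inlier, False -> outlier

# Representations of "normal" data clustered near the origin.
train = [(0.0, 0.0), (0.1, 0.1), (-0.1, 0.0), (0.0, -0.1), (0.1, -0.1)]
is_inlier = fit_one_class(train)
```

Points near the training cluster pass the threshold; far-away points do not, which is the essence of one-class classification over learned features.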
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.