Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data
- URL: http://arxiv.org/abs/2109.10856v1
- Date: Wed, 22 Sep 2021 17:29:01 GMT
- Title: Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data
- Authors: Dheeraj Mekala, Varun Gangal, Jingbo Shang
- Abstract summary: We introduce a new problem called coarse-to-fine grained classification, which aims to perform fine-grained classification on coarsely annotated data.
Instead of asking for new fine-grained human annotations, we opt to leverage label surface names as the only human guidance.
Our framework uses the fine-tuned generative models to sample pseudo-training data for training the classifier, and bootstraps on real unlabeled data for model refinement.
- Score: 22.81068960545234
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing text classification methods mainly focus on a fixed label set,
whereas many real-world applications require extending to new fine-grained
classes as the number of samples per label increases. To accommodate such
requirements, we introduce a new problem called coarse-to-fine grained
classification, which aims to perform fine-grained classification on coarsely
annotated data. Instead of asking for new fine-grained human annotations, we
opt to leverage label surface names as the only human guidance and weave
rich pre-trained generative language models into the iterative weak supervision
strategy. Specifically, we first propose a label-conditioned finetuning
formulation to attune these generators for our task. Furthermore, we devise a
regularization objective based on the coarse-fine label constraints derived
from our problem setting, giving us even further improvements over the prior
formulation. Our framework uses the fine-tuned generative models to sample
pseudo-training data for training the classifier, and bootstraps on real
unlabeled data for model refinement. Extensive experiments and case studies on
two real-world datasets demonstrate superior performance over SOTA zero-shot
classification baselines.
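To make the framework concrete, the following is a minimal sketch of the label-conditioned fine-tuning and pseudo-training-data sampling steps, assuming GPT-2 via HuggingFace transformers. The "<label> : <text>" prompt format and the names coarse_docs and fine_labels are illustrative assumptions rather than the paper's exact recipe, and the coarse-fine regularization objective is omitted here.

```python
# Sketch only: label-conditioned fine-tuning of a generator, then sampling
# pseudo-training data conditioned on fine-grained label surface names.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Hypothetical coarsely annotated corpus: (coarse label, document) pairs.
coarse_docs = [("sports", "The striker scored twice in the final.")]
fine_labels = ["soccer", "tennis"]  # label surface names: the only guidance

# 1) Label-conditioned fine-tuning: prepend the label surface name so the
#    generator learns p(text | label).
model.train()
for label, text in coarse_docs:
    batch = tokenizer(f"{label} : {text}", return_tensors="pt")
    loss = model(batch["input_ids"], labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# 2) Sample pseudo-training data conditioned on each *fine* label name.
model.eval()
pseudo_data = []
for label in fine_labels:
    prompt = tokenizer(f"{label} :", return_tensors="pt")
    outputs = model.generate(
        prompt["input_ids"], do_sample=True, top_p=0.9,
        max_length=40, num_return_sequences=5,
        pad_token_id=tokenizer.eos_token_id,
    )
    pseudo_data += [(label, tokenizer.decode(o, skip_special_tokens=True))
                    for o in outputs]
```

In the full framework, the generated (label, text) pairs train the fine-grained classifier, which is then refined by bootstrapping on real unlabeled data.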
Related papers
- Scribbles for All: Benchmarking Scribble Supervised Segmentation Across Datasets [51.74296438621836]
We introduce Scribbles for All, a label and training data generation algorithm for semantic segmentation trained on scribble labels.
The main limitation of scribbles as a source of weak supervision is the lack of challenging datasets for scribble segmentation.
Scribbles for All provides scribble labels for several popular segmentation datasets, along with an algorithm to automatically generate scribble labels for any dataset with dense annotations.
arXiv Detail & Related papers (2024-08-22T15:29:08Z)
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights to the distantly supervised labels based on the training dynamics of the classifiers.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data (a minimal sketch of this weighting appears after this list).
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- Liberating Seen Classes: Boosting Few-Shot and Zero-Shot Text Classification via Anchor Generation and Classification Reframing [38.84431954053434]
Few-shot and zero-shot text classification aim to recognize samples from novel classes with limited labeled samples or no labeled samples at all.
We propose a simple and effective strategy for few-shot and zero-shot text classification.
arXiv Detail & Related papers (2024-05-06T15:38:32Z)
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels to unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z)
- Label Semantic Aware Pre-training for Few-shot Text Classification [53.80908620663974]
We propose Label Semantic Aware Pre-training (LSAP) to improve the generalization and data efficiency of text classification systems.
LSAP incorporates label semantics into pre-trained generative models (T5 in our case) by performing secondary pre-training on labeled sentences from a variety of domains.
arXiv Detail & Related papers (2022-04-14T17:33:34Z)
- Cluster & Tune: Boost Cold Start Performance in Text Classification [21.957605438780224]
In real-world scenarios, a text classification task often begins with a cold start, when labeled data is scarce.
We suggest a method to boost the performance of such models by adding an intermediate unsupervised classification task.
arXiv Detail & Related papers (2022-03-20T15:29:34Z)
- Label Hallucination for Few-Shot Classification [40.43730385915566]
Few-shot classification requires adapting knowledge learned from a large annotated base dataset to recognize novel unseen classes.
We propose an alternative approach to the two popular strategies for this adaptation.
We show that our method outperforms the state-of-the-art on four well-established few-shot classification benchmarks.
arXiv Detail & Related papers (2021-12-06T20:18:41Z)
- Bridging Non Co-occurrence with Unlabeled In-the-wild Data for Incremental Object Detection [56.22467011292147]
Several incremental learning methods have been proposed to mitigate catastrophic forgetting in object detection.
Despite their effectiveness, these methods require co-occurrence of the unlabeled base classes in the training data of the novel classes.
We propose the use of unlabeled in-the-wild data to bridge the non-co-occurrence caused by the missing base classes during the training of additional novel classes.
arXiv Detail & Related papers (2021-10-28T10:57:25Z)
- Towards Cross-Granularity Few-Shot Learning: Coarse-to-Fine Pseudo-Labeling with Visual-Semantic Meta-Embedding [13.063136901934865]
Few-shot learning aims at rapidly adapting to novel categories with only a handful of samples at test time.
In this paper, we advance the few-shot classification paradigm towards a more challenging scenario, i.e., cross-granularity few-shot classification.
We approximate the fine-grained data distribution by greedily clustering each coarse class into pseudo-fine-classes according to the similarity of image embeddings (a minimal sketch of this clustering appears after this list).
arXiv Detail & Related papers (2020-07-11T03:44:21Z)
- Automatically Discovering and Learning New Visual Categories with Ranking Statistics [145.89790963544314]
We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes.
We learn a general-purpose clustering model and use it to identify the new classes in the unlabelled data.
We evaluate our approach on standard classification benchmarks and outperform current methods for novel category discovery by a significant margin.
arXiv Detail & Related papers (2020-02-13T18:53:32Z)
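As a concrete reading of the co-training entry above, here is a minimal sketch of importance-weighted training on distantly supervised labels, in contrast to hard confidence-threshold filtering. The function name weighted_distant_loss and the toy weights are hypothetical; the paper derives its weights from classifier training dynamics, which is not reproduced here.

```python
# Sketch only: weight each distantly labeled example instead of keeping or
# dropping it by an arbitrary confidence threshold.
import torch
import torch.nn.functional as F

def weighted_distant_loss(logits, distant_labels, weights):
    """Cross-entropy over distantly supervised labels, where each example
    contributes in proportion to its importance weight."""
    per_example = F.cross_entropy(logits, distant_labels, reduction="none")
    return (weights * per_example).sum() / weights.sum()

# Usage with toy values: three examples, two classes.
logits = torch.tensor([[2.0, 0.1], [0.2, 1.5], [0.3, 0.4]])
labels = torch.tensor([0, 1, 1])
# Hypothetical importance weights, e.g. derived from training dynamics.
weights = torch.tensor([0.9, 0.7, 0.2])
print(weighted_distant_loss(logits, labels, weights))  # scalar loss
```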
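And for the cross-granularity few-shot entry, a minimal sketch of greedily clustering one coarse class into pseudo-fine-classes by embedding similarity. The similarity-threshold rule and the helper name greedy_pseudo_fine_classes are illustrative assumptions; the paper's exact procedure (and its visual-semantic meta-embedding) may differ.

```python
# Sketch only: greedy grouping of one coarse class's embeddings into
# pseudo-fine-classes.
import numpy as np

def greedy_pseudo_fine_classes(embeddings, threshold=0.8):
    """Greedily group embeddings of ONE coarse class: join the cluster whose
    centroid is most cosine-similar if that similarity exceeds `threshold`,
    otherwise open a new pseudo-fine-class."""
    centroids, members = [], []
    for e in embeddings:
        e = e / np.linalg.norm(e)
        sims = [float(e @ (c / np.linalg.norm(c))) for c in centroids]
        if sims and max(sims) >= threshold:
            k = int(np.argmax(sims))
            members[k].append(e)
            centroids[k] = np.mean(members[k], axis=0)  # update centroid
        else:
            members.append([e])
            centroids.append(e)
    return members  # each sublist is one pseudo-fine-class

# Usage: split a coarse class of 2-D toy embeddings into pseudo-classes.
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
clusters = greedy_pseudo_fine_classes(toy, threshold=0.9)
print(len(clusters))  # -> 2 pseudo-fine-classes
```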
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.