Imbalanced Multi-label Classification for Business-related Text with
Moderately Large Label Spaces
- URL: http://arxiv.org/abs/2306.07046v1
- Date: Mon, 12 Jun 2023 11:51:50 GMT
- Title: Imbalanced Multi-label Classification for Business-related Text with
Moderately Large Label Spaces
- Authors: Muhammad Arslan and Christophe Cruz
- Abstract summary: We evaluated four different methods for multi-label text classification using a specific imbalanced business dataset.
Fine-tuned BERT outperforms the other three methods by a significant margin, achieving high accuracy.
These findings highlight the effectiveness of fine-tuned BERT for multi-label text classification tasks, and suggest that it may be a useful tool for businesses.
- Score: 0.30458514384586394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, we compared the performance of four different methods for
multi-label text classification using a specific imbalanced business dataset.
The four methods we evaluated were fine-tuned BERT, Binary Relevance,
Classifier Chains, and Label Powerset. The results show that fine-tuned BERT
outperforms the other three methods by a significant margin, achieving high
values of accuracy, F1 score, precision, and recall. Binary Relevance also
performs well on this dataset, while Classifier Chains and Label Powerset
demonstrate relatively poor performance. These findings highlight the
effectiveness of fine-tuned BERT for multi-label text classification tasks, and
suggest that it may be a useful tool for businesses seeking to analyze complex
and multifaceted texts.
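The abstract names its methods but gives no implementation details. What follows is a minimal sketch of the three problem-transformation baselines, assuming TF-IDF features and a logistic-regression base learner (neither is specified in the paper); `texts` and `labels` are toy placeholders, not the business dataset.
```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier
from sklearn.metrics import f1_score

texts = ["invoice overdue", "new supplier contract", "contract dispute, invoice attached"]
labels = np.array([[1, 0], [0, 1], [1, 1]])  # toy multi-hot label matrix

X = TfidfVectorizer().fit_transform(texts)

# Binary Relevance: one independent binary classifier per label.
br = MultiOutputClassifier(LogisticRegression()).fit(X, labels)

# Classifier Chains: each classifier also conditions on earlier labels in the chain.
cc = ClassifierChain(LogisticRegression(), order="random", random_state=0).fit(X, labels)

# Label Powerset: treat each distinct label combination as one multi-class label.
combos, y_lp = np.unique(labels, axis=0, return_inverse=True)
lp = LogisticRegression().fit(X, y_lp)
lp_pred = combos[lp.predict(X)]  # map predicted combination ids back to multi-hot rows

print(f1_score(labels, br.predict(X), average="micro"),
      f1_score(labels, cc.predict(X), average="micro"),
      f1_score(labels, lp_pred, average="micro"))
```
For the fine-tuned BERT baseline, a hedged sketch with Hugging Face Transformers is below; setting `problem_type="multi_label_classification"` makes the model apply a per-label sigmoid with binary cross-entropy loss. The checkpoint, label count, and 0.5 threshold are illustrative assumptions, not the paper's configuration.
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,                                # assumed label count
    problem_type="multi_label_classification",   # BCE-with-logits loss
)

batch = tok(["contract dispute, invoice attached"], return_tensors="pt")
targets = torch.tensor([[1.0, 1.0]])             # multi-hot float targets
out = model(**batch, labels=targets)             # loss computed internally
out.loss.backward()                              # gradients for one fine-tuning step
preds = (torch.sigmoid(out.logits) > 0.5).int()  # per-label decisions
```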
Related papers
- Drawing the Same Bounding Box Twice? Coping Noisy Annotations in Object
Detection with Repeated Labels [6.872072177648135]
We propose a novel localization algorithm that adapts well-established ground truth estimation methods.
Our algorithm also shows superior performance during training on the TexBiG dataset.
arXiv Detail & Related papers (2023-09-18T13:08:44Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with
  Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- Binary Classification with Positive Labeling Sources [71.37692084951355]
We propose WEAPO, a simple yet competitive weak supervision (WS) method for producing training labels without negative labeling sources.
We show WEAPO achieves the highest averaged performance on 10 benchmark datasets.
arXiv Detail & Related papers (2022-08-02T19:32:08Z)
- Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive amounts of accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach called Trustable Co-label Learning (TCL).
arXiv Detail & Related papers (2022-03-08T16:57:00Z)
- Enhancing Label Correlation Feedback in Multi-Label Text Classification
  via Multi-Task Learning [6.1538971100140145]
We introduce a novel multi-task learning approach to enhance label correlation feedback.
Two auxiliary label co-occurrence prediction tasks strengthen label correlation learning; a generic sketch of this idea appears after this list.
arXiv Detail & Related papers (2021-06-06T12:26:14Z)
- Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve Micro F1-score by 7% over current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z)
- Unsupervised Label Refinement Improves Dataless Text Classification [48.031421660674745]
Dataless text classification is capable of classifying documents into previously unseen labels by assigning a score to any document paired with a label description.
While promising, it crucially relies on accurate descriptions of the label set for each downstream task.
This reliance causes dataless classifiers to be highly sensitive to the choice of label descriptions and hinders the broader application of dataless classification in practice.
arXiv Detail & Related papers (2020-12-08T03:37:50Z)
- Layer-wise Guided Training for BERT: Learning Incrementally Refined
  Document Representations [11.46458298316499]
We propose a novel approach to fine-tune BERT in a structured manner.
Specifically, we focus on Large-Scale Multi-label Text Classification (LMTC).
Our approach guides specific BERT layers to predict labels from specific hierarchy levels.
arXiv Detail & Related papers (2020-10-12T14:56:22Z) - Interaction Matching for Long-Tail Multi-Label Classification [57.262792333593644]
We present an elegant and effective approach for addressing limitations in existing multi-label classification models.
By performing soft n-gram interaction matching, we match labels with natural language descriptions.
arXiv Detail & Related papers (2020-05-18T15:27:55Z)
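The label co-occurrence idea flagged in the multi-task entry above can be made concrete with a generic sketch (not the cited paper's exact architecture): a shared encoder feeds both a multi-label head and an auxiliary head that predicts which label pairs co-occur in a document. The encoder, dimensions, and loss weight below are placeholders.
```python
import torch
import torch.nn as nn

class MultiTaskClassifier(nn.Module):
    def __init__(self, in_dim=300, hidden=768, num_labels=20):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hidden)   # stand-in for a text encoder
        self.label_head = nn.Linear(hidden, num_labels)
        # Auxiliary head: predicts which label pairs co-occur in the document.
        self.cooc_head = nn.Linear(hidden, num_labels * num_labels)
        self.num_labels = num_labels

    def forward(self, x):
        h = torch.relu(self.encoder(x))
        return (self.label_head(h),
                self.cooc_head(h).view(-1, self.num_labels, self.num_labels))

model = MultiTaskClassifier()
x = torch.randn(4, 300)                            # toy batch of document vectors
y = torch.randint(0, 2, (4, 20)).float()           # multi-hot labels
cooc = torch.einsum("bi,bj->bij", y, y)            # pairwise co-occurrence targets

logits, cooc_logits = model(x)
bce = nn.BCEWithLogitsLoss()
loss = bce(logits, y) + 0.5 * bce(cooc_logits, cooc)  # weighted multi-task loss
loss.backward()
```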