Company classification using zero-shot learning
- URL: http://arxiv.org/abs/2305.01028v2
- Date: Thu, 26 Oct 2023 20:19:37 GMT
- Title: Company classification using zero-shot learning
- Authors: Maryan Rizinski, Andrej Jankov, Vignesh Sankaradas, Eugene Pinsky,
Igor Miskovski, Dimitar Trajanov
- Abstract summary: We propose an approach for company classification using NLP and zero-shot learning.
We evaluate our approach on a dataset obtained through the Wharton Research Data Services (WRDS).
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, natural language processing (NLP) has become increasingly
important in a variety of business applications, including sentiment analysis,
text classification, and named entity recognition. In this paper, we propose an
approach for company classification using NLP and zero-shot learning. Our
method utilizes pre-trained transformer models to extract features from company
descriptions, and then applies zero-shot learning to classify companies into
relevant categories without the need for specific training data for each
category. We evaluate our approach on a dataset obtained through the Wharton
Research Data Services (WRDS), which comprises textual descriptions of publicly
traded companies. We demonstrate that the approach can streamline the process
of company classification, thereby reducing the time and resources required in
traditional approaches such as the Global Industry Classification Standard
(GICS). The results show that this method has potential for automation of
company classification, making it a promising avenue for future research in
this area.
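The abstract describes the core idea: embed a company description, score it against candidate category labels, and pick the best match without training data for any category. The paper uses pre-trained transformer models; the sketch below substitutes a toy bag-of-words similarity so it runs with no external models, and the sector labels and description are hypothetical, not from the WRDS dataset.

```python
# Illustrative zero-shot company classification sketch.
# A toy bag-of-words "embedding" stands in for the pre-trained
# transformer features used in the paper; the logic (score each
# candidate label against the description, no per-label training
# data) is the same.
from collections import Counter
import math

def embed(text):
    """Toy embedding: a bag-of-words frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def zero_shot_classify(description, label_hypotheses):
    """Score the description against every candidate sector;
    return the best label and all scores."""
    scores = {label: cosine(embed(description), embed(hyp))
              for label, hyp in label_hypotheses.items()}
    return max(scores, key=scores.get), scores

# Hypothetical sector labels with short textual hypotheses.
sectors = {
    "Information Technology": "software cloud computing technology platform",
    "Health Care": "pharmaceutical drug clinical medical health",
    "Financials": "bank loan insurance financial investment",
}
desc = "The company develops cloud software and a computing platform for enterprises"
best, scores = zero_shot_classify(desc, sectors)
print(best)  # → Information Technology
```

In the paper's setting, `embed` would be a pre-trained transformer encoder (or the comparison would be done by a zero-shot entailment model), which is what lets unseen categories be scored without any category-specific training data.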
Related papers
- Annotation Guidelines-Based Knowledge Augmentation: Towards Enhancing Large Language Models for Educational Text Classification [11.69740323250258]
We propose the Annotation Guidelines-based Knowledge Augmentation (AGKA) approach to improve Large Language Models (LLMs).
AGKA employs GPT 4.0 to retrieve label definition knowledge from annotation guidelines, and then applies the random under-sampler to select a few typical examples.
The study results demonstrate that AGKA can enhance non-fine-tuned LLMs, particularly GPT 4.0 and Llama 3 70B.
arXiv Detail & Related papers (2024-06-03T03:09:01Z) - Transductive Learning for Textual Few-Shot Classification in API-based
Embedding Models [46.79078308022975]
Few-shot classification involves training a model to perform a new classification task with a handful of labeled data.
We introduce a scenario where the embedding of a pre-trained model is served through a gated API with compute-cost and data-privacy constraints.
We propose transductive inference, a learning paradigm that has been overlooked by the NLP community.
arXiv Detail & Related papers (2023-10-21T12:47:10Z) - Classifying Organizations for Food System Ontologies using Natural
Language Processing [9.462188694526134]
We have created NLP models that can automatically classify organizations associated with environmental issues.
As input, the NLP models are provided with text snippets retrieved by the Google search engine for each organization.
We believe NLP models represent a promising approach for harvesting information to populate knowledge graphs.
arXiv Detail & Related papers (2023-09-19T19:07:48Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages an existing pretrained vision-language (VL) model to train semantic segmentation models.
ZeroSeg avoids the need for human labels by distilling the visual concepts learned by VL models into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance when compared to other zero-shot segmentation methods under the same training data.
arXiv Detail & Related papers (2023-06-01T08:47:06Z) - Open World Classification with Adaptive Negative Samples [89.2422451410507]
Open world classification is a task in natural language processing with key practical relevance and impact.
We propose an approach based on adaptive negative samples (ANS) designed to generate effective synthetic open-category samples in the training stage.
ANS achieves significant improvements over state-of-the-art methods.
arXiv Detail & Related papers (2023-03-09T21:12:46Z) - Guiding Generative Language Models for Data Augmentation in Few-Shot
Text Classification [59.698811329287174]
We leverage GPT-2 for generating artificial training instances in order to improve classification performance.
Our results show that fine-tuning GPT-2 on a handful of labeled instances leads to consistent classification improvements.
arXiv Detail & Related papers (2021-11-17T12:10:03Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z) - Transfer Learning for Information Extraction with Limited Data [2.201264358342234]
This paper presents a practical approach to fine-grained information extraction.
We first exploit BERT to deal with the limitation of training data in real scenarios.
We then stack BERT with Convolutional Neural Networks to learn hidden representation for classification.
arXiv Detail & Related papers (2020-03-06T08:08:20Z) - Open-set learning with augmented categories by exploiting unlabelled
data [1.2691047660244337]
This research is the first to generalise between observed-novel and unobserved-novel categories within a new learning policy called open-set learning with augmented categories.
We introduce Open-LACU as a unified policy of positive and unlabelled learning, semi-supervised learning and open-set recognition.
The proposed Open-LACU achieves state-of-the-art and first-of-its-kind results.
arXiv Detail & Related papers (2020-02-04T15:32:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.